Lunr.js - Documentation

What is Lunr.js?

Lunr.js is a powerful, lightweight JavaScript library for building a search engine within your web application. It allows you to index your own data and provide a fast and relevant search experience to your users without needing a separate backend search server. Lunr.js is designed to be easily integrated into existing projects and provides a simple, yet flexible API for building sophisticated search functionality. It’s particularly useful for client-side search where you need to search data already loaded in the browser, or for applications that prioritize a fast, low-latency search experience.

Key Features and Benefits

Client-side Search: All indexing and searching happens within the user’s browser, providing a fast and responsive search experience.
Lightweight and Fast: Lunr.js is designed for performance and has a small footprint, minimizing impact on your application’s load time.
Simple API: The library offers a clean and intuitive API, making it easy to integrate and use.
Flexible Indexing: You can index various data types and customize the indexing process to fit your specific needs.
Stemming and Stop Word Removal: Lunr.js supports stemming (reducing words to their root form) and stop word removal (ignoring common words like “the” and “a”) to improve search accuracy.
Customizable Similarity Algorithms: You have control over how search results are ranked, allowing for fine-tuning of relevance.
No external dependencies: Lunr.js is self-contained and requires no other libraries.

Setting up Lunr.js: Installation and Setup

The easiest way to incorporate Lunr.js into your project is via a CDN or by using a package manager like npm or yarn.

Using a CDN:

Simply include the Lunr.js script in your HTML file:

<script src="https://cdn.jsdelivr.net/npm/lunr@2.3.9/lunr.min.js"></script>

Using npm:

If you’re using npm, install Lunr.js with:

npm install lunr

Then, you can import it into your JavaScript code:

import lunr from 'lunr';

Using yarn:

If you are using yarn, install Lunr.js with:

yarn add lunr

Then, you can import it into your JavaScript code:

import lunr from 'lunr';

Basic Usage Example

This example demonstrates how to index some data and perform a simple search:

import lunr from 'lunr';

const idx = lunr(function () {
  this.ref('id');
  this.field('title');
  this.field('body');

  this.add({
    id: 1,
    title: 'Example Document 1',
    body: 'This is the body of the first example document.'
  });

  this.add({
    id: 2,
    title: 'Another Example Document',
    body: 'This is the second example document. It is quite different.'
  });
});

const results = idx.search('example');

console.log(results); // Output: Array of search results with relevant information (e.g., score and ref)

// Accessing the document from the results:
const document = results[0].ref;
console.log(idx.document(document)); // Output: The document object with id: document

This code first creates a Lunr index, defining which fields to index (title and body). Then, it adds two documents to the index. Finally, it performs a search for the term “example” and logs the results. Remember to replace "https://cdn.jsdelivr.net/npm/lunr@2.3.9/lunr.min.js" with the correct CDN path if the version changes.

Core Concepts

Indexes and Documents

Lunr.js organizes data into indexes and documents. An index is essentially a searchable database built from your data. Documents are the individual data items you add to the index. Each document is a JavaScript object containing fields of data you want to be searchable. When you build an index, Lunr processes the data in each document’s fields, creating an inverted index structure optimized for fast search. This structure maps words (tokens) to the documents containing those words, allowing Lunr to quickly retrieve relevant documents based on a search query. The ref field is particularly important; it acts as a unique identifier for each document within the index and is used to retrieve the original document object after a search.

Fields and Boosting

Documents consist of multiple fields. Each field represents a specific attribute of a document (e.g., title, body, author). You specify which fields to include in your index when you create it. Fields can be boosted to increase their relative importance in search results. A higher boost value will give documents with matching terms in that field a higher ranking in search results. This allows you to prioritize certain fields (like the title) over others (like the body) when determining relevance. Boosting is specified when defining the index and is a numeric value (higher is more important).

Tokenization and Stemming

Before indexing, Lunr tokenizes the text in each field. Tokenization is the process of breaking down text into individual words or terms (tokens). This is crucial for building the inverted index. Lunr also offers stemming, which reduces words to their root form (e.g., “running” becomes “run”). Stemming improves recall by matching words with different suffixes, but may reduce precision if it groups dissimilar words together. Lunr’s built-in stemming algorithm can be adjusted or replaced with custom logic to cater to specific language needs. The tokenization process itself can also be customized through options provided within the index builder.

Search Queries and Query Parsing

Users interact with the index through search queries. These queries are strings that Lunr parses and uses to find relevant documents. Lunr supports various query syntax features, including:

Keyword searches: Simple searches based on individual keywords. Results are returned if any of the keywords are present in indexed fields.
Phrase searches: Searching for exact phrases by enclosing the phrase in quotes (“exact phrase”).
Boolean operators: Using AND, OR, and NOT operators to combine search terms and refine results. (e.g., "term1" AND "term2", "term1" OR "term2", "term1" NOT "term2").
Prefix searches: Searching for words that begin with a certain prefix (e.g., comp* to find “computer”, “complete”, etc.). (Support for prefixes depends on the tokenizer used).

Lunr’s query parser handles these operators and constructs an efficient query plan to fetch the most relevant documents quickly and accurately. The results returned include information such as the document’s ref and a relevance score reflecting how well the document matches the search query.

Building an Index

Creating a New Index

A new Lunr index is created using a builder function. This function allows you to define the fields to be indexed and their properties. The basic structure is as follows:

import lunr from 'lunr';

const idx = lunr(function () {
  // Configuration of the index goes here.
});

Inside the function passed to lunr(), you use methods like this.ref(), this.field(), etc., to configure the index.

Adding Documents to the Index

Documents are added to the index using the this.add() method within the index builder function. Each document is a JavaScript object with fields corresponding to those defined during index creation.

this.add({
  id: 1, //this is usually the ref field
  title: 'Example Document',
  body: 'This is the body text.'
});

You can add multiple documents by calling this.add() multiple times. It’s generally more efficient to add all documents at once, rather than adding them individually in a loop, especially for large datasets.

Defining Fields and their Properties

Fields are defined using the this.field() method. This method takes the field name as an argument. Additional options can be passed to customize the field’s behavior. For example:

this.field('title', {boost: 10}); // Boosts the 'title' field, giving it more weight in search results.
this.field('body'); // No special settings for the 'body' field.

Field Types and Data Handling

Lunr primarily works with text fields. While you can technically add other data types to your documents, Lunr will only index text within those fields. If you have a numeric field, for example, you would need to convert it into a string representation before indexing. Lunr’s core functionality is focused on full-text search and is not designed for direct numerical or other data type comparison.

Using Ref, Boost, and other field options

ref: This field is crucial. It’s a unique identifier for each document. It’s how you retrieve the original document object after a search. This field must be defined using this.ref('fieldName').
boost: As discussed previously, this option allows you to assign weights to fields, affecting their importance in search relevance ranking. Higher boost values give that field more weight.
Other options: Depending on the chosen tokenizer, additional options might be available for specific fields to customize tokenization behavior within that field. Check the Lunr.js documentation for the most up-to-date information on available options.

Index Serialization and Deserialization

For improved performance and to avoid rebuilding the index every time, Lunr allows you to serialize the index to JSON and deserialize it later. Serialization converts the index to a JSON representation that can be saved to storage (local storage, server, etc.). Deserialization reconstructs the index from the saved JSON data.

// Serialization
const serializedIndex = idx.toJSON();
localStorage.setItem('lunrIndex', JSON.stringify(serializedIndex));


// Deserialization
const serializedIndex = JSON.parse(localStorage.getItem('lunrIndex'));
const idx = lunr.Index.load(serializedIndex);

This avoids the time-consuming process of re-indexing your data, improving the load time and responsiveness of your application. Remember that the version of Lunr used for serialization must match the version used for deserialization.

Searching with Lunr.js

Performing Simple Searches

The simplest way to search an index is using the search() method. This method takes a query string as an argument and returns an array of search results. Each result is an object containing the ref (the identifier of the matched document) and a score representing the relevance of the match.

const results = idx.search('search term');
console.log(results);

This will return an array of objects, each with a ref property indicating the ID of the matching document and a score reflecting how well it matches the search term.

Using Boolean Operators

Lunr supports Boolean operators (AND, OR, NOT) to combine search terms and refine results:

AND: Matches documents containing all specified terms. term1 AND term2
OR: Matches documents containing at least one of the specified terms. term1 OR term2
NOT: Excludes documents containing a specified term. term1 NOT term2

const results = idx.search('term1 AND term2'); // Matches documents containing both 'term1' and 'term2'.
const results = idx.search('term1 OR term2');  // Matches documents containing either 'term1' or 'term2' or both.
const results = idx.search('term1 NOT term2'); // Matches documents containing 'term1' but not 'term2'.

Note that the boolean operators must be uppercase.

Phrase Searching

To search for exact phrases, enclose the phrase in double quotes:

const results = idx.search('"exact phrase"');

This will only return documents containing the exact phrase “exact phrase”.

Wildcard Searching

Lunr supports prefix searches using the asterisk (*) wildcard character. The asterisk must be at the end of the search term.

const results = idx.search('comp*'); // Matches "complete", "computer", etc.

Note that wildcard support depends on the tokenizer used. The default English tokenizer supports prefix searches.

Fuzzy Searching

Lunr does not directly support fuzzy searching (allowing for minor spelling errors). If you need fuzzy search capabilities, you’ll need to implement it yourself or use a separate library in conjunction with Lunr. One approach would be to use a Levenshtein distance algorithm to find documents with similar terms to the query before searching with Lunr, although this impacts performance.

Advanced Query Syntax

Lunr’s query parsing is fairly robust, but very complex queries might not be supported. It is best practice to keep queries relatively simple for optimal performance. Consult the Lunr documentation for details on the full capabilities of the query parser.

Handling Search Results

The search() method returns an array of objects. Each object has a ref property (the document identifier) and a score indicating the relevance. Use the ref to retrieve the original document object from the index using idx.document(ref).

const results = idx.search('search term');
results.forEach(result => {
  const doc = idx.document(result.ref);
  console.log(doc); // Access the original document object.
});

Sorting and Ranking Results

Lunr automatically ranks results based on relevance. The score property reflects this ranking. The results are already sorted by relevance score (highest first) by default. You cannot directly sort results based on other criteria within Lunr; you would need to handle this sorting separately after retrieving the results.

Pagination and Limiting Results

To limit the number of results returned, you can use the slice() method on the results array:

const results = idx.search('search term').slice(0, 10); // Returns the top 10 results.

Pagination is handled by using slice() with different start and end indices to fetch specific result pages. For example, to get results 11-20, you’d use .slice(10, 20). This is done client-side and is not a feature of the Lunr index itself.

Working with Data

Data Formats Supported

Lunr.js accepts data in the form of JavaScript objects. Each object represents a document, and the object’s properties represent the fields of the document. There’s no specific, rigid format beyond this; you can use any structure that suits your application, as long as the data is organized into JavaScript objects with fields you’ve defined in your index. For example, you can have nested objects within your document objects, but Lunr will only index the text content of the fields specified during index creation.

Preprocessing and Cleaning Data

The quality of your data directly impacts the effectiveness of your search engine. Preprocessing your data before indexing is highly recommended:

Cleaning Text: Remove irrelevant characters, HTML tags, and excessive whitespace from your text fields. Consider using regular expressions for this task.
Lowercasing: Convert text to lowercase to ensure consistent matching regardless of capitalization. Lunr itself performs lowercasing, but it’s good practice to ensure consistency in your source data.
Stemming (Optional): Lunr provides stemming, but you might need to perform additional stemming or lemmatization (reducing words to their dictionary form) depending on your requirements and language.
Stop Word Removal (Optional): While Lunr handles stop words (common words like “the”, “a”, “is”), you might want to remove them beforehand for more efficient indexing, particularly with very large datasets.

Proper preprocessing ensures that your index is efficient and produces relevant search results.

Handling Large Datasets

Indexing large datasets can be computationally expensive and impact performance. Here are some strategies for handling them:

Batching: Instead of adding documents one by one, add them in batches using multiple this.add() calls within the index builder. This can significantly improve the speed of index creation.
Asynchronous Indexing: For very large datasets, consider using asynchronous operations to avoid blocking the main thread. This approach might require more complex code, but it keeps your application responsive while indexing is underway.
Chunking: Break down the data into smaller chunks and index them individually. You could then create a master index that references the individual chunks. This is useful for very large datasets that cannot fit into memory at once.
Index Serialization: As mentioned earlier, serialize the index to JSON and store it for later use. This removes the need to rebuild the index each time the application starts.
Data Filtering: Before indexing, filter out irrelevant or duplicate data to reduce the size of the index and improve search speed.

Optimizing Search Performance

Several strategies can improve search performance:

Field Selection: Only index the fields truly necessary for searching. Including unnecessary fields increases index size and slows down searches.
Boosting: Use boosting effectively to prioritize important fields. This helps Lunr focus on more relevant parts of your documents.
Stop Word Removal: Removing stop words before indexing can reduce index size and improve search speed.
Efficient Data Structures: Use efficient data structures (like those used internally by Lunr) for handling your data if you are pre-processing or creating custom indexing logic.
Index Serialization and Deserialization: Use serialization to avoid repeated index creation, as discussed previously.
Regular Maintenance: For dynamic content, implement a strategy for regularly updating and optimizing your index. This might include periodic rebuilding or incremental updates to your index.

Careful consideration of data handling and index optimization techniques is critical for maintaining performance, particularly as your dataset grows.

Advanced Techniques

Custom Tokenizers and Stemmers

Lunr.js provides a degree of flexibility in how it processes text. You can replace the default tokenizer and stemmer with your custom implementations to tailor the library to specific languages or requirements. This allows for better handling of complex words, compound words, or specific linguistic rules not covered by the default English tokenizer and stemmer. You would create classes that implement the appropriate interfaces and then pass these to the index builder. Refer to the Lunr.js documentation for detailed instructions on implementing and integrating custom tokenizers and stemmers.

Extending Lunr.js Functionality

Lunr.js is designed to be extensible. You can add new functionality by creating custom plugins or extending its core classes. This allows you to integrate additional features, such as support for different data types, custom scoring algorithms, or integration with external services. This usually involves creating a new module that interacts with the Lunr.js API and adds the desired functionality. More complex extensions might require deeper understanding of Lunr’s internal workings.

Integration with Other Libraries

Lunr.js can be integrated with other JavaScript libraries to enhance its capabilities or integrate it into larger applications. For instance, you might combine it with a UI library (like React, Vue, or Angular) to create a user-friendly search interface or integrate it with a framework for handling large datasets and data persistence. The possibilities here depend heavily on your project’s requirements. Careful consideration must be given to managing potential conflicts between libraries and ensure that data is passed correctly between them.

Building a Search UI

While Lunr.js handles the core search logic, you’ll need a separate user interface to display search results and allow users to interact with the search functionality. This UI could range from a simple input field and a list of results to a more complex interface with filters, facets, and pagination. Many JavaScript UI frameworks and libraries are well-suited to building this user interface; the best choice will depend on your project’s existing technology stack and design requirements. Effective UI design is key to maximizing the user experience of your search functionality.

Troubleshooting and Common Issues

No Results: Ensure that your index is correctly built and that your search queries match the terms in your indexed data. Check for typos in your queries and verify that the fields you’re searching are correctly indexed. Also check for casing inconsistencies (though Lunr lowercases).
Poor Relevance: If search results are not relevant, review your field boosting, consider adding more relevant fields, or refining your preprocessing steps to clean and normalize data effectively.
Performance Issues: For large datasets, consider using techniques like index serialization, batching, or asynchronous indexing as mentioned previously. Profile your code to identify performance bottlenecks.
Unexpected Behavior: Consult the Lunr.js documentation and GitHub issues for solutions to specific problems. If you can’t find a solution, create a new issue detailing your problem. Include a code snippet showing how you’re using Lunr and any error messages you receive.
Version Compatibility: Always check for compatibility between the Lunr version you use and any other libraries or dependencies in your project. Using incompatible versions can lead to unpredictable behavior.

Remember to always consult the official Lunr.js documentation for the most up-to-date information and detailed explanations of API methods and advanced usage patterns.

API Reference

This section provides a concise overview of the Lunr.js API. For complete and up-to-date details, always refer to the official Lunr.js documentation. The API is subject to change between versions.

Lunr Object

The lunr object is the main entry point to the library. It’s primarily used to create a new index using the builder function:

import lunr from 'lunr'; // Assuming you are using ES modules

const idx = lunr(function () { /* ... index builder function ... */ });

The lunr object also contains other utility functions and static methods, including lunr.Index.load() for deserializing an index from JSON.

Builder Object

The builder object is an instance created by calling the lunr() function. It’s used to configure and build a new index. Its methods are only accessible within the function passed to lunr():

this.ref(fieldName): Specifies the field to use as a unique identifier for each document. This is required.
this.field(fieldName, options): Defines an indexed field. options can include boost to specify the field’s weight in search ranking.
this.add(document): Adds a document to the index.
other methods which might be relevant depending on the chosen tokenizer.

Once all fields are defined and documents are added, the builder implicitly creates the index when it goes out of scope.

Index Object

The Index object represents the built search index. It’s created implicitly by the builder and is accessible after the builder function has executed. Key methods include:

search(query): Performs a search and returns an array of results.
toJSON(): Serializes the index into a JSON representation.
document(ref): Retrieves a specific document from the index using its reference (ref).

The Index object also has properties providing information about the index itself (though accessing internal properties is generally discouraged and may change).

Query Object

Lunr’s query parser internally creates a Query object to represent a search query. You do not directly interact with Query objects; they are handled internally by the search() method. Understanding how queries are parsed is helpful for optimizing search strategies but isn’t typically part of direct API interaction.

Methods and Properties

The specific methods and properties available depend on the object (Lunr, Builder, Index). This section only gives a high-level overview. Consult the official Lunr.js documentation for a complete and detailed list of all available methods and properties along with their respective parameters, return types, and usage examples for each version of the library. The documentation is crucial due to the potential for API changes and for understanding more advanced usage such as custom tokenizers and stemming.

Examples and Use Cases

This section provides examples and use cases to illustrate how to apply Lunr.js in different scenarios. Remember that these are simplified examples; real-world implementations might require more sophisticated error handling, data preprocessing, and UI integration. Always refer to the official Lunr.js documentation for the most up-to-date information and detailed explanations.

Simple Search Application Example

This example demonstrates a basic search application with a text input field and a list to display results:

<!DOCTYPE html>
<html>
<head>
<title>Simple Lunr.js Search</title>
</head>
<body>
  <input type="text" id="searchInput" onkeyup="search()">
  <ul id="results"></ul>

  <script src="https://cdn.jsdelivr.net/npm/lunr@2.3.9/lunr.min.js"></script>
  <script>
    const index = lunr(function () {
      this.ref('id');
      this.field('title');
      this.field('content');

      // Add your data here...
      this.add({ id: 1, title: 'Document 1', content: 'This is the first document.' });
      this.add({ id: 2, title: 'Document 2', content: 'This is the second document.' });
      // ... more documents ...
    });

    function search() {
      const query = document.getElementById('searchInput').value;
      const results = index.search(query);
      const resultsList = document.getElementById('results');
      resultsList.innerHTML = ''; // Clear previous results

      results.forEach(result => {
        const doc = index.document(result.ref);
        const li = document.createElement('li');
        li.textContent = doc.title;
        resultsList.appendChild(li);
      });
    }
  </script>
</body>
</html>

Implementing Autocomplete

To add autocomplete, you’d need to integrate Lunr.js with a JavaScript autocomplete library. You’d use Lunr to index your data, then use the autocomplete library to suggest matches as the user types. The library would call index.search() with prefixes of the user’s input to suggest completions.

Building a Faceted Search

Faceted search allows users to filter results based on different attributes (facets). You’d need to index relevant facet fields and build UI elements (e.g., checkboxes or dropdown menus) to let users select facet values. After each selection, you’d filter the results retrieved from Lunr to show only documents matching the selected facets. This requires careful design of both your indexing schema and UI to ensure a user-friendly experience.

Integrating with a Frontend Framework (React, Vue, Angular)

Integration with frontend frameworks involves creating components that handle user input, interact with the Lunr index, and display results. The framework’s data binding capabilities would streamline the update process, and the component structure makes managing the UI elements easier. Specific implementation details vary depending on the framework; refer to that framework’s documentation and examples for best practices.

Real-world examples and case studies

Real-world applications of Lunr.js are numerous:

E-commerce product search: Indexing product information (name, description, category) for fast and relevant product searches.
Documentation search: Enabling quick searches within a website’s help or documentation section.
Internal knowledge bases: Building a searchable index of internal documents for employees.
Client-side filtering and searching within applications: Many applications use Lunr.js to provide advanced search capabilities within the browser.
Offline search functionality: Lunr’s client-side nature makes it ideal for applications needing offline search capabilities.

These examples showcase the versatility of Lunr.js across a range of applications where fast, relevant, client-side search is required. Remember to design the index structure and data handling appropriately to optimize performance for the specific requirements of your application.