Our search engine is designed to index and query documents across a wide variety of formats.
When a user performs a search, several techniques are combined to present the most relevant results based on the query.
Relevance Score Calculation
Each result is assigned a relevance score (or match score) by the engine, which reflects how closely the document matches the user's query.
This score is determined by multiple factors, including:
Textual Match: How well the search terms appear in the document.
Word Frequency: The more often a keyword appears, especially in important fields, the higher the score.
Field Weighting (Boosts): Some fields, such as the title, carry more weight than others (e.g., body content).
Prioritizing Search Fields
To improve relevance, our engine gives more importance to certain fields:
Title: Words found in the title can be weighted up to 10 times more heavily than those found in the content.
Content and Metadata: Other fields like content body or metadata (e.g., author name, email address) have lower weights (e.g., 1 to 3).
This means a document with the keyword in the title will be ranked higher than one where the keyword appears only in the body.
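As a rough illustration of field boosting, the sketch below multiplies the number of keyword occurrences in each field by that field's weight. The function name, the document layout, and the exact weights are assumptions made for this example (the text only gives "up to 10x" for titles and "1 to 3" for other fields); it is not the engine's actual scoring code.

```python
# Hypothetical field weights, chosen within the ranges mentioned in the text.
FIELD_BOOSTS = {"title": 10.0, "content": 1.0, "author": 2.0}


def boosted_score(query_terms, document, boosts=FIELD_BOOSTS):
    """Sum boost * occurrence count over every field and query term."""
    score = 0.0
    for field, boost in boosts.items():
        # Naive whitespace tokenization, enough for a sketch.
        tokens = document.get(field, "").lower().split()
        for term in query_terms:
            score += boost * tokens.count(term.lower())
    return score


doc_title_hit = {"title": "Budget report", "content": "Quarterly figures"}
doc_body_hit = {"title": "Quarterly figures", "content": "The budget is attached"}

print(boosted_score(["budget"], doc_title_hit))  # title match
print(boosted_score(["budget"], doc_body_hit))   # body-only match
```

With these assumed weights, the title match scores ten times higher than the body-only match.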
Word Frequency Impact
The number of times a keyword appears in a document also affects the score:
The more a word appears, especially in a highly weighted field, the more it contributes to the relevance score.
Rare words in the overall corpus are also given more importance (a "rare terms" strategy).
All of these factors are used to calculate a global score that determines the ranking of results.
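The interaction between frequency and rarity can be sketched with a simple TF-IDF-style score. The text does not specify which formula the engine uses, so the smoothed inverse document frequency below is only one common variant, and the function names and sample corpus are hypothetical.

```python
import math


def idf(term, corpus):
    """Inverse document frequency: terms found in fewer documents weigh more."""
    containing = sum(1 for tokens in corpus if term in tokens)
    # Smoothed so that unseen terms do not cause a division by zero.
    return math.log((1 + len(corpus)) / (1 + containing)) + 1.0


def tf_idf_score(query_terms, doc_tokens, corpus):
    """Sum (occurrences in the document) * (rarity in the corpus) per term."""
    return sum(doc_tokens.count(t) * idf(t, corpus) for t in query_terms)


corpus = [
    "annual budget report".split(),
    "budget meeting notes".split(),
    "kubernetes migration report".split(),
]

# "kubernetes" appears in one document, "budget" in two, so it weighs more.
print(idf("kubernetes", corpus) > idf("budget", corpus))
```

In this sketch a term that occurs twice in a document contributes twice as much as one that occurs once, and a rare term contributes more than a common one for the same number of occurrences.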
Handling Stopwords
To prevent common, low-value words (e.g., "the", "and", "of") from skewing the relevance calculation, we apply stopword filters.
These words are:
Excluded from text analysis.
Not highlighted in results, allowing users to better focus on relevant terms.
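A minimal sketch of both behaviors follows. The stopword list contains only the three examples given above, and the bracket-style highlighting and function names are assumptions for illustration; the real filters are likely language-specific and much larger.

```python
import re

# Illustrative list taken from the examples in the text.
STOPWORDS = {"the", "and", "of"}


def analyze(text, stopwords=STOPWORDS):
    """Lowercase, tokenize on word characters, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]


def highlight(text, query, stopwords=STOPWORDS):
    """Wrap query terms in brackets, skipping any stopwords in the query."""
    terms = set(analyze(query, stopwords))

    def mark(match):
        word = match.group(0)
        return f"[{word}]" if word.lower() in terms else word

    return re.sub(r"\w+", mark, text)


print(analyze("The history of the budget"))
print(highlight("The budget of the year", "the budget"))
```

Here the query "the budget" highlights only "budget" in the result text, even though "the" also appears in it.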
Semantic Search & Language Processing
Outmind integrates several Natural Language Processing (NLP) techniques to improve query matching:
Stemming and Normalization:
Words are reduced to their root form. For example, "général" and "généraux" are treated as the same root. This supports multilingual contexts, including French and other European languages.
Synonyms and Variations:
Synonym filters help capture linguistic and spelling variations.
Phrase and Proximity Matching:
In addition to standard keyword search, phrase queries are used to boost documents where search terms appear close to one another, increasing contextual relevance.
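The three techniques above can be sketched in a few lines. Everything here is a simplified assumption: the synonym table is hypothetical, `toy_stem` applies a single French plural rule rather than a real stemming algorithm, and the proximity boost is just the inverse of the distance between two terms.

```python
import unicodedata

# Hypothetical synonym table mapping variants to one canonical form.
SYNONYMS = {"e-mail": "email", "mail": "email"}


def normalize(token):
    """Lowercase, strip accents, and map synonyms to a canonical form."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return SYNONYMS.get(stripped, stripped)


def toy_stem(token):
    """Toy rule: French plural "-aux" becomes "-al". Not a real stemmer."""
    token = normalize(token)
    return token[:-3] + "al" if token.endswith("aux") else token


def proximity_boost(tokens, term_a, term_b):
    """Return 1 / (smallest position gap), or 0.0 if a term is missing."""
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    if not pos_a or not pos_b:
        return 0.0
    gap = min(abs(a - b) for a in pos_a for b in pos_b)
    return 1.0 / max(gap, 1)


print(toy_stem("généraux"), toy_stem("général"))
print(proximity_boost("annual budget report".split(), "annual", "budget"))
```

Both French forms reduce to the same root in this sketch, and adjacent terms receive a larger boost than terms separated by other words.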
Freshness & Date-Based Ranking
Beyond text content, our algorithm also considers document freshness:
A decay function (specifically, a Gaussian decay) is applied to the date field.
This favors more recent documents in the ranking, especially when their content is also relevant.
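To show the shape of a Gaussian decay, the sketch below computes a freshness multiplier from a document's age. The parameter names and values (`scale_days`, `offset_days`, `decay`) are assumptions, since the text does not state how the engine's decay is configured; the formula is the usual Gaussian form in which the multiplier equals `decay` when the document is exactly `scale_days` old.

```python
import math
from datetime import date, timedelta


def gaussian_decay(doc_date, today, scale_days=180.0, offset_days=0.0, decay=0.5):
    """Return a multiplier in (0, 1]: 1.0 for fresh documents, less with age."""
    # Documents younger than offset_days are not penalized at all.
    age = max(0.0, abs((today - doc_date).days) - offset_days)
    # Choose the variance so that the multiplier equals `decay` at `scale_days`.
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    return math.exp(-(age ** 2) / (2.0 * sigma_sq))


today = date(2024, 6, 29)
print(gaussian_decay(today, today))                        # brand-new document
print(gaussian_decay(today - timedelta(days=180), today))  # 180 days old
print(gaussian_decay(today - timedelta(days=720), today))  # about two years old
```

Multiplying a text relevance score by such a factor keeps a recent, relevant document ahead of an older one with a similar textual match, while a very old document needs a much stronger match to rank first.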
Summary
Score Calculation: Combines keyword match, frequency, and field importance.
Field Boosting: Title matches are much more heavily weighted than body content.
Frequency Matters: More occurrences = higher relevance, especially for rare terms.
Stopword Filtering: Common low-value words are excluded from scoring and highlighting.
Language Intelligence: Stemming and synonym handling enhance multilingual and variant matching.
Document Freshness: More recent content is prioritized when relevant.
This ranking system ensures that the most relevant, recent, and contextually accurate documents are displayed first, matching both the user's intent and the linguistic subtleties of their query.