
🔢 Data vectorization

Written by Maxime Renault
Updated over 3 weeks ago

Outmind relies on Ragie, a third-party provider, to index document vectors and keep them synchronized with updates.

  • Data processed: text extracted from files + metadata.

  • Encryption: data encrypted in transit (TLS) and at rest (AES-256) on Ragie’s side.

  • Hosting: Ragie hosts its services in the United States.

  • Activation: vectorization is disabled by default and can be enabled source by source in Outmind.

  • Immediate effects: better understanding of imprecise queries, multi-document retrieval, more relevant excerpts.


🧩 What does vectorization do in Outmind?

Vectorization enables meaning-based (semantic) search rather than strict keyword matching.

It also makes it possible to find items even if the user doesn’t use the same terminology as the document.

⇒ Better tolerance for fuzziness (synonyms, spoken formulations, partial recollection).

Finally, it makes it possible to aggregate excerpts from multiple documents that are relevant to answering a question.

⇒ Enriched answers with precise contextual excerpts (top chunks), with the ability to assemble passages from different sources.
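The difference between keyword matching and meaning-based retrieval of top chunks can be sketched as follows. This is a minimal illustration, not Outmind's or Ragie's actual implementation: the file names, chunk texts, and three-dimensional vectors are all made up for the example, and real embeddings are produced by a model and have far more dimensions.

```python
import math

# Hypothetical chunks from three different documents. The "vec" values are
# hand-picked toy embeddings, not the output of a real embedding model.
CHUNKS = [
    {"doc": "letter_2021.pdf", "text": "Dispute with the client over concrete compaction", "vec": [0.9, 0.1, 0.0]},
    {"doc": "site_report.docx", "text": "Vibration defects found in the poured slab", "vec": [0.8, 0.2, 0.1]},
    {"doc": "invoice_034.pdf", "text": "Payment schedule for Q3", "vec": [0.0, 0.1, 0.9]},
]


def keyword_match(phrase, chunks):
    """Strict keyword search: keep only chunks containing the exact phrase."""
    return [c for c in chunks if phrase.lower() in c["text"].lower()]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_chunks(query_vec, chunks, k=2):
    """Return the k chunks whose vectors are closest to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:k]


# Hypothetical embedding of the query "issues with the client about vibrated concrete".
query_vec = [0.85, 0.15, 0.05]

exact_hits = keyword_match("vibrated concrete", CHUNKS)
semantic_hits = top_chunks(query_vec, CHUNKS, k=2)
```

In this toy data, the exact phrase appears in no chunk, so the keyword search finds nothing, while the similarity ranking returns excerpts from two different documents on the same topic and leaves the unrelated invoice out.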

Concrete examples:

  • “Can you find me the file where we had issues with the client about vibrated concrete?”

    • Without vectorization: exact keyword search; the document may be missed if the exact term doesn’t appear in it.

    • With vectorization: understanding of the topic and its semantic neighbors, so relevant correspondence surfaces more reliably.

  • Searching for letters / correspondence in large folders (1,000–3,000 docs) starting from fuzzy phrasing.

  • Preparing proposal responses: find similar projects and produce a quick summary (scope, method, project lead, etc.).

āš™ļø How it works (Outmind)

Creating a vectorized source

Two types of sources exist:

  • Personal & shared sources:

    • In this first version, a “Vectorization” option is available in the creation modal, for Outmind administrators only.

For shared sources, a vectorized source does not take document permissions into account.

The permissions dropdown is set to “No permissions” and disabled (not editable).

By default: the option is disabled. Enabling it is an explicit admin choice (compliance: transfer to a third party).
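The creation rules above can be summarized in a small model. This is only a sketch under stated assumptions: `SourceSettings`, `create_source`, and the `"Inherited"` permissions value are hypothetical names invented for illustration and do not correspond to any Outmind API.

```python
from dataclasses import dataclass

NO_PERMISSIONS = "No permissions"


@dataclass
class SourceSettings:
    """Hypothetical representation of a source's settings."""
    name: str
    shared: bool = False
    vectorized: bool = False          # vectorization is disabled by default
    permissions: str = "Inherited"    # hypothetical default value


def create_source(name, *, shared=False, vectorize=False, is_admin=False,
                  permissions="Inherited"):
    """Apply the rules described above when creating a source."""
    # Enabling vectorization is an explicit administrator choice.
    if vectorize and not is_admin:
        raise PermissionError("only administrators can enable vectorization")
    # A shared vectorized source ignores permissions: the dropdown is
    # forced to "No permissions" and cannot be edited.
    if vectorize and shared:
        permissions = NO_PERMISSIONS
    return SourceSettings(name=name, shared=shared, vectorized=vectorize,
                          permissions=permissions)


default_source = create_source("Contracts")
shared_vectorized = create_source("Projects", shared=True, vectorize=True,
                                  is_admin=True)
```

Keeping `vectorize=False` as the default mirrors the compliance point: nothing is sent to the third party unless an administrator opts in for that specific source.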

Usage and experience

  • Vectorized sources coexist with non-vectorized ones:

    • selection in a conversation, in an assistant, or in DocChat with no visual difference;

    • can be used alone or in combination with other sources.

Deleting a vectorized source

  • Same process as for other sources.


📊 Data processed

What is sent to Ragie?

  • The full file

  • Metadata (tree structure, parent path, dates, etc.)

What does Ragie store?

  • The file binaries.

Where do the data go?

  • Hosting: Ragie’s services are located in the United States.

  • International transfer: enabling vectorization entails a transfer to a third country. Admins must verify the legal basis and appropriate transfer mechanisms (e.g., SCCs) in light of their obligations (GDPR).

Security and compliance

  • Encryption: TLS in transit, AES-256 at rest.

  • Certifications: SOC 2 Type I and CASA; SOC 2 Type II announced.

  • Best practices: partition-based virtual isolation.

Retention & deletion

  • Retention: as long as the source is active, its chunks/embeddings remain indexed with Ragie.

  • Deletion: via source deletion or an on-demand erasure tool (purge). Erasure at Ragie is irreversible and may take some time to propagate.
