
Work on Context Together

To get started, you need to purchase Unacog credits, which can be done in profile options or by clicking below.

Usage is paid by the session owner. The default limit per session is currently set to 1,000 and can be changed in document options.

Anonymous users must verify their account to purchase credits and create sessions. This can be done by providing an email in profile options or by logging in again with your credentials.

Semantic Retrieval Examples


Learn the secret to optimizing embedded context

Experience firsthand how adjusting the prompt template and retrieval pipeline can lead to more precise outcomes. Experiment with various chunking and embedding options to adjust the quality of generated responses.

AI Research Library

Discover how chunk size impacts results.

  • Over 2,800 research papers on AI and LLMs
  • 4 chunk sizes for more than 1.5 million vectors indexed
Loading the Pinecone Index
 


Chunking Methodology:

AI research papers are lengthy and feature complex formatting. By scraping and chunking these papers, we create a substantial data source for semantic analysis, with the example demonstrating the impact of four different chunking sizes on search results.

Tokens   Overlap   Vectors
100      15        778k
200      20        374k
300*     ?         241k
400      30        171k
* ai-arxiv2-chunks already chunked (approximately 300 tokens).
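A fixed-size token chunker with overlap, as in the table above, can be sketched as follows. This is a minimal illustration, not the exact pipeline used here; a real implementation would work with a model tokenizer rather than a raw token list:

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split a token list into fixed-size chunks, repeating `overlap`
    tokens at the start of each chunk to preserve context."""
    step = max(1, chunk_size - overlap)  # guard against overlap >= chunk_size
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # final chunk reached the end of the document
    return chunks
```

Smaller chunk sizes produce more vectors (as the table shows) and more precise matches, at the cost of narrower context per match.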
Embedding Options:

Two embedding options are provided, varying according to the selected chunk size:

  1. Embed Top K Document Chunks: Aggregates the top K chunk results across multiple documents, facilitating broad yet relevant context. Optionally, restrict results to unique documents.
  2. Embed Top K Chunk with J Additional Context: Focuses on a single document source, embedding the matched chunk along with surrounding chunks to enrich the LLM's contextual understanding.

Experiment with different options and rerun queries to gain insight into how chunk size and retrieval method affect search results and generated responses.
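The first embedding option above can be sketched with an in-memory index. The `top_k_chunks` helper and the chunk record shape are illustrative only, not Unacog's actual implementation:

```python
import math

def top_k_chunks(query_vec, index, k, unique_docs=False):
    """Rank indexed chunks by cosine similarity to the query vector.
    With unique_docs=True, keep at most one chunk per source document."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    ranked = sorted(index, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    results, seen = [], set()
    for chunk in ranked:
        if unique_docs and chunk["doc_id"] in seen:
            continue  # skip further chunks from a document already represented
        results.append(chunk)
        seen.add(chunk["doc_id"])
        if len(results) == k:
            break
    return results
```

The unique-documents variant trades some per-document depth for broader coverage across sources.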

The ai-arxiv2 dataset contains a selection of papers on the topics of AI and LLMs (arxiv.org).

Scraped data from https://huggingface.co/datasets/jamescalam/ai-arxiv2.

English Bible

Tinker with content evaluation and moderation

  • Templates that perform subjective analysis
  • Run semantic query to assess verses and chapters
 
Querying the Pinecone Index


The Bible is a well-known hierarchy of books, chapters, and verses. In this example, verses were converted to vectors using text-embedding-3-small, with metadata for book and chapter index. A second vector index for chapters was created to compare semantic search results.
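As a rough sketch (not the exact pipeline used here), verse records for upsert into the vector index could be prepared as below. The `embed_fn` parameter is a hypothetical stand-in for the text-embedding-3-small call, injected so the sketch stays offline:

```python
def build_verse_records(verses, embed_fn):
    """Prepare upsert-style records for Bible verses.
    verses: list of (book, chapter, verse_no, text) tuples.
    embed_fn: callable returning an embedding vector for a text string."""
    records = []
    for book, chapter, verse_no, text in verses:
        records.append({
            "id": f"{book}-{chapter}-{verse_no}",
            "values": embed_fn(text),
            # book/chapter metadata enables filtered queries and
            # fetching neighboring verses later
            "metadata": {"book": book, "chapter": chapter,
                         "verse": verse_no, "text": text},
        })
    return records
```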

The example has four retrieval pipeline options for embedding text based on query specifics.

  • Subjective RAG Pipeline: send search results to LLM for subjective analysis.
  • Small-to-Big Embedding: return a larger contextual segment around the matched chunk
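The Small-to-Big option can be illustrated with plain lists. The `small_to_big` helper below is hypothetical: it widens a matched verse into its surrounding window before the text is sent to the LLM:

```python
def small_to_big(verses, match_idx, window=2):
    """Return a larger contextual segment around the matched verse.
    verses: ordered list of verse texts from one chapter.
    match_idx: index of the verse the semantic search matched."""
    lo = max(0, match_idx - window)            # clamp at chapter start
    hi = min(len(verses), match_idx + window + 1)  # clamp at chapter end
    return " ".join(verses[lo:hi])
```

Matching on small units keeps retrieval precise, while embedding the wider segment gives the LLM enough context to reason about the match.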

The provided templates demonstrate evaluation mechanisms that systematically assess and summarize content, enhancing the flexibility of AI agents and chatbot systems. This allows for more tailored interactions, as it equips those systems to judge and adapt responses based on specific content criteria.

We utilized the Basic English Version of the Bible, made available through this GitHub repository, for our demo.

Covid Research

Compare text chunk splitting techniques

  • Over 400 research papers on Covid from science.org
  • Explore how different segmentation methods impact results
 
Preparing data for embedding


Data was scraped from the COVID-19 dataset on Science.org/collections/coronavirus.

  1. Prepare document list: Use a spreadsheet to configure title, url, and other metadata.
  2. Load document list: configure chunking options and embedding model.
  3. Scrape documents: fetch and scrape text content using web URLs.
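Steps 1 and 2 above amount to turning a spreadsheet export into document records. A minimal sketch, assuming a CSV export with title and url columns (the column names are illustrative):

```python
import csv
import io

def load_document_list(csv_text):
    """Parse a spreadsheet export (CSV) into document records,
    keeping only rows that have a URL to scrape."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get("url")]
```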
Splitting text into chunks
  1. Sentences: blocks of 15 sentences with 3 overlap
  2. Chunk Size: blocks of sentences sized less than 300 tokens
  3. Recursive: syntactic paragraph and sentence chunking

In some cases overlap is used to ensure no context is lost. Compare the techniques to assess the relevance of chunks.
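The first technique (sentence blocks with overlap) can be sketched as follows. The regex splitter is a simplification of real sentence segmentation, and the helper name is illustrative:

```python
import re

def split_sentence_blocks(text, block=15, overlap=3):
    """Split text into blocks of `block` sentences, repeating `overlap`
    sentences at the start of each block so no context is lost."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    step = max(1, block - overlap)
    blocks = []
    for start in range(0, len(sentences), step):
        blocks.append(" ".join(sentences[start:start + block]))
        if start + block >= len(sentences):
            break  # final block reached the end of the text
    return blocks
```

The token-budget and recursive techniques differ mainly in the boundary they respect: a token count versus syntactic paragraph and sentence breaks.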

Song Search

Music Discovery with Subjective Metrics

  • Run semantic queries with 4 chunk size options
  • Filter by custom generated subjective metrics
  • 301 Top Billboard songs from 2024
 
Implementing computed metric scores


Subjective Content Evaluation with LLMs

This demo showcases the power of large language models in evaluating subjective aspects of song lyrics from a curated dataset of 301 top Billboard songs from 2024. Through carefully crafted prompts, we leverage LLMs to generate ratings for metrics like profanity, violence, sexual content, and comedic elements.

Metric prompts:

  • Motivational: Assess motivational/inspirational content by evaluating the presence of uplifting language, the ability to evoke resilience, ambition, and positive change...
  • Inappropriate Language: For language content, evaluate the presence and context of profanity, slurs, and vulgar expressions...
  • Violent: Evaluate violent content by identifying acts of aggression, harm, or destruction...

These subjective ratings are then integrated into the semantic search system, enabling users to filter and discover songs aligned with their preferences. Set thresholds for different metrics to curate a personalized music experience tailored to your content criteria.
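Threshold filtering over metric scores might look like the following sketch. The record shape and metric names are illustrative, not the demo's actual schema:

```python
def filter_by_metrics(matches, thresholds):
    """Keep search matches whose metric scores stay within the user's
    thresholds. thresholds maps metric name -> maximum allowed score."""
    kept = []
    for match in matches:
        scores = match["metadata"]
        # a missing metric defaults to 0, i.e. no flagged content
        if all(scores.get(metric, 0) <= limit
               for metric, limit in thresholds.items()):
            kept.append(match)
    return kept
```

In practice this filtering can run server-side as a metadata filter on the vector query rather than as a post-processing step.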

The lyrics are chunked into different sizes - entire song, stanza, large (double stanza), and verse - to explore the impact on search results. This allows users to find relevant lyrics at various granularities, from complete songs to individual verses.

Chunk sizes:

  • Entire Song: All lyrics of the song are used as a single chunk.
  • Stanza: 6 lines in total, with 4 lines for the main chunk and 2 lines of preceding and following overlap.
  • Large (Double Stanza): 12 lines in total, with 8 lines for the main chunk and 2 lines each of preceding and following overlap.
  • Verse: A single line of lyrics is used as a chunk.

Explore how AI-driven content analysis can enhance music discovery, providing a seamless fusion of language understanding, subjective content moderation, and flexible lyric retrieval.

AI Data Ingestion with No-Code

Explore tools that turn documents into semantic databases for chatbot integration. Enjoy flexible data management with serverless options and precise chunking to boost your project’s efficiency and accuracy. Streamline your data-to-decision process with our embedding technology.

Learn More

Connect with us for a no-charge discovery consultation.