Work on Context Together
To get started, you need to purchase Unacog credits which can be found in profile options or by clicking below.
Usage is paid by the session owner. The default limit per session is currently set to 1,000 and can be changed in document options.
Anonymous users must verify their account to purchase credits and create sessions. This can be done by providing an email in profile options or by logging in again with your credentials.
Semantic Retrieval Examples
Learn the secret to optimizing embedded context
Experience firsthand how adjusting the prompt template and retrieval pipeline can lead to more precise outcomes. Experiment with various chunking and embedding options to adjust the quality of generated responses.
AI Research Library
Discover how chunk size impacts results.
- Over 2,800 research papers on AI and LLMs
- 4 chunk sizes for more than 1.5 million vectors indexed
Chunking Methodology:
AI research papers are lengthy and feature complex formatting. By scraping and chunking these papers, we create a substantial data source for semantic analysis, with the example demonstrating the impact of four different chunking sizes on search results.
Tokens | Overlap | Vectors |
---|---|---|
100 | 15 | 778k |
200 | 20 | 374k |
300* | ? | 241k |
400 | 30 | 171k |
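The token-window chunking behind the table above can be sketched as follows. Whitespace-separated words stand in for real tokenizer tokens (a production pipeline would use an actual tokenizer), and the helper name is illustrative:

```python
def chunk_tokens(tokens, size, overlap):
    """Split a token list into fixed-size chunks, sharing `overlap`
    tokens between consecutive chunks so no context is lost at edges."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap  # how far the window advances each iteration
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(tokens[start:start + size])
        start += step
    return chunks

# A 1,000-"token" paper at size=100, overlap=15 (the first table row)
words = [f"w{i}" for i in range(1000)]
chunks = chunk_tokens(words, size=100, overlap=15)
```

Smaller windows produce many more vectors for the same corpus, which is why the 100-token setting indexes 778k vectors versus 171k at 400 tokens.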
Embedding Options:
Two embedding options are provided, varying according to the selected chunk size:
- Embed Top K Document Chunks: Aggregates the top K chunk results across multiple documents, facilitating a broad yet relevant context. Optionally include only unique documents.
- Embed Top K Chunk with J Additional Context: Focuses on a single document source, embedding the matched chunk along with surrounding chunks to enrich the LLM's contextual understanding.
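The two retrieval options can be sketched as below. The index format (dicts with `doc`, `pos`, and `vec` keys), the function names, and the use of plain cosine similarity are all illustrative assumptions, not the product's actual API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_chunks(query_vec, index, k=3, unique_docs=False):
    """Option 1: top-K chunks across all documents; optionally keep
    at most one chunk per document for broader coverage."""
    ranked = sorted(index, key=lambda c: cosine(query_vec, c["vec"]),
                    reverse=True)
    out, seen = [], set()
    for c in ranked:
        if unique_docs and c["doc"] in seen:
            continue
        out.append(c)
        seen.add(c["doc"])
        if len(out) == k:
            break
    return out

def top_chunk_with_context(query_vec, index, j=1):
    """Option 2: best-matching chunk plus J neighboring chunks on each
    side from the same document, to enrich the LLM's context."""
    best = max(index, key=lambda c: cosine(query_vec, c["vec"]))
    same_doc = sorted((c for c in index if c["doc"] == best["doc"]),
                      key=lambda c: c["pos"])
    i = same_doc.index(best)
    return same_doc[max(0, i - j): i + j + 1]
```

The first option trades depth for breadth; the second keeps everything from one document so the surrounding chunks share vocabulary and narrative with the match.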
Experiment with different options and rerun queries to gain insight into how chunk size and retrieval method affect search results and generated responses.
Data was scraped from https://huggingface.co/datasets/jamescalam/ai-arxiv2.
English Bible
Tinker with content evaluation and moderation
- Templates that perform subjective analysis
- Run semantic query to assess verses and chapters
The Bible is a well-known hierarchy of books, chapters, and verses. In this example, verses were converted to vectors using text-embedding-3-small, with metadata for book and chapter index. A second vector index for chapters was created to compare semantic search results.
The example offers four retrieval pipeline options for embedding text based on query specifics, including:
- Subjective RAG Pipeline: sends search results to the LLM for subjective analysis.
- Small-to-Big Embedding: returns a larger contextual segment around the matched chunk.
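Small-to-big retrieval can be sketched using the verse-level metadata described above: match on a small verse vector, then hand the LLM the whole chapter. The record layout and function name here are illustrative:

```python
def small_to_big(hit, verse_index):
    """Expand a matched verse to its full chapter: gather every verse
    sharing the hit's book and chapter metadata, in verse order."""
    chapter = [v for v in verse_index
               if v["book"] == hit["book"] and v["chapter"] == hit["chapter"]]
    return " ".join(v["text"] for v in sorted(chapter, key=lambda v: v["verse"]))
```

Matching on verses keeps the vectors semantically precise, while returning the chapter gives the LLM enough surrounding text to answer in context.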
Templates provided demonstrate evaluation mechanisms, systematically assessing and summarizing content, enhancing the flexibility of AI agents and chatbot systems. The approach allows for more tailored interactions, as it equips AI and chatbot systems with the capability to judge and adapt responses based on specific content criteria.
We utilized the Basic English Version of the Bible, made available through this GitHub repository, for our demo.
Covid Research
Compare text chunk splitting techniques
- Over 400 research papers on Covid from science.org
- Explore how different segmentation methods impact results
Data was scraped from the COVID-19 dataset on Science.org/collections/coronavirus.
- Prepare document list: Use a spreadsheet to configure title, URL, and other metadata.
- Load document list: Configure chunking options and the embedding model.
- Scrape documents: Fetch and scrape text content using web URLs.
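The ingestion steps above can be sketched with the Python standard library. The CSV column names and helper names are assumptions, and a real scraper would add rate limiting, retries, and more robust HTML cleanup:

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen

class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping script/style."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def load_document_list(csv_text):
    """Parse a spreadsheet export (title, url, ... columns) into records."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def scrape(url):
    """Fetch one URL and return its visible text content."""
    html = urlopen(url).read().decode("utf-8", errors="replace")
    extractor = TextExtractor()
    extractor.feed(html)
    return " ".join(extractor.parts)
```

Each scraped document would then flow into the chosen chunking and embedding steps configured in the document list.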
Splitting text into chunks
- Sentences: blocks of 15 sentences with 3 overlap
- Chunk Size: blocks of sentences totaling fewer than 300 tokens
- Recursive: syntactic paragraph and sentence chunking
In some cases overlap is used to ensure no context is lost. Compare the techniques to assess the relevance of chunks.
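The sentence-block and chunk-size techniques can be sketched as below (recursive splitting is omitted for brevity). The naive regex sentence splitter and word counts as a token proxy are simplifying assumptions; a real pipeline would use proper sentence and token tokenizers:

```python
import re

def split_sentences(text):
    """Naive sentence split on terminal punctuation followed by space."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sentence_blocks(sentences, block=15, overlap=3):
    """Fixed blocks of sentences with `overlap` sentences shared
    between consecutive blocks (the 15/3 setting from the list above)."""
    step = block - overlap
    return [sentences[i:i + block]
            for i in range(0, max(len(sentences) - overlap, 1), step)]

def size_capped_blocks(sentences, max_tokens=300):
    """Greedily pack whole sentences until adding the next one would
    exceed the token cap (word count used as a token proxy here)."""
    blocks, cur, cur_len = [], [], 0
    for s in sentences:
        n = len(s.split())
        if cur and cur_len + n > max_tokens:
            blocks.append(cur)
            cur, cur_len = [], 0
        cur.append(s)
        cur_len += n
    if cur:
        blocks.append(cur)
    return blocks
```

Fixed sentence blocks give uniform retrieval units, while size-capped blocks keep every chunk within the embedding model's budget at the cost of variable block lengths.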
Song Search
Music Discovery with Subjective Metrics
- Run semantic queries with 4 chunk size options
- Filter by custom generated subjective metrics
- 301 Top Billboard songs from 2024
Subjective Content Evaluation with LLMs
This demo showcases the power of large language models in evaluating subjective aspects of song lyrics from a curated dataset of 301 top Billboard songs from 2024. Through carefully crafted prompts, we leverage LLMs to generate ratings for metrics like profanity, violence, sexual content, and comedic elements.
Metric | Prompt |
---|---|
Motivational | Assess motivational/inspirational content by evaluating the presence of uplifting language, the ability to evoke resilience, ambition, and positive change... |
Inappropriate Language | For language content, evaluate the presence and context of profanity, slurs, and vulgar expressions... |
Violent | Evaluate violent content by identifying acts of aggression, harm, or destruction... |
These subjective ratings are then integrated into the semantic search system, enabling users to filter and discover songs aligned with their preferences. Set thresholds for different metrics to curate a personalized music experience tailored to your content criteria.
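Threshold filtering over the generated ratings might look like the sketch below; the record layout and the `(min, max)` threshold format are assumptions for illustration:

```python
def filter_by_metrics(songs, thresholds):
    """Keep songs whose LLM-generated ratings satisfy every threshold.
    `thresholds` maps metric name -> (min, max) inclusive bounds;
    a missing metric is treated as a rating of 0."""
    def ok(song):
        return all(lo <= song["metrics"].get(m, 0) <= hi
                   for m, (lo, hi) in thresholds.items())
    return [s for s in songs if ok(s)]
```

Combining these filters with the semantic query means a search only surfaces lyrics that are both relevant and within the listener's chosen content bounds.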
The lyrics are chunked into different sizes - entire song, stanza, large (double stanza), and verse - to explore the impact on search results. This allows users to find relevant lyrics at various granularities, from complete songs to individual verses.
Chunk Size | Description |
---|---|
Entire Song | All lyrics of the song are used as a single chunk |
Stanza | 6 lines in total: 4 lines for the main chunk plus 2 lines of preceding and following overlap |
Large (Double Stanza) | 12 lines in total: 8 lines for the main chunk plus 2 lines each of preceding and following overlap |
Verse | A single line of lyrics is used as a chunk |
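The stanza and large chunkings can be sketched as a window of main lines padded with context lines on each side. Reading the table as pad=1 for stanza (4+1+1=6 lines) and pad=2 for large (8+2+2=12 lines) is an assumption, and the function name is illustrative:

```python
def padded_line_chunks(lines, main=4, pad=1):
    """Chunks of `main` lyric lines, each extended with up to `pad`
    lines of surrounding context (clipped at song boundaries)."""
    chunks = []
    for start in range(0, len(lines), main):
        lo = max(0, start - pad)
        hi = min(len(lines), start + main + pad)
        chunks.append(lines[lo:hi])
    return chunks
```

Interior chunks reach the full padded length, while chunks at the start or end of a song are clipped, which is why only middle stanzas hit the 6- or 12-line totals in the table.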
Explore how AI-driven content analysis can enhance music discovery, providing a seamless fusion of language understanding, subjective content moderation, and flexible lyric retrieval.
AI Data Ingestion with No-Code
Explore tools that turn documents into semantic databases for chatbot integration. Enjoy flexible data management with serverless options and precise chunking to boost your project’s efficiency and accuracy. Streamline your data-to-decision process with our embedding technology.
Learn more: Connect with us for a no-charge discovery consultation.