Denken

Generative AI: LLMs: Semantic Search and Conversation Retrieval QA using Vector Store and LangChain 1.7

Posted on August 25, 2023 by Aritra Sen

In the last few blog posts, we went through the basics of LLMs, different fine-tuning approaches, and the basics of LangChain. In this post we will mainly work with embeddings from an LLM: how to store these embeddings in a vector store, and how to use this persistent vector database to perform semantic search. Below are the high-level steps we will follow:

Semantic Search using Vector Store (Credit: Author)

Before getting into the code, let’s go through the steps in detail:

  1. Loading Documents:
    Using LangChain we can load different types of documents, such as PDF, CSV, and HTML. See this page for a more detailed overview of the available document loaders: https://python.langchain.com/docs/modules/data_connection/document_loaders/
    Some of the loaders, such as the HTML and PDF loaders, require dependent libraries like BeautifulSoup and pypdf.
  2. Transform Documents into Chunks:
    Once we load a document, we can access its text through page_content. However, this content can be too large to feed into the model (every LLM has a maximum input token limit), so we split the document into chunks using one of the approaches below:
    1. By using a chunk size based on character length.
    2. By using the number of input tokens.
  3. Create Embeddings:
    Using LangChain we can create numeric embeddings of the text chunks. LangChain supports different embedding providers, such as OpenAI embeddings and Sentence Transformers embeddings.
  4. Vector Store:
    Using a vector store, we can persist these document embeddings for later use, such as semantic search. When a user sends a search query, we first convert that text to an embedding with the LLM; the vector store then compares this query embedding against the stored embeddings and retrieves the most relevant documents or text. For this tutorial we will use the open-source vector store chromadb (Vector stores | 🦜️🔗 Langchain). A vector store also makes it easy to add, update, or delete vectors.
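The chunk → embed → search loop described above can be illustrated in plain Python before reaching for any library. This is a toy sketch, not the LangChain API: the "embedding" here is just a bag-of-words count vector standing in for a dense LLM embedding, and an in-memory list stands in for a persistent vector store.

```python
import math
from collections import Counter

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Step 2: split text into fixed-size character chunks with a small overlap."""
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

def embed(text: str) -> Counter:
    """Step 3 (toy): a bag-of-words count vector; a real system would
    call an LLM embedding model here instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query: str, store: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    """Step 4: rank stored chunks by similarity to the query embedding."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Build the "vector store" (chunk + embed every document), then query it.
docs = ["the cat sat on the mat", "stocks rallied on wall street today"]
store = [(c, embed(c)) for d in docs for c in chunk_text(d)]
top = semantic_search("a cat on a mat", store)
```

Swapping the toy embed for a real embedding model, and the list for chromadb, gives the production version of the same loop.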

Now let’s get our hands dirty.
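The four steps can be sketched end-to-end with LangChain and Chroma. Treat this as a rough sketch under assumptions: the class names (PyPDFLoader, RecursiveCharacterTextSplitter, OpenAIEmbeddings, Chroma) follow the 2023-era LangChain API and may have moved in newer releases, the file path and persist directory are placeholders, and running it requires langchain, chromadb, pypdf, and an OpenAI API key.

```python
def build_semantic_index(pdf_path: str, persist_dir: str):
    """Load a PDF, chunk it, embed the chunks, and persist them in Chroma.

    Imports are done lazily so the sketch reads top-to-bottom; running it
    needs langchain, chromadb, pypdf, and an OpenAI API key configured.
    """
    from langchain.document_loaders import PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma

    # Step 1: load the document into LangChain Document objects.
    docs = PyPDFLoader(pdf_path).load()

    # Step 2: split into overlapping character-based chunks.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(docs)

    # Steps 3 & 4: embed each chunk and persist the vectors in Chroma.
    return Chroma.from_documents(
        chunks, OpenAIEmbeddings(), persist_directory=persist_dir
    )

# Usage (not run here):
# db = build_semantic_index("my_paper.pdf", "./chroma_db")
# hits = db.similarity_search("What is semantic search?", k=3)
```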

Do like, share, and comment if you have any questions or suggestions.

Category: Aritra Sen, Machine Learning, Python
