Exercise #2: CacheBackedEmbeddings class

Objective

Use the CacheBackedEmbeddings class to manage the embeddings. A common file-system-based cache will be shared by multiple models, so you must avoid hash collisions between them, for example by giving each model its own namespace.
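To see why a namespace matters, here is a minimal sketch. It is not LangChain's actual implementation: it assumes a cache key is simply the namespace followed by a hash of the text. The helper `cache_key` and the second model name are hypothetical and exist only for illustration.

```python
import hashlib


def cache_key(text: str, namespace: str = "") -> str:
    """Hypothetical key scheme: namespace prefix plus a SHA-256 digest of the text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{namespace}{digest}"


text = "A man is eating food."

# Without a namespace, every model maps the same text to the same key,
# so a second model would read back the first model's cached vector.
shared_key_collides = cache_key(text) == cache_key(text)

# With one namespace per model, the keys differ and the entries stay separate.
key_a = cache_key(text, namespace="embed-english-light-v3.0")
key_b = cache_key(text, namespace="some-other-model")  # hypothetical model name

print(shared_key_collides, key_a != key_b)  # prints: True True
```

Using the model name as the namespace is a common choice because it changes automatically whenever the model does.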

  1. Go through the documentation of the class CacheBackedEmbeddings
  2. Create the embedding model (you may use a different model)
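from langchain_community.embeddings import CohereEmbeddings

# Name of the embeddings model (you may use a different one)
model_name = "embed-english-light-v3.0"

# Create the embeddings model
# (assumes the COHERE_API_KEY environment variable is set)
underlying_embeddings = CohereEmbeddings(model=model_name)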
  3. Create an instance of the local file cache
  4. Create an instance of CacheBackedEmbeddings
  5. Generate embeddings for the following corpus
corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
    "Two men pushed carts through the woods.",
    "A man is riding a white horse on an enclosed ground.",
    "A monkey is playing drums.",
    "A cheetah is running behind its prey.",
]
  6. Check the file system for the generated embeddings
  7. Change the namespace and generate the embeddings again
  8. You should see a second set of embeddings, because the earlier embeddings were stored under a different namespace.
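The behaviour to expect from the last three steps can be sketched with a toy file cache built only on the standard library. This is an illustration under stated assumptions, not the CacheBackedEmbeddings implementation: `ToyFileCache`, `fake_embed`, and the namespaces `model-a-` and `model-b-` are hypothetical, and the "model" just derives a few numbers from a hash.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fake_embed(text: str) -> list[float]:
    """Stand-in for a real model: derives 4 numbers from the text's hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]


class ToyFileCache:
    """Hypothetical file cache: one JSON file per (namespace, text) pair."""

    def __init__(self, root: Path, namespace: str):
        self.root = root
        self.namespace = namespace
        self.model_calls = 0  # counts cache misses

    def _path(self, text: str) -> Path:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return self.root / f"{self.namespace}{key}"

    def embed_documents(self, texts):
        vectors = []
        for text in texts:
            path = self._path(text)
            if path.exists():  # cache hit: read the stored vector
                vectors.append(json.loads(path.read_text()))
            else:  # cache miss: compute, then store
                vector = fake_embed(text)
                self.model_calls += 1
                path.write_text(json.dumps(vector))
                vectors.append(vector)
        return vectors


docs = ["A man is eating food.", "A monkey is playing drums."]
root = Path(tempfile.mkdtemp())

first = ToyFileCache(root, namespace="model-a-")
first.embed_documents(docs)
first.embed_documents(docs)  # second pass is served entirely from disk
files_after_first = len(list(root.iterdir()))

second = ToyFileCache(root, namespace="model-b-")
second.embed_documents(docs)  # new namespace, so nothing is reused
files_after_second = len(list(root.iterdir()))

print(first.model_calls, files_after_first, files_after_second)  # prints: 2 2 4
```

The file count doubles after the namespace changes, which is the same effect you should observe in the cache directory during the exercise.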

References

LangChain Caching example

Cache Backed Embeddings class

Solution

The solution to the exercise is available in sections #2 and #3 of the notebook:

ex-2-caching

Google Colab

  • Follow the instructions in the notebook to set up the required packages