Exercise#2 CacheBackedEmbeddings class

Objective

Use the CacheBackedEmbeddings class to manage the embeddings. A single file-system-based cache will be shared by multiple models, so you must avoid hash collisions between them, for example by giving each model its own namespace.

Read the documentation for the CacheBackedEmbeddings class
Create the embedding model (you may use a different model)

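from langchain_community.embeddings import CohereEmbeddings

# Create the embeddings model
# (the Cohere API key is expected in the COHERE_API_KEY environment variable)
model_name = "embed-english-light-v3.0"
underlying_embeddings = CohereEmbeddings(model=model_name)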

Create an instance of the local file cache (for example, LocalFileStore)
Create an instance of CacheBackedEmbeddings
Generate embeddings for the following corpus:

corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
    "Two men pushed carts through the woods.",
    "A man is riding a white horse on an enclosed ground.",
    "A monkey is playing drums.",
    "A cheetah is running behind its prey.",
]

Check the file system for the generated embeddings
Change the namespace and generate the embeddings again
You should see a second set of embeddings, because the earlier embeddings were stored under a different namespace.
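
To see why changing the namespace produces a second set of files, here is a minimal toy sketch that uses only the Python standard library. It is not LangChain's implementation: the key scheme (namespace prefix plus a SHA-256 hash of the text), the helper names cache_key, fake_embed and embed_with_cache, and the namespaces "model-a-" and "model-b-" are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(namespace, text):
    """Toy key scheme (an assumption): namespace prefix + SHA-256 of the text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{namespace}{digest}"


def fake_embed(text, dim):
    """Hypothetical stand-in for a real model: a deterministic vector of length dim."""
    raw = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in raw[:dim]]


def embed_with_cache(texts, namespace, dim, cache_dir):
    """Write one file per (namespace, text) pair; return the number of cache misses."""
    misses = 0
    for text in texts:
        path = cache_dir / cache_key(namespace, text)
        if not path.exists():
            misses += 1
            path.write_text(",".join(map(str, fake_embed(text, dim))))
    return misses


corpus = ["A man is eating food.", "A monkey is playing drums."]

with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    first = embed_with_cache(corpus, "model-a-", 4, cache_dir)   # cold cache
    repeat = embed_with_cache(corpus, "model-a-", 4, cache_dir)  # same namespace
    other = embed_with_cache(corpus, "model-b-", 8, cache_dir)   # new namespace
    files_on_disk = len(list(cache_dir.iterdir()))

print(first, repeat, other, files_on_disk)  # prints: 2 0 2 4
```

The second call reuses the cached files, while the call with the new namespace writes its own files alongside the old ones. Without the namespace prefix, both toy models would map the same text to the same file, and the second model would silently read vectors produced by the first.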

References

LangChain Caching example

Cache Backed Embeddings class

Solution

The solution to the exercise is available in sections #2 and #3 of the notebook:

ex-2-caching

Google Colab

Follow the instructions in the notebook to set up the required packages.