In this self-paced, fun exercise, you will try out various similarity search algorithms for searching movies. The exercise has two parts. In Part 1, you will focus on selecting an index for optimal recall. In Part 2, you will compare the search performance of the indexes.
The notebooks require a good amount of CPU and memory to run. If you have trouble running the notebooks on your local machine, use Google Colab.
Go through the movie dataset hosted on HuggingFace. We will use the embeddings already available in the dataset.
acloudfan/embedded_movies_small
You will be given a notebook with code for creating multiple types of indexes. Your objective is to go through each index setup and adjust its parameters to get the best results from that index. The focus is on recall, measured against the results of the Flat index run as the baseline.
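As a rough illustration of how recall against the Flat baseline can be measured, here is a minimal sketch. It assumes each index returns an array of result IDs with one row per query, and that the Flat index results are treated as the ground truth. The function name `recall_at_k` and the toy ID arrays are hypothetical and are not taken from the notebook.

```python
import numpy as np


def recall_at_k(baseline_ids, candidate_ids):
    """Fraction of baseline (Flat index) results that the candidate index also returned.

    Both arguments are 2-D integer arrays of shape (num_queries, k).
    The order of results within a row does not matter.
    """
    baseline_ids = np.asarray(baseline_ids)
    candidate_ids = np.asarray(candidate_ids)
    if baseline_ids.shape != candidate_ids.shape:
        raise ValueError("baseline and candidate results must have the same shape")

    hits = 0
    for base_row, cand_row in zip(baseline_ids, candidate_ids):
        # Count the IDs that appear in both result lists for this query.
        hits += len(set(base_row.tolist()) & set(cand_row.tolist()))
    return hits / baseline_ids.size


# Toy data (made up for illustration): 2 queries, top-3 results each.
flat_results = np.array([[0, 1, 2], [3, 4, 5]])    # stand-in for the Flat index output
approx_results = np.array([[0, 2, 9], [5, 4, 3]])  # stand-in for a tuned index output

print(f"recall@3 = {recall_at_k(flat_results, approx_results):.3f}")  # prints recall@3 = 0.833
```

A recall of 1.0 means the index returned exactly the same neighbors as the Flat baseline. When tuning parameters, you would typically re-run the same queries after each change and watch how this number moves.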
The objective is to compare the query latency (performance) of the indexes.
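One possible way to time queries is sketched below. It assumes each index can be wrapped in a callable that takes a single query vector; `measure_latency` and `dummy_search` are hypothetical names used only for illustration, and the brute-force search is a stand-in for a real index, not the notebook's code.

```python
import statistics
import time

import numpy as np


def measure_latency(search_fn, queries, warmup=1):
    """Time search_fn on each query and return summary statistics in milliseconds."""
    # Warm-up calls are not timed, so one-time setup costs do not skew the numbers.
    for query in queries[:warmup]:
        search_fn(query)

    timings_ms = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    ordered = sorted(timings_ms)
    # Approximate 95th percentile by position in the sorted timings.
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "p95_ms": ordered[p95_index],
    }


# Stand-in for an index: exact brute-force search over random vectors.
rng = np.random.default_rng(0)
vectors = rng.random((1000, 32), dtype=np.float32)
sample_queries = rng.random((20, 32), dtype=np.float32)


def dummy_search(query, k=5):
    # Squared L2 distance from the query to every vector, then the k smallest.
    distances = ((vectors - query) ** 2).sum(axis=1)
    return np.argsort(distances)[:k]


stats = measure_latency(dummy_search, sample_queries)
print(stats)
```

For a fair comparison, use the same queries and the same value of k for every index, and run the measurements on the same machine.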
Use the notebook: