Project: Movie Recommendation

Objective

In this self-paced, fun exercise, you will try out various similarity search algorithms for searching movies. The exercise has two parts: in Part-1 you will focus on selecting the index parameters for optimal recall, and in Part-2 you will compare the search performance of the indexes.

The notebooks require a good amount of CPU and memory. If you have trouble running them on your local machine, use Google Colab.

Dataset

Explore the movie dataset hosted on Hugging Face. We will use the embeddings already included in the dataset.

acloudfan/embedded_movies_small

Part-1

You will be given a notebook with code for creating multiple types of indexes. Your objective is to go through each index setup and adjust its parameters to get the best results. The focus is on recall, and the baseline is the result of the Flat index run: a Flat index compares the query against every stored vector, so its results are exact.
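Because the Flat index returns the exact nearest neighbors, recall@k for any other index can be measured as the fraction of the Flat top-k results that the other index also returns. The following is a minimal NumPy sketch of that measurement, not code from the project notebook. It assumes random vectors as a stand-in for the movie embeddings, and flat_l2_search and recall_at_k are hypothetical helper names.

```python
import numpy as np


def flat_l2_search(data: np.ndarray, queries: np.ndarray, k: int) -> np.ndarray:
    """Exact k-NN by squared L2 distance, i.e. what a Flat L2 index computes."""
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, shape (n_queries, n_data)
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ data.T
        + (data ** 2).sum(axis=1)
    )
    # Row-wise sort; the first k columns are the ids of the k nearest vectors
    return np.argsort(d2, axis=1)[:, :k]


def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    """Share of the exact top-k ids that the approximate search also returned."""
    k = exact_ids.shape[1]
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / (k * len(exact_ids))


# Assumption: random vectors stand in for the dataset's movie embeddings
rng = np.random.default_rng(0)
movies = rng.standard_normal((1000, 64)).astype(np.float32)
queries = movies[:5]  # a few movies used as test queries

exact = flat_l2_search(movies, queries, k=10)  # Flat baseline (ground truth)
# A deliberately crude "approximate" search that only scans the first half
approx = flat_l2_search(movies[:500], queries, k=10)
print(f"recall@10 = {recall_at_k(approx, exact):.2f}")
```

A recall of 1.0 means the index found every neighbor the Flat index found; anything lower shows how many true neighbors the faster index missed.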

project-1-index-setup

Tasks
  1. Run the cells that read the dataset.
  2. Check out the results for the Flat L2 index.
  3. Adjust the parameters for each index.
  4. Run the searches with different test movie indices and compare the results.
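To see why parameter tuning matters, consider an inverted-file (IVF) style index, one common kind of approximate index. It buckets vectors by their nearest centroid, and a query scans only the closest buckets. The sketch below is a toy, pure-NumPy illustration under stated assumptions: random vectors replace the real embeddings, and kmeans, ToyIVF and exact_search are hypothetical helpers, not the notebook's code. It sweeps the number of probed buckets (nprobe) and reports recall@k against a brute-force baseline.

```python
import numpy as np


def kmeans(x: np.ndarray, n_clusters: int, n_iter: int = 10, seed: int = 0):
    """Plain Lloyd's k-means; returns (centroids, label of each vector)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(n_clusters):
            members = x[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)


class ToyIVF:
    """Hypothetical inverted-file index: each vector is bucketed under its
    nearest centroid, and a query scans only the nprobe closest buckets."""

    def __init__(self, data: np.ndarray, nlist: int):
        self.data = data
        self.centroids, labels = kmeans(data, nlist)
        self.lists = [np.flatnonzero(labels == c) for c in range(nlist)]

    def search(self, q: np.ndarray, k: int, nprobe: int) -> np.ndarray:
        # Pick the nprobe buckets whose centroids are closest to the query
        probe = np.argsort(((self.centroids - q) ** 2).sum(axis=1))[:nprobe]
        cand = np.concatenate([self.lists[c] for c in probe])
        # Exact distances, but only over the candidate vectors
        d2 = ((self.data[cand] - q) ** 2).sum(axis=1)
        return cand[np.argsort(d2)[:k]]


def exact_search(data: np.ndarray, q: np.ndarray, k: int) -> np.ndarray:
    """Brute-force baseline, equivalent to a Flat L2 index."""
    return np.argsort(((data - q) ** 2).sum(axis=1))[:k]


# Assumption: random vectors stand in for the movie embeddings
rng = np.random.default_rng(1)
movies = rng.standard_normal((2000, 32)).astype(np.float32)
queries, k = movies[:20], 10
index = ToyIVF(movies, nlist=32)

recalls = {}
for nprobe in (1, 4, 16, 32):
    hits = sum(
        len(set(index.search(q, k, nprobe)) & set(exact_search(movies, q, k)))
        for q in queries
    )
    recalls[nprobe] = hits / (k * len(queries))
    print(f"nprobe={nprobe:2d}  recall@{k}={recalls[nprobe]:.2f}")
```

Raising nprobe scans more buckets, so recall climbs toward the Flat baseline (reaching it when every bucket is probed) at the cost of more distance computations per query. Libraries such as FAISS expose an analogous nprobe setting on their IVF indexes.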

Part-2

The objective is to compare the query latency (performance) of the indexes.

Use the notebook:

project-1-index-setup

Tasks
  1. If you adjusted the index parameters in Part-1, set the parameters in this notebook to the same values.
  2. Review the setup.
  3. Run the code and check the index performance results in the last cell.
  4. Adjust the parameters, re-run all cells, and observe how the results change.
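Query latency is usually reported as the average wall-clock time per search. The following is a minimal timing harness in Python, offered as a sketch that may differ from how the notebook's last cell measures performance. It assumes random vectors in place of the movie embeddings, and mean_query_latency_ms, flat_search and partial_search are hypothetical names. The partial search simply scans 10% of the vectors to mimic a faster approximate index.

```python
import time

import numpy as np


def mean_query_latency_ms(search_fn, queries, repeats: int = 3) -> float:
    """Average wall-clock milliseconds per query, taking the fastest of
    several passes to dampen noise from other processes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for q in queries:
            search_fn(q)
        best = min(best, time.perf_counter() - start)
    return 1000.0 * best / len(queries)


# Assumption: random vectors stand in for the movie embeddings
rng = np.random.default_rng(0)
movies = rng.standard_normal((20000, 64)).astype(np.float32)
queries = movies[:50]
subset = movies[:2000]


def flat_search(q, k=10):
    # Exhaustive scan over every vector (Flat-style)
    return np.argsort(((movies - q) ** 2).sum(axis=1))[:k]


def partial_search(q, k=10):
    # Hypothetical stand-in for an approximate index scanning ~10% of the data
    return np.argsort(((subset - q) ** 2).sum(axis=1))[:k]


latency = {
    name: mean_query_latency_ms(fn, queries)
    for name, fn in [("flat", flat_search), ("partial", partial_search)]
}
for name, ms in latency.items():
    print(f"{name:8s} {ms:.3f} ms/query")
```

For a fair comparison, time every index with the same queries and the same k, and read latency together with the recall from Part-1: a faster index is only useful if its recall is still acceptable.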