Exercise#1 Summarization PoC

Objective

Your team needs to build an application feature that summarizes documents. The team has agreed to build it with Generative AI, treating it as a summarization task. Your manager has asked you to suggest a few models for the team to try out. Your task:

  • Identify candidate models for the task
  • Suggest how the model should be invoked (Endpoint, Pipeline)

In this exercise you will carry out a PoC and then use the results to make a recommendation. The exercise is the PoC :) and it has 2 parts.

Part-1

  1. Start by exploring models on the Hugging Face Hub. Identify 2 models, one for each of the following
    • Extractive summarization (Hint: facebook/bart…)
    • Abstractive summarization (Hint: Falconsai/text_…)
  2. Use the InferenceClient class (from huggingface_hub) to summarize the text with both models
    • Compare the results
    • Which model worked out better for you?
  3. Measure the inference time for the better model
    • Place the client.summarization(text) call in its own cell, with %%time at the top of the cell
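
The Part-1 steps can be sketched roughly as below. This is a minimal sketch, not the course solution: it assumes the huggingface_hub package is installed, that both hinted models are currently served by the hosted Inference API, and that you pass a valid access token if your account needs one. The helper names (summarize_with_endpoint, compression_ratio, compare_models) are made up for illustration.

```python
# Model IDs follow the hints in this exercise; their availability on the
# hosted Inference API is an assumption and may change over time.
EXTRACTIVE_MODEL = "facebook/bart-large-cnn"
ABSTRACTIVE_MODEL = "Falconsai/text_summarization"


def summarize_with_endpoint(text, model_id, token=None):
    """Summarize `text` remotely through the Hugging Face Inference API."""
    # Imported lazily so the other helpers work without the package installed.
    from huggingface_hub import InferenceClient

    client = InferenceClient(model=model_id, token=token)
    result = client.summarization(text)
    # Newer huggingface_hub versions return an object with `.summary_text`;
    # older ones return a plain string, which is passed through unchanged.
    return getattr(result, "summary_text", result)


def compression_ratio(original, summary):
    """Word count of the summary divided by word count of the original."""
    return len(summary.split()) / len(original.split())


def compare_models(text, model_ids=(EXTRACTIVE_MODEL, ABSTRACTIVE_MODEL), token=None):
    """Return {model_id: (summary, compression_ratio)} for each model."""
    report = {}
    for model_id in model_ids:
        summary = summarize_with_endpoint(text, model_id, token=token)
        report[model_id] = (summary, compression_ratio(text, summary))
    return report
```

In a notebook you might call compare_models(text) in one cell and read the two summaries side by side. The compression ratio is only a rough length signal; deciding which model "worked out better" still means reading the summaries yourself.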

Part-2

  1. Use the better model with the Pipeline class to summarize the text
    • Place the inference call, i.e., summarization_pipeline(text), in its own cell
    • Use %%time to measure the time it took to summarize the document
  2. Which mechanism worked out better from a performance perspective?
    • Pipeline?
    • InferenceClient, i.e., endpoint?
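
A possible shape for Part-2 is sketched below. It assumes transformers and torch are installed and that your transformers version provides the "summarization" pipeline task. The default model ID is only a placeholder for whichever model you found better in Part-1, and the helper names are hypothetical.

```python
def build_summarizer(model_id="Falconsai/text_summarization"):
    """Load a local summarization pipeline (downloads the model on first use)."""
    # Imported lazily: this needs the transformers and torch packages.
    from transformers import pipeline

    return pipeline("summarization", model=model_id)


def extract_summary(pipeline_output):
    """A summarization pipeline returns a list like [{"summary_text": "..."}]."""
    return pipeline_output[0]["summary_text"]


def faster_mechanism(timings):
    """Given {"mechanism name": seconds}, return the name with the lowest time."""
    return min(timings, key=timings.get)
```

Build the pipeline in one cell (summarization_pipeline = build_summarizer()) and call summarization_pipeline(text) in a separate timed cell. That way the one-off model download and load time is not counted as inference time. Once you have both wall times, something like faster_mechanism({"pipeline": 1.9, "endpoint": 0.8}) names the quicker option; the numbers here are made-up placeholders, not measurements.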

(Optional) Share your results with others in the Q&A :)

New to %%time?

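%%time is an IPython cell magic. Put it on the first line of a notebook cell and it reports the CPU time and wall time taken to run that cell.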
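
If you are running outside a notebook, or want the numbers in variables, roughly the same measurement can be taken in plain Python. This is a sketch using only the standard library; time_call is a hypothetical helper name, not part of IPython.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed real time, includes waiting
    cpu_start = time.process_time()    # CPU time used by this process only
    result = func(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds
```

For example, result, wall, cpu = time_call(summarization_pipeline, text). A remote endpoint call typically shows little local CPU time but a larger wall time, because the notebook is mostly waiting on the network, so wall time is the number to compare between the two mechanisms.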

Text to Summarize

text = """Fueling your body with a healthy diet isn't just about looking good, it's about building a vibrant
foundation for a life well-lived. Imagine a symphony of vibrant vegetables, a chorus of juicy fruits, and a 
rhythm of whole grains and lean protein – that's the melody of a healthy diet, 
nourishing your cells and empowering your spirit. Ditch the processed sirens and sugary temptations, 
and embrace the natural orchestra of flavors and textures that nature offers. Let leafy greens be 
your verdant bassline, their antioxidants dancing on your tongue. Citrus fruits, like sunshine-
kissed trumpets, add a bright, refreshing melody. Whole grains, the sturdy timpani, provide 
sustained energy, while lean protein, a soulful cello, builds and repairs your body's instruments. 
Don't forget the playful percussion of nuts and seeds, bursting with essential nutrients, and the 
creamy, comforting oboe of legumes, rich in fiber and protein. This symphony of flavors isn't just 
delicious, it's a powerful conductor, harmonizing your gut, boosting your immunity, and keeping your 
energy levels soaring. So, ditch the culinary cacophony of processed foods, and embrace the healthy 
concerto of a vibrant, colorful diet. Let your body be the maestro, conducting a symphony of well-
being, one delicious bite at a time."""

Note: If you are running on Google Colab:

  1. Install the packages in runtime
  2. Restart the kernel before running the cells

Additional libraries

  • Some models require additional libraries to be installed, e.g., google/pegasus-large depends on the sentencepiece package
  • If you come across a model like that, either install the dependency or try another model
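
One way to spot a missing dependency before loading a model is to check whether the package can be imported. This is a small sketch using the standard library; missing_packages is a hypothetical helper, and you still need to know from the model card which packages a given model expects.

```python
import importlib.util


def missing_packages(package_names):
    """Return the top-level package names that cannot be imported here."""
    return [
        name
        for name in package_names
        if importlib.util.find_spec(name) is None
    ]
```

For google/pegasus-large you could run missing_packages(["sentencepiece"]). An empty list means the package is available; otherwise install what is listed and restart the kernel.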

Solution

Candidate model for EXTRACTIVE_SUMMARIZATION

facebook/bart-large-cnn. A pre-trained BART model, fine-tuned on the CNN/Daily Mail dataset for summarization. It has 406M parameters.

Candidate model for ABSTRACTIVE_SUMMARIZATION

Falconsai/text_summarization. A T5-small model fine-tuned for text summarization, hosted on the Hugging Face Hub. It has 60.5M parameters.

Other alternatives:

google/flan-t5-small. An instruction-tuned model that handles multiple tasks, summarization among them. It has 77M parameters.

google/pegasus-large

Solution Notebook in course repository

To run on your own machine
  • Open the sample notebook in JupyterLab

exercise-1-solution-notebook

Open in Google Colab
  • You must install the packages before running the code cells:
    pip install transformers torch huggingface_hub