Exercise #4: Explore NLP Tasks

Objective

Understand the common NLP tasks and try them out using HuggingFace. The exercise involves finding the right model for the given task and then using the playground widget to try it out. For each task you will find an explanation of the task and ONE possible solution. Keep in mind that there are multiple models (solutions) for each task. You will be trying out the following tasks:

  1. Classification
  2. Fill mask
  3. Sentence similarity
  4. Summarization
  5. Question-answering

Common steps:

The objective is to search for and try out an open-source HuggingFace model for a given use case.

  1. Use the HuggingFace Hub to identify candidate models for the task by using the filters:
    • Select the “Tasks” filter, e.g., Text classification
    • Under the other filters, select warm, cold, and Inference Endpoints
  2. Review the model documentation (model card) to understand whether it is a good fit for the given task
  3. Use the playground widget to try out the selected model on the task (a code-based alternative is sketched after this list)
  4. (Optional) If possible, find one or two more models and compare their performance
  5. Try out the provided solution
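
In addition to the playground widget, every model in this exercise can also be tried from code. The sketch below is illustrative and not part of the original exercise: it calls the same hosted inference that the widget uses via the huggingface_hub client, assumes the huggingface_hub package is installed, and may require a HuggingFace access token. The model ID shown is simply the sentiment-analysis solution used later in this exercise, and the input sentence is made up.

# Minimal sketch: call hosted inference (what the playground widget uses) from Python.
# Assumes huggingface_hub is installed; pass token="hf_..." if the service requires auth.
from huggingface_hub import InferenceClient

client = InferenceClient()
result = client.text_classification(
    "The instructions were clear and easy to follow.",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)
print(result)  # a list of labels with scores, e.g. POSITIVE / NEGATIVE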

Classification task

A classification task in machine learning and NLP involves categorizing input data into predefined classes or labels. The goal is to assign a category or label to a given piece of data (text, in the case of language models) based on its features.

The objective of a classification task is to predict the correct label for an input from a set of possible labels. For example, in sentiment analysis, the task is to classify a text as having positive, negative, or neutral sentiment. Here are different types of classification:

  • Binary Classification: The task involves two classes (e.g., spam vs. not spam).

  • Multiclass Classification: The task involves more than two classes (e.g., classifying emails into categories like work, personal, promotions).

  • Multilabel Classification: The task involves assigning multiple labels to each input (e.g., tagging an image with multiple labels like “cat,” “outdoor,” “daytime”).

1. Use case: Sentiment analysis

The model should be able to identify a comment or a statement as positive or negative.

One possible solution:

https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english
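
As an illustration only, here is a minimal local sketch using the transformers pipeline API; it assumes the transformers and torch packages are installed, and the example sentence is made up.

# Minimal sketch: binary sentiment classification with a fine-tuned DistilBERT model.
# Assumes transformers and torch are installed; the input sentence is illustrative.
from transformers import pipeline

sentiment = pipeline(
    "text-classification",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)
print(sentiment("I really enjoyed this exercise."))
# Expected shape of output: [{'label': 'POSITIVE', 'score': ...}]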

2. Use case: Multi-class classification

One possible solution:

https://huggingface.co/facebook/bart-large-mnli
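
This model is typically used through zero-shot classification, where you supply your own set of candidate labels and the model scores each one. A minimal sketch follows; the candidate labels and input text are made up for illustration.

# Minimal sketch: multi-class classification via zero-shot with bart-large-mnli.
# The candidate labels below are illustrative; replace them with your own classes.
from transformers import pipeline

zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = zero_shot(
    "The quarterly report shows revenue grew 12% year over year.",
    candidate_labels=["finance", "sports", "politics", "technology"],
)
print(result["labels"][0], result["scores"][0])  # highest-scoring class first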

3. Use case: Multi-label classification

One possible solution:

https://huggingface.co/SamLowe/roberta-base-go_emotions
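
This model scores many emotion labels at once, so several labels can apply to the same input. A minimal sketch follows; the 0.5 threshold and the input text are assumptions for illustration, not taken from the model card.

# Minimal sketch: multi-label emotion classification; top_k=None returns a score
# for every label, and we keep those above an assumed 0.5 threshold.
from transformers import pipeline

emotions = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",
    top_k=None,
)
scores = emotions(["Thanks so much, this made my day!"])[0]
print([s for s in scores if s["score"] > 0.5])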

Fill mask task

The “fill mask” task, also known as masked language modeling (MLM), is a type of language modeling task where the goal is to predict missing or masked words in a sentence. This task is commonly used in the training of transformer-based models like BERT (Bidirectional Encoder Representations from Transformers).

The objective is to predict the original word that has been replaced with a special token (typically [MASK]) in a given sentence. The model must use the context provided by the surrounding words to accurately fill in the masked position.

One possible solution:

https://huggingface.co/distilbert/distilbert-base-uncased
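
A minimal sketch for trying the model from code (the example sentence is made up); this model's mask token is [MASK].

# Minimal sketch: masked language modeling (fill-mask) with DistilBERT.
# The model predicts the most likely tokens for the [MASK] position.
from transformers import pipeline

fill = pipeline("fill-mask", model="distilbert/distilbert-base-uncased")
for prediction in fill("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))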

Sentence similarity

The sentence similarity task in NLP involves measuring how similar two sentences are in meaning. The goal is to assign a similarity score or label to a pair of sentences, indicating how closely related they are in terms of semantic content.

The main objective is to determine whether two sentences convey the same or similar ideas, or how closely related their meanings are. The output is typically a similarity score (e.g., between 0 and 1) or a label (e.g., “similar” or “dissimilar”).

One possible solution:

https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
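
A minimal sketch using the sentence-transformers library (assumed to be installed); it embeds two made-up sentences and compares them with cosine similarity.

# Minimal sketch: sentence similarity via embeddings and cosine similarity.
# Assumes the sentence-transformers package is installed; sentences are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode([
    "A man is playing a guitar on stage.",
    "Someone performs live music with a guitar.",
])
print(util.cos_sim(embeddings[0], embeddings[1]).item())  # closer to 1 = more similar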

Summarization

The summarization task in NLP involves condensing a long piece of text into a shorter version that captures the main points or essential information. The goal is to generate a summary that retains the most important content and meaning of the original text while reducing its length.

The primary objective is to produce a concise and coherent summary of a given text. The summary should reflect the main ideas, key points, and relevant information from the original document without introducing new information or altering the original meaning.

One possible solution:

https://huggingface.co/facebook/bart-large-cnn
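
A minimal sketch for trying the model from code; the input paragraph and the max_length/min_length values are illustrative and should be adapted to your own text.

# Minimal sketch: abstractive summarization with BART fine-tuned on CNN/DailyMail.
# max_length and min_length are illustrative; tune them to the length of your input.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = (
    "HuggingFace hosts thousands of open-source models covering tasks such as "
    "classification, translation, summarization, and question answering. Each model "
    "page includes documentation and an interactive widget for quick testing."
)
print(summarizer(article, max_length=40, min_length=10, do_sample=False)[0]["summary_text"])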

Question-Answering

The question-answering (QA) task in NLP involves building systems that can automatically answer questions posed by humans in natural language. The goal is to provide precise, relevant, and contextually appropriate answers to user queries, which can be based on a given text, a database, or even general knowledge.

The primary objective of a QA system is to understand the user’s question and retrieve or generate the correct answer. This involves both understanding the question’s intent and finding the most relevant information that answers the query.

One possible solution:

https://huggingface.co/deepset/roberta-base-squad2
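
This is an extractive QA model: it selects an answer span from a provided context. A minimal sketch follows; the question and context are made up for illustration.

# Minimal sketch: extractive question answering; the answer is a span of the context.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
result = qa(
    question="Which dataset was the model fine-tuned on?",
    context=(
        "The deepset/roberta-base-squad2 model is hosted on the HuggingFace Hub "
        "and was fine-tuned on the SQuAD 2.0 dataset."
    ),
)
print(result["answer"], result["score"])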