Exercise #1: Datasets library

Objective

You have been asked to run an experiment on a text summarization task. Your team lead has told you to use the Hugging Face dataset cnn_dailymail. Your task is to create a very small dataset from the original and share it with your team via a shared folder (or email). In addition, you need to convert the data to CSV for analysis.

Before proceeding, visit the HF hub and explore the dataset.

Hints:

  1. Load the dataset

  2. Split the dataset's test split into train (70%) and test (30%) subsets

  3. Save the dataset to local disk

  4. Convert the dataset to CSV
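
The four hints can be sketched in Python roughly as follows. This is a minimal sketch, not the notebook's solution. The function name `make_small_dataset`, the output folder `cnn_dailymail_small`, the row count of 100, the seed, and the choice of the `3.0.0` configuration are all assumptions made for illustration; check the dataset page on the hub for the configurations that are actually available.

```python
from pathlib import Path


def make_small_dataset(out_dir="cnn_dailymail_small", n_rows=100,
                       test_size=0.3, seed=42):
    """Build a small train/test dataset from the cnn_dailymail test split.

    All default values are illustrative assumptions, not values
    prescribed by the exercise.
    """
    # Imported here so that defining the function does not require the
    # `datasets` package; calling it does (pip install datasets).
    from datasets import load_dataset

    # 1. Load only the first n_rows examples of the original test split.
    #    "3.0.0" is assumed to be the desired configuration.
    small = load_dataset("cnn_dailymail", "3.0.0", split=f"test[:{n_rows}]")

    # 2. Split into train (70%) and test (30%); returns a DatasetDict
    #    with the keys "train" and "test".
    splits = small.train_test_split(test_size=test_size, seed=seed)

    # 3. Save the DatasetDict to local disk in Arrow format.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    splits.save_to_disk(str(out / "arrow"))

    # 4. Write one CSV file per split for analysis.
    for name, ds in splits.items():
        ds.to_csv(str(out / f"{name}.csv"))

    return splits
```

Calling `make_small_dataset()` downloads the data on first use, so it needs network access. A teammate who receives the shared folder should be able to reload the Arrow copy with `datasets.load_from_disk("cnn_dailymail_small/arrow")`, or open the CSV files directly.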

Solution

Check the solution at the end of the notebook:

Datasets/datasets-library.ipynb

ext-1-datasets-library

Google Colab

  • Make sure to follow instructions for setting up packages