The objective is to learn how to prepare a dataset for fine-tuning.
OpenAI fine-tuning requirements
The data pre-processing code will:
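The dataset for the OpenAI GPT-4o mini model needs to be in chat format: each training example is a JSON object with a "messages" list containing a system, a user, and an assistant message. Here is a sample: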
{
"messages":[
{"role":"system","content":"you will categorize the user's input into one or more categories: ['toxic', 'severe_toxic', 'threat', 'insult', 'identity_hate']"},
{"role":"user","content":"\"\"\"Nazi filth\"\" is impolite 04:27, 20 Jan 2004 (UTC)\n\n\""},
{"role":"assistant","content":"[\"toxic\", \"insult\"]"}
]}
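As a minimal sketch of how a labeled comment could be turned into this chat format, the Python below builds one example per row and serializes the rows as JSONL (one JSON object per line), which is the layout the fine-tuning file is expected to have. The helper names to_chat_example and to_jsonl are hypothetical and not part of the lesson's code; the system prompt and category list are copied from the sample above.

```python
import json

# Category list and system prompt taken from the sample above.
CATEGORIES = ["toxic", "severe_toxic", "threat", "insult", "identity_hate"]
SYSTEM_PROMPT = (
    "you will categorize the user's input into one or more categories: "
    + str(CATEGORIES)
)


def to_chat_example(comment, labels):
    """Build one chat-format training example (hypothetical helper)."""
    unknown = set(labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": comment},
            # The assistant reply is the label list encoded as a JSON string.
            {"role": "assistant", "content": json.dumps(labels)},
        ]
    }


def to_jsonl(rows):
    """Serialize (comment, labels) pairs as JSONL text, one example per line."""
    return "\n".join(
        json.dumps(to_chat_example(comment, labels), ensure_ascii=False)
        for comment, labels in rows
    )


if __name__ == "__main__":
    # Illustrative row only; real rows would come from the labeled dataset.
    print(to_jsonl([("an example comment\nwith two lines", ["toxic", "insult"])]))
```

Because json.dumps escapes newlines inside the comment text, each example stays on a single line even when the original comment spans several lines.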
Default hyperparameters were used for model fine-tuning. No attempt was made to make the model more accurate. You will learn how to adjust hyperparameters in a later lesson.
You are responsible for content moderation for a public forum.
For the given comment you will categorize the user's input into one or more categories: ['toxic', 'severe_toxic', 'threat', 'insult', 'identity_hate']
Ech, you silly Mensans, you have IQ points to spar
Actual: ["toxic", "insult"]
Look, you're a pedant, and Fetzer is a Jew-hater. On Press TV in the UK today - September 2nd 2011 - he said that the Israelis were behind 9/11. The man is a complete fool
["toxic", "insult", "identity_hate"]
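Label lists like the ones above can be compared programmatically. The sketch below assumes the model returns its answer as a JSON array string, as in the training sample; parse_labels and compare_labels are hypothetical helper names, and the values in the usage lines are illustrative rather than results from the lesson.

```python
import json


def parse_labels(text):
    """Parse a label list such as '["toxic", "insult"]' into a set.

    Typographic quotes are normalized first, in case the text was
    copied from a rendered page rather than from raw model output.
    """
    normalized = text.replace("\u201c", '"').replace("\u201d", '"')
    return set(json.loads(normalized))


def compare_labels(actual, predicted):
    """Report whether two label sets match and where they differ."""
    return {
        "exact_match": actual == predicted,
        "missed": sorted(actual - predicted),  # in actual, not predicted
        "extra": sorted(predicted - actual),   # predicted, not in actual
    }


if __name__ == "__main__":
    # Illustrative values only.
    actual = parse_labels('["toxic", "insult", "identity_hate"]')
    predicted = parse_labels('["toxic", "insult"]')
    print(compare_labels(actual, predicted))
```

Comparing sets rather than raw strings means the order of the labels and the spacing inside the list do not affect the result.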