Exercise#3 CoT

Objective

Learn to use the Chain of Thought (CoT) approach. You are given multiple logical/reasoning prompts. Your task is to try out these prompts in the playground with one or more models. They may or may not work out for a given model. If a prompt doesn’t work appropriately, your task is to use CoT to fix it. Keep in mind that there are multiple ways to fix the issues. Solutions are at the end of this page.

Note

  • Set Temperature (randomness) to 0, as we want the same answer each time
  • Use text-generation and NOT the chat interface
  • All prompts were tested on the Cohere Command model
  • Each model is different, so you may see different results

Prompt#1

Source of prompt: Appendix of the CoT paper

Take the last letters of the words in “Waldo Schmidt” and concatenate them. what will be the answer?

Answer should be: ‘ot’
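If you want to check the expected answer outside the playground (for example, when you try your own names), a minimal Python sketch follows. It assumes the words are separated by whitespace and that the name is passed as a plain string; `last_letter_concat` is a hypothetical helper, not part of any library.

```python
def last_letter_concat(text: str) -> str:
    """Concatenate the last character of each whitespace-separated word."""
    return "".join(word[-1] for word in text.split())


print(last_letter_concat("Waldo Schmidt"))  # ot
```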

Prompt#2

Source of prompt: GSM8K dataset

Q: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?

A:

Answer should be: $10
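The expected answer needs one step the model often skips: converting 50 minutes to 50/60 of an hour before multiplying by $12. A quick check in Python, using the standard `fractions.Fraction` type to keep the arithmetic exact:

```python
from fractions import Fraction

hourly_rate = 12       # dollars per hour
minutes_worked = 50

# Convert minutes to a fraction of an hour, then multiply by the hourly rate.
earnings = hourly_rate * Fraction(minutes_worked, 60)
print(earnings)  # 10
```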

Prompt#3

What is the total cost of purchasing equipment for all sixteen players on the football team, considering that each player requires a $25 jersey, a $15.20 pair of shorts, and a pair of socks priced at $6.80?

Answer should be: $752
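The expected total comes from adding the per-player cost first ($25 + $15.20 + $6.80 = $47) and then multiplying by 16 players. A small Python check, using `decimal.Decimal` with string inputs so the cents stay exact:

```python
from decimal import Decimal

players = 16
# Per-player prices as strings, so Decimal represents the cents exactly.
jersey = Decimal("25.00")
shorts = Decimal("15.20")
socks = Decimal("6.80")

per_player = jersey + shorts + socks   # 47.00
total = players * per_player
print(total)  # 752.00
```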

Prompt#4

Source of prompt: Appendix of the CoT paper on arxiv.org

Hint: sometimes you may need to nudge the model with a minor tweak to Zero-Shot CoT.

James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many total meters does he run a week?

Answer should be: 540 meters
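The reasoning chain the model should produce has two steps: sprints per week (3 × 3 = 9), then meters per week (9 × 60 = 540). Written out in Python, with variable names chosen here purely for illustration:

```python
sprints_per_session = 3   # sprints each time he runs
sessions_per_week = 3     # times he runs per week
meters_per_sprint = 60

sprints_per_week = sprints_per_session * sessions_per_week   # step 1: 9 sprints
total_meters = sprints_per_week * meters_per_sprint          # step 2: 540 meters
print(total_meters)  # 540
```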

Prompt#5

Try out the prompt below in the ‘Cohere Command’ model; you will get an inconsistent response. You need to use Few-Shot CoT to address the inconsistency.

Q: The odd numbers in this group add up to an even number: 2, 2, 4, 16, 12, 2.
A:
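Note that this group contains no odd numbers at all. Under the usual convention that an empty sum is 0 (which is even), the statement is True. A minimal ground-truth checker in Python; `odd_sum_is_even` is a hypothetical helper that returns the odd numbers, their sum, and the verdict:

```python
from typing import List, Tuple


def odd_sum_is_even(numbers: List[int]) -> Tuple[List[int], int, bool]:
    """Return the odd numbers, their sum, and whether that sum is even."""
    odds = [n for n in numbers if n % 2 != 0]
    total = sum(odds)  # sum([]) == 0, so a group with no odd numbers gives True
    return odds, total, total % 2 == 0


print(odd_sum_is_even([2, 2, 4, 16, 12, 2]))  # ([], 0, True)
```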

Solutions

Prompt#1

Take the last letters of the words in “Waldo Schmidt” and concatenate them. what will be the answer?

When this prompt is used with the Cohere Command model, it gives an incoherent (pun intended) answer. To fix the issue, you can provide a single shot: one worked example that spells out each step.

Take the last letters of the words in “Waldo Schmidt” and concatenate them. what will be the answer?

1. first word is "Waldo", the last letter in it is 'o'
2. next word is "Schmidt", the last letter in it is 't'
3. Concatenate the letters, which is "ot"

answer is "ot"

Take the last letters of the words in “Walter sling madden” and concatenate them. what will be the answer?
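If you want to generate more one-shot prompts in this style (for instance, with a different worked example), the steps can be templated. This is a minimal sketch; `worked_example` and `build_one_shot_prompt` are hypothetical helpers, and straight double quotes are used around the name instead of the curly quotes in the prompt above.

```python
QUESTION = ('Take the last letters of the words in "{name}" and concatenate them. '
            "what will be the answer?")


def worked_example(name: str) -> str:
    """Write the question plus a step-by-step rationale in the style of the shot."""
    words = name.split()
    lines = [QUESTION.format(name=name), ""]
    for i, word in enumerate(words, start=1):
        label = "first word" if i == 1 else "next word"
        lines.append(f"{i}. {label} is \"{word}\", the last letter in it is '{word[-1]}'")
    letters = "".join(w[-1] for w in words)
    lines.append(f'{len(words) + 1}. Concatenate the letters, which is "{letters}"')
    lines += ["", f'answer is "{letters}"']
    return "\n".join(lines)


def build_one_shot_prompt(shot_name: str, test_name: str) -> str:
    """One worked example followed by the unanswered test question."""
    return worked_example(shot_name) + "\n\n" + QUESTION.format(name=test_name)


print(build_one_shot_prompt("Waldo Schmidt", "Walter sling madden"))
```

For the test question, the expected answer is "rgn" (Walter gives r, sling gives g, madden gives n).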

Prompt#2

When this prompt is run on the Cohere Command model, it shows nonsensical reasoning that leads to a wrong answer. It was fixed with a one-shot approach. I suggest that you try additional problems from the GSM8K dataset.

Q: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?

A:

To solve the challenge, I copied and pasted the question and answer from row 0 of the GSM8K dataset as the single shot.

Q: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?

A: Natalia sold 48/2 = <<48/2=24>>24 clips in May. Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May. #### 72

Q: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?

A:
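GSM8K answers mark calculator steps as `<<expression=result>>` and put the final answer after `####`. If you try more GSM8K problems, a sketch like the one below can assemble the one-shot prompt and pull the final number out of a completion, assuming each row is a dict with `question` and `answer` keys and that the model imitates the `####` marker from the shot. Both helper names are hypothetical.

```python
import re
from typing import Optional

# Row 0 of GSM8K, copied from the prompt above (dict layout is an assumption).
shot = {
    "question": "Natalia sold clips to 48 of her friends in April, and then she sold "
                "half as many clips in May. How many clips did Natalia sell altogether "
                "in April and May?",
    "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May. Natalia sold "
              "48+24 = <<48+24=72>>72 clips altogether in April and May. #### 72",
}


def build_gsm8k_prompt(shot: dict, question: str) -> str:
    """One solved example followed by the new question, in Q:/A: format."""
    return f"Q: {shot['question']}\n\nA: {shot['answer']}\n\nQ: {question}\n\nA:"


def final_answer(completion: str) -> Optional[str]:
    """Return the number after the '####' marker, or None if there is none."""
    match = re.search(r"####\s*\$?(-?[\d,.]+)", completion)
    if match is None:
        return None
    return match.group(1).replace(",", "").rstrip(".")


prompt = build_gsm8k_prompt(
    shot,
    "Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes "
    "of babysitting. How much did she earn?",
)
print(prompt)
print(final_answer(shot["answer"]))  # 72
```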

Prompt#3

I applied Zero-Shot CoT by adding ‘think step by step’ at the end of the prompt.

What is the total cost of purchasing equipment for all sixteen players on the football team, considering that each player requires a $25 jersey, a $15.20 pair of shorts, and a pair of socks priced at $6.80?

think step by step
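Zero-Shot CoT only changes the prompt text: the question stays the same and a trigger phrase is appended. A minimal sketch, where `zero_shot_cot` is a hypothetical helper and the blank-line separator mirrors the layout used on this page:

```python
def zero_shot_cot(question: str, trigger: str = "think step by step") -> str:
    """Append a Zero-Shot CoT trigger phrase after a blank line."""
    return f"{question}\n\n{trigger}"


q = ("What is the total cost of purchasing equipment for all sixteen players on the "
     "football team, considering that each player requires a $25 jersey, a $15.20 pair "
     "of shorts, and a pair of socks priced at $6.80?")
print(zero_shot_cot(q))
```

The same helper covers the Prompt#4 tweak by passing `trigger="think step by step and validate each step"`.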

Prompt#4

I first tried adding ‘think step by step’, but it did not give the correct answer, so I asked the model to validate its response at each step with ‘think step by step and validate each step’.

James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many total meters does he run a week?

think step by step and validate each step

Prompt#5

You will need to provide a few shots to get this one working. I suggest that you add one example at a time and then check whether you get the right answers for the test questions below. Keep adding examples until you get the correct answer for all test cases.

Some of my test cases were still failing with few-shot prompting alone. I could have given more examples, but I decided to add Zero-Shot CoT (‘think step by step’) and that did the trick.

Q: The odd numbers in this group add up to an even number: 6, 8, 11, 15, 12, 2, 3.
A: Adding all the odd numbers (11,15,3) gives 29. The answer is False.

Q: The odd numbers in this group add up to an even number: 1, 10, 23, 4, 1, 12, 3.
A: Adding all the odd numbers (1,23,1,3) gives 28. The answer is True.

Q: The odd numbers in this group add up to an even number: 8, 31, 14, 4, 8, 3, 6.
A: Adding all the odd numbers (31,3) gives 34. The answer is True.

Q: The odd numbers in this group add up to an even number: 7, 11, 10, 12, 14, 7.
A: Adding all the odd numbers (7,11,7) gives 25. The answer is False.

Q: The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:

think step by step
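Because every shot follows the same pattern (list the odd numbers, add them, state True or False), the few-shot prompt can be generated rather than typed by hand, which makes it easier to add one example at a time. A minimal sketch with hypothetical helper names; it reproduces the rationale format of the examples above.

```python
from typing import List


def rationale(numbers: List[int]) -> str:
    """Step-by-step answer in the same format as the few-shot examples."""
    odds = [n for n in numbers if n % 2 != 0]
    total = sum(odds)
    verdict = "True" if total % 2 == 0 else "False"
    return (f"Adding all the odd numbers ({','.join(map(str, odds))}) "
            f"gives {total}. The answer is {verdict}.")


def question(numbers: List[int]) -> str:
    return ("Q: The odd numbers in this group add up to an even number: "
            + ", ".join(map(str, numbers)) + ".")


def build_few_shot_prompt(shots: List[List[int]], test: List[int],
                          trigger: str = "think step by step") -> str:
    """Solved shots, then the unanswered test question and the CoT trigger."""
    blocks = [f"{question(s)}\nA: {rationale(s)}" for s in shots]
    blocks.append(f"{question(test)}\nA:\n\n{trigger}")
    return "\n\n".join(blocks)


SHOTS = [
    [6, 8, 11, 15, 12, 2, 3],
    [1, 10, 23, 4, 1, 12, 3],
    [8, 31, 14, 4, 8, 3, 6],
    [7, 11, 10, 12, 14, 7],
]
print(build_few_shot_prompt(SHOTS, [15, 32, 5, 13, 82, 7, 1]))
```

For the final question, the odd numbers (15,5,13,7,1) add up to 41, so the expected answer is False.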

Test cases
Q: The odd numbers in this group add up to an even number: 15, 3, 8, 13, 9
A:

Answer: Odd numbers (15,3,13,9) add up to 40, so the answer is True.

Q: The odd numbers in this group add up to an even number: 2, 31, 81, 13, 92, 37, 101
A:

Answer: Odd numbers (31,81,13,37,101) add up to 263, so the answer is False.

Q: The odd numbers in this group add up to an even number: 2, 1, 1, 13, 12, 2.
A:

Answer: Odd numbers (1,1,13) add up to 15, so the answer is False.
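Checking each test case by eye gets tedious as you add examples. One option is to paste the model's completions into a small script that compares them with the ground truth. This sketch assumes the verdict is the last "True" or "False" word in the completion; the helper names and the sample responses are made up for illustration.

```python
import re
from typing import List, Optional

TEST_CASES = [
    [15, 3, 8, 13, 9],
    [2, 31, 81, 13, 92, 37, 101],
    [2, 1, 1, 13, 12, 2],
]


def expected(numbers: List[int]) -> bool:
    """Ground truth: do the odd numbers in the group sum to an even number?"""
    return sum(n for n in numbers if n % 2) % 2 == 0


def parse_verdict(completion: str) -> Optional[bool]:
    """Treat the last 'True' or 'False' in the model's text as its verdict."""
    found = re.findall(r"\b(True|False)\b", completion)
    return found[-1] == "True" if found else None


def score(completions: List[str]) -> float:
    """Fraction of test cases whose parsed verdict matches the ground truth."""
    hits = sum(parse_verdict(text) == expected(nums)
               for nums, text in zip(TEST_CASES, completions))
    return hits / len(TEST_CASES)


# Hypothetical completions pasted from the playground; the third one is wrong.
responses = [
    "Odd numbers (15,3,13,9) add up to 40, so the answer is True.",
    "Odd numbers (31,81,13,37,101) add up to 263, so the answer is False.",
    "Odd numbers (1,1,13) add up to 15, so the answer is True.",
]
print(f"{score(responses):.2f}")  # 0.67
```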

Generate your own training data

  • You will observe failures for large numbers
  • You will see failures with a different number of elements in the group
  • You will see failures when the group contains only odd or only even numbers

To build a robust prompt, we will need more than 4 shots. One way to address this is to synthetically generate a few hundred examples to train and test our solution. I will leave this as an exercise for you for now.
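If you want a starting point for that exercise, the sketch below generates Q/A pairs that deliberately cover the failure modes listed above: large numbers, varying group sizes, and all-odd or all-even groups. The value ranges, group sizes, and the wording for the no-odd-numbers case are assumptions chosen for illustration, not taken from any tested setup.

```python
import random
from typing import List

QUESTION = "Q: The odd numbers in this group add up to an even number: {}."


def make_example(rng: random.Random) -> str:
    """Generate one Q/A pair in the style of the few-shot examples."""
    length = rng.randint(3, 10)                  # vary the number of elements
    high = rng.choice([20, 100, 10_000])         # mix small and large numbers
    kind = rng.choice(["mixed", "all_odd", "all_even"])
    nums = [rng.randint(1, high) for _ in range(length)]
    if kind == "all_odd":
        nums = [n | 1 for n in nums]             # setting the low bit makes n odd
    elif kind == "all_even":
        nums = [n * 2 for n in nums]             # doubling makes n even

    odds = [n for n in nums if n % 2]
    total = sum(odds)
    verdict = total % 2 == 0
    if odds:
        why = f"Adding all the odd numbers ({','.join(map(str, odds))}) gives {total}."
    else:
        # Assumed wording for the no-odd-numbers case (not in the original examples).
        why = "There are no odd numbers, so their sum is 0."
    question = QUESTION.format(", ".join(map(str, nums)))
    return f"{question}\nA: {why} The answer is {verdict}."


def make_dataset(n: int, seed: int = 0) -> List[str]:
    """A reproducible list of n examples (same seed gives the same data)."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


data = make_dataset(300)
shots, held_out = data[:8], data[8:]   # a few shots for the prompt, the rest for testing
print(shots[0])
```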