Exercise #4: Explore hyperparameters

Objective

The objective is to understand the hyperparameters that various model providers support for fine-tuning.

Steps

  1. Go through the fine-tuning documentation for each model (links are provided)

  2. Explore the hyperparameter recommendations for each model

You may go through the documentation manually, or you may use an AI tool such as ChatGPT or Google NotebookLM to carry out this exercise.

References

Google NotebookLM

1. Gemini

Review the documentation on fine-tuning the Gemini family of models, or add the documentation as a source/context to ground the AI tool.

List the adjustable hyperparameters for fine-tuning Gemini models. Keep the response concise.
Provide recommendations for optimal hyperparameter settings for the Gemini family of models. Include specific values or ranges, and explain their impact on model performance where applicable.

2. OpenAI

Review the documentation on fine-tuning OpenAI models, or add the documentation as a source/context to ground the AI tool.

Google NotebookLM is unable to read content from the OpenAI website. To work around this, use the Paste text option: copy the documentation content to the clipboard and paste it as a source in NotebookLM.

List the adjustable hyperparameters for fine-tuning OpenAI models. Keep the response concise.
Provide recommendations for optimal hyperparameter settings for the OpenAI family of models. Include specific values or ranges, and explain their impact on model performance where applicable.

3. Cohere

Review the documentation on fine-tuning the Cohere family of models, or add the documentation as a source/context to ground the AI tool.

List the adjustable hyperparameters for fine-tuning Cohere models. Keep the response concise.
Provide recommendations for optimal hyperparameter settings for the Cohere family of models. Include specific values or ranges, and explain their impact on model performance where applicable.

4. Common themes for fine-tuning

The objective is to learn the common themes across the recommendations for the Gemini, OpenAI, and Cohere models.

Identify common themes in the fine-tuning recommendations provided by Gemini, OpenAI, and Cohere. Highlight shared practices, strategies, or guidelines, and explain how these align across the different providers.

Solution

Gemini

Parameter                | Description
Epochs                   | Number of complete passes through the training dataset
Batch size               | Number of examples used in one training iteration
Learning rate            | Controls the adjustment of model parameters during each iteration
Learning rate multiplier | Modifies the original learning rate
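
To make these concrete, here is a minimal sketch of passing them to a tuning job, assuming the google-generativeai Python SDK; the base-model name, training examples, tuned-model id, and values are illustrative placeholders, not tuning recommendations. (The learning rate multiplier in the table corresponds to Vertex AI tuning jobs, which scale a default rate rather than take an absolute one.)

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key

    # Launch a tuning job; epoch_count, batch_size, and learning_rate map
    # directly onto the hyperparameters in the table above.
    operation = genai.create_tuned_model(
        source_model="models/gemini-1.0-pro-001",  # assumed tunable base model
        training_data=[
            {"text_input": "What is 1 + 1?", "output": "2"},
            # ... more examples ...
        ],
        id="hyperparam-demo",   # hypothetical tuned-model id
        epoch_count=5,          # complete passes through the training dataset
        batch_size=4,           # examples used in one training iteration
        learning_rate=0.001,    # step size for each parameter update
    )
    tuned_model = operation.result()  # block until the job completes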

OpenAI

Hyperparameter           | Description                                             | Adjustment Recommendations
Epochs                   | Number of complete passes through the training dataset | Increase by 1-2 for underfitting; decrease by 1-2 for overfitting
Learning Rate Multiplier | Modifies the default learning rate                      | Increase for convergence issues; decrease for stability
Batch Size               | Number of training examples processed together          | No explicit recommendations provided
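
These map onto the hyperparameters object of the OpenAI fine-tuning API. A minimal sketch with the openai Python SDK follows; the file id, model name, and values are placeholders, and each field also accepts "auto" to use the provider default.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    job = client.fine_tuning.jobs.create(
        training_file="file-abc123",          # id of a previously uploaded JSONL file (placeholder)
        model="gpt-4o-mini-2024-07-18",       # assumed fine-tunable base model
        hyperparameters={
            "n_epochs": 3,                    # increase by 1-2 if underfitting, decrease if overfitting
            "learning_rate_multiplier": 1.8,  # scales the default learning rate
            "batch_size": 8,                  # examples processed together; often left as "auto"
        },
    )
    print(job.id, job.status)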

Cohere

Hyperparameter         | Description                                        | Range                   | Default
epochCount             | Number of epochs                                   | 1-100                   | 1
batchSize              | Number of samples processed per iteration          | Command: 8; Light: 8-32 | Command: 8; Light: 8
learningRate           | Learning rate                                      | 5.00E-6 to 0.1          | 1.00E-5
earlyStoppingThreshold | Minimum improvement required to continue training  | 0-0.1                   | 0.01
earlyStoppingPatience  | Tolerance for stagnation in loss                   | 1-10                    | 6
evalPercentage         | Percentage of dataset used for evaluation          | 5-50                    | 20
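
As an illustration only, the snippet below assembles these fields into a JSON request body using the exact names from the table. The payload shape, field placement, and base-model name are assumptions rather than the documented Cohere API, so verify against the current docs before using it.

    import json

    finetune_request = {
        "name": "hyperparam-demo",           # hypothetical fine-tune name
        "baseModel": "command-light",        # hypothetical base-model field
        "hyperparameters": {
            "epochCount": 1,                 # default; allowed range 1-100
            "batchSize": 8,                  # Command is fixed at 8; Light allows 8-32
            "learningRate": 1e-5,            # default; allowed range 5e-6 to 0.1
            "earlyStoppingThreshold": 0.01,  # minimum loss improvement required to continue
            "earlyStoppingPatience": 6,      # evaluations tolerated without improvement
            "evalPercentage": 20,            # share of the dataset held out for evaluation
        },
    }
    print(json.dumps(finetune_request, indent=2))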

Common themes

Feature              | OpenAI                                          | Cohere
Starting Point       | Default hyperparameters                         | Default hyperparameters
Iterative Adjustment | Recommended                                     | Recommended
Epochs               | Adjust based on model behavior                  | Higher for larger, complex datasets
Learning Rate        | Adjust multiplier for convergence and stability | Dynamic adjustment with validation dataset
Batch Size           | No explicit recommendations                     | Model-specific limits and defaults
Data Quality         | Prioritize quality over quantity                | Implicitly emphasized through validation dataset usage
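
The shared workflow behind this table can be expressed as a small loop: start from the provider defaults, train, compare training and validation loss, and nudge the hyperparameters between rounds. The sketch below is provider-agnostic; train_and_eval is a hypothetical stand-in for a real fine-tuning call, and the thresholds are illustrative.

    def train_and_eval(epochs: int, lr_multiplier: float) -> tuple[float, float]:
        # Placeholder: in practice, launch a fine-tuning job with these values
        # and read training/validation loss from its reported metrics.
        fake_train_loss = max(0.05, 1.0 / (epochs * lr_multiplier + 1.0))
        fake_val_loss = fake_train_loss + 0.02 * epochs  # pretend extra epochs overfit
        return fake_train_loss, fake_val_loss

    epochs, lr_multiplier = 3, 1.0      # start from typical provider defaults
    for round_idx in range(3):          # a few iterative-adjustment rounds
        train_loss, val_loss = train_and_eval(epochs, lr_multiplier)
        print(f"round {round_idx}: epochs={epochs} train={train_loss:.3f} val={val_loss:.3f}")
        if val_loss - train_loss > 0.05:   # large gap: likely overfitting
            epochs = max(1, epochs - 1)    # decrease epochs, per the guidance above
        elif train_loss > 0.3:             # loss still high: likely underfitting
            epochs += 1                    # increase epochs by 1
        else:
            break                          # defaults plus small tweaks suffice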