The objective is to understand the hyperparameters that various model families support for fine-tuning.
Steps
Go through the fine-tuning documentation for the model (links are provided).
Explore the hyperparameter recommendations for the models.
You may go through the documentation manually, or you may use an AI tool such as ChatGPT or Google NotebookLM to carry out this exercise.
References
Review the documentation on fine-tuning the Gemini family of models, or add the documentation as a source/context to ground the AI tool.
List the adjustable hyperparameters for fine-tuning Gemini models. Keep the response concise.
Provide recommendations for optimal hyperparameter settings for the Gemini family of models. Include specific values or ranges, and explain their impact on model performance where applicable.
Review the documentation on fine-tuning OpenAI models, or add the documentation as a source/context to ground the AI tool.
Google NotebookLM is unable to read content from the OpenAI website. To work around this, use the Paste text option: copy the documentation content to the clipboard and paste it as a source in NotebookLM.
List the adjustable hyperparameters for fine-tuning OpenAI models. Keep the response concise.
Provide recommendations for optimal hyperparameter settings for the OpenAI family of models. Include specific values or ranges, and explain their impact on model performance where applicable.
Review the documentation on fine-tuning the Cohere family of models, or add the documentation as a source/context to ground the AI tool.
List the adjustable hyperparameters for fine-tuning Cohere models. Keep the response concise.
Provide recommendations for optimal hyperparameter settings for the Cohere family of models. Include specific values or ranges, and explain their impact on model performance where applicable.
The objective is to identify the common themes across the recommendations for the Gemini, OpenAI, and Cohere models.
Identify common themes in the fine-tuning recommendations provided by Gemini, OpenAI, and Cohere. Highlight shared practices, strategies, or guidelines, and explain how these align across the different providers.
Gemini
| Parameter | Description |
|---|---|
| Epochs | Number of complete passes through the training dataset |
| Batch size | Number of examples used in one training iteration |
| Learning rate | Controls the adjustment of model parameters during each iteration |
| Learning rate multiplier | Modifies the original learning rate |
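For context, here is a minimal sketch of where these parameters are passed when creating a tuned Gemini model with the google.generativeai Python SDK. The base-model name, dataset, ID, and hyperparameter values below are illustrative assumptions, not recommendations from the documentation.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Illustrative tuning job: epoch_count, batch_size, and learning_rate
# correspond to the Epochs, Batch size, and Learning rate rows above.
operation = genai.create_tuned_model(
    source_model="models/gemini-1.0-pro-001",  # assumed tunable base model
    training_data=[
        {"text_input": "1", "output": "2"},    # toy dataset for illustration
        {"text_input": "3", "output": "4"},
    ],
    id="hyperparameter-demo",        # hypothetical tuned-model ID
    epoch_count=5,                   # passes through the training dataset
    batch_size=4,                    # examples per training iteration
    learning_rate=0.001,             # step size for parameter updates
)
tuned_model = operation.result()     # blocks until tuning completes
print(tuned_model.name)
```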
OpenAI
| Hyperparameter | Description | Adjustment Recommendations |
|---|---|---|
| Epochs | Number of complete passes through the training dataset | Increase by 1-2 for underfitting, decrease by 1-2 for overfitting |
| Learning Rate Multiplier | Modifies the default learning rate | Increase for convergence issues, decrease for stability |
| Batch Size | Number of training examples processed together | No explicit recommendations provided |
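The OpenAI fine-tuning API exposes these three knobs through a hyperparameters object on the job. A minimal sketch using the openai Python SDK follows; the training-file ID and the values shown are placeholders, not tuned settings.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative job: n_epochs, batch_size, and learning_rate_multiplier
# map to the Epochs, Batch Size, and Learning Rate Multiplier rows above.
job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",               # assumed fine-tunable base model
    training_file="file-abc123",         # placeholder uploaded-file ID
    hyperparameters={
        "n_epochs": 3,                   # raise by 1-2 if underfitting
        "batch_size": 8,
        "learning_rate_multiplier": 2,   # lower if training is unstable
    },
)
print(job.id, job.status)
```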
Cohere
| Hyperparameter | Description | Range | Default |
|---|---|---|---|
| epochCount | Number of epochs | 1-100 | 1 |
| batchSize | Number of samples processed per iteration | Command: 8; Light: 8-32 | Command: 8; Light: 8 |
| learningRate | Learning rate | 5.00E-6 to 0.1 | 1.00E-5 |
| earlyStoppingThreshold | Minimum improvement required to continue training | 0-0.1 | 0.01 |
| earlyStoppingPatience | Tolerance for stagnation in loss | 1-10 | 6 |
| evalPercentage | Percentage of dataset used for evaluation | 5-50 | 20 |
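The camelCase names above match the hyperParameters request fields used when customizing Cohere Command models through Amazon Bedrock. Assuming that is the surface in use, a hedged boto3 sketch is shown below; the job name, role ARN, model identifier, and S3 paths are all placeholders.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Illustrative customization job; hyperParameters values are strings and
# mirror the table above (defaults shown where the table provides them).
response = bedrock.create_model_customization_job(
    jobName="cohere-finetune-demo",                      # placeholder
    customModelName="cohere-command-tuned",              # placeholder
    roleArn="arn:aws:iam::123456789012:role/BedrockFinetuneRole",
    baseModelIdentifier="cohere.command-text-v14:7:4k",  # assumed model ID
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={
        "epochCount": "1",
        "batchSize": "8",
        "learningRate": "0.00001",           # 1.00E-5 default from the table
        "earlyStoppingThreshold": "0.01",
        "earlyStoppingPatience": "6",
        "evalPercentage": "20",
    },
)
print(response["jobArn"])
```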
Common themes
| Feature | OpenAI | Cohere |
|---|---|---|
| Starting Point | Default hyperparameters | Default hyperparameters |
| Iterative Adjustment | Recommended | Recommended |
| Epochs | Adjust based on model behavior | Higher for larger, complex datasets |
| Learning Rate | Adjust multiplier for convergence and stability | Dynamic adjustment with validation dataset |
| Batch Size | No explicit recommendations | Model-specific limits and defaults |
| Data Quality | Prioritize quality over quantity | Implicitly emphasized through validation dataset usage |
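The shared loop across providers (start from defaults, evaluate, then nudge one knob at a time) can be sketched in a provider-agnostic way. The thresholds and multipliers below are illustrative assumptions, not values from any vendor's documentation, and run_finetune is a hypothetical stand-in for whichever tuning call you use.

```python
# Provider-agnostic sketch of the iterative-adjustment theme above.
def tune_iteratively(run_finetune, epochs=3, lr_multiplier=1.0, rounds=3):
    """run_finetune(epochs, lr_multiplier) -> (train_loss, val_loss)."""
    for _ in range(rounds):
        train_loss, val_loss = run_finetune(epochs, lr_multiplier)
        gap = val_loss - train_loss
        if gap > 0.1:              # validation much worse: overfitting
            epochs = max(1, epochs - 1)
            lr_multiplier *= 0.5   # back off the learning rate
        elif train_loss > 1.0:     # both losses high: underfitting
            epochs += 1
            lr_multiplier *= 1.5   # push the learning rate up
        else:
            break                  # defaults (or near-defaults) suffice
    return epochs, lr_multiplier
```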