Parameter | Description | Type | Default |
---|---|---|---|
candidate_count | Number of generated responses to return. | Integer | Model-dependent |
stop_sequences | The set of character sequences (up to 5) that will stop output generation. | List of Strings | None |
max_output_tokens | The maximum number of tokens to include in a candidate. | Integer | Model-dependent |
temperature | Controls the randomness of the output. | Float (0.0 - 1.0) | Model-dependent |
top_p | Maximum cumulative probability of tokens to consider when sampling. | Float (0.0 - 1.0) | Model-dependent |
top_k | Maximum number of tokens to consider when sampling. | Integer | 40 |
response_mime_type | Output response MIME type. | String (text/plain, application/json) | text/plain |
response_schema | Specifies the format of the JSON response. | JSON Schema | None |
candidate_count is a model-specific parameter. It instructs the model to generate the specified number of responses per query.
The presence_penalty and frequency_penalty parameters are NOT supported.
Models support JSON output via the decoder hyperparameter response_schema.
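To make the JSON note concrete, here is a minimal sketch (not part of the lab steps) that requests JSON output using the Python SDK introduced later in this lab. The dict passed to response_schema is an assumed OpenAPI-style mapping; the exact schema dialect accepted can vary across SDK versions.
# Sketch only: request JSON output from Gemini
# Assumes genai.configure(api_key=...) has already been called
from google.generativeai import GenerationConfig, GenerativeModel

json_config = GenerationConfig(
    response_mime_type="application/json",  # default is text/plain
    response_schema={                       # assumed OpenAPI-style schema mapping
        "type": "OBJECT",
        "properties": {
            "term": {"type": "STRING"},
            "definition": {"type": "STRING"},
        },
        "required": ["term", "definition"],
    },
)

llm_json = GenerativeModel("gemini-1.5-flash", generation_config=json_config)
response = llm_json.generate_content(["Define the term LLM"])
print(response.candidates[0].content.parts[0].text)  # a JSON string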
Parameter | Description | Minimum | Maximum | Default |
---|---|---|---|---|
temperature | Controls randomness of the response. | 0 | 1 | 0.5 |
topP | Controls the probability threshold for token selection. | 0 | 1 | 0.5 |
maxTokens | Maximum number of tokens in the generated response. | 0 | 8,191 (Mid, Ultra, Large) | 2,048 (Other) |
stopSequences | Sequences to terminate generation. | N/A | N/A | N/A |
presencePenalty.scale | Penalizes tokens that appear in the prompt or completion. | 0 | 5 | 0 |
countPenalty.scale | Penalizes tokens based on frequency of appearance. | 0 | 1 | 0 |
frequencyPenalty.scale | Penalizes tokens based on normalized frequency. | 0 | 500 | 0 |
countPenalty.applyToWhitespaces | Applies the penalty to whitespace and newlines. | N/A | N/A | True |
countPenalty.applyToPunctuations | Applies the penalty to punctuation. | N/A | N/A | True |
countPenalty.applyToNumbers | Applies the penalty to numbers. | N/A | N/A | True |
countPenalty.applyToStopwords | Applies the penalty to stop words. | N/A | N/A | True |
countPenalty.applyToEmojis | Excludes emojis from the penalty. | N/A | N/A | True |
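For orientation only, the following is a hypothetical request payload assembled from the parameters above, assuming a JSON body in which each penalty setting is a nested object; the exact field names and request wrapper depend on the provider and SDK you use.
# Hypothetical request body built from the camelCase parameters above (illustrative sketch only)
payload = {
    "prompt": "Explain LLM briefly",
    "maxTokens": 2048,
    "temperature": 0.5,
    "topP": 0.5,
    "stopSequences": ["##"],
    "presencePenalty": {"scale": 0},
    "frequencyPenalty": {"scale": 0},
    "countPenalty": {
        "scale": 0,
        "applyToWhitespaces": True,   # defaults from the table above
        "applyToPunctuations": True,
        "applyToNumbers": True,
        "applyToStopwords": True,
        "applyToEmojis": True,
    },
}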
Parameter | Description | Type | Default | Minimum | Maximum |
---|---|---|---|---|---|
max_tokens | Maximum number of tokens in the generated response. | Integer | Model-dependent | 1 | Model-dependent |
stop | List of stop sequences to terminate generation. | List of Strings | 0 | 0 | 10 |
temperature | Controls randomness of predictions. | Float (0.0 - 1.0) | Model-dependent | 0 | 1 |
top_p | Controls diversity of generated text. | Float (0.0 - 1.0) | Model-dependent | 0 | 1 |
top_k | Controls the number of most-likely candidates considered. | Integer | Model-dependent | 1 | 200 |
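A hypothetical payload for this flatter, snake_case parameter set is sketched below purely for comparison; the values are arbitrary and the surrounding request format depends on the provider.
# Hypothetical payload using the snake_case parameter set above (illustrative sketch only)
payload = {
    "prompt": "Explain LLM briefly",
    "max_tokens": 512,
    "stop": ["##"],
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
}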
Parameter | Description | Type | Default | Minimum | Maximum |
---|---|---|---|---|---|
temperature | Controls randomness of predictions. | Float | 0.9 | 0 | 5 |
p | Top P; controls the probability threshold for token selection. | Float | 0.75 | 0 | 1 |
k | Top K; controls the number of token choices for the next token. | Integer | 0 | 0 | 500 |
max_tokens | Maximum number of tokens in the generated response. | Integer | 20 | 1 | 4096 |
stop_sequences | List of sequences to terminate generation. | List of Strings | N/A | N/A | N/A |
return_likelihoods | Specifies if and how token likelihoods are returned. | String (GENERATION, ALL, NONE) | NONE | N/A | N/A |
stream | Indicates if the response should be streamed. | Boolean | False | N/A | N/A |
num_generations | Maximum number of generations to return. | Integer | 1 | 1 | 5 |
logit_bias | Prevents or encourages certain tokens. | Dictionary (token_id: bias) | N/A | -10 | 10 |
truncate | Specifies how to handle inputs longer than the maximum token length. | String (NONE, START, END) | END | N/A | N/A |
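Again purely for illustration, a hypothetical payload using this parameter set (note the single-letter p and k names and the return_likelihoods, num_generations, and truncate options) might look like the sketch below.
# Hypothetical payload using the parameter set above (illustrative sketch only)
payload = {
    "prompt": "Explain LLM briefly",
    "max_tokens": 200,
    "temperature": 0.9,
    "p": 0.75,
    "k": 0,
    "stop_sequences": ["##"],
    "return_likelihoods": "NONE",
    "num_generations": 1,
    "truncate": "END",
    "stream": False,
}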
Parameter | Description | Type | Required | Default | Minimum | Maximum |
---|---|---|---|---|---|---|
max_tokens_to_sample | Maximum number of tokens to generate (recommended limit: 4,000). | Integer | Yes | 200 | 0 | 4096 (Model-dependent) |
stop_sequences | Optional sequences to stop generation (the default includes "\n\nHuman:"). | List of Strings | No | N/A | N/A | N/A |
temperature | Controls randomness of the response (0: less random, 1: more creative). | Float | No | 1 | 0 | 1 |
top_p | Probability threshold for nucleus sampling (use only one of temperature or top_p). | Float | No | 1 | 0 | 1 |
top_k | Sample only from the top K most likely tokens. | Integer | No | 250 | 0 | 500 |
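The sketch below shows a hypothetical request body for this parameter set. The Human/Assistant framing in the prompt is an assumption made to match the default "\n\nHuman:" stop sequence; note that, per the table, you would normally tune only one of temperature or top_p.
# Hypothetical request body using the parameter set above (illustrative sketch only)
payload = {
    "prompt": "\n\nHuman: Explain LLM briefly\n\nAssistant:",  # assumed prompt framing
    "max_tokens_to_sample": 200,
    "temperature": 0.7,               # tune either temperature or top_p, not both
    "top_k": 250,
    "stop_sequences": ["\n\nHuman:"],
}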
The following parameters apply to many Hugging Face models exposed as inference endpoints on the Hugging Face platform.
Parameter | Description | Type | Default |
---|---|---|---|
min_length | Minimum length of the output summary in tokens. | Integer | None |
max_length | Maximum length of the output summary in tokens. | Integer | None |
top_k | Top tokens considered within the sample operation. | Integer | None |
top_p | Cumulative probability threshold for token selection. | Float | None |
temperature | Controls randomness of predictions. | Float (0.0 - 100.0) | 1.0 |
repetition_penalty | Penalizes tokens based on frequency of appearance. | Float (0.0 - 100.0) | None |
max_time | Maximum time for query execution in seconds. | Float (0 - 120.0) | None |
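On Hugging Face inference endpoints, these decoding options are typically sent inside a nested "parameters" object next to the input text. The sketch below is illustrative only; the exact wrapper and accepted fields depend on the task and the model.
# Hypothetical Hugging Face inference request body (illustrative sketch only)
payload = {
    "inputs": "Explain LLM briefly",
    "parameters": {
        "min_length": 20,
        "max_length": 120,
        "top_k": 50,
        "top_p": 0.9,
        "temperature": 1.0,
        "repetition_penalty": 1.2,
        "max_time": 30.0,
    },
}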
You will use Google Gemini for testing, but feel free to use any other model of your choice. The intent is to learn how model behavior changes as the decoding parameters are adjusted.
A deep understanding of the API in use is optional; any model will do for observing how the behavior changes with the hyperparameters.
Review the API request body for Google models. You will find that the endpoint supports multiple configuration parameters. Our interest is in the decoder parameters, which are specified by an object referred to as GenerationConfig, shown below for your convenience.
# DO NOT COPY to notebook
# This is for reference only
# Notice that presence_penalty & frequency_penalty are NOT supported
google.generativeai.types.GenerationConfig(
    candidate_count: (int | None) = None,
    stop_sequences: (Iterable[str] | None) = None,
    max_output_tokens: (int | None) = None,
    temperature: (float | None) = None,
    top_p: (float | None) = None,
    top_k: (int | None) = None,
    response_mime_type: (str | None) = None,
    response_schema: (protos.Schema | Mapping[str, Any] | None) = None
)
Create the notebook at the following path in your template folder.
Your-template-folder/Gen-AI-Fundamentals/decoding-hyper-parameters.ipynb
import getpass
import google.generativeai as genai
from google.generativeai import GenerationConfig, GenerativeModel

# Prompt for the Google API key without echoing it, then configure the client
google_api_key = getpass.getpass()
genai.configure(api_key=google_api_key)
Use the code below to set up the LLM client with the default parameter values.
model = "gemini-1.5-flash"
# Print the model's metadata, including its default parameter values
print(genai.get_model(name="models/" + model))
# Create the model with the default parameter set values
llm = GenerativeModel(model)
query = "Explain LLM briefly"
response = llm.generate_content([query]) #, generation_config=generation_config)
# Extract the content from the response
response_text = response.candidates[0].content.parts[0].text
print(response_text)
print(len(response_text))
1. max_output_tokens, stop_sequences
Change the parameter values to see how the responses change. For example, max_output_tokens=20 truncates the response after at most 20 tokens, and any stop sequence you provide ends generation as soon as it appears in the output.
# Set the generation parameters; None keeps the model default
generation_config = GenerationConfig(
    temperature=None,
    top_p=None,
    top_k=None,
    max_output_tokens=20,
    stop_sequences=None  # e.g., ["**", "\n\n"]
)
# Generate a response
response = llm.generate_content([query], generation_config=generation_config)
# Extract the content from the response
response_text = response.candidates[0].content.parts[0].text
# Print the results
print(response_text)
print(len(response_text))
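To see stop_sequences in action, re-run the same query with a stop sequence set. The sketch below is one way to do it; the chosen sequence "." is arbitrary and simply cuts the response at the first period.
# Sketch: allow more tokens, but stop generation at the first "."
generation_config = GenerationConfig(
    max_output_tokens=200,
    stop_sequences=["."]
)
response = llm.generate_content([query], generation_config=generation_config)
print(response.candidates[0].content.parts[0].text)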
2. temperature, top_p, top_k
Change the parameter values to see how the responses change. Lower temperature values make the output more focused and repeatable, while higher values (together with top_p and top_k) make it more varied.
query = "describe an LLM with 5 sentences"
# Switch to defaults
generation_config = GenerationConfig(
    temperature=None,
    top_p=None,
    top_k=None,
    max_output_tokens=None,
    stop_sequences=None
)
llm = GenerativeModel(model, generation_config=generation_config)
# print response with defaults
response = llm.generate_content([query])
response_text = response.candidates[0].content.parts[0].text
print(response_text)
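One quick way to feel the effect of temperature is to run the same prompt at a low and a high setting and compare the outputs. The sketch below does that; the values 0.0 and 1.0 are simply convenient extremes within the documented range.
# Sketch: compare a near-deterministic setting with a high-randomness setting
for temp in (0.0, 1.0):
    config = GenerationConfig(temperature=temp, top_p=None, top_k=None)
    response = llm.generate_content([query], generation_config=config)
    print(f"--- temperature={temp} ---")
    print(response.candidates[0].content.parts[0].text)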