Chat completions
POST /v1/chat/completions
Given a list of messages forming a conversation, the model generates a response. See the pricing table for the available models.
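For orientation, here is a minimal sketch of a non-streaming call in Python. The base URL, API key environment variable, and model code are placeholders rather than values from this reference; only the path /v1/chat/completions and the body shape are documented here.

import os

import requests  # third-party HTTP client

# Placeholder base URL and auth scheme; substitute your deployment's values.
BASE_URL = "https://api.example.com"
API_KEY = os.environ["API_KEY"]

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "my-model",  # code of the model to use
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])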
Request
Header Parameters
ID of the team to run the request as.
Body (application/json)
model string required
Code of the model to use. See the available model list.
messages object[] required
A list of messages comprising the conversation so far.
content string
The contents of the message.
role string
The role of the message's author.
frequency_penalty number
Number between -2.0 and 2.0. Positive values penalize tokens that have already been sampled, taking into account their frequency in the preceding text. This penalization diminishes the model's tendency to reproduce identical lines verbatim.
presence_penalty number
Number between -2.0 and 2.0. Positive values penalize tokens that have been sampled at least once in the existing text.
max_tokens integer
The maximum number of tokens to generate. For decoder-only models like GPT, the length of your input tokens plus max_tokens should not exceed the model's maximum length (e.g., 2048 for OpenAI GPT-3). For encoder-decoder models like T5 or BlenderBot, max_tokens should not exceed the model's maximum output length. This is similar to Hugging Face's max_new_tokens argument.
n integer
The number of independently generated results for the prompt. Not supported when using beam search. Defaults to 1. This is similar to Hugging Face's num_return_sequences argument.
stop string[]
When one of the stop phrases appears in the generation result, the API will stop generation. The stop phrases are excluded from the result. Defaults to an empty list.
stream boolean
Whether to stream the generation result. When set to true, each token is sent as a server-sent event as soon as it is generated.
temperature number
Sampling temperature. A smaller temperature makes the generation result closer to greedy, argmax (i.e., top_k = 1) sampling. Defaults to 1.0. This is similar to Hugging Face's temperature argument.
top_p number
Tokens comprising the top top_p probability mass are kept for sampling. Numbers between 0.0 (exclusive) and 1.0 (inclusive) are allowed. Defaults to 1.0. This is similar to Hugging Face's top_p argument.
Request timeout. If the request exceeds the timeout, the API responds with the HTTP 429 Too Many Requests status code. Default behavior is no timeout.
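Taken together, a request body exercising the parameters above might look like the following sketch. The scalar parameter names (frequency_penalty, n, and so on) follow the common chat-completions convention and should be treated as assumptions here:

{
  "model": "my-model",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "frequency_penalty": 0.3,
  "max_tokens": 128,
  "n": 1,
  "stop": ["\n\n"],
  "stream": false,
  "temperature": 0.7,
  "top_p": 0.9
}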
Responses
- 200
Successfully generated a chat response. When streaming mode is used (i.e., the stream option is set to true), the response is in MIME type text/event-stream. Otherwise, the content type is application/json.
Schema (application/json)
choices object[]
index integer
The index of the choice in the list of generated choices.
message object
role string
Role of the generated message author, in this case assistant.
content string
The contents of the assistant message.
finish_reason string
Termination condition of the generation. stop means the API returned the full chat completion generated by the model without running into any limits. length means the generation exceeded max_tokens or the conversation exceeded the max context length.
usage object
prompt_tokens integer
Number of tokens in the prompt.
completion_tokens integer
Number of tokens in the generated completion.
total_tokens integer
Total number of tokens used in the request (prompt_tokens + completion_tokens).
created integer
The Unix timestamp (in seconds) for when the generation completed.
No streaming example
{
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "\n\nHello there, how may I assist you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 12,
"total_tokens": 21
}
}
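Given a decoded non-streaming response shaped like the example above, extracting the reply and checking the token accounting is straightforward. A Python sketch over that exact payload:

response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "\n\nHello there, how may I assist you today?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21},
}

reply = response["choices"][0]["message"]["content"]
usage = response["usage"]
# total_tokens is prompt_tokens + completion_tokens
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
print(reply)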
Schema (text/event-stream)
choices object[]
index integer
The index of the choice in the list of generated choices.
delta object
role string
Role of the generated message author, in this case assistant.
content string
The contents of the assistant message.
finish_reason string
Termination condition of the generation. stop means the API returned the full chat completion generated by the model without running into any limits. length means the generation exceeded max_tokens or the conversation exceeded the max context length.
created integer
The Unix timestamp (in seconds) for when the token was sampled.
Streaming example
data: {"created":1694268190,"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"created":1694268190,"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"created":1694268190,"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
....
data: {"created":1694268190,"choices":[{"index":0,"delta":{"content":" today"},"finish_reason":null}]}
data: {"created":1694268190,"choices":[{"index":0,"delta":{"content":"?"},"finish_reason":null}]}
data: {"created":1694268190,"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
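A sketch of consuming the stream in Python, under the same placeholder base URL and key as the first example: each event arrives on a line prefixed with "data: ", the delta fragments concatenate into the full message, and a [DONE] sentinel closes the stream.

import json
import os

import requests

BASE_URL = "https://api.example.com"  # placeholder
API_KEY = os.environ["API_KEY"]  # placeholder

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,  # let requests yield the body incrementally
)
resp.raise_for_status()

parts = []
for line in resp.iter_lines():
    if not line:
        continue  # blank lines separate server-sent events
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break  # end-of-stream sentinel
    event = json.loads(payload)
    delta = event["choices"][0]["delta"]
    parts.append(delta.get("content", ""))  # final finish_reason event carries an empty delta

print("".join(parts))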