POST https://api.platform.a15t.com/v1/chat/completions
Guide: Chat Completions
This guide explains how to use the chat completions API and interpret the gateway information found under platform_extensions.
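As a rough illustration of calling the endpoint over plain HTTP, the sketch below builds the POST request with Python's standard library. The endpoint URL and the model/messages body come from this guide; the bearer-token Authorization header and the helper names build_chat_request and send_chat_request are assumptions made for illustration, so check your gateway's authentication settings before relying on them.

```python
import json
import urllib.request

# Endpoint taken from this guide.
CHAT_COMPLETIONS_URL = "https://api.platform.a15t.com/v1/chat/completions"


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        CHAT_COMPLETIONS_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
        },
    )


def send_chat_request(request, timeout=30):
    """Send the request and return the decoded JSON body as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling send_chat_request(build_chat_request(key, model, messages)) would return the parsed response as a dict, so platform_extensions can be read with ordinary dictionary lookups.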
Retry information
Responses include a platform_extensions object from the gateway. It contains a routing_results object that describes the routing attempts and has the following fields:
latency - total latency across all attempts, in milliseconds.
private_endpoint_enabled - whether a private endpoint was used.
retry_info - nested retry information, described below.
The retry_info object has the following fields:
retry_count - total number of retries attempted.
fallback_model - the model ultimately used after retries.
retries - array describing each attempt. Each entry contains:
  index - the order of the attempt, starting at 0.
  model - the model that was called for that attempt.
  code - the error code that triggered the retry.
  message - text describing the failure.
  latency - how long the attempt took, in milliseconds.
Each element in retries corresponds to a routing attempt that failed before the final successful request.
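The field layout described above could be mirrored in typed containers, which makes missing fields explicit instead of relying on chained dictionary lookups. This is a minimal sketch: the class names and parse_routing_results are hypothetical, and the field types (integers for code and latency, a boolean for private_endpoint_enabled) are assumptions rather than a documented schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RetryAttempt:
    index: int      # order of the attempt, starting at 0
    model: str      # model called for this attempt
    code: int       # error code that triggered the retry (assumed integer)
    message: str    # text describing the failure
    latency: int    # duration of the attempt in milliseconds


@dataclass
class RetryInfo:
    retry_count: int = 0
    fallback_model: Optional[str] = None
    retries: List[RetryAttempt] = field(default_factory=list)


@dataclass
class RoutingResults:
    latency: Optional[int] = None
    private_endpoint_enabled: Optional[bool] = None
    retry_info: RetryInfo = field(default_factory=RetryInfo)


def parse_routing_results(platform_extensions: Optional[dict]) -> RoutingResults:
    """Convert a platform_extensions dict into typed objects, tolerating missing keys."""
    routing = (platform_extensions or {}).get("routing_results") or {}
    info = routing.get("retry_info") or {}
    attempts = [
        RetryAttempt(
            index=a["index"],
            model=a["model"],
            code=a["code"],
            message=a["message"],
            latency=a["latency"],
        )
        for a in info.get("retries") or []
    ]
    return RoutingResults(
        latency=routing.get("latency"),
        private_endpoint_enabled=routing.get("private_endpoint_enabled"),
        retry_info=RetryInfo(
            retry_count=info.get("retry_count", 0),
            fallback_model=info.get("fallback_model"),
            retries=attempts,
        ),
    )
```

With this shape, a response that carries no gateway metadata parses to a RoutingResults with retry_count 0 and an empty retries list, so callers do not need separate checks for absent keys.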
Example response
{
  "id": "chatcmpl-abc",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Hello!"
      }
    }
  ],
  "platform_extensions": {
    "routing_results": {
      "private_endpoint_enabled": false,
      "latency": 1717,
      "retry_info": {
        "retry_count": 2,
        "fallback_model": "openai/gpt-4o-2024-11-20",
        "retries": [
          {
            "index": 0,
            "model": "anthropic/claude-3-sonnet",
            "code": 503,
            "message": "Service unavailable",
            "latency": 812
          },
          {
            "index": 1,
            "model": "anthropic/claude-3-sonnet",
            "code": 429,
            "message": "Rate limit exceeded",
            "latency": 905
          }
        ]
      }
    }
  }
}
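To show how the example response above might be condensed for logging or monitoring, the following sketch computes a few aggregates from its routing_results object. The helper summarize_retries and the keys of its result are hypothetical. Treating a fallback_model that differs from the requested model as a model switch is an interpretation of the field description, not documented behavior.

```python
from collections import Counter


def summarize_retries(routing_results, requested_model):
    """Condense retry_info into a few aggregate values."""
    info = routing_results.get("retry_info") or {}
    retries = info.get("retries") or []
    fallback_model = info.get("fallback_model")
    return {
        "retry_count": info.get("retry_count", 0),
        # Time spent in attempts that failed, in milliseconds.
        "failed_latency_ms": sum(a["latency"] for a in retries),
        # How often each error code triggered a retry.
        "codes": dict(Counter(a["code"] for a in retries)),
        # Assumption: a different final model means the gateway fell back.
        "switched_model": fallback_model is not None
        and fallback_model != requested_model,
    }


# routing_results copied from the example response.
example = {
    "private_endpoint_enabled": False,
    "latency": 1717,
    "retry_info": {
        "retry_count": 2,
        "fallback_model": "openai/gpt-4o-2024-11-20",
        "retries": [
            {"index": 0, "model": "anthropic/claude-3-sonnet", "code": 503,
             "message": "Service unavailable", "latency": 812},
            {"index": 1, "model": "anthropic/claude-3-sonnet", "code": 429,
             "message": "Rate limit exceeded", "latency": 905},
        ],
    },
}

summary = summarize_retries(example, "anthropic/claude-3-sonnet")
print(summary)
```

For the example data, the summary reports two retries, one 503 and one 429, and a switch away from the requested model.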
Parsing retry information in Python
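# Assumes `client` is an OpenAI-compatible SDK client that is already configured
# for the gateway and exposes extra response fields as attributes.
response = client.chat.completions.create(
    model="anthropic/claude-3-sonnet",
    messages=[{"role": "user", "content": "Hello"}],
)

# Gateway metadata; fall back to an empty dict if it is missing.
resp_info = getattr(response, "platform_extensions", None) or {}
routing_results = resp_info.get("routing_results", {})
print("total latency", routing_results.get("latency"))
print("private endpoint", routing_results.get("private_endpoint_enabled"))

retry_info = routing_results.get("retry_info", {})
retry_count = retry_info.get("retry_count", 0)
fallback = retry_info.get("fallback_model")
print("retry count", retry_count)
print("fallback model", fallback)

# One entry per failed attempt that preceded the final successful request.
for attempt in retry_info.get("retries", []):
    print(
        attempt["index"],
        attempt["model"],
        attempt["code"],
        attempt["message"],
        attempt["latency"],
    )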