
Extracting Provider Metrics

To send accurate telemetry to the OTM API, you need to grab real-time metrics (like prompt tokens, completion tokens, and timing) directly from your LLM provider’s response.

Every SDK hides these metrics in slightly different places. This guide shows you exactly where to find them so you don’t have to spend hours digging through documentation or console logging response objects.

Pro Tip: Always capture the startTime right before you make the API call and, if you're streaming, firstTokenTime when you receive the first chunk. This ensures your TTFT (Time To First Token) metrics are spot on.
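That timing pattern can be sketched as a small helper. It works over any async iterable of chunks; the `measureStream` name and return shape are illustrative, not part of any SDK:

```javascript
// Illustrative timing capture for a streamed response.
// `stream` can be any async iterable of chunks.
async function measureStream(stream) {
  const startTime = Date.now(); // capture immediately before iterating
  let firstTokenTime = null;
  const chunks = [];

  for await (const chunk of stream) {
    // The first chunk to arrive marks time-to-first-token.
    if (firstTokenTime === null) firstTokenTime = Date.now();
    chunks.push(chunk);
  }

  const endTime = Date.now();
  return {
    chunks,
    ttftMs: firstTokenTime === null ? null : firstTokenTime - startTime,
    totalMs: endTime - startTime,
  };
}
```

Because the helper only iterates, you can wrap any provider's stream with it without touching the request itself.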

OpenAI

OpenAI makes it relatively easy to get token usage, provided you aren't streaming. For streaming, request usage with stream_options: { include_usage: true } and read it from the final chunk; every other chunk reports usage as null.

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
 
// Here is the gold mine:
const { prompt_tokens, completion_tokens, total_tokens } = response.usage;
 
console.log(`Used ${prompt_tokens} input and ${completion_tokens} output tokens.`);
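For the streaming case, here's a sketch of collecting both the text and the final-chunk usage. It assumes the request was created with stream_options: { include_usage: true }; the `consumeStream` helper name is ours, not the SDK's:

```javascript
// Sketch: drain a streamed Chat Completions response and keep its usage.
// Assumes the request set stream_options: { include_usage: true }, so the
// final chunk carries a non-null `usage` object (and an empty choices array).
async function consumeStream(stream) {
  let text = "";
  let usage = null;

  for await (const chunk of stream) {
    // Optional chaining handles the final usage-only chunk, which has no choices.
    text += chunk.choices[0]?.delta?.content ?? "";
    if (chunk.usage) usage = chunk.usage; // only the last chunk has this
  }
  return { text, usage };
}
```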

Anthropic

Anthropic calls their usage fields input_tokens and output_tokens. If you use prompt caching, they also provide cache_creation_input_tokens and cache_read_input_tokens.

const message = await anthropic.messages.create({
  model: "claude-3-5-sonnet-20240620",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hi Claude!" }],
});
 
// Standard usage extraction
const { input_tokens, output_tokens } = message.usage;
 
// If you use caching, keep an eye on these too:
const cacheRead = message.usage.cache_read_input_tokens ?? 0;
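If you report a single input total, you may want to fold the cache fields in. The sketch below assumes input_tokens excludes cached tokens (worth verifying against Anthropic's current docs); the `totalInputTokens` helper is illustrative:

```javascript
// Sketch: collapse an Anthropic usage object into one input-token total.
// Assumption: input_tokens does NOT already include the cache fields.
function totalInputTokens(usage) {
  return (
    (usage.input_tokens ?? 0) +
    (usage.cache_creation_input_tokens ?? 0) +
    (usage.cache_read_input_tokens ?? 0)
  );
}
```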

Google Gemini

Google wraps usage in a usageMetadata object (camelCase in the JS/Go SDKs, snake_case — usage_metadata — in Python).

const result = await model.generateContent("Explain telemetry.");
const response = await result.response;
 
// Gemini calls them promptTokenCount and candidatesTokenCount
const { promptTokenCount, candidatesTokenCount } = response.usageMetadata;
 
console.log(`Prompt: ${promptTokenCount}, Output: ${candidatesTokenCount}`);
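Since Gemini's field names differ from the other providers, a tiny adapter keeps your telemetry code uniform. The `toTokenCounts` name and output shape here are illustrative; Gemini also exposes a totalTokenCount field, which the sketch falls back to computing if absent:

```javascript
// Sketch: map Gemini's usageMetadata fields onto the input/output naming
// used by the other providers above.
function toTokenCounts(usageMetadata) {
  const input = usageMetadata.promptTokenCount ?? 0;
  const output = usageMetadata.candidatesTokenCount ?? 0;
  return {
    input,
    output,
    total: usageMetadata.totalTokenCount ?? input + output,
  };
}
```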

Next Steps

Now that you’ve got the raw numbers, you’re ready to format them for the Telemetry Usage API. Just map these values to the tokens.input and tokens.output fields in your payload, and you’re good to go!
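That mapping might look like the sketch below. Only tokens.input and tokens.output come from this guide; the surrounding payload fields (model, durationMs) and the `buildTelemetryPayload` helper are illustrative:

```javascript
// Sketch: shape any provider's raw counts into a telemetry payload.
function buildTelemetryPayload({ model, inputTokens, outputTokens, durationMs }) {
  return {
    model,
    durationMs,
    tokens: {
      input: inputTokens,   // prompt_tokens / input_tokens / promptTokenCount
      output: outputTokens, // completion_tokens / output_tokens / candidatesTokenCount
    },
  };
}
```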