OpenAI Non-Streaming
OpenAI non-streaming requests are requests made to the OpenAI API where the entire response is delivered in a single payload rather than in a series of streamed chunks. For these non-streaming requests, OpenAI provides a usage tag in the response, which includes data such as the number of prompt tokens, completion tokens, and total tokens used.
Here is an example of how the usage tag might look in a response:
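```json
{
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```

The token counts shown here are illustrative; actual values vary per request.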
OpenAI Streaming
To calculate cost when using OpenAI streaming, please see the docs on enabling the stream usage flag.
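Once the stream usage flag is enabled, OpenAI sends the usage object on the final streamed chunk, while earlier chunks report usage as null. A minimal Python sketch of pulling usage out of a stream, using stand-in chunk objects rather than a live API call:

```python
# Sketch: with stream_options={"include_usage": True} on a chat completions
# call, OpenAI sends usage only on the final chunk; earlier chunks carry None.
# Stand-in objects are used below instead of a live API call.
from types import SimpleNamespace

def usage_from_stream(chunks):
    """Return the usage object from whichever chunk carries one."""
    usage = None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return usage

# Simulated stream, shaped like the real API's chunks.
stream = [
    SimpleNamespace(usage=None),
    SimpleNamespace(usage=None),
    SimpleNamespace(
        usage=SimpleNamespace(prompt_tokens=9, completion_tokens=12, total_tokens=21)
    ),
]
print(usage_from_stream(stream).total_tokens)  # prints 21
```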
Anthropic Requests
In the case of Anthropic requests, there is no supported method for calculating tokens in TypeScript, so we have to calculate the tokens manually using a Python server. For more discussion and details on this topic, see our comments in this thread: https://github.com/anthropics/anthropic-sdk-typescript/issues/16
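A minimal sketch of such a Python token-counting service is shown below. The endpoint shape is an assumption, and the tokenizer is a whitespace-splitting placeholder: swap in Anthropic's real tokenizer for production use.

```python
# Sketch of a minimal Python token-counting service.
# The count_tokens function below is a placeholder (an assumption):
# real Anthropic token counts must come from Anthropic's own tokenizer,
# not from whitespace splitting.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def count_tokens(text: str) -> int:
    # Placeholder tokenizer: replace with Anthropic's tokenizer.
    return len(text.split())

class TokenCountHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        text = json.loads(body or b"{}").get("text", "")
        payload = json.dumps({"token_count": count_tokens(text)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To run the server locally:
# HTTPServer(("127.0.0.1", 8000), TokenCountHandler).serve_forever()
```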
Developer
For a detailed look at how we calculate LLM costs, please follow this link: https://github.com/Helicone/helicone/tree/main/costs

If you want to calculate costs across models and providers, you can use our free, open-source tool with 300+ models: the LLM API Pricing Calculator.
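At its core, the cost calculation multiplies each token count from the usage tag by a per-token rate. A sketch of that arithmetic in Python, using hypothetical rates (real per-model rates live in the costs package linked above):

```python
# Sketch: computing request cost from a usage object.
# The rates below are hypothetical examples, not real model pricing.
def calculate_cost(usage: dict, prompt_rate: float, completion_rate: float) -> float:
    """Rates are USD per 1M tokens; usage follows OpenAI's usage tag shape."""
    prompt_cost = usage["prompt_tokens"] / 1_000_000 * prompt_rate
    completion_cost = usage["completion_tokens"] / 1_000_000 * completion_rate
    return prompt_cost + completion_cost

usage = {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}
# Hypothetical rates: $0.50/M prompt tokens, $1.50/M completion tokens.
print(f"${calculate_cost(usage, 0.50, 1.50):.8f}")  # prints $0.00002250
```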
Please note that these methods are based on our current understanding and may
be subject to changes in the future as APIs and token counting methodologies
evolve.
Need more help?
Additional questions or feedback? Reach out to
help@helicone.ai or schedule a
call with us.