# Rate Limits

Understanding API rate limits and how to manage them effectively.

***

## Current Limits

| Resource                      | Limit     |
| ----------------------------- | --------- |
| **Requests per minute (RPM)** | 60        |
| **Tokens per minute (TPM)**   | 2,000,000 |

These limits apply **per API key**. Each key you create has its own independent rate limit.
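
Because both limits apply at once, it can help to work out which one a steady workload reaches first. The sketch below uses only the two numbers from the table; `binding_limit` and `break_even_tokens` are illustrative names, and it assumes `tokens_per_request` is the total number of tokens counted against the limit for an average request, which this page does not define in detail.

```python
RPM_LIMIT = 60          # requests per minute, from the table above
TPM_LIMIT = 2_000_000   # tokens per minute, from the table above


def binding_limit(tokens_per_request: int) -> str:
    """Return which limit a steady workload would reach first.

    `tokens_per_request` must be a positive average token count per request.
    """
    # Requests per minute that the token budget alone would allow.
    requests_allowed_by_tokens = TPM_LIMIT // tokens_per_request
    return "tokens" if requests_allowed_by_tokens < RPM_LIMIT else "requests"


# Average request size at which both limits are reached at the same time.
break_even_tokens = TPM_LIMIT / RPM_LIMIT
```

Under these assumptions, workloads averaging fewer than roughly 33,000 tokens per request are bounded by the request limit, and larger requests are bounded by the token limit.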

***

## Need Higher Limits?

The standard limits cover most development and production workloads while keeping the platform stable for all users.

**For enterprise workloads requiring higher limits:**

📧 **Contact us**: <support@tensorix.ai>

Include in your request:

* Your use case and expected scale
* Current bottlenecks you're experiencing
* Your account email

{% hint style="info" %}
**Enterprise clients** can receive custom rate limits tailored to their workload. We'll work with you to find the right balance for your needs.
{% endhint %}

***

## What Happens When You Hit a Limit

When you exceed rate limits, the API returns a `429 Too Many Requests` error:

```json
{
  "error": {
    "message": "Rate limit exceeded for api_key: xxx...xxx. Limit type: requests. Current limit: 60, Remaining: 0. Limit resets at: 2026-03-21 20:38:07 UTC",
    "type": "None",
    "param": "None",
    "code": "429"
  }
}
```

The error message includes:

* Which limit you hit (requests or tokens)
* Your current limit and remaining count
* When the limit resets
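
If you want to wait until the reset time instead of guessing a delay, the fields listed above can be pulled out of the message text. This is a minimal sketch that assumes the message keeps exactly the wording shown in the example response; the function name `parse_rate_limit_message` is hypothetical, and the parser returns `None` when the text does not match, so callers can fall back to a fixed delay.

```python
import re
from datetime import datetime, timezone

# Pattern derived from the example error message on this page (an assumption:
# the wording may change, so callers should handle a None result).
_RATE_LIMIT_PATTERN = re.compile(
    r"Limit type: (?P<limit_type>\w+)\. "
    r"Current limit: (?P<limit>\d+), Remaining: (?P<remaining>\d+)\. "
    r"Limit resets at: (?P<reset>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC"
)


def parse_rate_limit_message(message):
    """Extract limit type, limit, remaining count, and reset time (UTC)."""
    match = _RATE_LIMIT_PATTERN.search(message)
    if match is None:
        return None
    reset_at = datetime.strptime(match["reset"], "%Y-%m-%d %H:%M:%S")
    return {
        "limit_type": match["limit_type"],
        "limit": int(match["limit"]),
        "remaining": int(match["remaining"]),
        "reset_at": reset_at.replace(tzinfo=timezone.utc),
    }
```

The returned `reset_at` is timezone-aware, so subtracting `datetime.now(timezone.utc)` from it gives the time left until the limit resets.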

***

## Checking Your Rate Limit Status

Every API response includes headers showing your current usage:

| Header                                   | Example Value | Description                    |
| ---------------------------------------- | ------------- | ------------------------------ |
| `x-ratelimit-api_key-limit-requests`     | 60            | Max requests per minute        |
| `x-ratelimit-api_key-remaining-requests` | 45            | Requests remaining this minute |
| `x-ratelimit-api_key-limit-tokens`       | 2000000       | Max tokens per minute          |
| `x-ratelimit-api_key-remaining-tokens`   | 1850000       | Tokens remaining this minute   |

Check these headers to monitor your usage before hitting limits.
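
One way to act on these headers is to compute how much of each budget is left and slow down when it runs low. The sketch below works on any mapping of header names to string values; the helper names and the 10% threshold are illustrative choices, not part of the API. With the OpenAI Python SDK, response headers are typically reachable through the `with_raw_response` accessor, but how you obtain them depends on your client setup.

```python
HEADER_PREFIX = "x-ratelimit-api_key-"


def remaining_fraction(headers, resource):
    """Return remaining/limit for 'requests' or 'tokens', or None if unknown.

    `headers` is any mapping of header names to string values. A plain dict
    is case-sensitive, so the keys must match the names in the table above.
    """
    limit = headers.get(f"{HEADER_PREFIX}limit-{resource}")
    remaining = headers.get(f"{HEADER_PREFIX}remaining-{resource}")
    if limit is None or remaining is None:
        return None
    limit_value = int(limit)
    if limit_value <= 0:
        return None
    return int(remaining) / limit_value


def should_slow_down(headers, threshold=0.1):
    """True when either budget has less than `threshold` of its limit left."""
    fractions = [remaining_fraction(headers, r) for r in ("requests", "tokens")]
    return any(f is not None and f < threshold for f in fractions)
```

A caller could check `should_slow_down(response_headers)` after each response and pause or queue new work when it returns `True`.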

***

## Handling Rate Limits

### Retry with Exponential Backoff

Retry rate-limited requests with exponentially increasing delays, and give up after a fixed number of attempts:

{% tabs %}
{% tab title="Python" %}

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.tensorix.ai/v1"
)

def call_with_retry(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="deepseek/deepseek-chat-v3.1",
                messages=messages
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait_time = 2 ** attempt  # 1, 2, 4, 8 seconds (the final attempt re-raises)
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
```

{% endtab %}

{% tab title="JavaScript" %}

```javascript
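import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://api.tensorix.ai/v1'
});

async function callWithRetry(messages, maxRetries = 5) {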
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({
        model: 'deepseek/deepseek-chat-v3.1',
        messages
      });
    } catch (error) {
      if (error.status !== 429 || attempt === maxRetries - 1) {
        throw error;
      }
      const waitTime = Math.pow(2, attempt) * 1000;
      console.log(`Rate limited. Waiting ${waitTime}ms...`);
      await new Promise(r => setTimeout(r, waitTime));
    }
  }
}
```

{% endtab %}
{% endtabs %}
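
The retry examples above wait a fixed 1, 2, 4, ... seconds. When several workers share one API key and are limited at the same moment, fixed delays make them all retry together. A common refinement is "full jitter" backoff, which picks a random delay up to the exponential ceiling. The sketch below is one possible version; `backoff_delay`, `base`, and `cap` are illustrative names and defaults, not values recommended by the API.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random):
    """Return a full-jitter delay in seconds for a zero-based attempt number.

    The delay is drawn uniformly from [0, min(cap, base * 2**attempt)], so
    retries from different workers spread out instead of arriving together.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

To use it in the Python retry loop, replace `wait_time = 2 ** attempt` with `wait_time = backoff_delay(attempt)`. The `cap` keeps late attempts from waiting much longer than the one-minute limit window.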

### Spread Out Requests

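If you send many requests in a loop, pause between them. At 60 RPM, one request per second keeps you within the request limit:

```python
import time

# `items` and `make_api_call` are placeholders for your own work list
# and request function.
for item in items:
    response = make_api_call(item)
    time.sleep(1)  # One request per second stays within 60 RPM
```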
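
A fixed sleep adds the delay on top of however long each request takes, so throughput ends up lower than the limit allows. A small pacer that tracks when the next request may start avoids that. This is a sketch under stated assumptions: `RequestPacer` is a hypothetical helper, it paces requests only (it does not track tokens), and it is intended for a single thread.

```python
import time


class RequestPacer:
    """Enforce a minimum interval between the starts of consecutive requests."""

    def __init__(self, requests_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # first call never waits

    def wait(self):
        """Block until the next request is allowed to start."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            # Re-read the clock in case the sleep overshot.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval
```

Call `pacer.wait()` immediately before each API call. If a request already took longer than the interval, the next one starts without any extra delay.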

***

## Tips to Stay Under Limits

| Tip                             | How It Helps                        |
| ------------------------------- | ----------------------------------- |
| **Batch similar requests**      | Fewer API calls                     |
| **Cache responses**             | Don't repeat identical queries      |
| **Use streaming**               | One request for long outputs        |
| **Set appropriate max\_tokens** | Avoid generating unnecessary tokens |
| **Queue requests**              | Smooth out traffic spikes           |
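
As an illustration of the caching tip, the sketch below keeps responses in memory, keyed by a hash of the model name and messages, so an identical query is sent only once. `ResponseCache` and its `fetch` callable are hypothetical; the cache has no expiry or size limit, and it is only appropriate when reusing an earlier answer for the same input is acceptable.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache for identical (model, messages) queries."""

    def __init__(self, fetch):
        # `fetch` is any callable taking (model, messages) and returning a
        # response, for example a thin wrapper around your API client.
        self._fetch = fetch
        self._store = {}

    @staticmethod
    def _key(model, messages):
        # Serialize deterministically so equal inputs hash to the same key.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model, messages):
        """Return the cached response, calling `fetch` only on a miss."""
        key = self._key(model, messages)
        if key not in self._store:
            self._store[key] = self._fetch(model, messages)
        return self._store[key]
```

Only cache misses count against the rate limit, since hits never reach the API.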

***

## Monitoring Your Usage

Check your request patterns in your [Usage Dashboard](https://app.tensorix.ai/dashboard/usage):

* See request counts over time
* Identify peak usage periods
* Spot patterns that might cause rate limiting

