# Audio API

Tensorix provides OpenAI-compatible audio endpoints for both Text-to-Speech (TTS) and Speech-to-Text (STT) transcription.

## Available Models

| Model                             | Type | Description                            |
| --------------------------------- | ---- | -------------------------------------- |
| `chatterbox-turbo`                | TTS  | High-quality text-to-speech generation |
| `Systran/faster-whisper-large-v3` | STT  | Fast, accurate speech transcription    |

## Text-to-Speech (TTS)

Convert text into natural-sounding audio using the `chatterbox-turbo` model.

### Endpoint

```
POST https://api.tensorix.ai/v1/audio/speech
```

### Basic Usage

```bash
curl https://api.tensorix.ai/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chatterbox-turbo",
    "input": "Hello! Welcome to Tensorix.",
    "voice": "Emily.wav"
  }' \
  --output speech.mp3
```

### Parameters

| Parameter         | Type   | Required | Description                                                                                                                                         |
| ----------------- | ------ | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`           | string | Yes      | Model ID: `chatterbox-turbo`                                                                                                                        |
| `input`           | string | Yes      | Text to convert to speech. For very long inputs, split into smaller chunks and concatenate the resulting audio.                                     |
| `voice`           | string | Yes      | Voice ID, using the filename of one of the predefined voices below (e.g. `Emily.wav`). OpenAI-style names like `alloy` or `nova` are not supported. |
| `response_format` | string | No       | Audio format: `mp3`, `wav`, `opus` (default: `mp3`)                                                                                                 |
| `speed`           | number | No       | Playback speed multiplier (default: 1.0). Values around 0.5 to 2.0 give the most natural results.                                                   |
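
The optional parameters can be combined with the required ones in a single request body. The helper below is a minimal sketch, not part of any Tensorix SDK: `build_speech_payload` and its client-side checks are hypothetical, and the server may accept or reject values differently than this validation assumes.

```python
# Output formats listed in the parameters table above.
ALLOWED_FORMATS = {"mp3", "wav", "opus"}


def build_speech_payload(text, voice="Emily.wav", response_format="mp3", speed=1.0):
    """Return a JSON-serializable body for POST /v1/audio/speech.

    The checks here are illustrative assumptions about sensible input;
    the API itself is the final authority on what it accepts.
    """
    if not text:
        raise ValueError("input text must not be empty")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if speed <= 0:
        raise ValueError("speed must be a positive multiplier")
    return {
        "model": "chatterbox-turbo",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


payload = build_speech_payload(
    "Hello! Welcome to Tensorix.", response_format="wav", speed=1.25
)
print(payload)
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, in the same way as the request bodies in the examples on this page.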

### Available Voices

`chatterbox-turbo` ships with 28 predefined English voices. Pass the filename (including the `.wav` suffix) as the `voice` parameter:

|               |                |                 |               |
| ------------- | -------------- | --------------- | ------------- |
| `Abigail.wav` | `Adrian.wav`   | `Alexander.wav` | `Alice.wav`   |
| `Austin.wav`  | `Axel.wav`     | `Connor.wav`    | `Cora.wav`    |
| `Elena.wav`   | `Eli.wav`      | `Emily.wav`     | `Everett.wav` |
| `Gabriel.wav` | `Gianna.wav`   | `Henry.wav`     | `Ian.wav`     |
| `Jade.wav`    | `Jeremiah.wav` | `Jordan.wav`    | `Julian.wav`  |
| `Layla.wav`   | `Leonardo.wav` | `Michael.wav`   | `Miles.wav`   |
| `Olivia.wav`  | `Ryan.wav`     | `Taylor.wav`    | `Thomas.wav`  |

The TTS engine is a hosted instance of the open-source [Chatterbox-TTS-Server](https://github.com/devnen/Chatterbox-TTS-Server) project, which is in turn built on [Resemble AI's Chatterbox](https://github.com/resemble-ai/chatterbox) model. See the upstream documentation for details on the voices, voice characteristics, and how voice cloning works.
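
Because the `voice` value must match one of the filenames above, it can help to check it locally before sending a request. The following is a sketch under stated assumptions: `normalize_voice` is a hypothetical helper, and it assumes voice names are matched exactly (including case), which this page does not confirm.

```python
# The 28 predefined voices from the table above, stored with the .wav suffix.
PREDEFINED_VOICES = frozenset(
    name + ".wav"
    for name in (
        "Abigail Adrian Alexander Alice Austin Axel Connor Cora "
        "Elena Eli Emily Everett Gabriel Gianna Henry Ian "
        "Jade Jeremiah Jordan Julian Layla Leonardo Michael Miles "
        "Olivia Ryan Taylor Thomas"
    ).split()
)


def normalize_voice(name):
    """Return the voice filename the API expects, or raise ValueError.

    Accepts either "Emily" or "Emily.wav". Matching is exact and
    case-sensitive, which is an assumption about the server's behavior.
    """
    candidate = name if name.endswith(".wav") else name + ".wav"
    if candidate not in PREDEFINED_VOICES:
        raise ValueError(f"unknown voice: {name!r}")
    return candidate


print(normalize_voice("Emily"))
```

Passing an OpenAI-style name such as `alloy` raises `ValueError` here, which surfaces the mistake before a request is made.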

### Python Example

```python
import requests

response = requests.post(
    "https://api.tensorix.ai/v1/audio/speech",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "model": "chatterbox-turbo",
        "input": "Welcome to Tensorix! This is a test of our text-to-speech API.",
        "voice": "Emily.wav"
    }
)

# Fail fast on HTTP errors so an error body is not saved as audio
response.raise_for_status()

# Save audio file
with open("output.mp3", "wb") as f:
    f.write(response.content)

print(f"Audio saved: {len(response.content)} bytes")
```

### JavaScript/Node.js Example

```javascript
const fs = require('fs');

async function textToSpeech(text) {
  const response = await fetch('https://api.tensorix.ai/v1/audio/speech', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'chatterbox-turbo',
      input: text,
      voice: 'Emily.wav'
    })
  });

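  // Stop on HTTP errors so an error body is not saved as audio
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }

  const buffer = await response.arrayBuffer();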
  fs.writeFileSync('output.mp3', Buffer.from(buffer));
  console.log('Audio saved!');
}

textToSpeech('Hello from Tensorix!');
```

### Pricing

**TTS Cost**: $0.000015 per character

| Text Length       | Cost    |
| ----------------- | ------- |
| 100 characters    | $0.0015 |
| 1,000 characters  | $0.015  |
| 10,000 characters | $0.15   |
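
The table rows follow directly from multiplying the character count by the per-character rate. A small estimator, sketched here with a hypothetical `estimate_tts_cost` function, reproduces them. It assumes every character of `input` is billed, including whitespace and punctuation, and that characters are counted the way Python's `len` counts them; this page does not specify either detail.

```python
from decimal import Decimal

# USD per character, taken from the rate stated above.
TTS_PRICE_PER_CHAR = Decimal("0.000015")


def estimate_tts_cost(text):
    """Estimate the USD cost of synthesizing `text`.

    Assumes billing counts every character as len() does; the actual
    billing rules may differ.
    """
    return TTS_PRICE_PER_CHAR * len(text)


for length in (100, 1_000, 10_000):
    print(length, "characters:", estimate_tts_cost("x" * length))
```

`Decimal` is used instead of `float` so that very small per-character amounts do not pick up binary rounding noise.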

***

## Speech-to-Text (STT)

Transcribe audio files to text using the `Systran/faster-whisper-large-v3` model.

### Endpoint

```
POST https://api.tensorix.ai/v1/audio/transcriptions
```

### Basic Usage

```bash
curl https://api.tensorix.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file="@audio.mp3" \
  -F model="Systran/faster-whisper-large-v3"
```

### Parameters

| Parameter                 | Type   | Required | Description                                                                   |
| ------------------------- | ------ | -------- | ----------------------------------------------------------------------------- |
| `file`                    | file   | Yes      | Audio file to transcribe (mp3, mp4, mpeg, mpga, m4a, wav, webm)               |
| `model`                   | string | Yes      | Model ID: `Systran/faster-whisper-large-v3`                                   |
| `language`                | string | No       | Language code (e.g., `en`, `es`, `fr`). Auto-detected if not specified        |
| `response_format`         | string | No       | Output format: `json`, `text`, `srt`, `vtt`, `verbose_json` (default: `json`) |
| `timestamp_granularities` | array  | No       | `["word"]` or `["segment"]` for timestamps                                    |

### Response

```json
{
  "text": "Hello, this is a test of the Tensorix Audio Gateway.",
  "language": "en",
  "duration": 5.44,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a test"
    },
    {
      "start": 2.5,
      "end": 5.44,
      "text": "of the Tensorix Audio Gateway."
    }
  ]
}
```
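
Once parsed, the `segments` array can be turned into a readable, timestamped transcript. The sketch below uses the sample response above as input; `format_segments` is a hypothetical helper, and it treats `segments` as optional because the field may not be present for every `response_format`.

```python
def format_segments(result):
    """Turn a parsed transcription response into '[start - end] text' lines."""
    lines = []
    # `segments` is treated as optional; a missing key yields no lines.
    for segment in result.get("segments", []):
        start, end = segment["start"], segment["end"]
        lines.append(f"[{start:.2f}s - {end:.2f}s] {segment['text'].strip()}")
    return lines


# Sample data copied from the response shown above.
example = {
    "text": "Hello, this is a test of the Tensorix Audio Gateway.",
    "language": "en",
    "duration": 5.44,
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "Hello, this is a test"},
        {"start": 2.5, "end": 5.44, "text": "of the Tensorix Audio Gateway."},
    ],
}

for line in format_segments(example):
    print(line)
```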

### Python Example

```python
import requests

# Transcribe audio file
with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://api.tensorix.ai/v1/audio/transcriptions",
        headers={
            "Authorization": "Bearer YOUR_API_KEY"
        },
        files={
            "file": audio_file
        },
        data={
            "model": "Systran/faster-whisper-large-v3"
        }
    )

result = response.json()
print(f"Transcription: {result['text']}")
print(f"Language: {result.get('language', 'auto')}")
print(f"Duration: {result.get('duration', 'N/A')} seconds")
```

### JavaScript/Node.js Example

```javascript
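const fs = require('fs');
const path = require('path');

async function transcribeAudio(filePath) {
  // Node 18+ provides global fetch, FormData, and Blob. The built-in fetch
  // does not accept the stream-based `form-data` package as a request body,
  // so the file is wrapped in a Blob instead.
  const form = new FormData();
  form.append('file', new Blob([fs.readFileSync(filePath)]), path.basename(filePath));
  form.append('model', 'Systran/faster-whisper-large-v3');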

  const response = await fetch('https://api.tensorix.ai/v1/audio/transcriptions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: form
  });

  const result = await response.json();
  console.log('Transcription:', result.text);
  return result;
}

transcribeAudio('audio.mp3');
```

### Get Timestamps

Request word-level or segment-level timestamps by setting `response_format` to `verbose_json` and passing `timestamp_granularities[]`:

```bash
curl https://api.tensorix.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file="@audio.mp3" \
  -F model="Systran/faster-whisper-large-v3" \
  -F response_format="verbose_json" \
  -F 'timestamp_granularities[]=segment'
```
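
The same request can be assembled in Python by building the multipart form fields as a list of pairs, which allows the `timestamp_granularities[]` field to repeat. This is a sketch rather than an official client: `build_transcription_fields` is a hypothetical helper, and the field names simply mirror the curl command above.

```python
def build_transcription_fields(
    response_format="verbose_json", granularities=("segment",), language=None
):
    """Return (name, value) form fields for POST /v1/audio/transcriptions."""
    fields = [
        ("model", "Systran/faster-whisper-large-v3"),
        ("response_format", response_format),
    ]
    if language is not None:
        fields.append(("language", language))
    # One repeated `timestamp_granularities[]` field per requested value,
    # matching the form field used in the curl example.
    for granularity in granularities:
        if granularity not in ("word", "segment"):
            raise ValueError(f"unknown granularity: {granularity!r}")
        fields.append(("timestamp_granularities[]", granularity))
    return fields


print(build_transcription_fields(granularities=("word", "segment")))
```

With the `requests` library, the returned list can be passed as `data=` alongside `files={"file": audio_file}`, since `requests` accepts a list of tuples for form fields.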

### Pricing

**STT Cost**: $0.0001 per second of audio

| Audio Duration | Cost   |
| -------------- | ------ |
| 1 minute       | $0.006 |
| 10 minutes     | $0.06  |
| 1 hour         | $0.36  |
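
The table values are the per-second rate multiplied by the audio length in seconds. The sketch below reproduces them with a hypothetical `estimate_stt_cost` function. It assumes billing uses the exact duration with no rounding up to whole seconds or minutes, which this page does not state.

```python
from decimal import Decimal

# USD per second of audio, taken from the rate stated above.
STT_PRICE_PER_SECOND = Decimal("0.0001")


def estimate_stt_cost(duration_seconds):
    """Estimate the USD cost of transcribing audio of the given length.

    Assumes the exact duration is billed; actual rounding rules may differ.
    """
    # Convert via str() so float inputs like 5.44 keep their decimal value.
    return STT_PRICE_PER_SECOND * Decimal(str(duration_seconds))


for label, seconds in (("1 minute", 60), ("10 minutes", 600), ("1 hour", 3600)):
    print(label, estimate_stt_cost(seconds))
```

The `duration` field of a transcription response, when present, can be passed straight into this function to estimate the cost of a completed request.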

***

## Supported Audio Formats

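The STT endpoint accepts the following input formats. TTS output is limited to the `response_format` values listed in the TTS parameters (`mp3`, `wav`, `opus`).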

| Format | Extension        | Notes                          |
| ------ | ---------------- | ------------------------------ |
| MP3    | `.mp3`           | Most common, good compression  |
| WAV    | `.wav`           | Uncompressed, highest quality  |
| M4A    | `.m4a`           | Apple audio format             |
| WEBM   | `.webm`          | Web-optimized                  |
| MPEG   | `.mpeg`, `.mpga` | Legacy format                  |
| MP4    | `.mp4`           | Video format (audio extracted) |
| OGG    | `.ogg`           | Open source format             |

### File Size Limits

* Maximum file size: **25 MB**
* For larger files, split them into smaller segments
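
A local size check before uploading avoids a rejected request. The sketch below is illustrative only: `fits_upload_limit` and `segments_needed` are hypothetical helpers, and the limit is read as 25,000,000 bytes, the more conservative interpretation, because this page does not say whether "25 MB" means megabytes or mebibytes.

```python
import math
import os

# Assumption: "25 MB" is treated as 25,000,000 bytes (not 25 MiB).
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def fits_upload_limit(path):
    """Return True if the file at `path` is within the assumed size limit."""
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES


def segments_needed(size_bytes):
    """Return the minimum number of segments so each stays within the limit."""
    return max(1, math.ceil(size_bytes / MAX_UPLOAD_BYTES))


print(segments_needed(60_000_000))
```

The segment count is only a lower bound for planning. Compressed audio generally cannot be cut at arbitrary byte offsets, so the actual split should be done by time with an audio tool.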

***

## Use Cases

### Voice Assistants

```python
# Complete voice assistant flow
import requests

def voice_assistant(audio_input_path):
    # 1. Transcribe user's speech
    with open(audio_input_path, "rb") as f:
        transcription = requests.post(
            "https://api.tensorix.ai/v1/audio/transcriptions",
            headers={"Authorization": "Bearer YOUR_API_KEY"},
            files={"file": f},
            data={"model": "Systran/faster-whisper-large-v3"}
        ).json()
    
    user_text = transcription["text"]
    print(f"User said: {user_text}")
    
    # 2. Get AI response (using chat completions)
    # Use any chat model from /v1/models. `openrouter/openai/gpt-oss-20b`
    # is a fast, low-cost default; swap in another model if you prefer.
    ai_response = requests.post(
        "https://api.tensorix.ai/v1/chat/completions",
        headers={
            "Authorization": "Bearer YOUR_API_KEY",
            "Content-Type": "application/json"
        },
        json={
            "model": "openrouter/openai/gpt-oss-20b",
            "messages": [{"role": "user", "content": user_text}]
        }
    ).json()
    
    assistant_text = ai_response["choices"][0]["message"]["content"]
    print(f"Assistant: {assistant_text}")
    
    # 3. Convert response to speech
    audio = requests.post(
        "https://api.tensorix.ai/v1/audio/speech",
        headers={
            "Authorization": "Bearer YOUR_API_KEY",
            "Content-Type": "application/json"
        },
        json={
            "model": "chatterbox-turbo",
            "input": assistant_text,
            "voice": "Emily.wav"
        }
    )
    
    with open("response.mp3", "wb") as f:
        f.write(audio.content)
    
    return "response.mp3"
```

### Podcast Transcription

```python
import requests


def transcribe_podcast(file_path):
    with open(file_path, "rb") as f:
        response = requests.post(
            "https://api.tensorix.ai/v1/audio/transcriptions",
            headers={"Authorization": "Bearer YOUR_API_KEY"},
            files={"file": f},
            data={
                "model": "Systran/faster-whisper-large-v3",
                "response_format": "srt"  # Get SRT subtitles
            }
        )
    
    # Save as subtitle file (UTF-8 so non-ASCII transcripts are written safely)
    with open("podcast.srt", "w", encoding="utf-8") as f:
        f.write(response.text)
    
    return "podcast.srt"
```

### Content Narration

```python
import requests


def narrate_article(article_text):
    # Split long articles into chunks. There is no hard cap on `input` length,
    # but shorter requests let you start playback while later chunks are still
    # being generated. This simple version cuts at fixed character offsets, so
    # a chunk may end mid-word; splitting on sentence boundaries gives more
    # natural pacing.
    max_chars = 4000
    chunks = [article_text[i:i+max_chars] for i in range(0, len(article_text), max_chars)]
    
    audio_parts = []
    for i, chunk in enumerate(chunks):
        response = requests.post(
            "https://api.tensorix.ai/v1/audio/speech",
            headers={
                "Authorization": "Bearer YOUR_API_KEY",
                "Content-Type": "application/json"
            },
            json={
                "model": "chatterbox-turbo",
                "input": chunk,
                "voice": "Emily.wav"
            }
        )
        
        filename = f"part_{i}.mp3"
        with open(filename, "wb") as f:
            f.write(response.content)
        audio_parts.append(filename)
    
    return audio_parts
```
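
Chunking on sentence boundaries, as the comment in the example mentions, can be sketched with a regular expression. `chunk_by_sentence` is a hypothetical helper, and its sentence detection is deliberately naive: it splits after `.`, `!`, or `?` followed by whitespace, so abbreviations such as "Dr." will be treated as sentence ends.

```python
import re


def chunk_by_sentence(text, max_chars=4000):
    """Group whole sentences into chunks of at most `max_chars` characters."""
    # Naive split: break after ., !, or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_by_sentence("One. Two! Three?", max_chars=10))
```

The returned list can replace the fixed-offset `chunks` list in `narrate_article` without changing the rest of the loop.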

***

## Error Handling

### Common Errors

| Error Code | Description          | Solution                              |
| ---------- | -------------------- | ------------------------------------- |
| 400        | Invalid audio format | Use supported format (mp3, wav, etc.) |
| 400        | File too large       | Split into chunks under 25 MB         |
| 401        | Invalid API key      | Check your API key                    |
| 413        | Payload too large    | Reduce file size                      |
| 429        | Rate limit exceeded  | Reduce request frequency              |

### Python Error Handling

```python
import requests

def safe_transcribe(file_path):
    try:
        with open(file_path, "rb") as f:
            response = requests.post(
                "https://api.tensorix.ai/v1/audio/transcriptions",
                headers={"Authorization": "Bearer YOUR_API_KEY"},
                files={"file": f},
                data={"model": "Systran/faster-whisper-large-v3"}
            )
        
        response.raise_for_status()
        return response.json()
    
    except requests.exceptions.HTTPError as e:
        print(f"HTTP Error: {e.response.status_code}")
        print(f"Details: {e.response.text}")
        return None
    except FileNotFoundError:
        print(f"File not found: {file_path}")
        return None
```
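
For `429` responses, retrying with exponential backoff is a common complement to the handling above. The sketch below is an assumption-laden example, not documented Tensorix behavior: `post_with_retry` is a hypothetical helper, the delay schedule is arbitrary, and it does not read a `Retry-After` header because this page does not say whether one is sent.

```python
import time

# Status codes worth retrying; 429 is the rate-limit code from the table above.
RETRYABLE_STATUS = {429}


def post_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until the response is not retryable or attempts run out.

    `send` is any zero-argument callable that returns an object with a
    `status_code` attribute, such as a `requests.Response`.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE_STATUS:
            return response
        # Wait base_delay, then 2x, 4x, ... but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

The `send` callable should open the audio file itself on each call, so that a retried upload starts from the beginning of the file rather than reusing an exhausted file handle.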

***

## Resources

* [OpenAI Audio API Reference](https://platform.openai.com/docs/api-reference/audio)
* [Tensorix API Overview](/api-reference/overview.md)
* [Tensorix Models](/api-reference/models.md)

