Speech-to-Text

Listen to what the user says with session.transcription. Get real-time speech-to-text from the glasses microphone, with support for multiple languages and speaker diarization.

Basic Usage

session.transcription.on((data) => {
  if (data.isFinal) {
    session.display.showTextWall(data.text);
  }
});

Every call to .on() returns a cleanup function. Call it when you want to stop receiving transcriptions:

const cleanup = session.transcription.on((data) => {
  session.display.showTextWall(data.text);
});

// Later, when you're done:
cleanup();
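
The subscribe-and-cleanup contract can be illustrated with a small in-memory stand-in. This is a sketch of the pattern only, not the SDK's implementation: `FakeTranscription` and its `emit` method are hypothetical names, used here to show that calling the returned function detaches just that handler.

```typescript
// Hypothetical stand-in for session.transcription, for illustration only.
type Handler<T> = (data: T) => void;

class FakeTranscription<T> {
  private handlers: Handler<T>[] = [];

  // Mirrors the documented contract: register a handler, get a cleanup function back.
  on(handler: Handler<T>): () => void {
    this.handlers.push(handler);
    return () => {
      this.handlers = this.handlers.filter((h) => h !== handler);
    };
  }

  // Helper for this sketch (not a real API): deliver an event to every handler.
  emit(data: T): void {
    this.handlers.slice().forEach((handler) => handler(data));
  }
}

const transcription = new FakeTranscription<{ text: string }>();
const received: string[] = [];

const cleanup = transcription.on((data) => received.push(data.text));
transcription.emit({ text: "hello" });   // delivered
cleanup();
transcription.emit({ text: "ignored" }); // not delivered: the handler was removed
```

Storing the cleanup function next to the subscription, rather than tracking handler references yourself, keeps teardown code short and hard to get wrong.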

How It Works

1. The user speaks into the glasses microphone.
2. Audio streams to MentraOS Cloud.
3. The cloud runs speech recognition (via Soniox).
4. Transcription events are sent to your app.
5. You receive interim results (while the user is speaking) and final results (when a sentence completes).

Transcription Data

Each event gives you a TranscriptionData object:

session.transcription.on((data) => {
  data.text;          // The transcribed text
  data.isFinal;       // true when the sentence is complete
  data.language;      // Language code (e.g. "en")
  data.speakerId;     // Speaker ID when diarization is enabled
  data.utteranceId;   // Unique ID for this utterance
});
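
For reference while writing handlers, the fields above could be typed roughly as follows. The field types here are assumptions inferred from the comments (for example, that `speakerId` is an optional string); the SDK's own `TranscriptionData` type is authoritative. `TranscriptionDataSketch` and `summarize` are hypothetical names.

```typescript
// Assumed shape, inferred from the fields listed above; the real type may differ.
interface TranscriptionDataSketch {
  text: string;
  isFinal: boolean;
  language: string;    // e.g. "en"
  speakerId?: string;  // assumed to be set only when diarization is enabled
  utteranceId: string;
}

// Hypothetical helper: a one-line summary of an event, handy for logging.
function summarize(data: TranscriptionDataSketch): string {
  const kind = data.isFinal ? "final" : "interim";
  const speaker = data.speakerId ?? "unknown";
  return `[${data.language}] ${kind} (${speaker}, ${data.utteranceId}): ${data.text}`;
}

const example: TranscriptionDataSketch = {
  text: "turn on the lights",
  isFinal: true,
  language: "en",
  utteranceId: "utt-1", // made-up ID for illustration
};
console.log(summarize(example));
```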

Interim vs Final Results

While the user is speaking, you receive interim results that update in real time. When the user finishes a sentence, you get a final result.

session.transcription.on((data) => {
  if (data.isFinal) {
    // Sentence complete - safe to process, save, or act on
    session.display.showTextWall(data.text);
  } else {
    // Still speaking - text may change as more audio arrives
    session.display.showTextWall(`${data.text}...`);
  }
});

A common pattern for live captions is to show both interim and final results:

session.transcription.on((data) => {
  session.display.showTextWall(data.text);
});
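
If captions should keep earlier sentences on screen instead of replacing them, one option is to buffer final sentences and append the current interim text. The following is a minimal sketch under that assumption; `CaptionBuffer` and its `maxLines` parameter are hypothetical and not part of the SDK.

```typescript
// Hypothetical helper: keeps the last few final sentences plus the current interim text.
interface CaptionEvent {
  text: string;
  isFinal: boolean;
}

class CaptionBuffer {
  private finals: string[] = [];
  private interim = "";

  constructor(private readonly maxLines: number = 3) {}

  // Feed every transcription event in; returns the text to display.
  update(event: CaptionEvent): string {
    if (event.isFinal) {
      this.finals.push(event.text);
      // Drop the oldest sentences once the buffer is full.
      this.finals = this.finals.slice(-this.maxLines);
      this.interim = "";
    } else {
      // Interim text replaces the previous interim; it is never appended.
      this.interim = event.text;
    }
    const lines = this.interim ? [...this.finals, `${this.interim}...`] : this.finals;
    return lines.join("\n");
  }
}

const captions = new CaptionBuffer(2);
captions.update({ text: "hello", isFinal: false });
captions.update({ text: "hello world", isFinal: true });
const shown = captions.update({ text: "how are", isFinal: false });
console.log(shown);
```

Inside a session, the returned string could be passed straight to the display, for example `session.display.showTextWall(captions.update(data))`.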

Language-Specific Subscriptions

Subscribe to transcriptions in a specific language:

const cleanup = session.transcription.forLanguage("en", (data) => {
  // Only English transcriptions
  session.display.showTextWall(data.text);
});

You can subscribe to multiple languages simultaneously. Each call is independent and returns its own cleanup function:

const cleanupEn = session.transcription.forLanguage("en", (data) => {
  session.display.showText(`EN: ${data.text}`);
});

const cleanupEs = session.transcription.forLanguage("es", (data) => {
  session.display.showText(`ES: ${data.text}`);
});
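
When several language subscriptions write to the same display, a later update may replace an earlier one. One way to show them together is to remember the latest line per language and render a combined view. This is a sketch, assuming you want one line per language; `LatestByLanguage` is a hypothetical helper.

```typescript
// Hypothetical helper: remembers the most recent text per language code.
class LatestByLanguage {
  private latest: Record<string, string> = {};

  // `order` fixes the display order of languages.
  constructor(private readonly order: string[]) {}

  // Store the newest text for a language and return the combined view.
  set(language: string, text: string): string {
    this.latest[language] = text;
    return this.render();
  }

  // One line per language that has produced text, in the configured order.
  render(): string {
    return this.order
      .filter((lang) => Object.prototype.hasOwnProperty.call(this.latest, lang))
      .map((lang) => `${lang.toUpperCase()}: ${this.latest[lang]}`)
      .join("\n");
  }
}

const lines = new LatestByLanguage(["en", "es"]);
lines.set("es", "hola");
const combined = lines.set("en", "hello");
console.log(combined);
```

Each `forLanguage` callback would then call something like `session.display.showTextWall(lines.set("en", data.text))` with its own language code.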

Configuration

Configure transcription behavior before or after subscribing:

session.transcription.configure({
  languageHints: ["en", "es"],   // Help the recognizer with expected languages
  diarization: true,              // Enable speaker identification
});

Language Hints

Language hints tell the speech recognizer which languages to expect. This improves accuracy, especially in multilingual environments.

// Expect English and Spanish
session.transcription.configure({
  languageHints: ["en", "es"],
});

If no hints are set, the recognizer uses automatic language detection.
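
If the hints come from user settings, it can help to clean them up before calling `configure`. The sketch below removes blanks and duplicates, and leaves `languageHints` unset when the list is empty so that automatic detection applies. `buildConfig` and `TranscriptionConfigSketch` are hypothetical; the option shape is assumed from the two options shown above.

```typescript
// Assumed option shape, based only on the options shown in this section.
interface TranscriptionConfigSketch {
  languageHints?: string[];
  diarization?: boolean;
}

// Hypothetical helper: turn user-selected languages into a configure() argument.
function buildConfig(preferred: string[], diarization = false): TranscriptionConfigSketch {
  const hints: string[] = [];
  for (const raw of preferred) {
    const code = raw.trim();
    // Skip empty entries and duplicates, keeping first-seen order.
    if (code && hints.indexOf(code) === -1) hints.push(code);
  }
  const config: TranscriptionConfigSketch = { diarization };
  // Leave languageHints unset when empty so automatic detection applies.
  if (hints.length > 0) config.languageHints = hints;
  return config;
}

const withHints = buildConfig(["en", " es ", "en"]);
const autoDetect = buildConfig([]);
console.log(withHints, autoDetect);
```

The result can then be passed along as `session.transcription.configure(buildConfig(userLanguages))`.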

Speaker Diarization

When diarization is enabled, each transcription event includes a speakerId that identifies which person is speaking. This is useful for meeting transcription, group conversations, and any scenario with multiple speakers.

session.transcription.configure({ diarization: true });

session.transcription.on((data) => {
  if (data.isFinal) {
    const speaker = data.speakerId || "unknown";
    session.display.showTextWall(`[${speaker}]: ${data.text}`);
  }
});
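
Raw speaker IDs are not necessarily readable, so an app may want to map them to friendlier labels. The sketch below numbers speakers in order of first appearance. It assumes `speakerId` is a string that stays stable for a given speaker within a session; `SpeakerLabels` is a hypothetical helper and the IDs in the example are made up.

```typescript
// Hypothetical helper: assigns "Speaker 1", "Speaker 2", ... in order of first appearance.
class SpeakerLabels {
  private labels: Record<string, string> = {};
  private count = 0;

  labelFor(speakerId: string | undefined): string {
    // No ID (for example, diarization disabled): fall back to a generic label.
    if (!speakerId) return "Unknown";
    const known = Object.prototype.hasOwnProperty.call(this.labels, speakerId)
      ? this.labels[speakerId]
      : undefined;
    if (known !== undefined) return known;
    this.count += 1;
    const label = `Speaker ${this.count}`;
    this.labels[speakerId] = label;
    return label;
  }
}

const labels = new SpeakerLabels();
const first = labels.labelFor("a7");
const second = labels.labelFor("b2");
const firstAgain = labels.labelFor("a7");
console.log(first, second, firstAgain);
```

In the handler above, `labels.labelFor(data.speakerId)` could replace the raw ID in the displayed prefix.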

Stopping Transcription

Call the cleanup function returned by .on() or .forLanguage():

const cleanup = session.transcription.on((data) => {
  // Handle each transcription event here
});

// Stop receiving transcriptions
cleanup();

Or stop all transcription streams:

session.transcription.stop();
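
When an app holds several subscriptions, collecting their cleanup functions in one place makes teardown a single call. This is a minimal sketch of that pattern; `CleanupBag` is a hypothetical helper. Unlike `session.transcription.stop()`, it only detaches the handlers registered through it.

```typescript
type Cleanup = () => void;

// Hypothetical helper: collects cleanup functions and runs them all together.
class CleanupBag {
  private cleanups: Cleanup[] = [];

  add(cleanup: Cleanup): void {
    this.cleanups.push(cleanup);
  }

  // Runs every registered cleanup, then empties the bag so a second call is a no-op.
  dispose(): void {
    const pending = this.cleanups;
    this.cleanups = [];
    for (const cleanup of pending) cleanup();
  }
}

const bag = new CleanupBag();
const calls: string[] = [];
bag.add(() => calls.push("en")); // stands in for a cleanup returned by forLanguage("en", ...)
bag.add(() => calls.push("es")); // stands in for a cleanup returned by forLanguage("es", ...)
bag.dispose();
bag.dispose(); // safe: nothing left to run
console.log(calls);
```

In a real session, each subscription would be registered as it is created, for example `bag.add(session.transcription.forLanguage("en", handler))`.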

Permissions

Your app needs the microphone permission to receive transcriptions. Add it in the Developer Console when creating or editing your app. Without the microphone permission, handlers registered with session.transcription.on() or session.transcription.forLanguage() will not receive any data.

Complete Example

A simple live captions app:

import { MiniAppServer, type MentraSession } from "@mentra/sdk";

const app = new MiniAppServer({
  packageName: "com.example.captions",
  apiKey: process.env.API_KEY!,
  port: 3000,
});

app.onSession((session: MentraSession) => {
  session.transcription.configure({
    languageHints: ["en"],
    diarization: true,
  });

  session.transcription.on((data) => {
    session.display.showTextWall(data.text);
  });

  session.display.showTextWall("Captions ready - start speaking");
});

await app.start();

Migrating from v2

// v2
session.events.onTranscription((data) => { /* handle data */ });

// v3
session.transcription.on((data) => { /* handle data */ });

The TranscriptionData object shape is the same. Only the access pattern changed. See the Migration Guide for the full list of changes.
