Read selected macOS text aloud from Raycast using MiniMax Speech 2.8.
MiniMax is a strong fit for a modern Raycast reading workflow:
The goal is not just to "play audio", but to make 1,000-5,000 character selections pleasant to listen to: quick start, voice selection, speed control, chunk-level resume, stop/restart, menu-bar playback status, and custom or cloned voices.
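Chunk-level resume implies that a selection is split into chunks before synthesis. The following is a minimal sketch of one way to do that, not the extension's actual code: `splitIntoChunks` is a hypothetical helper, and the 600-character cap and the sentence-boundary rule are assumptions chosen for illustration.

```typescript
// Hypothetical helper: split text into chunks of at most maxChars,
// preferring to break at sentence terminators (Latin and CJK).
function splitIntoChunks(text: string, maxChars = 600): string[] {
  // Each match is a run of non-terminator characters plus any trailing terminators.
  const sentences = text.match(/[^。！？.!?\n]+[。！？.!?\n]*/g) ?? [];
  const chunks: string[] = [];
  let current = "";

  const pushIfNotEmpty = (piece: string) => {
    const trimmed = piece.trim();
    if (trimmed.length > 0) chunks.push(trimmed);
  };

  for (const sentence of sentences) {
    // Close the current chunk if this sentence would overflow it.
    if (current.length + sentence.length > maxChars) {
      pushIfNotEmpty(current);
      current = "";
    }
    if (sentence.length > maxChars) {
      // A single overlong sentence is cut at fixed offsets.
      for (let i = 0; i < sentence.length; i += maxChars) {
        pushIfNotEmpty(sentence.slice(i, i + maxChars));
      }
      continue;
    }
    current += sentence;
  }
  pushIfNotEmpty(current);
  return chunks;
}
```

With chunks in an array, resuming from a paused position amounts to continuing from a stored chunk index, for example `chunks.slice(resumeIndex)`.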
MiniMax documents two valid ways for Token Plan users to access speech generation:
- `mmx-cli` for a ready-made terminal and agent workflow

This extension intentionally uses the direct HTTP API instead of requiring `mmx-cli`:
- No `mmx` dependency for Raycast users.

So the design choice here is not "Token Plan versus API". It is "CLI wrapper versus direct API integration". For a Raycast extension, direct API integration is the better fit.
The CLI is still useful for local setup and smoke tests:

    mmx auth status
    mmx speech synthesize --text "测试。" --voice "Chinese (Mandarin)_Radio_Host" --out test.mp3
    mmx speech voices --language chinese --output json
MiniMax currently exposes two account-side key types for this workflow:
For TTS, these are not two different wire protocols. Both use the same MiniMax HTTP API and the same Authorization: Bearer <API Key> scheme. The important difference is which key type you create in the MiniMax console and how usage is billed or quota-limited.
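Because both key types share one wire format, a single request builder can serve either. The sketch below is illustrative only: `buildTtsRequest` is a hypothetical helper, and the request body shape (`model`, `text`, `voice_setting.voice_id`) is an assumption that should be checked against MiniMax's current API reference. The hosts, the `/v1/t2a_v2` path, and the `Bearer` scheme are the ones named in this README.

```typescript
type Region = "china" | "global";

// The two API hosts named in this README.
const API_HOSTS: Record<Region, string> = {
  china: "https://api.minimaxi.com",
  global: "https://api.minimax.io",
};

interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper, not the extension's actual code.
// The same function works for a Token Plan key or an Open Platform API key.
function buildTtsRequest(
  apiKey: string,
  region: Region,
  model: string,
  voiceId: string,
  text: string,
): TtsRequest {
  return {
    url: `${API_HOSTS[region]}/v1/t2a_v2`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // Assumed minimal body; real requests may carry more settings.
    body: JSON.stringify({ model, text, voice_setting: { voice_id: voiceId } }),
  };
}
```

The returned object can be passed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.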
This extension now supports both explicitly in Raycast preferences:
- Auto Detect, Token Plan Key, or Open Platform API Key

Auto Detect prefers the Token Plan key for HD speech models and automatically uses the Open Platform API key when you choose a Turbo speech model.
MiniMax's current Token Plan docs say Token Plan supports TTS HD models, specifically speech-2.8-hd, speech-2.6-hd, and speech-02-hd. Turbo speech models should be used with the Open Platform API key.
- `afplay` process; surfaces a "Resume Last Reading" action when nothing is playing but a paused session exists.
- `Synth N/M` / `Play N/M` or paused position with Stop / Resume / Restart / Speed Up / Slow Down / Read / Pick Voice controls.
- `api.minimaxi.com`) and Global endpoint (`api.minimax.io`).
- `api.minimaxi.com` endpoint.

MiniMax notes that Token Plan keys are separate from pay-as-you-go keys and cannot be used interchangeably. Use the key type that matches your account and choose the matching region in Raycast.
Open the extension preferences in Raycast and set:
| Setting | Description |
|---|---|
| Authentication Mode | Auto detect, Token Plan, or Open Platform API key |
| Token Plan Key | Key created from Create Token Plan Key; Token Plan currently supports HD speech models only |
| Open Platform API Key | Key created from Create new secret key |
| Region | China or Global API endpoint |
| Model | HD models work with Token Plan; Turbo models require Open Platform API Key |
| Default Voice | Built-in quick-read voice |
| Default Custom Voice ID | Optional cloned/generated voice ID; overrides Default Voice and is tagged Default in pickers |
| Extra Custom Voice IDs | Comma-separated cloned/generated voice IDs to surface in voice pickers (tagged Unverified until MiniMax acknowledges them) |
| Language Boost | auto, Chinese, English, etc. |
| Speech Rate | 0.5× to 2.0× |
The voice picked from "Read with Voice Selection" is stored as a local Quick Read override and takes precedence over the static Default Voice preference.
If you keep both key types configured, Auto Detect uses the right key for the selected speech model. If you force Token Plan Key mode while a Turbo model is selected, the extension stops and shows a configuration error instead of sending an invalid request.
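The key-selection rules described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the extension's source: `resolveApiKey`, the `AuthMode` values, and the fallback to the Open Platform key when no Token Plan key is configured are all hypothetical.

```typescript
type AuthMode = "auto" | "token-plan" | "open-platform";

interface ConfiguredKeys {
  tokenPlanKey?: string;
  openPlatformKey?: string;
}

// HD models that this README lists as supported by Token Plan.
const TOKEN_PLAN_MODELS = ["speech-2.8-hd", "speech-2.6-hd", "speech-02-hd"];

// Hypothetical helper: pick the key to send, or throw a configuration error
// before any request is made.
function resolveApiKey(mode: AuthMode, model: string, keys: ConfiguredKeys): string {
  const isHd = TOKEN_PLAN_MODELS.indexOf(model) !== -1;
  let key: string | undefined;

  if (mode === "token-plan") {
    if (!isHd) {
      throw new Error(`Model ${model} requires an Open Platform API key, not a Token Plan key.`);
    }
    key = keys.tokenPlanKey;
  } else if (mode === "open-platform") {
    key = keys.openPlatformKey;
  } else {
    // Auto Detect: prefer the Token Plan key for HD models.
    // Falling back to the Open Platform key is an assumption of this sketch.
    key = isHd ? (keys.tokenPlanKey || keys.openPlatformKey) : keys.openPlatformKey;
  }

  if (!key) {
    throw new Error(`No usable key configured for model ${model} in mode ${mode}.`);
  }
  return key;
}
```

Failing early with a thrown error mirrors the behaviour described above: a forced Token Plan mode combined with a Turbo model never reaches the network.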
If a cloned/generated voice does not appear in MiniMax's voice lookup response, add its voice_id to Default Custom Voice ID or Extra Custom Voice IDs in preferences. The extension sends that ID directly as voice_setting.voice_id. Manually added IDs surface at the top of the voice pickers under a Custom section and carry an Unverified tag until MiniMax's voice lookup confirms them.
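As a rough illustration of how manually added IDs might be turned into the Custom section, the sketch below parses the two preference values and tags each entry. `buildCustomVoiceSection` and `VoiceEntry` are hypothetical names, and the de-duplication and ordering rules are assumptions.

```typescript
interface VoiceEntry {
  voiceId: string;
  tags: string[];
}

// Hypothetical helper: merge "Default Custom Voice ID" and the comma-separated
// "Extra Custom Voice IDs" preference into one ordered, de-duplicated list.
// `acknowledgedIds` stands for the IDs returned by MiniMax's voice lookup.
function buildCustomVoiceSection(
  defaultCustomId: string,
  extraIdsCsv: string,
  acknowledgedIds: string[],
): VoiceEntry[] {
  const defaultId = defaultCustomId.trim();
  const ids = [defaultId]
    .concat(extraIdsCsv.split(","))
    .map((id) => id.trim())
    .filter((id) => id.length > 0)
    // Keep only the first occurrence of each ID.
    .filter((id, index, all) => all.indexOf(id) === index);

  return ids.map((id) => {
    const tags: string[] = [];
    if (id === defaultId) tags.push("Default");
    if (acknowledgedIds.indexOf(id) === -1) tags.push("Unverified");
    return { voiceId: id, tags };
  });
}
```

Each `voiceId` would then be sent unchanged as `voice_setting.voice_id`, whether or not the lookup has confirmed it.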
- `mp3`, `m4a`, or `wav` format.
- `voice_id` and a preview text sample.
- `voice_id`, or set it as the Quick Read voice.

MiniMax's current docs say the source clone audio should be 10 seconds to 5 minutes and under 20 MB. Optional prompt audio should be under 8 seconds and under 20 MB. If the extension is effectively using Token Plan, the clone form only shows HD-compatible preview models.
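The stated clone limits lend themselves to a local check before upload. The sketch below is a hypothetical validator, not the extension's code: it assumes "20 MB" means 20 × 1024 × 1024 bytes, treats the 10-second and 5-minute bounds as inclusive, and applies the `mp3`/`m4a`/`wav` format check to the source audio only.

```typescript
interface AudioInfo {
  fileName: string;
  durationSeconds: number;
  sizeBytes: number;
}

const MAX_BYTES = 20 * 1024 * 1024; // assumed meaning of "20 MB"
const SOURCE_FORMATS = ["mp3", "m4a", "wav"];

// Hypothetical helper: return a list of human-readable problems (empty if none).
function validateCloneAudio(audio: AudioInfo, kind: "source" | "prompt"): string[] {
  const problems: string[] = [];

  if (audio.sizeBytes >= MAX_BYTES) {
    problems.push("File must be under 20 MB.");
  }

  if (kind === "source") {
    const extension = (audio.fileName.split(".").pop() || "").toLowerCase();
    if (SOURCE_FORMATS.indexOf(extension) === -1) {
      problems.push("Source audio must be mp3, m4a, or wav.");
    }
    if (audio.durationSeconds < 10 || audio.durationSeconds > 300) {
      problems.push("Source audio must be 10 seconds to 5 minutes long.");
    }
  } else if (audio.durationSeconds >= 8) {
    problems.push("Prompt audio must be under 8 seconds.");
  }

  return problems;
}
```

Returning a list rather than throwing lets a form show every problem at once.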
This is designed for short papers, article excerpts, documentation pages, and other medium-length selections rather than full audiobook production.
Recommended Mandarin voices for paper listening:
- `Chinese (Mandarin)_Radio_Host`: relaxed long-form host tone, now the built-in default for new installs.
- `Chinese (Mandarin)_Sincere_Adult`: sincere peer-style explanation.
- `Chinese (Mandarin)_Gentleman`: warmer, more scholarly mentor tone.
- `hunyin_6`: bright, brisk male voice for a more energetic paper walkthrough.
- `male-qn-jingying`: clear younger professional voice.
- `Chinese (Mandarin)_Wise_Women`: knowledgeable female voice for a senior-guide tone.
- `Chinese (Mandarin)_Gentle_Senior`: warm storytelling female voice for soft lecture-style listening.
- `Chinese (Mandarin)_Warm_Bestie`: soft, clear, comforting female voice for relaxed listening.
- `Chinese_sweet_girl_vv1`: bright, expressive young female voice for lighter notes and short passages.

Recommended English voices for paper listening:
- `English_CalmWoman`: warm, clear, guided voice for audiobooks, documentaries, and education.
- `English_captivating_female1`: bright, clear, energetic voice for explainers and knowledge sharing.
- `English_AttractiveGirl`: natural, conversational voice for lighter articles and notes.
- `English_nursery_teacher_vv2`: sweet, clear, encouraging voice for language learning and short teaching passages.

After setup:
- Language Boost on `auto` for mixed Chinese/English text, or set it manually when a document is mostly one language.

    npm install
    npm run dev
    npm run build
    npm run lint
- `POST /v1/t2a_v2`
- `POST /v1/get_voice`
- `POST /v1/files/upload`
- `POST /v1/voice_clone`
- `Authorization: Bearer <API Key>` for both Token Plan keys and Open Platform API keys
- `speech-2.8-hd`, `speech-2.6-hd`, `speech-02-hd`)
- `afplay`
- `$TMPDIR/minimax-tts.pid`