v2.3.2

USER_MANUAL

Complete guide to installing and using AudioBash on Windows and macOS.

AudioBash main window with voice input and quick navigation

01 INTRODUCTION

AudioBash is a voice-controlled terminal application designed to work seamlessly with Claude Code. Instead of typing commands, you can speak them directly into your terminal. The application transcribes your voice input using AI-powered speech recognition and executes commands in a real terminal environment.

Key features

Push-to-talk voice input - Hold a hotkey to record, then release it (or press it again) to transcribe and send the text to the terminal
Multi-provider transcription - Choose from Gemini, ElevenLabs Scribe v2, OpenAI Whisper, Claude, or Groq
Real terminal environment - Full PTY (pseudo-terminal) with shell access
Agent mode - AI-powered command generation for complex tasks
Cross-platform - Works on Windows 10/11 and macOS (Intel & Apple Silicon)

02 INSTALLATION

Windows installation

System requirements

Windows 10 or Windows 11
4GB RAM minimum (8GB recommended)
200MB disk space
Microphone for voice input

Installation steps

1. Download the installer
   Visit the releases page and download the Windows installer, named like AudioBash.Setup.2.1.1.exe (the version number in the file name may differ for the current release).
2. Run the installer
   Double-click the .exe file. If Windows SmartScreen appears, click "More info", then "Run anyway".
3. Launch AudioBash
   Find AudioBash in your Start Menu or use the desktop shortcut.
4. Grant microphone access
   Windows prompts for permission on first use. Click "Yes" to allow.

macOS installation

System requirements

macOS 11 (Big Sur) or later
Apple Silicon (M1/M2/M3/M4) or Intel processor
4GB RAM minimum (8GB recommended)
200MB disk space
Microphone for voice input

Installation steps

1. Download the DMG
   Visit the releases page and download the build for your Mac (the version number in the file name may differ for the current release):
   - Apple Silicon (M1/M2/M3/M4): AudioBash-2.1.1-arm64.dmg
   - Intel Macs: AudioBash-2.1.1.dmg
2. Install the app
   Double-click the DMG, drag AudioBash.app to Applications, then eject the DMG.
3. First launch (important!)
   Because AudioBash is not notarized by Apple, you must bypass Gatekeeper:

Method 1 - Right-click (recommended)

Open Finder → Applications → Right-click AudioBash.app → Select "Open" → Click "Open" in the dialog
Method 2 - Terminal command
```
xattr -cr /Applications/AudioBash.app
```
4. Grant permissions
   macOS will request:
   - Microphone access: System Settings → Privacy & Security → Microphone
   - Accessibility access: System Settings → Privacy & Security → Accessibility (required for global hotkeys)
   On macOS 12 and earlier, these panes are under System Preferences → Security & Privacy → Privacy.

03 FIRST-TIME SETUP

Step 1: Choose your transcription provider

AudioBash supports multiple AI providers for speech-to-text:

Provider	Best for	Get API key
Gemini (recommended)	Fast, accurate, free tier	aistudio.google.com
ElevenLabs Scribe v2	Real-time (~150ms), VAD auto-commit	elevenlabs.io
OpenAI Whisper	High accuracy	platform.openai.com
Claude	Anthropic ecosystem	console.anthropic.com
Groq	Very fast	console.groq.com

Step 2: Enter your API key

1. Click the gear icon in the top-right corner
2. Select your transcription provider from the dropdown
3. Paste your API key in the text field
4. Click "Save"

Step 3: Test voice recording

1. Press Option+S (Mac) or Alt+S (Windows) to start recording
2. Say a simple command: "list files in current directory"
3. Press the hotkey again to stop
4. Watch the transcription appear in the terminal

04 USING AUDIOBASH

Keyboard shortcuts

Action	Windows	macOS
Start/stop recording	`Alt+S`	`Option+S`
Cancel recording	`Alt+A`	`Option+A`
Toggle raw/agent mode	`Alt+M`	`Option+M`
Show/hide window	`Alt+H`	`Option+H`
Clear terminal	`Alt+C`	`Option+C`
Cycle layouts	`Alt+L`	`Option+L`
New tab	`Ctrl+T`	`Cmd+T`
Close tab	`Ctrl+W`	`Cmd+W`

Voice recording

AudioBash uses a push-to-talk model:

1. Press the hotkey to start recording
2. Speak your command clearly
3. Release (or press again) to stop and transcribe

Voice input panel with waveform visualization and RAW/AGENT mode toggle

Recording states

🔴 Recording
🟡 Processing
🟢 Ready
⚪ Disabled
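
The push-to-talk cycle and the indicator states can be read as a small state machine. The sketch below is one hypothetical way to model it in shell. The function name `next_state`, the event names (`hotkey`, `cancel`, `done`), and the transitions are assumptions drawn from the steps in this section, not AudioBash's actual implementation. The manual does not say what triggers the Disabled state, so the sketch gives it no outgoing transitions.

```shell
# Hypothetical model of the recording indicator; not AudioBash's real code.
# Usage: next_state <current-state> <event>
next_state() {
  case "$1:$2" in
    ready:hotkey)     echo recording ;;   # hotkey pressed: start recording
    recording:hotkey) echo processing ;;  # released or pressed again: transcribe
    recording:cancel) echo ready ;;       # cancel shortcut: discard the audio
    processing:done)  echo ready ;;       # transcription finished
    *)                echo "$1" ;;        # any other event: state is unchanged
  esac
}

# Walk through one full cycle, starting from Ready.
state=ready
for event in hotkey hotkey done; do
  state="$(next_state "$state" "$event")"
  echo "$event -> $state"
done
```

Keeping the transitions in a single `case` table makes it easy to see that the cancel shortcut only has an effect while recording.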

Terminal modes

AudioBash has two modes for handling voice input:

Raw mode (default)

Transcribed text is sent directly to the terminal. What you say is exactly what gets typed.

Best for: Simple commands, when you know exactly what to type

Say: "cd projects"
Types: cd projects

Agent mode

An AI interprets your intent and generates the appropriate command. Handles natural language requests.

Best for: Complex tasks, when you're unsure of exact syntax

Say: "show JavaScript files modified this week"
Generates: find . -name "*.js" -mtime -7
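
To see what a command like the one above actually matches, it can help to run it against a throwaway directory first. The sketch below is illustrative only: the file names and the scratch directory are made up. Note that `-mtime -7` selects files modified within the last seven 24-hour periods, which is close to, but not the same as, "this calendar week".

```shell
# Build a scratch directory with one fresh and one old JavaScript file.
# The file names here are invented for illustration.
demo_dir="$(mktemp -d)"
touch "$demo_dir/fresh.js"                    # modified just now
touch -t 202001010000 "$demo_dir/stale.js"    # modified 1 Jan 2020, 00:00
touch "$demo_dir/notes.txt"                   # recent, but not a .js file

# -name "*.js" keeps only JavaScript files; -mtime -7 keeps files
# modified less than 7 days (7 x 24 hours) ago.
recent_js="$(cd "$demo_dir" && find . -name "*.js" -mtime -7)"
echo "$recent_js"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

Only the fresh `.js` file should be listed; the old file fails the age test and the text file fails the name test.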

Toggle between modes with Alt+M (Windows) or Option+M (Mac).

Multi-tab interface

AudioBash supports multiple terminal tabs:

New tab: Click the + button or press Ctrl/Cmd+T
Switch tabs: Click on tabs or use Ctrl+Tab
Close tab: Click the × on the tab or press Ctrl/Cmd+W
Rename tab: Double-click the tab title

Each tab maintains its own working directory, command history, and shell session.

Quick navigation with favorites and recent directories

05 SETTINGS

Access settings by clicking the gear icon in the top-right corner.

API keys and visual themes

Transcription model selection

Custom instructions and vocabulary

All keyboard shortcuts

API keys

Setting	Purpose
Transcription API key	Speech-to-text (Gemini, OpenAI, etc.)
Claude API key	Agent mode intelligence
ElevenLabs API key	Text-to-speech responses

Transcription providers

Gemini 2.0 Flash - Google's fast, accurate model (recommended)
ElevenLabs Scribe v2 - Real-time WebSocket streaming (~150ms latency)
OpenAI Whisper - Industry-standard accuracy
Groq Whisper - Ultra-fast processing
Claude - Uses Anthropic's model

Mobile remote control

Control AudioBash from your phone using the web-based remote interface:

1. Open Settings and scroll to "Mobile Remote Control"
2. Set a password for secure remote access
3. On your phone, visit remote.audiobash.app
4. Enter your tunnel URL and password to connect
5. Use voice commands or type text directly

For local-network use only, enter the IP-based WebSocket URL shown in Settings. Your phone and computer must be on the same Wi-Fi network.

06 TROUBLESHOOTING

Windows issues

"Windows protected your PC" SmartScreen warning

Click "More info" → "Run anyway". This warning appears because the installer is not signed with a paid code-signing certificate.

Global shortcuts not working

Check for conflicts with other apps, try running AudioBash as Administrator, or restart the app.

macOS issues

"AudioBash is damaged and can't be opened"

This is Gatekeeper blocking an unsigned app. Fix with:

xattr -cr /Applications/AudioBash.app
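
Before or after running the command above, you may want to confirm whether the quarantine flag is actually present. The following is a minimal sketch, assuming the app sits at the default path; the helper name `quarantine_status` is hypothetical. It relies on `xattr -p`, which prints a single named attribute and fails when that attribute is absent.

```shell
# Report whether a path still carries macOS's quarantine attribute.
# The function name is hypothetical. It prints "clear" when the attribute
# is absent, when the path does not exist, or when xattr is not installed.
quarantine_status() {
  if xattr -p com.apple.quarantine "$1" >/dev/null 2>&1; then
    echo "quarantined"
  else
    echo "clear"
  fi
}

quarantine_status /Applications/AudioBash.app
```

If this still reports `quarantined` after the fix, check that the path matches where you installed the app.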

Global shortcuts (Option+S) not working

Go to System Settings → Privacy & Security → Accessibility. Find AudioBash, toggle it off and on, then restart the app.

General issues

"API key is invalid" error

Double-check you copied the entire key, verify it's for the correct provider, and check that billing is set up if required.

Transcription is inaccurate

Speak more slowly and clearly, reduce background noise, try a different provider, or check your internet connection.

07 FAQ

Is AudioBash free?

Yes! AudioBash is free and open source. The AI transcription services require API keys, which may have usage costs depending on the provider. Gemini offers a generous free tier.

Does AudioBash work offline?

No. Voice transcription requires internet connectivity to reach the AI provider APIs.

Can I use AudioBash with any shell?

Yes. On Windows, it defaults to PowerShell. On macOS, it uses your default shell (usually zsh or bash). You can start any shell by typing its name.
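
On macOS, a quick way to check which shell a new tab is likely to start, and which other shells are installed, is sketched below. The function name `default_shell_name` is hypothetical, and the sketch assumes the default shell is the one recorded in `$SHELL`; it does not apply to PowerShell on Windows.

```shell
# Print the name of the login shell recorded in $SHELL (for example "zsh").
# The function name is hypothetical; /bin/sh is only a fallback for an unset $SHELL.
default_shell_name() {
  basename "${SHELL:-/bin/sh}"
}

echo "Default shell: $(default_shell_name)"

# List the shells registered on this system, skipping comment lines.
if [ -r /etc/shells ]; then
  grep -v '^#' /etc/shells || true
fi
```

Any shell listed there can be started from a tab by typing its name, for example `bash` or `zsh`.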

Can I use AudioBash with Claude Code CLI?

Absolutely! That's exactly what it's designed for. Just type `claude` in the terminal to start Claude Code, then use voice commands to interact with it.

Where are settings stored?

Windows: %APPDATA%\AudioBash\
macOS: ~/Library/Application Support/AudioBash/
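
If you need to find these folders from a script, for example to back them up, the paths above can be wrapped in a small helper. This is a sketch under stated assumptions: the function name `audiobash_settings_dir` is hypothetical, and the Windows branch assumes a Unix-style shell (such as Git Bash) that exposes `%APPDATA%` as `$APPDATA`.

```shell
# Print the AudioBash settings directory listed above.
# Function name is hypothetical. Optional argument: an OS name as printed
# by `uname -s`; defaults to the current system.
audiobash_settings_dir() {
  case "${1:-$(uname -s)}" in
    Darwin)
      printf '%s\n' "$HOME/Library/Application Support/AudioBash" ;;
    MINGW*|MSYS*|CYGWIN*)
      # Assumes a Unix-style shell on Windows that exposes %APPDATA% as $APPDATA.
      printf '%s\n' "$APPDATA/AudioBash" ;;
    *)
      echo "no settings path documented for this system" >&2
      return 1 ;;
  esac
}
```

On a Mac, `ls "$(audiobash_settings_dir)"` would then list the stored files. The quotes matter because the macOS path contains a space.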

How do I report bugs or request features?

Visit github.com/jamditis/audiobash/issues

How does AudioBash connect to the terminal? Does it link to external terminal windows?

No, AudioBash doesn't link to external terminal windows. It has its own embedded terminal built in. The app uses xterm.js for the terminal display and node-pty to spawn real shell processes (PowerShell on Windows, zsh/bash on macOS). When you speak, the transcribed text is sent directly to the active terminal tab within the app. Each tab runs its own independent shell session - there's no connection to Terminal.app, Windows Terminal, or any other external terminal program.
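
Because each tab is backed by a real pseudo-terminal, ordinary terminal checks should behave inside AudioBash as they do in any other terminal on macOS. The sketch below is one way to confirm this from a tab running zsh or bash; the function name `describe_stdin` is hypothetical, and the device name it prints will vary from tab to tab.

```shell
# Print whether standard input is attached to a terminal device.
# Inside a PTY-backed tab this should report a device path
# (on macOS, something under /dev); the exact name varies.
describe_stdin() {
  if [ -t 0 ]; then
    echo "terminal: $(tty)"
  else
    echo "not a terminal"
  fi
}

describe_stdin
```

Running it in two different tabs should print two different device paths, which reflects the independent shell sessions described above.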

AudioBash v2.3.2 - Voice-controlled terminal for Claude Code
