Voice Assistant Using XMini-C3 Board
🛠️ In Progress
Used Components (8)
- Native API (api): Communication protocol for connecting ESPHome to Home Assistant and other clients
- Image (image): Display bitmap images on screens and displays
- I2C Bus (i2c): Inter-Integrated Circuit communication bus for connecting multiple devices
- Over-The-Air Updates (ota): Update ESPHome firmware wirelessly over the network
- GPIO Output (gpio): Simple on/off digital output on GPIO pins
- Script (script): Create reusable automation sequences
- WiFi (wifi): Configure WiFi connectivity for ESP devices
- Template Switch (template): Create custom virtual switches with programmable behavior
Project Overview
A simple voice assistant built with the Xmini-C3 board, featuring a microphone, speaker, and OLED display. The device uses on-device wake word detection and connects to Home Assistant Cloud for voice processing, enabling hands-free control of smart home devices.
Features
- 🎤 Built-in I2S microphone (via the ES8311 audio codec) for voice commands
- 🔊 I2S audio speaker for voice responses and announcements
- 📺 OLED display (SSD1306) showing assistant status icons
- 🌈 RGB LED status indicator with various effects (listening, processing, error states)
- 🎯 On-device wake word detection (supports “Okay Nabu”, “Hey Mycroft”, “Hey Jarvis”)
- ⏲️ Timer support with audio notifications
- 🔘 Physical button for stopping the alarm timer and factory reset
- 🔌 Powered via USB-C
Progress
- ✅ Set up on-device wake word detection
- ✅ Configure Home Assistant Cloud voice pipeline
- ✅ Test voice commands (lights, timers)
- Future improvements (see below)
Future Improvement Ideas
- Add more display pages showing additional information
- Implement custom LED effects for different states
- Add a local voice processing pipeline (instead of cloud)
- Volume control adjustments
Reusability Note
This project uses the Xmini-C3 board, which has built-in I2S audio components. If you use a different ESP32 board, you’ll need to add external audio hardware. The configuration needs little customization: mostly updating the WiFi credentials and API keys in your secrets file.
What You’ll Need
Hardware
- 1x Xmini-C3 - ESP32-C3 board with built-in I2S audio (ES8311 DAC, microphone, speaker)
- 1x USB-C cable (data capable for programming)
- 1x Power supply (USB charger, 5V/1A minimum)
Software
- ESPHome installed
- Home Assistant with Cloud subscription (for voice processing)
- Home Assistant Voice Assistant configured
Setup Instructions
- Flash the device - Use the provided YAML configuration to flash your Xmini-C3
- Configure secrets - Set your WiFi credentials and API encryption keys
- Add to Home Assistant - The device should auto-discover via ESPHome integration
- Configure Voice Pipeline - In Home Assistant, set up your preferred voice assistant pipeline
- Expose Entities to Voice Assistant - Select the entities you want the voice assistant to control. The fewer entities you expose, the faster the assistant responds. Consider adding aliases for easier control
- Test wake word - Say “Okay Nabu” (or other configured wake words) to trigger the assistant
- Try commands - Test with simple commands like “Turn on the lights” or “Set a timer”
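As a rough sketch of the "Configure secrets" step, a secrets file for this project might look like the following. The four key names are the ones referenced with !secret in the configuration below; every value shown is a placeholder assumption and must be replaced with your own.

```yaml
# secrets.yaml (placeholder values only; replace each one with your own)
wifi_ssid: "YourNetworkName"
wifi_password: "your-wifi-password"

# ESPHome expects the API encryption key to be a base64-encoded 32-byte value.
# One way to generate one (an assumption about your tooling): openssl rand -base64 32
mini_voice_assistant_api: "REPLACE_WITH_BASE64_32_BYTE_KEY"

# Password used for over-the-air updates
mini_voice_assistant_ota: "your-ota-password"
```

Keeping these values in the secrets file lets you share the main configuration without exposing credentials.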
How It Works
- Wake Word Detection: Runs locally on the device using micro wake word models (configurable)
- Voice Processing: Audio sent to Home Assistant Cloud for speech-to-text and intent recognition
- Feedback: RGB LED and OLED display show current state (idle, listening, processing, speaking)
- Button Control: Press the boot button to cancel the timer notification or hold 10s for factory reset
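The wake word models are listed in the micro_wake_word block of the configuration. As a minimal sketch, assuming you only want the device to respond to "Okay Nabu", that block could be trimmed to a single model. The ids and options are the ones already used in this project; only the shortened model list is a change.

```yaml
micro_wake_word:
  microphone: external_mic
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
  vad:
  models:
    # Only one model is kept here; add hey_mycroft or hey_jarvis back as needed
    - model: okay_nabu
```

Loading fewer models should reduce the work the ESP32-C3 does while listening, though the actual effect has not been measured here.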
Status
This project is completed and working. The voice assistant successfully:
- Detects wake words locally on the device
- Responds to voice commands via Home Assistant Cloud
- Controls lights and other smart home devices
- Handles timers with audio notifications
- Provides visual feedback via the RGB LED and OLED display
There’s significant room for improvement in terms of customization, additional features, and local processing options.
Acknowledgments
Configuration inspired by the M5Stack Atom Echo voice assistant from the ESPHome wake word voice assistants repository. It is largely a copy of that configuration, adapted for the Xmini-C3.
Main Configuration File
If you’re using ESPHome Device Builder, create a new device. If you’re using the command line, create a YAML file (e.g. xmini-voice-assistant.yaml).
Then use the following file as a guide (details on how to customize it are below).
Download the full configuration: xmini-voice-assistant.yaml
esphome:
  name: my-mini-voice-assistant
  friendly_name: My Mini Voice Assistant
esp32:
  variant: esp32c3
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESPTOOLPY_FLASHMODE_DIO: y
  flash_size: 16MB
# Enable Home Assistant API
api:
  encryption:
    key: !secret mini_voice_assistant_api
ota:
  - platform: esphome
    password: !secret mini_voice_assistant_ota
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
logger:
substitutions:
  boot_btn_pin: GPIO09
  i2c_sda_pin: GPIO03
  i2c_scl_pin: GPIO04
  neopixel_pin: GPIO02
  i2s_ws_pin: GPIO06
  i2s_bck_pin: GPIO08
  i2s_mck_pin: GPIO10
  i2s_do_pin: GPIO05
  i2s_di_pin: GPIO07
  mute_pin: GPIO11
i2c:
  sda: ${i2c_sda_pin}
  scl: ${i2c_scl_pin}
output:
  - platform: gpio
    pin: ${mute_pin}
    id: mute_control
    inverted: true
audio_dac:
  - platform: es8311
    id: my_dac
    use_microphone: false
    bits_per_sample: 16bit
    #sample_rate: 48000
    sample_rate: 16000
    address: 0x18
button:
  - platform: factory_reset
    id: factory_reset_btn
    name: Factory reset
i2s_audio:
  - id: i2s_audio_bus
    i2s_lrclk_pin: ${i2s_ws_pin}
    i2s_bclk_pin: ${i2s_bck_pin}
    i2s_mclk_pin: ${i2s_mck_pin}
microphone:
  - platform: i2s_audio
    id: external_mic
    adc_type: external
    i2s_din_pin: ${i2s_di_pin}
# https://esphome.io/components/speaker/i2s_audio/
speaker:
  - platform: i2s_audio
    id: my_speaker
    dac_type: external
    i2s_dout_pin: ${i2s_do_pin}
    sample_rate: 16000
    channel: mono
    bits_per_channel: 16bit
    buffer_duration: 100ms
media_player:
  - platform: speaker
    name: None
    id: my_media_player
    announcement_pipeline:
      speaker: my_speaker
      format: WAV
    codec_support_enabled: false
    buffer_size: 6000
    volume_min: 0.4
    files:
      - id: timer_finished_wave_file
        file: https://github.com/esphome/wake-word-voice-assistants/raw/main/sounds/timer_finished.wav
    on_announcement:
      - if:
          condition:
            - microphone.is_capturing:
          then:
            - script.execute: stop_wake_word
      - light.turn_on:
          id: my_indicator
          blue: 100%
          red: 0%
          green: 0%
          brightness: 100%
          effect: none
    on_idle:
      - script.execute: start_wake_word
      - script.execute: reset_led
voice_assistant:
  id: va
  micro_wake_word:
  microphone:
    microphone: external_mic
    channels: 0
    gain_factor: 4
  media_player: my_media_player
  noise_suppression_level: 2
  auto_gain: 31dBFS
  on_listening:
    - light.turn_on:
        id: my_indicator
        blue: 100%
        red: 0%
        green: 0%
        effect: "Slow Pulse"
  on_stt_vad_end:
    - light.turn_on:
        id: my_indicator
        blue: 100%
        red: 0%
        green: 0%
        effect: "Fast Pulse"
  on_tts_start:
    - light.turn_on:
        id: my_indicator
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: none
  on_end:
    # Handle the "nevermind" case where there is no announcement
    - wait_until:
        condition:
          - media_player.is_announcing:
        timeout: 0.5s
    # Restart only mWW if enabled; streaming wake words automatically restart
    - if:
        condition:
          - lambda: |-
              return id(wake_word_engine_location).current_option() == "On device";
        then:
          - wait_until:
              - and:
                  - not:
                      voice_assistant.is_running:
                  - not:
                      speaker.is_playing:
          - lambda: id(va).set_use_wake_word(false);
          - micro_wake_word.start:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: my_indicator
        red: 100%
        green: 0%
        blue: 0%
        brightness: 100%
        effect: none
    - delay: 2s
    - script.execute: reset_led
  on_client_connected:
    - delay: 2s # Give the api server time to settle
    - script.execute: start_wake_word
  on_client_disconnected:
    - script.execute: stop_wake_word
  on_timer_finished:
    - script.execute: stop_wake_word
    - wait_until:
        not:
          microphone.is_capturing:
    - switch.turn_on: timer_ringing
    - light.turn_on:
        id: my_indicator
        red: 0%
        green: 100%
        blue: 0%
        brightness: 100%
        effect: "Fast Pulse"
    - wait_until:
        - switch.is_off: timer_ringing
    - light.turn_off: my_indicator
    - switch.turn_off: timer_ringing
binary_sensor:
  - platform: gpio
    pin:
      number: ${boot_btn_pin}
      inverted: true
      mode:
        input: true
        pullup: true
    name: Button
    id: boot_btn
    disabled_by_default: true
    entity_category: diagnostic
    on_multi_click:
      - timing:
          - ON for at least 50ms
          - OFF for at least 50ms
        then:
          - if:
              condition:
                switch.is_on: timer_ringing
              then:
                - switch.turn_off: timer_ringing
              else:
                - script.execute: start_wake_word
      - timing:
          - ON for at least 10s
        then:
          - button.press: factory_reset_btn
light:
  - platform: esp32_rmt_led_strip
    id: my_indicator
    name: None
    disabled_by_default: true
    entity_category: config
    pin: ${neopixel_pin}
    default_transition_length: 0s
    chipset: ws2812
    num_leds: 1
    rgb_order: GRB
    restore_mode: ALWAYS_OFF
    effects:
      - pulse:
          name: "Slow Pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "Fast Pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%
script:
  - id: reset_led
    then:
      - if:
          condition:
            - lambda: |-
                return id(wake_word_engine_location).current_option() == "On device";
            - switch.is_on: use_listen_light
          then:
            - light.turn_on:
                id: my_indicator
                red: 100%
                green: 89%
                blue: 71%
                brightness: 60%
                effect: none
          else:
            - if:
                condition:
                  # Must differ from the outer condition, otherwise this branch is unreachable
                  - lambda: |-
                      return id(wake_word_engine_location).current_option() != "On device";
                  - switch.is_on: use_listen_light
                then:
                  - light.turn_on:
                      id: my_indicator
                      red: 0%
                      green: 100%
                      blue: 100%
                      brightness: 60%
                      effect: none
                else:
                  - light.turn_off: my_indicator
  - id: start_wake_word
    then:
      - if:
          condition:
            and:
              - not:
                  - voice_assistant.is_running:
              - lambda: |-
                  return id(wake_word_engine_location).current_option() == "On device";
          then:
            - lambda: id(va).set_use_wake_word(false);
            - micro_wake_word.start:
      - if:
          condition:
            and:
              - not:
                  - voice_assistant.is_running:
              - lambda: |-
                  return id(wake_word_engine_location).current_option() == "In Home Assistant";
          then:
            - lambda: id(va).set_use_wake_word(true);
            - voice_assistant.start_continuous:
  - id: stop_wake_word
    then:
      - if:
          condition:
            lambda: |-
              return id(wake_word_engine_location).current_option() == "In Home Assistant";
          then:
            - lambda: id(va).set_use_wake_word(false);
            - voice_assistant.stop:
      - if:
          condition:
            lambda: |-
              return id(wake_word_engine_location).current_option() == "On device";
          then:
            - micro_wake_word.stop:
switch:
  - platform: template
    name: Use listen light
    id: use_listen_light
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - script.execute: reset_led
    on_turn_off:
      - script.execute: reset_led
  - platform: template
    id: timer_ringing
    optimistic: true
    restore_mode: ALWAYS_OFF
    on_turn_off:
      # Turn off the repeat mode and disable the pause between playlist items
      - lambda: |-
          id(my_media_player)
            ->make_call()
            .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_REPEAT_OFF)
            .set_announcement(true)
            .perform();
          id(my_media_player)->set_playlist_delay_ms(speaker::AudioPipelineType::ANNOUNCEMENT, 0);
      # Stop playing the alarm
      - media_player.stop:
          announcement: true
    on_turn_on:
      # Turn on the repeat mode and pause for 1000 ms between playlist items/repeats
      - lambda: |-
          id(my_media_player)
            ->make_call()
            .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_REPEAT_ONE)
            .set_announcement(true)
            .perform();
          id(my_media_player)->set_playlist_delay_ms(speaker::AudioPipelineType::ANNOUNCEMENT, 1000);
      - media_player.speaker.play_on_device_media_file:
          media_file: timer_finished_wave_file
          announcement: true
      - delay: 15min
      - switch.turn_off: timer_ringing
select:
  - platform: template
    entity_category: config
    name: Wake word engine location
    id: wake_word_engine_location
    optimistic: true
    restore_value: true
    options:
      - In Home Assistant
      - On device
    initial_option: On device
    on_value:
      - if:
          condition:
            lambda: return x == "In Home Assistant";
          then:
            - micro_wake_word.stop:
            - delay: 500ms
            - lambda: id(va).set_use_wake_word(true);
            - voice_assistant.start_continuous:
      - if:
          condition:
            lambda: return x == "On device";
          then:
            - lambda: id(va).set_use_wake_word(false);
            - voice_assistant.stop:
            - delay: 500ms
            - micro_wake_word.start:
micro_wake_word:
  microphone: external_mic
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
  vad:
  models:
    - model: okay_nabu
    - model: hey_mycroft
    - model: hey_jarvis
image:
  - file: mdi:robot
    id: va_listening
    type: binary
    resize: 48x48
  - file: mdi:robot-happy
    id: va_idle
    resize: 48x48
    type: binary
display:
  - platform: ssd1306_i2c
    id: my_display
    model: "SSD1306 128x64"
    address: 0x3C
    update_interval: never
    pages:
      - id: idle_page
        lambda: |-
          it.fill(COLOR_OFF);
          it.image((it.get_width() / 2), (it.get_height() / 2) + 8, id(va_idle), ImageAlign::CENTER);
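The configuration above defines a va_listening image but only draws va_idle. As a hedged sketch of the "more display pages" idea, a second page could be added and shown from a script. The names listening_page and show_listening_page are hypothetical and not part of the original configuration. This fragment is meant to be merged into the existing display and script blocks, not pasted alongside them.

```yaml
display:
  - platform: ssd1306_i2c
    id: my_display
    model: "SSD1306 128x64"
    address: 0x3C
    update_interval: never
    pages:
      - id: idle_page
        lambda: |-
          it.fill(COLOR_OFF);
          it.image((it.get_width() / 2), (it.get_height() / 2) + 8, id(va_idle), ImageAlign::CENTER);
      # Hypothetical second page that reuses the existing va_listening image
      - id: listening_page
        lambda: |-
          it.fill(COLOR_OFF);
          it.image((it.get_width() / 2), (it.get_height() / 2) + 8, id(va_listening), ImageAlign::CENTER);

script:
  # Hypothetical helper; could be called from voice_assistant on_listening
  - id: show_listening_page
    then:
      - display.page.show: listening_page
      # update_interval is "never", so the display must be redrawn explicitly
      - component.update: my_display
```

A matching script that shows idle_page again (for example from on_end) would be needed to return to the idle icon.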