VERA is an ongoing personal research project.
The trailer highlights system concepts and interaction goals rather than a finished product.
VERA is a conversational AI that demonstrates real-time speech recognition, reasoning, and voice synthesis. It listens, understands, and responds through speech or actions (though mostly speech). While inspired by fictional assistants like JARVIS from the Iron Man films, VERA is designed as a human-in-the-loop system, with user control and bounded capabilities.
VERA lets you talk naturally and get spoken responses back in real time, similar to a human conversation. Users can switch between continuous listening, push-to-talk, and keyboard mode depending on the environment.
Users can interrupt VERA mid-response to correct, redirect, or add context. This makes the conversation flow more like talking with another person than a strict turn-based exchange.
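One way to picture the interruption behavior described above is a reply that is split into short chunks, where an interrupt simply discards whatever has not been spoken yet. The TypeScript below is only a minimal sketch under that assumption, not VERA's actual implementation; the `ReplyQueue` class and its method names are hypothetical.

```typescript
// Hypothetical sketch: ReplyQueue is an illustrative name, not part of VERA.
// A reply is split into chunks (e.g. sentences). The playback loop pulls one
// chunk at a time, so an interrupt takes effect at the next chunk boundary.
class ReplyQueue {
  private pending: string[];

  constructor(chunks: string[]) {
    // Copy so the caller's array is never mutated.
    this.pending = chunks.slice();
  }

  /** Returns the next chunk to speak, or undefined when nothing is left. */
  next(): string | undefined {
    return this.pending.shift();
  }

  /** Drops all unspoken chunks and returns them, e.g. for logging. */
  interrupt(): string[] {
    const dropped = this.pending;
    this.pending = [];
    return dropped;
  }
}
```

In this sketch the playback loop would call `next()` after each chunk finishes, and the speech detector (or a key press) would call `interrupt()` as soon as the user starts talking.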
VERA uses context such as habits and preferences to give more helpful responses. The plan is for personal data to stay local on the user’s device.
VERA can answer utility questions like time, date, weather, countdowns, news, and stock prices through voice. For richer interactions, the prototype currently includes dedicated panels for news and music.
A usable conversational AI using speech as both input and output. Turn-based interaction: user speaks → AI responds. Transcriptions appear as chat bubbles in real time.
Core Stack: ASR (Whisper-large) · LLM (LLaMA 3.2) · TTS (fine-tuned)
This version focuses on making interaction with VERA feel more natural, flexible, and intentional. Building on the stable voice pipeline from Version 1.0, this release expands input modes, extends actions (e.g., news summaries, weather checks), introduces early interruptibility and conversational pacing strategies, and begins deeper persona refinement.
Version 2.5 is a refinement release mainly focused on response reliability, latency reduction, and conversational correctness. Rather than introducing new features, this version addresses systemic weaknesses observed during extended testing.
Version 3.0 is a major upgrade focused on deeper conversational awareness and stronger action handling. This release improves multi-turn reasoning, ambiguity handling, and side-panel support.
Version 4.0 focuses on work-oriented usability and expands command-driven workflows. This release adds music querying, a dedicated music panel, and a hidden Work mode that is now activated by command instead of a visible UI toggle.
BMO adds a dedicated character page with an SVG-driven mouth that reacts while TTS plays. The face isn’t tied to parsing the words on screen; it follows the audio (loudness and short-term energy) so motion feels aligned with prosody (syllables and pauses), not the literal phrase.
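To make the audio-following idea above more concrete, here is a minimal TypeScript sketch of how loudness could be turned into a mouth-openness value. It is an illustration under stated assumptions, not the code VERA uses: the names `rmsLevel` and `createMouthEnvelope` and the attack, release, and gain values are all hypothetical.

```typescript
// Hypothetical sketch: function names and constants are assumptions,
// not taken from the VERA codebase.

/** Root-mean-square level of one frame of time-domain samples in [-1, 1]. */
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  samples.forEach((s) => {
    sumSquares += s * s;
  });
  return Math.sqrt(sumSquares / samples.length);
}

/**
 * Returns a function that maps each audio frame to mouth openness in [0, 1].
 * The value rises quickly (attack) and falls more slowly (release), so the
 * mouth follows syllables and pauses instead of flickering per sample.
 */
function createMouthEnvelope(
  attack = 0.6, // assumed smoothing factor while the audio gets louder
  release = 0.2, // assumed smoothing factor while the audio gets quieter
  gain = 4, // assumed scale from RMS level to openness
): (samples: Float32Array) => number {
  let level = 0;
  return (samples: Float32Array): number => {
    const target = Math.min(1, rmsLevel(samples) * gain);
    const k = target > level ? attack : release;
    level += (target - level) * k;
    return level;
  };
}
```

In a browser, the frames could come from a Web Audio `AnalyserNode` via `getFloatTimeDomainData`, called once per animation frame while the TTS audio plays, with the returned value driving the height of the SVG mouth.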
#bmo-audio is playing, Web Audio analysis updates
data-bmo-tts-emotion on the smile SVG so CSS can show the right layer.
bmo-emotions-test.html - standalone layout check for the three mouth slots.
My name is Nam. I'm a third-year undergraduate student majoring in Data Science at the University of California, Irvine. Over the past few years, I have developed a strong interest in machine learning and artificial intelligence, which led me to participate in Kaggle competitions and work on transformer models. These experiences ultimately brought me to this project, VERA. My goal with VERA is to build a functional conversational AI that integrates modern machine learning techniques with software engineering best practices to deliver a seamless user experience.
Experiments with transformer training pipelines, including custom collators and fine-tuning.
Applied ML projects focused on feature engineering and modeling.
Data analyses exploring pricing, experience, and predictive modeling across datasets.
Music
Moog City - C418
Trailer
Created and edited by me.
Watch trailer again here
Inspiration
Red Barrels - cinematic UI direction and tone.