LIP-TRAC is a revolutionary real-time visual speech recognition system, empowering individuals with hearing and speech impairments through advanced AI lipreading technology.
Discover LIP-TRAC

We believe in a world where communication is accessible to everyone. The LIP-TRAC Initiative is dedicated to leveraging artificial intelligence to break down barriers for individuals with hearing and speech impairments, fostering understanding, independence, and inclusion through innovative visual speech recognition technology.
Live with hearing loss globally (WHO).
Projected to have disabling hearing loss by 2050 (WHO).
Millions with Aphonia or Aphasia face similar challenges.
Traditional lipreading is difficult, with accuracy rarely exceeding 30%. Existing assistive technologies can be costly, ineffective in noisy environments, or unsuitable for conditions like Aphonia. Audio-based speech recognition (ASR) also struggles in noise and requires audible speech.
LIP-TRAC offers a visual solution, immune to noise and effective even in silence.
LIP-TRAC (Lipreading through a Temporal Recurrent and Convolutional network) is an advanced, real-time system that translates lip movements into text, bridging communication gaps effectively and efficiently.
LIP-TRAC prototype on Raspberry Pi 5
LIP-TRAC uses a sophisticated yet efficient AI pipeline to understand speech visually:
A camera captures the speaker's face in real-time.
Advanced algorithms detect the face and precisely crop the mouth region.
Frames are normalized to handle variations in lighting and appearance, enhancing lip movement details.
Our lightweight CRNN model analyzes lip patterns and transcribes them into text using CTC loss.
LIP-TRAC is more than just technology; it's a commitment to improving lives. By providing an accurate, real-time, and accessible lipreading solution, we aim to:
Enhance communication for millions with hearing or speech impairments.
Enable greater participation in conversations and daily activities.
Offer a low-cost alternative to expensive assistive devices.
Facilitate understanding in noisy or silent environments for everyone.
Our Real-Time Performance Score (RTPS) of 0.10683 highlights LIP-TRAC's optimal balance of speed and accuracy for practical use.
LIP-TRAC leverages cutting-edge deep learning techniques, trained on the diverse BBC LRS2 dataset. Our lightweight Convolutional Recurrent Neural Network (CRNN) architecture is specifically designed for efficiency without heavily compromising accuracy.
Dataset: BBC LRS2 (683 training videos, 456 testing videos)
Core Architecture: Lightweight CRNN with 3D Convolutions and Bidirectional GRUs.
Training: Connectionist Temporal Classification (CTC) Loss.
Key Performance: WER 32.7%, CER 14%, Inference ~6.3s on Raspberry Pi 5.
Innovation: The Real-Time Performance Score (RTPS) to evaluate practical usability.
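The WER and CER figures quoted above are both defined as edit distance divided by reference length, computed over words and characters respectively. A minimal sketch (function names and the example sentences are mine, not from the LIP-TRAC codebase):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, via rolling-row DP."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if symbols match)
            )
    return dp[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character Error Rate: char-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

print(wer("place blue at one now", "place blue in one now"))  # 0.2
print(round(cer("hello world", "hello word"), 3))  # 0.091
```

On this definition, a WER of 32.7% means roughly one word in three needs an insertion, deletion, or substitution to match the reference transcript; the much lower 14% CER reflects that many word errors differ by only a character or two.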
LIP-TRAC is built upon rigorous research. You can learn more about the foundational work in our research paper or the research poster. Additionally, you can explore the broader vision and market context in our analysis.
We are continuously working to enhance LIP-TRAC. Future developments include: