Presentation: docs.google.com/presentation/d/1i5VF3bMfu_h_d0PYLIdovnfsvY7o4k0QndkRG1NJxhc/edit?usp=sharing
LipNet Fail Documentation: https://drive.google.com/open?id=1Q7S_WT5TUi6HIvtflmn9Lu9mLy-UNANS

One Sentence Description: A study of LipNet including a failed setup process, application ideas, and overall thoughts.

Summary: A study of LipNet detailing my failed installation process, an emotion-tracking alternative project, future application ideas, and overall thoughts. LipNet is a recurrent network with the "first end-to-end sentence-level lipreading model that simultaneously learns spatiotemporal visual features and a sequence model," meaning it can lipread based on how the user's mouth moves visually (Cornell 1). Some code had to be tweaked to fit a PC, as the original documentation was written for a Mac. To run Python, I used PowerShell. I attempted to follow the Keras implementation of LipNet, in which the model was trained on images and videos of many different people mouthing words. LipNet then outputs text representing what it believes the user mouthed, which was the result I wanted. The step that made me discontinue the setup process was being unable to get the system to recognize the ffmpeg that had been installed and had previously been working. Other setup difficulties are discussed below. Based on the original LipNet, I built an application that classifies the user as "Mad," "Happy," or "Excited" based on the user's facial expression. The project uses facial tracking points to train the model, and the model then classifies the collected data (images). It runs on ml5.js in p5.js (a minimal sketch of the approach appears after Next Steps below). Future applications include ideas in the fields of technology, surveillance, and medicine. My inspiration was a previous project from my Code of Music class last semester, which centered on helping those who are deaf or hard of hearing see music visually.

Inspiration: In high school, I had a friend who wasn't deaf but needed a hearing aid to hear. She was always afraid of going completely deaf and being isolated from the rest of the world. Last year, for my Code of Music final project, I created a physical device that put intonation on braille. The device had an LCD for display and four LEDs each for amplitude, frequency, and waveform to show the results in real time. I wanted to find another way to visualize sound so she and people like her never have to worry about this again.

Process: Setting up LipNet on a PC is painful but very educational. Though I was unable to train a model on LipNet, setting it up in PowerShell taught me a lot about coding logic and Python. In the end, I could not get the system to recognize ffmpeg even though I had downloaded it. I believe most of my problems came from using a PC when the majority of the (minimal) existing documentation was written on a Mac. While making my alternative project, a new set of problems arose. I found that the hardest part was getting emotion tracking to work. Because the human face is only so big, an alternative method might have been easier to work with.

Audience: My audience is those who work in the fields of technology, surveillance, and medicine. Lipreading could serve as a new way to communicate with digital assistants, support military-level spying, and help those who are deaf.

User Testing: After having my friend user-test my alternative project, I discovered that I needed more samples in my dataset. After adding more images, the project's accuracy improved, but not significantly.
This is probably because facial emotion tracking is unreliable.

Code References:
https://editor.p5js.org/[email protected]/sketches/sY7K7UNQe
https://github.com/rizkiarm/LipNet/blob/master/README.md
https://github.com/keras-team/keras/blob/master/keras/optimizers.py
https://github.com/osalinasv/lipnet
https://github.com/ski-net/lipnet

Next Steps:
My next steps would be to try to install LipNet on a Mac and to add more emotion categories in my alternative project.
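Below is a minimal sketch of the general approach behind the emotion classifier, written against ml5's MobileNet feature extractor. This is an assumption on my part, not necessarily the project's exact facial-tracking setup, and the training keys and flow are illustrative.

// Minimal sketch: train an image classifier on webcam frames, then classify live video.
// Assumes p5.js plus ml5.js (~v0.4); "Mad"/"Happy"/"Excited" are the project's labels.
let classifier;
let video;
let label = 'training...';

function setup() {
  createCanvas(320, 270);
  video = createCapture(VIDEO);
  video.hide();
  const featureExtractor = ml5.featureExtractor('MobileNet', () => console.log('model ready'));
  classifier = featureExtractor.classification(video);
}

function keyPressed() {
  // Press m/h/e to add a training sample for each emotion, then t to train.
  if (key === 'm') classifier.addImage('Mad');
  else if (key === 'h') classifier.addImage('Happy');
  else if (key === 'e') classifier.addImage('Excited');
  else if (key === 't') classifier.train(loss => { if (loss === null) classifyVideo(); });
}

function classifyVideo() {
  classifier.classify((err, results) => {
    if (!err && results) label = results[0].label; // result shape can vary across ml5 versions
    classifyVideo(); // keep classifying frame after frame
  });
}

function draw() {
  image(video, 0, 0, 320, 240);
  fill(255);
  text(label, 10, 260);
}

As the User Testing section notes, adding more samples per label is usually the first lever for accuracy in this kind of transfer-learning setup.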
Questions:
Does the dataset contain data that might be considered confidential? Because of the variety and large size of the dataset, there may be confidential information. However, it is highly unlikely that a large part of the data is sensitive, because it was contributed by ordinary players. Even if there is confidential information, it would probably be very hard to pinpoint accurately.

Does the dataset contain data that, if viewed directly, might be offensive, insulting, threatening, or might otherwise cause anxiety? Despite my own opinion, the answer is yes. With the dataset containing 50 million drawings across 345 categories by players all over the world, someone will be unhappy with it. Viewed objectively, none of this data should be offensive, because it is not being used in an offensive way; it is used purely for entertainment or education.

Does the dataset identify any subpopulations of people (e.g., by age, gender)? This dataset was produced all over the world by a large variety of people, so it should include all subpopulations roughly in proportion to the (world) population at hand.

Were individuals (e.g., players of Quick, Draw!) notified about the data collection? Did these individuals consent to the collection and use of their data? The Google Quick, Draw! dataset contains 50 million drawings across 345 categories by players all over the world. Because this dataset was collected from players of the game "Quick, Draw!," there may be some liability issues. If data is being collected, the company must alert the players so they can determine whether, and what, they would like released. This is called transparency.

If consent was obtained, were the consenting individuals provided with a mechanism to revoke their consent in the future or for certain uses? If players do consent to their data being used, the company should provide a way for users to revoke that consent if they choose. While I am unsure of Quick, Draw!'s data collection process, I infer that revoking consent is highly unlikely to be an option, given the difficulty of sifting through such a heavy load of data.

Are there tasks for which the dataset should not be used? The dataset should not be used in any application that may put other people's security at risk.

Coding Exercise: This week I fixed last week's project. The project is now able to tell whether a person is tired, hungry, or stressed based on the location of the user's face on the screen. I fixed it by switching to the new ml5.js version and by changing my "if" statements to "else if" statements (a minimal sketch of that fix appears after this post's Inspiration section).
Link: https://editor.p5js.org/[email protected]/sketches/WcWuPxb3b

Title: Lipreading Machine
One sentence description: A lip reader that generates text.

Project Abstract: My project is a machine that can lipread. The user mouths words as the input, and the machine generates the words the user mouthed as text. LipNet is a recurrent network with the "first end-to-end sentence-level lipreading model that simultaneously learns spatiotemporal visual features and a sequence model," meaning it can lipread based on how the user's mouth moves visually (Cornell 1). The LipNet model also resembles the Face-landmarks model on Runway ML, which tracks a user's face with points on specific facial features. This can be applied to a variety of fields, especially medical, interactive, and day-to-day practical ones. Deaf patients under emergency care may be able to better interact with doctors and nurses to voice their needs. It can also be applied to smartphones as a new way to interact with digital assistants such as Alexa or Siri. These assistants all require the user to voice their commands before the assistant responds accordingly, but with LipNet, the user may be able to avoid making any noise. While this is not strictly novel, convenience is one of the leading driving forces for consumer goods, and the convenience of silently communicating with an assistant may be the next big thing. The biggest challenge in this project, for me, is figuring out Python. Because I've primarily worked with JavaScript in the past, this will be a new learning process for me. Another challenge is finding an alternative to CUDA, given that I have a PC.

Inspiration: In high school, I had a friend who wasn't deaf but needed a hearing aid to hear. She was always afraid of going completely deaf and being isolated from the rest of the world. Last year, for my Code of Music final project, I created a physical device that put intonation on braille. The device had an LCD for display and four LEDs each for amplitude, frequency, and waveform to show the results in real time. I wanted to find another way to visualize sound so she and people like her never have to worry about this again.
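As promised in the Coding Exercise above, here is a minimal sketch of the "else if" fix, assuming ml5's poseNet nose keypoint as a stand-in for the face's position on screen; the labels and screen regions are illustrative.

// Minimal sketch: classify by face position using mutually exclusive "else if" branches.
// Assumes p5.js plus ml5.js poseNet; the nose keypoint approximates the face location.
let video;
let noseX = 0;
let label = '...';

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();
  const poseNet = ml5.poseNet(video, () => console.log('poseNet ready'));
  poseNet.on('pose', results => {
    if (results.length > 0) noseX = results[0].pose.nose.x;
  });
}

function draw() {
  image(video, 0, 0);
  // "else if" guarantees exactly one label per frame; independent "if"s could set several.
  if (noseX < width / 3) label = 'tired';
  else if (noseX < (2 * width) / 3) label = 'hungry';
  else label = 'stressed';
  fill(255);
  textSize(32);
  text(label, 10, 40);
}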
Short Explanation:
Below is an image of the Keras implementation of LipNet (link below). The model was trained on images and videos of many different people mouthing words. LipNet then output the text it believed the user mouthed.

This week I tried CycleGAN, style transfer, automatic colorization, and StyleGAN in Runway ML. In all instances I could get communication between p5.js and Runway, but the output would never load. Previously, I was getting a "failed to fetch" error, but this week the images simply didn't load: the model would say it was done running and then not load anything. The sketch worked in class, so I plan to recheck my code for errors.

Adaptive Style Transfer:
Automatic Colorization:
CycleGAN:

In Robin Sloan's 2017 Eyeo talk, he explains that it is not about trying to fit the model, but rather about making the model fit your needs. He demos a model trained on science fiction that generates the next part of his sentences; if the generated words don't make sense, he deletes part or all of the generated text. Before watching this talk, I didn't even realize that I was always trying to pick the most convenient dataset rather than edit or condense a better one. I now realize how nonsensical and inconvenient it is to use a dataset that doesn't fit.

In The Subtext of a Black Corpus: In conversation with ITP SIRs Nikita Huggins & Ayodamola Okunseinde by Ashley Lewis, the authors discuss the inclusion of Black culture in datasets. Huggins asks, "If Google Translate is for everybody, then why can't it understand my dialect?" While there are many datasets, even Google managed to miss certain dialects. Datasets need to include all types of people and should not be drawn from only one source. To combat this, the two have developed a prototyping workshop that takes students step by step through designing datasets while also considering problems surrounding racial and gender bias.

In Adventures in Narrated Reality 1 & 2, Ross Goodwin tells the story of how his laziness guided him to a new passion. While searching for a quicker way to write speeches for clients he did not like, Goodwin stumbled upon machine learning and the convenience it could offer. While watching "Sunspring," I sat there for the majority of the video trying to figure out whether it was me that was confused or the video was genuinely a little weird. After learning that the script was generated by char-rnn, everything started to make more sense. This makes me question how a model would decipher words that have a double meaning, or slang, and what this would look like in an interactive model.
Aside from using sketchRNN for games, I thought it could be used for police suspect sketches: when a person gives a description, sketchRNN would automatically draw it based on what the person describes. My first thought was to use p5.speech and p5.speechrec, but then I realized this could be done more easily if the user types their description instead of saying it. To do this, I implemented an input box where the user can type and search for what they want. I tried assigning a variable to the model-name slot in sketchRNN and putting the variable in an array, but no matter what I tried, it would either draw something I couldn't recognize or draw nothing. While AI has made our lives easier in countless ways, this use does leave a large ethical dilemma. One could say it would eliminate the risk of human error, but what if it really is wrong? That would lead to too much controversy, especially when a person's life could be on the line.
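For reference, here is a minimal sketch of the input-box approach, assuming ml5's sketchRNN API. The model name must match one of the pretrained Quick, Draw! categories, which may be why off-list queries drew something unrecognizable or nothing at all.

// Minimal sketch: type a Quick, Draw! category and let sketchRNN draw it.
// Assumes p5.js plus ml5.js sketchRNN; only names from the pretrained model list load.
let model;
let strokePath = null;
let x, y;
let pen = 'down';

function setup() {
  createCanvas(400, 400);
  background(255);
  const input = createInput('cat');
  const button = createButton('draw');
  button.mousePressed(() => {
    background(255);
    x = width / 2;
    y = height / 2;
    pen = 'down';
    // Load a model for whatever category was typed, then start generating strokes.
    model = ml5.sketchRNN(input.value(), () => model.generate(gotStroke));
  });
}

function gotStroke(err, stroke) {
  strokePath = stroke; // {dx, dy, pen}
}

function draw() {
  if (model && strokePath) {
    if (pen === 'down') line(x, y, x + strokePath.dx, y + strokePath.dy);
    x += strokePath.dx;
    y += strokePath.dy;
    pen = strokePath.pen;
    strokePath = null;
    if (pen !== 'end') model.generate(gotStroke);
  }
}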
In Machine Learning for Human Creative Practice, Dr. Rebecca Fiebrink poses the question: "How can machine learning support people's existing creative practices? Expand people's creative capabilities?" Machine learning allows artists to focus on the creative side rather than worrying about execution. Because writing an algorithm from scratch is becoming increasingly unnecessary, AI is no longer limited to programmers and mathematicians. Even a respected programmer in the video spoke about feeling that he no longer needed to think in a mathematical way. With this increased access to machine learning, artists are able to create new user experiences with these new forms of digital interaction.

Dream Idea
This week really opened my eyes to the vast number of applications AI, specifically machine learning, can have. Last year, I worked on a project that used p5.speech and p5.speechrec for voice recognition; it allowed a user to speak into a microphone and have their words transcribed on the screen. This led me to question how machine learning could be used as assistive technology, and I came up with the idea of a lipreading machine that could say and display what the user is mouthing. The input would be a user mouthing words without making any noise, and the output would be the sound and display of the words the user mouthed. Before starting my project, I decided to google whether a lipreading model already existed. I found that in 2016, researchers from Google's AI division DeepMind and the University of Oxford created the "most accurate lip reading software" of their time, with 46.8% accuracy. When I first read it, it seemed a bit odd to consider less than 50% accuracy "accurate," but this shows how far machine learning has come and how far it has to go. More information is in the link below.
www.theverge.com/2016/11/24/13740798/google-deepmind-ai-lip-reading-tv

This week I tried to make a very simple version of my dream concept. Originally, I decided to split the camera capture into 7 parts, one for each of 7 of the most commonly used English letters, and train the machine to output a letter, along with a frequency sound, that could be used to spell out a word. The sketch worked when I had 2 letters, but as I added more, the letters stopped appearing. I believe this is because I didn't train the model well enough, but after training it multiple times I still had the same problem. I added console.log() and saw that the numbers might be coming in too fast, slowing p5 down. To cut down the output letters, I tried changing the outputs into words, but no letters or words would load after that. This could be a fluke, but I noticed that using words also increased the training time of my model. Another problem was getting the words to stay on the screen: when it was working, the letters would only occasionally blink, even when I was still within my set code boundaries (one way to fix this is sketched below).
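On the blinking-label problem: a common fix is to store the most recent classification in a variable and redraw it every frame, accepting new results only every so often, instead of drawing the label only at the moment a result arrives. A minimal sketch of that pattern, with a hypothetical gotResults callback standing in for whichever classifier the sketch used:

// Minimal sketch: keep the latest label on screen and throttle fast-arriving results.
let latestLabel = '';
let lastUpdate = 0;

// Hypothetical callback wired to the classifier used in the sketch.
function gotResults(err, results) {
  if (err || !results || results.length === 0) return;
  // Accept a new label at most every 250 ms so rapid results don't cause flicker.
  if (millis() - lastUpdate > 250) {
    latestLabel = results[0].label;
    lastUpdate = millis();
  }
}

function setup() {
  createCanvas(640, 480);
}

function draw() {
  background(0);
  fill(255);
  textSize(48);
  // Redrawing the stored label every frame keeps it from blinking on and off.
  text(latestLabel, 10, 60);
}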
p5: editor.p5js.org/[email protected]/sketches/WcWuPxb3b
p5: editor.p5js.org/[email protected]/sketches/LxcTlnub_
Cleaned fars data: docs.google.com/spreadsheets/d/1kqsd_KGIqW2djpnB4j6OrTwMT_P8TzSnibXyL-oeaFk/edit?usp=sharing

After playing around with the Titanic dataset, I discovered a few strange things. The only first-class person I could get to die was the 32-year-old woman with a 100-dollar fare; when I changed it to a 50-year-old woman or a 19-year-old man, they were all predicted to live. As I decreased the fare, more and more people began to die. I then decided to use the "fars" dataset on OpenML. I clicked the link (https://data.world/nhtsa/fars-data) given on the OpenML fars dataset page and discovered that it is a dataset of American traffic crashes, used to analyze and find safety countermeasures to prevent traffic fatalities. To qualify for the dataset, a crash "must involve a motor vehicle traveling a traffic-way customarily open to the public and resulting in the death of a person (occupant of a vehicle or a non-motorist) within 30 days of the crash." Each crash in the dataset has over 100 coded data items "that characterize the crash, the vehicles, and the people involved." Overall, I felt the fars dataset was much more straightforward and simple. I cleaned it by following the data wrangling tutorial for the "sex" column, converting 0 and 1 into "male" and "female." But I could only get the data to classify as "male" or "false." Because the fars dataset was originally far too big to import into p5, when I cleaned it I also minimized the data (link below) to see whether the problem was my p5 sketch or my data.
Cleaned fars data: docs.google.com/spreadsheets/d/1kqsd_KGIqW2djpnB4j6OrTwMT_P8TzSnibXyL-oeaFk/edit?usp=sharing
After cleaning, I imported it into p5, where the CSV data came up. However, even after trying 3 different datasets, I kept getting the error "TypeError: Cannot read property 'predict' of undefined."
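Two guesses at these bugs, sketched with hypothetical names since the original sketch isn't shown here. The "male or false" result looks like the classic falsy-zero pitfall when mapping numeric codes to labels, and "Cannot read property 'predict' of undefined" usually means the model variable is used before the library has finished loading it.

// Guess 1: mapping a 0/1 code to a label. Treating the code itself as a boolean
// turns 0 into false; an explicit comparison avoids the "male or false" output.
function sexLabel(code) {
  return Number(code) === 0 ? 'male' : 'female'; // assumed coding: 0 = male, 1 = female
}

// Guess 2: guard against calling predict before the model exists.
let model; // stays undefined until the (hypothetical) load callback fires

function modelLoaded(loadedModel) {
  model = loadedModel;
}

function mousePressed() {
  if (!model) {
    console.log('model not ready yet'); // calling model.predict(...) here would throw the TypeError
    return;
  }
  // model.predict(inputs, gotPrediction); // safe once the model is defined
}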
Dataset: https://www.kaggle.com/aaronschlegel/austin-animal-center-shelter-outcomes-and

What is (are!) the data? The dataset contains shelter outcomes for several types of animals and breeds from 10/1/2013 to the present, with an hourly time frequency. The data is updated daily. The Austin Animal Center's original dataset includes columns for name, date of birth, outcome, animal type, sex and age at time of outcome, breed, and color. Outcomes range widely and include things like adoptions and transfers to other shelters. "Outcomes represent the status of animals as they leave the Animal Center. All animals receive a unique Animal ID during intake. Annually over 90% of animals entering the center, are adopted, transferred to rescue or returned to their owners. The Outcomes data set reflects that Austin, TX. is the largest "No Kill" city in the country."

What format is the data in (CSV, JSON, PDF, . . .)? CSV

What are the dimensions of the data (rows and columns)? 29.4k rows and 37 columns

What are the "variables" (also known as "data items")? In a CSV these would be the column headings. Do you recognize the data types (numbers, strings, images, etc.)? Judging from the columns listed above, the variables are mostly strings (name, outcome, animal type, sex, breed, color), dates (date of birth), and age values.

Is there missing, incorrect, or otherwise problematic data? This dataset only contains animals that were counted when entering the center. It is likely that a large percentage of shelter cats/kittens in Austin are unaccounted for.

How and why was this data collected? The official City of Austin Open Data Portal states that data columns include, but are not limited to, an animal's breed, sex, age, and DOB.

For whom is this data accurate or useful? What is this data unrepresentative of? (Who is missing and left out of the data?) This data is useful for people looking to adopt a cat/kitten. However, it is unrepresentative of the number of homeless cats that are really out there; many are unaccounted for and may never find homes.

Knowing what you know now about machine learning, what will a model trained on this data help you do? Are there alternative (non-machine-learning) methods you could use instead? A model trained on this data could track a distinct feature found only on cats. Using that distinct feature or pose, a scan could be performed over larger areas, such as forested regions, to see if there are any matches.

References:
https://www.kaggle.com/aaronschlegel/austin-animal-center-shelter-outcomes-and
https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238

After looking at the COCO dataset this week, I noticed how different it is from ImageNet. To me, the two biggest differences between the two datasets are loading time and accuracy. While I felt ImageNet could have been more accurate, the results were, more often than not, items that somewhat matched the search. But in COCO, I discovered that there were hardly any images related to what I intended to look up. I thought simple words like "blue" and "flower" would give more explicit results, but even those turned out not to be very accurate. I'm curious how COCO's search works and why it takes so long to load results. Some ethical concerns would definitely be where these images came from and whether their use was consensual.

For this week, Cass and I wanted to explore what visual effect an LED could represent. The idea of a pulsing energy ball came up, so we decided to have the size and other visual effects change as the distance between our hands changes.
In the p5 sketch, we used poseNet to locate the x/y positions of both hands. Although we did get the left and right wrist positions, there was a problem with accuracy. We set the threshold for the confidence score to 0.2, as the poseNet example does. However, it did not work well in our sketch, so we had to lower the score to 0.05, and it was still not stable (I also applied the lerp() function to smooth the values), whereas the example sketch works smoothly with the threshold set to 0.2. We then calculated the midpoint of the two hands to get a center for drawing an ellipse (the actual y location is lower because we want the ball to be centered between the hands rather than at the wrists).

The original plan was to give the outByte data two jobs. One was to contribute to the brightness of the LED, so that when the energy ball gets smaller, the LED gets brighter, representing the energy density within the ball. The other was to control the delay time to make the LED twinkle: when the ball gets smaller, it twinkles more rapidly, representing the intensity. However, we later found it unnecessary to set a parameter for this, because when the two hands get closer (when the ball gets smaller), the position given by poseNet is more unstable, which makes the data jump around dramatically and produces a twinkling effect on its own. Another issue we encountered was with the serial connection. The newest serial controller did not work on the computer, so we had to use the old one. Also, the data I got from the Arduino was very messy: a bunch of random numbers (roughly 0-80) showed up in console.log, with the real data mixed in among them. Luckily, it seems to happen only in the console log and does not affect the sketch. A minimal sketch of the wrist-midpoint and brightness mapping follows below.

p5 Link: https://editor.p5js.org/[email protected]/sketches/GQ9xWhi7t
Documentation: https://www.youtube.com/watch?v=eCwyxgF8AG4&feature=youtu.be
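Here is a minimal sketch of the core mapping described above, assuming ml5's poseNet keypoints and the p5.serialport library; the confidence threshold, port name, and midpoint offset are illustrative.

// Minimal sketch: draw an "energy ball" between the wrists and send its size
// to an Arduino as a brightness byte. Assumes p5.js, ml5.js poseNet, p5.serialport.
let video;
let serial;
let leftWrist, rightWrist;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();
  serial = new p5.SerialPort();
  serial.open('/dev/tty.usbmodem1411'); // hypothetical port name
  const poseNet = ml5.poseNet(video, () => console.log('poseNet ready'));
  poseNet.on('pose', results => {
    if (results.length === 0) return;
    const pose = results[0].pose;
    // The low 0.05 threshold we ended up with; the poseNet example uses 0.2.
    if (pose.leftWrist.confidence > 0.05) leftWrist = pose.leftWrist;
    if (pose.rightWrist.confidence > 0.05) rightWrist = pose.rightWrist;
  });
}

function draw() {
  image(video, 0, 0);
  if (!leftWrist || !rightWrist) return;
  const midX = (leftWrist.x + rightWrist.x) / 2;
  const midY = (leftWrist.y + rightWrist.y) / 2 + 40; // drop below the wrists
  const d = dist(leftWrist.x, leftWrist.y, rightWrist.x, rightWrist.y);
  noFill();
  stroke(255);
  ellipse(midX, midY, d, d);
  // Smaller ball -> brighter LED: map the hand distance inversely to a 0-255 byte.
  const outByte = constrain(round(map(d, 0, width, 255, 0)), 0, 255);
  serial.write(outByte);
}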