Multimodal Speech-to-Text Transcription

This project extends speech transcription with additional sources of information, primarily video. Visual cues such as gestures, lip movements, and visible speaker activity are to be used to improve the transcript. Non-verbal expressions such as nodding or shaking the head should also be recognised and carried over into the transcript. Students can actively contribute to defining the scope of the work.

Further information
- Semester or Master’s thesis for 1-2 people
- 40% theory, 60% implementation
- Prerequisites: Signal processing, Python
- German is commonly spoken within the company; basic proficiency is helpful and appreciated.

Have we sparked your interest?

I am interested in the project "Multimodal Speech-to-Text Transcription" and would like to find out more.
