Research (to be updated)

Automatic Speech Dubbing

Automatic dubbing can be regarded as an extension of the speech-to-speech translation (STST) problem, which is generally seen as the combination of three sub-tasks: (i) transcribing speech to text in the source language (ASR), (ii) translating text from the source to the target language (MT), and (iii) generating speech from text in the target language (TTS). Independently of the implementation approach, the main goal of STST is to produce output that reflects the linguistic content of the original sentence. Automatic dubbing, on the other hand, aims to replace the speech contained in a video file with speech in a different language, so that the result sounds and looks as natural as the original. Hence, in addition to conveying the same content as the original utterance, dubbing should ideally also match the duration of the original utterance, the lip movements and gestures in the video, the timbre, emotion and prosody of the speaker, and finally the background noise and reverberation of the environment.
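The cascaded structure described above can be sketched in a few lines. This is a minimal illustration only: the three stage functions are toy stand-ins for real ASR, MT and TTS systems, and all names here are hypothetical.

```python
def asr(audio: bytes) -> str:
    # Toy ASR stand-in: pretend the audio decodes to a fixed transcript.
    return "hello world"

def mt(text: str, src: str = "en", tgt: str = "it") -> str:
    # Toy MT stand-in: word-by-word dictionary lookup.
    lexicon = {"hello": "ciao", "world": "mondo"}
    return " ".join(lexicon.get(word, word) for word in text.split())

def tts(text: str) -> bytes:
    # Toy TTS stand-in: return the text as bytes in place of synthesized audio.
    return text.encode("utf-8")

def speech_to_speech(audio: bytes) -> bytes:
    # Cascade the three sub-tasks: ASR -> MT -> TTS.
    return tts(mt(asr(audio)))
```

In a dubbing system, the cascade would additionally be constrained by the original utterance (for instance, the MT and TTS stages would need access to its duration), rather than operating on text alone.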

Symbiotic integration of human and machine intelligence 

Machine and human intelligence can strongly benefit from each other, in multiple ways. There is increasing evidence that AI can boost the productivity of human translators by providing them with draft translations to post-edit. Conversely, we recently showed that human post-edits can be exploited to dynamically adapt AI models. This opens the way to interesting application scenarios, as well as to new research challenges in AI and beyond, such as learning and adapting from human feedback, and optimizing machine performance towards minimum human post-editing effort.
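The feedback loop described above can be illustrated with a deliberately simple sketch: a translator that stores human post-edits and reuses them for repeated inputs. Real adaptive MT updates model parameters rather than a lookup table; the class and method names here are hypothetical.

```python
class AdaptiveTranslator:
    """Toy sketch of adaptation from human post-edits via a translation memory."""

    def __init__(self, base_translate):
        self.base_translate = base_translate  # draft MT system (a callable)
        self.memory = {}  # source sentence -> human-post-edited translation

    def translate(self, src: str) -> str:
        # Prefer a stored human correction when one exists.
        if src in self.memory:
            return self.memory[src]
        return self.base_translate(src)

    def post_edit(self, src: str, corrected: str) -> None:
        # Record the human post-edit as feedback for future inputs.
        self.memory[src] = corrected
```

The same loop also suggests the optimization target mentioned above: the fewer edits humans need to make to the drafts, the better the adapted system is performing.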