Research
Automatic Speech Dubbing
Automatic dubbing can be regarded as an extension of the speech-to-speech translation (STST) problem, which is generally seen as the combination of three sub-tasks: (i) transcribing speech to text in a source language (ASR), (ii) translating text from the source to a target language (MT), and (iii) generating speech from text in the target language (TTS). Independently of the implementation approach, the main goal of STST is to produce output that reflects the linguistic content of the original sentence. Automatic dubbing, on the other hand, aims to replace the speech contained in a video file with speech in a different language, so that the result sounds and looks as natural as the original. Hence, in addition to conveying the content of the original utterance, dubbing should ideally also match the duration of the original utterance, the lip movements and gestures in the video, the timbre, emotion and prosody of the speaker, and finally the background noise and reverberation of the environment.
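To make the cascade and the duration constraint concrete, here is a minimal sketch of an STST pipeline with a simple isochrony (duration-matching) step added to TTS. The functions `transcribe`, `translate` and `synthesize`, and the assumed per-word speaking rate, are hypothetical stand-ins for real ASR/MT/TTS components, not an actual API:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    duration_s: float  # how long the speech lasts in the video

# Hypothetical stand-ins for real ASR, MT and TTS models; names,
# signatures and the 0.15 s/word base speaking rate are illustrative.
def transcribe(audio: bytes) -> Segment:            # (i) ASR
    return Segment(text="hello world", duration_s=1.2)

def translate(text: str) -> str:                    # (ii) MT
    return "hallo Welt"

def synthesize(text: str, rate: float) -> Segment:  # (iii) TTS
    # Speaking `rate` times faster shortens the output proportionally.
    base = 0.15 * len(text.split())
    return Segment(text=text, duration_s=base / rate)

def dub(audio: bytes) -> Segment:
    """Cascade ASR -> MT -> TTS, then adjust the speaking rate so the
    dubbed speech matches the duration of the original utterance."""
    source = transcribe(audio)
    target_text = translate(source.text)
    draft = synthesize(target_text, rate=1.0)
    # Isochrony constraint: scale the speaking rate by the duration ratio.
    rate = draft.duration_s / source.duration_s
    return synthesize(target_text, rate=rate)

if __name__ == "__main__":
    # Dubbed duration now matches the 1.2 s of the source utterance.
    print(dub(b"..."))
```

In practice the duration constraint is better enforced earlier, e.g. by biasing MT towards translations whose length fits the available time slot, but the rate-scaling step above illustrates the basic idea.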
Symbiotic integration of human and machine intelligence
Machine and human intelligence can strongly benefit from each other in multiple ways. There is increasing evidence that AI can boost the productivity of human translators by providing them with draft translations to post-edit. Conversely, we recently showed that human post-edits can be exploited to dynamically adapt AI models. This opens the way to interesting application scenarios, as well as to new research challenges in AI and beyond, such as learning and adapting from human feedback, and optimizing machine performance towards minimum human post-editing effort.
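One concrete proxy for "human post-editing effort" is HTER, the word-level edit distance between the machine output and its human post-edit, normalized by the post-edit length. Below is a minimal sketch of a simplified variant that, unlike full TER, ignores block shifts; the example strings are illustrative:

```python
def hter(machine_output: str, post_edit: str) -> float:
    """Approximate human-targeted translation edit rate (HTER):
    word-level edit distance from the MT output to its post-edited
    version, normalized by the post-edit length.
    Simplification: block shifts (moves) are not modeled."""
    hyp, ref = machine_output.split(), post_edit.split()
    # Classic dynamic-programming Levenshtein distance over words.
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(hyp)][len(ref)] / max(len(ref), 1)

# One substitution out of six post-edit words -> effort of 1/6 ~ 0.17.
print(hter("the cat sit on the mat", "the cat sat on the mat"))
```

A metric like this gives the "minimum human post-editing effort" objective an operational form: an adaptive model can be evaluated, and in principle optimized, on how little its outputs need to be corrected over time.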