av-cult: Machine learning solutions for reducing exclusion of persons with hearing loss from cultural content


Hearing loss is a common health problem in today’s societies. For example, there are around 50 million “hard of hearing” people in the EU (around 9% of the total population). One of the biggest challenges for listeners with mild or severe hearing impairment is understanding speech, music and, more generally, the sounds contained in cultural works such as theatrical plays, movies and concerts. (Sound can also matter in cultural works that do not use audio as their main medium: a visual art exhibition, for example, does not rely on sound as a primary channel, yet audio can still be part of human-to-human communication or a guided tour.) This difficulty increases the exclusion of people with hearing loss from such activities and therefore significantly affects their quality of life.

Advances in hearing aid technology have only partly improved the ability of people with mild hearing loss to participate in such cultural events, and they have not helped in cases of severe hearing loss. At the same time, modern artificial intelligence applications have made it possible to automatically recognize auditory content, transcribe raw speech to text and present it to users as enhanced subtitles in Augmented Reality (AR) systems or via haptic interfaces. In addition, deep-learning-based computer vision can help improve communication for persons with hearing loss through sign language recognition and facial expression analysis. These advances in automatic sound analysis, AR systems, computer vision, and haptic interfaces can lead to new applications that enable even people with complete hearing loss to follow a theatrical play, fully experience a museum tour or “feel” a live music performance.
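As a simple illustration of the subtitling side of such a pipeline, the sketch below formats timestamped speech recognition output into SRT-style subtitle blocks. The helper names and the `(start, end, text)` segment format are assumptions for this sketch; real ASR engines deliver timestamps in engine-specific structures.

```python
# Minimal sketch: turn timestamped ASR output into SRT-style subtitles.
# The (start_sec, end_sec, text) segment format is an assumption made
# here for illustration; adapt it to the output of your ASR engine.

def fmt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render an iterable of (start_sec, end_sec, text) as SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{fmt_time(start)} --> {fmt_time(end)}\n{text}\n")
    return "\n".join(blocks)

# Example: two recognized segments from a (hypothetical) theatrical play.
segments = [(0.0, 2.5, "To be, or not to be,"),
            (2.5, 5.0, "that is the question.")]
print(to_srt(segments))
```

The same segment stream could equally feed an AR overlay or a haptic encoding instead of a subtitle file; SRT is used here only because it is a widely understood, plain-text target.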

This workshop invites contributions focusing on solutions that adopt: (a) modern audio/music/speech/image/video analysis methods and/or (b) innovative human-machine interaction pipelines, in order to boost the ability of persons with hearing loss to follow theatrical plays, concerts, movies, museum tours, art exhibitions and cultural content in general.


This workshop aims to attract an interdisciplinary group of researchers from fields such as auditory scene analysis, automatic speech recognition and speech analytics, affective computing, music perception, hearing aid technology, human-computer interaction and psychology, all focused on finding solutions for making cultural content that uses sound as a basic medium more easily accessible to people with hearing loss. As part of the larger theme of the PETRA conference, participants will also have the opportunity to interact with top scientists working on pervasive assistive technologies and exchange valuable ideas that could advance the state of the art in the field.

Topics of interest include, but are not limited to:

  • TV/cinema live transcription and subtitling methods and applications
  • Music perception enhancement for hearing-impaired persons
  • Auditory scene analysis for hearing impaired applications
  • Cochlear implant technology for music perception
  • Live transcription for improving human-to-human communication
  • Speech emotion recognition
  • Sign language translation
  • Sign language synthesis
  • Multimodal music experience
  • Auditory scene visualization
  • Visualization of prosodic aspects of speech, emotions and behaviors

Workshop Organizers

Dr. Theodoros Giannakopoulos
Researcher of Multimodal ML, NCSR Demokritos, Greece

Ms Sofia Eleftheriou
Research Assistant, NCSR Demokritos, Greece

Mr Panagiotis Koromilas
Research Assistant, NCSR Demokritos, Greece