Screen-to-Soundscape takes an experimental approach to re-imagining screen readers by addressing their current limitations for blind and visually impaired users. Our goal is to develop a free and open-source exploratory tool that transforms a screen into an immersive soundscape, with a strong focus on providing rich, descriptive alt-text for images and maps. Using open-source computer vision algorithms, the system analyzes visual elements to generate detailed, customizable alt-text tailored to user preferences, offering a more comprehensive understanding of visual content. Additionally, the prototype features spatial audio, using multiple layered voices to read out content, which should enhance users' navigation of and interaction with digital content.
Our motivation is to provide a more intuitive and engaging navigation experience. Traditional screen readers often skip images, videos, and maps, and offer limited customization, especially in voice diversity. By incorporating spatial audio, novel computer vision algorithms, diverse voice options, and customizable alt-text, our tool aims to make all content accessible and lets users personalize their auditory experience, making digital navigation more natural and comprehensive.
Screen-to-Soundscape is supported by the Constant Foundation, The Processing Foundation, and the Stimuleringsfonds.
Collaborators: Alyssa Gersony, Bruno Defalque, Chris Alexandre, Colette Aliman, Dan Xu, Raphaël Bascour, Vincent Leone, Vladimir Nani, Luis Morales-Navarro, Stefan Laureijssen
Read more about Screen-to-Soundscape at www.screentosoundscape.com
Try It: Hear This Page
Enable spatial audio, then hover over any text block below. You'll hear a spatialized tone played from the block's position in 3D space, followed by the text read aloud. Blocks on the left of the screen sound from your left ear, those on the right from your right; blocks near the top sound farther away, those near the bottom closer.
Screen readers often skip images, videos, and maps, leaving blind users with an incomplete picture of digital content.
Our tool uses computer vision to generate rich, descriptive alt-text for images, making visual content accessible through sound.
Spatial audio uses multiple layered voices positioned in 3D space, so content on the left of the screen sounds like it comes from your left ear.
Traditional screen readers offer a single monotone voice. We provide diverse voice options and customizable narration styles.
Screen-to-Soundscape is free and open-source, supported by Constant, The Processing Foundation, and the Stimuleringsfonds.
Best experienced with headphones for full spatial effect. Each block has a unique tone positioned in 3D space.
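The position-to-sound mapping described above can be sketched as a small pure function: a block's horizontal placement becomes stereo pan, and its vertical placement becomes listener distance. This is a minimal sketch under our own assumptions; the function name, value ranges, and falloff curve are illustrative, not the project's actual API.

```javascript
// Map a block's bounding box to a spatial audio position.
// Horizontal placement becomes stereo pan (-1 = left ear, +1 = right ear);
// vertical placement becomes distance (top of the page = far, bottom = near).
// Names and ranges here are illustrative, not Screen-to-Soundscape's real API.
function blockToAudioPosition(rect, viewport) {
  const centerX = rect.left + rect.width / 2;
  const centerY = rect.top + rect.height / 2;
  const pan = (centerX / viewport.width) * 2 - 1;           // -1 .. +1
  const distance = 1 + (1 - centerY / viewport.height) * 4; // 1 (near) .. 5 (far)
  return { pan, distance };
}
```

In a browser, a position like this could drive a Web Audio `StereoPannerNode` for the left/right placement and a `GainNode` whose gain falls off with distance, so nearer blocks sound both closer and louder.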
Phase 1B Prototype: Wikipedia Soundscape Generator
Enter a Wikipedia article to explore it as a 3D soundscape. Walk through sections with arrow keys, hear singing bowl beacons from each element's position, and listen to spatial text-to-speech. Best with headphones.
Open full-screen for the best experience.
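Walking through sections with arrow keys can be modeled as moving a virtual listener along a line of section positions, with each beacon heard relative to the listener. The sketch below is our own simplified model of that interaction; the prototype's actual internals may differ, and all names are hypothetical.

```javascript
// Model a listener walking through article sections with arrow keys.
// Sections sit at integer positions along a line; a beacon's loudness
// falls off with its distance from the listener. Illustrative sketch only.
function makeWalk(sectionCount) {
  let position = 0; // index of the section the listener stands at
  return {
    // "ArrowDown" advances one section, "ArrowUp" goes back, clamped to bounds.
    move(key) {
      if (key === "ArrowDown") position = Math.min(sectionCount - 1, position + 1);
      if (key === "ArrowUp") position = Math.max(0, position - 1);
      return position;
    },
    // The beacon at the listener's section plays at full gain;
    // gain halves one section away, and keeps falling off with distance.
    beaconGain(sectionIndex) {
      return 1 / (1 + Math.abs(sectionIndex - position));
    },
  };
}
```

Wiring this to real input would mean listening for `keydown` events and checking `event.key`, then updating each beacon's gain node from `beaconGain`.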