EP06: A Product Developer, Blind Technology Advocate, and Computer Vision Researcher Discuss the Future for Visual Interpretation Technologies (Robin, Saqib, Marcus)

01/08/2022 35 min Season 1 Episode 6


Episode Synopsis

Brief summary of the episode
Robin Christopherson, Saqib Shaikh, and Marcus Rohrbach share diverse perspectives on developing visual interpretation technologies that meet the interests and needs of people with vision impairments.
Questions asked in the episode

[02:22] Could you share what has surprised you the most about the progress made over the past 10-20 years in technologies that provide visual assistance to real-world users?
[08:10] What do you see as the current limiting factor or barriers in developing better visual interpretation technologies?
[11:52] Could you describe how you envision technology will work in 10 years for interpreting visual information for real-world users? For example, what skills will the technology have? Also, how will the technology deliver information, such as via a live video feed, augmented reality, or something else?
[17:00] Could you discuss how you think we should decide what information to include in a visual description?
[22:17] I next want to dig into one of the issues that is critical for designing vision assistance technology, which is access to large datasets from people with vision impairments to support evaluation and training of computer vision models. What are your expectations about how such datasets can be built responsibly and any experience you have in building such datasets?
[30:48] To what extent does each of you already talk or collaborate with researchers, industry developers, and blind technology advocates to advance visual assistance products and services? What works well, and what does not, in these collaborations or conversations?

Guest bios
Robin Christopherson is a co-founder and Head of Digital Inclusion at AbilityNet. His work has led to accessibility improvements in many organizations spanning industry, government, and universities. Robin has also served as an expert technical witness on assistive technology in software, systems, and websites.
Saqib Shaikh is an Engineering Manager at Microsoft, where he founded Seeing AI, an app that enables someone who is visually impaired to hold up their phone and hear more about the text, people, and objects in their surroundings.
Marcus Rohrbach is a Research Scientist at Meta AI Research, with a PhD from the Max Planck Institute for Informatics. Marcus is best known for his work at the intersection of computer vision and natural language processing. Over his career, he has driven key progress in visual question answering, language grounding, and generating descriptions of images and videos, particularly movies, all of which are highly relevant to the blind/low-vision community.
Danna Gurari is an Assistant Professor at the University of Colorado Boulder, where she also leads the Image and Video Computing research group.
Links to resources mentioned

https://vizwiz.org/workshops/2022-workshop/