Computer Vision | Theory: How Computers See in the Real World
Episode Synopsis
In this episode of Big Ideas Only, host Mikkel Svold takes a theoretical deep dive into how computers “see” with Andreas Møgelmose (Associate Professor of AI, Aalborg University; Visual Analysis & Perception Lab). We unpack the neural-network ideas behind modern vision, why 2012 was a turning point, how convolutional networks work, the difference between training, fine-tuning, and adding context, plus explainability, bias traps, multimodality, and what still needs solving.

In this episode, you’ll learn about:
1. How a 2012 vision breakthrough reshaped speech and language research
2. Neural networks explained simply — how they learn patterns from data
3. CNNs: how computers spot shapes and textures in images
4. Training, fine-tuning, and adding context to make models smarter
5. From hand-crafted features to fully data-driven learning
6. Explainability: the “ruler in skin-cancer photos” bias trap and what it teaches us
7. Multimodal systems: models combining text, images, and tools
8. Depth sensing with stereo, lidar, radar, and time-of-flight — and when 3D is essential
9. Privacy and governance: why real risk lies in implementation, not vision itself
10. Open challenges: fine-grained recognition, explainability, and machine unlearning
11. The pace of progress: steady research with headline-making leaps

Episode Content
01:09 How computer vision differs from other AI fields
01:16 The 2012 breakthrough: neural networks in vision that spread to speech and text
04:05 Neural networks 101: neurons, weights, and simple math scaled up to complex decisions
07:06 Training at scale: millions of images, pretraining, and fine-tuning for specific tasks
10:39 Fine-tuning vs. adding context in large language models; backpropagation explained
16:52 Layered learning: from edges to shapes, faces, and full objects
18:22 Before deep learning: feature engineering and why it hit its limits
20:44 How it’s built: data collection, architecture design, training loops, and learning plateaus
22:54 Bias pitfalls: the “ruler in skin-cancer photos” example and why explainability matters
25:23 Regulation and trust: high-risk uses and the demand for transparency
26:13 Connecting vision to action: from black-box outputs to robots with “vision in the loop”
27:41 Ensemble systems: language models coordinating other models (e.g., text-to-image)
29:03 True multimodality: training models jointly on text and images
30:17 AGI reflections: embodiment, experience, and the limits of data
32:44 Human vision vs. computer vision: depth of field, aperture, and why machines see everything in focus
34:40 Is progress slowing or steady? Research milestones versus quiet, continuous work
36:43 Public perception: many versions, but most still see “just ChatGPT”
37:41 Why the research pace feels natural — more people means faster progress

This podcast is produced by Montanus.
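For readers who want a concrete feel for the “neurons, weights, and simple math” and backpropagation topics discussed in the episode, here is a minimal, self-contained Python sketch: one artificial neuron trained by gradient descent on a toy task. The task, numbers, and learning rate are illustrative assumptions, not material from the episode.

```python
import math

def neuron(weights, bias, inputs):
    # A neuron is just a weighted sum passed through an "activation" (here, sigmoid).
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy task (an assumption for illustration): output 1 when the first input is high.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
w, b, lr = [0.0, 0.0], 0.0, 1.0

for _ in range(1000):
    for x, target in data:
        y = neuron(w, b, x)
        # Gradient of the squared error through the sigmoid: this local
        # error signal is what backpropagation chains through deeper layers.
        grad = (y - target) * y * (1.0 - y)
        w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
        b -= lr * grad

print(neuron(w, b, [1.0, 0.0]))  # trains toward 1
print(neuron(w, b, [0.0, 1.0]))  # trains toward 0
```

Modern vision networks stack millions of such neurons in layers (convolutional ones share weights across image positions), but the learning step is this same simple math scaled up.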