Ethical AI Data
Episode Synopsis
Ethical concerns about the use of AI have to start with training data. Too often, the primary concern is simply generating sufficient data, rather than understanding its nature. Emily Jasper and Abby Simmons are back to continue the conversation started in episode 198 with host Eric Hanselman. With generative AI, the data is the application in its most formative sense. Unlike traditional application development, where the expectation is that functionality will be expanded in later releases, GenAI applications require careful design of training data before training takes place. The perspectives contained in data age rapidly, and model training doesn't differentiate between outdated and current indications. Old data can effectively poison model outputs. Businesses risk alienating customers with models trained on data that doesn't properly represent them. This is particularly true of marginalized communities, where language and context can change over shorter time frames. While there is research work on model retraining, work in AI today has to focus on effective data quality management. DeepSeek is causing a significant rethinking. Human data cleansing can be effective, but it can't scale to AI demands. Data workbench tools and synthetic data approaches can help, but better automation is needed to ensure that data sets are truly representative. Data collection and data sourcing need much greater attention to ensure that model results engage the target audience rather than become a liability. It's a fundamental question of accountability that requires thinking in ways that differ from legacy development processes.
Mentioned in this episode:
https://transtechtent.com
https://kevinguyan.com/queer-data/

More S&P Global Content:
Webinar: AI in Action: Leveraging NLP to Answer Subjective Questions
2025 Trends in Data, AI & Analytics
Take 5: Data quality and AI — a bidirectional relationship
Compliance automation, Part 1: Governance, risk and compliance, or something new?

Credits:
Host/Author: Eric Hanselman
Guests: Emily Jasper, Abby Simmons
Producer/Editor: Kyle Cangialosi and Odesha Chan
Published With Assistance From: Sophie Carr, Feranmi Adeoshun, Kyra Smith