Training AI

22/06/2024

Episode Synopsis

From John Gruber today:
It’s fair for public data to be excluded on an opt-out basis, rather than included on an opt-in one [...]
No, no it’s not. This is a critical thing about ownership and copyright in the world. We own what we make the moment we make it. Publishing text or images on the web does not make it fair game to train AI on. The “public” in “public web” means free to access; it does not mean it’s free to use.
Besides that, I’d also add what I’ve seen no one else mention so far: people post content on the web that they don’t own all the time. No one has to prove ownership to post anything.
Someone who publishes my work as their own (theft) or republishes my work (like quoting or linking back) doesn’t have the right to make the choice for me to let my content be used for training AI. This is where I struggle the most with the “opt-out” style of AI training on the web.
Whether reposting my content elsewhere is in good faith or not, it is now up to someone other than me to decide whether or not to disallow AI training web crawlers in their robots.txt file. To add insult to injury, that person may not have the knowledge, or even the power, to do so if they’re posting content they don’t own on a site they also don’t own, like social media.
I can play whac-a-mole with those bots on servers I control—which I don’t like doing, for the record—but I have none of that control anywhere else.
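
For readers unfamiliar with the mechanism being discussed, here is a minimal sketch of what an opt-out looks like on a server you do control. The robots.txt content and the `is_allowed` helper are illustrative assumptions, not anything from the episode; GPTBot and CCBot are used only as examples of crawler user-agent tokens, and the check relies on Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: two example crawlers are blocked by name,
# everything else is allowed. A real file would list whichever bots
# the site owner wants to exclude.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "SomeBrowser"):
        verdict = is_allowed(ROBOTS_TXT, agent, "https://example.com/post")
        print(agent, "allowed" if verdict else "blocked")
```

Note that this only expresses a request. It works when the crawler chooses to honor it, and only on the site whose robots.txt you can edit, which is exactly the limitation described above.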
