Listen "#84- FineWeb, the best dataset to pre-train LLMs."
Episode Synopsis
Hey guys, in this episode I talk about FineWeb, the best open-source pre-training dataset to date. I explain how the Hugging Face team created the dataset, and I also share some results.
Link to the Hugging Face blog: https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
LinkedIn of the podcast: https://www.linkedin.com/company/life-with-ai
More episodes of the podcast Life with AI
#99- GraphRAG.
05/12/2024
#98- On-device AI with SmolLM.
07/11/2024
#96- Maritaca AI, the Brazilian LLM company.
24/10/2024
#95- Why Chain of Thought works?
26/09/2024
#94- OpenAI o1
19/09/2024
#93- Different types of AI.
12/09/2024
#92- Llama3 benchmarks, vision and speech.
22/08/2024
#91- Llama 3 training.
15/08/2024
#90- Llama 3 paper overview.
25/07/2024