034. Web Scraping and Data Science
Episode Synopsis
Data collection is a crucial step in any data-related project. So much so that you have probably encountered the “GIGO” (garbage in, garbage out) concept. Some might even say that having the right data is more important than having tons of data that can’t be used.
Since web scraping is one of the ways to collect data, for this episode we invited Cliff, a data consultant, back to discuss his personal experience with it. He covered topics such as the basics of web scraping, web scraping tools, the challenges he faced while scraping web content, the ethics of web scraping, learning materials, and more!
Resources:
Cliff's Medium post 1: https://medium.com/codex/scraping-singapore-libraries-f74c541f1f94
Cliff's Medium post 2: https://cliffy-gardens.medium.com/iterations-for-my-nlb-scraper-github-code-provided-b4e1f1bd422e
Selenium: https://www.selenium.dev/
BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
TagUI: https://github.com/kelaberetiv/TagUI
Web Scraping with Python: https://www.oreilly.com/library/view/web-scraping-with/9781491985564/
More episodes of the podcast Symbolic Connection
039. What is Generative AI?
28/05/2023
038. AI ethics and who should be responsible
09/08/2022
031. Let talk about MLOps
10/09/2021