4
aarfa
6y

What is the best way to scrape websites?

Comments
  • 5
    Python + BeautifulSoup
  • 1
    I've enjoyed PhantomJS/CasperJS, but the project is essentially dead now. I know a ton of people use Selenium, and headless Chrome is becoming a popular solution.
  • 1
    An implementation of curl and an HTML parser
  • 0
    Greasemonkey and Node.js
  • 1
    Don't mind me, I'm just waiting to shit on the person who's going to recommend regex 👀
  • 0
    Sites not using JS: Requests + BeautifulSoup

    If the site uses JS, add:
    - Chromedriver + Selenium
    - Scrapy Splash (one beautiful piece of work)

    Or use Scrapy, a crappy lib but okay
  • 0
    @justin-tamblyn

    When there are countermeasures, you often need to run your script through a proxy to change up your IP address, or you'll just get blocked.
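
For anyone who wants to see the "fetch a page, then parse the HTML" approach from the comments in code, here is a minimal sketch. It swaps in Python's standard library (`urllib.request` and `html.parser`) for Requests + BeautifulSoup so that it runs without installing anything; the shape would be the same with those libraries. The names `LinkCollector`, `extract_links`, `fetch`, and the `example-scraper/0.1` user agent are made up for illustration, and the optional `proxy` argument is only one simple way to route traffic through a proxy, as the last comment suggests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import ProxyHandler, build_opener


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Parse an HTML string and return the links found in it."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, proxy=None, timeout=10):
    """Download a page as text, optionally through a proxy (hypothetical helper)."""
    handlers = []
    if proxy:
        # Assumption: the same proxy URL serves both HTTP and HTTPS traffic.
        handlers.append(ProxyHandler({"http": proxy, "https": proxy}))
    opener = build_opener(*handlers)
    opener.addheaders = [("User-Agent", "example-scraper/0.1")]
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A call might look like `extract_links(fetch("https://example.com/"), "https://example.com/")`, with `proxy="http://127.0.0.1:8080"` passed to `fetch` as a placeholder address if you need one. This only covers pages that work without JavaScript; for JS-heavy sites you would still need a browser-driven tool such as Selenium.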