Comments
-
I've enjoyed PhantomJS/CasperJS, but the project is essentially dead now. I know a ton of people use Selenium, and headless Chrome is becoming a popular solution.
-
Kimmax11106 6y: Don't mind me, I'm just waiting to shit on the person who's going to recommend regex 👀
-
vocuzi8257 6y: Sites not using JS: Requests + BeautifulSoup.
If the site uses JS, add:
- Chromedriver + Selenium
- Scrapy Splash (one beautiful piece of work)
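For the no-JS case above, here is a rough sketch of the fetch-then-parse flow. To keep it dependency-free it swaps in the standard library (`urllib.request` and `html.parser`) as a stand-in for Requests + BeautifulSoup; the flow is the same, the libraries are just nicer. The names `LinkCollector`, `extract_links`, `fetch` and the User-Agent string are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return its links; no network involved."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download a page as text. The User-Agent value is a placeholder."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be something like `extract_links(fetch(url))` for a URL of your choosing. Keeping the parsing separate from the download makes it easy to test against saved HTML.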
Or use Scrapy, a crappy lib but okay.
-
@justin-tamblyn
When there are countermeasures, you often need to run your script through a proxy to change up your IP address, or you'll just get blocked.
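A minimal sketch of that proxy idea, assuming you already have a list of proxies you are allowed to use. The proxy addresses below are placeholders, and `proxy_rotation`, `opener_for` and `fetch_via_next_proxy` are hypothetical helper names; only `urllib.request.ProxyHandler` and `build_opener` are real standard-library calls.

```python
import itertools
import urllib.request

# Placeholder proxy endpoints (assumption: swap in real ones you control).
PROXIES = [
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
]


def proxy_rotation(proxies):
    """Return an endless round-robin iterator over the given proxies."""
    return itertools.cycle(proxies)


def opener_for(proxy):
    """Build an opener that sends HTTP and HTTPS traffic through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def fetch_via_next_proxy(url, rotation, timeout=10):
    """Fetch a URL through the next proxy in the rotation.

    Returns the proxy that was used along with the raw response body.
    """
    proxy = next(rotation)
    with opener_for(proxy).open(url, timeout=timeout) as resp:
        return proxy, resp.read()
```

Create the rotation once with `proxy_rotation(PROXIES)` and pass it to each `fetch_via_next_proxy` call, so consecutive requests go out through different addresses.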
What is the best way to scrape websites?
question
scraping