Top 10 questions about web scraping — answered edition

Introduction

I’ll try to define the concept as I would explain it to my younger self. And let’s say I was not the sharpest pencil back then. So let’s get started:

What is web scraping?

If you are not impressed yet, just imagine the alternative: visit the web page -> copy -> paste -> repeat. Doesn’t that sound awful already? How about visiting 100 web pages? Or even 1,000,000? If this doesn’t make your heart skip a beat, then you’re probably a web scraping robot yourself.

Why is web scraping so hot?

  • Saves time

For illustration purposes, let’s say it takes you more time to pronounce “data” than it takes the web scraper to collect it.

  • Tons of information

Web scraping allows you to harvest data in much larger quantities than you could manually.

  • Cost-effective

A web scraper is a tool that never needs a break and won’t have you paying through the nose. You do the math.

  • Customizable

A scraper can be adapted to fit your needs with only minor alterations.

What is web scraping good for?

Of all the players, the ones that really appreciate the power of web scraping are:

  • Marketing & sales
      ◦ Price intelligence data collection
      ◦ Fetching product data
      ◦ Brand protection
      ◦ Competition research
      ◦ Lead generation
      ◦ Content aggregation
      ◦ Marketing communication verification
      ◦ Monitoring consumer sentiment
      ◦ SEO audit & keyword research
  • Public relations
      ◦ Brand monitoring
  • Data analytics & data science
      ◦ AI & machine learning
  • Strategy
      ◦ Building a product
      ◦ Market research
  • HR: Collecting candidate data
  • Real estate
      ◦ Forecasting market direction
      ◦ Property value tracking
      ◦ Real estate aggregators
      ◦ Monitoring vacancy rates

Is data extraction legal?

There is no precise yes/no answer to the issue of the legality of the process. Many variables influence this answer, and some can change depending on the country’s laws and regulations.

What you can do is check the Terms of Service, where websites usually specify if they prohibit scraping their content or not.

As a rule of thumb, it is recommended to stay away from personal data and copyright-protected data. That said, it is sometimes OK to scrape the latter, as long as you don’t plan to republish it or claim it as yours.

The act of web scraping itself can be perfectly legal, but what information you choose to gather and what you intend to do with it can have legal ramifications.

Can I scrape data behind a login page?

Another trick is to check the robots.txt file, which lists the paths a website allows or disallows for automated crawlers. All you have to do is append “/robots.txt” to the site’s root URL (https://www.example.com/robots.txt) and follow the rules there.
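If you’d rather not read the rules by eye, here is a minimal sketch of the same check using Python’s standard urllib.robotparser module. The robots.txt content and the “my-scraper” user agent below are made up for illustration; a real site’s file will look different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "my-scraper" is an assumed user-agent name for this sketch.
print(parser.can_fetch("my-scraper", "https://www.example.com/products"))
print(parser.can_fetch("my-scraper", "https://www.example.com/private/data"))
```

Against a live site, you would call set_url() with the robots.txt address and then read() instead of parsing a string.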

What kind of web scraping tools are there?

Browser extensions

Even though a browser extension can accomplish almost the same task, it usually lacks some of the anti-tracking features needed to get data from more complex websites.

APIs

The ease with which an API can be integrated into an application is one of its most appealing features. Basically, all you need is a set of credentials and a clear understanding of the API documentation.

Also, a scraping API typically comes with built-in solutions that help keep your scraper from getting blocked.
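To give a feel for what “a set of credentials and the documentation” boils down to, here is a rough sketch of building a request to a scraping API. The endpoint and the parameter names (api_key, url) are hypothetical placeholders, not any real provider’s interface; your API’s documentation will tell you the actual ones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
API_ENDPOINT = "https://api.example.com/v1/scrape"


def build_request_url(api_key: str, target_url: str) -> str:
    """Combine the credentials and the page to scrape into one request URL.

    The parameter names "api_key" and "url" are assumptions for this sketch.
    """
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


request_url = build_request_url("YOUR_API_KEY", "https://www.example.com/products?page=1")
print(request_url)
```

Note that urlencode() escapes the target URL, so its own query string doesn’t get mixed up with the API’s parameters.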

If you want to learn more, I recommend taking a break of 7 Minutes to Decide What Web Scraping Tool Is Best for You

Can I make my own web scraper?

The Python tutorials are written by yours truly.
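To show how small the core of a scraper can be, here is a minimal sketch that pulls every link out of a piece of HTML using only Python’s standard library. The sample HTML and the names LinkCollector and extract_links are made up for this example; a real scraper would first download the page and would need error handling on top.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list:
    """Return all link targets found in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Made-up HTML standing in for a downloaded page.
SAMPLE_HTML = '<ul><li><a href="/shoes">Shoes</a></li><li><a href="/hats">Hats</a></li></ul>'
print(extract_links(SAMPLE_HTML))
```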

Can web scraping be detected?

If they want to, websites can tell web scrapers apart from real users by tracking browser activity, examining the IP address, setting honeypots, adding CAPTCHAs, or even limiting the request rate.

Anti-detection methods go a long way, but they’re not infallible. It’s not a question of ‘if,’ but of ‘when’ you’ll be detected. Here’s what you can do, though:

How to protect the scraper from getting blocked?

1. A strong proxy pool

2. Geolocation options

3. Rotating proxies

4. Anti-fingerprinting measures
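As a rough illustration of points 1 and 3, here is a sketch of rotating through a small proxy pool with Python’s standard library. The proxy addresses are placeholders from a documentation-only IP range, and the helper names are made up; a real pool would come from your proxy provider.

```python
from itertools import cycle
import urllib.request

# Placeholder proxies (203.0.113.0/24 is reserved for documentation).
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def proxy_rotation(pool):
    """Return an endless iterator that walks the pool round-robin."""
    return cycle(pool)


def opener_for(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS traffic through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


rotation = proxy_rotation(PROXY_POOL)
for _ in range(4):
    proxy = next(rotation)
    opener = opener_for(proxy)
    # opener.open("https://www.example.com/") would send the request via this proxy.
    print("next request goes through", proxy)
```

Round-robin is just the simplest choice here; picking proxies at random or retiring ones that get blocked are common variations.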

Where to start with web scraping?

I recommend reading this list of 20 Tools You Won’t Want to Miss.

I hope you have a good, stress-free day, with fewer unanswered questions about web scraping!

Passionate content writer and creative marketer 🖊️
