
Scraping & Crawling

Before you start...

Check what your target has to offer. Many websites provide a public API precisely so they don't get hit by thousands of different scrapers. Using it will not only save you time; the structured responses an API returns also give you cleaner data with far less maintenance.
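
For example, if the target documents a public API, pulling structured JSON from it is usually much less work than parsing HTML. A minimal sketch with Python's requests library, assuming a hypothetical endpoint and field names:

```python
import requests

# Hypothetical public API endpoint -- check your target's developer docs
# for the real one before falling back to HTML scraping.
API_URL = "https://api.example.com/v1/products"

response = requests.get(API_URL, params={"page": 1}, timeout=10)
response.raise_for_status()

# A JSON payload is already structured, so there is no HTML parsing to maintain.
for item in response.json().get("items", []):
    print(item.get("name"), item.get("price"))
```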

Expect the unexpected

As scraping & crawling become more and more popular, website owners tend to tighten their security to keep their sites from going down under the volume of incoming requests. Investigate how your target handles security before you start, as unexpected blocks are one of the biggest setbacks once your scraper or crawler is already in business.
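
One practical way to prepare is to watch for the status codes targets typically return when they push back (403, 429, 503) and back off instead of retrying immediately. A rough sketch, assuming Python's requests library; the URL and the delay values are placeholders:

```python
import time
import requests

def fetch_with_backoff(url, max_retries=3):
    """Retry on common anti-bot / overload responses instead of failing blindly."""
    delay = 5  # seconds; starting value is only an example
    for attempt in range(max_retries):
        response = requests.get(url, timeout=10)
        if response.status_code in (403, 429, 503):
            # The target is pushing back -- wait and try again rather than hammering it.
            time.sleep(delay)
            delay *= 2
            continue
        response.raise_for_status()
        return response
    raise RuntimeError(f"Still blocked after {max_retries} attempts: {url}")

page = fetch_with_backoff("https://example.com/some-page")
```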

Work with robots.txt

It's important to know what your target allows you to crawl, since robots.txt can show you where you will and will not run into additional security roadblocks. It can also save you a lot of time by pointing at where the information you need actually lives, as these files are maintained for SEO purposes. Almost every website exposes this file at a path like yourwebtarget.com/robots.txt, for example https://smartproxy.com/robots.txt
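
Python's standard library can parse robots.txt for you, so your crawler can check a path before requesting it. A small sketch using urllib.robotparser; the user agent string and the path being checked are just examples:

```python
from urllib.robotparser import RobotFileParser

robots = RobotFileParser("https://smartproxy.com/robots.txt")
robots.read()

# "MyScraper" and the path below are placeholder examples -- use your own
# user agent and the URLs you actually plan to request.
if robots.can_fetch("MyScraper", "https://smartproxy.com/blog"):
    print("Allowed to crawl this path")
else:
    print("Disallowed by robots.txt -- skip it")
```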

Look for traps

The easiest way for a site owner to detect a scraper or a crawler is to plant a link that no regular user can see on page load; only something reading the raw HTML will find and follow it. Inspect your target with the built-in tools in Chrome or Firefox; simply hit F12 to open Developer Tools. In most cases these trap links are hidden with additional CSS.
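
As a rough illustration, a parser such as BeautifulSoup can flag links hidden with inline CSS, which is one common honeypot pattern; traps hidden through external stylesheets or JavaScript would need a rendered-page check instead. The HTML snippet below is invented for the example:

```python
from bs4 import BeautifulSoup

html = """
<a href="/products">Products</a>
<a href="/trap-page" style="display:none">Hidden trap</a>
"""

soup = BeautifulSoup(html, "html.parser")

for link in soup.find_all("a"):
    style = (link.get("style") or "").replace(" ", "").lower()
    # Links hidden with inline CSS are a common honeypot pattern; traps hidden
    # via stylesheets or JavaScript need a rendered-page check instead.
    if "display:none" in style or "visibility:hidden" in style:
        print("Possible trap, do not follow:", link["href"])
```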

Have your connection look human-like

Every website tracks the requests it receives, and some take extreme security measures and inspect each request's whole fingerprint. When sending requests from scrapers or crawlers, make sure you include a User-Agent header and, if needed, all of the required cookies. In some cases you may also need to follow a certain browsing path for the requests to go through, since asking for certain links directly can be a clear indication that the request is not genuine.
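
A minimal sketch of the idea using Python's requests library: a session keeps cookies between requests, carries a browser-like User-Agent, and visits a listing page before the detail page. The URLs and the exact User-Agent string are placeholders:

```python
import requests

session = requests.Session()

# A browser-like User-Agent; the exact string here is only an example.
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
})

# Visit the listing page first so the session picks up any cookies the site
# sets, then request the detail page -- mimicking a normal browsing path.
session.get("https://example.com/category/shoes", timeout=10)
detail = session.get("https://example.com/category/shoes/item-123", timeout=10)
print(detail.status_code)
```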

Be responsible with request amounts

It's important to understand that every request you send to a target adds to its current load. Sending too many requests too quickly will slow down your own process and can leave the website unavailable for longer stretches. Being smart about the number of requests helps you get quality results faster and lowers the chance that the site owner starts investigating incoming traffic and tightens security, which would force many additional changes to your scraper.
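
One simple way to stay polite is to pause between requests, with a bit of randomness so the traffic doesn't look perfectly mechanical. A sketch, assuming Python's requests library; the URLs and the one-to-three-second range are arbitrary examples to tune for your target:

```python
import random
import time
import requests

# Hypothetical list of pages to fetch.
urls = [f"https://example.com/page/{n}" for n in range(1, 6)]

for url in urls:
    response = requests.get(url, timeout=10)
    print(url, response.status_code)
    # Pause between requests so the target is not flooded; the 1-3 second
    # range is only an example -- adjust it to what the site can handle.
    time.sleep(random.uniform(1, 3))
```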
