Captcha is a way for website owners to tell whether the traffic on their website is genuine. It helps distinguish human visitors from automated traffic and, in some cases, protects data from web crawlers and other botting software.
There are many ways to trigger a Captcha, and most of them depend on the security settings of the website. Commonly, a Captcha appears when filling in a registration form, visiting certain domains from public networks, refreshing the same page repeatedly, and so on.
There are many types of Captcha you may face while browsing the web. Most require you to type the symbols shown on the screen; others ask you to select pictures or solve a puzzle. The most popular and most frequently seen Captcha is Google's reCAPTCHA.
There are several ways to identify whether you are being served a Captcha; here are some common signs:
- The requested content does not come back, or comes back only partially.
- Your scraper/crawler returns a response with a Captcha inside it.
- Your requests are timing out.
- Instead of a 200 HTTP response code, you receive codes such as 40x or 50x.
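The checks above can be sketched as a small helper. This is a minimal illustration, not an exhaustive detector: the marker strings and status codes below are common choices, but the exact values a given site uses are assumptions you should adjust for your target.

```python
# Heuristic Captcha detection for a scraper: flag responses by status code
# or by Captcha-related markers in the body.
CAPTCHA_MARKERS = ("captcha", "g-recaptcha", "hcaptcha")

def looks_like_captcha(status_code, body):
    """Return True if the response status or body hints at a Captcha challenge."""
    # 403/429/503 are the codes most commonly returned alongside challenges.
    if status_code in (403, 429, 503):
        return True
    # Fall back to scanning the body for well-known Captcha markup.
    lowered = body.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)
```

In practice you would pass in `response.status_code` and `response.text` from whatever HTTP client you use.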
The forms a Captcha takes, and the combinations of actions that trigger it, vary widely and depend on your setup. Here are some general tips for avoiding Captchas while using a proxy network:
- If you are using a bot, try different endpoints or rotating ports of our service.
- Randomize your request timing in the application if possible. For example, Jarvee offers custom delays on certain actions, which can make your traffic look more genuine.
- If you are writing custom code for a scraper/crawler, keep a large list of different User-Agents and rotate through them; this helps cover your tracks while visiting the website. A User-Agent is a header sent with your request that identifies your client to the website. It usually looks like the following:
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0
- Avoid using direct links in your bots that are not publicly available on the website's pages and can only be found by digging into its source code; requesting them is a clear bot signal.
- Where possible, shape your traffic by visiting and following the paths the website itself provides rather than constantly requesting a specific link directly.
- Limit your requests so you do not strain the website itself; aggressive traffic instantly triggers more safety features than your code or application is prepared to handle, such as Cloudflare shields.
- If possible, use a headless browser driven by a framework such as Selenium.
- If you are writing custom code, check the headers you are sending and the ones you are receiving. Some HTTP libraries add headers that may give you away, and target websites may send parameters, such as Cookies, that they expect back in order to verify that your requests are genuine.
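The User-Agent rotation and randomized delays recommended above can be sketched as follows. The User-Agent strings and delay bounds are illustrative assumptions; a real pool should be much larger and kept current.

```python
import random
import time

# Illustrative pool of browser User-Agent strings; in practice, maintain a
# much larger, up-to-date list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
]

def build_headers():
    """Pick a random User-Agent so consecutive requests do not share a fingerprint."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }

def polite_delay(low=2.0, high=7.0):
    """Sleep a random interval between requests to mimic human pacing."""
    time.sleep(random.uniform(low, high))
```

Call `build_headers()` before each request and `polite_delay()` between requests; the jitter makes the traffic pattern harder to distinguish from a human browsing session.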
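As a quick illustration of how a library's defaults can give you away, Python's standard `urllib` sends its own telltale User-Agent (something like `Python-urllib/3.x`) unless you override it:

```python
import urllib.request

# The stdlib opener announces itself with its own User-Agent string
# (e.g. "Python-urllib/3.11"), which anti-bot filters recognize instantly.
opener = urllib.request.build_opener()
default_ua = dict(opener.addheaders).get("User-agent", "")
print("Default User-Agent:", default_ua)

# Override the telltale header with a browser-like one before any request.
opener.addheaders = [
    ("User-agent",
     "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) "
     "Gecko/20100101 Firefox/67.0"),
]
```

Other HTTP libraries have similar defaults, so it is worth inspecting the raw request your client actually sends.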
Will proxies help me to solve Captcha?
If a Captcha is served by the website itself on pages such as checkout, registration, or password-change forms, avoiding it will most likely not be possible, even with a proxy. On such occasions, research Captcha solver services or solve the challenges yourself. A proxy network does not influence whether a Captcha appears in these cases and is definitely not the tool to solve it.