- Open the Octoparse application.
- In the top-left menu, click the Create Task button. If you have already created a task, you may skip to Step 5 of this tutorial.
- For testing purposes, we are going to create a custom task, so in the selection menu, click Advanced Mode.
- In the Website field, type the website you would like to extract data from. For this test, we are going to use ip.smartproxy.com. Once you have done that, hit the Save URL button.
- You should now be in your Task tab. To configure the proxies, select the Setting button.
- In the pop-up menu, scroll down to Anti-blocking settings and check the Use IP proxies option. You should now be able to click the Settings button.
- In the Proxy Settings pop-up, define the proxy you would like to use. Unfortunately, Octoparse only accepts proxies in the IP:PORT format. For that reason, you will need to use our Whitelisted IP feature, which lets you skip the traditional username:password authentication when going through a proxy. To find the IP of the Endpoint you would like to use, make sure to check guidelines available here.
- Once you have your IP:PORT ready, select the Switch interval according to your session type: set it to 1 for a rotating session, or to 600 for a sticky session. Lastly, hit the OK button.
- To verify that you did everything correctly, check that a checkmark appears next to the Settings option under Anti-blocking settings. Once you have verified that, click the Save button to continue.
- To extract data from our example page, click the IP address shown at the bottom of the Octoparse application and select Extract text of the selected element.
- Once that is done, click on the Save and Run hyperlink.
- Depending on how you want to run your task, select one of the available extraction options. For testing purposes, you can use Local extraction.
- If everything was done correctly, you should see our proxy IP in the extracted data table after the task finishes running.
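If the extracted IP is not what you expect, it can help to check the endpoint outside Octoparse first. The following is a minimal Python sketch, not part of Octoparse or of any official tooling. It assumes your machine's IP is already whitelisted, that the test page is reachable over plain HTTP, and that the page returns the IP as text. The helper names (`switch_interval`, `proxy_map`, `fetch_body`) are hypothetical and exist only for illustration.

```python
import urllib.request

# Switch interval values as given in the steps above.
SWITCH_INTERVALS = {"rotating": 1, "sticky": 600}


def switch_interval(session_type: str) -> int:
    """Return the Switch interval the tutorial suggests for a session type."""
    try:
        return SWITCH_INTERVALS[session_type.lower()]
    except KeyError:
        raise ValueError(f"unknown session type: {session_type!r}") from None


def proxy_map(ip: str, port: int) -> dict:
    """Build an IP:PORT proxy mapping with no username or password.

    This mirrors the whitelisted-IP setup, where no credentials are sent.
    """
    address = f"http://{ip}:{port}"
    return {"http": address, "https": address}


def fetch_body(url: str, ip: str, port: int, timeout: float = 15.0) -> str:
    """Request `url` through the given proxy and return the response body."""
    handler = urllib.request.ProxyHandler(proxy_map(ip, port))
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace").strip()
```

To use it, call `fetch_body("http://ip.smartproxy.com/", ip, port)` with the IP and port of your own endpoint. If the whitelisting works, the returned text should contain the proxy IP rather than your own address.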