Step-by-step tutorials for you to get started with web scraping


How to exclude "Ads" items when creating a list?

Wednesday, December 04, 2019

A newer version of this tutorial is available here.


When you create a list of items to scrape from a website, the list sometimes includes several “Ads” items (Example URL).


What should you do if you only want to scrape the non-ads items?

You just need to modify the XPath of the “Loop Item” so that it locates only the non-ads items.

If you check the source code of the items in the example above with Firebug (a Firefox extension), you will see the difference between the ads items and the non-ads items.


The class attribute is different, so we can use this difference to write an XPath that matches only the non-ads items: //li[@class='regular-search-result']

Enter the XPath into Octoparse, and you will see that the Ads items are excluded.
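
If you want to check what this kind of XPath selects outside Octoparse, here is a minimal sketch in Python using only the standard library. It is an illustration under assumptions, not how Octoparse works internally: the sample markup, the item names, and the ad-search-result class are made up, and only the regular-search-result class comes from the example above. The standard library needs a relative path, so .//li is used in place of //li.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration. Only the 'regular-search-result'
# class is taken from the tutorial; 'ad-search-result' and the item text
# are assumptions.
SAMPLE = """
<ul>
  <li class="ad-search-result">Sponsored: Pizza Palace</li>
  <li class="regular-search-result">Luigi Trattoria</li>
  <li class="ad-search-result">Sponsored: Burger Barn</li>
  <li class="regular-search-result">Noodle House</li>
</ul>
"""


def non_ad_items(markup: str) -> list[str]:
    """Return the text of every <li> whose class is exactly 'regular-search-result'."""
    root = ET.fromstring(markup.strip())
    # ElementTree only accepts relative paths, so './/li' stands in for '//li'.
    matches = root.findall(".//li[@class='regular-search-result']")
    return [li.text.strip() for li in matches]


if __name__ == "__main__":
    for name in non_ad_items(SAMPLE):
        print(name)
```

Note that @class='regular-search-result' is an exact match on the whole attribute value, so an item that carries additional class names would not be selected by it.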


If you are new to XPath, you may want to learn some basics of HTML and XPath first. Here are some tutorials for your reference: HTML basic | XPath basic

Download Octoparse to start web scraping, or contact us with any questions about web scraping!
