Q: Why does Octoparse only collect the first item from each page?

An updated version of this tutorial, based on the latest version of the webpage, is now available.




I have been testing your software to try and data-mine some info.

The website is https://www.yelp.com/search?find_desc=car+audio&find_loc=Brooklyn%2C+NY

The problem is that it only collects the first item from each page.




In this case, check the "Loop Item" step, which is used to extract all the items from the page, and the XPath set for that "Loop Item". If the XPath matches only the first item, only the first item will be collected.

Please follow the steps to check your rule.

1. Open the task

2. In the "Design Workflow" step, you will see the rule in the Workflow Designer. Click each step/box one by one from the beginning to go through the rule. Make sure the steps are in the correct order.

3. When you click the "Loop Item" box, check whether the XPath matches all the items on the page.

If it does not, you need to modify the XPath using the Octoparse XPath tool or another tool such as FirePath.

Check out these tutorials to learn how to edit XPath.

       Modify XPath Manually in Octoparse

       Get Started With XPath 1

       Get Started With XPath 2

4. Replace the original XPath with the corrected one.
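
To illustrate the kind of XPath mistake these steps are looking for, here is a minimal sketch in Python. It is not Octoparse code and does not use Yelp's real markup: the HTML snippet, the shop names, and both XPath expressions are made up for illustration. It only shows how an XPath with a fixed position such as li[1] matches a single item, while the same XPath without the position matches every item in the list.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a search results page.
# The real page structure will differ.
PAGE = """
<html><body>
  <ul class="results">
    <li><a href="/biz/shop-a">Shop A</a></li>
    <li><a href="/biz/shop-b">Shop B</a></li>
    <li><a href="/biz/shop-c">Shop C</a></li>
  </ul>
</body></html>
"""

root = ET.fromstring(PAGE)


def names(xpath):
    """Return the link text of every element matched by the XPath."""
    return [a.text for a in root.findall(xpath)]


# Too narrow: li[1] pins the match to the first list item only.
too_narrow = ".//ul[@class='results']/li[1]/a"

# Corrected: dropping the position matches every list item.
all_items = ".//ul[@class='results']/li/a"

first_only = names(too_narrow)
everything = names(all_items)

print(len(first_only), "item(s) matched by", too_narrow)
print(len(everything), "item(s) matched by", all_items)
```

If the XPath in your "Loop Item" behaves like the first expression, removing or generalizing the fixed position is usually the first thing to try.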


Only after you have created a correct "Loop Item" that contains all the links to the detail pages can you move on to the next step and collect data from the website.


To learn how to review your scraping task, see the tutorial: Check The Extraction Rule When Errors Occur

