Python scraper based on AI
Updated Jun 2, 2026 - Python
Browser automation CLI built for AI agents. Break through anti-bot walls, hand off to humans across platforms when stuck. Parallel multi-task execution, independent multi-session operation, isolated multi-account browsing.
Python client library for Diffbot APIs
qcrawl - fast async web crawling & scraping framework for Python.
AI-based web wrapper for web content extraction
AI-based web extractor
Open-source web crawler
Lightfeed SDK to search and filter web data
This repository contains the code and data download links to reproduce the building process of the 2021 Schema.org Table Corpus.
Get and process multiple resources from the web, using asyncio (aiohttp) to fetch the data and multiprocessing or multithreading to process it.
Extract text from images using a robust OCR model designed for accuracy and efficiency in varied visual contexts.
Backlink crawler for web data
Dealroom data extraction tool
Python web scraper for monitoring online stores and tracking product updates in real time. Standalone module and part of jirai_sweeties.
High-volume user agent string extractor built with Python and Playwright for enterprise-level web data collection.
Python-based desktop app for effortless web scraping
Track Amazon product prices over time using the Olostep API, described as the most reliable and cost-effective web search, scraping, and crawling API. Includes a Streamlit dashboard, CSV/JSON history, and support for both one-time and scheduled CLI runs.
A comprehensive script to extract customer reviews for machine learning
Python SDK for MarkUDown (https://scrapetechnology.com/markudown), an AI data infrastructure that turns any website into structured, ready-to-use data, even when the website is protected.