Note: this endpoint currently has some known issues but is mostly working.
Advanced web scraping with CSS selectors, XPath, and customizable extraction options
The Web Scraper API allows you to extract data from web pages using flexible filtering options, including CSS selectors, XPath queries, and attribute-based filtering. It supports caching, custom headers, and metadata extraction.
POST /api/v1/webscrapper/scrape

A GET request to the same endpoint returns usage instructions.
Parameter | Type | Required | Description
---|---|---|---
url | string | Yes | URL of the web page to scrape
tag | string | No | HTML tag to target (default: `*`)
selector | string | No | CSS selector for targeting elements (alternative to `tag`)
class | string | No | CSS class name to filter elements
id | string | No | Element ID to filter elements
attribute | string | No | Attribute name to filter elements
attribute_value | string | No | Attribute value to match
nth | integer | No | Get the nth element (0-based index)
from | integer | No | Start index for range filtering
to | integer | No | End index for range filtering
limit | integer | No | Maximum number of results (1-1000)
contains_text | string | No | Filter elements containing specific text
regex_pattern | string | No | Regex pattern to extract specific content
strip_empty | boolean | No | Remove empty results (default: true)
raw_html | boolean | No | Return raw HTML instead of text (default: false)
extract_links | boolean | No | Extract links from results (default: false)
extract_images | boolean | No | Extract images from results (default: false)
timeout | integer | No | Request timeout in seconds (5-60, default: 30)
cache_minutes | integer | No | Cache results for the specified number of minutes (0-1440, default: 0)
user_agent | string | No | Custom User-Agent string
headers | array | No | Custom HTTP headers as key-value pairs
follow_redirects | boolean | No | Follow HTTP redirects (default: true)
verify_ssl | boolean | No | Verify SSL certificates (default: true)
return_meta | boolean | No | Include page metadata (title, description, etc.)
encoding | string | No | Character encoding (utf-8, iso-8859-1, windows-1252; default: utf-8)
remove_scripts | boolean | No | Remove script tags from HTML (default: true)
remove_styles | boolean | No | Remove style tags from HTML (default: true)
Example success response:

```json
{
  "url": "https://example.com",
  "query": "//h1",
  "count": 1,
  "results": [
    {
      "index": 0,
      "text": "Sample Heading",
      "tag": "h1",
      "raw_html": "<h1 class=\"main-title\">Sample Heading</h1>",
      "attributes": { "class": "main-title" },
      "links": [],
      "images": [],
      "regex_matches": []
    }
  ],
  "cached": false,
  "scraped_at": "2025-07-11T20:07:00Z",
  "meta": {
    "title": "Example Page",
    "description": "This is an example page",
    "keywords": "example, test",
    "charset": "utf-8",
    "language": "en"
  }
}
```
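To consume a success response with this shape, a client only needs to walk the `results` array. A small sketch using the standard library (the sample below is a trimmed version of the response fields shown above):

```python
import json

# Trimmed sample of the documented success response.
sample = '''{"url": "https://example.com", "count": 1,
  "results": [{"index": 0, "text": "Sample Heading", "tag": "h1",
               "attributes": {"class": "main-title"},
               "links": [], "images": [], "regex_matches": []}],
  "cached": false}'''

def extract_texts(response_json):
    """Return the text of every scraped element, keyed by result index."""
    data = json.loads(response_json)
    return {r["index"]: r["text"] for r in data.get("results", [])}

print(extract_texts(sample))  # {0: 'Sample Heading'}
```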
Example error response:

```json
{
  "error": "Scraping failed: Failed to fetch URL. Status: 404",
  "url": "https://example.com",
  "timestamp": "2025-07-11T20:07:00Z"
}
```
Scrape all `h1` tags:

```
POST https://hishuanigami.com/api/v1/webscrapper/scrape
{ "url": "https://example.com", "tag": "h1" }
```

Target elements with a CSS selector:

```
POST https://hishuanigami.com/api/v1/webscrapper/scrape
{ "url": "https://example.com", "selector": "div.content p" }
```

Cache results for an hour and include page metadata:

```
POST https://hishuanigami.com/api/v1/webscrapper/scrape
{ "url": "https://example.com", "tag": "p", "cache_minutes": 60, "return_meta": true }
```

Extract links and images from articles:

```
POST https://hishuanigami.com/api/v1/webscrapper/scrape
{ "url": "https://example.com", "tag": "article", "extract_links": true, "extract_images": true }
```

Filter by class and contained text:

```
POST https://hishuanigami.com/api/v1/webscrapper/scrape
{ "url": "https://example.com", "class": "content", "contains_text": "example" }
```
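The requests above can be issued with any HTTP client. A stdlib-only Python sketch that prepares (but does not send) the POST request; pass the result to `urllib.request.urlopen` to execute it:

```python
import json
import urllib.request

ENDPOINT = "https://hishuanigami.com/api/v1/webscrapper/scrape"

def build_scrape_request(payload):
    """Prepare a JSON POST request for the scrape endpoint."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_scrape_request({"url": "https://example.com", "selector": "div.content p"})
print(req.get_method(), req.full_url)  # POST https://hishuanigami.com/api/v1/webscrapper/scrape
```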
Responses:

- Request successful: scraped data returned.
- Invalid parameters provided (e.g., invalid URL): `{"error": "The url field must be a valid URL"}`
- Rate limit exceeded (50 requests per hour): `{"error": "Too many requests, Rate Limit is 50 per hour"}`
- Server error while processing the request: `{"error": "Scraping failed: [Error message]"}`
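Because rate-limit and server errors are transient, a caller may want simple retry logic. A hedged sketch, assuming the API follows HTTP conventions (429 for rate limiting, 5xx for server errors; the documentation above does not list numeric codes):

```python
def should_retry(status, attempt, max_attempts=3):
    """Decide whether to retry and how long to wait, given an HTTP status code.

    Assumes conventional status codes: 429 = rate limited, 5xx = server error.
    Returns (retry: bool, delay_seconds: float).
    """
    if attempt >= max_attempts:
        return False, 0.0
    if status == 429:            # rate limited: back off until the window resets
        return True, 60.0
    if 500 <= status < 600:      # transient server error: exponential backoff
        return True, 2.0 ** attempt
    return False, 0.0            # client errors (4xx) are not retryable

# e.g. should_retry(429, 0) -> (True, 60.0); should_retry(404, 0) -> (False, 0.0)
```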