Web Scraper API Documentation

There are some known issues right now, but the API is mostly working.

Advanced web scraping with CSS selectors, XPath, and customizable extraction options

Rate Limit: 50 requests per hour (Contact me if you want to increase the limit)

API Overview

The Web Scraper API allows you to extract data from web pages using flexible filtering options, including CSS selectors, XPath queries, and attribute-based filtering. It supports caching, custom headers, and metadata extraction.

Base URL: https://hishuanigami.com/api/v1/webscrapper/scrape

Endpoint

POST
/api/v1/webscrapper/scrape

A GET request to the same endpoint returns usage instructions.
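
For illustration, here is a minimal Python sketch of calling this endpoint with the requests library. The endpoint URL and payload fields come from this documentation; the response handling assumes the shape shown under Response Format below.

import requests

API_URL = "https://hishuanigami.com/api/v1/webscrapper/scrape"

# Minimal request: scrape every <h1> element on the page.
payload = {"url": "https://example.com", "tag": "h1"}

response = requests.post(API_URL, json=payload, timeout=60)
response.raise_for_status()

data = response.json()
for result in data["results"]:
    print(result["index"], result["text"])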

Parameters

url (string, required): URL of the web page to scrape
tag (string, optional): HTML tag to target (default: *)
selector (string, optional): CSS selector for targeting elements (alternative to tag)
class (string, optional): CSS class name to filter elements
id (string, optional): Element ID to filter elements
attribute (string, optional): Attribute name to filter elements
attribute_value (string, optional): Attribute value to match
nth (integer, optional): Get the nth element (0-based index)
from (integer, optional): Start index for range filtering
to (integer, optional): End index for range filtering
limit (integer, optional): Maximum number of results (1-1000)
contains_text (string, optional): Filter elements containing specific text
regex_pattern (string, optional): Regex pattern to extract specific content
strip_empty (boolean, optional): Remove empty results (default: true)
raw_html (boolean, optional): Return raw HTML instead of text (default: false)
extract_links (boolean, optional): Extract links from results (default: false)
extract_images (boolean, optional): Extract images from results (default: false)
timeout (integer, optional): Request timeout in seconds (5-60, default: 30)
cache_minutes (integer, optional): Cache results for the specified number of minutes (0-1440, default: 0)
user_agent (string, optional): Custom User-Agent string
headers (array, optional): Custom HTTP headers as key-value pairs
follow_redirects (boolean, optional): Follow HTTP redirects (default: true)
verify_ssl (boolean, optional): Verify SSL certificates (default: true)
return_meta (boolean, optional): Include page metadata (title, description, etc.)
encoding (string, optional): Character encoding (utf-8, iso-8859-1, windows-1252; default: utf-8)
remove_scripts (boolean, optional): Remove script tags from HTML (default: true)
remove_styles (boolean, optional): Remove style tags from HTML (default: true)
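
The filtering parameters can be combined in a single request. As a sketch, the following hypothetical payload mixes range filtering, a result limit, and regex extraction, using only parameter names from the list above; exactly how these filters compose is not spelled out here, so treat it as an illustration rather than a guaranteed behavior.

# Hypothetical payload combining several filters (parameter names and
# value ranges are taken from the documentation; how they interact is
# an assumption).
payload = {
    "url": "https://example.com",
    "tag": "p",
    "from": 2,                              # start of the index range
    "to": 10,                               # end of the index range
    "limit": 5,                             # at most 5 results (1-1000 allowed)
    "regex_pattern": r"\$\d+(?:\.\d{2})?",  # pull out dollar amounts
    "strip_empty": True,                    # drop empty results (the default)
}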

Response Format

{
  "url": "https://example.com",
  "query": "//h1",
  "count": 1,
  "results": [
    {
      "index": 0,
      "text": "Sample Heading",
      "tag": "h1",
      "raw_html": "

Sample Heading

", "attributes": { "class": "main-title" }, "links": [], "images": [], "regex_matches": [] } ], "cached": false, "scraped_at": "2025-07-11T20:07:00Z", "meta": { "title": "Example Page", "description": "This is an example page", "keywords": "example, test", "charset": "utf-8", "language": "en" } }

Error Response

{
  "error": "Scraping failed: Failed to fetch URL. Status: 404",
  "url": "https://example.com",
  "timestamp": "2025-07-11T20:07:00Z"
}
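
Because failures come back as a JSON body with an error key, a client can branch on the HTTP status and surface that message. A minimal Python sketch, assuming every error status returns the JSON shape shown above:

import requests

API_URL = "https://hishuanigami.com/api/v1/webscrapper/scrape"

response = requests.post(API_URL, json={"url": "https://example.com"}, timeout=60)

if response.ok:
    data = response.json()
    print(f"Scraped {data['count']} element(s) from {data['url']}")
else:
    # Error bodies carry an "error" message and a timestamp, as shown above.
    body = response.json()
    print(f"Request failed ({response.status_code}): {body.get('error')}")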

Usage Examples

1. Basic Scraping (Tag-Based)

POST https://hishuanigami.com/api/v1/webscrapper/scrape
{
  "url": "https://example.com",
  "tag": "h1"
}

2. CSS Selector

POST https://hishuanigami.com/api/v1/webscrapper/scrape
{
  "url": "https://example.com",
  "selector": "div.content p"
}

3. With Caching and Metadata

POST https://hishuanigami.com/api/v1/webscrapper/scrape
{
  "url": "https://example.com",
  "tag": "p",
  "cache_minutes": 60,
  "return_meta": true
}
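
One way to confirm caching is working is to send the same request twice within the cache window and compare the cached flag from the Response Format. A sketch (note that it spends two requests against the hourly limit):

import requests

API_URL = "https://hishuanigami.com/api/v1/webscrapper/scrape"
payload = {"url": "https://example.com", "tag": "p", "cache_minutes": 60}

first = requests.post(API_URL, json=payload, timeout=60).json()
second = requests.post(API_URL, json=payload, timeout=60).json()

# The second call should be served from cache for the next 60 minutes.
# Whether cached responses are exempt from the rate limit is not documented.
print("first cached:", first["cached"])
print("second cached:", second["cached"])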

4. Extract Links and Images

POST https://hishuanigami.com/api/v1/webscrapper/scrape
{
  "url": "https://example.com",
  "tag": "article",
  "extract_links": true,
  "extract_images": true
}
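
With extract_links and extract_images enabled, each result carries links and images arrays. The sample response above only shows them empty, so the sketch below does not assume any particular entry shape and just prints whatever comes back:

import requests

API_URL = "https://hishuanigami.com/api/v1/webscrapper/scrape"
payload = {
    "url": "https://example.com",
    "tag": "article",
    "extract_links": True,
    "extract_images": True,
}

data = requests.post(API_URL, json=payload, timeout=60).json()

for result in data["results"]:
    # The per-entry shape of "links" and "images" is not documented above,
    # so print each entry as-is.
    for link in result["links"]:
        print("link:", link)
    for image in result["images"]:
        print("image:", image)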

5. Filter by Class and Text

POST https://hishuanigami.com/api/v1/webscrapper/scrape
{
  "url": "https://example.com",
  "class": "content",
  "contains_text": "example"
}

Response Codes

200 Success

Request successful, scraped data returned

400 Bad Request

Invalid parameters provided (e.g., invalid URL)

{"error": "The url field must be a valid URL"}

429 Too Many Requests

Rate limit exceeded

{"error": "Too many requests, Rate Limit is 50 per hour"}

500 Internal Server Error

Server error occurred while processing request

{"error": "Scraping failed: [Error message]"}