Note: this endpoint currently has some known issues but is mostly working.
Advanced web scraping with CSS selectors, XPath, and customizable extraction options
The Web Scraper API allows you to extract data from web pages using flexible filtering options, including CSS selectors, XPath queries, and attribute-based filtering. It supports caching, custom headers, and metadata extraction.
POST /api/v1/webscrapper/scrape

A GET request to the same endpoint returns usage instructions.
Parameter | Type | Required | Description
---|---|---|---
url | string | Yes | URL of the web page to scrape
tag | string | No | HTML tag to target (default: `*`)
selector | string | No | CSS selector for targeting elements (alternative to `tag`)
class | string | No | CSS class name to filter elements
id | string | No | Element ID to filter elements
attribute | string | No | Attribute name to filter elements
attribute_value | string | No | Attribute value to match
nth | integer | No | Get the nth element (0-based index)
from | integer | No | Start index for range filtering
to | integer | No | End index for range filtering
limit | integer | No | Maximum number of results (1-1000)
contains_text | string | No | Filter elements containing specific text
regex_pattern | string | No | Regex pattern to extract specific content
strip_empty | boolean | No | Remove empty results (default: true)
raw_html | boolean | No | Return raw HTML instead of text (default: false)
extract_links | boolean | No | Extract links from results (default: false)
extract_images | boolean | No | Extract images from results (default: false)
timeout | integer | No | Request timeout in seconds (5-60, default: 30)
cache_minutes | integer | No | Cache results for the specified number of minutes (0-1440, default: 0)
user_agent | string | No | Custom User-Agent string
headers | array | No | Custom HTTP headers as key-value pairs
follow_redirects | boolean | No | Follow HTTP redirects (default: true)
verify_ssl | boolean | No | Verify SSL certificates (default: true)
return_meta | boolean | No | Include page metadata (title, description, etc.)
encoding | string | No | Character encoding (utf-8, iso-8859-1, windows-1252; default: utf-8)
remove_scripts | boolean | No | Remove script tags from HTML (default: true)
remove_styles | boolean | No | Remove style tags from HTML (default: true)
Example success response:

```json
{
  "url": "https://example.com",
  "query": "//h1",
  "count": 1,
  "results": [
    {
      "index": 0,
      "text": "Sample Heading",
      "tag": "h1",
      "raw_html": "<h1 class=\"main-title\">Sample Heading</h1>",
      "attributes": { "class": "main-title" },
      "links": [],
      "images": [],
      "regex_matches": []
    }
  ],
  "cached": false,
  "scraped_at": "2025-07-11T20:07:00Z",
  "meta": {
    "title": "Example Page",
    "description": "This is an example page",
    "keywords": "example, test",
    "charset": "utf-8",
    "language": "en"
  }
}
```
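To consume a success response with this shape, a client only needs to walk the `results` array. A small sketch using the standard library (the sample below is a trimmed version of the response fields shown above):

```python
import json

# Trimmed sample of the documented success response.
sample = '''{"url": "https://example.com", "count": 1,
  "results": [{"index": 0, "text": "Sample Heading", "tag": "h1",
               "attributes": {"class": "main-title"},
               "links": [], "images": [], "regex_matches": []}],
  "cached": false}'''

def extract_texts(response_json):
    """Return the text of every scraped element, keyed by result index."""
    data = json.loads(response_json)
    return {r["index"]: r["text"] for r in data.get("results", [])}

print(extract_texts(sample))  # {0: 'Sample Heading'}
```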
Example error response:

```json
{
  "error": "Scraping failed: Failed to fetch URL. Status: 404",
  "url": "https://example.com",
  "timestamp": "2025-07-11T20:07:00Z"
}
```
Scrape all `h1` tags:

```
POST https://hishuanigami.com/api/v1/webscrapper/scrape
{ "url": "https://example.com", "tag": "h1" }
```

Target elements with a CSS selector:

```
POST https://hishuanigami.com/api/v1/webscrapper/scrape
{ "url": "https://example.com", "selector": "div.content p" }
```

Cache results for an hour and include page metadata:

```
POST https://hishuanigami.com/api/v1/webscrapper/scrape
{ "url": "https://example.com", "tag": "p", "cache_minutes": 60, "return_meta": true }
```

Extract links and images from articles:

```
POST https://hishuanigami.com/api/v1/webscrapper/scrape
{ "url": "https://example.com", "tag": "article", "extract_links": true, "extract_images": true }
```

Filter by class and contained text:

```
POST https://hishuanigami.com/api/v1/webscrapper/scrape
{ "url": "https://example.com", "class": "content", "contains_text": "example" }
```
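The requests above can be issued with any HTTP client. A stdlib-only Python sketch that prepares (but does not send) the POST request; pass the result to `urllib.request.urlopen` to execute it:

```python
import json
import urllib.request

ENDPOINT = "https://hishuanigami.com/api/v1/webscrapper/scrape"

def build_scrape_request(payload):
    """Prepare a JSON POST request for the scrape endpoint."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_scrape_request({"url": "https://example.com", "selector": "div.content p"})
print(req.get_method(), req.full_url)  # POST https://hishuanigami.com/api/v1/webscrapper/scrape
```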
Responses:

- Request successful: scraped data returned.
- Invalid parameters provided (e.g., invalid URL): `{"error": "The url field must be a valid URL"}`
- Rate limit exceeded (50 requests per hour): `{"error": "Too many requests, Rate Limit is 50 per hour"}`
- Server error while processing the request: `{"error": "Scraping failed: [Error message]"}`
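Because rate-limit and server errors are transient, a caller may want simple retry logic. A hedged sketch, assuming the API follows HTTP conventions (429 for rate limiting, 5xx for server errors; the documentation above does not list numeric codes):

```python
def should_retry(status, attempt, max_attempts=3):
    """Decide whether to retry and how long to wait, given an HTTP status code.

    Assumes conventional status codes: 429 = rate limited, 5xx = server error.
    Returns (retry: bool, delay_seconds: float).
    """
    if attempt >= max_attempts:
        return False, 0.0
    if status == 429:            # rate limited: back off until the window resets
        return True, 60.0
    if 500 <= status < 600:      # transient server error: exponential backoff
        return True, 2.0 ** attempt
    return False, 0.0            # client errors (4xx) are not retryable

# e.g. should_retry(429, 0) -> (True, 60.0); should_retry(404, 0) -> (False, 0.0)
```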