CLI tool to scrape and organize Pakistani news articles into CSV format.
Project Details
Pakistan News Fetcher is a lightweight command-line utility designed to crawl and fetch the latest headlines and article content from various Pakistani news websites.
Functionality & Flow:
- Utilizes Axios to make HTTP requests to news websites
- Uses Cheerio to parse and extract relevant content from HTML pages (titles, timestamps, links, summaries)
- Cleans the fetched data and stores it in structured CSV format for analysis or archiving
- Supports several major Pakistani news sources (e.g., Dunya, Geo, Express, ARY News, and Bol News)
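The clean-and-store step above might look roughly like the following minimal sketch. It assumes each scraped article is a plain object with the columns listed under Key Features. The helper names `cleanText`, `escapeCsvField`, `toCsv`, and `writeCsv` are hypothetical, and the sketch uses only Node's standard library; the actual tool may use a CSV package instead.

```javascript
const fs = require("node:fs");

// Column order for the CSV output (assumed; mirrors the columns listed under Key Features).
const COLUMNS = ["headline", "source", "link", "date", "summary"];

// Collapse runs of whitespace (newlines, tabs, repeated spaces) left over from HTML.
function cleanText(value) {
  return String(value ?? "").replace(/\s+/g, " ").trim();
}

// Quote a field per RFC 4180: wrap it in double quotes if it contains a comma,
// quote, or line break, and double any embedded quotes.
function escapeCsvField(value) {
  const text = cleanText(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn an array of article objects into a CSV string with a header row.
function toCsv(articles) {
  const header = COLUMNS.join(",");
  const rows = articles.map((a) => COLUMNS.map((col) => escapeCsvField(a[col])).join(","));
  return [header, ...rows].join("\n") + "\n";
}

// Write the CSV to disk as UTF-8 so Urdu text in headlines survives intact.
function writeCsv(path, articles) {
  fs.writeFileSync(path, toCsv(articles), "utf8");
}

module.exports = { cleanText, escapeCsvField, toCsv, writeCsv };
```

Quoting fields rather than stripping commas keeps headlines and summaries intact when they contain punctuation.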
Key Features:
- Automated scraping and extraction of daily news updates
- Command-line interface with options to choose sources and output location
- Built for scripting, automation, or personal research use
- Organized CSV output with columns like headline, source, link, date, and summary
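Choosing sources and an output location from the command line could be handled along these lines. This is a sketch assuming Node 18 or later, where `parseArgs` ships in `node:util`; the flag names `--sources`/`-s` and `--output`/`-o`, the source ids, and the default file name `news.csv` are all hypothetical, not the tool's documented interface.

```javascript
const { parseArgs } = require("node:util");

// Source ids this sketch recognises (hypothetical keys for the sites named above).
const KNOWN_SOURCES = ["dunya", "geo", "express", "ary", "bol"];

// Parse flags such as: --sources geo,dunya --output ./out/news.csv
function parseCliOptions(argv = process.argv.slice(2)) {
  const { values } = parseArgs({
    args: argv,
    options: {
      sources: { type: "string", short: "s" },
      output: { type: "string", short: "o" },
    },
    strict: true, // reject unknown flags instead of silently ignoring them
  });

  // Default to every known source when --sources is omitted.
  const sources = values.sources
    ? values.sources.split(",").map((s) => s.trim().toLowerCase()).filter(Boolean)
    : [...KNOWN_SOURCES];

  // Fail early on a typo rather than producing an empty CSV.
  const unknown = sources.filter((s) => !KNOWN_SOURCES.includes(s));
  if (unknown.length > 0) {
    throw new Error(`Unknown source(s): ${unknown.join(", ")}`);
  }

  return { sources, output: values.output ?? "news.csv" };
}
```

For example, `node fetch-news.js -s geo,dunya -o today.csv` (script name hypothetical) would yield `{ sources: ["geo", "dunya"], output: "today.csv" }`.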
Technical Stack:
- Node.js as the runtime for the CLI and tool logic (Express is also part of the stack)
- Axios for HTTP fetching
- Cheerio for HTML scraping and parsing
- JavaScript for scripting and file handling
Outcome:
- Streamlined tool for collecting structured news data
- Useful for researchers, data journalists, or hobby projects
- Designed for easy customization and source extension
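One way to make adding a source a matter of configuration rather than new code is a small registry, sketched below under stated assumptions. The `registerSource`/`getSource`/`listSources` helpers, the config shape, and the example URL and CSS selectors are hypothetical placeholders, not the real markup of any site; a Cheerio-based scraper would then iterate `selectors.item` and read the other selectors within each match.

```javascript
// Registry of news sources keyed by a short id (hypothetical structure).
const sources = new Map();

// Add a source definition after checking it has the fields a scraper would need.
function registerSource(id, config) {
  for (const field of ["name", "url", "selectors"]) {
    if (!config || config[field] === undefined) {
      throw new Error(`Source "${id}" is missing "${field}"`);
    }
  }
  if (sources.has(id)) {
    throw new Error(`Source "${id}" is already registered`);
  }
  // Shallow freeze so callers cannot accidentally reassign top-level fields.
  sources.set(id, Object.freeze({ id, ...config }));
  return sources.get(id);
}

// Look up a source by id, failing loudly if it was never registered.
function getSource(id) {
  const source = sources.get(id);
  if (!source) throw new Error(`No source registered as "${id}"`);
  return source;
}

// List registered source ids in insertion order.
function listSources() {
  return [...sources.keys()];
}

// Example registration with placeholder URL and selectors.
registerSource("example", {
  name: "Example News",
  url: "https://news.example.com/latest",
  selectors: { item: "article", headline: "h2 a", link: "h2 a", date: "time", summary: "p.summary" },
});
```

Keeping per-site details in data means a layout change on one site touches only that site's selectors.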