SECWEB: THE DAILY CARTOGRAPHER

Overview

SecWeb is your daily cartographer of the internet’s web layer. It continuously crawls and archives thousands of live websites across newly discovered domains and subdomains. It captures full HTML content, screenshots, extracted JS/API endpoints, and robots.txt metadata, enabling researchers to explore, analyze, and build attack surface intelligence at scale.

“Automated Web Reconnaissance for Bug Bounty & Security Research.”

Key Features

  • Zone-Based Domain Collection: Pulls fresh domains/subdomains from daily .zone files (e.g., .in, .us, .gov) and custom inputs.
  • Daily Crawl & Archive (see the sketch after this list):
    • Renders and screenshots the live web page (via Playwright)
    • Saves full HTML content as .html.gz
    • Extracts all discovered URLs and saves to -urls.txt
    • Parses robots.txt and aggregates disallowed paths
  • Wordlist Generation: Uses robots.txt to generate a global robotsDisallowed.txt for smarter directory brute-forcing.
  • Open Access: All data is available on GitHub for the community.
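
The crawl-and-archive step can be pictured with a short Playwright sketch. This is illustrative only: the output paths, timeouts, and the https-first assumption are guesses, not settings taken from the SecWeb codebase.

```python
# Minimal sketch of the per-domain crawl step, assuming Playwright's Python
# sync API. Paths and options are illustrative, not SecWeb's actual config.
import gzip
import pathlib
import urllib.request

from playwright.sync_api import sync_playwright


def crawl_domain(domain: str, out_dir: str = "archive") -> None:
    base = pathlib.Path(out_dir) / domain
    base.mkdir(parents=True, exist_ok=True)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(f"https://{domain}", wait_until="load", timeout=30_000)

        # Screenshot of the rendered page
        page.screenshot(path=str(base / f"{domain}.png"), full_page=True)

        # Full HTML, compressed to .html.gz
        with gzip.open(base / f"{domain}.html.gz", "wt", encoding="utf-8") as fh:
            fh.write(page.content())

        # All discovered link URLs -> <domain>-urls.txt
        hrefs = page.eval_on_selector_all("a[href]", "els => els.map(e => e.href)")
        (base / f"{domain}-urls.txt").write_text("\n".join(sorted(set(hrefs))))

        browser.close()

    # robots.txt is fetched separately and stored alongside the snapshot
    try:
        with urllib.request.urlopen(f"https://{domain}/robots.txt", timeout=10) as resp:
            (base / "robots.txt").write_bytes(resp.read())
    except OSError:
        pass  # not every domain serves robots.txt


if __name__ == "__main__":
    crawl_domain("example.com")
```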

πŸ“ Directory Structure

Each scanned domain is stored in a consistent per-domain layout; an illustrative example follows.
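
The full tree is easiest to browse in the repository itself; as a rough illustration based on the artifacts listed above, a single snapshot might look like the layout below (the date-keyed folder and the file names are assumptions, not the project’s documented paths).

```
<YYYY-MM-DD>/
  example.com/
    example.com.html.gz      full rendered HTML
    example.com.png          rendered-page screenshot
    example.com-urls.txt     all URLs discovered on the page
    robots.txt               raw robots.txt, if the site serves one
```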

You can browse or download historical snapshots by date or domain.

GitHub Repo: https://github.com/theprojectnebulla/secweb

How It Works

  1. Discover Domains:
    • Parse zone files or curated target lists
    • Extract subdomains from passive sources
  2. Crawl Each Domain:
    • Load in headless browser
    • Save .html.gz, .png, robots.txt, and all discovered URLs
  3. Post-Processing:
    • Update robotsDisallowed.txt by merging paths from every collected robots.txt (see the sketch after this list)
    • Push updates to GitHub
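
The aggregation in step 3 amounts to merging every Disallow path seen across the archive into one deduplicated wordlist. A minimal sketch follows, assuming a local archive directory that holds the collected robots.txt files; the layout and output path are assumptions, and only the robotsDisallowed.txt name comes from the project description.

```python
# Merge Disallow entries from all archived robots.txt files into a single
# deduplicated wordlist. Archive layout and output path are assumptions.
import pathlib


def build_disallowed_wordlist(archive_root: str = "archive",
                              out_file: str = "robotsDisallowed.txt") -> int:
    paths: set[str] = set()
    for robots in pathlib.Path(archive_root).rglob("robots.txt"):
        for line in robots.read_text(errors="ignore").splitlines():
            key, _, value = line.partition(":")
            if key.strip().lower() == "disallow":
                entry = value.strip()
                if entry and entry != "/":
                    paths.add(entry)

    pathlib.Path(out_file).write_text("\n".join(sorted(paths)) + "\n")
    return len(paths)


if __name__ == "__main__":
    print(f"{build_disallowed_wordlist()} unique disallowed paths written")
```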

Use Cases

| Use Case                   | SecWeb Output Used           |
|----------------------------|------------------------------|
| Passive Endpoint Discovery | -urls.txt files              |
| JS Recon                   | JS URLs from extracted pages |
| Wordlist Expansion         | robotsDisallowed.txt         |
| Asset Fingerprinting       | Screenshots + HTML           |
| AI Recon Training          | .html.gz for model input     |
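
As one example of the Wordlist Expansion row, robotsDisallowed.txt can be fed to any directory brute-forcer. The small probe below is only a stand-in for dedicated tools such as ffuf or gobuster, and the target host is a placeholder.

```python
# Rough illustration of the "Wordlist Expansion" use case: probe a target with
# the paths collected in robotsDisallowed.txt. The target is a placeholder.
import pathlib
import urllib.request


def probe(target: str, wordlist: str = "robotsDisallowed.txt") -> None:
    for path in pathlib.Path(wordlist).read_text().splitlines():
        if not path.strip():
            continue
        url = target.rstrip("/") + "/" + path.lstrip("/")
        try:
            # Only reachable paths are printed; 4xx/5xx raise and are skipped.
            with urllib.request.urlopen(url, timeout=5) as resp:
                print(resp.status, url)
        except OSError:
            pass


if __name__ == "__main__":
    probe("https://example.com")
```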

Access the Data