crawley

Unix-way web crawler.

The tool crawls web pages and outputs every link it discovers, streaming unique URLs to standard output while stripping fragments. It parses HTML with a fast SAX parser, extracts URLs from JavaScript and CSS using lexical parsers, and can retrieve API endpoints, image, video, audio, and form resources. Users can control crawl depth, include or exclude specific HTML tags, ignore URLs matching patterns, and follow subdomains, with optional politeness via robots.txt handling and request delays.

It operates as a command‑line application written in Go, supporting environment‑based proxy configuration, custom cookies and headers in curl‑compatible syntax, and a “brute” mode that also scans HTML comments. Additional modes include directory‑only traversal, headless operation that suppresses HEAD requests, and tag‑based filtering for targeted scans. The binary is available for Linux, FreeBSD, macOS, and Windows, with package options for Debian/Red Hat and Arch Linux.

The codebase is under 1500 source lines, idiomatic Go, and more than 80 % test coverage, providing a lightweight yet feature‑rich solution for developers needing to enumerate links, resources, or endpoints from a site without requiring a full browser engine.

Reviews

Loading reviews…

Similar apps

API & Network Testing

curlie

A curl frontend with the ease of use of HTTPie.

API & Network Testing

tunnelmole

tunnelmole - listed on awesome-cli-apps.

API & Network Testing

shell2http

Shell script based HTTP server.

API & Network Testing

Lura

High-performance API Gateway.

API & Network Testing

ATAC

A feature-full TUI API client made in Rust.

Network & Connectivity

Sosse

Selenium based search engine and crawler with offline archiving.