Show HN: AnyCrawl v0.0.1-alpha.5 – custom user-agent and richer scraping API

## [0.0.1-alpha.5] - 2025-06-14

### Added

- Integrated AWS S3 storage support with a new `S3` class and environment variables for file uploads and retrievals.
- Introduced `FileController` for serving files from S3 or local storage, with path validation and error handling.
- Added content transformers (Screenshot, `HTMLTransformer`) that improve HTML/Markdown extraction and screenshot generation.
- Extended scraping with new options: output `formats`, `timeout`, tag filtering, `wait_for`, retry strategy, viewport configuration, and a custom user-agent.
- Added a Safe Search parameter to `SearchSchema` for filtered search results.
- Refactored the engine architecture around a factory pattern, with new core modules for configuration validation, data extraction, and job management.
- Implemented graceful shutdown for the API server and improved logging of uncaught exceptions and unhandled rejections.
- Added Jest configuration with ESM support for the API and library packages, and updated the test scripts.
- Updated CI workflows to publish Docker images on version tags.
- Expanded the README with detailed environment variable descriptions and API usage examples.

### Changed

- Refined error handling in `ScrapeController` and `JobManager`; failure responses now include structured error objects and HTTP status codes.
- Added explicit HTTP error checks and resilience improvements to `BaseEngine`.
- Updated the OpenAPI documentation to reflect the new scraping parameters and error formats.
- Moved the key-value store name into environment configuration.
- Improved per-request credit tracking in `ScrapeController`; the logging middleware now records credit usage.

### Fixed

- Improved job failure messages to include detailed error data for clearer debugging.
- Minor documentation corrections and clarifications.
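The `FileController` entry mentions path validation for files served from S3 or local storage, but the changelog does not say how AnyCrawl implements it. The sketch below shows one common approach. It assumes requested paths are relative keys resolved under a single storage root. The helper name `resolveStoragePath` and the `/srv/files` root are hypothetical.

```typescript
// Hypothetical helper illustrating path validation for a file-serving endpoint.
// Not AnyCrawl's implementation; it assumes paths are relative keys resolved
// against one storage root (a local directory or an S3 key prefix).
function resolveStoragePath(root: string, requested: string): string {
  // Null bytes can truncate paths in lower-level APIs, so reject them outright.
  if (requested.includes("\0")) {
    throw new Error("Invalid path: contains a null byte");
  }
  // Normalize Windows-style separators so "..\\" is treated like "../".
  const normalized = requested.replace(/\\/g, "/");
  if (normalized.startsWith("/")) {
    throw new Error("Invalid path: absolute paths are not allowed");
  }

  // Resolve "." and ".." segments manually, refusing to climb above the root.
  const segments: string[] = [];
  for (const segment of normalized.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (segments.length === 0) {
        throw new Error("Invalid path: escapes the storage root");
      }
      segments.pop();
      continue;
    }
    segments.push(segment);
  }
  if (segments.length === 0) {
    throw new Error("Invalid path: no file specified");
  }

  // Join with the root, dropping any trailing slash on the root itself.
  return `${root.replace(/\/+$/, "")}/${segments.join("/")}`;
}

console.log(resolveStoragePath("/srv/files", "screenshots/./page.png"));
// prints "/srv/files/screenshots/page.png"
```

The sketch rejects `..` segments before joining, so the check does not depend on how the root is spelled. For local storage, a real implementation would usually also resolve symlinks before serving a file.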
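The new scraping options can be pictured as fields of a single request body. Only `formats`, `timeout`, and `wait_for` are named in the changelog. The other field names (`include_tags`, `exclude_tags`, `retry`, `viewport`, `user_agent`), the allowed format values, and the millisecond units are assumptions made for illustration; AnyCrawl's OpenAPI documentation describes the real schema. The sketch shows the kind of cross-field checks a configuration validator might run.

```typescript
// Hypothetical request shape. Only `formats`, `timeout` and `wait_for` appear in the
// changelog; every other field name, the format values and the units are assumed.
type OutputFormat = "markdown" | "html" | "screenshot";

interface ScrapeRequest {
  url: string;
  formats?: OutputFormat[];
  timeout?: number;        // assumed: overall budget in milliseconds
  wait_for?: number;       // assumed: extra wait after page load, in milliseconds
  include_tags?: string[]; // hypothetical tag-filtering fields
  exclude_tags?: string[];
  retry?: { max_retries: number; backoff_ms: number }; // hypothetical retry strategy
  viewport?: { width: number; height: number };        // hypothetical
  user_agent?: string;                                  // hypothetical custom user-agent
}

const KNOWN_FORMATS: readonly string[] = ["markdown", "html", "screenshot"];

const isPositiveInt = (n: number) => Number.isInteger(n) && n > 0;

// Collects every problem instead of stopping at the first one,
// so a client can fix all fields in a single round trip.
function validateScrapeRequest(req: ScrapeRequest): string[] {
  const errors: string[] = [];

  if (!/^https?:\/\//i.test(req.url)) {
    errors.push("url must start with http:// or https://");
  }
  // JSON clients can send any string, so check formats at runtime too.
  for (const f of req.formats ?? []) {
    if (!KNOWN_FORMATS.includes(f)) errors.push(`unknown format: ${f}`);
  }
  if (req.timeout !== undefined && !isPositiveInt(req.timeout)) {
    errors.push("timeout must be a positive integer (ms)");
  }
  if (req.wait_for !== undefined) {
    if (!Number.isInteger(req.wait_for) || req.wait_for < 0) {
      errors.push("wait_for must be a non-negative integer (ms)");
    } else if (req.timeout !== undefined && req.wait_for >= req.timeout) {
      // Waiting at least as long as the whole budget would always time out.
      errors.push("wait_for must be shorter than timeout");
    }
  }
  const excluded = new Set(req.exclude_tags ?? []);
  for (const tag of req.include_tags ?? []) {
    if (excluded.has(tag)) errors.push(`tag is both included and excluded: ${tag}`);
  }
  if (req.retry && (!Number.isInteger(req.retry.max_retries) || req.retry.max_retries < 0)) {
    errors.push("retry.max_retries must be a non-negative integer");
  }
  if (req.viewport && !(isPositiveInt(req.viewport.width) && isPositiveInt(req.viewport.height))) {
    errors.push("viewport width and height must be positive integers");
  }
  if (req.user_agent !== undefined && req.user_agent.trim() === "") {
    errors.push("user_agent must not be empty");
  }
  return errors;
}
```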
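The engine refactor is described only as using a factory pattern. A minimal sketch of that pattern follows: callers ask a factory for an engine by name and receive an object behind a shared interface, so new engines can be registered without changing call sites. The `ScrapeEngine` interface, the `EngineFactory` class, and the engine names `static` and `browser` are hypothetical. They do not describe AnyCrawl's actual modules.

```typescript
// Hypothetical engine abstraction; names are illustrative, not AnyCrawl's API.
interface ScrapeEngine {
  readonly name: string;
  // A real engine would fetch or render `url`; this sketch only describes the call.
  describe(url: string): string;
}

class StaticHtmlEngine implements ScrapeEngine {
  readonly name = "static";
  describe(url: string): string {
    return `fetch raw HTML from ${url}`;
  }
}

class BrowserEngine implements ScrapeEngine {
  readonly name = "browser";
  private readonly userAgent?: string;
  constructor(userAgent?: string) {
    this.userAgent = userAgent;
  }
  describe(url: string): string {
    const ua = this.userAgent ? ` as "${this.userAgent}"` : "";
    return `render ${url} in a headless browser${ua}`;
  }
}

type EngineOptions = { userAgent?: string };
type EngineBuilder = (opts: EngineOptions) => ScrapeEngine;

class EngineFactory {
  private readonly registry = new Map<string, EngineBuilder>();

  register(name: string, build: EngineBuilder): this {
    this.registry.set(name, build);
    return this;
  }

  // Callers depend only on ScrapeEngine, never on a concrete class.
  create(name: string, opts: EngineOptions = {}): ScrapeEngine {
    const build = this.registry.get(name);
    if (!build) {
      const available = Array.from(this.registry.keys()).join(", ");
      throw new Error(`Unknown engine "${name}"; available: ${available}`);
    }
    return build(opts);
  }
}

const factory = new EngineFactory()
  .register("static", () => new StaticHtmlEngine())
  .register("browser", (opts) => new BrowserEngine(opts.userAgent));

console.log(factory.create("browser", { userAgent: "MyBot/1.0" }).describe("https://example.com"));
// prints: render https://example.com in a headless browser as "MyBot/1.0"
```

Keeping a registry of builder functions, rather than a `switch` over engine names, means each engine owns its own construction logic. A per-request option such as a custom user-agent is simply passed through to the builder.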
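The Changed section says failure responses now carry structured error objects and HTTP status codes, but it does not specify their format. One way to produce such responses is to throw typed errors and convert them in a single place. The sketch below assumes a body shaped like `{ success: false, error: { type, message, details } }`. That shape and the `HttpError` and `toErrorResponse` names are hypothetical.

```typescript
// Hypothetical error-to-response mapping; the field names and error class are
// illustrative and not taken from AnyCrawl's OpenAPI documentation.
interface ErrorBody {
  success: false;
  error: { type: string; message: string; details?: unknown };
}

class HttpError extends Error {
  readonly status: number;
  readonly type: string;
  readonly details?: unknown;

  constructor(status: number, type: string, message: string, details?: unknown) {
    super(message);
    // Keeps `instanceof` working when TypeScript compiles to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "HttpError";
    this.status = status;
    this.type = type;
    this.details = details;
  }
}

// Converts any thrown value into an HTTP status plus a JSON-serializable body.
function toErrorResponse(err: unknown): { status: number; body: ErrorBody } {
  if (err instanceof HttpError) {
    const error: ErrorBody["error"] = { type: err.type, message: err.message };
    if (err.details !== undefined) error.details = err.details;
    return { status: err.status, body: { success: false, error } };
  }
  // Anything else is reported as a 500; whether to expose the raw message is a policy choice.
  const message = err instanceof Error ? err.message : "Unknown error";
  return { status: 500, body: { success: false, error: { type: "internal_error", message } } };
}

const res = toErrorResponse(
  new HttpError(400, "validation_error", "Invalid scrape options", ["timeout must be a positive integer (ms)"]),
);
console.log(res.status, JSON.stringify(res.body));
```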
https://ift.tt/a9sBjox June 14, 2025 at 11:18PM