ScraperETC

ScraperETC is a lightweight Python package that streamlines browser automation and HTTP scraping. It wraps Selenium and requests with clean, Pythonic interfaces that remove the usual boilerplate, especially around waits, drivers, and headers. ScraperETC is designed with anti-bot detection in mind and uses smart defaults to reduce the chance of blocks or bans.

Why Use ScraperETC?

  • Selenium imports are long, clunky, and hard to remember (for example, from selenium.webdriver.common.by import By). With ScraperETC you can write from scraper_etc import By and use By.ID, By.XPATH, etc. directly.

  • webdriver_wait() simplifies waiting for elements with built-in selector validation and multiple wait modes (located, all_located, clickable).

  • HTTP requests are often blocked by anti-bot filters. ScraperETC provides default headers that reduce detection without extra effort.

  • Verifying file downloads shouldn’t require writing custom content checks. This package includes built-in PDF validation tools to save you time.
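The default-header idea can be illustrated with the standard library alone. The sketch below builds a GET request that carries browser-like headers and lets the caller override any of them. The header values and the names DEFAULT_HEADERS and build_request are hypothetical placeholders, not ScraperETC's actual defaults or API, and no network call is made.

```python
from __future__ import annotations

from urllib.request import Request

# Hypothetical browser-like headers; ScraperETC's real defaults may differ.
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str, extra_headers: dict[str, str] | None = None) -> Request:
    """Return a GET request with the default headers; caller-supplied ones win."""
    headers = {**DEFAULT_HEADERS, **(extra_headers or {})}
    return Request(url, headers=headers, method="GET")


# Override a single header while keeping the rest of the defaults.
req = build_request("https://example.com", {"Accept-Language": "de-DE"})
```

Merging caller headers last keeps the defaults as a baseline while still allowing per-request tweaks. Note that urllib normalizes header names with str.capitalize(), so they are read back as "User-agent" and "Accept-language".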

ScraperETC was built to reduce the friction of browser automation and HTTP scraping, especially when using headless Chrome.

Features

  • Minimal wrappers for selenium.webdriver.Chrome and undetected_chromedriver to get up and running fast

  • webdriver_wait() handles selector validation and WebDriverWait with three modes:

    • "located": wait for a single element to appear

    • "all_located": wait for multiple matching elements

    • "clickable": wait until the element is ready to be interacted with

  • Use By directly from scraper_etc, so you don’t have to remember where Selenium hides it

  • http_GET() adds default headers that mimic a modern browser to reduce the chance of bot detection

  • Built-in tools for validating PDF downloads and checking response status

  • Optional exception raising on failure, so you can choose between passive and strict workflows

  • Currently supports only the Chrome web browser, which must be installed and available on your system PATH
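To make "selector validation" and the wait modes more concrete, here is a dependency-free sketch. The helpers validate_wait_args and poll_until are hypothetical and are not part of ScraperETC's API. The strategy strings are the values Selenium's By constants use. poll_until imitates the general behavior of Selenium's WebDriverWait.until: it calls a condition repeatedly until that condition returns something truthy or a timeout expires. The three modes presumably correspond to Selenium's presence_of_element_located, presence_of_all_elements_located, and element_to_be_clickable expected conditions, but that mapping is an assumption.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Selenium's locator strategies: By attribute name -> string value Selenium uses.
LOCATOR_STRATEGIES = {
    "ID": "id",
    "XPATH": "xpath",
    "LINK_TEXT": "link text",
    "PARTIAL_LINK_TEXT": "partial link text",
    "NAME": "name",
    "TAG_NAME": "tag name",
    "CLASS_NAME": "class name",
    "CSS_SELECTOR": "css selector",
}
WAIT_MODES = {"located", "all_located", "clickable"}


def validate_wait_args(by: str, selector: str, ec: str = "located") -> tuple[str, str]:
    """Reject unknown strategies, empty selectors, or unknown wait modes up front."""
    key = by.upper()
    if key not in LOCATOR_STRATEGIES:
        raise ValueError(f"unknown locator strategy: {by!r}")
    if not selector.strip():
        raise ValueError("selector must be a non-empty string")
    if ec not in WAIT_MODES:
        raise ValueError(f"unknown wait mode: {ec!r}")
    # Return a (strategy, selector) locator tuple in the form Selenium expects.
    return LOCATOR_STRATEGIES[key], selector


def poll_until(
    condition: Callable[[], T | None], timeout: float = 10.0, interval: float = 0.5
) -> T:
    """Call `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

Validating the arguments before any waiting begins means a typo such as by="IDD" fails immediately, rather than surfacing as a confusing timeout after the full wait period.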

Installation

pip install scraper-etc

Requires Python 3.10 or later.

Example Usage

from scraper_etc import setup_chrome_driver, webdriver_wait, http_GET_valid_pdf, By

# start a headless Chrome driver (using undetected_chromedriver under the hood)
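driver = setup_chrome_driver(headless=True)

# navigate to a page first (replace with the page you want to scrape)
driver.get("https://example.com")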

# wait for a div with a specific ID to appear
elem = webdriver_wait(driver, by="ID", selector="main")

# wait for a clickable button (alternative usage)
button = webdriver_wait(driver, by="XPATH", selector="//button", ec="clickable")

# use the imported By class directly with driver methods
different_ele = driver.find_element(By.ID, "differentID")

# close the browser once you are done with it
driver.quit()

# validate a remote PDF and save it
res = http_GET_valid_pdf("https://example.com/sample.pdf")
if res:
    with open("sample.pdf", "wb") as f:
        f.write(res.content)
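The PDF check in the example above, together with the status-check and optional exception-raising features, can be approximated with a few byte-level tests. The sketch below is a hypothetical stand-in, not ScraperETC's implementation. check_pdf_response, InvalidPDFError, and the raise_on_fail flag are illustrative names. The function takes a status code and raw bytes instead of a requests response, so it runs without network access.

```python
from __future__ import annotations


class InvalidPDFError(Exception):
    """Raised in strict mode when a response does not look like a complete PDF."""


def check_pdf_response(status_code: int, body: bytes, raise_on_fail: bool = False) -> bool:
    """Return True if the response looks like a complete PDF.

    Passive mode (the default) returns False on failure; strict mode raises instead.
    """
    problems: list[str] = []
    if not 200 <= status_code < 300:
        problems.append(f"unexpected HTTP status {status_code}")
    # PDF files begin with a "%PDF-" header line; an HTML block page will not.
    if not body.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")
    # A complete PDF ends with a "%%EOF" marker (possibly followed by a newline),
    # so a missing marker often indicates a truncated download.
    if b"%%EOF" not in body[-1024:]:
        problems.append("missing %%EOF trailer")
    if problems and raise_on_fail:
        raise InvalidPDFError("; ".join(problems))
    return not problems
```

With a requests response, you would pass res.status_code and res.content. Checking the content bytes rather than the Content-Type header is deliberate, since many servers label PDFs as application/octet-stream.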

Development

ScraperETC includes a modern CI/CD pipeline:

  • Ruff for linting and auto-formatting

  • mypy for static type checking

  • Bandit for security scanning

  • pytest with unit tests covering all core logic

  • Codecov integration for test coverage

  • GitHub Actions CI to run it all on push

  • Dependabot for automated dependency updates

CI workflows live in .github/workflows.

License

This project is released under CC0 (public domain). You are free to use, modify, and redistribute it without restriction.