A modular Python pipeline for identifying and scoring high-impact 501(c)(3) organizations by integrating IRS Business Master File (BMF) data with Form 990 XML filings.
This tool finds specific types of non-profits (e.g., Higher Education, Research, Foundations) and ranks them by financial health and growth metrics extracted directly from IRS TEOS XML archives.
src/: Python source scripts for the pipeline.
input/: Configuration files and URL lists for downloads.
data/: Local storage for downloaded CSV and ZIP archives.
db/: SQLite database (philanthropy.db) containing processed data.
output/: Timestamped directories containing final category rankings.

The pipeline is controlled by input/pipeline.txt. Set specific tasks to YES to include them in the run.
# From the project root
python src/main.py
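
To illustrate how YES/NO task toggles such as those in input/pipeline.txt might be read, here is a minimal sketch. The file's actual format is not documented here, so the `NAME = YES` line layout, the `#` comment syntax, the task names, and the helper functions `parse_pipeline_config` and `enabled_tasks` are all assumptions for illustration, not the project's real code.

```python
def parse_pipeline_config(text: str) -> dict[str, bool]:
    """Map each task name to True when its value is YES (case-insensitive).

    Assumes one 'NAME = VALUE' pair per line and '#' for comments;
    the real input/pipeline.txt may use a different layout.
    """
    tasks: dict[str, bool] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split("=", 1)
        tasks[name.strip()] = value.strip().upper() == "YES"
    return tasks


def enabled_tasks(text: str) -> list[str]:
    """Return the names of tasks set to YES, in file order."""
    return [name for name, on in parse_pipeline_config(text).items() if on]


# Hypothetical task names, for illustration only.
SAMPLE = """\
# tasks to include in the run
DOWNLOAD_BMF = YES
DOWNLOAD_990_XML = NO
SCORE_ORGANIZATIONS = yes
"""

if __name__ == "__main__":
    print(enabled_tasks(SAMPLE))
```

In practice the text would come from the file itself, for example `pathlib.Path("input/pipeline.txt").read_text()`, run from the project root.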