Filter-IRS-990-BMF

A modular Python pipeline for identifying and scoring high-impact 501(c)(3) organizations by integrating IRS Business Master File (BMF) data with Form 990 XML filings.

Project Overview

This tool finds specific types of nonprofits (e.g., Higher Education, Research, Foundations) and ranks them on financial health and growth metrics extracted directly from IRS TEOS (Tax Exempt Organization Search) XML archives.

Directory Structure

How to Run

The pipeline is controlled by input/pipeline.txt. Set specific tasks to YES to include them in the run.

# From the project root
python src/main.py
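
For illustration, the task flags in input/pipeline.txt could be read along the lines below. The file format is not documented in this README, so the `TASK = YES` layout, the `#` comment syntax, the task names, and the `enabled_tasks` helper are all assumptions. This is a minimal sketch and not the project's actual parser.

```python
from typing import List


def enabled_tasks(text: str) -> List[str]:
    """Return the names of tasks whose value is YES (case-insensitive).

    Assumed (hypothetical) format: one `NAME = VALUE` or `NAME: VALUE`
    pair per line, with `#` starting a comment.
    """
    tasks = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Accept either separator; skip lines that have neither.
        for sep in ("=", ":"):
            if sep in line:
                name, value = line.split(sep, 1)
                break
        else:
            continue
        if value.strip().upper() == "YES":
            tasks.append(name.strip())
    return tasks


# Hypothetical file contents, for illustration only.
example = """
DOWNLOAD = YES
FILTER = NO
# INGEST is disabled for this run
SCORE: yes
"""
print(enabled_tasks(example))
```

Tasks not set to YES, and lines that do not match the assumed layout, are simply skipped.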

Pipeline Stages

  1. Download & Load: Retrieves BMF and Form 990 data from IRS servers.
  2. Filter: Categorizes entities into EduHigh, EduLow, Research, Foundation, or ProfSoc based on NTEE codes.
  3. Ingest: Extracts detailed financial history from Form 990 XML files (archives compressed with Deflate64 are supported via 7-Zip).
  4. Score & Rank: Applies a multi-factor scoring engine to rank organizations within their categories.
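
As a rough sketch of the Filter stage, the function below maps an NTEE code to one of the five categories named above. The specific mapping is an assumption made for illustration (B40-B43 and B50 as higher education, major groups H, U, and V as research, T as foundations, and the common code 03 as professional societies). The rules the pipeline actually applies may differ, and `categorize` is a hypothetical name.

```python
from typing import Optional

# Assumed NTEE codes for colleges, universities, and graduate schools.
HIGHER_ED_CODES = {"40", "41", "42", "43", "50"}


def categorize(ntee: Optional[str]) -> Optional[str]:
    """Map an NTEE code such as 'B43' to a pipeline category, or None."""
    code = (ntee or "").strip().upper()
    if not code:
        return None
    major, sub = code[0], code[1:3]
    # Common code 03 marks professional societies in any major group.
    if sub == "03":
        return "ProfSoc"
    if major == "B":
        return "EduHigh" if sub in HIGHER_ED_CODES else "EduLow"
    if major in "HUV":
        return "Research"
    if major == "T":
        return "Foundation"
    return None  # Outside the categories this sketch covers.


for sample in ("B43", "B20", "U20", "T22", "B03", "A20"):
    print(sample, categorize(sample))
```

Checking the professional-society code first means a code like B03 lands in ProfSoc and not in an education bucket. That ordering is a design choice of this sketch.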