System Requirements
Objective
This modular Python pipeline identifies 501(c)(3) entities by integrating IRS Business Master File (BMF) data with Form 990 XML filings. It ranks five types of non-profits (Higher Education, Lower Education, Community Foundations, Professional Societies, and Research) by financial health as reported on Form 990. Users can customize the pipeline parameters.
Functional Requirements
- Data Acquisition: Must download IRS BMF files and TEOS Form 990 XML ZIPs using provided URL lists.
- Categorization: Must group organizations into five target categories: EduHigh, EduLow, Research, Foundation, and ProfSoc.
- XML Parsing: Must extract at least 25 specific financial and metadata fields from Form 990 schemas.
- Scoring Engine: Must compute a composite score based on:
  - Organization age.
  - 5-year trends (increases) in grants and endowment balances.
  - 2-year gaps in net assets and grants paid.
  - Peer-relative performance (top-half ranking) across investment income and revenue.
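The scoring criteria above could be combined roughly as in the following sketch. It is illustrative only: the `OrgMetrics` fields, the weights, the 50-year age cap, and the use of the peer median as the "top-half" cutoff are all assumptions, not values taken from this specification. The 2-year gap criterion is left out because the requirement does not define how a gap is measured.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class OrgMetrics:
    """Hypothetical per-organization inputs; field names are assumptions."""
    ein: str
    age_years: float
    grants_by_year: list[float]      # oldest to newest, up to 5 values
    endowment_by_year: list[float]   # oldest to newest, up to 5 values
    investment_income: float
    total_revenue: float


# Hypothetical defaults; in the real pipeline these would be user-configurable.
DEFAULT_WEIGHTS = {"age": 1.0, "trend": 1.0, "peer": 1.0}
AGE_CAP_YEARS = 50


def count_increases(series):
    """Count year-over-year increases in a chronological series."""
    return sum(1 for prev, cur in zip(series, series[1:]) if cur > prev)


def fraction_increasing(series):
    """Share of year-over-year steps that are increases (0.0 if fewer than 2 values)."""
    steps = len(series) - 1
    return count_increases(series) / steps if steps > 0 else 0.0


def in_top_half(value, peer_values):
    """Assumed definition: at or above the median of the peer group."""
    return value >= median(peer_values)


def composite_score(org, peers, weights=None):
    """Weighted sum of age, trend, and peer-relative components, each in [0, 1]."""
    w = weights or DEFAULT_WEIGHTS
    age_part = min(org.age_years / AGE_CAP_YEARS, 1.0)
    trend_part = (
        fraction_increasing(org.grants_by_year)
        + fraction_increasing(org.endowment_by_year)
    ) / 2
    peer_hits = [
        in_top_half(org.investment_income, [p.investment_income for p in peers]),
        in_top_half(org.total_revenue, [p.total_revenue for p in peers]),
    ]
    peer_part = sum(peer_hits) / len(peer_hits)
    return w["age"] * age_part + w["trend"] * trend_part + w["peer"] * peer_part
```

Here `peers` is meant to be the list of organizations in the same category (including the organization being scored), so that the top-half test compares like with like.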
Technical Requirements
- Python 3.x: Primary language.
- SQLite3: For local data persistence.
- 7-Zip (7z): Must be on the system PATH to extract IRS ZIP files that use Deflate64 compression, which Python's built-in zipfile module does not support.
- Dependencies: requests and lxml (third-party packages), plus pathlib (part of the Python standard library, no installation needed).
- No Hardcoding: All URLs and task toggles must reside in the input/ directory.
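As a rough illustration of the 7-Zip and no-hardcoding requirements, the sketch below checks that `7z` is reachable on the PATH, extracts an archive with it, and reads a URL list from a file. The function names and the list format (one URL per line, with `#` comment lines) are assumptions; the actual layout of the files under input/ is not specified here.

```python
import shutil
import subprocess
from pathlib import Path


def find_7z():
    """Return the path to the 7z executable, or raise if it is not on PATH."""
    exe = shutil.which("7z")
    if exe is None:
        raise RuntimeError("7z not found on PATH; install 7-Zip to extract Deflate64 ZIPs")
    return exe


def extract_with_7z(archive: Path, dest: Path) -> None:
    """Extract an archive into dest using the 7z command-line tool."""
    dest.mkdir(parents=True, exist_ok=True)
    # 'x' extracts with full paths, '-o<dir>' sets the output directory,
    # '-y' answers yes to any prompts (for example, overwrite confirmations).
    subprocess.run([find_7z(), "x", str(archive), f"-o{dest}", "-y"], check=True)


def load_urls(list_file: Path) -> list[str]:
    """Read URLs from a text file (assumed format: one per line, '#' for comments)."""
    lines = list_file.read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]
```

Calling `find_7z()` once at startup lets the pipeline fail early with a clear message instead of partway through a long download run.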