This project develops tools to semi-automate a specific style of philanthropic evaluation. The current system has two distinct components for processing different data:
The Form 990 code uses Excel VBA to score IRS Form 990s submitted by 501(c)(3) organizations. The scoring highlights entities that have endowments, award scholarships, and emphasize science education or research. This framework is at an early stage of development, and future versions are expected to be implemented in Python with relational database support and a web-based interface.
The Scholarship Directory component scrapes scholarship information from the Labor Department's CareerOneStop website using Python and associated tools.
To get the system running:
Download Code.xlsm and place it in your working directory (e.g., x/).
Download the following .txt files:
nodenames.txt
stopwords.txt
punctuation.txt
rule.txt
Place them in the same directory as the Excel file.
Set up the folder x/testforms/ and create two subdirectories in it:
x/testforms/990 for standard 990 files
x/testforms/errant for nonstandard or filtered-out files
Edit the .txt files to change what data is parsed or scored.
nodenames.txt: each line defines a data type (String, Date, Integer, AbsInt), a field length, and an XML node path, separated by semicolons. For example:
Date;10;Return/ReturnHeader/TaxPeriodBeginDt
Integer;4;Return/ReturnHeader/TaxYr
AbsInt;15;Return/ReturnData/IRS990/CYInvestmentIncomeAmt
String;600;Return/ReturnData/IRS990/ActivityOrMissionDesc
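To illustrate how a nodenames.txt line might map to a value in a filing, here is a minimal Python sketch (not the VBA code in Code.xlsm). It assumes that the middle field is a maximum character length, that AbsInt means the absolute value of an integer, and that the column names used in rule.txt (such as IRS990_FormationYr) are formed by joining the last two path segments. NodeSpec, parse_nodenames_line, column_name, and extract are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date
from typing import Optional, Union


@dataclass
class NodeSpec:
    dtype: str   # String, Date, Integer, or AbsInt
    length: int  # assumption: maximum number of characters kept for String fields
    path: str    # slash-separated path starting at the <Return> root element


def parse_nodenames_line(line: str) -> NodeSpec:
    """Split one semicolon-delimited nodenames.txt line into its three fields."""
    dtype, length, path = line.strip().split(";", 2)
    return NodeSpec(dtype, int(length), path)


def column_name(path: str) -> str:
    """Hypothetical naming rule: join the last two path segments with '_'."""
    return "_".join(path.split("/")[-2:])


def extract(xml_text: str, spec: NodeSpec) -> Optional[Union[str, int, date]]:
    """Return the typed value at spec.path, or None if the node is missing."""
    root = ET.fromstring(xml_text)
    # IRS e-file XML declares a default namespace; strip it so plain paths match.
    for el in root.iter():
        el.tag = el.tag.split("}", 1)[-1]
    parts = spec.path.split("/")
    if parts[0] != root.tag:
        return None
    node = root if len(parts) == 1 else root.find("/".join(parts[1:]))
    if node is None or node.text is None:
        return None
    raw = node.text.strip()
    if spec.dtype == "Date":
        return date.fromisoformat(raw)  # IRS dates are written as YYYY-MM-DD
    if spec.dtype == "Integer":
        return int(raw)
    if spec.dtype == "AbsInt":
        return abs(int(raw))  # assumption: AbsInt = absolute value of an integer
    return raw[: spec.length]  # String, truncated to the assumed maximum length
```

Stripping namespaces up front keeps the paths in nodenames.txt readable; the trade-off is that two elements differing only by namespace would collide, which should not matter for a single 990 return.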
stopwords.txt and punctuation.txt: used to clean and tokenize text fields; feel free to modify them.
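A rough Python equivalent of this kind of cleaning is sketched below. It assumes each file lists one stopword or punctuation mark per line and that cleaning means lowercasing, replacing punctuation with spaces, and dropping stopwords; the actual Strip module may differ. load_lines and clean_text are hypothetical names.

```python
from pathlib import Path


def load_lines(path: str) -> list[str]:
    """Assumption: one stopword or punctuation mark per non-empty line."""
    text = Path(path).read_text(encoding="utf-8")
    return [ln.strip() for ln in text.splitlines() if ln.strip()]


def clean_text(text: str, stopwords: set[str], punctuation: str) -> list[str]:
    """Lowercase, replace each punctuation character with a space, split, drop stopwords."""
    table = str.maketrans({ch: " " for ch in punctuation})
    tokens = text.lower().translate(table).split()
    return [t for t in tokens if t not in stopwords]


# Possible usage with the project's files (formats assumed as above):
# stopwords = {w.lower() for w in load_lines("stopwords.txt")}
# punctuation = "".join(load_lines("punctuation.txt"))
# tokens = clean_text(description, stopwords, punctuation)
```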
rule.txt: defines the scoring logic for each rule. Users can modify existing rules or add new ones.
Parsed990Data contains the extracted data.
Scored990Data contains the rule evaluations.
There are four rule types. Each uses a semicolon-delimited format:
Substring: Substring;RuleName;Nodename;Present;token1,token2,...
Trend: Trend;RuleName;Nodename1,Nodename2,...
Percentile: Percentile;RuleName;Nodename;Cutoff
Eval: Eval;RuleName;Nodename;NumOrTxt;Expression
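The four formats can be split into fields with a small parser. This Python sketch only follows the field layouts listed above; the Rule structure and parse_rule are hypothetical and are not how the VBA Score module stores rules.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    kind: str          # Substring, Trend, Percentile, or Eval
    name: str          # RuleName
    nodes: list[str]   # one nodename, or several for Trend
    params: list[str]  # remaining type-specific fields


def parse_rule(line: str) -> Rule:
    """Parse one semicolon-delimited rule.txt line according to its rule type."""
    fields = line.strip().split(";")
    kind, name = fields[0], fields[1]
    if kind == "Substring":
        # Present flag followed by the comma-separated tokens
        return Rule(kind, name, [fields[2]], [fields[3], *fields[4].split(",")])
    if kind == "Trend":
        return Rule(kind, name, fields[2].split(","), [])
    if kind == "Percentile":
        return Rule(kind, name, [fields[2]], [fields[3]])
    if kind == "Eval":
        # Rejoin the tail in case the expression itself contains semicolons
        return Rule(kind, name, [fields[2]], [fields[3], ";".join(fields[4:])])
    raise ValueError(f"unknown rule type: {kind}")
```

For instance, parse_rule("Substring;Web;IRS990_WebsiteAddressTxt;T;academy,edu") yields a rule whose params are ["T", "academy", "edu"].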
Example rule.txt:
Eval;Age;IRS990_FormationYr;Num;Year(Now()) - IRS990_FormationYr > 15
Substring;Web;IRS990_WebsiteAddressTxt;T;academy,edu
Percentile;EndYrBal;CYEndwmtFundGrp_EndYearBalanceAmt;0.50
Trend;YrNet;IRS990_NetAssetsOrFundBalancesBOYAmt,IRS990_NetAssetsOrFundBalancesEOYAmt
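The README does not spell out how each rule type is evaluated, so the Python sketch below only shows one plausible reading of the example rules. Assumptions: Substring passes when any token appears in the field (case-insensitive) and Present=F inverts it; Trend passes when each value is strictly greater than the previous one; Percentile passes when the value is at or above the cutoff quantile of that column across all filings (linear interpolation); and age_rule is a hand translation of the Age Eval expression. All function names are hypothetical.

```python
from datetime import date
from typing import Optional, Sequence


def substring_rule(text: Optional[str], tokens: Sequence[str], present: bool = True) -> bool:
    """Assumed semantics: any token found in the text; present=False inverts the result."""
    found = text is not None and any(tok.lower() in text.lower() for tok in tokens)
    return found if present else not found


def trend_rule(values: Sequence[Optional[float]]) -> bool:
    """Assumed semantics: every value strictly exceeds the one before it."""
    if any(v is None for v in values):
        return False
    return all(b > a for a, b in zip(values, values[1:]))


def quantile(sorted_vals: Sequence[float], p: float) -> float:
    """Linear interpolation between the closest ranks of an ascending list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def percentile_rule(value: Optional[float], column: Sequence[Optional[float]], cutoff: float) -> bool:
    """Assumed semantics: value is at or above the cutoff quantile of the column."""
    present = sorted(v for v in column if v is not None)
    if value is None or not present:
        return False
    return value >= quantile(present, cutoff)


def age_rule(formation_yr: Optional[int], years: int = 15) -> bool:
    """Python version of: Year(Now()) - IRS990_FormationYr > 15."""
    return formation_yr is not None and date.today().year - formation_yr > years
```

Percentile is the only rule that needs the whole column rather than a single filing, which is why it takes every value of CYEndwmtFundGrp_EndYearBalanceAmt as input.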
1. Move Files: in VBA module move990, run Move990Files. This moves Form 990 files to /990/ and skips Form 990EZ and other forms.
2. Parse XMLs: in module Parse, run ParseXML990Files. This extracts nodename data into Parsed990Data.
3. Clean Text: in module Strip, run Master. This cleans descriptions and web addresses and populates DescFiltered.
4. Score Data: in module Score, run Score. This evaluates the rules and outputs the results to Scored990Data.
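For anyone prototyping the planned Python version, the Move Files step could look roughly like the sketch below. It is not a translation of Move990Files. It assumes the XML files start directly in x/testforms/, that the return type is read from the ReturnHeader/ReturnTypeCd element of the IRS e-file schema, and that skipped filings (990EZ and others, plus unreadable XML) go to errant/. return_type and move_990_files are hypothetical names.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

NS = {"efile": "http://www.irs.gov/efile"}  # default namespace of IRS e-file XML


def return_type(xml_path: Path) -> Optional[str]:
    """Read ReturnHeader/ReturnTypeCd (e.g. '990' or '990EZ'), if present."""
    root = ET.parse(xml_path).getroot()
    node = root.find("efile:ReturnHeader/efile:ReturnTypeCd", NS)
    return node.text.strip() if node is not None and node.text else None


def move_990_files(folder: Path) -> dict[str, int]:
    """Move standard 990s into folder/990 and everything else into folder/errant."""
    counts = {"990": 0, "errant": 0}
    for sub in counts:
        (folder / sub).mkdir(exist_ok=True)
    for f in sorted(folder.glob("*.xml")):  # build the full list before moving files
        try:
            kind = return_type(f)
        except ET.ParseError:
            kind = None  # unreadable XML is treated as nonstandard
        dest = "990" if kind == "990" else "errant"
        shutil.move(str(f), str(folder / dest / f.name))
        counts[dest] += 1
    return counts


# Possible usage: move_990_files(Path("x/testforms"))
```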
To run the Scholarship Directory scraper, first activate ChromeDriver.exe.
If you don't have these Python packages installed, run:
Download scraper.py from the GitHub repository and place it in your working directory.
The output is a CSV file with 8 columns and one row per scholarship. The 8 columns are labeled
The CSV file uploaded to GitHub, scholarships.csv, has 10,000 rows, which was the entirety of what was available from the Labor Department's CareerOneStop website on July 30, 2025.
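A quick sanity check on a scraper run is to count the columns and rows of the output. This Python sketch uses only the standard csv module; it assumes the first row is a header with the column labels and that the file is UTF-8. summarize_csv is a hypothetical helper, not part of scraper.py.

```python
import csv
from typing import TextIO


def summarize_csv(stream: TextIO) -> tuple[int, int]:
    """Return (number of header columns, number of non-empty data rows)."""
    reader = csv.reader(stream)
    header = next(reader, [])  # assumption: first row holds the column labels
    rows = sum(1 for row in reader if row)
    return len(header), rows


# Possible usage:
# with open("scholarships.csv", newline="", encoding="utf-8") as f:
#     print(summarize_csv(f))
```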
This project is licensed under the GNU General Public License v3.0.