System Design

Architecture

The system is a sequential pipeline orchestrated by main.py, which uses the subprocess module to run each script as a separate process. Because a script's memory is released when its process exits, large data-loading tasks stay isolated from one another.
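
A minimal sketch of this orchestration pattern follows. The stage script names are hypothetical placeholders, since the actual scripts are not listed here, and the runner parameter is an illustrative addition that lets the loop be exercised without launching real processes.

```python
import subprocess
import sys

# Hypothetical stage scripts; the real script names are not given in this document.
STAGES = ["load_data.py", "refine.py", "score.py"]


def run_pipeline(stages, runner=subprocess.run):
    """Run each stage in its own Python process, one after another.

    Returns the list of stages that completed. With the default runner,
    check=True raises CalledProcessError on a non-zero exit code, so the
    pipeline stops at the first failing stage.
    """
    completed = []
    for script in stages:
        # A fresh interpreter per stage: its memory is freed when it exits.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed


# In main.py this would be invoked as: run_pipeline(STAGES)
```

Running every stage through sys.executable keeps all stages on the same interpreter as the orchestrator, which is one reasonable choice and not necessarily what main.py does.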

Data Schema

Data is centralized in db/philanthropy.db.

Scoring Logic

The system uses a "Top-Half" scoring model. For every metric (e.g., Asset Growth), the median is calculated within each category, and an organization whose value is at or above its category's median receives +1 point for that metric.
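
The scoring rule can be sketched as below. The field names (name, category, asset_growth) and the sample figures are assumptions made for illustration, not the project's schema, and the sketch assumes every organization has a value for the metric.

```python
from collections import defaultdict
from statistics import median


def top_half_points(rows, metric):
    """Return {name: 0 or 1} for one metric, scored within each category."""
    by_category = defaultdict(list)
    for row in rows:
        by_category[row["category"]].append(row[metric])
    # One median per category, so organizations are compared only with peers.
    medians = {cat: median(vals) for cat, vals in by_category.items()}
    return {
        row["name"]: 1 if row[metric] >= medians[row["category"]] else 0
        for row in rows
    }


# Hypothetical sample data.
rows = [
    {"name": "A", "category": "Health", "asset_growth": 0.10},
    {"name": "B", "category": "Health", "asset_growth": 0.02},
    {"name": "C", "category": "Health", "asset_growth": 0.05},
    {"name": "D", "category": "Arts", "asset_growth": -0.01},
    {"name": "E", "category": "Arts", "asset_growth": 0.03},
]
points = top_half_points(rows, "asset_growth")
# points == {"A": 1, "B": 0, "C": 1, "D": 0, "E": 1}
```

An organization's total score would then be the sum of its points across all metrics. Note that C, which sits exactly on the Health median, still earns the point because the comparison is "at or above".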

Key Scripts

Refinement Rules

Entities are rejected (Category = 'Reject') if:

  1. The ReturnTypeCd is not 990 (for example, 990-PF and 990-EZ filings are excluded).
  2. The WebsiteAddress is missing, or is still malformed after regex normalization.
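
The two rules above can be sketched as a single check. The regular expression is a hypothetical stand-in, since the pattern the project actually uses is not shown here, and the function names are illustrative.

```python
import re

# Hypothetical pattern: optional scheme, optional "www.", then a dotted host
# name, then an optional path. The project's real regex is not shown.
_URL_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)(?:/.*)?$"
)


def normalize_website(raw):
    """Return a lowercased bare domain, or None if missing or malformed."""
    if not raw:
        return None
    match = _URL_RE.match(raw.strip().lower())
    return match.group(1) if match else None


def is_rejected(return_type_cd, website_address):
    """Return True if the entity should get Category = 'Reject'."""
    if return_type_cd != "990":
        return True  # Rule 1: only full Form 990 filings are kept.
    # Rule 2: the website must survive normalization.
    return normalize_website(website_address) is None
```

For example, is_rejected("990", "HTTP://www.Example.org/about") returns False, while is_rejected("990-PF", "example.org") and is_rejected("990", "N/A") both return True.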