Tracking partisan content on US federal government websites over time
Homepages of ~1,300 US federal government (.gov) websites were scraped at multiple points between October 2025 and February 2026 using a Playwright-based scraper to retrieve rendered page content.
Each page's text (up to 4,000 characters) was classified using
Google Gemma 4 31B (prompt version v2-with-quote).
Sites were labeled partisan if the text contained explicit partisan language:
named party attacks, political blame statements, or campaign-style rhetoric.
Factual policy descriptions and neutral government content were labeled neutral.
Pages that were blocked, empty, or returned errors were labeled unknown.
For each partisan classification, the model also extracted a verbatim quote from the page text — the specific phrase that triggered the classification. This quote is displayed alongside each site's classification rationale.
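The text does not say whether extracted quotes were checked against the page. As a minimal sketch, one way to confirm that a quote really is verbatim is to search for it in the same truncated text the model saw. The function name and the whitespace normalization are assumptions for illustration; only the 4,000-character limit comes from the description above.

```python
MAX_CHARS = 4000  # truncation limit stated in the methodology


def quote_is_verbatim(page_text: str, quote: str, max_chars: int = MAX_CHARS) -> bool:
    """Return True if `quote` occurs in the truncated text that was classified.

    Whitespace runs are collapsed on both sides (an assumption), so that line
    breaks in rendered pages do not cause false mismatches.
    """
    def normalize(s: str) -> str:
        return " ".join(s.split())

    classified_text = page_text[:max_chars]
    needle = normalize(quote)
    # An empty quote never counts as evidence.
    return bool(needle) and needle in normalize(classified_text)
```

A quote that appears only beyond the truncation point fails the check, which would flag a possible hallucinated or misattributed quote for manual review.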
Results are categorized into three groups: agency sites (executive branch departments and agencies), congressional sites (official caucus and leadership pages, which are expected to reflect partisan positions), and historical archives (frozen sites from prior administrations). Headline statistics and charts reflect agency sites only. All categories appear in the partisan detections feed above.
News reporting exclusions: One .gov domain, voa.gov (Voice of America),
is excluded from partisan counts and visit totals. VOA is a government-funded news broadcaster;
its homepage displays rotating news headlines, and partisan framing in those headlines reflects
news coverage of political events rather than the agency's own messaging. This is distinct from
usagm.gov (USAGM, VOA's parent agency), whose homepage presents the agency's own press releases
in partisan language and is therefore included.
Manual overrides: Four site-snapshot combinations were manually reclassified
as partisan after the model returned a neutral label. In each case, manual review of the scraped
page text confirmed explicit partisan content that the model missed: agency branding attributing
the agency to a named president (cpsc.gov, eeoc.gov), campaign-style
press release headlines (ustr.gov), and a record with an API classification error
at a snapshot where every surrounding snapshot was classified partisan with identical content
(treasury.gov).
Visit counts come from the Digital Analytics Program (DAP), a GSA-run platform that collects traffic data from participating federal websites. In the DAP, a "visit" is a session — a continuous period of interaction that begins when a user opens a page and ends when they close it or go idle for 30 minutes. Visit counts are not pageviews and not unique visitors; one person can generate multiple visits in a day. All traffic figures are reported as daily visits (sessions).
Traffic is matched on an exact apex domain basis only — we use the visit count
for irs.gov directly, and do not aggregate subdomain traffic (e.g.
apps.irs.gov). Visitors to those subdomains probably never saw the homepage content we
classified; including them would potentially misrepresent the reach of partisan content.
Of the 1,336 sites tracked, 898 have no presence in DAP at all and an additional 54 appear only via subdomains. We do not know whether absent sites are low-traffic or simply not enrolled in DAP analytics. These sites are shown as "no data" — not zero — in all traffic figures. Visit counts reflect only the ~360–390 domains with confirmed DAP data per snapshot.
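The three coverage outcomes described above (exact match, subdomain-only, absent) can be sketched as follows. The function name, status strings, and input shapes are hypothetical; the sketch only illustrates the exact-apex matching rule, not the project's actual code.

```python
def dap_coverage(tracked: list[str], dap_hosts: set[str]) -> dict[str, str]:
    """Assign each tracked apex domain a DAP coverage status.

    "exact"          - the apex domain itself appears in DAP (traffic is used)
    "subdomain_only" - only subdomains appear (traffic is NOT aggregated)
    "no_data"        - no presence in DAP at all
    """
    status: dict[str, str] = {}
    for domain in tracked:
        if domain in dap_hosts:
            status[domain] = "exact"
        elif any(host.endswith("." + domain) for host in dap_hosts):
            status[domain] = "subdomain_only"
        else:
            status[domain] = "no_data"
    return status
```

With the figures quoted above, 1,336 - 898 - 54 leaves 384 sites with exact-match data, which is consistent with the ~360–390 range reported per snapshot.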
Interpreting the Traffic % chart: Reach percentage is calculated as partisan site visits divided by (partisan + neutral) site visits for sites with DAP data. Unknown/blocked sites are excluded from this calculation — they have no content and no measurable traffic. This is intentional: the Sites chart shows 3 datasets (partisan, neutral, unknown) while the Traffic % chart shows a single line, because unknown sites have no visit data to plot.
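As a minimal sketch of the reach calculation, assuming labels and visit counts are held in plain dictionaries keyed by domain (hypothetical shapes, not the project's actual data model):

```python
from typing import Optional


def reach_percentage(labels: dict[str, str], visits: dict[str, int]) -> Optional[float]:
    """Partisan visits as a percentage of (partisan + neutral) visits.

    Sites labeled "unknown" and sites without DAP data are skipped entirely.
    Returns None when there is no qualifying traffic to divide by.
    """
    partisan = neutral = 0
    for domain, label in labels.items():
        if domain not in visits:
            continue  # "no data" is not treated as zero
        if label == "partisan":
            partisan += visits[domain]
        elif label == "neutral":
            neutral += visits[domain]
    total = partisan + neutral
    return None if total == 0 else 100.0 * partisan / total
```

For example, one partisan site with 100 visits and one neutral site with 300 visits give a reach of 25%, regardless of how much traffic any unknown site receives.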
Day-of-week effects: Government website traffic is heavily weekday-weighted. Three snapshots fall on Sundays (Oct 12, Feb 1, Feb 8) and show lower total visits than weekday snapshots. The Traffic % view normalizes for this — percentages are directly comparable across snapshots regardless of overall traffic volume.
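The short sketch below checks the three Sunday dates and illustrates why a percentage is unaffected by overall volume. The invariance holds exactly only under the assumption that partisan and neutral traffic fall by the same factor on weekends; the helper names are illustrative.

```python
from datetime import date

# Snapshot dates identified above as Sundays.
SUNDAY_SNAPSHOTS = [date(2025, 10, 12), date(2026, 2, 1), date(2026, 2, 8)]


def is_sunday(d: date) -> bool:
    return d.weekday() == 6  # Monday is 0, Sunday is 6


def partisan_share(partisan: int, neutral: int) -> float:
    """Partisan visits as a percentage of partisan + neutral visits."""
    return 100.0 * partisan / (partisan + neutral)
```

If both groups drop to 60% of their weekday volume, `partisan_share(1000, 9000)` and `partisan_share(600, 5400)` give the same value.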
The coverage count shown in reach KPI footnotes reflects the exact number of tracked sites with DAP data for the latest snapshot.
The Est. Reach chart shows a running total of estimated daily visits to partisan agency sites across the full observation period (Oct 12, 2025 – Feb 8, 2026).
Method: For each pair of consecutive snapshots with distinct calendar dates, we identify agency sites that were classified partisan at both the earlier and later snapshot. Every calendar day between those two snapshots is attributed to those sites — visits for each day are looked up in the DAP cache and summed. This is a conservative estimate: a site must be confirmed partisan at both surrounding snapshots to be counted for any day in between. Where a same-day snapshot pair exists (Nov 13 AM/PM, Jan 30/Jan 30 PM, Jan 31 AM), those pairs contribute zero inter-snapshot days and are skipped; the conservative (neutral) side is always used when a site's status is ambiguous at a same-day boundary.
Boundary days: Oct 12 is included — a site partisan at both Oct 12 and Nov 6 is counted from Oct 12 onward. Feb 8 (the final snapshot) is a special case: sites classified partisan at Feb 8 are counted for that single day with no subsequent snapshot required.
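The bracketing method and boundary rules above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: each interval is treated as half-open (the earlier snapshot day is included, the later one is covered by the next interval), the input shapes are hypothetical, and the same-day ambiguity rule is not modeled beyond skipping same-day pairs.

```python
from datetime import date, timedelta

Snapshot = tuple[date, set[str]]  # (snapshot date, domains classified partisan)


def estimated_daily_reach(
    snapshots: list[Snapshot],            # sorted by time, at least one entry
    visits: dict[tuple[date, str], int],  # (day, domain) -> DAP daily visits
) -> dict[date, int]:
    """Estimated partisan visits per calendar day via snapshot bracketing."""
    daily: dict[date, int] = {}

    def add_day(day: date, domains: set[str]) -> None:
        hits = [visits[(day, d)] for d in domains if (day, d) in visits]
        if hits:  # days with no DAP data are left out entirely
            daily[day] = sum(hits)

    for (d0, p0), (d1, p1) in zip(snapshots, snapshots[1:]):
        if d0 == d1:
            continue  # same-day pairs contribute no inter-snapshot days
        both = p0 & p1  # must be partisan at BOTH bracketing snapshots
        day = d0
        while day < d1:  # assumed half-open interval [d0, d1)
            add_day(day, both)
            day += timedelta(days=1)

    # Final snapshot: counted for that single day, no later snapshot required.
    last_day, last_partisan = snapshots[-1]
    add_day(last_day, last_partisan)
    return daily
```

Requiring membership in both sets is what makes the estimate conservative: a site that turned partisan one day after a snapshot contributes nothing until the next snapshot confirms it.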
The 77-day gap (Nov 13 – Jan 29) is the longest interval without an intermediate snapshot, shown as a shaded band on the chart. Jan 6, 2026 (five years after the Jan 6, 2021 Capitol event) and Jan 20, 2026 (one year after the inauguration) both fall within this window and represent plausible moments of heightened partisan activity on government websites. This estimate neither confirms nor denies activity during this period; it is an extrapolation from the Nov 13 PM and Jan 29 snapshots.
Days with no DAP data are silently excluded; the cumulative value carries forward across those gaps on the chart.
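A minimal sketch of the carry-forward behavior, assuming the per-day estimates are held in a dictionary keyed by date (a hypothetical shape): a missing day adds nothing, so the running total stays flat across the gap.

```python
from datetime import date, timedelta


def cumulative_series(
    daily: dict[date, int], start: date, end: date
) -> list[tuple[date, int]]:
    """Running total for every calendar day from start to end, inclusive."""
    total = 0
    series = []
    day = start
    while day <= end:
        total += daily.get(day, 0)  # missing day: total carries forward
        series.append((day, total))
        day += timedelta(days=1)
    return series
```

Because gaps are silent, a flat segment on the chart can mean either no partisan visits or no DAP data for that day.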
The Est. Daily chart shows the same data broken down by calendar day: the per-day partisan visit count rather than the running total. Dark bars are snapshot days (sites were actually classified on that date); light bars are interpolated days, where partisan status is inferred from the surrounding snapshots. The same snapshot-bracketing methodology applies: a day is counted only if the site was partisan at both the preceding and following snapshot.
The gray line on the Est. Daily chart is the 7-day trailing rolling average of total named-domain .gov traffic — the current day and the 6 preceding days averaged together, so no future data is incorporated. It provides a stable baseline showing overall government web activity for context. Raw daily totals are available in the tooltip for full transparency.
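The trailing window can be sketched as below. How the first six days of the series are handled is not stated in the text; this sketch assumes they are averaged over however many days are available.

```python
def trailing_average(values: list[float], window: int = 7) -> list[float]:
    """Trailing rolling mean: each point averages the current value and up to
    (window - 1) preceding values, so no future data is used."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

For the series 1 through 8, the seventh point averages 1 to 7 and the eighth averages 2 to 8, so the window slides forward without ever looking ahead.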
Why a rolling average? Raw daily .gov traffic swings by roughly 1.5× between weekdays (~75–85M visits) and weekends (~45–58M visits) due to federal workforce patterns unrelated to partisan content. The 7-day window covers one full weekday/weekend cycle, smoothing this variation into a stable backdrop.
Why named domains only? The DAP feed includes an (other) aggregate
bucket that produced an anomalous spike of ~72M visits/day during Jan 4–13, 2026, with no
corresponding increase in any named domain. This is most probably a DAP reporting artifact, not real
traffic (USPS, NIH, IRS, SSA, and other named domains show unchanged visit counts across the
same period). This bucket is excluded from the baseline line. It does not affect the partisan
visit figures, which are computed by looking up specific named domains only.
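Excluding the aggregate bucket from the baseline amounts to a simple filter. The row shape below is an assumption for illustration; only the `(other)` bucket name comes from the description above.

```python
OTHER_BUCKET = "(other)"  # aggregate bucket name as described in the text


def named_domain_total(rows: list[tuple[str, int]]) -> int:
    """Sum one day's visits over named domains, skipping the aggregate bucket.

    `rows` is a hypothetical list of (domain, visits) pairs for a single day.
    """
    return sum(visits for domain, visits in rows if domain != OTHER_BUCKET)
```

Since partisan visit figures are looked up by specific domain name, this filter changes only the gray baseline line, not the bars.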
Key finding: Total .gov traffic was essentially unchanged on Nov 13, 2025 (the 7-day rolling average moved from ~64.0M to ~64.5M), while partisan site visits dropped ~70% (from ~1.3M to ~442K). Visits to partisan sites fell from approximately 2% of total tracked .gov traffic before Nov 13 to approximately 0.6% after. The overlay makes this contrast visible at a glance: the gray line is flat while the bars drop sharply.
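The arithmetic behind these figures can be reproduced from the rounded values quoted above. Because the inputs are rounded, and the exact denominator behind the published percentages is not stated, the results land near, but not exactly on, the headline numbers. The helper names are illustrative.

```python
def percent_drop(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


def percent_share(part: float, whole: float) -> float:
    """`part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Rounded inputs taken from the text.
drop = percent_drop(1_300_000, 442_000)              # partisan visits
share_before = percent_share(1_300_000, 64_000_000)  # vs. rolling average
share_after = percent_share(442_000, 64_500_000)
```

With these rounded inputs the drop works out to about two-thirds, and the shares to roughly 2% before and a little under 0.7% after.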
Subdomain traffic (e.g. apps.irs.gov) is
excluded. For agencies with complex subdomain structures, the estimate understates total web presence.
Raw data is available for download. This project is independent and not affiliated with any government agency.