When Public Records Are Public in Name Only: Why Working People Need Agentic AI to Map Business Power
By Thomas Prislac, Envoy Echo, et al. Ultra Verba Lux Mentis. 2026.
Public records are often invoked as democracy’s equalizer: the idea that anyone can trace who owns what, who lobbies whom, who donates to whom, and who profits from policy. Yet the contemporary record ecosystem is not designed to be legible to ordinary people operating under ordinary constraints. It is legible to institutions with time, staff, and money.
In public administration, this problem has a name: “administrative burden,” the learning, compliance, and psychological costs people pay simply to access what is nominally theirs. [1] In the realm of public accountability, the burden is not just paperwork; it is friction engineered by fragmentation: a maze of portals, PDFs, incompatible identifiers, and human-only interfaces.
Agentic AI matters here not as a novelty, but as an equalizer of investigative bandwidth: the capacity to hold many moving parts in view, across entities, filings, dates, addresses, and roles, long enough to form a coherent map that a human can then verify and act on.
But this only works ethically if the AI is built and governed like an audit instrument: provenance-first, explicit about uncertainty, careful about identity conflation, and structurally resistant to defamation-by-automation. Models must treat records as evidence, not as verdicts; patterns as prompts for verification, not prosecutions in prose. This is consistent with the cautionary posture taken by major investigative databases themselves, which warn users that name matches and offshore structures are not, on their own, evidence of wrongdoing and require careful confirmation. [2]
Meanwhile, public systems must be held to an updated democratic standard: machine-readable public by default. U.S. federal guidance already articulates “machine-readable” and “open formats” as core open-data requirements, emphasizing that data should be structured for automated processing. [3] And federal FOIA reforms have pushed agencies toward electronic publication for frequently requested records, without providing new funding, a structural recipe for continued delays and uneven compliance. [4]
The stakes are not abstract: the new defense of elite misconduct is often not secrecy but exhaustion; not locked doors, but too many doors.
The myth of the “public record”
The moral claim behind public records is straightforward: power should leave footprints, and the governed should be able to follow them.
In the United States, major accountability datasets explicitly position themselves this way. The Federal Election Commission[5] frames its campaign finance data as a resource that helps voters understand how candidates and committees raise and spend money. [6] The United States Senate[7] makes lobbying disclosure reports available online, including bulk downloads and a REST API through the LDA reporting system. [8]
International investigative and civil-society projects exist precisely because the raw public record is scattered. LittleSis[9] describes its mission in plain terms: the information is public but dispersed, so it aggregates and maps relationships; it also states that it provides an API and bulk data. [10] The Offshore Leaks Database of the International Consortium of Investigative Journalists[11] similarly offers a searchable graph of offshore entities and roles while repeatedly cautioning users about legitimate use cases, duplicates, and the need to confirm identity. [12]
So what’s the problem?
The problem is that these islands of clarity sit in an ocean of scattered, inconsistently designed systems where the “public” part is literal-but-not-functional, where a record may be technically accessible yet practically unusable at civic scale.
And access is not the same as intelligibility.
A record that can be viewed only one PDF at a time, behind brittle sessions, without stable identifiers, bulk export, or consistent metadata, is a record that exists but does not circulate; and circulation is what accountability requires.
Friction as foreclosure: administrative burden in the information age
The political science and public administration literature does not treat bureaucratic friction as a neutral inconvenience. It treats it as a distributive force.
Pamela Herd[13] and Donald P. Moynihan[14] popularized “administrative burden” as a framework for understanding how states impose learning costs, compliance costs, and psychological costs on the public, often in ways that are consequential (reducing access), distributive (hitting less advantaged groups harder), and constructed (reflecting choices, not inevitabilities). [15]
Apply that lens to accountability work and you get a hard truth:
Opacity does not need to be absolute to be effective.
It only needs to be expensive, in hours, in stamina, in cognitive load, in the quiet humiliation of being unable to “prove” what you can sense is happening because the proof is buried in a hundred micro-sources you cannot consolidate.
This matters because “following the money” in modern life is rarely a single ledger problem. It is an entity-resolution problem across multiple incompatible registries.
A single civic question, who benefits from this permit, this subsidy, this redevelopment deal, this labor policy, this procurement decision, often requires a person to reconcile:
· corporate registries and registered agents
· property and assessor records
· campaign finance and PAC vendor payments
· lobbying registrations and quarterly activity reports
· nonprofit filings and board interlocks
· contracts, subcontractors, and change orders
· revolving-door resumes and consulting shells
· archived press releases, certifications, and “independent” coalition sites
When systems are designed such that each step must be done manually, the default outcome is that only professionals with time and budgets can do it, which is a quiet form of political disenfranchisement.
This is why the phrase lands so sharply: public records are often public in name only.
What agentic AI can do: from PDFs to power graphs
The value proposition of agentic AI is not “it writes fast.” It’s that it can sustain attention across complexity, and in accountability work, sustained attention is the scarce resource.
To see what a mature accountability stack looks like, it helps to study existing investigative systems that already approximate parts of the workflow:
· Organized Crime and Corruption Reporting Project[16] describes Aleph as a platform built for investigators to search both structured and unstructured data, upload documents, cross-reference datasets, and build network diagrams and timelines. [17]
· Aleph’s documentation emphasizes interoperability, e.g., importing datasets in the FollowTheMoney format and connecting across datasets like OpenSanctions and ICIJ OffshoreLeaks. [18]
· OpenCorporates[19] emphasizes “provenanced” company data drawn from primary public sources and offers API and reconciliation tooling for entity matching (critical when names repeat). [20]
· LittleSis explicitly frames its core benefit as aggregating scattered public information into relationship maps and maintaining citations and a public API/bulk data. [10]
Agentic AI, done responsibly, can extend these affordances to non-experts by compressing the “first pass” labor, the early-stage work that is mostly mechanical but mentally exhausting:
Record intake and structuring
Most accountability records are not machine-ready. They arrive as PDFs, screenshots, scanned filings, or portal views. A lawful AI assistant can:
· classify document types (filing vs. news vs. contract vs. disclosure)
· extract named entities (people, orgs, addresses, IDs)
· normalize dates, jurisdictions, and roles
· track provenance (where each fact came from)
This is not trivial. It is the bridge from “a folder of stuff” to “a dataset you can query.”
The ethical requirement is that extraction must remain traceable: every extracted claim should retain a pointer to the originating record, not just a model-generated sentence.
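The intake step above can be sketched in a few lines. In this toy example, the record format, class names, and the `Officer:` line convention are all invented for illustration; the point is structural: an extracted fact cannot be constructed without a pointer back to its source.

```python
from dataclasses import dataclass

@dataclass
class Provenance:
    """Pointer back to the originating record."""
    source_id: str   # e.g. a filing number or archive URL
    excerpt: str     # verbatim text the fact was extracted from

@dataclass
class ExtractedFact:
    entity: str        # e.g. a person or organization name
    attribute: str     # e.g. "role", "address", "date_filed"
    value: str
    provenance: Provenance  # required: no fact without a source

def extract_officers(record_text: str, source_id: str) -> list[ExtractedFact]:
    """Toy extractor: pull 'Officer: NAME' lines, keeping provenance."""
    facts = []
    for line in record_text.splitlines():
        if line.startswith("Officer:"):
            name = line.removeprefix("Officer:").strip()
            facts.append(ExtractedFact(
                entity=name,
                attribute="role",
                value="officer",
                provenance=Provenance(source_id=source_id, excerpt=line.strip()),
            ))
    return facts

facts = extract_officers("Officer: John A. Smith\nAddress: 1 Main St", "filing-2024-001")
```

Because provenance is a required field, a downstream narrative generator can always cite the originating record rather than a model-generated paraphrase.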
Entity resolution and relationship mapping
This is where power mapping lives or dies.
Entity resolution means deciding whether “J. Smith,” “John Smith,” and “John A. Smith” across two filings are the same person, and if not, keeping them separate. Tools like OpenCorporates explicitly treat reconciliation as a specialized operation because naive string-matching produces false ties. [21]
A well-designed agent can help by:
· clustering likely matches while flagging uncertainty
· surfacing disambiguators (addresses, officer IDs, dates)
· emitting “needs verification” prompts rather than asserting identity
· avoiding the classic error: converting ambiguity into certainty
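A toy version of that posture, with made-up records and an arbitrary scoring scheme, shows how middling evidence becomes a verification task rather than an asserted identity:

```python
def match_confidence(a: dict, b: dict) -> float:
    """Crude pairwise score: shared name tokens, plus a bonus for a shared
    address. Real resolvers also use officer IDs, dates, and jurisdictions."""
    tokens_a = set(a["name"].lower().replace(".", "").split())
    tokens_b = set(b["name"].lower().replace(".", "").split())
    overlap = len(tokens_a & tokens_b) / max(len(tokens_a | tokens_b), 1)
    bonus = 0.3 if a.get("address") and a.get("address") == b.get("address") else 0.0
    return min(overlap + bonus, 1.0)

def resolve(a: dict, b: dict, hi: float = 0.85, lo: float = 0.4) -> str:
    """Middling scores become verification tasks, never asserted identities."""
    score = match_confidence(a, b)
    if score >= hi:
        return "likely-same"
    if score >= lo:
        return "needs-verification"
    return "distinct"

a = {"name": "John A. Smith", "address": "1 Main St"}
b = {"name": "J. Smith", "address": "1 Main St"}
```

Here `resolve(a, b)` returns "needs-verification": a shared surname and address is a lead, not an identity.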
Pattern detection as hypothesis generation, not as accusation
Pattern does not prove guilt; it tells people where to look.
Agentic systems can detect patterns humans miss because humans cannot hold 10,000 records in working memory:
· recurring addresses (mail drops, registered agents)
· recurring officers across “independent” entities
· recurring vendor names across PACs and candidate committees
· temporal sequences (a contract win followed by donations, or vice versa)
· repeated intermediaries (law firms, consulting shops, lobbying firms)
But ethically, a system must label these outputs correctly: pattern → hypothesis → verification task, not pattern → conclusion.
The ICIJ Offshore Leaks Database is instructive here: it actively warns users about legitimate uses of offshore structures, similar names, and the need to confirm identities with additional information such as addresses. [2] That stance is not a PR hedge; it is epistemic hygiene.
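A sketch of pattern detection that keeps its epistemic labels attached (entity names, field names, and the threshold here are all illustrative):

```python
from collections import defaultdict

def recurring_addresses(entities: list[dict], threshold: int = 2) -> list[dict]:
    """Flag addresses shared by multiple 'independent' entities. The output
    is labeled a hypothesis: shared addresses are often registered agents
    or mail drops, which is common and entirely legal."""
    by_address = defaultdict(list)
    for e in entities:
        by_address[e["address"]].append(e["name"])
    return [
        {"kind": "hypothesis",
         "pattern": "shared-address",
         "address": addr,
         "entities": names,
         "next_step": "check registered-agent records before drawing any link"}
        for addr, names in by_address.items()
        if len(names) >= threshold
    ]

entities = [
    {"name": "Acme Holdings LLC", "address": "100 Elm St Ste 4"},
    {"name": "Civic Futures PAC", "address": "100 Elm St Ste 4"},
    {"name": "Riverside Realty LP", "address": "9 Dock Rd"},
]
leads = recurring_addresses(entities)
```

Every emitted lead carries `"kind": "hypothesis"` and a `next_step`, so the downstream narrative layer cannot quietly promote it to a conclusion.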
Narrative synthesis with guardrails
Finally, once a map exists, people need language.
A powerful agent can produce:
· a timeline: “what happened when”
· a network narrative: “who connects to whom and how”
· a citation list: “what evidence supports each edge”
· a questions list: “what would confirm or falsify this hypothesis”
In other words, it can do the work of an intern team: summarize, cross-check, and prepare a human investigator to make decisions.
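One way to keep each narrative line checkable is to carry source identifiers through synthesis. A minimal timeline builder, with invented event records, might look like:

```python
def build_timeline(events: list[dict]) -> list[str]:
    """Render a citation-indexed timeline: every narrative line carries the
    identifier of the record that supports it, so each claim stays checkable."""
    return [
        f'{e["date"]}: {e["what"]} [source: {e["source_id"]}]'
        for e in sorted(events, key=lambda e: e["date"])
    ]

events = [
    {"date": "2024-06-01", "what": "donation to committee X", "source_id": "fec-sa-123"},
    {"date": "2024-03-15", "what": "contract award to vendor Y", "source_id": "contract-77"},
]
timeline = build_timeline(events)
```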
This is why aiming the technology at working people is not sentimental. It is structurally correct. Truly public records require tools that make them usable by the public.
Governance and safety: audit trails, accessibility, and anti-capture design
If we are serious about democratic rigor, we have to say out loud what is often dodged:
Agentic AI can also accelerate defamation, harassment, paranoid synthesis, and “conspiracy by overlay.”
So the question is not “can it map power?” but “can it map power without becoming a harm engine?”
A sober design has at least four pillars.
Provenance or it didn’t happen
In a public-record context, an AI assistant should be structurally incapable of making a high-confidence claim without an attached source reference. This is not a style preference; it is an anti-hallucination safeguard.
The OpenCorporates API positions provenance (“sources… allowing checking”) as a major quality feature, which is exactly right for accountability work. [22]
Claim typing: fact, allegation, inference, uncertainty
A civic AI should label every assertion as one of:
· directly documented fact (from a filing, database, official dataset)
· reported claim (from journalism or testimony)
· inferred relationship (computed link; uncertain)
· hypothesis (pattern suggesting a lead)
This is how you keep the machine from laundering uncertainty into the tone of certainty.
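That labeling can be enforced structurally rather than stylistically. In the sketch below (class and field names are hypothetical), any "documented fact" that arrives without a source reference is rejected outright:

```python
from dataclasses import dataclass
from enum import Enum

class ClaimType(Enum):
    DOCUMENTED_FACT = "directly documented fact"
    REPORTED_CLAIM = "reported claim"
    INFERRED_RELATIONSHIP = "inferred relationship"
    HYPOTHESIS = "hypothesis"

@dataclass
class Claim:
    text: str
    claim_type: ClaimType
    source_ref: str  # may be empty only for non-fact claim types

def validate(claim: Claim) -> Claim:
    """Structural safeguard: a documented fact without a source is rejected,
    so uncertainty cannot be laundered into the tone of certainty."""
    if claim.claim_type is ClaimType.DOCUMENTED_FACT and not claim.source_ref:
        raise ValueError("documented facts require a source reference")
    return claim

lead = validate(Claim("Three PACs share a filing address", ClaimType.HYPOTHESIS, ""))
```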
Accessibility: CAPTCHAs and “human-only” barriers are not neutral
The architecture of "public" record systems often includes CAPTCHAs and brittle UI patterns that block scale.
But we should add the critical point: they also block disabled people. The W3C has long documented the inaccessibility of CAPTCHA approaches and the “denial of service” effect they can create for people with disabilities. [23] Empirical work on learning disabilities similarly finds that many CAPTCHA designs impose disproportionate difficulty and negative user experience burdens. [24] And modern ML has increasingly challenged the security rationale itself: some research argues that machine solvers can outperform humans on common CAPTCHA types, creating a perverse outcome where humans are blocked while bots adapt. [25]
So when “public” record systems rely on human-friction gates as the primary defense, they often produce a triple harm:
1) they reduce civic auditability
2) they reduce accessibility
3) they may not even deliver durable security
The democratic alternative is not “no abuse controls.” It is better controls: rate limits, API keys, tokenized access, and monitored bulk endpoints that preserve access while limiting abuse.
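On the client side, "better controls" can be as simple as honoring a declared rate. A minimal limiter, with an arbitrary example rate, looks like this; the same logic, enforced server-side behind API keys, is what lets a system serve bulk access while still throttling abuse:

```python
import time

class RateLimiter:
    """Client-side limiter: at most `rate` requests per second."""
    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self.last = 0.0

    def wait(self) -> float:
        """Sleep just long enough to honor the limit; return the delay used."""
        now = time.monotonic()
        delay = max(0.0, self.last + self.min_interval - now)
        if delay:
            time.sleep(delay)
        self.last = time.monotonic()
        return delay

limiter = RateLimiter(rate=5)  # 5 requests/second, an arbitrary example
delays = [limiter.wait() for _ in range(3)]
```

The first call passes immediately; subsequent calls are paced, which is exactly the cooperative behavior a monitored bulk endpoint can require and audit.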
Lawful automation and the post-scraping legal landscape
This is delicate terrain. The public needs lawful automation, but systems are also justified in defending against abuse.
Two legal reference points help frame the stakes:
· In Van Buren v. United States (2021), the Supreme Court narrowed the CFAA’s “exceeds authorized access” clause to focus on accessing areas of a computer system that are off-limits, rather than misusing otherwise authorized access for an improper purpose. [26]
· In hiQ Labs v. LinkedIn litigation, courts have wrestled with whether a platform can use legal and technical measures to block a competitor’s collection of data from publicly available profiles, with opinions emphasizing the contested nature of “public” access when automation is involved. [27]
The takeaway for civic design is not a blanket permission to scrape. The takeaway is that law and legitimacy are now front-and-center in how we build systems that read public data at scale. Government agencies should not put the public in a position where legitimate civic inquiry requires gray-zone tactics.
What “machine-readable public” looks like: standards and concrete reforms
All of this converges on a standard that should become a civic litmus test:
Can a normal person, aided by lawful automation, follow the money?
We can actually ground this in existing open-data doctrine.
A U.S. federal memorandum on open data policy (OMB M-13-13) emphasizes that "machine-readable" formats should be used so that data are structured for automated processing, with open and reusable formats, robust metadata, and public data listings that enable automatic aggregation. [3] This is not radical language; it is formal guidance.
Similarly, DOJ’s summary of the FOIA Improvement Act of 2016 notes that agencies must make available “for public inspection in an electronic format” records requested three or more times (the “rule of 3”), and that the Act did not authorize additional funds, an explicit mismatch between mandate and capacity. [4]
From these foundations, we can state concrete reforms that are technical, not utopian.
Public records should ship as datasets, not as souvenirs
When a public system publishes a record that matters to civic accountability, it should publish:
· a human-readable view (for people)
· a machine-readable form (for analysis)
· stable identifiers (so cross-silo linking is possible)
· version history (so changes are auditable)
· bulk access (so people are not forced into one-by-one labor)
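A minimal sketch of what such a publishing envelope might look like (the field names are illustrative, not drawn from any existing standard):

```python
import json

def publish_record(record_id: str, version: int, body: dict) -> str:
    """Wrap a record in a machine-readable envelope with a stable identifier
    and a version number, so cross-silo linking and change auditing are
    possible alongside the human-readable view."""
    envelope = {
        "id": record_id,     # stable across versions
        "version": version,  # increments on each amendment
        "body": body,
        "format": "application/json",
    }
    return json.dumps(envelope, sort_keys=True)

doc = publish_record("permit-2025-0042", 2, {"holder": "Acme Holdings LLC"})
```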
We already know this can work.
The Open Contracting ecosystem offers a living example. The Open Contracting Partnership[28] positions the Open Contracting Data Standard (OCDS) as a structured standard implemented by dozens of governments, designed to make contracting data analyzable. [29] Its data registry explicitly offers bulk downloads and flags data quality issues. [30]
The lesson is not that every record becomes perfect. It’s that the default format becomes analysis-ready rather than analysis-hostile.
Bulk access is not a luxury; it is the democratic baseline
The Senate’s lobbying disclosure systems provide bulk downloads (including XML) and now direct users to a REST API through the LDA reporting portal. [31] The FEC provides bulk data and an API (“OpenFEC API” is linked from the FEC data portal). [6]
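A documented access path like this can be exercised without scraping at all. The sketch below constructs (but does not send) a keyed request to the OpenFEC candidates endpoint; the base URL and parameter names follow the public documentation at the time of writing and should be re-checked against the current API reference:

```python
from urllib.parse import urlencode

BASE = "https://api.open.fec.gov/v1"  # OpenFEC base URL; verify against current docs

def candidates_url(query: str, api_key: str = "DEMO_KEY", per_page: int = 20) -> str:
    """Construct, without sending, a keyed request to the FEC's candidates
    endpoint through its documented access path."""
    params = urlencode({"q": query, "api_key": api_key, "per_page": per_page})
    return f"{BASE}/candidates/?{params}"

url = candidates_url("smith")
```

The API key in the query string is the abuse control: access stays open and auditable, without CAPTCHAs or one-PDF-at-a-time friction.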
These are models: imperfect, but oriented toward scale.
By contrast, when a system forces labor-intensive manual review as the only feasible pathway, it is effectively performing class-based access control: only people who can donate unpaid time get to audit power.
Beneficial ownership: a live example of contested transparency
Even when transparency reforms exist, they remain politically and legally unstable.
BOI pages and press releases from the Financial Crimes Enforcement Network[32] describe a major regulatory shift in March 2025: an interim final rule and enforcement posture that removed beneficial ownership reporting requirements for U.S. companies and U.S. persons, narrowing reporting primarily to certain foreign entities registered to do business in the U.S., with new deadlines and non-enforcement statements. [33]
Regardless of one’s politics on the rule, the civic takeaway is stark: if transparency is optional, power will route around it. And when transparency battles fluctuate, the practical ability of the public to map ownership becomes even more dependent on cross-silo inference and investigative tooling.
What agentic AI should be allowed to do, and what it should be prevented from doing
To protect the public, we should formalize a boundary:
Agentic AI should help people:
· search official databases and bulk datasets through documented, lawful access paths
· reconcile records, normalize identifiers, and build auditable graphs
· generate verification task lists and citation-indexed narratives
· detect patterns as leads and label uncertainty explicitly
Agentic AI should not be optimized to:
· generate “most likely corruption story” outputs from weak signals
· create harassment-ready dossiers
· publish personal addresses/PII beyond what is necessary for verification
· bypass access controls (including CAPTCHA defeat) or advise on evasion tactics
This is not softness. It is democratic safety engineering.
The deeper moral point deserves to be stated plainly:
The rich already have automation. They just hire it.
The public needs automation that is governed, truth-bound, audit-friendly, and non-coercive, so the labor of accountability does not remain a luxury good.
Works cited
Pamela Herd[13] and Donald P. Moynihan[14]. Administrative Burden: Policymaking by Other Means. New York: Russell Sage Foundation[34], 2018. [1]
Office of Management and Budget[35]. “M-13-13: Managing Government Information as an Asset” (Open Data Policy), 2013. [3]
U.S. Department of Justice[36], Office of Information Policy. “Summary of the FOIA Improvement Act of 2016,” 2016. [4]
International Consortium of Investigative Journalists[11]. “Offshore Leaks Database: About / How to Use / Data Sources,” accessed 2026. [37]
Organized Crime and Corruption Reporting Project[16]. “Aleph Documentation: About Aleph / Getting Started / Datasets,” accessed 2026. [38]
LittleSis[9]. “About LittleSis / Map the Power Toolkit,” accessed 2026. [39]
OpenCorporates[19]. “OpenCorporates API Documentation and Data Access,” accessed 2026. [40]
Federal Election Commission[5]. “Campaign Finance Data (Bulk Data and API),” accessed 2026. [6]
United States Senate[7]. “Lobbying Disclosure and Downloadable Databases,” updated 2025–2026. [41]
Open Contracting Partnership[28]. “Open Contracting Data Standard (OCDS) and Data Registry,” accessed 2026. [42]
World Wide Web Consortium[43]. “Inaccessibility of CAPTCHA” (W3C Note / Draft), updated through 2019; and related WAI introductions. [23]
Financial Crimes Enforcement Network[32]. “Beneficial Ownership Information Reporting” and related March 2025 rule and guidance updates. [33]
Van Buren v. United States, 593 U.S. ___ (2021). [26]
hiQ Labs, Inc. v. LinkedIn Corp., 938 F.3d 985 (9th Cir. 2019) (opinion text summarized/hosted via FindLaw). [27]
[1] [11] [15] [36] Administrative Burden | Russell Sage Foundation
https://www.russellsage.org/publications/book/administrative-burden
[2] [12] [37] How to use the Offshore Leaks Database | ICIJ Offshore Leaks Database
https://offshoreleaks.icij.org/pages/howtouse
[3] [13] OMB M-13-13: Managing Government Information as an Asset - OMB 0648-0024
https://omb.report/icr/202501-0648-003/doc/151940900
[4] [19] Office of Information Policy | OIP Summary of the FOIA Improvement Act of 2016
https://www.justice.gov/oip/oip-summary-foia-improvement-act-2016
[5] [10] [39] About - LittleSis
https://littlesis.org/database/about
[6] [28] [35] Campaign finance data | FEC
https://www.fec.gov/data
[7] [21] Open Refine Reconciliation API: version 0.4.8 :: OpenCorporates API
https://api.opencorporates.com/documentation/Open-Refine-Reconciliation-API
[8] [31] [41] [43] U.S. Senate: Public Disclosure
https://www.senate.gov/legislative/lobbyingdisc.htm
[9] [20] [22] [32] [40] OpenCorporates API
https://api.opencorporates.com/
[14] [27] [34] HIQ LABS INC v. LINKEDIN CORPORATION (2019) | FindLaw
https://caselaw.findlaw.com/court/us-9th-circuit/2019397.html
[16] [30] Data Registry | Open Contracting Partnership
https://data.open-contracting.org/
[17] [38] About Aleph – Aleph Documentation
https://docs.aleph.occrp.org/about
[18] Find datasets – Aleph
https://docs.aleph.occrp.org/developers/getting-started/find-datasets/
[23] Inaccessibility of CAPTCHA
https://www.w3.org/TR/turingtest/
[24] IJELL - CAPTCHA: Impact on User Experience of Users with Learning Disabilities
https://www.informingscience.org/Publications/3612
[25] Revisiting Text-Based CAPTCHAs: A Large-Scale Security and Usability Analysis Against CNN-Based Solvers
https://www.mdpi.com/2079-9292/14/22/4403
[26] VAN BUREN v. UNITED STATES | Supreme Court | US Law | LII / Legal Information Institute
https://www.law.cornell.edu/supremecourt/text/19-783
[29] [42] Data Standard - Open Contracting Partnership
https://www.open-contracting.org/data-standard/
[33] Beneficial Ownership Information Reporting | FinCEN.gov