How We Navigated the Government Data Microeconomy
When the Citizen Codex team decided to explore racial equity via filings to the Securities and Exchange Commission (SEC), we expected a straightforward interface to pull bulk data. Instead, we found a well-presented, intuitive graphical user interface (GUI) called EDGAR, but no way to do deep research at scale on the free text of corporate filings.
Around this gap has sprung up a microeconomy of third-party web services whose sole purpose is to provide easier access to existing SEC data for developers and data analysts. These services, many of which are sophisticated web scrapers built on top of the existing SEC website, charge a hefty sum for access, with some plans starting at $55 per month.
We relied on this microeconomy to produce our latest piece exploring the corporate climate around racial equity. I began the data analysis by examining the available results on the EDGAR front end, where I identified key search terms and the ballpark size of the data. But to streamline the retrieval process, which would have taken months through custom scraper development, I decided to use the third-party tool SEC-API.io. Leveraging both the EDGAR GUI and the third-party scraper allowed us to cross-validate results and adjust as I developed a script for the formal extraction.
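For readers who want to reproduce that scoping step, here is a minimal sketch of how one might ballpark result counts programmatically. It targets the JSON endpoint that the EDGAR full-text search GUI itself appears to call (efts.sec.gov/LATEST/search-index); that endpoint is unofficial and undocumented, so the parameter and response field names below are assumptions based on observed behavior rather than a published API, and the contact address is a placeholder. The paid route through SEC-API.io follows a similar pattern behind an API key.

```python
# Rough scoping sketch: count full-text matches for a phrase in SEC filings.
# Assumption: efts.sec.gov/LATEST/search-index is the undocumented JSON endpoint
# behind the EDGAR full-text search GUI; its parameters and Elasticsearch-style
# response shape ("hits") may change without notice.
import requests

SEARCH_URL = "https://efts.sec.gov/LATEST/search-index"
HEADERS = {
    # The SEC asks automated clients to identify themselves; this contact is hypothetical.
    "User-Agent": "Citizen Codex research (contact@example.com)"
}

def count_matches(phrase: str, form_type: str = "10-K") -> int:
    """Return the approximate number of filings whose text contains `phrase`."""
    params = {"q": f'"{phrase}"', "forms": form_type}
    resp = requests.get(SEARCH_URL, params=params, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    hits = resp.json().get("hits", {})
    return hits.get("total", {}).get("value", 0)

if __name__ == "__main__":
    for term in ["racial equity", "diversity, equity and inclusion"]:
        print(term, count_matches(term))
```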
After completing this piece, I learned that this microeconomy around SEC data is not unique. Indeed, across government, services have cropped up to solve real end-user pain points for accessing and using government data. Some illustrative examples:
- $350 per month to access a unified feed of data from the Centers for Medicare and Medicaid Services
- $500 per year to access Department of Labor data on salary and occupations
- $125 per month to access the Department of Transportation’s National Address Database
To be clear, these services typically provide some enrichment beyond the existing government dataset or interface, such as normalizing data or joining it with other government datasets. But at their core, they are filling gaps that data researchers and analysts routinely encounter in how government data is disseminated today.
So, what are we to make of this?
One interpretation of this phenomenon is that government agencies will never be able to solve every open data use case, so they should instead prioritize producing machine-readable datasets and feeds. This is true. Government data sources are so general-purpose that there will always be room for the private sector to tailor them to specific market needs. However, the microeconomy of data services suggests that machine readability across federal agencies still leaves much to be desired.
Another interpretation is that this is a story fundamentally about state capacity. Decades of disinvestment in government technological capacity have created an environment where agencies are not equipped with the tools or staff to produce data products with the speed and usefulness the public needs. A heavy dose of technical capacity and user-centered thinking would be a rising tide for all federal government services.
But a final reading of this situation that I am struck by is that this is a question of power dynamics. Putting clean, normalized, and accessible government data behind paywalls enables well-resourced organizations and individuals to build products that serve their interests. Lowering barriers to entry for access and use of government data equips the local news journalist, the community activist, or the concerned parent with the tools they need to be a more engaged and informed citizen.
Additional work to liberate data from legacy or difficult-to-use databases is one way we can help shift the balance of power toward as many people as possible. Recent investments in open data across all levels of government signal that change is coming, but slowly.
In the interim, nonprofits and thoughtful private sector actors should continue to play a role in closing the gap. This is part of the mission of Bellingcat, an open-source intelligence organization I admire. Recognizing the hurdles to analyzing and understanding corporate activities, they recently released an open-source script to interface directly with SEC data. As Citizen Codex grows, we aspire to build data products with a similar ethos.
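For anyone who wants to skip the paywalled route entirely, the SEC also publishes a free, documented JSON API at data.sec.gov. The sketch below is not Bellingcat's script, just a minimal illustration of the same "direct interface" idea: it lists a company's recent filings given its CIK. The field names follow the SEC's published submissions endpoint at the time of writing, the example CIK is used purely for familiarity, and the contact address is a placeholder.

```python
# Minimal illustration of interfacing directly with the SEC's free JSON API
# (data.sec.gov), in the spirit of the open-source tools mentioned above.
# Field names follow the SEC's documented submissions endpoint and could change.
import requests

HEADERS = {"User-Agent": "Citizen Codex research (contact@example.com)"}  # hypothetical contact

def recent_filings(cik: str, form_type: str = "10-K"):
    """Yield (filing date, accession number) pairs for a company's recent filings."""
    url = f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"
    data = requests.get(url, headers=HEADERS, timeout=30).json()
    recent = data["filings"]["recent"]
    for form, date, accession in zip(
        recent["form"], recent["filingDate"], recent["accessionNumber"]
    ):
        if form == form_type:
            yield date, accession

if __name__ == "__main__":
    # 0000320193 is Apple's CIK, used here only as a familiar example.
    for date, accession in recent_filings("320193"):
        print(date, accession)
```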