By Anna Russell, Electronic Resources Librarian
This post summarizes recent efforts by various private and academic institutions to save federal government information for public access. I am referring specifically to online information: typically agency website reports, pages, and datasets, but also the work of federal commissions and the online content of other federal entities. I do not address social media posts by these same federal entities here, but Facebook and Twitter pages and feeds remain another arena for collection efforts. Read more here from the NC State libraries on social media harvesting tools.
The Federal Process for Saving Online Data?
Currently, there is no federal system for preserving the entirety of federal government online information for public access. There are some purely apolitical reasons for the lack of a systematic approach to archiving agency websites, such as a lack of server space and too few agency records managers to properly preserve the online content. There is no Federal Depository Library Program for online content. The lack of a standard method for structuring agency website content is also an issue. An agency application that pulls datasets from multiple sources can be difficult to preserve digitally. Likewise, until recently, harvesting FTP materials was out of scope for web crawlers.
Is Anyone Minding the Store (of Information)?
Interestingly, as the Trump administration made preparations to govern, many began looking into this issue of federal agency web content preservation. Stepping back for a moment: in 2004, the National Archives and Records Administration (NARA) mandated that all agencies capture their web content, but it has not required such full-scale preservation since then. Under the current Office of Management and Budget guidelines, Circular A-130, federal agencies are not required to maintain the materials they create, or to provide them to GPO or NARA, so long as the work is not officially categorized as a “record,” a “report,” or a “publication.”
Enter the End of Term Project: Every four years, beginning in 2008, libraries and academics including the Library of Congress, the Internet Archive, the University of North Texas, and Stanford work feverishly to preserve the previous U.S. administration’s publicly accessible .gov and .mil sites and social media pages, statistics, PDFs, and reports served over HTTP/HTTPS. According to Mark Phillips, a founder of the End of Term project, by overlaying the archived 2008 PDFs with its 2012 sister project’s saved PDFs, he found that only about 17 percent of the 2008 PDFs remained online in 2012 (in other words, thank goodness for the End of Term project!). For the 2016 End of Term project, the harvesting and archiving of federal domains far surpassed that of the previous two End of Term projects.
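For readers curious how an overlay comparison like this might work, here is a minimal sketch in Python. It is not Mark Phillips's actual method, which this post does not describe. It simply assumes you have two lists of PDF URLs, one from each crawl, and computes the share of the earlier list that also appears in the later one. The function name and the sample URLs are hypothetical.

```python
def surviving_fraction(earlier_urls, later_urls):
    """Return the share of URLs in the earlier crawl that also appear in the later crawl."""
    # Sets drop duplicates and make the overlap check fast.
    # Blank entries are skipped; URLs are compared exactly as written.
    earlier = {url.strip() for url in earlier_urls if url.strip()}
    later = {url.strip() for url in later_urls if url.strip()}
    if not earlier:
        return 0.0
    return len(earlier & later) / len(earlier)


# Hypothetical sample data standing in for two crawl inventories.
pdfs_2008 = [
    "https://agency.example/reports/a.pdf",
    "https://agency.example/reports/b.pdf",
    "https://agency.example/reports/c.pdf",
    "https://agency.example/reports/d.pdf",
]
pdfs_2012 = [
    "https://agency.example/reports/a.pdf",
    "https://agency.example/reports/e.pdf",
]

share = surviving_fraction(pdfs_2008, pdfs_2012)
print(f"{share:.0%} of the earlier PDFs appear in the later crawl")
```

A real comparison would likely need to normalize URLs first (for example, treating http and https versions of the same address as one), since exact string matching will undercount files that merely moved.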
The energized End of Term project folks continue to spread the word about the importance of, and the issues surrounding, the preservation of online government information. I include a Google doc list of current data rescue efforts here. The project is also currently recruiting technical librarian volunteers: there are over 1,200 records that need metadata support. If you are interested in helping create good archival records, the End of Term project has created a volunteer metadata cataloging Google doc guide here.