The US Federal CIO Council has developed Data.gov, a website that provides access to raw datasets as well as data extraction and data mining tools.
About
The purpose of Data.gov is to increase public access to high-value, machine-readable datasets generated by the Executive Branch of the Federal Government.
As a priority Open Government Initiative for President Obama's administration, Data.gov increases the ability of the public to easily find, download, and use datasets that are generated and held by the Federal Government. Data.gov provides descriptions of the Federal datasets (metadata), information about how to access the datasets, and tools that leverage government datasets. The data catalogs will continue to grow as datasets are added. Federal, Executive Branch data are included in the first version of Data.gov.
Participatory Democracy
Public participation and collaboration will be one of the keys to the success of Data.gov. Data.gov enables the public to participate in government by providing downloadable Federal datasets to build applications, conduct analyses, and perform research. Data.gov will continue to improve based on feedback, comments, and recommendations from the public, so we encourage individuals to suggest datasets they'd like to see, rate and comment on current datasets, and suggest ways to improve the site.
Goal
A primary goal of Data.gov is to improve access to Federal data and expand creative use of those data beyond the walls of government by encouraging innovative ideas (e.g., web applications). Data.gov strives to make government more transparent and is committed to creating an unprecedented level of openness in Government. The openness derived from Data.gov will strengthen our Nation's democracy and promote efficiency and effectiveness in Government.
Data catalogs are currently available in the following areas:
- Environmentally relevant data (copper smelters, energy usage, brownfields, soil geochemistry, clean air status, weather trends, earthquakes, etc.)
- Demographic data (earnings, ages, etc.)
- National income and accounts (Gross Domestic Product, income levels, etc.)
- Regulatory alerts
- Patent applications and grant information
Datasets are available in XML, CSV/TXT, KML/KMZ, ESRI, and Map formats. For example, the following map, based on Google Maps, shows the locations of copper smelters:
In addition to datasets, there are several widgets and tools:
- FBI Widget (links to FBI information)
- H1N1 Flu Widget
- Employer Sponsored Insurance data extraction tool
- US Federal Spending by Agency data extraction tool
- Alerts Widgets
- Recall Widgets
- …too many to list.
The datasets available now appear to be heavily weighted toward the earth sciences, but according to the FAQ, more datasets will be added. There’s even a place to request new datasets. Most surprising to me is that the site lets users rate each dataset’s utility, usefulness, and ease of access. I wonder how many of us offer that feature to our own users?
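For those of us wondering how to offer something similar, a rating feature like this can be sketched as a small aggregation over per-dataset scores. This is a minimal, hypothetical sketch: the three criteria come from the site's rating categories, but the `Rating` type, the assumed 1-to-5 scale, and `average_ratings` are illustrative assumptions, not how Data.gov actually implements it.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# The three rating criteria; the 1-5 scale below is an assumption.
CRITERIA = ("utility", "usefulness", "ease_of_access")


@dataclass(frozen=True)
class Rating:
    """One user's hypothetical rating of a dataset."""
    dataset_id: str
    utility: int
    usefulness: int
    ease_of_access: int


def average_ratings(ratings):
    """Group ratings by dataset and return the mean score for each criterion."""
    grouped = defaultdict(list)
    for r in ratings:
        # Reject scores outside the assumed 1-5 scale.
        for name in CRITERIA:
            score = getattr(r, name)
            if not 1 <= score <= 5:
                raise ValueError(f"{name} score out of range: {score}")
        grouped[r.dataset_id].append(r)
    return {
        dataset_id: {name: mean(getattr(r, name) for r in rs) for name in CRITERIA}
        for dataset_id, rs in grouped.items()
    }
```

Reporting the three criteria separately, rather than one blended score, keeps a useful but hard-to-access dataset distinguishable from one that is easy to download but of little use.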
The FAQ also gives short definitions of “data” and “metadata”.
I selected the Interactive Access To National Income and Product Accounts Tables dataset and found a great deal of interesting metadata about it, including:
- Agency that provided the data
- Release date
- Date updated
- Time period
- Frequency
- Description
- Keywords
- Unique ID
- Geographic coverage
- Collection Mode
- Data Dictionary
- Information Quality instrument
- Data Quality Certification
- Privacy & Confidentiality
- Technical Documentation
This data was available as a CSV and an XLS file.
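The metadata fields listed above map naturally onto a catalog record type. The following Python sketch is illustrative only: the field names, types, and the `offers`/`matches` helpers are assumptions chosen to mirror the list, not Data.gov's actual schema, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical catalog record; field names mirror the metadata list above."""
    unique_id: str
    agency: str
    description: str
    release_date: Optional[date] = None
    date_updated: Optional[date] = None
    time_period: Optional[str] = None
    frequency: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    geographic_coverage: Optional[str] = None
    collection_mode: Optional[str] = None
    data_dictionary_url: Optional[str] = None
    information_quality_url: Optional[str] = None
    data_quality_certified: Optional[bool] = None
    privacy_confidentiality: Optional[str] = None
    technical_documentation_url: Optional[str] = None
    formats: List[str] = field(default_factory=list)  # e.g. ["CSV", "XLS"]

    def offers(self, fmt: str) -> bool:
        """Case-insensitive check that a download format is available."""
        return fmt.upper() in (f.upper() for f in self.formats)

    def matches(self, term: str) -> bool:
        """Case-insensitive search over the keywords and the description."""
        term = term.lower()
        return any(term in k.lower() for k in self.keywords) or term in self.description.lower()


if __name__ == "__main__":
    record = DatasetMetadata(
        unique_id="example-0001",   # hypothetical ID
        agency="Example Agency",    # placeholder, not the real provider
        description="Interactive access to National Income and Product Accounts tables",
        keywords=["national income"],
        formats=["CSV", "XLS"],
    )
    print(record.offers("csv"), record.matches("income"))
```

Keeping the formats on the record itself makes it easy to filter a catalog down to, say, only the datasets offered as CSV.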
I expect many new mashups based on these datasets to circulate in the future. It appears that many of the datasets were already publicly available, but having a single go-to site for finding data and metadata is the right thing to do.
Since this site aims to be the source for government transparency, I’d love to see datasets on IT project costs, benefits, risks, and statuses. I’d also like to see government enterprise architecture models provided as additional metadata.
What datasets would you like to see? What formats do you think should be supported?
Finally, I’d like to see a Data.gov widget that alerts me (perhaps via Twitter) when new datasets are added or updated.
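The core of such a widget is detecting what changed between two fetches of the catalog. A minimal sketch, assuming each snapshot maps a dataset's Unique ID to its "Date updated" value (both fields appear in the metadata list above); how snapshots are fetched and how alerts are delivered (Twitter or otherwise) are left out, and `diff_catalog` and `format_alerts` are hypothetical names.

```python
from datetime import date
from typing import Dict, List, Tuple

# Assumed snapshot shape: Unique ID -> Date updated.
Snapshot = Dict[str, date]


def diff_catalog(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str]]:
    """Return (added, updated) dataset IDs between two catalog snapshots."""
    added = sorted(uid for uid in new if uid not in old)
    # A dataset counts as updated when its "Date updated" moved forward.
    updated = sorted(uid for uid in new if uid in old and new[uid] > old[uid])
    return added, updated


def format_alerts(added: List[str], updated: List[str]) -> List[str]:
    """Build short alert messages; posting them anywhere is out of scope."""
    return [f"New dataset: {uid}" for uid in added] + [
        f"Updated dataset: {uid}" for uid in updated
    ]
```

Comparing dates rather than whole records keeps the check cheap, at the cost of missing any change the catalog does not reflect in "Date updated".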