Documentation
This guide explains how the Agency Dashboard measures and tracks implementation of Project Open Data. The website also provides a number of tools and resources to help agencies implement their open data programs, and many of those are documented here as well.
You can help edit this documentation on GitHub.
Agency Dashboard
The Agency Dashboard is used to track how agencies are implementing Project Open Data (aka OMB M-13-13). This is done in two ways:
- Review of the leading indicators (detailed below) by OMB staff; see also the Leading Indicators Strategy Rubric
- Automated metrics that analyze machine readable files (e.g., data.json, digitalstrategy.json)
The leading indicators are scored by OMB staff and they can involve a more subjective evaluation process. The indicators are also informed by the automated metrics which are generated by a daily automated script that analyzes files on agency websites to understand the progress and current status of their public data listings.
Milestones
The dashboard is oriented around quarterly milestones. You can use the blue milestone selection menu to navigate between milestones. The OMB scoring as well as the automated metrics are always tied to a specific milestone. The automated metrics will update every 24 hours until the end of the quarter when the milestone has been reached. At that point those automated metrics will represent a historical snapshot. To see the most current automated metrics, you'll need to view the current quarter (the next approaching milestone).
Leading Indicators Strategy
The "Leading Indicators Strategy" refers to the five categories of indicators drawn from the Cross Agency Priority Goals (CAP Goals) for Open Data. The strategies are based on the Enterprise Data Inventory, the Public Data Listing, Public Engagement, Privacy & Security, and Human Capital and are all detailed below.
Doughnut Charts
There are three doughnut charts (a variation on a pie chart) displayed at the top of the Leading Indicators section. Each of these charts will only be displayed if the data is available.
Inventory Composition
This chart shows the accessLevel percentages in the Enterprise Data Inventory ("public", "restricted public", and "non-public").
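For illustration, here is a minimal sketch of how such a breakdown could be computed from a data.json-style inventory file (the file name and exact logic are assumptions, not the dashboard's actual implementation):

```python
import json
from collections import Counter

# Load an Enterprise Data Inventory file (data.json-style structure assumed).
with open("inventory.json", encoding="utf-8") as f:
    catalog = json.load(f)

# Tally the accessLevel of every dataset and print the percentages.
levels = Counter(d.get("accessLevel", "unknown") for d in catalog.get("dataset", []))
total = sum(levels.values())
for level, count in levels.most_common():
    print(f"{level}: {count / total:.1%}")
```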
Public Dataset Status
This chart shows the percentage of public datasets ("accessLevel":"public") that include a distribution with at least one downloadURL provided. A dataset without a distribution or a dataset that only includes a distribution with an indirect accessURL does not count as publishing a link to raw data.
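A similarly hedged sketch of the download-link test described above (field names follow the POD v1.1 schema; the dashboard's exact rules may differ):

```python
import json

with open("data.json", encoding="utf-8") as f:
    catalog = json.load(f)

public = [d for d in catalog.get("dataset", [])
          if d.get("accessLevel") == "public"]

# A public dataset only counts if at least one of its distributions
# provides a downloadURL; an accessURL alone does not count.
with_download = [d for d in public
                 if any(dist.get("downloadURL")
                        for dist in d.get("distribution", []))]

if public:
    print(f"{len(with_download) / len(public):.1%} of public datasets "
          f"link to a downloadable file")
```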
Dataset Link Quality
This chart shows percentages for HTTP status codes from the total number of access and download links. For more information on how these numbers are determined, see the Quality Check Analysis overview and the section for each status code linked below:
- Working Links (HTTP 2xx)
- Redirected Links (HTTP 3xx)
- Broken links (HTTP 4xx)
- Error Links (HTTP 5xx)
- Unreachable Links (Other) (HTTP 0)
NOTE: The percentages for Error Links and Broken Links are currently swapped in this chart. This should be fixed by April 17th, 2015.
Leading Indicators
The Leading Indicators Strategies described above are broken down here into their component parts:
Enterprise Data Inventory
Overall Progress this Milestone
This element is compiled from the qualitative and quantitative measures below and rated as an overall assessment of meeting this milestone (Green = On Schedule to Complete Milestone, Yellow = Possible Milestone Delivery Problem, Red = Will Miss Milestone). (Qualitative)
Inventory Updated this Quarter
This element captures whether an agency has submitted an updated Enterprise Data Inventory into OMB Max by the milestone deadlines.
Number of Datasets
This element accounts for the total number of all datasets listed in the Enterprise Data Inventory. This includes those marked as "Public", "Non-Public" and "Restricted". (Quantitative)
Schedule Delivered
This element captures whether an agency has successfully published a schedule of deliverables against the outlined Open Data Initiative milestones, either via the digitalstrategy.json or via another document on the agency's website. (Qualitative)
Bureaus represented
This number represents the number of bureaus (based on codes from OMB Circular A-11 for the Common Government-wide Accounting Classification - CGAC in Appendix C) that have data sets reported in the agency's EDI. (Quantitative)
Programs represented
This is a count of primary agency programs that are represented within the EDI based on the Federal Program Inventory. (Quantitative)
Access Level = Public
This is a count of data assets that are or could be made publicly available to all without restrictions. (Quantitative)
Access Level = Restricted
This is a count of data assets that are available under certain use restrictions. (Quantitative)
Access Level = Non-Public
This is a count of data assets that are not available to members of the public. (Quantitative)
Inventory > Public listing
This is a comparison of the count of data sets (including those marked as "Public", "Non-Public" and "Restricted") in the EDI versus those in the entire public data listing. It is rare for the EDI count to equal the PDL count (which would indicate that all data sets are publicly accessible); the EDI is usually greater than the PDL. If the EDI count is less than the PDL count, this indicates an error in reporting and collection. (Quantitative)
Percentage growth in records since last quarter
This is calculated by subtracting the last quarter's EDI count of data sets (Qa) from the current quarter's EDI count (Qb), dividing by the last quarter's count, and multiplying by 100 to get the percentage: [(Qb - Qa) / Qa] * 100. (Quantitative)
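As a worked example (Qa is last quarter's count, Qb is the current quarter's; the function name is only for illustration):

```python
def growth_percentage(qa: int, qb: int) -> float:
    """Quarter-over-quarter growth in records: [(Qb - Qa) / Qa] * 100."""
    return (qb - qa) / qa * 100

# An EDI that grew from 1,200 datasets to 1,320 datasets grew by 10.0%.
print(growth_percentage(1200, 1320))  # 10.0
```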
Schedule Risk for Nov 30, 2014
This is an objective evaluation (Green = On Schedule, Yellow = Possible Schedule Issues, Red = Schedule Miss/Incomplete) of whether an agency will be able to deliver on the published digital strategy deliverables for the Open Data milestones outlined in OMB M-13-13. (Qualitative)
Spot Check - Site search, SORNs, PIAs, FOIA
This is a check by OMB eGov for SORNs (System of Records Notices), PIAs (Privacy Impact Assessments), and FOIA (Freedom of Information Act) statements, as well as a search for typical data file types. The number in parentheses indicates how many matching files the search returned; the examples below use Google:
- allinanchor: site:agencydomain.gov filetype:xls (5,000)
- allinanchor: site:agencydomain.gov filetype:csv (300)
- allinanchor: site:agencydomain.gov filetype:xml (38,000)
Public Data Listing
Overall Progress this Milestone
This element is compiled from the qualitative and quantitative measures below and rated as an overall assessment of meeting this milestone (Green = On Schedule to Complete Milestone, Yellow = Possible Milestone Delivery Problem, Red = Will Miss Milestone). (Qualitative)
Number of Datasets
This element captures the count of publicly listed data sets via the published Public Data Listing, and corresponds to the number captured during the dashboard's automated crawl. (Quantitative)
Number of Public Datasets with File Downloads
This element captures the count of downloadable publicly listed datasets ("accessLevel":"public") via the published Public Data Listing, and corresponds to the number captured during the dashboard's automated crawl. This should correspond with the downloadURL in the PDL JSON file, which is the URL providing direct access to the downloadable distribution of a dataset. In version 1.0 of the POD Schema, this metric was based on the accessURL, but since v1.1 it has been based on the downloadURL. (Quantitative)
Total number of access and download links
The total number of accessURL and downloadURL URLs in distributions for public datasets ("accessLevel":"public").
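A hedged sketch of this count, assuming a data.json file that follows the POD v1.1 schema (not the dashboard's actual crawler):

```python
import json

with open("data.json", encoding="utf-8") as f:
    catalog = json.load(f)

links = []
for dataset in catalog.get("dataset", []):
    if dataset.get("accessLevel") != "public":
        continue
    for dist in dataset.get("distribution", []):
        # Collect every accessURL and downloadURL found in the distributions.
        links += [dist[key] for key in ("accessURL", "downloadURL") if dist.get(key)]

print("Total access and download links:", len(links))
```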
Quality Check Analysis
The Quality Check fields show the breakdown of HTTP status codes for accessURL and downloadURL URLs in distributions. This check uses the HTTP HEAD method to analyze results.
In some cases, the analysis may determine that the HTTP HEAD check did not work properly and it will fall back to HTTP GET. However, some servers do not properly support HTTP HEAD and may return false positive results (e.g. an HTTP GET request that would normally return HTTP 200 may erroneously return an HTTP 500 for the same request using HTTP HEAD). These false positives are the result of web servers that do not correctly implement the HTTP standard since HTTP HEAD support is required. RFC2616 Section 5.1.1 - Method states:
The methods GET and HEAD MUST be supported by all general-purpose servers.
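For illustration, a minimal sketch of a HEAD check with a GET fallback using Python's requests library (the timeout, fallback condition, and URL are assumptions; this is not the dashboard's own crawler):

```python
import requests

def link_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a URL, preferring HEAD and falling
    back to GET when HEAD appears to be unsupported. Returns 0 if unreachable."""
    try:
        resp = requests.head(url, allow_redirects=False, timeout=timeout)
        if resp.status_code >= 400:
            # Some servers mishandle HEAD; retry with GET before trusting an error.
            resp = requests.get(url, stream=True, timeout=timeout)
        return resp.status_code
    except requests.RequestException:
        return 0  # counted as "Unreachable Links (HTTP 0)"

# Hypothetical URL, for illustration only.
print(link_status("https://agency.example.gov/dataset.csv"))
```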
Working Links (HTTP 2xx)
The number of accessURL and downloadURL URLs that respond with an HTTP 2xx status code.
Redirected Links (HTTP 3xx)
The number of accessURL and downloadURL URLs that respond with an HTTP 3xx status code.
Broken links (HTTP 4xx)
The number of accessURL and downloadURL URLs that respond with an HTTP 4xx status code.
Error Links (HTTP 5xx)
The number of accessURL and downloadURL URLs that respond with an HTTP 5xx status code.
Unreachable / Server Not Found / Other (HTTP 0)
The number of accessURL and downloadURL URLs that do not return an HTTP status code within 5 seconds.
Percentage growth in records since last quarter
This is calculated by subtracting the last quarter's PDL count of data sets (Qa) from the current quarter's PDL count (Qb), dividing by the last quarter's count, and multiplying by 100 to get the percentage: [(Qb - Qa) / Qa] * 100. (Quantitative)
Valid Metadata
See the section for Valid Schema
/data
This element indicates if an agency has published a page for their Open Data activities, often containing links to their data catalog, links to other Open Data related documents, as well as the Digital Strategy.
/data.json
This element collects whether the agency has successfully published a data.json file, which contains the whole of the Public Data Listing.
Harvested by data.gov
This element captures whether DATA.GOV has harvested the PDL for indexing via regular crawls. This usually requires notifying GSA (which houses the DATA.GOV team) to index the PDL.
Public Engagement
Overall Progress this Milestone
This element is compiled from the qualitative and quantitative measures below and rated as an overall assessment of meeting this milestone (Green = On Schedule to Complete Milestone, Yellow = Possible Milestone Delivery Problem, Red = Will Miss Milestone). (Qualitative)
Description of feedback mechanism delivered
This element is a narrative provided by the agency, through the Digital Strategy, on how it plans to engage the public for Open Data initiative activities, including developing two-way forms of communication (e.g., social media), issue tracking, outreach, and other items. These methods should add value to the open data activities and directly address public and customer needs. (Qualitative)
Data release is prioritized through public engagement
This is a measure, based on information provided to OMB for review or gathered from the agency's Open Data websites and public engagement mechanisms, of whether data sets have been identified by the public and prioritized for release based on that engagement activity (such as e-mail, public open data events, IdeaScale/GitHub/Twitter, etc.). This may include data sets requested via FOIA mechanisms or other formal requests. (Qualitative)
Feedback loop is closed, 2 way communication
This element is an assessment of whether input from public engagement is acted upon and produces an output for the open data milestone activities, such as inclusion of data sets, quality improvements, format changes, API development, or other outcomes. It is based on information provided by the agency and confirmed by OMB through review of published public feedback mechanisms, including post-event outcomes such as those from datajams and datapaloozas. (Qualitative)
Link to or description of Feedback Mechanism
This element should contain a link (URL, email address, etc.) of the primary feedback mechanism used for customer engagement. If more than one is regularly used, this should be a small narrative about each mechanism and how it's used to interact with the public for engagement activities.
Privacy & Security
Overall Progress this Milestone
This element is compiled from the qualitative and quantitative measures below and rated as an overall assessment of meeting this milestone (Green = On Schedule to Complete Milestone, Yellow = Possible Milestone Delivery Problem, Red = Will Miss Milestone). (Qualitative)
Data Publication Process Delivered
This element captures the state of the Open Data publication process deliverable. The process is often located within the Digital Strategy for an agency, and is usually contained and updated within the JSON file. Some agencies have independently published this schedule on their websites separate from the Digital Strategy site, which is not recommended.
Information that should not be made public is documented with agency's OGC
As part of the Data Publication Process (this element can't be "Green" without the previous element existing), the Office of General Counsel (OGC) or the agency's Office of the Solicitor is listed as part of the process for determining which data sets are to be released publicly. (Qualitative)
Human Capital
Overall Progress this Milestone
This element is compiled from the qualitative and quantitative measures below and rated as an overall assessment of meeting this milestone (Green = On Schedule to Complete Milestone, Yellow = Possible Milestone Delivery Problem, Red = Will Miss Milestone). (Qualitative)
Open Data Primary Point of Contact
This element should contain the name (and/or contact information) for an agency's primary point of contact for Open Data Initiative activities.
POCs identified for required responsibilities
This element accounts for the agency identifying and publishing primary points of contact for Open Data activities.
Automated Metrics
These fields are determined by an automated script that analyzes agency data.json, digitalstrategy.json, and /data files.
The automated metrics will update every 24 hours until the end of the quarter when a milestone has been reached. At that point those metrics will represent a historical snapshot. To see the most current automated metrics, you'll need to view the current quarter (the next approaching milestone).
Expected URL
This is the URL where the data.json file is expected to be found. This is based on the main agency URL provided through the USA.gov Directory API
Resolved URL
This is the URL that is resolved after following any redirects.
Redirects
This is the number of redirects followed to reach the final data.json URL; ideally this should be 0. Currently the check will only follow 5 redirects before stopping.
HTTP Status
This is the HTTP status code received when attempting to reach the expected or resolved URL. For more information on properly using HTTP status codes, see: Knowing Your HTTP Status Codes In Federal Government
This should be 200 if the data.json or /data URL was found successfully.
Content-Type
The Content-Type is how the server announces the type of file it is serving at the requested URL. Usually it won't break anything if this is set incorrectly, but some applications may need to be configured to force the file to be read as JSON even if the server announces it as something else. This is very similar to how a file extension identifies the file type. Yes, the URL says data.json, but the browser just sees that as an arbitrary URL. The Content-Type is what identifies the actual file type. Setting this incorrectly would be like naming a CSV spreadsheet graph.pdf.
The character encoding should also be specified as part of the Content-Type. This encoding should match the actual encoding of the text in the file. The correct character encoding for JSON is always unicode, preferably UTF-8.
For data.json this should be: application/json; charset=utf-8
For /data this should be: text/html; charset=utf-8
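As a quick illustration, the announced Content-Type can be checked with a simple HTTP request (the URL below is hypothetical):

```python
import requests

# Hypothetical agency URL, for illustration only.
resp = requests.head("https://agency.example.gov/data.json", timeout=5)
content_type = resp.headers.get("Content-Type", "")
print(content_type)  # expect: application/json; charset=utf-8

if "application/json" not in content_type:
    print("Warning: the server is not announcing the file as JSON")
if "charset=utf-8" not in content_type.lower():
    print("Warning: the character encoding is not declared as UTF-8")
```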
Valid JSON
This identifies whether the data.json was actually JSON. Even if the HTTP Status is 200 for the data.json URL and the Content-Type announces it's application/json; charset=UTF-8, the response might actually be HTML or improperly formatted JSON. If the syntax of the file can be parsed as JSON, the validator will attempt to do additional analysis, but the file may in fact still be invalid JSON if it doesn't use the proper text encoding. While it is possible for the validator to convert the file to the correct encoding to do this additional analysis, it's important that the correct encoding be used at the source so that others will be able to parse the JSON without knowing they need to convert it to a valid encoding. JSON must use Unicode text encoding (use UTF-8) and it should not include a byte order mark. It's highly recommended that you generate your JSON with a tool designed to produce JSON rather than attempt to produce JSON by hand. You can check how well formed your JSON is with a tool like JSONLint. When using this tool it is best to enter the URL of the JSON file rather than copying and pasting the JSON. This is because when you copy and paste the raw JSON, your browser may attempt to automatically fix problems that the server will not know to fix when it retrieves the file directly.
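As a rough sketch, the encoding issues described above can be checked locally before publishing (the file name is an assumption; this is not the dashboard's validator):

```python
import json

# Read the raw bytes so a byte order mark, if present, can be detected.
with open("data.json", "rb") as f:
    raw = f.read()

if raw.startswith(b"\xef\xbb\xbf"):
    print("Warning: file begins with a UTF-8 byte order mark (BOM)")

try:
    # "utf-8-sig" tolerates a BOM while still requiring valid UTF-8 text.
    catalog = json.loads(raw.decode("utf-8-sig"))
    print("Parsed successfully:", len(catalog.get("dataset", [])), "datasets")
except (UnicodeDecodeError, json.JSONDecodeError) as err:
    print("Not valid JSON:", err)
```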
The "Public Datasets" column on the main agency dashboard table will be green if it's a valid JSON file and red or yellow otherwise. If it's not a valid JSON file, the "Valid Metadata" column can't be green - at best it can be yellow. If it's not valid JSON it most likely can't be parsed regardless of how valid the metadata schema is, so this is a serious consideration. This also means it's possible to be listed under the "Valid Metadata" column in yellow even if 100% of the records validate against the schema.
Datasets with Valid Metadata
The percentage and specific number of datasets in the data.json file that successfully validate against the Project Open Data schema.
The "Valid Metadata" column on the main agency dashboard table will be green if 100% of the metadata records validate against the Project Open Data schema and they are from a valid JSON file. It's possible to have 100% valid metadata records but still be shown as yellow if it's not a valid JSON file. Any record that doesn't validate against the schema won't meet the requirements and also won't be included by harvesters like data.gov.
Valid Schema
This identifies whether the data.json has all the required fields and has values that fit within the parameters specified by the Project Open Data schema.
Schema Errors
This displays instances where the data.json doesn't validate against the Project Open Data schema based on rules codified within a JSON Schema document hosted on Project Open Data. For more detailed and more readable results, you should use the Project Open Data validator.
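For a local check of the same kind, each record can be validated against the Project Open Data JSON Schema with a generic JSON Schema library. The sketch below uses the Python jsonschema package and assumes local copies of the schema and catalog files; the Project Open Data validator remains the authoritative check.

```python
import json
from jsonschema import Draft4Validator

# Local copies assumed: the POD dataset schema and the agency's data.json.
with open("dataset-schema.json", encoding="utf-8") as f:
    schema = json.load(f)
with open("data.json", encoding="utf-8") as f:
    catalog = json.load(f)

validator = Draft4Validator(schema)
datasets = catalog.get("dataset", [])
valid = 0
for i, dataset in enumerate(datasets):
    errors = list(validator.iter_errors(dataset))
    if errors:
        for err in errors:
            print(f"dataset {i}: {err.message}")
    else:
        valid += 1

if datasets:
    print(f"{valid} of {len(datasets)} datasets have valid metadata "
          f"({valid / len(datasets):.0%})")
```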
Datasets
The total number of datasets listed in the data.json file
Datasets with Downloadable URLs
The total number of datasets listed in the data.json file that include an accessURL
for a downloadable file
Total Downloadable URLs
The total number of accessURL
download links listed for all datasets in the data.json file
Server Not Found
The number of accessURL
download links with a server or domain name that could not be reached. In the error log CSV file this is listed with an error_type
of "broken_link" and an http_status
of "0".
Broken links (accessURL 4xx)
The number of accessURL
download links where the server responded indicating the URL could not be found. In the error log CSV file this is listed with an error_type
of "broken_link" and an http_status
of anything that starts with "4".
Error Links (accessURL 5xx)
The number of accessURL
download links where the server responded indicating the URL had an error preventing it from properly working. In the error log CSV file this is listed with an error_type
of "broken_link" and an http_status
of anything that starts with "5".
Redirected Links (accessURL 3xx)
The number of accessURL
download links where the server responded indicating the URL had moved to a new location. In the error log CSV file this is listed with an error_type
of "broken_link" and an http_status
of anything that starts with "3".
Correct format (accessURL/format)
The number of accessURL
download links where the server responded indicating that the format of the resource did not match what was specified in the data.json metadata. In the error log CSV file this is listed with an error_type
of "format_mismatch" and the format specified by the server is format_served
while the one listed in the data.json is format_datajson
PDF for raw data (accessURL)
The number of accessURL
download links where the server responded indicating that the format of the resource was a PDF file. The accessURL
should point to raw machine readable data (like a spreadsheet) rather than documents. Use references
, dataDictionary
for documents meant to accompany data.
HTML for raw data (accessURL)
The number of accessURL
download links where the server responded indicating that the format of the resource was an HTML file. The accessURL
should point to raw machine readable data (like a spreadsheet) rather than documents. Use references
, dataDictionary
, or landingPage
for documents meant to accompany data.
Bureaus Represented
The number of bureaus used throughout the data.json metadata as specified with bureauCode
.
Programs Represented
The number of programs used throughout the data.json metadata as specified with programCode
.
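A hedged sketch of both counts (assuming POD v1.1 field names, where bureauCode and programCode are arrays of strings):

```python
import json

with open("data.json", encoding="utf-8") as f:
    catalog = json.load(f)

bureaus, programs = set(), set()
for dataset in catalog.get("dataset", []):
    # bureauCode and programCode are arrays of strings in the POD schema.
    bureaus.update(dataset.get("bureauCode", []))
    programs.update(dataset.get("programCode", []))

print("Bureaus represented:", len(bureaus))
print("Programs represented:", len(programs))
```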
Data.json File Size
The size of the data.json file the last time it was checked by the validator (for the selected milestone)
Data.json Last Modified
The last time the data.json file appears to have been updated (for the selected milestone)
Data.json Last Crawl
The last time this validator analyzed the data.json file