Google has launched a customized search aimed at ‘scientists, data journalists, and data geeks’ who need to find datasets no matter where they’re hosted.
The aim is to let people find the data they need across the many data repositories on the web. The tool works in a similar way to Google Scholar, Google’s search engine for academic papers and reports.
Dataset Search relies in part on the creators or providers of a dataset making metadata available, such as who created the dataset, when it was published, a citation describing the dataset, summary keywords, and spatial coverage. These metadata tags are indexed by Dataset Search and combined with input from Google’s Knowledge Graph, the source of the infobox shown next to search results, to make the results more useful. Google collects and links this information, works out where different versions of the same dataset might be hosted, and finds publications that may describe or discuss the dataset.
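To give a sense of what that metadata might look like in practice, here is a minimal Python sketch that builds a schema.org Dataset description as JSON-LD and wraps it in a script tag for a web page. The property names (name, description, creator, datePublished, citation, keywords, spatialCoverage) come from the schema.org vocabulary; the helper functions and all of the example values are hypothetical, and this is an illustration rather than Google’s own tooling.

```python
import json


def dataset_jsonld(name, description, creator, date_published,
                   citation, keywords, spatial_coverage):
    """Return a schema.org Dataset description as a JSON-LD string."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created the dataset (modelled here as an organization).
        "creator": {"@type": "Organization", "name": creator},
        # When it was published, as an ISO 8601 date.
        "datePublished": date_published,
        # A citation describing the dataset.
        "citation": citation,
        # Summary keywords.
        "keywords": keywords,
        # Spatial coverage, given here as a plain place name.
        "spatialCoverage": spatial_coverage,
    }
    return json.dumps(record, indent=2)


def as_script_tag(jsonld):
    """Wrap a JSON-LD string in the script tag used to embed it in HTML."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# Hypothetical example values, for illustration only.
example = dataset_jsonld(
    name="Example River Temperature Readings",
    description="Daily water temperature readings from an example river.",
    creator="Example Environmental Agency",
    date_published="2018-01-15",
    citation="Example Environmental Agency (2018). Example River Temperature Readings.",
    keywords=["river", "temperature", "hydrology"],
    spatial_coverage="Example Valley",
)

print(as_script_tag(example))
```

A publisher would place the printed script tag in the HTML of the page that describes the dataset, where an indexer can read it alongside the human-readable content.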
The current version of Google Dataset Search has references to most datasets in environmental and social sciences, as well as data from other disciplines including government data and data provided by news organizations.
The developers say that as more data repositories use the schema.org standard to describe their datasets, the variety and coverage of datasets that users find in Dataset Search will continue to grow. As Google acknowledges, the success of Dataset Search will depend on organizations choosing to add the metadata tags to their material so that it is accessible to the indexing process, but given the power of Google, it’s unlikely that any organization making data available on the web will ignore this requirement.
Dataset Search works in multiple languages, and support for additional languages is ‘coming soon’.