
Google launches new search engine to help scientists find the datasets they need


ARMOUR


 

Google Dataset Search - A search engine to unite the fragmented world of online datasets

 

Google’s goal has always been to organize the world’s information, and its first target was the commercial web. Now, it wants to do the same for the scientific community with a new search engine for datasets. In today's world, scientists in many disciplines and a growing number of journalists live and breathe data. There are many thousands of data repositories on the web, providing access to millions of datasets, and local and national governments around the world publish their data as well. To enable easy access to this data, Google has launched Dataset Search so that scientists, data journalists, data geeks, or anyone else can find the data they need for their work and their stories, or simply to satisfy their intellectual curiosity.

 

The service, called Dataset Search, launches today, and it will be a companion of sorts to Google Scholar, the company’s popular search engine for academic studies and reports. Institutions that publish their data online, like universities and governments, will need to include metadata tags in their webpages that describe their data, including who created it, when it was published, how it was collected, and so on. This information will then be indexed by Google’s search engine and combined with information from the Knowledge Graph. (So if dataset X was published by CERN, a little information about the institute will also be included in the search.)
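
As a rough illustration of the kind of metadata described above, the sketch below builds a dataset description and wraps it in a script tag that a publisher could embed in a webpage. It assumes the schema.org `Dataset` vocabulary expressed as JSON-LD, which the article itself does not name. The function names and all sample values (the publisher, date, and collection method) are hypothetical placeholders, not real data.

```python
import json


def build_dataset_metadata(name, description, creator, date_published, collection_method):
    """Return a dict describing a dataset (assumed schema.org Dataset vocabulary)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created the dataset
        "creator": {"@type": "Organization", "name": creator},
        # When it was published (ISO 8601 date string)
        "datePublished": date_published,
        # How it was collected
        "measurementTechnique": collection_method,
    }


def to_script_tag(metadata):
    """Serialize the metadata as JSON-LD inside an HTML script tag."""
    body = json.dumps(metadata, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical example values, for illustration only
    meta = build_dataset_metadata(
        name="Example ocean temperature readings",
        description="Placeholder description of a hypothetical dataset.",
        creator="Example University",
        date_published="2018-01-01",
        collection_method="Hypothetical buoy sensor measurements",
    )
    print(to_script_tag(meta))
```

A publisher would place the resulting tag in the page that describes the dataset, leaving the data itself where it already lives. The exact set of fields a search engine reads is not specified in the article, so the fields chosen here only mirror the ones it mentions.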

 

Speaking to The Verge, Natasha Noy, a research scientist at Google AI who helped create Dataset Search, says the aim is to unify the tens of thousands of different repositories for datasets online. “We want to make that data discoverable, but keep it where it is,” says Noy.

 

The initial release of Dataset Search will cover the environmental and social sciences, government data, and datasets from news organizations like ProPublica. However, if the service becomes popular, the amount of data it indexes should quickly snowball as institutions and scientists scramble to make their information accessible.

 

At the moment, dataset publication is extremely fragmented. Different scientific domains have their own preferred repositories, as do different governments and local authorities. “Scientists say, ‘I know where I need to go to find my datasets, but that’s not what I always want,’” says Noy. “Once they step out of their unique community, that’s when it gets hard.”

 

Noy gives the example of a climate scientist she spoke to recently who told her she’d been looking for a specific dataset on ocean temperatures for an upcoming study but couldn’t find it anywhere. She didn’t track it down until she ran into a colleague at a conference who recognized the dataset and told her where it was hosted. Only then could she continue with her work. “And this wasn’t even a particularly boutique depository,” says Noy. “The dataset was well written up in a fairly prominent place, but it was still difficult to find.”

 

Read More>>

  1. blog.google
  2. The Verge
  3. Dataset Search