Friday, 24 August 2012

Google's Experimental Table Search Is Better than Ever at Extracting Data from the Web


The amount of information on the web is staggering, and it's growing all the time. But making sense of it all is hard, for both people and machines. For people, the problem is that there is simply too much information to go through.

For machines, the problem is that the information has no inherent meaning: it's all there, but it's hard to sort and categorize. Structured data is the answer to that.

More and more websites are using structured data and richer metadata to make it easier for search engines to understand their content, but this doesn't help much with the data that's already out there and is no longer being updated.

For that, increasingly smart algorithms are the only answer. Understandably, Google is working on that as well. One such project is Table Search, an experimental tool which, as the name implies, is designed to make it easier to find tables on the web.

There is plenty of data in tables online. But tables are used for plenty of other purposes too, not least for website layouts. The search algorithm needs to tell the difference and determine whether the data in a table is relevant and useful or just garbage.
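
To make the data-versus-layout distinction concrete, here is a minimal sketch of the kind of hand-written check a rule-based system might use. It is not Google's method: the class `TableStats`, the functions `looks_like_data` and `classify`, and the specific rules (no nested tables, at least two rows, a regular grid or some header cells) are all assumptions made up for illustration.

```python
from html.parser import HTMLParser


class TableStats(HTMLParser):
    """Collects simple counts for each outermost <table> in an HTML string."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one stats dict per outermost table
        self.depth = 0     # current <table> nesting depth

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth == 1:
                self.tables.append({"rows": [], "headers": 0, "nested": 0})
            else:
                # A table inside a table is a strong hint of page layout.
                self.tables[-1]["nested"] += 1
        elif self.depth == 1:
            stats = self.tables[-1]
            if tag == "tr":
                stats["rows"].append(0)          # start a new row, zero cells so far
            elif tag in ("td", "th") and stats["rows"]:
                stats["rows"][-1] += 1           # count a cell in the current row
                if tag == "th":
                    stats["headers"] += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def looks_like_data(stats):
    """Hypothetical rules: no nesting, at least two non-empty rows,
    and either a regular multi-column grid or at least one header cell."""
    rows = [n for n in stats["rows"] if n > 0]
    if stats["nested"] or len(rows) < 2:
        return False
    regular_grid = len(set(rows)) == 1 and rows[0] >= 2
    return regular_grid or stats["headers"] > 0


def classify(html):
    """Returns one boolean per outermost table: True if it looks like data."""
    parser = TableStats()
    parser.feed(html)
    return [looks_like_data(t) for t in parser.tables]
```

Calling `classify` on a page's HTML returns one verdict per top-level table. Rules like these are easy to read but brittle, since every new exception needs another hand-written condition.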

"While we are still a long way away from the perfect table search, we made a few steps forward recently by revamping how we determine which tables are 'good' (one that contains meaningful structured data) and which ones are 'bad' (for example, a table that hold the layout of a Web page)," Johnny Chen explained on Google's Research Blog.

"In particular, we switched from a rule-based system to a machine learning classifier that can tease out subtleties from the table features and enables rapid quality improvement iterations," he added.

"We are also able to achieve a better understanding of the tables by leveraging the Knowledge Graph. In particular, we improved our algorithms for identifying the context and topics of each table, the entities represented in the table and the properties they have," he added.

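The switch Chen describes, from hand-written rules to a learned classifier, can be illustrated with a toy example. The sketch below trains a classic perceptron on a handful of invented examples; the three features, the training data, and the names `train_perceptron` and `is_data_table` are assumptions for illustration only. Google's post does not say which features or which learning algorithm Table Search actually uses.

```python
# Invented toy data, not real measurements.
# Features: (has_header, regular_grid, contains_nested_table)
# Label: +1 = data table, -1 = layout table
TRAINING = [
    ((1, 1, 0), +1),
    ((0, 0, 1), -1),
    ((0, 1, 0), +1),
    ((0, 0, 0), -1),
    ((0, 1, 1), -1),
]


def train_perceptron(examples, epochs=20):
    """Classic perceptron: adjust the weights whenever an example is misclassified."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            score = bias + sum(w * x for w, x in zip(weights, features))
            if label * score <= 0:
                # Move the decision boundary toward the correct side.
                weights = [w + label * x for w, x in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:
            break  # every training example is classified correctly
    return weights, bias


def is_data_table(features, weights, bias):
    """True if the learned linear score puts the table on the 'data' side."""
    return bias + sum(w * x for w, x in zip(weights, features)) > 0


WEIGHTS, BIAS = train_perceptron(TRAINING)
```

The appeal of this approach is that improving quality means adding labeled examples or features and retraining, rather than rewriting rules by hand, which fits the "rapid quality improvement iterations" mentioned in the quote.
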
The Knowledge Graph is Google's biggest move towards understanding data and the web rather than just curating and retrieving it, and it's the largest product to date to rely on structured data. It's no wonder that the technology behind it is making all types of search better.

Despite all the improvements, the latest version of Table Search is still experimental and still aimed at a very specific set of users. But the lessons learned from it will eventually make Google Search better for everyone.

Via: Google's Experimental Table Search Is Better than Ever at Extracting Data from the Web
