News about Cypher, Semantic Web, Natural Language Processing, and Computational Linguistics

Monday, May 12, 2008

Powerset hype, and Norvig pooh-poohs NL

I saw a post this morning about remarks Peter Norvig made a few months ago on his perception of NL as all but useless in providing value to web search. The post resurfaced during this weekend's buzz over Powerset. Here's my reply.

I believe the discussion around Powerset and its potential suitors is on a misguided trajectory. A few months ago, Peter Norvig stated that NL provides only marginal advances over state-of-the-art keyword search technologies, and that keyword lookup is actually more natural for users than NL questions and phrases. As an NLP advocate in general, and a die-hard advocate of knowledge-driven NLP, I am amazed to find myself in perfect and absolute agreement with Norvig's assertions. A simple and concise list of keywords is the most suitable interface for search and retrieval of text documents from the WWW.

But the focus on document search as the future of information retrieval is itself a fallacy. Google's blind spot, and potential undoing, is the insurgent linked data web, or web of data, or semantic web, or web 3.0 (pick your favorite), which has been heralded by Tim Berners-Lee. This vision will allow the web to consist primarily of structured databases composed of graphs linked together by dereferenceable, unambiguous URIs. For the data contained in any segment of this global graph, and the schema encapsulating that data, the convenience of a consistent model for data exploration and the notion of a fixed domain of discourse to guide UI designers will become things of the past. The user will no longer "search for a page using keywords", but will instead "look up an entry by description". Any one "lookup" may span dozens of domains of knowledge/ontologies/schemas, and will yield result sets of such breadth and heterogeneity as would defy any attempt at achieving the GUI consistency of Google's ranked list of links. People using this global graph will not search for pages deemed relevant to a bag of words based on the consensus of the crowd. Instead, users will look up people, places and things, and links and relations of varying complexity between them, using unambiguous references to those entities. To perform these laserbeam-like lookups, users will demand the interface they have spent a lifetime mastering, the one most natural for expressing relationships between things: the natural language user interface (NUI). There, noun phrases and named entities will let users refer to a set of URIs as expansive as the NL lexicon itself, while verbs, adjectives, relational nouns, prepositions and modifiers will offer users a broad and rich set of operators for describing the links between those URIs.
There is a time and a place for every purpose under heaven, and I believe this is the proper place for NL technologies. NL and the SW shall evolve together, and each will symbiotically facilitate the critical mass adoption of the other.
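
As a rough illustration of "lookup an entry by description", here is a minimal Python sketch. It is not how Cypher or any real system works; the lexicons, the example.org URIs, the tiny in-memory graph and the lookup function are all hypothetical, chosen only to show noun phrases resolving to URIs and a relational phrase resolving to a link between them.

```python
# Hypothetical lexicon: noun phrases / named entities -> URIs (illustrative only).
ENTITY_URIS = {
    "tim berners-lee": "http://example.org/person/TimBernersLee",
    "world wide web": "http://example.org/thing/WorldWideWeb",
    "london": "http://example.org/place/London",
}

# Hypothetical lexicon: verbs / relational phrases -> predicate URIs.
RELATION_URIS = {
    "invented": "http://example.org/rel/invented",
    "born in": "http://example.org/rel/birthPlace",
}

# A tiny in-memory graph of (subject, predicate, object) triples.
TRIPLES = {
    (ENTITY_URIS["tim berners-lee"], RELATION_URIS["invented"],
     ENTITY_URIS["world wide web"]),
    (ENTITY_URIS["tim berners-lee"], RELATION_URIS["born in"],
     ENTITY_URIS["london"]),
}


def lookup(subject_phrase, relation_phrase):
    """Resolve both phrases to URIs, then return the linked object URIs."""
    subject = ENTITY_URIS[subject_phrase.lower()]
    predicate = RELATION_URIS[relation_phrase.lower()]
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


if __name__ == "__main__":
    # "What did Tim Berners-Lee invent?" reduced to an entity and a relation.
    print(lookup("Tim Berners-Lee", "invented"))
```

A real system would also need parsing and disambiguation to get from a full sentence to the entity and relation phrases; the sketch assumes that step has already happened.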

I believe every contributor to NL should be involved in a project which seeks to fuse the semantic web with NL.

Blogger Michael said...

Hi Sherman,

Yes, agreed totally.

While I'm not sure I would write off Google's ability to move in this direction, your assessment is, I believe, correct, and success is often the cocoon that blinds us to changing seasons.

But, actually, I found your UI comment most telling. What that might be for linked data is, I suspect, wide open. As for Google, was it PageRank or the clean interface without bought placements that made the difference?

Then, too, some would argue for cosmic serendipity in such matters!

So, I think we can readily point to the things that can screw it up, but the absence of those factors by no means ensures success.

At any rate, please keep posting on the UI imperatives. I agree they are among the factors needed for success.

7:07 PM  
Blogger sdmonroe said...

Hi Michael,

Yes, there are many factors needed for success. I always say that Google is the player best suited to bring the Semantic Web to fruition, by sheer brute force if nothing else, but their aversion to an effort that seeks to backtrack and fix a flaw of the WWW they've capitalized on for so long, i.e. the problem of ambiguity on the web, is not surprising. If we succeed, then their secret cookbook of recipes for overcoming this flaw will be rendered obsolete in one fell swoop, and a critical part of their industry leverage will vanish with it. But as you said, success is indeed one deceptive monster. I almost wonder if Norvig really believes what he says or if this is simply Google's "official public stance".

I intend to blog more often as we begin to announce and roll out a new set of services and accompanying software around Cypher, so please stay tuned.

And thanks so much for your feedback!

8:49 PM  
Blogger sdmonroe said...

"So, I think we can readily point to the things that can screw it up, but the absence of those factors by no means ensures success."

You're right, but I know the hand of the good Lord is with us, and He will make the path straight for us. These challenges will be surmounted, because what the effort stands for is right. This is the laying down of a proper foundation for technologies we have not yet dreamed of nor can currently imagine. The flawed foundation we have today, which Google and others wish to present as sufficient, does not provide what is needed for tomorrow, which holds a far bolder vision than many people realize. So we must strive to leave the best possible foundation for posterity. God's laws of nature and evolution always favor improvement; this is why we will succeed.

9:08 PM  
