Monday, May 12, 2008

Powerset hype, and Norvig pooh-poohs NL

I saw a post this morning about Peter Norvig's remarks, made a few months ago, on natural language (NL) processing and how, in his view, it is all but useless in adding value to web search. The post resurfaced during this weekend's buzz over Powerset. Here's my reply.

I believe the discussion around Powerset and its potential suitors is on a misguided trajectory. A few months ago, Peter Norvig stated that NL provides only marginal advances over state-of-the-art keyword search technologies, and that keyword lookup is actually more natural for users than NL questions and phrases. As an NLP advocate in general, and a die-hard advocate of knowledge-driven NLP, I am amazed to find myself in perfect and absolute agreement with Norvig's assertions. A simple, concise list of keywords is the most suitable interface for searching for and retrieving text documents from the WWW.

But the focus on document search as the future of information retrieval is itself a fallacy. Google's blind spot, and potential undoing, is the insurgent linked data web, or web of data, or semantic web, or Web 3.0 (pick your favorite), heralded by Tim Berners-Lee. This vision will allow the web to consist primarily of structured databases composed of graphs linked together by dereferenceable, unambiguous URIs. For the data in any segment of this global graph, and for the schemas describing that data, the convenience of a consistent model for data exploration, and of a fixed domain of discourse to guide UI designers, will become a thing of the past. The user will no longer "search for a page using keywords", but will instead "look up an entry by description". Any one lookup may span dozens of domains of knowledge, ontologies, or schemas, and will yield result sets of such breadth and heterogeneity as to defy any attempt at the GUI consistency of Google's ranked list of links. People using this global graph will not search for pages deemed relevant to a bag of words by the consensus of the crowd. Instead, they will look up people, places and things, and links and relations of varying complexity between them, using unambiguous references to those entities. To perform these laser-beam-like lookups, users will demand the interface they have spent a lifetime mastering, and for the task of expressing relationships between things no UI is more natural than the natural language user interface (NUI). In an NUI, noun phrases and named entities let users refer to a set of URIs as expansive as the NL lexicon itself, while verbs, adjectives, relational nouns, prepositions and modifiers offer a broad and rich set of operators for describing the links between those URIs.
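
To make "look up an entry by description" a little more concrete, here is a minimal Python sketch, assuming a toy in-memory graph of triples. Everything in it is hypothetical and for illustration only: the example.org URIs, the ENTITIES and RELATIONS lexicons, and the lookup function. It only shows the division of labor described above, with noun phrases resolving to entity URIs and verb or prepositional phrases resolving to the links between them. A real system would resolve phrases against published linked-data vocabularies and query them with something like SPARQL.

```python
from typing import Optional

# Hypothetical URIs under example.org; a real linked-data graph would use
# dereferenceable URIs published by many different sources.
EX = "http://example.org/"

# A tiny in-memory graph of (subject, predicate, object) triples.
TRIPLES = {
    (EX + "Ridley_Scott", EX + "directed", EX + "Blade_Runner"),
    (EX + "Ridley_Scott", EX + "bornIn", EX + "South_Shields"),
    (EX + "Blade_Runner", EX + "basedOn", EX + "Do_Androids_Dream"),
}

# Hypothetical lexicon: noun phrases and named entities map to entity URIs...
ENTITIES = {
    "ridley scott": EX + "Ridley_Scott",
    "blade runner": EX + "Blade_Runner",
}
# ...while verbs and prepositional phrases map to predicate URIs (the "links").
RELATIONS = {
    "directed": EX + "directed",
    "was born in": EX + "bornIn",
    "is based on": EX + "basedOn",
}


def lookup(subject: Optional[str], relation: str, obj: Optional[str]) -> set:
    """Answer a one-hop question in which exactly one slot is unknown (None).

    lookup("ridley scott", "directed", None) asks "What did Ridley Scott direct?"
    lookup(None, "directed", "blade runner") asks "Who directed Blade Runner?"
    """
    if (subject is None) == (obj is None):
        raise ValueError("exactly one of subject/obj must be None")
    pred = RELATIONS[relation]
    s = ENTITIES[subject] if subject is not None else None
    o = ENTITIES[obj] if obj is not None else None
    results = set()
    for ts, tp, to in TRIPLES:
        # Keep only triples that match every slot the question fixes.
        if tp != pred:
            continue
        if s is not None and ts != s:
            continue
        if o is not None and to != o:
            continue
        # Return whichever slot the question left open.
        results.add(to if o is None else ts)
    return results


print(lookup(None, "directed", "blade runner"))
# {'http://example.org/Ridley_Scott'}
```

The answer is a URI rather than a ranked list of pages, which is the point: the question names things and a relation, and the graph returns the thing that satisfies it.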
There is a time and a place for every purpose under heaven, and I believe this is the proper place for NL technologies. NL and the semantic web shall evolve together, and each will symbiotically drive the critical-mass adoption of the other.

I believe every contributor to NL should be involved in a project that seeks to fuse the semantic web with NL.

