Monrai Blog

News about Cypher, Semantic Web, Natural Language Processing, and Computational Linguistics

Saturday, June 21, 2008

Response to Dan Grigorovici's SemanticWeb.com articles (Part 1)

Dan Grigorovici, AOL exec, semantic web evangelist, and good friend of mine, is doing a series of articles for SemanticWeb.com. In the series, he issues a call to action to the semantic web community and admonishes us to buckle down on the PR aspect of the semantic web (or the lack thereof). I'd like to offer some responses to the issues he addresses.

I was having a conversation with a few of the "usual suspects" of the Semantic Web evangelical crowd, and it was mentioned that one of the problems we face is how to make money in a medium that is not eyeballs-driven. Because the semantic web is a technology whose users are mostly machines, the catch-all monetization strategy of Web 1.0, advertising, does not apply. Someone then took the words out of my mouth by saying, in effect, that you don't have to have people looking at a web page to deliver an ad to them; the ad can be delivered across other mediums, SMS, etc. The problem is that current advertising is obtrusive. I am reminded of how one time I had a really big headache, went to the drug store to buy aspirin, and found myself in the aisle asking, "Now, what's the name of that 'I have a headache this big' medicine?" A classic example of a good product that was offered to me at a bad time. So an improvement that is needed is the injection of "context" into the equation: being able to deliver to a user a product, service, or opportunity at the most opportune and relevant moment, based on their current need, time, and place. At the time I saw that commercial (btw, the brand is Excedrin), I was a small child and had probably never had a headache. But when I finally entered the market for it, I was unable to recall or find it.

I added that user behavior, interests, etc. can be collected (with permission) and used to drive more intelligent referrals for purchase decisions. People are always looking for better advice before buying; a case in point is an experience I had with a crummy airline. Had I had a service that could make a quality recommendation on my airline (taking price, quality preference, and other factors together), I would literally have saved hundreds of dollars.

So one business model that will, IMHO, be a bread-and-butter source of revenue for Semantic Web companies will be in "cooking" triples that describe users and the things that interest them, to provide the knowledge/intelligence needed to fuel the next generation of recommendation services. These services will connect customers to the things they want and need with laser-like precision, and will deliver these laser-beam recommendations unobtrusively across a myriad of channels. Companies will pay a premium to have their products and services delivered by such services.
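To make the "cooked triples" idea concrete, here is a minimal sketch in plain Python: user-interest statements are just (subject, predicate, object) tuples, and a recommendation is any product whose statements match one of the user's stated interests. All the names here (ex:alice, ex:interestedIn, ex:treats, etc.) are invented for illustration, not drawn from any real vocabulary.

```python
# Triples describing a user and some products, as (subject, predicate, object).
triples = {
    ("ex:alice",    "ex:interestedIn", "ex:HeadacheRelief"),
    ("ex:alice",    "ex:locatedIn",    "ex:Drugstore"),
    ("ex:excedrin", "ex:treats",       "ex:HeadacheRelief"),
    ("ex:sunblock", "ex:treats",       "ex:Sunburn"),
}

def recommend(user):
    """Products whose 'treats' object matches one of the user's interests."""
    interests = {o for s, p, o in triples
                 if s == user and p == "ex:interestedIn"}
    return sorted(s for s, p, o in triples
                  if p == "ex:treats" and o in interests)

print(recommend("ex:alice"))  # ['ex:excedrin']
```

A real service would of course use an RDF store and a query language rather than set comprehensions, but the point is the same: once the user's need is stated as data, the match between need and product is a lookup, not a guess.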

Stay tuned for the next micro-post, where I continue to offer monetization suggestions for Semantic Web startups.


Thursday, June 19, 2008

The High Definition Web

A lot of people have requested the presentation "Emergent Data and Semantics From Social Collaboration", prepared by Soren Auer and myself for the Linked Data Planet 2008 Spring conference, so I have placed it online. (For now, as you read, please refer to the slides; I'll get some images posted soon.)

In it, I expound on the trend towards a more High Resolution Web, or High Definition Web, where machines are able to see a richer description of people, places and things. Here are a few of my notes from that talk for those who could not attend.

When we talk about Social Collaboration: as social creatures, sharing is a natural component of our evolutionary adaptation. The internet provided an infrastructure to connect computers, and the WWW provides the means of performing this inherent human behavior of sharing, mainly sharing documents across that connection. One of the greatest contributions the WWW made was that I could open a text editor, write something, then instantly share it asynchronously with someone across the world. But the document web limits sharing, i.e. the Social part of the WWW. Here are a couple of analogies that help illustrate this notion:

The Teller that Couldn't Tell:
- Suppose you deposit money, then I request a withdrawal, and the following conversation ensues:

You: I'd like to withdraw $20.00 please
Teller: Let me search for that, I'll be right back... Ok, I found $10 that may be yours [or] I found $5 of the $20 you requested. Instead of telling me the amount you deposited, can you tell me what you had on when you deposited it? That may help me cross-reference and find your deposit better.

Ridiculous, right? What's hindering the teller from delivering exactly what you requested?
No matter if it's an entry posted to my blog, a link sent to your email, a dissertation in a PDF, or a web page, the only reason we have a notion of "search" and "results found" is that documents are inadequate data containers that wind up suppressing the information we intend to share. In the WWW, email, blogs, delicious bookmarks, etc., the document always loses important parts of the data we place in it. Because of this, the document must be searched for and found again.

The Powerless Boss:
Suppose you have a boss who has a collection of many thousands of photos stored on your PC. He asks you one day to find a certain photo he took at a conference, and he describes the photo in vivid detail. The problem is, you have an incredibly low resolution monitor: the figures in the photos are blurred beyond recognition, and you can't make out any of the people's faces. How on earth will you find the photo he's interested in? So you begin creating alternative heuristics for finding it. You think, "He said he took it alongside three people; there are a few with four human-shaped objects, so I can try to determine which one is him by cross-referencing and narrowing down... Well, he also took one that day at the podium; thankfully there's only one with a human-shaped form at a podium-looking thing, and it's shaped like, and is the same color as, the blob in this photo... one of these three is most likely him." So you email him the candidates, he prints them and selects the correct one, then says, "Thanks so much, now I need the photo of me discussing the market data powerpoint slide". Based on his feedback, you make a note that says "The tall purple blob in these photos is the Boss". But then you explain to him, "Hold on boss, all the detail you provide in your request is useless to me" (and you explain the situation)... "you'll have to speak in terms of colors and blobs (i.e. please dumb down your request)".

He says: "Hmm, ok, the picture I want should have a tall, slender, dark blob left of center, and three smaller blobs to the right, because by that time two of the panelists had not arrived yet". Two photos match; you send them, the boss prints and selects the correct one from what you gave him, and you use that good guess to improve the heuristics in your little book. Your monitor's terrible resolution introduces a tremendous pain for your boss, but gives you great job security, because of the tremendous value your book of heuristics now offers.

But now, let's take a look at what happens the moment your boss increases the resolution of your monitor:
  • Your book of heuristics becomes worthless
  • Your boss can now fire you anytime and hire anyone else to retrieve his photos
  • Most importantly, your boss can now request a photo from 1000s by describing the photo he wants in vivid detail, and can be fairly certain that he will receive the photo he requests (if the photo exists), so he can say things like "I need some photos for my homepage, get me all photos of me taken when I still had a beard, and taken outdoors wearing no suit, at my home, or taken at a bar with anyone I know"
Now think of the description of a resource as a photo of that resource, and each statement (triple) involving that resource as a pixel that makes up the photo. Because documents were the atomic unit of information, the web had a really, really, really low resolution, and Google held a very valuable book of heuristics. As we increase the resolution of the web, the emphasis on "search" will evaporate.
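The photo analogy can be sketched in a few lines of Python: once each photo carries enough statements (pixels), the boss's vivid request becomes a simple conjunctive pattern match over triples. The vocabulary here (depicts, feature, setting, attire) is invented purely for illustration.

```python
# Each statement about a photo is one "pixel" of its description.
triples = {
    ("photo1", "depicts", "boss"),
    ("photo1", "feature", "beard"),
    ("photo1", "setting", "outdoors"),
    ("photo1", "attire",  "casual"),
    ("photo2", "depicts", "boss"),
    ("photo2", "setting", "office"),
    ("photo2", "attire",  "suit"),
    ("photo3", "depicts", "panelists"),
    ("photo3", "setting", "outdoors"),
}

def matching(*patterns):
    """Photos that satisfy every (predicate, object) condition."""
    subjects = {s for s, _, _ in triples}
    for p, o in patterns:
        subjects &= {s for s, pred, obj in triples if pred == p and obj == o}
    return sorted(subjects)

# "all photos of me taken when I still had a beard, outdoors, wearing no suit"
print(matching(("depicts", "boss"), ("feature", "beard"),
               ("setting", "outdoors"), ("attire", "casual")))  # ['photo1']
```

With a low-resolution description (few triples per photo), every query degrades into heuristic guessing; with a high-resolution one, the intersection of conditions pins down exactly the photo requested.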

The Trend Towards a Web in HD

What we’re moving towards with the Linked Data Movement, and the Semantic Web movement at large, is what can be described as a High Definition Web (i.e. Web 3.0, where each version increment roughly corresponds to a decade). The web has always been about describing things.

Web 1.0 contained statements in which documents referred to nouns, and you only had one verb: isSomehowRelatedTo. The anchor tag is a reference to that relationship. If you think of information (i.e. a statement) as a pixel, then in Web 1.0 a document with only one hyperlink had very few pixels in its photo; a document might instead have many inbound and outbound links, but because each link means the same thing, the photo had no color (i.e. the links had no distinction).

Web 2.0 introduced subjects of several new noun types, the same monolithic verb isSomehowRelatedTo, and an object of an ambiguous term type, i.e. the tag. Web 2.0 increased the number of pixels just slightly, but still offered no real color.

Web HD completes this transition by offering all named entities as subjects and direct objects, and any relationship as a verb. Web HD is like having a life-like photograph of a thing: we can say this is a person; we can describe their phenotype, their genotype, likes, dislikes, social relationships... Each statement can now offer distinctly different information (so you have this wide range of color), and because you have this rich and inexhaustible vocabulary, the number of pixels in the photograph explodes.


Tuesday, June 17, 2008

Linked Data Example

I'm blogging live from the LDP conference, and have seen some very exciting technologies and heard some excellent presentations of the linked data vision. In my talk tomorrow, I discuss the differences between today's web (Web 1.0 & 2.0), which is primarily a web of opaque documents and the simple "isRelatedTo" links between them, versus tomorrow's web, which offers links between granular semantic concepts (i.e. non-ambiguous references to self-described things). Thus, instead of the document (and the links between documents) being the atomic unit of information, the database becomes the container.

But Kingsley today demoed something I had not thought a lot about: what if you make the document the container of these richer, semantic statements? RDFa is a standard for embedding RDF into HTML documents. But take a look at Kingsley's keynote presentation (which is a Powerpoint document), or rather, the linked data embedded in it. This graph allows you to explore the slides in the presentation, the concepts it discusses, the resources and photos it contains, the people related to it and the concepts it mentions, etc.
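To give a feel for the idea of statements riding inside a document, here is a toy sketch that lifts triples back out of RDFa-style `about` and `property` attributes. It hand-rolls only that tiny subset using Python's standard-library HTML parser; a real consumer would use a full RDFa processor, and the example markup is invented for illustration.

```python
from html.parser import HTMLParser

class TinyRDFa(HTMLParser):
    """Extracts (subject, predicate, object) triples from a tiny RDFa subset."""
    def __init__(self):
        super().__init__()
        self.subject = None     # current 'about' resource
        self.predicate = None   # pending 'property' awaiting its text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "property" in a:
            self.predicate = a["property"]

    def handle_data(self, data):
        # Element text becomes the object of the pending predicate.
        if self.subject and self.predicate and data.strip():
            self.triples.append((self.subject, self.predicate, data.strip()))
            self.predicate = None

doc = """
<div about="http://example.org/talk">
  <span property="dc:title">Linked Data Keynote</span>
  <span property="dc:creator">Kingsley Idehen</span>
</div>
"""

parser = TinyRDFa()
parser.feed(doc)
print(parser.triples)
```

The document stays a perfectly ordinary web page for human readers, while a machine reading the same bytes recovers the statements about the talk, its title, and its creator.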


Friday, June 13, 2008

Linked Data Planet

Next week is the first annual Linked Data Planet conference, which will be held in NY. I was really excited when I first heard about this, and excited about attending, because two of the keynotes are visionaries whom I have wanted to hear speak for a long time but haven't yet had the opportunity: Kingsley Idehen and Tim Berners-Lee. I'm really excited about this particular event, because it puts a concentrated focus on the momentum building around Linked Data, which is one of the chief byproducts of the Semantic Web. I believe this event will mark a critical turning point for the Semantic Web movement.

I will also be giving a talk on Dbpedia, Ontowiki, and Cypher, and a new service called Cynapse. In addition, I will demo some of the latest Cypher features and improvements, both in my presentation and at the exhibition.
