Monrai Blog

News about Cypher, Semantic Web, Natural Language Processing, and Computational Linguistics

Tuesday, August 01, 2006

Transcography - Part 1

Cypher is based on a sub-discipline of natural language processing called Transcography, which Monrai developed with the goal of merging natural language processing with the increasingly popular Semantic Web movement. Transcography is the process of parsing the phrase structure of a natural language (NL) construct and translating the parser's output into a semantic representation. The output for each NL construct consists of three things: 1) a URI representing the NL construct, 2) a set of one or more subject-predicate-object triples involving that URI, and 3) the set of all triples produced by its sub-phrases. In other words, Cypher views any and all linguistic input as a URI plus related triples. Knowing this is key to understanding why the Cypher lexicon is such a powerful NL resource.

As an example of transcographic output, consider the phrase "John's coach". The transcographic process produces a URI representing the phrase, for example http://john.mysite.com/MrDouglass, and a set of triples representing the statements the phrase makes:

{http://john.mysite.com/me} jo:isCoachedBy {http://john.mysite.com/MrDouglass}
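The three-part output described above can be sketched as a small Python data structure. This is a minimal illustration, not Cypher's API: the names `TranscographicOutput`, `sub_phrase_triples`, and `all_triples` are hypothetical, and the sketch assumes that the sub-phrase "John" contributes the URI http://john.mysite.com/me and no triples of its own.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), prefixed names as in the post


@dataclass
class TranscographicOutput:
    """Hypothetical container for the output of one NL construct."""
    uri: str                                             # 1) URI for the construct
    triples: List[Triple] = field(default_factory=list)  # 2) triples involving it
    sub_phrases: List["TranscographicOutput"] = field(default_factory=list)

    def sub_phrase_triples(self) -> List[Triple]:
        """3) All triples produced by sub-phrases, at any depth."""
        collected: List[Triple] = []
        for child in self.sub_phrases:
            collected.extend(child.triples)
            collected.extend(child.sub_phrase_triples())
        return collected

    def all_triples(self) -> List[Triple]:
        """The construct's own triples followed by its sub-phrases' triples."""
        return self.triples + self.sub_phrase_triples()


ME = "http://john.mysite.com/me"
COACH = "http://john.mysite.com/MrDouglass"

john = TranscographicOutput(uri=ME)  # assumed: "John" yields a URI and no triples
johns_coach = TranscographicOutput(
    uri=COACH,
    triples=[(ME, "jo:isCoachedBy", COACH)],
    sub_phrases=[john],
)

print(johns_coach.uri)
for triple in johns_coach.all_triples():
    print(triple)
```

Keeping the sub-phrase outputs as nested objects, rather than flattening them immediately, preserves which phrase contributed which triple; flattening happens only when `all_triples` is called.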

Cypher uses these triples to create either an RDF model or a SeRQL query. Which one it produces depends on the kind of NL construct: a clause or description yields an RDF model, while a noun phrase or question yields a SeRQL query. The triples of sub-phrases are recursively merged to produce a root graph representing the root NL phrase or clause. For example, consider "John's coach knows Martin." The URI produced will represent this clause (e.g. the URI of a reified RDF triple, or the URI of a semantic frame), along with a graph containing:

{qv:node1} foaf:knows {http://john.mysite.com/MartinCrump}
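The output-mode choice and the recursive merge for "John's coach knows Martin" can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Cypher's implementation: the `Phrase` class, the kind labels, `output_mode`, `merge_graph`, and the clause URI `ex:clause1` are hypothetical, and the mapping of clauses and descriptions to RDF models and of noun phrases and questions to SeRQL queries is read off the paragraph above. The triple attached to "John's coach" (linking John to `qv:node1`) is an assumed stand-in for the serialized query, whose exact RDF form the post does not give.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Assumed mapping of construct kinds to output modes.
RDF_MODEL_KINDS = {"clause", "description"}
QUERY_KINDS = {"noun_phrase", "question"}


@dataclass
class Phrase:
    """Hypothetical parse node: its kind, its URI, its own triples, its sub-phrases."""
    kind: str
    uri: str
    triples: List[Triple] = field(default_factory=list)
    children: List["Phrase"] = field(default_factory=list)


def output_mode(phrase: Phrase) -> str:
    """Pick the output form for a construct based on its kind."""
    if phrase.kind in RDF_MODEL_KINDS:
        return "rdf_model"
    if phrase.kind in QUERY_KINDS:
        return "serql_query"
    raise ValueError(f"unknown construct kind: {phrase.kind}")


def merge_graph(phrase: Phrase) -> Set[Triple]:
    """Recursively union a phrase's own triples with those of every sub-phrase."""
    graph: Set[Triple] = set(phrase.triples)
    for child in phrase.children:
        graph |= merge_graph(child)
    return graph


ME = "http://john.mysite.com/me"
MARTIN = "http://john.mysite.com/MartinCrump"

# "John's coach": a noun phrase standing for the query variable qv:node1.
# The pattern triple below is an assumed stand-in for the serialized query.
johns_coach = Phrase("noun_phrase", "qv:node1", [(ME, "jo:isCoachedBy", "qv:node1")])
clause = Phrase("clause", "ex:clause1",
                [("qv:node1", "foaf:knows", MARTIN)],
                [johns_coach])

print(output_mode(clause))       # rdf_model
print(output_mode(johns_coach))  # serql_query
for triple in sorted(merge_graph(clause)):
    print(triple)
```

Using a set for the merged graph means a triple contributed by more than one sub-phrase appears only once in the root graph.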

The URI qv:node1 represents a query variable of a SeRQL query that has itself been serialized in RDF. This is because "John's coach" is a relational noun phrase and is therefore an anaphoric reference. By reconstructing the SeRQL query for the variable (following the links from qv:node1) and then executing it, a program can retrieve the resource that represents John's coach at the time of the query. This technique is used because John may have a different coach by the time the query runs. Transcography stipulates that any anaphoric reference be represented by a query variable (linked to the RDF representation of the SeRQL query) unless the program is ready to apply the variable's value (e.g. to present it to a human user in an interface).
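A minimal sketch of resolving qv:node1 at query time, assuming the query reached from qv:node1 reduces to the single pattern "John is coached by ?node1". The in-memory matcher below stands in for executing the reconstructed SeRQL query against a real RDF store; `match`, `resolve`, `present`, the convention that `qv:` terms are variables, and the replacement coach URI are all hypothetical. Because the clause graph keeps the variable rather than a concrete URI, it stays valid when the coach changes, and the value is substituted only when it is about to be shown.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

ME = "http://john.mysite.com/me"
MARTIN = "http://john.mysite.com/MartinCrump"
NODE1 = "qv:node1"

# Assumed query graph reached by following links from qv:node1, roughly the
# SeRQL query: SELECT node1 FROM {<http://john.mysite.com/me>} jo:isCoachedBy {node1}
QUERY_PATTERNS: List[Triple] = [(ME, "jo:isCoachedBy", NODE1)]

# The stored clause graph for "John's coach knows Martin" keeps the variable.
CLAUSE: Triple = (NODE1, "foaf:knows", MARTIN)


def is_variable(term: str) -> bool:
    """Hypothetical convention: terms in the qv: namespace are query variables."""
    return term.startswith("qv:")


def match(pattern: Triple, triple: Triple,
          binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            # Bind an unbound variable; an already bound one must agree with the data.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def resolve(variable: str, patterns: List[Triple], store: Set[Triple]) -> Set[str]:
    """Run a conjunctive pattern query against `store`; return the variable's values."""
    bindings: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        next_bindings: List[Dict[str, str]] = []
        for binding in bindings:
            for triple in store:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return {b[variable] for b in bindings if variable in b}


def present(clause: Triple, variable: str, value: str) -> Tuple[str, ...]:
    """Substitute the variable's value only when it is about to be shown."""
    return tuple(value if term == variable else term for term in clause)


# Data at two points in time; the second coach URI is hypothetical.
store_before = {(ME, "jo:isCoachedBy", "http://john.mysite.com/MrDouglass")}
store_after = {(ME, "jo:isCoachedBy", "http://john.mysite.com/NewCoach")}

for store in (store_before, store_after):
    for coach in resolve(NODE1, QUERY_PATTERNS, store):
        print(present(CLAUSE, NODE1, coach))
```

Running it prints the same stored clause twice, first with MrDouglass and then with NewCoach as the subject, without the clause graph itself ever being rewritten.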

The word transcography is the combination of transcode, which means "to convert media from one format to another", and -graphy which is "writing or text representation produced in a specified manner or by a specified process". Thus the literal meaning is "text transcoding". Knowledge representation frameworks used in the process include RDF and Frame Semantics.

1 Comment:

Fco. Javier said...

Great things can be achieved by combining natural language processing and the Semantic Web. For example, this can be a great way of enhancing information retrieval systems.

My page: SeRQL and SPARQL - Information Retrieval and Organization

5:50 AM  
