Monrai Blog

News about Cypher, Semantic Web, Natural Language Processing, and Computational Linguistics

Thursday, August 10, 2006

Release 0.7.0

A new release of Cypher is available. This is a feature enhancement release. Cypher can now generate the integer representation of an arbitrary natural language number:

Version 0.7.0

Enhancements from 0.6.9:

-- added new NumberTranscoder_LITERAL; it lets natural language numbers generate their integer representation. The integer is wrapped in an RDF literal of type xsd:nonNegativeInteger or xsd:negativeInteger, making it consumable by semantic web applications.

-- added a new number pattern grammar example that exercises the number transcoder
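To give a rough idea of what "wrapped in an RDF literal" means, here is a small Python sketch. It is not Cypher's code; the function name to_typed_literal is made up for illustration, and the output uses the N-Triples style of writing a typed literal. The only assumption is the one stated in the release notes: zero and positive values get xsd:nonNegativeInteger, negative values get xsd:negativeInteger.

```python
# Namespace of the XML Schema datatypes used for typed RDF literals.
XSD = "http://www.w3.org/2001/XMLSchema#"


def to_typed_literal(value: int) -> str:
    """Render an integer as an N-Triples style typed literal (hypothetical helper)."""
    # Zero and positive values fit xsd:nonNegativeInteger;
    # anything below zero is an xsd:negativeInteger.
    datatype = "nonNegativeInteger" if value >= 0 else "negativeInteger"
    return f'"{value}"^^<{XSD}{datatype}>'


print(to_typed_literal(22049))
print(to_typed_literal(-5))
```

A consumer that understands XSD datatypes can then treat the value as a number rather than as a plain string.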

There are also a couple of new grammar definition files which cover natural language numbers in English, e.g. Five hundred twenty eight million five. Extending them to cover numbers in other languages shouldn't be a problem. The extended example dataset covers numbers up to trestrigintillion (10^102, I think, but correct me if I'm wrong). Since so many people have been waiting for an online demo, I plan to set up the number transcoder as an interim online demo, especially since the input set in this case is finite.

I will post a more detailed explanation of the new dataset, most likely in an article on the main Monrai website. In the meantime, try starting Cypher and entering "Your Name is some long number", for example: Chris is twenty two thousand forty nine. Then look at the output file. There should be an owl:sameAs triple near the top, and one object should be the number you entered. The BE verb is set to output an owl:sameAs triple, but you can easily change it to set the subject's age instead (e.g. myonto:age). Also, conjunctions are not covered by the number patterns I wrote, so nine hundred and two won't match, but nine hundred two will. I leave extending the example number pattern grammar to cover conjunctions as an exercise for the user.
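For readers who want to see the arithmetic behind a phrase like the ones above, here is a minimal Python sketch. It is not how Cypher works (Cypher uses pattern grammars); it is just a plain word-by-word accumulator. The names words_to_int and allow_and are made up for this example, the scale table stops at trillion for brevity, and the function does not validate word order. By default it rejects "and", mirroring the example grammar, and allow_and=True shows one way the conjunction could be tolerated.

```python
# Values of the basic number words.
SMALL = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}

# Scale words (short scale); truncated here, the real dataset goes much higher.
SCALES = {
    "thousand": 10**3, "million": 10**6,
    "billion": 10**9, "trillion": 10**12,
}


def words_to_int(text, allow_and=False):
    """Convert an English number phrase to an int (illustrative sketch only)."""
    tokens = text.lower().replace("-", " ").split()
    if not tokens:
        raise ValueError("empty input")
    total = 0  # sum of the groups already closed by a scale word
    group = 0  # value of the group being read (always below 1000 for valid input)
    for token in tokens:
        if token == "and":
            if allow_and:
                continue  # tolerate "nine hundred and two"
            raise ValueError("conjunction 'and' is not covered")
        if token in SMALL:
            group += SMALL[token]
        elif token == "hundred":
            if group == 0:
                raise ValueError("'hundred' needs a preceding number")
            group *= 100
        elif token in SCALES:
            if group == 0:
                raise ValueError(f"{token!r} needs a preceding number")
            total += group * SCALES[token]
            group = 0
        else:
            raise ValueError(f"unknown number word: {token!r}")
    return total + group


print(words_to_int("twenty two thousand forty nine"))             # 22049
print(words_to_int("nine hundred two"))                           # 902
print(words_to_int("nine hundred and two", allow_and=True))       # 902
```

Calling words_to_int("nine hundred and two") without allow_and raises a ValueError, which is the same behaviour you get from the example grammar when a conjunction shows up.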

Natural language numbers are normally spoken rather than written or typed, so speech recognition systems are probably a more appropriate use case for this dataset.

Have fun!


Blogger Sherman Monroe said...

The output below was generated by Cypher for the input: Tom is seventeen billion two hundred thirty eight million five hundred twenty nine thousand four hundred five

<?xml version="1.0" encoding="UTF-8"?>
<!-- This scheme was generated by FrameFactory.transferSyntaxOutputResults()
Date: Thu Aug 10 17:36:04 CDT 2006
User: smonroe
Base URL: -->
<rdf:Description rdf:about="">
<rdf:type rdf:resource=""/>
<rdfs:label>Be (owl:sameAs Triples)</rdfs:label>
<rdf:subject rdf:resource=""/>
<rdf:predicate rdf:resource=""/>

<owl:sameAs rdf:resource=""/>
<rdfs:label xml:lang="en">Tom</rdfs:label>
<owl:sameAs rdf:resource=""/>
<rdfs:label xml:lang="en">Tom is seventeen billion two hundred thirty eight million five hundred twenty nine thousand four hundred five</rdfs:label>

Try testing it for yourself.

5:52 PM  
Blogger Sherman Monroe said...

Here is an application at Stanford which converts integers to English numbers. The default package only goes as high as 99. However, I have been hard pressed to find software which does it in reverse, natural language number to integer.

6:05 AM  
