I’ve decided to investigate Google App Engine (GAE) in my spare time and I need a project to test it with.
So I’m going to try to produce an on-line version of my Arcanicity Index. It will be a very simple system, and because I don’t have a fag packet on which to sketch it out, I’ll list the specification below.
The user will be asked to enter some text and then click a button marked Process. Deep Thought will then sit and ponder for seven and a half million years and respond with “42”. Either that or it will provide the visitor with some text statistics and an Arcanicity Index estimate for the text they entered.
That should do it for a test system. I did produce a “hello world” app using GAE a long time ago, so I know the principles, but there are a few areas that I’m not sure about:
- Security? Does Google protect the web server, the app, the code, etc?
- Embedded? How can I link or embed the application in my own web page or will I have to send users to Google and hope they come back?
- Cost? How much power will Deep Thought, sorry, Google, provide me with free of charge? And what would it cost to host should it become moderately popular?
- NLTK? I’m using the Natural Language Toolkit’s Punkt Tokenizer to separate the text into sentences and words but I know GAE doesn’t support NLTK out of the box.
Of those issues it’s NLTK that gives me the most concern. NLTK provides a much better (although not perfect) mechanism for detecting the end of sentences. Most other methods I’ve seen treat…
Mr. T. Brown said come A.S.A.P.
…as either 3, 4 or 5 sentences. So it’s important that I can use the NLTK tokenizer. I have read of some tricks for installing NLTK manually, so I will probably start there. But I’m not really a proper programmer, so I might have to agree with the rest of the world that there are 5 sentences in the text above.
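To show what I mean, here is a rough sketch of the kind of naive splitter I keep running into. It is not NLTK and it is not what I plan to use; the function name `naive_sentences` and the rule it applies (break after a full stop, question mark or exclamation mark when whitespace follows) are just my own illustration of the problem.

```python
import re


def naive_sentences(text):
    """Split text wherever '.', '!' or '?' is followed by whitespace.

    This is a deliberately simple rule (an assumption for illustration),
    so it has no idea that "Mr." or "T." are abbreviations.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop any empty strings left over from the split
    return [part for part in parts if part]


sample = "Mr. T. Brown said come A.S.A.P."
sentences = naive_sentences(sample)

# Show each piece the naive rule thinks is a sentence
for number, sentence in enumerate(sentences, start=1):
    print(number, repr(sentence))
```

With this particular rule the sample comes out as three "sentences", because the splitter breaks after "Mr." and "T." but leaves "A.S.A.P." alone. A rule that splits on every full stop would do even worse.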
I’ll post about my exploits as I go.