Thursday, August 6, 2009

"Best In Class" Parsing for Conversational Interfaces

Several parsing techniques are available for Conversational Interfaces, and different tools take different approaches. For example, AIML (Artificial Intelligence Markup Language) first substitutes/reduces common terms and phrases, then works left-to-right, interpreting each word and branching down a "trie" structure.

NLTK (Natural Language Toolkit) helps you tag the parts of speech in user input, a first step toward deriving its semantics. Once the parts of speech are known, you can parse out Verb Phrases and Noun Phrases to determine the meaning of the sentence or question.

Amy Iris can use any available parsing routine. Our objective with Amy Iris is to build a pluggable interface, so that as new parsers are invented, they can be dropped in to improve the Amy Iris tool.

Case Study: Municipal Conversational Interface
Using your knowledge base for parse scoring

My municipal Conversational Interface has a limited knowledge base for this specific application. It needs to interpret interactions from a limited domain, and provide a canned answer based on its interpretation of the input.

Say I start with 100 common questions and write answers for them. Once I have my collection of questions, I can analyze them to determine their significant words.

Let's work through a few examples:

Say I have a collection of potential questions that include:

A. "How do I report a pothole?"
B. "What day is garbage day?"

Note that these two questions have no words in common. My goal is to analyze user input and determine which question in my knowledge base is closest to it. So it's important to know which words in my knowledge base are significant. Further, I need an automated way to determine this.

Ideally, I could take user input and "score" it against each knowledge-base question, to guess which knowledge-base answer to provide. An exact match should produce a perfect score, but synonyms of "significant" words should also generate high scores.

My intuition tells me that "report", "pothole", and "garbage" are more significant than "what" and "how". Further, words not listed, such as "trash", "collection", "pickup", and "pick up", should score well against question B. How do we write a parsing algorithm to score each user input against the knowledge base?

Answer: To do basic parsing, analyze your question set. Collect its entire vocabulary, plus synonyms (which can be pulled out of NLTK), and determine which words in that vocabulary are most significant. Nouns and verbs tend to be more significant than connector words, adverbs, and other filler. Then it's not difficult to write a parsing program that knows "garbage" and "trash" questions are more closely related than "pothole" and "trash" questions.
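The "significance" analysis can be sketched with plain document-frequency counting. The question list and the 1/frequency weighting below are my own toy illustration, not the actual Amy Iris code:

```python
from collections import Counter

# Toy stand-in for the municipal question set.
questions = [
    "how do i report a pothole",
    "what day is garbage day",
    "how do i report a broken streetlight",
    "what day is recycling day",
]

# Count how many questions each word appears in.
doc_freq = Counter()
for q in questions:
    doc_freq.update(set(q.split()))

def significance(word):
    """The fewer questions a word appears in, the more significant it is."""
    return 1.0 / doc_freq.get(word, 1)

print(significance("pothole"))  # 1.0 -- unique to one question
print(significance("how"))      # 0.5 -- shared by two questions
```

With all 100 questions you'd probably want a smoother weighting (log-scaled inverse frequency, say), but the principle is the same: rarity within the question set is a cheap proxy for significance.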

Once the basic parsing has been done, question words like "what", "how", and "when" become more important. Let's say we determine that the user input is a question related to garbage. Look at these variations, and imagine how we might parse them:

B. What day is garbage day?
C. When is trash pick up?

D. What can I throw in the garbage?

All three are "garbage"-type questions, but the two of the format "What ... garbage ..." (B & D) are asking different things, while the first two (B & C) are asking almost the same thing (even though they have only one word in common: "is").

Ideally, a parse-scoring system would rank all "garbage"-related questions as similar (whether the word "garbage" or "trash" is used), and rank the two time-related questions ("What day", "When") as more similar to each other than a time-related question is to a non-time-related one ("What day" compared to "What [implied garbage items]").
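One cheap way to get that ranking is to canonicalize synonyms and time words before comparing token sets. The canonicalization table below is hand-made for this example (a real system would pull synonyms from NLTK), and Jaccard overlap is just one possible similarity measure:

```python
# Hand-made canonicalization table -- a toy stand-in for NLTK synonyms.
CANON = {"trash": "garbage", "when": "<time>", "day": "<time>"}

def canonical_tokens(sentence):
    return {CANON.get(w, w) for w in sentence.lower().split()}

def similarity(a, b):
    """Jaccard overlap of canonicalized token sets."""
    ta, tb = canonical_tokens(a), canonical_tokens(b)
    return len(ta & tb) / len(ta | tb)

b = "What day is garbage day"
c = "When is trash pick up"
d = "What can I throw in the garbage"

# B and C (both time-related) come out more similar than B and D.
print(similarity(b, c) > similarity(b, d))  # True
```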

That doesn't seem too tough, does it? Here's how I did it with my simple municipal bot.


My Conversational Interface should accept a user's input and parse it by running through the knowledge base, scoring each answer. The highest-scoring answer gets displayed to the user.

Further, certain thresholds should be met. If no answer's score exceeds the "did not understand" threshold, a generic "I don't understand" message is displayed. If the best score clears that threshold but not a higher "pretty good answer" threshold, then maybe we respond with "I'm not sure what you are asking, but if you are asking this question, Q, then the answer is A." Otherwise, confidently give the answer with the highest score.
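The two-threshold response logic can be sketched as below. The threshold values and the (score, question, answer) tuple layout are my assumptions; in practice you'd tune the thresholds against real traffic:

```python
def respond(scored_answers, low=0.2, high=0.6):
    """scored_answers: list of (score, question, answer) tuples."""
    score, question, answer = max(scored_answers)
    if score < low:                      # nothing even close
        return "I don't understand."
    if score < high:                     # plausible, but not confident
        return ("I'm not sure what you are asking, but if you are asking "
                "'%s', then the answer is: %s" % (question, answer))
    return answer                        # confident match

print(respond([(0.9, "What day is garbage day?", "Tuesday.")]))  # Tuesday.
```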

  1. Create a list of questions. I started with 55 questions.
  2. Create a list of answers to those questions. Word the answers so that they are mildly generic (so if the question is about city hall, the answer might provide the address and the hours). This is the beginning of your expandable knowledge base. Ultimately you'll have many more answers, and maybe even multiple questions that trigger each answer.
  3. Analyze the question list automatically, looking for "significant" words. My analysis parses each question, building a vocabulary. With 55 questions, my conversational interface had a vocabulary of about 200 words. The rarer words and word combinations are what distinguish one question from another, so we want to identify those.
  4. Expand our vocabulary to define alternate ways to trigger the same answer. NLTK provides access to synonyms and other word data. Synonyms took my bot's vocabulary up to about 2000 words.
  5. Define a scoring algorithm. Generally speaking, the more unique words that you hit on, the higher the score. Connector words and common words should count for very little. Direct matches of rare words should count a lot. Synonym matches should be somewhere in between.
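Steps 3-5 boil down to a scoring function like the sketch below. The weights (1.0 for a direct rare-word match, 0.5 for a synonym match, 0.1 for a common word) and the tiny synonym table are illustrative assumptions, not the actual Amy Iris values:

```python
# Toy common-word list and synonym table (a real bot would derive the
# common words from the vocabulary analysis, and the synonyms from NLTK).
COMMON = {"what", "how", "when", "is", "a", "i", "do", "the", "can", "in"}
SYNONYMS = {"trash": "garbage", "pickup": "pick"}

def score(user_input, kb_question):
    kb_words = set(kb_question.lower().split())
    total = 0.0
    for word in user_input.lower().split():
        if word in kb_words:
            total += 0.1 if word in COMMON else 1.0  # direct match
        elif SYNONYMS.get(word) in kb_words:
            total += 0.5                             # synonym match
    return total

# "trash" matches "garbage" via the synonym table; "day" is a direct hit.
print(score("when is trash day", "what day is garbage day"))
```

Run against every question in the knowledge base, the highest total picks the answer; an exact restatement of a knowledge-base question naturally scores highest.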

After all is said and done, I ended up with an analysis program and a stand-alone bot program.

The analysis program reads in a CSV file of my questions and answers (the knowledge base), analyzes the questions for uniqueness, queries NLTK (Natural Language Toolkit) for synonyms, and prepares a scoring algorithm. Then it loops: prompting the user for input, scoring each answer in the knowledge base, and printing the best one.

Once my analysis program looked halfway decent, I dumped out the contents of several fields (vocabulary, synonyms, and knowledge base), and stripped the program down to just the stand-alone bot for you to play with.

Results? Not too bad. Definitely something for me to be able to work with. And something that you can play with, too.

Issues: Certain words just have too many meanings, so I'll be experimenting with reducing their weight. For example, in "Can I burn leaves?", the word "leaves" scores well against yard items, but also against "departs", "exits", and other meanings.
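One way to reduce the weight of many-meaninged words is to divide by a sense count. The counts below are hypothetical placeholders; in practice they could come from len(wordnet.synsets(word)) via NLTK:

```python
# Hypothetical sense counts -- in practice, query NLTK's WordNet:
#   from nltk.corpus import wordnet
#   SENSE_COUNT[w] = len(wordnet.synsets(w))
SENSE_COUNT = {"leaves": 12, "pothole": 1, "garbage": 4}

def ambiguity_weight(word):
    """Words with many meanings contribute less to a match score."""
    return 1.0 / SENSE_COUNT.get(word, 1)

# "leaves" is heavily discounted; "pothole" keeps full weight.
print(ambiguity_weight("leaves") < ambiguity_weight("pothole"))  # True
```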

Let me know if you have any brilliant ideas as to how to improve the parsing algorithm.


Municipal Amy Iris Web Interface - Try her out here. (remember, there are only 55 answers, so be kind!)

Pastie of the Analysis program - You need NLTK and a Spreadsheet of questions and answers, but you can create your own Conversational Interface with this. It allows you to test your conversational interface.

Pastie of a Stand Alone Municipal Bot - This is way cool. You can experiment with the data that I created by running the Analysis program, above. It's JUST the conversational interface, with the knowledge base hard-coded in the program (to minimize your set-up requirements).

Let me know what you think!

