Thursday, 28 October 2010

Eliza Lab

Today you'll experiment with using regular expressions to mimic Eliza's responses. Using this tool, type in search patterns to match the kinds of sentences that people often type as input to Eliza. Then in the replacement pattern box, write a pattern to produce an Eliza-like response. For example, the search pattern
  • I am ([a-z]+)
with the replacement pattern
  • How long have you been $1?
will match the sentence
  • I am bored
typed in the input text box and produce
  • How long have you been bored?
in the output text box.
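The same substitution can be tried outside the tool. The following is a minimal Python sketch, assuming a search pattern with a single capture group. Note that Python's re module writes the reference to the captured text as \1, whereas the tool's replacement box uses the $ form. The function name respond is made up for illustration.

```python
import re

# Search pattern: "I am " followed by one lowercase word, captured as group 1.
SEARCH = r"I am ([a-z]+)"
# Replacement pattern: \1 stands for whatever group 1 captured.
REPLACEMENT = r"How long have you been \1?"


def respond(sentence):
    """Apply the search-and-replace to one input sentence."""
    return re.sub(SEARCH, REPLACEMENT, sentence)


print(respond("I am bored"))  # How long have you been bored?
```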
So, now you try it. Write a regular expression search pattern to match sentences of the form:
  • I am sick
  • I am tired
  • I am hungry
Does your search pattern match the following strings and produce the kind of response you would expect?
  • Today I am bored
  • I feel very happy
  • I feel awfully tired
If not, fix your search pattern so that the system generates sensible outputs for each of these. Now try these:
  • I feel cold and I want a nap
  • I feel cold and I want a hot drink
  • I like food and drink
  • I like to sing and dance
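To check one candidate pattern against several inputs at once, a small test loop can help. This is only a sketch in Python (which writes the captured group as \1 rather than with $), using the example pattern from the start of this lab. The helper name try_pattern is hypothetical, and a response of None means the pattern did not match at all.

```python
import re

SEARCH = r"I am ([a-z]+)"
REPLACEMENT = r"How long have you been \1?"

INPUTS = [
    "I am sick",
    "Today I am bored",
    "I feel very happy",
    "I feel cold and I want a nap",
]


def try_pattern(search, replacement, sentences):
    """Return (sentence, response) pairs; response is None when nothing matched."""
    results = []
    for sentence in sentences:
        # subn also reports how many substitutions were made.
        response, count = re.subn(search, replacement, sentence)
        results.append((sentence, response if count else None))
    return results


for sentence, response in try_pattern(SEARCH, REPLACEMENT, INPUTS):
    print(f"{sentence!r} -> {response!r}")
```

With the example pattern, "Today I am bored" comes back as "Today How long have you been bored?" because the text before the match is left in place, and the "I feel" sentences are not matched at all.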
What if you wanted to generate the response:
  • Have you always enjoyed eating and drinking?
in response to the input:
  • I like to eat and drink
Would your search pattern also work for this input?
  • I like to sing and dance
Now write a search pattern that will match sentences like:
  • I am tired and hungry
and produce the response:
  • What do you think makes you hungry and tired?
Now play with this version of Eliza. Find cases where it doesn't respond very naturally and try to write search and replacement patterns that solve the problems you encounter. (Note: they may not all be solvable just by this simple search and replace method.)

Thursday, 14 October 2010

Tagging with regular expressions

For this exercise you will find a recipe online, scrape the text from the web page, and then tag it with HTML markup. The text below, scraped from http://uktv.co.uk/food/recipe/aid/513585, is an example.
You should work through the exercise with this text first.

Ingredients
8 tbsp olive oil
450 g cooked peeled prawns, (ideally freshly shelled), drained and patted dry
1 medium onion, finely chopped
4 stick celery
2 cloves garlic, finely chopped
1 red chilli, de-seeded and finely chopped
1 bunch of fresh mint, leaves chopped
100 g cherry tomatoes, halved
juice of 1 lemons
4 courgettes, grated
150 ml white wine
black pepper
500 g fresh tagliarini
200 ml fresh vegetable stock
25 g unsalted butter
40 g parmesan, freshly grated
Method
Heat the olive oil in a large, heavy-based frying pan. 
Add the prawns and fry until lightly browned. 
Add the onion, celery, garlic, chilli and mint. Fry until the prawns turn nutty brown. 
Add the cherry tomatoes and cook for 2 minutes. 
Stir in the lemon juice, then the courgettes. Cook for 5 minutes. 
Pour in the white wine, bring to the boil and cook for 2 minutes. Remove from direct heat. 
Bring a large pan of salted water to the boil. Add the fresh tagliarini and cook until al dente, around 2 minutes. Strain. 
Add the strained tagliarini to the prawn mixture, followed by the vegetable stock, butter and Parmesan cheese. 
Mix well together over a medium heat with a wooden spoon until glossy. Season with salt and freshly ground pepper and serve at once.

Most recipes on this site have a similar format.

Paste the text into the regular expression sandpit.
You will use this tool to mark up the text in HTML. Read the instructions on the sandpit, and experiment to make sure you understand how it works.

First, put the regular expression [0-9].* into the regexp box. Then put li into the tag box. This shows you the result of enclosing each match for the regular expression between tags <li> and </li>.
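
For readers who want to reproduce this step away from the sandpit, here is a rough Python sketch. It assumes the sandpit simply wraps every match in an opening and closing tag and leaves the rest of the text alone; the function name tag_matches and the short sample text are made up for illustration.

```python
import re


def tag_matches(regexp, tag, text):
    """Wrap every match of regexp in <tag>...</tag>, leaving other text alone."""
    return re.sub(regexp, lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)


# A few lines taken from the recipe above.
sample = "Ingredients\n8 tbsp olive oil\n1 medium onion, finely chopped\nblack pepper"

print(tag_matches(r"[0-9].*", "li", sample))
```

In this sample the two lines that contain a digit are wrapped in li tags, while "Ingredients" and "black pepper" are left untouched.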

This almost lists all the ingredients, but not quite. Try using the following regular expressions, one by one. Try to understand what is happening.

  • [0-9].*
  • \n[0-9].*
  • \n([0-9]|juice).*
  • \n([0-9]|juice|black).*
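
The four expressions above can also be compared side by side in Python. This is a sketch on a shortened excerpt of the recipe, not the sandpit's own code; the helper name whole_matches is hypothetical.

```python
import re

# A shortened excerpt of the recipe text, one item per line.
sample = (
    "Ingredients\n"
    "8 tbsp olive oil\n"
    "juice of 1 lemons\n"
    "black pepper\n"
    "Method\n"
    "Add the cherry tomatoes and cook for 2 minutes."
)

PATTERNS = [
    r"[0-9].*",
    r"\n[0-9].*",
    r"\n([0-9]|juice).*",
    r"\n([0-9]|juice|black).*",
]


def whole_matches(regexp, text):
    """List the full text of each match (group 0), ignoring inner groups."""
    return [m.group(0) for m in re.finditer(regexp, text)]


for regexp in PATTERNS:
    print(regexp, whole_matches(regexp, sample))
```

On this excerpt the first expression starts matching at any digit, so it picks up "1 lemons" and "2 minutes." from the middle of lines. Adding \n restricts matches to the start of a line, and each further alternative brings in one more ingredient line.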

Once you are satisfied that you have successfully picked out all the ingredients, click tag text.

To finish the markup of the ingredients, try the regular expression (<li>\n.*)*</li> first with b as the tag, and then ul. Again click tag text.
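
To see what this expression picks out, here is a small Python sketch. It assumes the text after the previous step looks like the string below, with each li element starting with the newline that was part of the match; the variable names are made up for illustration.

```python
import re

# Assumed state of the text after the <li> step, when each match began with "\n".
tagged = "Ingredients<li>\n8 tbsp olive oil</li><li>\n4 stick celery</li>\nMethod"

# The group repeats once per list item; the final </li> closes the last item.
wrapped = re.sub(
    r"(<li>\n.*)*</li>",
    lambda m: f"<ul>{m.group(0)}</ul>",
    tagged,
)

print(wrapped)
```

Here the whole run of adjacent li elements is matched as a single span, so one ul element ends up enclosing the complete list.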

Now use similar methods to tag each step in the method with li and the whole sequence of steps with ol.

Complete the markup by tagging the words "Ingredients" and "Method" with the tag h2.

Now find a recipe online (choose one that is well-structured). Scrape the text using copy and paste and use appropriate regular expressions and tags to mark it up in HTML.

For extra fun, scrape the text from this page at www.parliament.uk and tag each address in bold.