CS 1MD3 - Winter 2006 - Assignment #5

Generated for Student ID: 0443166

Due Date: April 5th, 2006

(You are responsible for verifying that your student number is correct!)
NOTE: All submissions must be plain text format in ".py" files.

Instructions

Using the libraries sys, re, and urllib, write a Python program that looks up the definitions of a word on dictionary.com. Here are some samples of the output your program should produce:
$ python dictionary.py albatross

albatross:
1. Any of several large web-footed birds constituting the family Diomedeidae, chiefly of the oceans of the Southern
Hemisphere, and having a hooked beak and long narrow wings.
2. A constant, worrisome burden.
3. An obstacle to success.

$ python dictionary.py apoplexy

apoplexy:
1. Sudden impairment of neurological function, especially that resulting from a cerebral hemorrhage; a stroke.
2. A sudden effusion of blood into an organ or tissue.
3. A fit of extreme anger; rage: The proud... members suffered collective apoplexy, and this year they are out for blood (David Finch).

$ python dictionary.py invective

invective:
1. Denunciatory or abusive language; vituperation.
2. Denunciatory or abusive expression or discourse.

The URL: http://dictionary.reference.com/search
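
If you're unsure how to get from the URL to a page of HTML, here is a minimal sketch. It rests on assumptions: the query parameter name "q" is a guess (check the site's search form for the real one), and the function names build_search_url and fetch_page are made up for illustration. It is written for Python 3, where the pieces live in urllib.parse and urllib.request; in Python 2 the corresponding calls are urllib.urlencode and urllib.urlopen.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://dictionary.reference.com/search"


def build_search_url(word):
    # "q" is an ASSUMED parameter name, not something this handout specifies.
    return BASE_URL + "?" + urlencode({"q": word})


def fetch_page(word):
    # Download the raw bytes and decode them leniently, since the page's
    # encoding is not known in advance.
    response = urllib.request.urlopen(build_search_url(word))
    try:
        return response.read().decode("latin-1", "replace")
    finally:
        response.close()
```

Only build_search_url can be checked without a network connection; fetch_page depends on the site being reachable and still answering at that address.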

What Not To Worry About

Unfortunately, this site provides fairly inconsistent HTML. Most of the definitions I examined are formatted within an <OL> block, using <LI> tags for each alternative definition. Some, however, are formatted with <p> tags, and some are in tables (to see some of these weird cases, look up "platypus" and "filibuster," for example). Some of them use multiple <OL> blocks to encapsulate radically different groups of definitions.

Don't worry about all these ridiculous forms. Just go for the first <OL> block and deal with everything inside that. Some of these will include sub-<OL> blocks too. In particular, you're likely to see tags like <OL TYPE='a'>. Such a block will also contain <LI> tags, and you should handle these like the other <LI> tags.
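
One catch with nested blocks: a non-greedy pattern such as <OL.*?</OL> stops at the inner </OL>, not the one that closes the first block. The sketch below is one possible way around that, not the required approach. It walks the <OL> and </OL> tags while counting nesting depth, then splits the result on <LI> tags. The function names first_ol_block and list_items are hypothetical, and the HTML in the usage example is invented to mimic the shape described above.

```python
import re


def first_ol_block(html):
    """Return the contents of the first <OL> block, nested blocks included."""
    depth = 0
    start = None
    for tag in re.finditer(r"<(/?)ol\b[^>]*>", html, re.IGNORECASE):
        if tag.group(1) == "":       # an opening <OL ...> tag
            if depth == 0:
                start = tag.end()
            depth += 1
        elif depth > 0:              # a closing </OL> tag
            depth -= 1
            if depth == 0:
                return html[start:tag.start()]
    return ""                        # no complete <OL> block found


def list_items(block):
    """Return the text after each <LI> tag, with all other tags removed."""
    items = []
    for piece in re.split(r"<li\b[^>]*>", block, flags=re.IGNORECASE)[1:]:
        text = re.sub(r"<[^>]+>", "", piece)   # drop remaining tags
        text = " ".join(text.split())          # tidy up whitespace
        if text:                               # skip an <LI> that only opens a sub-list
            items.append(text)
    return items


# Invented sample HTML, shaped like the nested case described above.
sample = ("<p>intro</p><OL><LI>Any bird.</LI><LI><OL TYPE='a'>"
          "<LI>A burden.</LI><LI>An obstacle.</LI></OL></LI></OL>"
          "<OL><LI>Some other group.</LI></OL>")
print(list_items(first_ol_block(sample)))
```

Note that the second <OL> block in the sample is ignored, and the <LI> that merely wraps the sub-list contributes no item of its own.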

Don't worry about words that aren't found in the dictionary (unless you want to). In other words, there's no need to provide a "no matches" message if the user asks for the definition of, say, "iblibibibftthp" or "nucular."

Don't worry about making sure the user typed in the correct command-line arguments. You can just go ahead and assume the user typed "python dictionary.py <word-to-look-up>."
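
In practice, that means the word is simply the second entry of sys.argv (the first is the script name). A minimal sketch, with a hypothetical helper name and a simulated argument list standing in for a real command line:

```python
import sys


def word_from_args(argv=None):
    # argv[0] is the script name; argv[1] is assumed to be the word.
    if argv is None:
        argv = sys.argv
    return argv[1]


# Simulated arguments for "python dictionary.py albatross".
example_argv = ["dictionary.py", "albatross"]
print("%s:" % word_from_args(example_argv))
```

Called with no argument, word_from_args() reads the real sys.argv, and it will raise an IndexError if no word was given, which the assignment allows.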

What to Worry About

Note that in the sample output, the definitions are enumerated. You have to do this yourself; a browser does it automatically in response to the <LI> tags. In your case, though, you don't need to worry about multiple levels of enumeration. For example, looking up the word "albatross" on dictionary.com using a web browser gives you the following list of definitions:
   1. Any of several large web-footed birds constituting the family Diomedeidae, chiefly of the oceans of the Southern Hemisphere, and having a hooked beak and long narrow wings.
   2.    a. A constant, worrisome burden.
         b. An obstacle to success.
You may simply number these "1.", "2.", and "3."

There's a slight catch, though. If you're not careful, you might end up with an extra space after "2." and "3." You should be able to avoid this with a regular expression.
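
Here is one possible way to do the numbering and avoid the stray space, offered as a sketch and not as the expected solution. It assumes you already have the definitions as a list of strings; the name number_definitions and the sample list are made up for illustration.

```python
import re


def number_definitions(definitions):
    lines = []
    for number, text in enumerate(definitions, 1):
        # Collapse every run of whitespace (including newlines) to one space,
        # then trim the ends so nothing extra follows the "2." or "3.".
        text = re.sub(r"\s+", " ", text).strip()
        lines.append("%d. %s" % (number, text))
    return lines


# Invented input: note the leading whitespace on the second and third items.
for line in number_definitions(["Any bird.", "  A burden.", "\n An obstacle. "]):
    print(line)
```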

Although you don't need to translate HTML character entities (such as &#147;) into the characters they stand for, you should remove them. Otherwise, your third definition for apoplexy might look like this:

3. A fit of extreme anger; rage: &#147;The proud... members suffered collective apoplexy, and this year they are out for blood&#148; (David Finch).
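
A single substitution is enough to strip these. The sketch below assumes an entity always has the form of an ampersand, an optional "#", some letters or digits, and a semicolon; the function name strip_entities is hypothetical.

```python
import re


def strip_entities(text):
    # Matches numeric entities such as &#147; and named ones such as &amp;
    return re.sub(r"&#?\w+;", "", text)


print(strip_entities("rage: &#147;The proud... out for blood&#148; (David Finch)."))
```

Because the entity is deleted and not replaced, removing one that sits between two spaces leaves a double space behind, so you may want to tidy whitespace afterwards.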