CS 1MD3 - Winter 2006 - Assignment #5

Generated for Student ID: 0042470

Due Date: April 5th, 2006

(You are responsible for verifying that your student number is correct!)
NOTE: All submissions must be in plain-text format, saved as ".py" files.

Instructions

Using the libraries re and urllib, write a Python program to look up the show times for a movie at the theatres in a given city. It will prompt the user for a city along with part of a movie title, get the listings for that city from www.tribute.ca/jam_search/search.asp, and extract the relevant information from the web page received. Here are some samples of the output your program should produce:
$ python movie.py
Movie Title: vendetta
City: hamilton
EMPIRE JACKSON SQUARE 6 CINEMAS, HAMILTON
V FOR VENDETTA (14A)12:45 3:40 6:30 9:25Stereo; Coarse Language; Coarse Language; Coarse Language; G

SILVERCITY ANCASTER
V FOR VENDETTA (14A)12:40 3:55 7:15 10:30

SILVERCITY ANCASTER
V FOR VENDETTA (14A)12:10 3:25 6:45 10:00

UPPER JAMES CINEMAS, HAMILTON
V FOR VENDETTA (14A)1:00 4:00 7:00 10:00

$ python movie.py 
Movie Title: george
City: guelph
GALAXY CINEMAS @ GUELPH
CURIOUS GEORGE (G)12:00 2:20 4:35 6:55

$ python movie.py 
Movie Title: ultraviolet
City: london
HURON MARKET PLACE, LONDON
ULTRAVIOLET (PG)7:20 9:50

MUSTANG DRIVE-IN THEATRE, LONDON
ULTRAVIOLET (PG) 9:10SCREEN 2

SILVERCITY LONDON
ULTRAVIOLET (PG)6:35 9:20

The URL: http://www.tribute.ca/jam_search/search.asp
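
To get you started, here is a minimal sketch of how the page might be fetched. It is written for Python 3, where the download functions live in urllib.request and urllib.parse. The form-field name "city" and the latin-1 decoding are assumptions made purely for illustration, and the form may expect a POST rather than a query string. Look at the search page's own source to find out what it really expects.

```python
import urllib.parse
import urllib.request

SEARCH_URL = "http://www.tribute.ca/jam_search/search.asp"


def build_url(city, field="city"):
    # "field" is a HYPOTHETICAL form-field name, not taken from the site.
    # urlencode escapes spaces and punctuation in the city name.
    query = urllib.parse.urlencode({field: city})
    return SEARCH_URL + "?" + query


def fetch_listings(city):
    # Download the listings page and return it as one big string.
    # The latin-1 encoding is an assumption about the page.
    with urllib.request.urlopen(build_url(city)) as response:
        return response.read().decode("latin-1")
```

Keeping the URL construction in its own function lets you check it without touching the network.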

Notes:

I hate to say it, but this exercise can be quite tricky. To compensate for this, you are given a substantial number of hints (below)--far more than your fellow students.

You may assume that valid input has been provided from the user. No need to check the name of the city.

What you'll receive from www.tribute.ca is a big, ugly mess of HTML containing all the movies at all the cinemas in the given city. You'll need to break this up into logical pieces and discard all those HTML tags.

Suggested Approach:

There are many ways to attack this problem, and you are encouraged to do it in a way that makes sense to you. Below is an outline for a very straightforward approach. It is neither the most concise nor the most efficient way to do it, but it's reliable and less error-prone than some of the other myriad ways.

Leave the HTML tags alone for the moment because they'll make the page significantly easier to digest. The first thing we need to do is turn that big, awful string into a list of strings: one for each theatre.

If you examine the page source, you'll notice that each theatre name ends with a </FONT> tag and begins with the following string (which I call the "FONT FACE string"):

<FONT FACE="Arial,Helvetica,Verdana" COLOR="#000000" SIZE="-1"><br>
Put that string in a variable so you can refer to it easily within your code. The web page repeats the same pattern for every theatre: the FONT FACE string, the theatre name, a </FONT> tag, and then that theatre's movie listings.

The first step is to convert the page into a list of strings, one per theatre, each consisting of the theatre name, the </FONT> tag, and that theatre's listings.

Next we need to convert each of those list elements into a list itself, each element of which is a movie title followed by its schedule. If you examine the page source, you'll notice that each movie title is preceded by the following tag:
<A HREF="http://jam.canoe.ca/Movies/Reviews" target="_new">
This is useful to know when you need to break those schedules into a list. Now we have a list of "theatres," where each "theatre" is a list of movies.

To clean up the name of each theatre, use the fact that the name is followed by "</FONT>". To clean up the schedule for each movie, simply strip out the HTML tags using a regular expression.

Finally, we can search our list of lists to find a movie title that matches the requested search string.

It is strongly recommended that you make use of the split function. It can greatly simplify the otherwise arduous process of splitting a string into a list of strings.
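
The outline above could be sketched roughly as follows, in Python 3. This is one possible reading of the hints, not a model solution. The SAMPLE string is a made-up miniature page that assumes the theatre/movie layout described above, and the real page will be far messier. The helper names strip_tags, parse_listings and find_movie are invented for this illustration.

```python
import re

FONT_FACE = '<FONT FACE="Arial,Helvetica,Verdana" COLOR="#000000" SIZE="-1"><br>'
MOVIE_TAG = '<A HREF="http://jam.canoe.ca/Movies/Reviews" target="_new">'


def strip_tags(text):
    # Remove anything that looks like an HTML tag, then trim whitespace.
    return re.sub(r"<[^>]*>", "", text).strip()


def parse_listings(page):
    # Return a list of (theatre_name, [movie_line, ...]) pairs.
    theatres = []
    # Whatever precedes the first FONT FACE string is page header; skip it.
    for chunk in page.split(FONT_FACE)[1:]:
        # The theatre name is everything up to the first </FONT>.
        name, _, rest = chunk.partition("</FONT>")
        # The piece before the first movie tag holds no movie; skip it.
        movies = [strip_tags(m) for m in rest.split(MOVIE_TAG)[1:]]
        theatres.append((strip_tags(name), movies))
    return theatres


def find_movie(theatres, title):
    # Case-insensitive substring match against each cleaned movie line.
    # Note: this also matches text in the schedule, not only the title.
    wanted = title.lower()
    hits = []
    for name, movies in theatres:
        for movie in movies:
            if wanted in movie.lower():
                hits.append((name, movie))
    return hits


# HYPOTHETICAL miniature page, built only to exercise the functions.
SAMPLE = (
    "<html><body>header"
    + FONT_FACE + "SILVERCITY ANCASTER</FONT><br>"
    + MOVIE_TAG + "V FOR VENDETTA (14A)</A><br>12:40 3:55 7:15 10:30<br>"
    + MOVIE_TAG + "CURIOUS GEORGE (G)</A><br>12:00 2:20<br>"
    + FONT_FACE + "GALAXY CINEMAS @ GUELPH</FONT><br>"
    + MOVIE_TAG + "CURIOUS GEORGE (G)</A><br>12:00 2:20 4:35 6:55<br>"
    + "</body></html>"
)

for theatre, movie in find_movie(parse_listings(SAMPLE), "george"):
    print(theatre)
    print(movie)
    print()
```

Notice that split does nearly all the work: once on the FONT FACE string to separate the theatres, and once on the movie tag to separate the movies within each theatre.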