Computer Science   722   Course Outline

COMPUTING PATTERNS IN STRINGS
[Winter Term (January-April) 2014-2015]

Instructor(s)

Bill Smyth

Course Assistance

Please contact the instructor (see above).

Schedule

We meet

	Mondays   	12:00 - 13:30   ITB-222
	Thursdays      	12:00 - 13:30   ITB-222 

The first class is on Thursday 08 January.

Calendar Description

This course deals with algorithms for finding "patterns" 
in strings, patterns of three main kinds: specific, generic, 
and intrinsic.  The importance of approximate patterns and
algorithms which identify them is made clear.  Applications
to DNA sequence analysis and other scientific areas are
emphasized.

Course Objectives

The study of string algorithms is increasingly important in
computer science, primarily due to its many applications,
in computational biology, data compression, cryptology,
and other areas.  Over the last 40 years a variety of
elegant algorithms have been discovered.  The objective of
this course is to provide students with some sense of this
mathematical elegance while at the same time making clear
the subject's relevance to applications.

See Calendar Description.

Outline of Topics

The topics are selected from the course text, Computing Patterns 
in Strings, whose main components are given below:

	PART I -- STRINGS & ALGORITHMS

	     Chapter  1 -- Properties of Strings
	     Chapter  2 -- Patterns?  What Patterns?
	     Chapter  3 -- Strings Famous & Infamous
	     Chapter  4 -- Good Algorithms & Good Test Data

	PART II -- COMPUTING INTRINSIC PATTERNS

	     Chapter  5 -- Trees Derived from Strings
	     Chapter  6 -- Decomposing a String

	PART III -- COMPUTING SPECIFIC PATTERNS

	     Chapter  7 -- Basic Algorithms
	     Chapter  8 -- Son of BM Rides Again!
	     Chapter  9 -- String Distance Algorithms
	     Chapter 10 -- Approximate Pattern-Matching
	     Chapter 11 -- Regular Expressions & Multiple Patterns

	PART IV -- COMPUTING GENERIC PATTERNS

	     Chapter 12 -- Repetitions (Periodicity)
	     Chapter 13 -- Extensions of Periodicity

Please note: This course is highly mathematical in nature;
students taking it should both enjoy mathematics and have a good
undergraduate foundation in discrete mathematics (the mathematics
of computer science).

Format

See Schedule.

Student Assessment (Grading)

The course will be run as a seminar course with heavy student 
participation throughout the term: lecturing, presentations,
and problem-solving.  Student assessment will be based primarily
on a major written project that will also be presented to the 
class.  The student's participation during the term may also 
contribute to the final mark.

Resources

The lecturer's monograph, Computing Patterns in Strings, will be
the main course text.  It is available both in the bookstore and in
the library.  Numerous other references will also be made available 
to students, both journal papers and books.  Among the books are

* Algorithms on Strings, Maxime Crochemore, Christophe Hancart
  & Thierry Lecroq, Cambridge University Press (2007)
* Jewels of Stringology, Maxime Crochemore & Wojciech Rytter
  World Scientific Publishing (2003)
* Algorithms on Strings, Trees & Sequences, Dan Gusfield
  Cambridge University Press (1997)
* Introduction to Computational Molecular Biology
  Joao Carlos Setubal & Joao Meidanis
  PWS Publishing (1997)
* String Searching Algorithms, Graham A. Stephen
  World Scientific Publishing (1994)

All of these books are available in the Thode Science Library.

Some useful files for testing purposes can be picked up at
the lecturer's website.

Instructor Specific Information

N/A

Academic Dishonesty

Academic dishonesty consists of misrepresentation by deception or by other 
fraudulent means and can result in serious consequences, e.g. the grade of 
zero on an assignment, loss of credit with a notation on the transcript 
(notation reads:  "Grade of F assigned for academic dishonesty"), and/or 
suspension or expulsion from the university. 

It is your responsibility to understand what constitutes academic dishonesty.  
For information on the various kinds of academic dishonesty please refer to 
the Academic Integrity Policy, specifically Appendix 3, located at 


http://www.mcmaster.ca/policy/Students-AcademicStudies/index.html


The following illustrates only three forms of academic dishonesty:
1. Plagiarism, e.g. the submission of work that is not one's own or for 
   which other credit has been obtained.  
2. Improper collaboration in group work. 
3. Copying or using unauthorized aids in tests and examinations.

The instructor in this course may be using a software package designed 
to reveal plagiarism.  In such cases, students will be required to submit 
their work electronically and in hard copy so that it can be checked 
for academic dishonesty.

Faculty Notices

"The Faculty of Engineering is concerned with ensuring an
environment that is free of all discrimination.  If there 
is a problem, individuals are reminded that they should
contact the Department Chair, the Sexual Harassment Officer 
or the Human Rights Consultant, as the problem occurs."
"The instructor and university reserve the right to modify elements of the
course during the term. The university may change the dates and deadlines
for any or all courses in extreme circumstances. If either type of
modification becomes necessary, reasonable notice and communication with the
students will be given with explanation and the opportunity to comment on
changes. It is the responsibility of the student to check their McMaster
email and course websites weekly during the term and to note any changes."


Outline revised: Jan 5, 2015
Script revised: Jan 26, 2010 J. Nakamura