COMPUTING PATTERNS IN STRINGS

[Winter Term (January-April) 2014-2015]

Bill Smyth

Please contact the instructor (see above).

We meet Mondays 12:00 - 13:30 ITB-222 Thursdays 12:00 - 13:30 ITB-222 The first class is on Thursday 08 January.

This course deals with algorithms for finding "patterns" in strings, patterns of three main kinds: specific, generic, and intrinsic. The importance of approximate patterns and algorithms which identify them is made clear. Applications to DNA sequence analysis and other scientific areas are emphasized.

The study of string algorithms is increasingly important in computer science, primarily due to its many applications, in computational biology, data compression, cryptology, and other areas. Over the last 40 years a variety of elegant algorithms have been discovered. The objective of this course is to provide students with some sense of this mathematical elegance while at the same time making clear the subject's relevance to applications. See Calendar Description.

The topics are selected from the course text,Computing Patterns in Strings, whose main components are given below: PART I -- STRINGS & ALGORITHMS Chapter 1 -- Properties of Strings Chapter 2 -- Patterns? What Patterns? Chapter 3 -- Strings Famous & Infamous Chapter 4 -- Good Algorithms & Good Test Data PART II -- COMPUTING INTRINSIC PATTERNS Chapter 5 -- Trees Derived from Strings Chapter 6 -- Decomposing a String PART III -- COMPUTING SPECIFIC PATTERNS Chapter 7 -- Basic Algorithms Chapter 8 -- Son of BM Rides Again! Chapter 9 -- String Distance Algorithms Chapter 10 -- Approximate Pattern-Matching Chapter 11 -- Regular Expressions & Multiple Patterns PART IV -- COMPUTING GENERIC PATTERNS Chapter 12 -- Repetitions (Periodicity) Chapter 13 -- Extensions of PeriodicityPlease note: This course is highly mathematical in nature; students taking it should both enjoy mathematics and have a good undergraduate foundation in discrete mathematics (the mathematics of computer science).

See Schedule.

The course will be run as a seminar course with heavy student participation throughout the term: lecturing, presentations, and problem-solving. Student assessment will be based primarily on a major written project that will also be presented to the class. The student's participation during the term may also contribute to the final mark.

The lecturer's monograph,Computing Patterns in Strings, will be the main course text. It is available both in the bookstore and in the library. Numerous other references will also be made available to students, both journal papers and books. Among the books are *Algorithms on Strings, Maxime Crochemore, Christophe Hancart & Thierry Lecroq, Cambridge University Press (2007) *Jewels of Stringology, Maxime Crochemore & Wojciech Rytter World Scientific Publishing (2003) *Algorithms on Strings, Trees & Sequences, Dan Gusfield Cambridge University Press (1997) *Introduction to Computational Molecular BiologyJoao Carlos Setubal & Joao Meidanis PWS Publishing (1997) *String Searching Algorithms, Graham A. Stephen World Scientific Publishing (1994) All of these books are available in the Thode Science Library. Some useful files for testing purposes can be picked up at the lecturer's website.

N/A

Academic dishonesty consists of misrepresentation by deception or by other fraudulent means and can result in serious consequences, e.g. the grade of zero on an assignment, loss of credit with a notation on the transcript (notation reads: "Grade of F assigned for academic dishonesty"), and/or suspension or expulsion from the university. It is your responsibility to understand what constitutes academic dishonesty. For information on the various kinds of academic dishonesty please refer to the Academic Integrity Policy, specifically Appendix 3, located at http://www.mcmaster.ca/policy/Students-AcademicStudies/index.html The following illustrates only three forms of academic dishonesty: 1. Plagiarism, e.g. the submission of work that is not one's own or for which other credit has been obtained. 2. Improper collaboration in group work. 3. Copying or using unauthorized aids in tests and examinations. The instructor in this course may be using a software package designed to reveal plagiarism. In such cases, students will be required to submit their work electronically and in hard copy so that it can be checked for academic dishonesty.

"The Faculty of Engineering is concerned with ensuring an environment that is free of all discrimination. If there is a problem, individuals are reminded that they should contact the Department Chair, the Sexual Harassment Officer or the Human Rights Consultant, as the problem occurs."

"The instructor and university reserve the right to modify elements of the course during the term. The university may change the dates and deadlines for any or all courses in extreme circumstances. If either type of modification becomes necessary, reasonable notice and communication with the students will be given with explanation and the opportunity to comment on changes. It is the responsibility of the student to check their McMaster email and course websites weekly during the term and to note any changes."

Outline revised: Jan 5, 2015

Script revised: Jan 26, 2010 J. Nakamura