Systems and users are generating increasing amounts of heterogeneous data from diverse sources (e.g., system logs, sensor networks, databases, medical records, and tweets). Extracting value from these datasets (via querying and analysis) first requires data preparation to clean, transform, and enrich the raw data into a structure suitable for analysis. The challenges of managing large-scale, heterogeneous, poor-quality, and fast data go beyond the capabilities of traditional database systems.
In this course, we will explore data-intensive applications and the algorithms needed for different data management tasks. The course will begin with a few lectures, followed by discussions of recent publications. Each student will be responsible for presenting one or more papers, participating in class discussions, writing paper reviews, and completing a course project.
Prerequisites: an undergraduate database course (equivalent to CS 3DB3/SE 3DB3) and programming proficiency.
Course Time: Wednesdays 1:00-4:00pm, ITB 222
Professor: Fei Chiang
Email: fchiang [at] mcmaster [dot] ca
Office Hours: Wednesdays 4:00-5:00pm, ITB 122