McMaster University

For registration information
please contact:

Janet Delsey
905-525-9140, ext. 24910


Workshop on
Mathematical Programming in Data Mining
and Machine Learning

June 1-4, 2005

    McMaster University, Hamilton, ON Canada
Invited Talks

Kristin P. Bennett
Rensselaer Polytechnic Institute
New York, USA

Optimization Challenges in Capacity Control

In this talk, we examine different strategies for capacity control in learning and the optimization challenges that they present. Capacity control in learning is essential for good generalization. Support vector machines and related approaches use penalty terms to control capacity. But we must not forget the lessons learned from neural networks, namely that dimensionality reduction and early stopping can successfully control capacity as well. Beyond generalization, dimensionality reduction also provides low-dimensional representations of the data valuable for visualization and interpretation. Dimensionality reduction approaches to least-squares-loss based problems, such as PCA and CCA, frequently reduce to generalized eigenvalue problems that are readily solved. But dimensionality reduction approaches for learning tasks with other loss function remain a challenge. For example, how does one do dimensionality reduction for a ranking problem? The underlying optimization problems are typically nonconvex. Here we will explore recent representative research from the literature illustrating the modeling and optimization methods used to tackle these problems. We conclude with a novel approach for performing dimensionality reduction for a given convex loss function and show how it relates to early stopping and nonlinear conjugate gradient methods.


Kristin Bennett is an associate professor in the Mathematical Sciences Department at Rensselaer Polytechnic Institute. Starting with her Ph.D. from the University of Wisconsin-Madison in 1993, her research has focused on mathematical programming approaches to machine learning and their application to problems in science, engineering, and industry resulting in over 45 papers in these areas. Her professional service spans both the optimization and machine learning communities. She has served as an associate editor, guest editor, or advisory board member for SIAM Journal on Optimization, Naval Research Logistics, Machine Learning Journal, Journal of Machine Learning Research, and IEEE Transactions on Neural Networks. Currently she is program co-chair for the 2005 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. She has done organizational and reviewing work for conferences in both machine learning and optimization including INFORMS, ICCOPT, MPS, KDD, SIAM Conference on Data Mining, NIPS, AAAI, COLT, and ICML.

Peter Hammer

RUTCOR, Rutgers University
New Jersey, USA

Discrete Optimization Problems in the Logical Analysis of Data


Discrete nonlinear optimization plays a central role in the Logical Analysis of Data, e.g. in feature selection, in optimal pattern generation, in model formation, etc. All of these problems require the minimization of a polynomial objective function in 0-1 variables, subject to set-covering constraints, and sometimes also to a family of set-packing constraints. Efficient heuristics, computational results, and applications to the analysis of medical problems will be presented.


Professor Hammer is the Director of RUTCOR - Rutgers University Center for Operations Research. His research interest covers logical analysis of data, biomedical informatics,discrete mathematics and applications. He has published 4 monographs, 10 edited volumes and over 220 research papers. He is the founder and Editor-in-Chief for many leading journals in discrete mathematics and optimization including Discrete Mathematics, Discrete Applied Mathematics, Discrete Optimization, Annals of Discrete Mathematics, Annals Of Operations Research, Monographs on Discrete Mathematics and Applications (SIAM), Due to his excellent research contribution and service to the community, he has been awarded 3 honorary doctorates and several distinguished prizes.

Pierre Hansen
Data Mining Chair, Director of GERAD
HEC Montreal

A Mathematical Programming Approach to Discovery in Graph Theory

In this talk, we will discuss how a generic heuristic can be built to find graphs satisfying given constraints, optimize invariants, refute, corroborate or strengthen conjectures, find new conjectures, and get ideas of proof. This has been implemented in the AutoGraphiX system developed by Gilles Caporossi and myself, with the help of many students. Several hundred conjectures have been obtained, most of them in a fully automated way, about half of them proved automatically, and many others proved by various mathematicians.

John MacGregor
University Professor
Dofasco Professor of Process Automation and IT
McMaster University

Latent Variable Methods for Process Analysis, Monitoring and Design

This paper gives an overview of latent variable methods for utilizing large process data matrices. Latent variable methods such as Principal Component Analysis (PCA) and Projection to Latent Structures (PLS) are shown to be well suited for obtaining useful subspace models for treating a variety of important industrial problems. These models handle the reduced rank nature of the data, they are symmetric in the way they treat the X and Y spaces, and they provide models for both the X and Y spaces. The methods are illustrated with industrial examples in the following areas: (i) the analysis of historical databases and trouble-shooting process problems; (ii) process monitoring; (iii) extraction of information from multivariate sensors; and (iv) the development of new processes and products. In each of these problems latent variable models provide the framework on which solutions are based.


JOHN MACGREGOR received his PhD degree in Statistics, and his MSc degrees in both Statistics and in Chemical Engineering from the University of Wisconsin, and his Bachelor of Engineering degree from McMaster University in Canada. After working in industry for several years as a process specialist with Monsanto Company in Texas, he joined the Department of Chemical Engineering at McMaster University in 1972. He is currently holds the title of “University Professor” as well as the Dofasco Chair in Process Automation and Information Technology at McMaster University. He is a cofounder of the McMaster Advanced Control Consortium that is sponsored by many international companies.

Dr. MacGregor is a Fellow of the American Statistical Association, and has received many awards for his work in applied statistics and chemometrics, among them, the Shewhart Medal and the W.G. Hunter Award from the American Society for Quality, and the Herman Wold Medal from the Swedish Chemical Society. He is a member of the Canadian Academy of Engineering and has received many awards from engineering societies, among them, the Century of Achievement Award from the Canadian Society for Chemical Engineering, and the Computing and Systems Technology Award from the American Institute of Chemical Engineers.

Alex Rubinov
Professor and Director of Center for Informatics and Applied Optimization,
University of Ballarat, Australia

Unsupervised and Supervised Classification via Nonsmooth Optimization

Many problems of unsupervised classification (clustering) and supervised classification can be reduced to the minimization of sum-min or max-min of convex functions or even more complicated functions. In contrast with popular models, the dimension of corresponding optimization problems is relatively small and these problems do not contain integer variables. We discuss these problems and also a
numerical method for non-smooth optimization that can be used for their solution. This method is a combination of the local Discrete Gradient method introduced by Adil Bagirov and the global Cutting Angle method. We also discuss results of numerical experiments.


Professor Rubinov has worked on many areas in the field of mathematical programming and its application in data mining and economics. He has published 17 research monographs and more than 170 research papers. He has supervised more than 30 PhD students and served in the editor board of several well-know optimization journals J. Global Optimization, Optimization Methods and Software etc.

Yong Shi
Charles & Margre Durham Distinguished Professor of Information Technology College of Information Science and Technology
University of Nebraska at Omaha

Data Mining Techniques via Multiple Criteria Optimization Approaches

This talk presents a comprehensive view regarding multiple criteria optimization approaches to data mining. Traditionally, researchers have studied various methods by using linear programming (LP) to solve discriminent problems with a small sample size of data. These methods can be considered as LP approach to classification in data mining. Recently, the author and his colleagues extend such a research idea into classification via multiple criteria linear programming (MCLP) and multiple criteria quadratic programming (MCQP). This new approach has been tested by several large real-life databases and has outperformed some know data mining models. The tutorial will first outline the development of LP, MCLP and MCQP techniques. Then, it will focus on the development of various multiple criteria optimization based data mining techniques and algorithms. Finally, it will show the applications of this approach in the real-life credit card risk management decisions, information intrusion and bioinformatics.


Professor Shi is the director of Chinese Academy of Sciences Research Center on Data Technology & Knowledge Economy, President’s Assistant of the graduate school of Chinese Academy of Sciences. He had been the Charles W. and Margre H. Durham Distinguished Professor of Information Technology, College of Information Science and Technology, University of Nebraska at Omaha in 1999-2004. Dr. Shi's research interests covers data mining, information overload, multiple criteria decision making and telecommunication management. He has published seven books and more than sixty papers in various journals He is the Editor-in-Chief of International Journal of Information Technology and Decision Making (SCI) and an Area Editor of International Journal of Operations and Quantitative Management. Dr. Shi has received many distinguished awards including Outstanding Young Scientist Award, National Natural Science Foundation of China, 2001; Member of Overseas Assessor for the Chinese Academy of Sciences, May 2000; and Speaker of Distinguished Visitors Program (DVP) for 1997-2000, IEEE Computer Society. He has consulted in a number of famous companies.

Martin Wainwright
EECS, Berkeley

Mathematical Programming and Statistical Models Based on Graphs


Graphical models provide a useful framework for representing and manipulating statistical dependencies among large collections of random variables. Accordingly, these models are used and studied in various fields, including machine learning, computational biology, image processing, and error-control coding. At the core of applying a graphical model lies a common set of challenging problems, including the computation of marginals, modes and log likelihoods, and learning parameters from data. Exact solutions are computationally prohibitive for general graphical models, which motivates the development of approximate methods.

In this talk, we discuss the use of variational methods to generate deterministic approximations to marginals and likelihoods. Working within the framework of exponential families, we describe how the computation of marginals, modes, and log likelihoods can be formulated as the solution of low-dimensional convex optimization problems. For tree-structured graphs, these optimization problems can be solved efficiently by local message-passing algorithms (i.e., a form of dynamic programming). In general, however, these convex programs are intractable to solve exactly, but nonetheless can be approximated and relaxed in a number of ways. For intractable models, we discuss how a variety of known algorithms --- including mean field, belief propagation (sum-product), max-product, and cluster variational methods --- can all be understood as solving particular non-convex approximations to the true variational principle. We also touch on other new methods that are based on convex relaxations, and describe various open research directions that are suggested by this perspective.


Martin Wainwright is currently an assistant professor at University of California at Berkeley, with a joint appointment between the Department of Statistics and the Department of Electrical Engineering and Computer Sciences. He received a Bachelor's degree in Mathematics from the University of Waterloo, Canada, and the Ph.D. degree in Electrical Engineering and Computer Science from Massachusetts Institute of Technology (MIT) in Cambridge, MA. His research interests include statistical signal and image processing, source and channel coding, optimization, and machine learning.

Stephen Wright
Department of Computer Science
University of Wisconsin-Madison

A Review of Some Optimization Techniques in Machine Learning and Statistics


We review some optimization techniques that have been proposed recently for use in machine learning and statistics applications. These include interior-point and pivoting methods for solving the (parametrized) quadratic and linear programs that arise in support vector machines and in regression problems such as the LASSO and its extensions. Issues of formulating the problems as optimization problems and of using existing software packages will also be discussed.


Professor Wright is in the Computer Sciences Department at the University of Wisconsin-Madison. From 1990-2001, he worked at Argonne National Laboratory, and prior to that held appointments at North Carolina State University and the University of Arizona. His interests lie in continuous optimization and its applications. He is the author of more than 60 papers, and three well-received books on interior-point methods, numerical optimization and optimization software. He is in the editorial board of SIAM Journal on Scientific Computing, and editor-in-chief of Math Programming (series B).

Stanley Young
Assistant Director for Bioinformatics

National Institute of Statistical Sciences

Identifying and Solving Important/Complex Problems


With a large data set you can ask lots of questions. Will you trick yourself and/or your customer? Suppose you have tens of thousands to millions of predictors. What to do? How can you sample densely from a ten dimensional space? What can you do to explore a large block of data when there are missing data points and outliers? These are important questions that come up in applied drug discovery research. There is a need for a work strategy to identify and find workable solutions for important problems. My idea has always been to entice bright people to join in the fun of finding a solution. Solutions usually involve subject matter knowledge, statistics and computer science. Several small ideas or insights can often be put together for a workable solution. This talk will cover several problems and offer thoughts on why things worked out well. The hope is to inspire you to identify and tackle your own important problems.


S. Stanley Young has worked on statistical aspects of biology and chemistry problems for over 25 years, first at Eli Lilly and then at GlaxoSmithKline. He led internal and external research teams developing novel statistical methods and software for drug discovery with two issued patents for statistical algorithms and over twenty papers on the application of statistical methods to drug discovery. He is a Fellow of the American Statistical Association and has five “best paper” awards.

Dr. Young was one of the first to apply the data mining method, recursive partitioning, to chemistry problems. He led a team of computational chemists, computer scientists and statisticians to develop a highly efficient version of RP for the analysis of high throughput screening data. For some time this program held the “land speed” record for RP – 200k records and 2.8M 0/1 descriptors. RP is an important technique as it is capable of discerning multiple mechanisms of biological action within the same data set.

Large blocks data are becoming quite important, e.g. micro array data. These large data sets often contain missing data and outliers. A robust method of singular value decomposition was developed that deals with missing data and outliers.

Dr. Young is currently the Assistant Director for Bioinofomatics at the National Institute of Statistical Sciences. He is also an adjunct professor at North Carolina State University, the University of Waterloo and the University of British Columbia.

Contact Us | Legal & Privacy Policy