Invited
Talks
Kristin
P. Bennett
Rensselaer Polytechnic Institute
New York, USA
Optimization
Challenges in Capacity Control
In this talk, we examine
different strategies for capacity control in learning and the optimization
challenges that they present. Capacity control in learning is essential
for good generalization. Support vector machines and related approaches
use penalty terms to control capacity. But we must not forget the
lessons learned from neural networks, namely that dimensionality
reduction and early stopping can successfully control capacity as
well. Beyond generalization, dimensionality reduction also provides
low-dimensional representations of the data valuable for visualization
and interpretation. Dimensionality reduction approaches to least-squares-loss
based problems, such as PCA and CCA, frequently reduce to generalized
eigenvalue problems that are readily solved. But dimensionality
reduction approaches for learning tasks with other loss function
remain a challenge. For example, how does one do dimensionality
reduction for a ranking problem? The underlying optimization problems
are typically nonconvex. Here we will explore recent representative
research from the literature illustrating the modeling and optimization
methods used to tackle these problems. We conclude with a novel
approach for performing dimensionality reduction for a given convex
loss function and show how it relates to early stopping and nonlinear
conjugate gradient methods.
Bio:
Kristin
Bennett is an associate professor in the Mathematical Sciences Department
at Rensselaer Polytechnic Institute. Starting with her Ph.D. from
the University of Wisconsin-Madison in 1993, her research has focused
on mathematical programming approaches to machine learning and their
application to problems in science, engineering, and industry resulting
in over 45 papers in these areas. Her professional service spans
both the optimization and machine learning communities. She has
served as an associate editor, guest editor, or advisory board member
for SIAM Journal on Optimization, Naval Research Logistics, Machine
Learning Journal, Journal of Machine Learning Research, and IEEE
Transactions on Neural Networks. Currently she is program co-chair
for the 2005 ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining. She has done organizational and reviewing work
for conferences in both machine learning and optimization including
INFORMS, ICCOPT, MPS, KDD, SIAM Conference on Data Mining, NIPS,
AAAI, COLT, and ICML.
Peter
Hammer
RUTCOR, Rutgers University
New Jersey, USA
Discrete
Optimization Problems in the Logical Analysis of Data
(SLIDES) (PAPER)
Discrete nonlinear optimization
plays a central role in the Logical Analysis of Data, e.g. in feature
selection, in optimal pattern generation, in model formation, etc.
All of these problems require the minimization of a polynomial objective
function in 0-1 variables, subject to set-covering constraints,
and sometimes also to a family of set-packing constraints. Efficient
heuristics, computational results, and applications to the analysis
of medical problems will be presented.
Bio:
Professor Hammer is the
Director of RUTCOR - Rutgers University Center for Operations Research.
His research interest covers logical analysis of data, biomedical
informatics,discrete mathematics and applications. He has published
4 monographs, 10 edited volumes and over 220 research papers. He
is the founder and Editor-in-Chief for many leading journals in
discrete mathematics and optimization including Discrete Mathematics,
Discrete Applied Mathematics, Discrete Optimization, Annals
of Discrete Mathematics, Annals Of Operations Research,
Monographs on Discrete Mathematics and Applications (SIAM),
Due to his excellent research contribution and service to the community,
he has been awarded 3 honorary doctorates and several distinguished
prizes.
Pierre
Hansen
Data Mining Chair, Director of GERAD
HEC Montreal
A Mathematical Programming Approach to Discovery
in Graph Theory
In this talk, we will
discuss how a generic heuristic can be built to find graphs satisfying
given constraints, optimize invariants, refute, corroborate or strengthen
conjectures, find new conjectures, and get ideas of proof. This
has been implemented in the AutoGraphiX system developed by Gilles
Caporossi and myself, with the help of many students. Several hundred
conjectures have been obtained, most of them in a fully automated
way, about half of them proved automatically, and many others proved
by various mathematicians.
John
MacGregor
University Professor
Dofasco Professor of Process Automation and IT
McMaster University
Latent
Variable Methods for Process Analysis, Monitoring and Design
This paper gives an overview
of latent variable methods for utilizing large process data matrices.
Latent variable methods such as Principal Component Analysis (PCA)
and Projection to Latent Structures (PLS) are shown to be well suited
for obtaining useful subspace models for treating a variety of important
industrial problems. These models handle the reduced rank nature
of the data, they are symmetric in the way they treat the X and
Y spaces, and they provide models for both the X and Y spaces. The
methods are illustrated with industrial examples in the following
areas: (i) the analysis of historical databases and trouble-shooting
process problems; (ii) process monitoring; (iii) extraction of information
from multivariate sensors; and (iv) the development of new processes
and products. In each of these problems latent variable models provide
the framework on which solutions are based.
Bio:
JOHN MACGREGOR received
his PhD degree in Statistics, and his MSc degrees in both Statistics
and in Chemical Engineering from the University of Wisconsin, and
his Bachelor of Engineering degree from McMaster University in Canada.
After working in industry for several years as a process specialist
with Monsanto Company in Texas, he joined the Department of Chemical
Engineering at McMaster University in 1972. He is currently holds
the title of “University Professor” as well as the Dofasco
Chair in Process Automation and Information Technology at McMaster
University. He is a cofounder of the McMaster Advanced Control Consortium
that is sponsored by many international companies.
Dr. MacGregor is a Fellow
of the American Statistical Association, and has received many awards
for his work in applied statistics and chemometrics, among them,
the Shewhart Medal and the W.G. Hunter Award from the American Society
for Quality, and the Herman Wold Medal from the Swedish Chemical
Society. He is a member of the Canadian Academy of Engineering and
has received many awards from engineering societies, among them,
the Century of Achievement Award from the Canadian Society for Chemical
Engineering, and the Computing and Systems Technology Award from
the American Institute of Chemical Engineers.
Alex
Rubinov
Professor and Director of Center for Informatics and Applied
Optimization,
University of Ballarat, Australia
Unsupervised and Supervised Classification
via Nonsmooth Optimization
Many problems of unsupervised
classification (clustering) and supervised classification can be
reduced to the minimization of sum-min or max-min of convex functions
or even more complicated functions. In contrast with popular models,
the dimension of corresponding optimization problems is relatively
small and these problems do not contain integer variables. We discuss
these problems and also a
numerical method for non-smooth optimization that can be used for
their solution. This method is a combination of the local Discrete
Gradient method introduced by Adil Bagirov and the global Cutting
Angle method. We also discuss results of numerical experiments.
Bio:
Professor Rubinov has
worked on many areas in the field of mathematical programming and
its application in data mining and economics. He has published 17
research monographs and more than 170 research papers. He has supervised
more than 30 PhD students and served in the editor board of several
well-know optimization journals J. Global Optimization, Optimization
Methods and Software etc.
Yong
Shi
Charles & Margre Durham Distinguished Professor of Information
Technology College of Information Science and Technology
University of Nebraska at Omaha
Data Mining Techniques
via Multiple Criteria Optimization Approaches
This talk presents a
comprehensive view regarding multiple criteria optimization approaches
to data mining. Traditionally, researchers have studied various
methods by using linear programming (LP) to solve discriminent problems
with a small sample size of data. These methods can be considered
as LP approach to classification in data mining. Recently, the author
and his colleagues extend such a research idea into classification
via multiple criteria linear programming (MCLP) and multiple criteria
quadratic programming (MCQP). This new approach has been tested
by several large real-life databases and has outperformed some know
data mining models. The tutorial will first outline the development
of LP, MCLP and MCQP techniques. Then, it will focus on the development
of various multiple criteria optimization based data mining techniques
and algorithms. Finally, it will show the applications of this approach
in the real-life credit card risk management decisions, information
intrusion and bioinformatics.
Bio:
Professor Shi is the
director of Chinese Academy of Sciences Research Center on Data
Technology & Knowledge Economy, President’s Assistant
of the graduate school of Chinese Academy of Sciences. He had been
the Charles W. and Margre H. Durham Distinguished Professor of Information
Technology, College of Information Science and Technology, University
of Nebraska at Omaha in 1999-2004. Dr. Shi's research interests
covers data mining, information overload, multiple criteria decision
making and telecommunication management. He has published seven
books and more than sixty papers in various journals He is the Editor-in-Chief
of International Journal of Information Technology and Decision
Making (SCI) and an Area Editor of International Journal of Operations
and Quantitative Management. Dr. Shi has received many distinguished
awards including Outstanding Young Scientist Award, National Natural
Science Foundation of China, 2001; Member of Overseas Assessor for
the Chinese Academy of Sciences, May 2000; and Speaker of Distinguished
Visitors Program (DVP) for 1997-2000, IEEE Computer Society. He
has consulted in a number of famous companies.
Martin
Wainwright
EECS,
Berkeley
www.eecs.berkeley.edu/~wainwrig
Mathematical
Programming and Statistical Models Based on Graphs
(SLIDES)
Graphical models provide
a useful framework for representing and manipulating statistical
dependencies among large collections of random variables. Accordingly,
these models are used and studied in various fields, including machine
learning, computational biology, image processing, and error-control
coding. At the core of applying a graphical model lies a common
set of challenging problems, including the computation of marginals,
modes and log likelihoods, and learning parameters from data. Exact
solutions are computationally prohibitive for general graphical
models, which motivates the development of approximate methods.
In this talk, we discuss
the use of variational methods to generate deterministic approximations
to marginals and likelihoods. Working within the framework of exponential
families, we describe how the computation of marginals, modes, and
log likelihoods can be formulated as the solution of low-dimensional
convex optimization problems. For tree-structured graphs, these
optimization problems can be solved efficiently by local message-passing
algorithms (i.e., a form of dynamic programming). In general, however,
these convex programs are intractable to solve exactly, but nonetheless
can be approximated and relaxed in a number of ways. For intractable
models, we discuss how a variety of known algorithms --- including
mean field, belief propagation (sum-product), max-product, and cluster
variational methods --- can all be understood as solving particular
non-convex approximations to the true variational principle. We
also touch on other new methods that are based on convex relaxations,
and describe various open research directions that are suggested
by this perspective.
Bio:
Martin Wainwright is
currently an assistant professor at University of California at
Berkeley, with a joint appointment between the Department of Statistics
and the Department of Electrical Engineering and Computer Sciences.
He received a Bachelor's degree in Mathematics from the University
of Waterloo, Canada, and the Ph.D. degree in Electrical Engineering
and Computer Science from Massachusetts Institute of Technology
(MIT) in Cambridge, MA. His research interests include statistical
signal and image processing, source and channel coding, optimization,
and machine learning.
Stephen
Wright
Department
of Computer Science
University of Wisconsin-Madison
A Review of Some
Optimization Techniques in Machine Learning and Statistics
(SLIDES)
We review some optimization
techniques that have been proposed recently for use in machine learning
and statistics applications. These include interior-point and pivoting
methods for solving the (parametrized) quadratic and linear programs
that arise in support vector machines and in regression problems
such as the LASSO and its extensions. Issues of formulating the
problems as optimization problems and of using existing software
packages will also be discussed.
Bio:
Professor Wright is in
the Computer Sciences Department at the University of Wisconsin-Madison.
From 1990-2001, he worked at Argonne National Laboratory, and prior
to that held appointments at North Carolina State University and
the University of Arizona. His interests lie in continuous optimization
and its applications. He is the author of more than 60 papers, and
three well-received books on interior-point methods, numerical optimization
and optimization software. He is in the editorial board of SIAM
Journal on Scientific Computing, and editor-in-chief of Math Programming
(series B).
Stanley
Young
Assistant Director for Bioinformatics
National
Institute of Statistical Sciences
Identifying and
Solving Important/Complex Problems
(SLIDES1) (SLIDES2)
With a large data set
you can ask lots of questions. Will you trick yourself and/or your
customer? Suppose you have tens of thousands to millions of predictors.
What to do? How can you sample densely from a ten dimensional space?
What can you do to explore a large block of data when there are
missing data points and outliers? These are important questions
that come up in applied drug discovery research. There is a need
for a work strategy to identify and find workable solutions for
important problems. My idea has always been to entice bright people
to join in the fun of finding a solution. Solutions usually involve
subject matter knowledge, statistics and computer science. Several
small ideas or insights can often be put together for a workable
solution. This talk will cover several problems and offer thoughts
on why things worked out well. The hope is to inspire you to identify
and tackle your own important problems.
Bio:
S. Stanley Young has
worked on statistical aspects of biology and chemistry problems
for over 25 years, first at Eli Lilly and then at GlaxoSmithKline.
He led internal and external research teams developing novel statistical
methods and software for drug discovery with two issued patents
for statistical algorithms and over twenty papers on the application
of statistical methods to drug discovery. He is a Fellow of the
American Statistical Association and has five “best paper”
awards.
Dr. Young was one of
the first to apply the data mining method, recursive partitioning,
to chemistry problems. He led a team of computational chemists,
computer scientists and statisticians to develop a highly efficient
version of RP for the analysis of high throughput screening data.
For some time this program held the “land speed” record
for RP – 200k records and 2.8M 0/1 descriptors. RP is an important
technique as it is capable of discerning multiple mechanisms of
biological action within the same data set.
Large blocks data are
becoming quite important, e.g. micro array data. These large data
sets often contain missing data and outliers. A robust method of
singular value decomposition was developed that deals with missing
data and outliers.
Dr. Young is currently
the Assistant Director for Bioinofomatics at the National Institute
of Statistical Sciences. He is also an adjunct professor at North
Carolina State University, the University of Waterloo and the University
of British Columbia.
|