Dr. Sivan Sabato

Associate Professor

Department of Computing and Software, McMaster University

Canada CIFAR AI Chair, Vector Institute

sabatos [at] mcmaster [dot] ca

Differentially Private Source-Target Clustering

This GitHub project provides the code for the algorithm and experiments described in the following paper:

S. Schnapp, S. Sabato, "Differentially Private Source-Target Clustering", Transactions on Machine Learning Research, 2025. [link to publication]

Fast Distributed k-Means with a Small Number of Rounds

This GitHub project provides the code for the algorithm and experiments described in the following paper:

T. Hess, R. Visbord, S. Sabato, "Fast Distributed k-Means with a Small Number of Rounds", Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 206:850-874, 2023. [link to publication]

Active Structure Learning of Bayesian Networks

This GitHub project provides the code for the algorithm and experiments described in the following paper:

N. Ben-David, S. Sabato, "Active Structure Learning of Bayesian Networks in an Observational Setting", Journal of Machine Learning Research, 23(188):1-38, 2022. [link to publication]

Fast Combinatorial Pure Exploration

This GitHub project provides the code for the algorithm and experiments described in the following paper:

N. Ben-David, S. Sabato, "A Fast Algorithm for PAC Combinatorial Pure Exploration", AAAI 2022.

Approximation using Weight Queries

This GitHub project provides the code for the algorithm and experiments described in the following paper:

N. Barak, S. Sabato, "Approximating a Distribution using Weight Queries", ICML 2021.

Active Feature Selection

This GitHub project provides the code for the algorithm and experiments described in the following paper:

S. Schnapp, S. Sabato, "Active Feature Selection for the Mutual Information Criterion", AAAI 2021.

Classifier Fairness and Accuracy

This GitHub project provides the code for the algorithm and experiments described in the following paper:

S. Sabato, E. Yom-Tov, "Bounding the fairness and accuracy of classifiers from population statistics", The 37th International Conference on Machine Learning (ICML), 2020.

Sequential no-Substitution Clustering

This GitHub project provides the code for the algorithm and experiments described in the following paper:

T. Hess, S. Sabato, "Sequential no-Substitution k-Median-Clustering", The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.

Temporal Anomaly Detection

This GitHub project provides the code for the anomaly detection algorithm, as well as the TDA data set, both of which are described in the following paper:

E. Gutflaish, A. Kontorovich, S. Sabato, O. Biller, O. Sofer, "Temporal anomaly detection: calibrating the surprise", 33rd AAAI Conference on Artificial Intelligence (AAAI 2019).

Bacterial Pathogenicity Classification

This GitHub project provides the code for the algorithm and experiments described in the following paper:

E. Barash, N. Sal-Man, S. Sabato, M. Ziv-Ukelson, "BacPaCS-Bacterial Pathogenicity Classification via Sparse-SVM", Bioinformatics, 35(12):2001-2008, 2018.

Regression with Heavy Tails

The following Matlab code implements the linear regression algorithms described in

D. Hsu and S. Sabato, "Heavy-tailed Regression with a Generalized Median-of-means", The 31st International Conference on Machine Learning (ICML), 2014.

Matlab code for regression with heavy tails

Since this algorithm is geared towards low squared loss with high probability, its benefits are best demonstrated by running a large number of experiments with different random samples, and estimating the worst loss achieved in, for instance, 95% of the experiments.
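The evaluation protocol described above can be sketched in a few lines. The following is a minimal Python illustration, not the Matlab code from the paper: the function names, the one-dimensional data model, the Pareto noise, and the plain least-squares estimator are all assumptions chosen for the example. Any regression routine can be passed in place of the placeholder estimator.

```python
import math
import random


def draw_sample(n, w_true, rng):
    """Draw n pairs (x, y) with y = w_true * x + heavy-tailed noise (assumed model)."""
    sample = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Symmetric Pareto-type noise: shape 2.1 gives finite variance but a heavy tail.
        noise = rng.choice((-1.0, 1.0)) * (rng.paretovariate(2.1) - 1.0)
        sample.append((x, w_true * x + noise))
    return sample


def fit_least_squares(sample):
    """Placeholder estimator: one-dimensional least squares through the origin."""
    sxy = sum(x * y for x, y in sample)
    sxx = sum(x * x for x, _ in sample)
    return sxy / sxx


def excess_loss(w_hat, w_true):
    """Excess squared loss when x ~ N(0, 1): E[(w_hat*x - w_true*x)^2]."""
    return (w_hat - w_true) ** 2


def loss_quantile(fit, n_trials=1000, n=50, w_true=2.0, q=0.95, seed=0):
    """Run n_trials independent experiments and return the q-quantile of the losses."""
    rng = random.Random(seed)
    losses = sorted(
        excess_loss(fit(draw_sample(n, w_true, rng)), w_true)
        for _ in range(n_trials)
    )
    # Smallest observed loss that at least a q fraction of the trials do not exceed.
    index = max(0, math.ceil(q * n_trials) - 1)
    return losses[index]


if __name__ == "__main__":
    print("95% loss quantile:", loss_quantile(fit_least_squares))
    print("median loss:      ", loss_quantile(fit_least_squares, q=0.5))
```

Comparing the 95% quantile (rather than the mean loss) across estimators is what exposes the difference between methods with and without high-probability guarantees.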

Feature Multi-Selection

The Matlab code below implements the algorithms described in

S. Sabato and A. Kalai, "Feature Multi-Selection among Subjective Features", The 30th International Conference on Machine Learning (ICML), 2013.
The algorithms perform feature multi-selection for regression, for features that are obtained by noisy measurements.

Input: a labeled training set with two measurements for each feature value, and a limit to the total number of feature measurements allowed for an example during test time.
Output: a linear predictor, along with counts of the number of times each feature should be measured at prediction time.
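
To illustrate how such an output might be used at prediction time, here is a minimal Python sketch. It assumes that repeated measurements of a feature are averaged before the linear predictor is applied; the names `predict`, `weights`, `counts`, and `measure` are hypothetical and do not come from the Matlab code.

```python
import random


def predict(weights, counts, measure):
    """Apply a linear predictor to averaged repeated measurements.

    weights[j] is the learned coefficient of feature j, counts[j] is the number
    of times feature j should be measured, and measure(j) returns one noisy
    measurement of feature j for the current test example (all hypothetical).
    """
    total = 0.0
    for j, (w, c) in enumerate(zip(weights, counts)):
        if c == 0:
            continue  # feature j is not measured at all
        average = sum(measure(j) for _ in range(c)) / c
        total += w * average
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    true_values = [1.0, 10.0, 3.0]  # made-up noiseless feature values
    weights = [0.5, -1.0, 2.0]      # made-up predictor
    counts = [3, 0, 2]              # uses a budget of 5 measurements in total

    def noisy_measure(j):
        return true_values[j] + rng.gauss(0.0, 0.5)

    print("prediction:", predict(weights, counts, noisy_measure))
```

The sum of `counts` is the number of measurements spent on the example, so it should not exceed the limit given as input.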

Matlab Code: