Totally Corrective Multiclass Boosting with Binary Weak Learners
In this work, we propose a new optimization framework for multiclass boosting
learning. In the literature, AdaBoost.MO and AdaBoost.ECC are two successful multiclass boosting algorithms that can use binary weak learners.
We explicitly derive these two algorithms' Lagrange dual problems based on
their regularized loss functions. We show that the Lagrange dual formulations
enable us to design totally-corrective multiclass algorithms by using the
primal-dual optimization technique. Experiments on benchmark data sets suggest that our multiclass boosting achieves generalization performance comparable to the state of the art, while converging much faster than stage-wise gradient-descent boosting. In other words, the new totally corrective algorithms maximize the margin more aggressively.
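As a rough illustration of the totally corrective idea (not the paper's Lagrange-dual derivation), the sketch below contrasts with stage-wise boosting by re-optimizing all weak-learner weights after each new stump is added; the logistic loss, decision stumps, step sizes, and toy data are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])          # toy binary labels in {-1, +1}

def stump(X, f, t):
    return np.where(X[:, f] > t, 1.0, -1.0)

def best_stump(X, y, w):
    # weak learner: the (signed) decision stump minimizing weighted error
    best, best_err = None, np.inf
    for f in range(X.shape[1]):
        for t in np.quantile(X[:, f], [0.25, 0.5, 0.75]):
            for s in (1.0, -1.0):
                err = w @ (s * stump(X, f, t) != y)
                if err < best_err:
                    best, best_err = (f, t, s), err
    return best

cols, w = [], np.ones(len(y)) / len(y)
for _ in range(10):
    f, t, s = best_stump(X, y, w)
    cols.append(s * stump(X, f, t))
    H = np.column_stack(cols)
    # totally corrective step: re-optimize ALL weights, not just the newest one
    a = np.zeros(H.shape[1])
    for _ in range(300):
        p = 1.0 / (1.0 + np.exp(y * (H @ a)))  # sigma(-margin)
        a += 0.5 * (H.T @ (y * p)) / len(y)    # gradient step on logistic loss
    # dual view: the sample weights are the normalized loss derivatives
    w = p / p.sum()

train_acc = np.mean(np.sign(H @ a) == y)
```

A stage-wise method would instead fix the weights of all earlier stumps and tune only the newest coefficient; refitting the whole vector each round is what makes the margin grow faster per weak learner.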
Rule Generation for Classification: Scalability, Interpretability, and Fairness
We introduce a new rule-based optimization method for classification with
constraints. The proposed method leverages column generation for linear
programming and hence is scalable to large datasets. The resulting pricing subproblem is shown to be NP-hard, so we resort to a decision-tree-based heuristic and solve a proxy pricing subproblem for acceleration. The method
returns a set of rules along with their optimal weights indicating the
importance of each rule for learning. We address interpretability and fairness
by assigning cost coefficients to the rules and introducing additional
constraints. In particular, we focus on local interpretability and generalize the separation criterion in fairness to multiple sensitive attributes and classes.
We test the performance of the proposed methodology on a collection of datasets
and present a case study to elaborate on its different aspects. The proposed
rule-based learning method strikes a good balance between local interpretability and fairness on the one hand, and accuracy on the other.
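A schematic of the column-generation loop might look as follows. For illustration only, the LP master problem is replaced by a nonnegative least-squares fit solved by projected gradient, and pricing is approximated by picking the candidate rule most correlated with the current residual; the pool of threshold rules, the data, and these substitutions are all assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 4))
y = ((X[:, 0] > 0.5) & (X[:, 2] <= 0.4)).astype(float)  # target covered by one rule

def rule_column(X, conds):
    # a rule is a conjunction of threshold tests; its column marks covered rows
    col = np.ones(len(X), dtype=bool)
    for f, t, gt in conds:
        col &= (X[:, f] > t) if gt else (X[:, f] <= t)
    return col.astype(float)

# candidate pool ("pricing" searches this): single tests and pairwise conjunctions
tests = [(f, t, gt) for f in range(4) for t in (0.3, 0.4, 0.5, 0.6)
         for gt in (True, False)]
pool = []
for i, c1 in enumerate(tests):
    pool.append([c1])
    pool.extend([c1, c2] for c2 in tests[i + 1:] if c2[0] > c1[0])
cols = [rule_column(X, r) for r in pool]

active, pred = [], np.zeros(len(y))
for _ in range(5):
    resid = y - pred
    # heuristic pricing: the unused rule whose column best matches the residual
    j = int(np.argmax([abs(c @ resid) if i not in active else -1.0
                       for i, c in enumerate(cols)]))
    active.append(j)
    A = np.column_stack([cols[i] for i in active])
    # restricted master: nonnegative rule weights by projected gradient descent
    wts = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(1000):
        wts = np.maximum(0.0, wts - step * (A.T @ (A @ wts - y)))
    pred = A @ wts

acc = np.mean((pred > 0.5) == (y > 0.5))
```

The returned `wts` vector plays the role of the rule weights in the abstract: rules with larger weights matter more for the prediction, and cost coefficients or fairness constraints would enter the master problem as extra terms and rows.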
Responsible AI (RAI) Games and Ensembles
Several recent works have studied the societal effects of AI; these include
issues such as fairness, robustness, and safety. In many of these objectives, a
learner seeks to minimize its worst-case loss over a set of predefined
distributions (known as uncertainty sets), with usual examples being perturbed
versions of the empirical distribution. In other words, the aforementioned problems
can be written as min-max problems over these uncertainty sets. In this work,
we provide a general framework for studying these problems, which we refer to
as Responsible AI (RAI) games. We provide two classes of algorithms for solving
these games: (a) game-play based algorithms, and (b) greedy stagewise
estimation algorithms. The former class is motivated by online learning and
game theory, whereas the latter class is motivated by the classical statistical
literature on boosting and regression. We empirically demonstrate the applicability and competitive performance of our techniques on several RAI problems, particularly around subpopulation shift.
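The game-play idea can be sketched for the worst-group (subpopulation-shift) case: an adversary maintains a distribution over the groups in the uncertainty set and updates it by multiplicative weights, while the learner takes gradient steps on the reweighted loss. The two-group synthetic data, logistic model, and step sizes below are assumptions made for illustration, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
# a majority and a minority subpopulation with partially conflicting signals
y0 = rng.choice([-1.0, 1.0], 270)
y1 = rng.choice([-1.0, 1.0], 30)
X0 = y0[:, None] * np.array([2.0, 0.0]) + rng.normal(size=(270, 2))
X1 = y1[:, None] * np.array([-1.0, 2.0]) + rng.normal(size=(30, 2))
groups = [(X0, y0), (X1, y1)]

def group_losses(w):
    return np.array([np.mean(np.log1p(np.exp(-yg * (Xg @ w))))
                     for Xg, yg in groups])

w = np.zeros(2)                 # learner: linear model
q = np.ones(2) / 2              # adversary: distribution over the uncertainty set
init_worst = group_losses(w).max()
for _ in range(200):
    # learner: gradient step on the q-weighted logistic loss
    g = np.zeros(2)
    for qg, (Xg, yg) in zip(q, groups):
        p = 1.0 / (1.0 + np.exp(yg * (Xg @ w)))   # sigma(-margin)
        g -= qg * (Xg.T @ (yg * p)) / len(yg)
    w -= 0.5 * g
    # adversary: multiplicative-weights update toward the harder group
    q *= np.exp(0.5 * group_losses(w))
    q /= q.sum()

final_worst = group_losses(w).max()
```

Plain empirical risk minimization would weight the minority group by its 10% share; the adversary's `q` instead tracks whichever group currently has the higher loss, so the dynamics approximate the min-max solution rather than the average-case one.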
Sparse machine learning methods with applications in multivariate signal processing
This thesis details theoretical and empirical work that draws from two main subject areas: Machine
Learning (ML) and Digital Signal Processing (DSP). A unified general framework is given for the application
of sparse machine learning methods to multivariate signal processing. In particular, methods that
enforce sparsity will be employed for reasons of computational efficiency, regularisation, and compressibility.
The methods presented can be seen as modular building blocks that can be applied to a variety of applications. Application-specific prior knowledge can be used in various ways, resulting in a flexible
and powerful set of tools. The motivation for the methods is to be able to learn and generalise from a set
of multivariate signals.
In addition to testing on benchmark datasets, a series of empirical evaluations on real-world datasets was carried out. These included: the classification of musical genre from polyphonic audio
files; a study of how the sampling rate in a digital radar can be reduced through the use of Compressed
Sensing (CS); analysis of human perception of different modulations of musical key from
Electroencephalography (EEG) recordings; classification of genre of musical pieces to which a listener
is attending from Magnetoencephalography (MEG) brain recordings. These applications demonstrate
the efficacy of the framework and highlight interesting directions for future research.
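To give one concrete flavour of the sparsity machinery, the sketch below shows generic compressed-sensing recovery via iterative soft-thresholding (ISTA); it is not the thesis's radar pipeline, and the problem sizes and regularisation weight are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 100, 40, 4                        # signal length, measurements, nonzeros
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = 3.0 * rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)    # random Gaussian sensing matrix
b = A @ x_true                              # undersampled, noiseless measurements

# ISTA: gradient step on 0.5*||Ax - b||^2, then soft-thresholding (the L1 prox)
lam = 0.05
step = 1.0 / np.linalg.norm(A, 2) ** 2
x = np.zeros(n)
for _ in range(2000):
    z = x - step * (A.T @ (A @ x - b))
    x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

The point of the radar study in the thesis is exactly this trade: far fewer measurements than signal samples, with the sparsity prior (here the L1 penalty) supplying the missing information.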
Specificity, Privacy, and Degeneracy in the CD4 T Cell Receptor Repertoire Following Immunization.
T cells recognize antigen using a large and diverse set of antigen-specific receptors created by a complex process of imprecise somatic cell gene rearrangements. In response to antigen/receptor binding, antigen-specific T cells then divide to form memory and effector populations. We apply high-throughput sequencing to investigate the global changes in T cell receptor sequences following immunization with ovalbumin (OVA) and adjuvant, in order to understand how adaptive immunity achieves specificity. Each immunized mouse contained a predominantly private but related set of expanded CDR3β sequences. We used machine learning to identify common patterns that distinguished repertoires of mice immunized with adjuvant alone from those immunized with adjuvant and OVA. The CDR3β sequences were deconstructed into sets of overlapping contiguous amino acid triplets. The frequencies of these motifs were used to train the linear programming boosting algorithm (LPBoost) to classify between TCR repertoires. LPBoost could distinguish between the two classes of repertoire with accuracies above 80%, using a small subset of triplet sequences present at defined positions along the CDR3. The results suggest a model in which such motifs confer degenerate antigen specificity in the context of a highly diverse and largely private set of T cell receptors.
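The triplet featurization can be sketched as follows; a nearest-centroid classifier stands in for LPBoost, and the synthetic repertoires, the enriched motif "CAS", and all sizes are assumptions made purely for illustration.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)
AA = list("ACDEFGHIKLMNPQRSTVWY")

def cdr3(enriched):
    seq = "".join(rng.choice(AA, size=12))
    if enriched and rng.random() < 0.5:       # embed the motif in half the sequences
        i = rng.integers(0, 10)
        seq = seq[:i] + "CAS" + seq[i + 3:]
    return seq

def repertoire_vector(seqs, vocab):
    # overlapping contiguous amino-acid triplets, as frequency vector
    c = Counter(s[i:i + 3] for s in seqs for i in range(len(s) - 2))
    v = np.array([c[t] for t in vocab], float)
    return v / v.sum()

# 10 "OVA + adjuvant" repertoires (motif-enriched) vs 10 "adjuvant only"
reps = [([cdr3(True) for _ in range(100)], 1) for _ in range(10)] + \
       [([cdr3(False) for _ in range(100)], 0) for _ in range(10)]
vocab = sorted({s[i:i + 3] for seqs, _ in reps for s in seqs
                for i in range(len(s) - 2)})
X = np.array([repertoire_vector(seqs, vocab) for seqs, _ in reps])
y = np.array([lab for _, lab in reps])

# leave-one-out nearest-centroid classification on triplet-frequency vectors
correct = 0
for k in range(len(reps)):
    mask = np.arange(len(reps)) != k
    c1 = X[mask & (y == 1)].mean(axis=0)
    c0 = X[mask & (y == 0)].mean(axis=0)
    pred = 1 if np.linalg.norm(X[k] - c1) < np.linalg.norm(X[k] - c0) else 0
    correct += pred == y[k]
acc = correct / len(reps)
```

As in the study, the classes differ only in the frequencies of a few triplet motifs against a large, mostly private background, which is what makes motif-frequency features a natural input for a sparse classifier like LPBoost.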