100 research outputs found
Cost-sensitive classification based on Bregman divergences
The main objective of this PhD thesis is the identification, characterization and
study of new loss functions to address so-called cost-sensitive classification. Many
decision problems are intrinsically cost-sensitive. However, the dominant preference
for cost-insensitive methods in the machine learning literature is a natural consequence
of the fact that true costs in real applications are difficult to evaluate.
Since, in general, uncovering the correct class of the data is less costly than any
decision error, designing low-error decision systems is a reasonable (but suboptimal)
approach. For instance, consider the classification of credit applicants as either good customers (who will pay back the credit) or bad customers (who will fail to pay off part of the credit). The cost of classifying a risky borrower as good could be much higher than the cost of classifying a potentially good customer as bad.
Our proposal relies on Bayes decision theory, where the goal is to assign instances
to the class with minimum expected cost. The decision involves both the costs and the posterior probabilities of the classes. Obtaining calibrated probability
estimates at the classifier output requires a suitable learning machine, a large enough
representative data set as well as an adequate loss function to be minimized during
learning. The design of the loss function can be aided by the costs: classical decision
theory shows that cost matrices define class boundaries determined by posterior class
probability estimates. Strictly speaking, in order to make optimal decisions, accurate
probability estimates are only required near the decision boundaries. It is key to
point out that the choice of the loss function becomes especially relevant when
the prior knowledge about the problem is limited or the available training examples
are somehow unsuitable. In those cases, different loss functions lead to dramatically
different posterior probability estimates. We focus our study on the set of Bregman
divergences. These divergences offer a rich family of proper losses that has recently
become very popular in the machine learning community [Nock and Nielsen, 2009,
Reid and Williamson, 2009a].
The first part of the thesis deals with the development of a novel parametric family of multiclass Bregman divergences which captures the information in the cost
matrix, so that the loss function is adapted to each specific problem. Multiclass cost-sensitive learning is one of the main challenges in cost-sensitive learning and, through this parametric family, we provide a natural framework for moving beyond
binary tasks. Following this idea, two lines are explored:
Cost-sensitive supervised classification: We derive several asymptotic results.
The first analysis guarantees that the proposed Bregman divergence has maximum sensitivity to changes at probability vectors near the decision regions. Further analysis shows that the optimization of this Bregman divergence becomes equivalent to minimizing the overall cost regret in non-separable problems, and to maximizing a margin in separable problems.
Cost-sensitive semi-supervised classification: When labeled data is
scarce but unlabeled data is widely available, semi-supervised learning is a
useful tool to make the most of the unlabeled data. We discuss an optimization
problem relying on the minimization of our parametric family of Bregman divergences, using both labeled and unlabeled data, based on what is called the Entropy Minimization principle. We propose the first multiclass cost-sensitive semi-supervised algorithm, under the assumption that inter-class separation is stronger than intra-class separation.
The second part of the Thesis deals with the transformation of this parametric family of Bregman divergences into a sequence of Bregman divergences. Work along this line can be further divided into two additional areas:
Foundations of sequences of Bregman divergences: We generalize some
previous results about the design and characterization of Bregman divergences
that are suitable for learning and their relationship with convexity. In addition,
we aim to broaden the subset of Bregman divergences that are interesting for
cost-sensitive learning. Under very general conditions, we find sequences of (cost-sensitive) Bregman divergences, whose minimization provides minimum (cost-sensitive) risk for non-separable problems and some type of maximum margin classifiers in separable cases.
Learning with example-dependent costs: A strong assumption is widespread across most cost-sensitive learning algorithms: misclassification costs are the same for all examples. In many cases this assumption does not hold.
We claim that using the example-dependent costs directly is more natural and will lead to more accurate classifiers. For these reasons, we consider the extension of cost-sensitive sequences of Bregman losses to example-dependent cost scenarios to generate finely tuned posterior probability estimates.
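As a minimal sketch of the minimum-expected-cost rule that the abstract above builds on, the Python snippet below picks, for a single instance, the decision whose expected cost under the posterior is lowest. The function names, the cost-matrix convention (`cost[i][j]` is the cost of deciding class `i` when the true class is `j`) and the numeric costs in the credit example are illustrative assumptions, not values taken from the thesis.

```python
def expected_costs(posterior, cost):
    """Expected cost of each possible decision.

    posterior[j] is the (estimated) probability of true class j.
    cost[i][j] is the cost of deciding class i when the true class is j
    (an assumed convention for this sketch).
    """
    return [sum(c * p for c, p in zip(row, posterior)) for row in cost]


def min_cost_decision(posterior, cost):
    """Index of the decision with minimum expected cost."""
    costs = expected_costs(posterior, cost)
    return min(range(len(costs)), key=lambda i: costs[i])


# Hypothetical credit example: class 0 = good customer, class 1 = bad customer.
# Accepting a bad customer is assumed to cost 5; rejecting a good one costs 1.
cost = [[0.0, 5.0],
        [1.0, 0.0]]

risky = [0.7, 0.3]  # P(good) = 0.7, P(bad) = 0.3
safe = [0.9, 0.1]   # P(good) = 0.9, P(bad) = 0.1

decision_risky = min_cost_decision(risky, cost)
decision_safe = min_cost_decision(safe, cost)
```

With these assumed costs, the applicant with P(bad) = 0.3 is rejected even though "good" is the more probable class, because the expected cost of accepting (1.5) exceeds that of rejecting (0.7). The decision threshold sits at P(bad) = 1/6 rather than 1/2, which is why accurate probability estimates matter most near the cost-dependent boundary.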
New information inequalities on new generalized f-divergence and applications
In this work, we introduce new information inequalities on a new generalized f-divergence in terms of the well-known Chi-square divergence. Further, as an application of the new inequalities, we obtain relations with other standard divergences by using the logarithmic power mean and the identric mean, together with numerical verification on two discrete probability distributions: Binomial and Poisson.
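The abstract above does not state its inequalities, so the sketch below does not reproduce them. It only illustrates the kind of numerical check described, using a different, standard bound that follows from Jensen's inequality: KL(P||Q) ≤ log(1 + χ²(P||Q)) ≤ χ²(P||Q). The parameters `n = 10`, `prob = 0.3` and the matching Poisson rate are arbitrary choices for illustration.

```python
import math


def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def kl_divergence(p, q):
    """KL(P||Q) = sum_k p_k * log(p_k / q_k), skipping terms with p_k = 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)


def chi_square_divergence(p, q):
    """chi^2(P||Q) = sum_k p_k^2 / q_k - 1.

    This form equals sum_k (p_k - q_k)^2 / q_k over the full support of Q,
    so it stays exact even though the lists below stop at k = n,
    where the Binomial mass ends.
    """
    return sum(pk * pk / qk for pk, qk in zip(p, q)) - 1.0


n, prob = 10, 0.3   # illustrative parameters, not from the paper
lam = n * prob      # Poisson rate matched to the Binomial mean
support = range(n + 1)
P = [binomial_pmf(k, n, prob) for k in support]
Q = [poisson_pmf(k, lam) for k in support]

kl = kl_divergence(P, Q)
chi2 = chi_square_divergence(P, Q)
bound = math.log(1.0 + chi2)
```

Printing `kl`, `bound` and `chi2` for a few parameter settings gives a quick sanity check that the ordering holds, which is the same style of verification the abstract reports for its own inequalities.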
Evolving Clustering Algorithms And Their Application For Condition Monitoring, Diagnostics, & Prognostics
Applications of Condition-Based Maintenance (CBM) technology require effective yet generic data-driven methods capable of carrying out diagnostics and prognostics tasks without detailed domain knowledge or human intervention. With the widespread deployment of CBM, improved system availability, operational safety, and enhanced logistics and supply chain performance could be achieved at a lower cost. This dissertation focuses on the development of a Mutual Information based Recursive Gustafson-Kessel-Like (MIRGKL) clustering algorithm, which operates recursively to identify the underlying model structure and parameters from stream-type data. Inspired by the Evolving Gustafson-Kessel-like Clustering (eGKL) algorithm, we applied the notion of mutual information to the well-known Mahalanobis distance as the governing similarity measure throughout. This is also a special case of the Kullback-Leibler (KL) divergence in which between-cluster shape information (governed by the determinant and trace of the covariance matrix) is omitted, and it is only applicable to normally distributed data. In the cluster assignment and consolidation process, we proposed the use of the Chi-square statistic with provision for different probability thresholds. Owing to the symmetry and boundedness brought in by the mutual information formulation, we have shown with real-world data that the algorithm’s performance becomes less sensitive to the probability threshold over the same range of values, which makes system tuning simpler in practice. As a result, the improvement demonstrated by the proposed algorithm has implications for generic data-driven methods for diagnostics, prognostics, generic function approximation and knowledge extraction from stream-type data.
The work in this dissertation demonstrates MIRGKL’s effectiveness in clustering and knowledge representation and shows promising results in diagnostics and prognostics applications.
Uterine contractions clustering based on surface electromyography: an input for pregnancy monitoring
Master's thesis in Biostatistics, presented to the Universidade de Lisboa, through the Faculdade de Ciências, in 2018. Initially, research on uterine contractility relied on two methods: the external tocogram and the intrauterine pressure catheter. Both methods have limitations in assessing the risk of preterm birth and in monitoring pregnancy. The EHG (electrohysterogram) is an alternative to the external tocogram and the intrauterine pressure catheter. It can be applied invasively on the uterine muscle, or non-invasively through electrodes placed on the abdomen. The EHG has been considered a suitable tool for monitoring pregnancy and labor. Body mass index has an almost imperceptible impact on the EHG, which is one of the main characteristics of this method. The EHG can also be used to identify women who are about to go into labor and to support medical decisions on the use of tocolytic therapy (oxytocin antagonist), thereby avoiding unnecessary medication and its side effects. In the literature there are only five published cases in which the main events of the EHG signal were separated: contractions, fetal movements, Alvarez waves and LDBF (Longue Durée Basse Fréquence) waves. In three of the publications the events were separated manually, and in the remaining cases algorithms such as neural networks were applied to the EHG. Alvarez waves and Braxton-Hicks contractions are the most widely recognized. Alvarez waves were first described in the 1950s, and Braxton-Hicks contractions were first described in 1872, detected by palpation. Alvarez waves are occasionally felt by the woman. These waves are localized in a small area of uterine tissue without propagation and can lead to contractions of greater intensity and, consequently, to preterm birth.
Braxton-Hicks contractions are inefficient contractions recorded from the 20th week of pregnancy onward, which become more frequent and intense as pregnancy progresses. These contractions are less localized than Alvarez waves and, during labor, propagate through the entire uterine tissue in a short period of time. Braxton-Hicks contractions are associated with a decrease in fetal heart rate. LDBF waves are long-duration contractions associated with uterine hypertonia, in which the uterine tissue contracts without returning to muscular relaxation, which represents a risk in pregnancy. Two databases were used in this work. The Icelandic database contains 122 recordings from 45 women, of which only 4 correspond to preterm births. The TPEHG (Term-Preterm EHG) database contains 300 recordings, of which 38 correspond to preterm births. Bipolar channels were chosen in this work, since they reduce noise common to both electrodes, such as the maternal ECG (electrocardiogram) or respiratory movements. For both databases the original EHG signals were processed and filtered. For spectral estimation, two classes of methods were considered: parametric and non-parametric. The Welch method was chosen because it represents a good compromise between the two. This method was used to compute the spectrum of each event detected in the EHG signal. To detect events in the EHG signal, five methods based on energy or amplitude were considered. The wavelet method was chosen because, after visual inspection, it was the method that best delineated the contractions. In the Icelandic database 3136 contractions were identified, and in the TPEHG database 4622 contractions were found. The main objective of this thesis is to obtain clusters of contractions detected in the EHG signal. However, contractions are non-stationary time series, and their visual classification is unfeasible in the long term and also difficult to apply in clinical practice.
Several parameters can be extracted from the EHG signal, but the spectrum of the contractions was chosen because it represents the EHG signal and always has the same dimension, regardless of the duration of the contraction. Spectral distances have been used successfully in audio recognition. In this work that approach was applied to EHG processing, with the necessary adjustments. To compare the spectra, 8 different distances were studied: Itakura-Saito, COSH, Itakura, symmetric Itakura, Kullback-Leibler, Jeffrey, Rényi and Jensen-Rényi. Only the symmetric distances were selected for more detailed study since, according to the literature, they are the most suitable for clustering. After comparing the symmetric distances, the Jeffrey divergence was selected for comparing the spectra. Three different clustering methods were evaluated in this thesis: linkage, K-means and K-medoids. Linkage is a hierarchical method. The clusters resulting from hierarchical clustering are organized in a structure called a dendrogram. In hierarchical clustering it is not necessary to fix the number of clusters in advance, which makes it an ideal method for data exploration. K-means and K-medoids are partitioning methods, in which the data are separated into k clusters decided beforehand. The clusters are defined so as to optimize the distance function. In the K-means algorithm, the clusters are based on the proximity of the data points to one another according to a predetermined distance. The difference between K-medoids and K-means is that K-medoids chooses data points as centers, called medoids, whereas K-means uses centroids. After comparing the different clustering methods, average linkage was chosen in this work, since it gave better results both in the separation of the spectra and in the silhouette.
An innovative method is then presented in which the whole spectrum of the contractions automatically detected in the EHG is used for unsupervised clustering. This technique is a contribution to the automatic classification of the different contractions, especially those most recognized in the literature: Alvarez and Braxton-Hicks. It was expected that an isolated cluster containing the LDBF waves would be found, since these represent a risk to the fetus. The main objective was to group similar contraction spectra into a cluster and relate it to the respective contraction type. This task was accomplished through the positive identification of Alvarez and Braxton-Hicks. The clustering also provided some clues about Alvarez waves that were missed by the contraction detection algorithm, a situation for which an alternative method is presented. It is suggested that Alvarez waves be detected with frequency-based methods, such as the instantaneous frequency; however, this method was not developed in this work. The LDBF waves were found in the Braxton-Hicks cluster. It is suggested that the detection of LDBF waves be based on their most distinctive feature: long duration. The preterm cases and pre-labor recordings were not isolated in a cluster, and no relationship was found between gestational age and contraction type. It is concluded that shorter contractions have larger amplitude than longer contractions. Based on previous studies on the electrophysiology of the uterus, it is assumed that the onset of preterm and term labor is associated with specific sequences of different types of contractions, in which Alvarez waves play an important role. The contractions identified as Alvarez and Braxton-Hicks are not used as such in clinical practice, even though most of the contractions detected by the tocogram are Braxton-Hicks.
Interest in Alvarez waves declined rapidly since these waves are practically undetectable by the reference method for contraction detection: the tocogram. The capabilities and resolution of the EHG have led to renewed study of the more subtle contractions, including Alvarez waves. This work is a contribution to research in this area.
An innovative technique is introduced in which an unsupervised clustering method, using as its feature the whole spectrum of contractions automatically detected on the EHG (electrohysterogram), is presented as a contribution to the automatic classification of the different uterine contractions, at least those most recognized in the literature: Alvarez and Braxton-Hicks. It was also expected that the LDBF (Longue Durée Basse Fréquence) components could be clustered, as these pose a fetal risk. The main task was to cluster the spectral descriptions of the contractions and link them to the respective contraction type. That task was completed with positive identification of the Alvarez and Braxton-Hicks contractions. The clustering process also provided clues regarding the Alvarez waves missed by the contraction detection algorithm, for which an alternative technique is suggested but not developed in this work. The LDBF components were found in the Braxton-Hicks cluster. It is suggested that LDBFs be detected based on their most prominent feature: long duration. The rationale behind the selection of the cost function used in the spectral distance algorithm is presented. Spectral distances have been used successfully in audio recognition, and this work represents an application to EHG processing, for which the necessary adjustments had to be implemented. No single cluster pointed to the preterm cases, or indeed to the pre-labor recordings.
It is hypothesized, based on previous studies in uterine electrophysiology, that the initiation of preterm or term labor is associated with triggering contraction sequences of different types, in which the Alvarez waves play a major role. Alvarez and Braxton-Hicks contractions, labeled as such, are not typically used in the clinical environment, even though most of the contractions detected by the tocogram are the latter. Alvarez waves are not usually detectable by the tocogram. Alvarez waves were first detected invasively in the early 1950s, and Braxton-Hicks contractions in 1872 using routine palpation techniques. Interest in Alvarez components declined rapidly because they are practically undetectable by the de facto reference for contraction detection: the tocogram. The capabilities and resolution of the EHG made it possible to revive research on the most subtle uterine contractions, Alvarez waves included, and this work is a contribution to this research area.
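To make the spectral distance selected in the abstract above concrete, here is a minimal sketch of the Jeffrey divergence between two power spectra, written as the symmetrized Kullback-Leibler divergence. Normalizing each spectrum to unit sum, the small `floor` value, and the two example spectra are assumptions for illustration; the thesis may define or scale the distance differently.

```python
import math


def normalize(spectrum, floor=1e-12):
    """Scale a non-negative spectrum to sum to 1; the floor avoids log(0)."""
    clipped = [max(v, floor) for v in spectrum]
    total = sum(clipped)
    return [v / total for v in clipped]


def kl_divergence(p, q):
    """KL(P||Q) = sum_i p_i * log(p_i / q_i); not symmetric in P and Q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def jeffrey_divergence(p, q):
    """J(P, Q) = sum_i (p_i - q_i) * log(p_i / q_i) = KL(P||Q) + KL(Q||P)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


# Made-up power spectra on a common frequency grid (not thesis data).
spec_a = normalize([0.5, 4.0, 2.0, 0.5, 0.1])
spec_b = normalize([0.1, 0.5, 2.0, 4.0, 0.5])

d_ab = jeffrey_divergence(spec_a, spec_b)
d_ba = jeffrey_divergence(spec_b, spec_a)
```

Because `d_ab` and `d_ba` agree up to rounding, a matrix of pairwise Jeffrey divergences is symmetric and can be passed to a hierarchical method such as average linkage, which is the property the abstract cites for preferring symmetric distances.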
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research. Comment: 20 pages, 5 figures
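The sample-based idea in the abstract above can be sketched with importance weights w(x) = p_target(x) / p_source(x). The toy example below assumes a three-valued input, known source and target distributions, and a fixed per-input loss; all names and numbers are hypothetical, and in practice the weights have to be estimated from data rather than computed from known probabilities.

```python
def importance_weights(p_source, p_target):
    """w(x) = p_target(x) / p_source(x); requires p_source(x) > 0 for every x."""
    return {x: p_target[x] / p_source[x] for x in p_source}


def risk(dist, loss):
    """Expected loss under the distribution `dist`."""
    return sum(dist[x] * loss[x] for x in dist)


def weighted_risk(dist, loss, weights):
    """Expected importance-weighted loss under `dist`."""
    return sum(dist[x] * weights[x] * loss[x] for x in dist)


# Hypothetical input distributions and per-input loss of some fixed classifier.
p_source = {"a": 0.6, "b": 0.3, "c": 0.1}
p_target = {"a": 0.2, "b": 0.3, "c": 0.5}
loss = {"a": 0.05, "b": 0.20, "c": 0.40}

weights = importance_weights(p_source, p_target)
source_risk = risk(p_source, loss)
target_risk = risk(p_target, loss)
reweighted_risk = weighted_risk(p_source, loss, weights)
```

Here the unweighted source risk (0.13) understates the target risk (0.27), while the reweighted source risk matches the target risk. That equality relies on the loss for each input being the same in both domains and on the source distribution covering the target's support.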
Learning in the Real World: Constraints on Cost, Space, and Privacy
The sheer demand for machine learning in fields as varied as healthcare, web-search ranking, factory automation, collision prediction, spam filtering, and many others frequently outpaces the intended use-case of machine learning models. In fact, a growing number of companies hire machine learning researchers to rectify this very problem: to tailor existing models, or design new state-of-the-art models, for the setting at hand.
However, we can generalize a large set of the machine learning problems encountered in practical settings into three categories: cost, space, and privacy. The first category (cost) considers problems that need to balance the accuracy of a machine learning model with the cost required to evaluate it. These include problems in web-search, where results need to be delivered to a user in under a second and be as accurate as possible. The second category (space) collects problems that require running machine learning algorithms on low-memory computing devices. For instance, in search-and-rescue operations we may opt to use many small unmanned aerial vehicles (UAVs) equipped with machine learning algorithms for object detection to find a desired search target. These algorithms should be small to fit within the physical memory limits of the UAV (and be energy efficient) while reliably detecting objects. The third category (privacy) considers problems where one wishes to run machine learning algorithms on sensitive data. It has been shown that seemingly innocuous analyses on such data can be exploited to reveal data individuals would prefer to keep private. Thus, nearly any algorithm that runs on patient or economic data falls under this set of problems.
We devise solutions for each of these problem categories, including (i) a fast tree-based model for explicitly trading off accuracy and model evaluation time, (ii) a compression method for the k-nearest neighbor classifier, and (iii) a private causal inference algorithm that protects sensitive data.
Learning and validating clinically meaningful phenotypes from electronic health data
The ever-growing adoption of electronic health records (EHR) to record patients' health journeys has resulted in vast amounts of heterogeneous, complex, and unwieldy information [Hripcsak and Albers, 2013]. Distilling this raw data into clinical insights presents great opportunities and challenges for the research and medical communities. One approach to this distillation is called computational phenotyping. Computational phenotyping is the process of extracting clinically relevant and interesting characteristics from a set of clinical documentation, such as that which is recorded in electronic health records (EHRs). Clinicians can use computational phenotyping, which can be viewed as a form of dimensionality reduction where a set of phenotypes form a latent space, to reason about populations, identify patients for randomized case-control studies, and extrapolate patient disease trajectories. In recent years, high-throughput computational approaches have made strides in extracting potentially clinically interesting phenotypes from data contained in EHR systems.
Tensor factorization methods have shown particular promise in deriving phenotypes. However, phenotyping methods via tensor factorization have the following weaknesses: 1) the extracted phenotypes can lack diversity, which makes them more difficult for clinicians to reason about and utilize in practice, 2) many of the tensor factorization methods are unsupervised and do not utilize side information that may be available about the population or about the relationships between the clinical characteristics in the data (e.g., diagnoses and medications), and 3) validating the clinical relevance of the extracted phenotypes requires domain training and expertise. This dissertation addresses all three of these limitations. First, we present tensor factorization methods that discover sparse and concise phenotypes in unsupervised, supervised, and semi-supervised settings. Second, via two tools we built, we show how to leverage domain expertise in the form of publicly available medical articles to evaluate the clinical validity of the discovered phenotypes. Third, we combine tensor factorization and the phenotype validation tools to guide the discovery process to more clinically relevant phenotypes.
Computational Science, Engineering, and Mathematics