69 research outputs found

    Transductive Ordinal Regression

    Full text link
    Ordinal regression is commonly formulated as a multi-class problem with ordinal constraints. The challenge of designing accurate classifiers for ordinal regression generally increases with the number of classes involved, due to the large number of labeled patterns that are needed. The availability of ordinal class labels, however, is often costly to calibrate or difficult to obtain. Unlabeled patterns, on the other hand, often exist in much greater abundance and are freely available. To take benefits from the abundance of unlabeled patterns, we present a novel transductive learning paradigm for ordinal regression in this paper, namely Transductive Ordinal Regression (TOR). The key challenge of the present study lies in the precise estimation of both the ordinal class label of the unlabeled data and the decision functions of the ordinal classes, simultaneously. The core elements of the proposed TOR include an objective function that caters to several commonly used loss functions casted in transductive settings, for general ordinal regression. A label swapping scheme that facilitates a strictly monotonic decrease in the objective function value is also introduced. Extensive numerical studies on commonly used benchmark datasets including the real world sentiment prediction problem are then presented to showcase the characteristics and efficacies of the proposed transductive ordinal regression. Further, comparisons to recent state-of-the-art ordinal regression methods demonstrate the introduced transductive learning paradigm for ordinal regression led to the robust and improved performance

    Data mining using concepts of independence, unimodality and homophily

    Get PDF
    With the widespread use of information technologies, more and more complex data is generated and collected every day. Such complex data is various in structure, size, type and format, e.g. time series, texts, images, videos and graphs. Complex data is often high-dimensional and heterogeneous, which makes the separation of the wheat (knowledge) from the chaff (noise) more difficult. Clustering is a main mode of knowledge discovery from complex data, which groups objects in such a way that intra-group objects are more similar than inter-group objects. Traditional clustering methods such as k-means, Expectation-Maximization clustering (EM), DBSCAN and spectral clustering are either deceived by "the curse of dimensionality" or spoiled by heterogenous information. So, how to effectively explore complex data? In some cases, people may only have some partial information about the complex data. For example, in social networks, not every user provides his/her profile information such as the personal interests. Can we leverage the limited user information and friendship network wisely to infer the likely labels of the unlabeled users so that the advertisers can do accurate advertising? This is the problem of learning from labeled and unlabeled data, which is literarily attributed to semi-supervised classification. To gain insights into these problems, this thesis focuses on developing clustering and semi-supervised classification methods that are driven by the concepts of independence, unimodality and homophily. The proposed methods leverage techniques from diverse areas, such as statistics, information theory, graph theory, signal processing, optimization and machine learning. Specifically, this thesis develops four methods, i.e. FUSE, ISAAC, UNCut, and wvGN. FUSE and ISAAC are clustering techniques to discover statistically independent patterns from high-dimensional numerical data. UNCut is a clustering technique to discover unimodal clusters in attributed graphs in which not all the attributes are relevant to the graph structure. wvGN is a semi-supervised classification technique using the theory of homophily to infer the labels of the unlabeled vertices in graphs. We have verified our clustering and semi-supervised classification methods on various synthetic and real-world data sets. The results are superior to those of the state-of-the-art.Täglich werden durch den weit verbreiteten Einsatz von Informationstechnologien mehr und mehr komplexe Daten generiert und gesammelt. Diese komplexen Daten unterscheiden sich in der Struktur, Größe, Art und Format. Häufig anzutreffen sind beispielsweise Zeitreihen, Texte, Bilder, Videos und Graphen. Dabei sind diese Daten meist hochdimensional und heterogen, was die Trennung des Weizens ( Wissen ) von der Spreu ( Rauschen ) erschwert. Die Cluster Analyse ist dabei eine der wichtigsten Methoden um aus komplexen Daten wssen zu extrahieren. Dabei werden die Objekte eines Datensatzes in einer solchen Weise gruppiert, dass intra-gruppierte Objekte ähnlicher sind als Objekte anderer Gruppen. Der Einsatz von traditionellen Clustering-Methoden wie k-Means, Expectation-Maximization (EM), DBSCAN und Spektralclustering wird dabei entweder "durch der Fluch der Dimensionalität" erschwert oder ist angesichts der heterogenen Information nicht möglich. Wie erforscht man also solch komplexe Daten effektiv? Darüber hinaus ist es oft der Fall, dass für Objekte solcher Datensätze nur partiell Informationen vorliegen. So gibt in sozialen Netzwerken nicht jeder Benutzer seine Profil-Informationen wie die persönlichen Interessen frei. Können wir diese eingeschränkten Benutzerinformation trotzdem in Kombination mit dem Freundschaftsnetzwerk nutzen, um von von wenigen, einer Klasse zugeordneten Nutzern auf die anderen zu schließen. Beispielsweise um zielgerichtete Werbung zu schalten? Dieses Problem des Lernens aus klassifizierten und nicht klassifizierten Daten wird dem semi-supversised Learning zugeordnet. Um Einblicke in diese Probleme zu gewinnen, konzentriert sich diese Arbeit auf die Entwicklung von Clustering- und semi-überwachten Klassifikationsmethoden, die von den Konzepten der Unabhängigkeit, Unimodalität und Homophilie angetrieben werden. Die vorgeschlagenen Methoden nutzen Techniken aus verschiedenen Bereichen der Statistik, Informationstheorie, Graphentheorie, Signalverarbeitung, Optimierung und des maschinelles Lernen. Dabei stellt diese Arbeit vier Techniken vor: FUSE, ISAAC, UNCut, sowie wvGN. FUSE und ISAAC sind Clustering-Techniken, um statistisch unabhängige Muster aus hochdimensionalen numerischen Daten zu entdecken. UNCut ist eine Clustering-Technik, um unimodale Cluster in attributierten Graphen zu entdecken, in denen die Kanten und Attribute heterogene Informationen liefern. wvGN ist eine halbüberwachte Klassifikationstechnik, die Homophilie verwendet, um von gelabelten Kanten auf ungelabelte Kanten im Graphen zu schließen. Wir haben diese Clustering und semi-überwachten Klassifizierungsmethoden auf verschiedenen synthetischen und realen Datensätze überprüft. Die Ergebnisse sind denen von bisherigen State-of-the-Art-Methoden überlegen

    A submodular optimization framework for never-ending learning : semi-supervised, online, and active learning.

    Get PDF
    The revolution in information technology and the explosion in the use of computing devices in people\u27s everyday activities has forever changed the perspective of the data mining and machine learning fields. The enormous amounts of easily accessible, information rich data is pushing the data analysis community in general towards a shift of paradigm. In the new paradigm, data comes in the form a stream of billions of records received everyday. The dynamic nature of the data and its sheer size makes it impossible to use the traditional notion of offline learning where the whole data is accessible at any time point. Moreover, no amount of human resources is enough to get expert feedback on the data. In this work we have developed a unified optimization based learning framework that approaches many of the challenges mentioned earlier. Specifically, we developed a Never-Ending Learning framework which combines incremental/online, semi-supervised, and active learning under a unified optimization framework. The established framework is based on the class of submodular optimization methods. At the core of this work we provide a novel formulation of the Semi-Supervised Support Vector Machines (S3VM) in terms of submodular set functions. The new formulation overcomes the non-convexity issues of the S3VM and provides a state of the art solution that is orders of magnitude faster than the cutting edge algorithms in the literature. Next, we provide a stream summarization technique via exemplar selection. This technique makes it possible to keep a fixed size exemplar representation of a data stream that can be used by any label propagation based semi-supervised learning technique. The compact data steam representation allows a wide range of algorithms to be extended to incremental/online learning scenario. Under the same optimization framework, we provide an active learning algorithm that constitute the feedback between the learning machine and an oracle. Finally, the developed Never-Ending Learning framework is essentially transductive in nature. Therefore, our last contribution is an inductive incremental learning technique for incremental training of SVM using the properties of local kernels. We demonstrated through this work the importance and wide applicability of the proposed methodologies

    General supervision via probabilistic transformations

    Get PDF
    Different types of training data have led to numerous schemes for supervised classification. Current learning techniques are tailored to one specific scheme and cannot handle general ensembles of training samples. This paper presents a unifying framework for supervised classification with general ensembles of training samples, and proposes the learning methodology of generalized robust risk minimization (GRRM). The paper shows how current and novel supervision schemes can be addressed under the proposed framework by representing the relationship between examples at prediction and training via probabilistic transformations. The results show that GRRM can handle different types of training samples in a unified manner, and enable new supervision schemes that aggregate general ensembles of training samples.RYC-2016-1938

    Towards multi-modal face recognition in the wild

    Get PDF
    Face recognition aims at utilizing the facial appearance for the identification or verification of human individuals, and has been one of the fundamental research areas in computer vision. Over the past a few decades, face recognition has drawn significant attention due to its potential use in biometric authentication, surveillance, security, robotics and so on. Many existing face recognition methods are evaluated with faces collected in labs, and does not generalize well in reality. Compared with faces captured in labs, faces in the wild are inherently multi-modal distributed. The multi-modality issue leads to significant intra-class variations, and usually requires a large amount of labeled samples to cover the wide range of modalities. These difficulties make unconstrained face recognition even more challenging, and pose a considerable gap between laboratorial research and industrial practice. To bridge the gap, we set focus on multi-modal face recognition in the unconstrained environment in this thesis. This thesis introduces several approaches to address the aforementioned specific challenges. Accordingly, the approaches included can be generally categorized into two research directions. The first direction explores a series of deep learning based methods in handling the large intra-class variations in multi-modal face recognition. The combination of modalities in the wild is unpredictable, and thus is difficult to explicitly define in advance. It is desirable to design a framework adaptive to the modality-driven variations in the specific scenarios. To this end, Deep Neural Network (DNN) is adopted as the basis, as DNN learns the feature representation and the classifier with reference to the specific target objective directly. To begin with, we aims to learn a part-based facial representation with deep neural networks to address face verification in the wild. In particular, the proposed framework consists of two deliberate components: a Deep Mixture Model (DMM) to find accurate patch correspondence and a Convolutional Fusion Network (CFN) to learn the fusion of multiple patch-specific facial features. This framework is specifically designed to handle local distortions caused by modalities such as pose and illumination. The next work introduces the conditional partition of the sample space into deep learning to tackle face recognition with regard to modalities in a general sense. Without any prior knowledge of modality, the proposed network learns the hidden modalities of faces, based on which the initial sample space is partitioned so that modality-specific feature representation can be learnt accordingly. The other direction is Semi-Supervised Learning with videos to tackle the deficiency of labeled training samples. In particular, a novel Semi-Supervised Learning strategy is proposed for the problem of celebrity identification by harvesting the “confident” unlabeled samples from the vast video sources. The video context information is adopted to iteratively enrich the diversity of the initial labeled set so that the performance of learnt classifier can be gradually improved. In this thesis, all these works are evaluated with extensive experiments in the corresponding sections. The connection and difference among the three approaches are further discussed in the conclusion section.Open Acces

    Large-scale inference in the focally damaged human brain

    Get PDF
    Clinical outcomes in focal brain injury reflect the interactions between two distinct anatomically distributed patterns: the functional organisation of the brain and the structural distribution of injury. The challenge of understanding the functional architecture of the brain is familiar; that of understanding the lesion architecture is barely acknowledged. Yet, models of the functional consequences of focal injury are critically dependent on our knowledge of both. The studies described in this thesis seek to show how machine learning-enabled high-dimensional multivariate analysis powered by large-scale data can enhance our ability to model the relation between focal brain injury and clinical outcomes across an array of modelling applications. All studies are conducted on internationally the largest available set of MR imaging data of focal brain injury in the context of acute stroke (N=1333) and employ kernel machines at the principal modelling architecture. First, I examine lesion-deficit prediction, quantifying the ceiling on achievable predictive fidelity for high-dimensional and low-dimensional models, demonstrating the former to be substantially higher than the latter. Second, I determine the marginal value of adding unlabelled imaging data to predictive models within a semi-supervised framework, quantifying the benefit of assembling unlabelled collections of clinical imaging. Third, I compare high- and low-dimensional approaches to modelling response to therapy in two contexts: quantifying the effect of treatment at the population level (therapeutic inference) and predicting the optimal treatment in an individual patient (prescriptive inference). I demonstrate the superiority of the high-dimensional approach in both settings

    Transfer learning from multiple source domains via consensus regularization

    Full text link
    Recent years have witnessed an increased interest in trans-fer learning. Despite the vast amount of research performed in this field, there are remaining challenges in applying the knowledge learnt from multiple source domains to a target domain. First, data from multiple source domains can be se-mantically related, but have different distributions. It is not clear how to exploit the distribution differences among mul-tiple source domains to boost the learning performance in a target domain. Second, many real-world applications de-mand this transfer learning to be performed in a distributed manner. To meet these challenges, we propose a consensus regularization framework for transfer learning from multi-ple source domains to a target domain. In this framework, a local classifier is trained by considering both local data available in a source domain and the prediction consensus with the classifiers from other source domains. In addition, the training algorithm can be implemented in a distributed manner, in which all the source-domains are treated as slave nodes and the target domain is used as the master node. To combine the training results from multiple source domains, it only needs share some statistical data rather than the full contents of their labeled data. This can modestly relieve the privacy concerns and avoid the need to upload all data to a central location. Finally, our experimental results show the effectiveness of our consensus regularization learning
    corecore