163 research outputs found

    The Emerging Trends of Multi-Label Learning

    Full text link
    Exabytes of data are generated daily by humans, leading to a growing need for new efforts to address the grand challenges that big data poses for multi-label learning. For example, extreme multi-label classification is an active and rapidly growing research area that deals with classification tasks involving an extremely large number of classes or labels, and utilizing massive data with limited supervision to build multi-label classification models is becoming valuable in practical applications. Beyond these, there are tremendous efforts on how to harvest the strong learning capability of deep learning to better capture label dependencies in multi-label learning, which is the key for deep learning to address real-world classification tasks. However, there has been a lack of systematic studies focusing explicitly on the emerging trends and new challenges of multi-label learning in the era of big data. It is imperative to call for a comprehensive survey to fulfill this mission and delineate future research directions and new applications.
    Comment: Accepted to TPAMI 202

    ๊ฐ•์ธํ•œ ์ €์ฐจ์› ๊ณต๊ฐ„์˜ ํ•™์Šต๊ณผ ๋ถ„๋ฅ˜: ํฌ์†Œ ๋ฐ ์ €๊ณ„์ˆ˜ ํ‘œํ˜„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2017. 2. ์˜ค์„ฑํšŒ.Learning a subspace structure based on sparse or low-rank representation has gained much attention and has been widely used over the past decade in machine learning, signal processing, computer vision, and robotic literatures to model a wide range of natural phenomena. Sparse representation is a powerful tool for high-dimensional data such as images, where the goal is to represent or compress the cumbersome data using a few representative samples. Low-rank representation is a generalization of the sparse representation in 2D space. Behind the successful outcomes, many efforts have been made for learning sparse or low-rank representation effciently. However, they are still ineffcient for complex data structures and lack robustness under the existence of various noises including outliers and missing data, because many existing algorithms relax the ideal optimization problem to a tractable one without considering computational and memory complexities. Thus, it is important to use a good representation algorithm which is effciently solvable and robust against unwanted corruptions. In this dissertation, our main goal is to learn algorithms with both robustness and effciency under noisy environments. As for sparse representation, most of the optimization problems are relaxed to convex ones based on surrogate measures, such as the l1-norm, to resolve the computational intractability and high noise sensitivity of the original sparse representation problem based on the l0-norm. However, if the system at interest, other than the sparsity measure, is inherently nonconvex, then using a convex sparsity measure may not be the best choice for the problems. From this perspective, we propose desirable criteria to be a good nonconvex sparsity measure and suggest a corresponding family of measure. 
The proposed family admits a simple measure that enables efficient computation and embraces the benefits of both the l0- and l1-norms; most importantly, its gradient vanishes slowly, unlike that of the l0-norm, which is desirable from an optimization perspective. For low-rank representation, we first present an efficient l1-norm based low-rank matrix approximation algorithm using the proposed alternating rectified gradient methods to solve an l1-norm minimization problem, since conventional algorithms are very slow at solving the l1-norm based alternating minimization problem. The proposed methods seek an optimal direction under a constraint that limits the search domain, avoiding the difficulty that arises from the ambiguity of representing the two optimization variables. This is extended to an algorithm with an explicit smoothness regularizer and an orthogonality constraint for better efficiency, solved under the augmented Lagrangian framework. To give a more stable solution with flexible rank estimation in the presence of heavy corruptions, we present a new solution based on elastic-net regularization of singular values, which allows a faster algorithm than existing rank minimization methods, avoids heavy operations, and is more stable than state-of-the-art low-rank approximation algorithms due to its strong convexity. As a result, the proposed method leads to a holistic approach that enables both rank minimization and bilinear factorization. Moreover, as an extension of the previous methods, which operate on unstructured matrices, we apply recent advances in rank minimization to structured matrices for robust kernel subspace estimation under noisy scenarios. Last but not least, we extend the low-rank approximation problem, which assumes a single subspace, to one in which the data lie in a union of multiple subspaces, a setting closely related to subspace clustering.
While many recent studies are based on sparse or low-rank representation, the grouping effect among similar samples has seldom been considered together with the sparse or low-rank representation. Thus, we propose robust group subspace clustering algorithms based on sparse and low-rank representation with explicit subspace grouping. To resolve the fundamental issue of computational complexity in existing subspace clustering algorithms, we suggest a fully scalable low-rank subspace clustering approach that achieves linear complexity in the number of samples. Extensive experimental results on various applications, including computer vision and robotics, using benchmark and real-world data sets verify that our suggested solutions to the existing issues of sparse and low-rank representations are considerably robust, effective, and practically applicable.
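    The elastic-net regularization of singular values mentioned above has a simple closed-form proximal step: soft-threshold the singular values (the nuclear-norm part), then shrink by the quadratic part. A minimal NumPy sketch, where the matrix sizes and regularization weights are illustrative assumptions rather than the dissertation's settings:

```python
import numpy as np

def elastic_net_sv_prox(Y, lam1, lam2):
    # Proximal operator of lam1*||X||_* + (lam2/2)*||X||_F^2 at Y:
    # soft-threshold the singular values, then scale by 1/(1 + lam2).
    # The strongly convex Frobenius term is what stabilizes the solution.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_new = np.maximum(s - lam1, 0.0) / (1.0 + lam2)
    return (U * s_new) @ Vt

# Usage: denoise a noisy observation of a rank-2 matrix.
rng = np.random.default_rng(0)
L = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
Y = L + 0.1 * rng.standard_normal((50, 40))
X = elastic_net_sv_prox(Y, lam1=2.0, lam2=0.1)
```

    With the threshold above the noise level, the small noise-driven singular values are zeroed out and the recovered matrix has the true rank.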

    Simultaneous subspace clustering and cluster number estimating based on triplet relationship

    Get PDF
    In this paper, we propose a unified framework that discovers the number of clusters and groups the data points into different clusters using subspace clustering simultaneously. Real data distributed in a high-dimensional space can be disentangled into a union of low-dimensional subspaces, which can benefit various applications. To explore such intrinsic structure, state-of-the-art subspace clustering approaches often optimize a self-representation problem among all samples to construct a pairwise affinity graph for spectral clustering. However, a graph with pairwise similarities lacks robustness for segmentation, especially for samples that lie on the intersection of two subspaces. To address this problem, we design a hyper-correlation based data structure, termed the triplet relationship, which reveals high relevance and local compactness among three samples. The triplet relationship can be derived from the self-representation matrix and utilized to iteratively assign the data points to clusters. Based on the triplet relationship, we propose a unified optimization scheme that automatically calculates clustering assignments. Specifically, we optimize a model selection reward and a fusion reward by simultaneously maximizing the similarity of triplets from different clusters while minimizing the correlation of triplets from the same cluster. The proposed algorithm also automatically reveals the number of clusters and fuses groups to avoid over-segmentation. Extensive experimental results on both synthetic and real-world datasets validate the effectiveness and robustness of the proposed method.
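    For context, the pairwise-affinity pipeline that the triplet relationship improves on can be sketched in a few lines: symmetrize the self-representation matrix into an affinity graph, then read the number of clusters off the spectrum of its normalized Laplacian. This is a standard background sketch, not the paper's triplet algorithm, and the block-diagonal toy matrix is a made-up illustration:

```python
import numpy as np

def cluster_count_from_affinity(C):
    # Symmetrize the self-representation coefficients into an affinity graph.
    W = np.abs(C) + np.abs(C).T
    d = np.maximum(W.sum(axis=1), 1e-12)
    # Normalized graph Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    # For a block-diagonal affinity, the multiplicity of eigenvalue 0
    # equals the number of connected components, i.e. clusters.
    eigvals = np.linalg.eigvalsh(L)
    return int(np.sum(eigvals < 1e-8))

# Usage: two disconnected blocks of samples -> two clusters.
C = np.zeros((6, 6))
C[:3, :3] = 0.5
C[3:, 3:] = 0.5
k = cluster_count_from_affinity(C)
```

    The fragility the paper targets shows up exactly when cross-block entries of C are nonzero for samples near subspace intersections, blurring this eigenvalue gap.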

    A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond

    Full text link
    Non-autoregressive (NAR) generation, first proposed in neural machine translation (NMT) to speed up inference, has attracted much attention in both the machine learning and natural language processing communities. While NAR generation can significantly accelerate inference for machine translation, the speedup comes at the cost of reduced translation accuracy compared to its counterpart, autoregressive (AR) generation. In recent years, many new models and algorithms have been proposed to bridge the accuracy gap between NAR and AR generation. In this paper, we conduct a systematic survey with comparisons and discussions of various non-autoregressive translation (NAT) models from different aspects. Specifically, we categorize the efforts of NAT into several groups, including data manipulation, modeling methods, training criteria, decoding algorithms, and benefits from pre-trained models. Furthermore, we briefly review other applications of NAR models beyond machine translation, such as dialogue generation, text summarization, grammar error correction, semantic parsing, speech synthesis, and automatic speech recognition. In addition, we discuss potential directions for future exploration, including relaxing the dependency on knowledge distillation (KD), dynamic length prediction, pre-training for NAR, and wider applications. We hope this survey can help researchers capture the latest progress in NAR generation, inspire the design of advanced NAR models and algorithms, and enable industry practitioners to choose appropriate solutions for their applications. The web page of this survey is at \url{https://github.com/LitterBrother-Xiao/Overview-of-Non-autoregressive-Applications}.
    Comment: 25 pages, 11 figures, 4 tables
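    The AR-versus-NAR trade-off described above comes down to the number of sequential model calls during decoding. A toy sketch of the two decoding loops; the `step_fn` and `parallel_fn` stand-ins are hypothetical placeholders for real model calls, not any NAT system's API:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, T = 100, 8

def ar_decode(step_fn, length):
    # Autoregressive: token t depends on tokens < t,
    # so decoding costs `length` sequential model calls.
    tokens, calls = [], 0
    for _ in range(length):
        tokens.append(step_fn(tokens))
        calls += 1
    return tokens, calls

def nar_decode(parallel_fn, length):
    # Non-autoregressive: all positions are predicted independently
    # in a single call, trading some accuracy for speed.
    return parallel_fn(length), 1

# Stand-ins for p(y_t | y_<t, x) and p(y_1..y_T | x).
step_fn = lambda prefix: int(rng.integers(VOCAB))
parallel_fn = lambda n: rng.integers(VOCAB, size=n).tolist()

ar_tokens, ar_calls = ar_decode(step_fn, T)
nar_tokens, nar_calls = nar_decode(parallel_fn, T)
```

    The accuracy gap arises because the single parallel call cannot condition one output position on another, which is what the surveyed NAT techniques try to compensate for.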

    Collaborative-demographic hybrid for financial product recommendation

    Get PDF
    Internship report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Due to the increased availability of mature data mining and analysis technologies supporting CRM processes, several financial institutions are striving to leverage customer data and integrate insights regarding customer behaviour, needs, and preferences into their marketing approach. As decision support systems assisting marketing and commercial efforts, recommender systems applied to the financial domain have been gaining increased attention. This thesis studies a Collaborative-Demographic Hybrid Recommendation System, applied to the financial services sector, based on real data provided by a Portuguese private commercial bank. This work establishes a framework to support account managers' advice on which financial product is most suitable for each of the bank's corporate clients. The recommendation problem is further developed by conducting a performance comparison between multi-output regression and multiclass classification prediction approaches. Experimental results indicate that multiclass architectures are better suited for the prediction task, outperforming alternative multi-output regression models on the evaluation metrics considered. Moreover, a multiclass feed-forward neural network combined with recursive feature elimination is identified as the top-performing algorithm, yielding a 10-fold cross-validated F1 measure of 83.16%, with corresponding precision and recall of 84.34% and 85.29%, respectively. Overall, this study provides important contributions for positioning the bank's commercial efforts around customers' future requirements. By allowing for a better understanding of customers' needs and preferences, the proposed recommender allows for more personalized and targeted marketing contacts, leading to higher conversion rates, corporate profitability, and customer satisfaction and loyalty.
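    Recursive feature elimination as used above can be sketched without any particular library: fit a linear model, drop the feature with the smallest absolute weight, and repeat. This is a minimal illustration with a least-squares classifier standing in for the report's feed-forward network; the synthetic data and `n_keep` are assumptions:

```python
import numpy as np

def rfe_select(X, y, n_keep):
    # Minimal recursive feature elimination: fit a least-squares
    # classifier on the active features, drop the one with the
    # smallest absolute weight, and repeat until n_keep remain.
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        w, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        active.pop(int(np.argmin(np.abs(w))))
    return active

# Usage: only features 3 and 7 actually drive the labels.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = np.sign(2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.standard_normal(200))
kept = rfe_select(X, y, n_keep=2)
```

    Eliminating features one at a time, rather than all below a threshold at once, lets the refitted weights adapt after each removal, which is the point of the recursive variant.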

    ๊ฐœ์ธํ™” ๊ฒ€์ƒ‰ ๋ฐ ํŒŒํŠธ๋„ˆ์‰ฝ ์„ ์ •์„ ์œ„ํ•œ ์‚ฌ์šฉ์ž ํ”„๋กœํŒŒ์ผ๋ง

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์น˜์˜๊ณผํ•™๊ณผ, 2014. 2. ๊น€ํ™๊ธฐ.The secret of change is to focus all of your energy not on fighting the old, but on building the new. - Socrates The automatic identification of user intention is an important but highly challenging research problem whose solution can greatly benefit information systems. In this thesis, I look at the problem of identifying sources of user interests, extracting latent semantics from it, and modelling it as a user profile. I present algorithms that automatically infer user interests and extract hidden semantics from it, specifically aimed at improving personalized search. I also present a methodology to model user profile as a buyer profile or a seller profile, where the attributes of the profile are populated from a controlled vocabulary. The buyer profiles and seller profiles are used in partnership match. In the domain of personalized search, first, a novel method to construct a profile of user interests is proposed which is based on mining anchor text. Second, two methods are proposed to builder a user profile that gather terms from a folksonomy system where matrix factorization technique is explored to discover hidden relationship between them. The objective of the methods is to discover latent relationship between terms such that contextually, semantically, and syntactically related terms could be grouped together, thus disambiguating the context of term usage. The profile of user interests is also analysed to judge its clustering tendency and clustering accuracy. Extensive evaluation indicates that a profile of user interests, that can correctly or precisely disambiguate the context of user query, has a significant impact on the personalized search quality. In the domain of partnership match, an ontology termed as partnership ontology is proposed. The attributes or concepts, in the partnership ontology, are features representing context of work. 
It is used by users to lay down their requirements as buyer profiles or seller profiles. A semantic similarity measure is defined to compute a ranked list of matching seller profiles for a given buyer profile.
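    The matrix factorization step for discovering latent relationships between tags can be sketched with a truncated SVD: project tags into a low-dimensional latent space and compare them there by cosine similarity. The tiny tag-by-document matrix below is a made-up illustration, not the thesis's folksonomy data:

```python
import numpy as np

# Toy tag-by-document matrix: rows are tags, columns are documents.
# Tags 0 and 1 share documents, as do tags 2 and 3 (hypothetical counts).
M = np.array([
    [3, 2, 0, 0, 1],
    [2, 3, 1, 0, 0],
    [0, 0, 3, 2, 0],
    [0, 1, 2, 3, 0],
], dtype=float)

def latent_tag_similarity(M, k):
    # Truncated SVD projects tags into a k-dimensional latent space;
    # cosine similarity there groups contextually related tags even
    # when they rarely co-occur directly.
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    Z = U[:, :k] * s[:k]
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return Z @ Z.T

S = latent_tag_similarity(M, k=2)
```

    Tags within the same topical group come out more similar to each other than to tags across groups, which is the disambiguation effect the clustered user interest profiles rely on.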

    Probabilistic Inference in Piecewise Graphical Models

    No full text
    In many applications of probabilistic inference, the models contain piecewise densities that are differentiable except at partition boundaries. For instance, (1) some models may intrinsically have finite support, being constrained to certain regions; (2) arbitrary density functions may be approximated by mixtures of piecewise functions such as piecewise polynomials or piecewise exponentials; (3) distributions derived from other distributions (via random variable transformations) may be highly piecewise; (4) in applications of Bayesian inference such as Bayesian discrete classification and preference learning, the likelihood functions may be piecewise; (5) context-specific conditional probability density functions (tree-CPDs) are intrinsically piecewise; (6) influence diagrams (generalizations of Bayesian networks in which decision making problems are modeled along with probabilistic inference) are piecewise in many applications; (7) in probabilistic programming, conditional statements lead to piecewise models. As we will show, exact inference on piecewise models is often not scalable (when applicable at all), and the performance of existing approximate inference techniques on such models is usually quite poor. This thesis fills this gap by presenting scalable and accurate algorithms for inference in piecewise probabilistic graphical models. Our first contribution is a variation of the Gibbs sampling algorithm that achieves an exponential sampling speedup on a large class of models (including Bayesian models with piecewise likelihood functions). As a second contribution, we show that for a large range of models, the time-consuming Gibbs sampling computations that are traditionally carried out per sample can be computed symbolically, once, prior to the sampling process. Among many potential applications, the resulting symbolic Gibbs sampler can be used for fully automated reasoning in the presence of deterministic constraints among random variables.
As a third contribution, we are motivated by the behavior of Hamiltonian dynamics in optics, in particular the reflection and refraction of light at refractive surfaces, to present a new Hamiltonian Monte Carlo method that demonstrates significantly improved performance on piecewise models. We hope the present work represents a step towards scalable and accurate inference in an important class of probabilistic models that has largely been overlooked in the literature.
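    For readers unfamiliar with the baseline being accelerated, a plain Gibbs sampler alternates draws from each variable's full conditional; the thesis's contribution is to derive the analogous conditionals of piecewise models symbolically, once, rather than recomputing them per sample. A minimal sketch on a smooth (non-piecewise) toy target, a correlated bivariate Gaussian:

```python
import numpy as np

def gibbs_bivariate_gaussian(rho, n_samples, rng):
    # Zero-mean bivariate Gaussian with unit variances and correlation rho.
    # Each full conditional is available in closed form:
    #   x | y ~ N(rho * y, 1 - rho^2),  y | x ~ N(rho * x, 1 - rho^2).
    x = y = 0.0
    sd = np.sqrt(1.0 - rho ** 2)
    out = np.empty((n_samples, 2))
    for t in range(n_samples):
        x = rng.normal(rho * y, sd)   # draw x from its conditional
        y = rng.normal(rho * x, sd)   # draw y from its conditional
        out[t] = (x, y)
    return out

samples = gibbs_bivariate_gaussian(rho=0.8, n_samples=20000,
                                   rng=np.random.default_rng(0))
emp_rho = np.corrcoef(samples[2000:].T)[0, 1]  # estimate after burn-in
```

    In piecewise models the conditional on each line above would itself be a piecewise density whose segments must be located and normalized, which is the per-sample cost the symbolic sampler moves out of the loop.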

    Mathematical and statistical methods for single cell data

    Get PDF
    The availability of single-cell data has increased rapidly in recent years and presents interesting new challenges for the analysis of such data and the modelling of the processes that generate it. In this thesis, we attempt to address some of those challenges by developing and exploring mathematical and statistical models for the evolution of population distributions over time, and methods for using aggregated single-cell data from individual patients in predictive diagnostic models of disease. In the first part of the thesis, we explore structured population models, a class of partial differential equations describing the evolution of individual-level cell properties in a population over time. We begin by analysing an age-structured model of cell growth in which rates of proliferation and cell death are controlled by an external resource. We follow this with a method for extracting properties of a more general class of structured population models directly from single-cell data. In the final part of the thesis, we develop a flexible Bayesian statistical framework for building predictive models from possibly high-dimensional data collected from patients using single-cell technologies, and find that its performance is promising compared to a number of existing methods.
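    An age-structured model of the kind explored in the first part can be sketched with an upwind discretization of the McKendrick-von Foerster equation; the death rate, birth rate, and grid below are illustrative assumptions, not the thesis's calibrated model:

```python
import numpy as np

def evolve_age_structure(n0, mu, birth_rate, dt, steps):
    # McKendrick-von Foerster equation dn/dt + dn/da = -mu(a) * n with a
    # renewal boundary condition n(0, t) = birth_rate * total population.
    # The age step equals dt, so transport is an exact one-bin shift.
    n = n0.copy()
    for _ in range(steps):
        total = n.sum() * dt                 # total population (integral of n)
        shifted = np.empty_like(n)
        shifted[1:] = n[:-1]                 # everyone ages by one bin
        shifted[0] = birth_rate * total      # newborns enter the first bin
        n = shifted * np.exp(-mu * dt)       # age-dependent mortality
    return n

ages = np.linspace(0.0, 10.0, 101)
dt = ages[1] - ages[0]
n0 = np.exp(-ages)                           # initial age density
mu = 0.1 + 0.05 * ages                       # mortality rises with age
n_final = evolve_age_structure(n0, mu, birth_rate=0.05, dt=dt, steps=200)
```

    With the birth rate below the mortality scale, the total population decays over time; making proliferation and death depend on an external resource, as in the thesis, would replace the fixed `birth_rate` and `mu` with functions of the resource state.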