7,903 research outputs found

    Multi-view constrained clustering with an incomplete mapping between views

    Full text link
    Multi-view learning algorithms typically assume a complete bipartite mapping between the different views in order to exchange information during the learning process. However, many applications provide only a partial mapping between the views, creating a challenge for current methods. To address this problem, we propose a multi-view algorithm based on constrained clustering that can operate with an incomplete mapping. Given a set of pairwise constraints in each view, our approach propagates these constraints using a local similarity measure to those instances that can be mapped to the other views, allowing the propagated constraints to be transferred across views via the partial mapping. It uses co-EM to iteratively estimate the propagation within each view based on the current clustering model, transfer the constraints across views, and then update the clustering model. By alternating the learning process between views, this approach produces a unified clustering model that is consistent with all views. We show that this approach significantly improves clustering performance over several other methods for transferring constraints and allows multi-view clustering to be reliably applied when given a limited mapping between the views. Our evaluation reveals that the propagated constraints have high precision with respect to the true clusters in the data, explaining their benefit to clustering performance in both single- and multi-view learning scenarios

    Efficient Mining of Heterogeneous Star-Structured Data

    Get PDF
    Many of the real world clustering problems arising in data mining applications are heterogeneous in nature. Heterogeneous co-clustering involves simultaneous clustering of objects of two or more data types. While pairwise co-clustering of two data types has been well studied in the literature, research on high-order heterogeneous co-clustering is still limited. In this paper, we propose a graph theoretical framework for addressing star- structured co-clustering problems in which a central data type is connected to all the other data types. Partitioning this graph leads to co-clustering of all the data types under the constraints of the star-structure. Although, graph partitioning approach has been adopted before to address star-structured heterogeneous complex problems, the main contribution of this work lies in an e cient algorithm that we propose for partitioning the star-structured graph. Computationally, our algorithm is very quick as it requires a simple solution to a sparse system of overdetermined linear equations. Theoretical analysis and extensive exper- iments performed on toy and real datasets demonstrate the quality, e ciency and stability of the proposed algorithm

    EC3: Combining Clustering and Classification for Ensemble Learning

    Full text link
    Classification and clustering algorithms have been proved to be successful individually in different contexts. Both of them have their own advantages and limitations. For instance, although classification algorithms are more powerful than clustering methods in predicting class labels of objects, they do not perform well when there is a lack of sufficient manually labeled reliable data. On the other hand, although clustering algorithms do not produce label information for objects, they provide supplementary constraints (e.g., if two objects are clustered together, it is more likely that the same label is assigned to both of them) that one can leverage for label prediction of a set of unknown objects. Therefore, systematic utilization of both these types of algorithms together can lead to better prediction performance. In this paper, We propose a novel algorithm, called EC3 that merges classification and clustering together in order to support both binary and multi-class classification. EC3 is based on a principled combination of multiple classification and multiple clustering methods using an optimization function. We theoretically show the convexity and optimality of the problem and solve it by block coordinate descent method. We additionally propose iEC3, a variant of EC3 that handles imbalanced training data. We perform an extensive experimental analysis by comparing EC3 and iEC3 with 14 baseline methods (7 well-known standalone classifiers, 5 ensemble classifiers, and 2 existing methods that merge classification and clustering) on 13 standard benchmark datasets. We show that our methods outperform other baselines for every single dataset, achieving at most 10% higher AUC. Moreover our methods are faster (1.21 times faster than the best baseline), more resilient to noise and class imbalance than the best baseline method.Comment: 14 pages, 7 figures, 11 table

    Clustering and Community Detection in Directed Networks: A Survey

    Full text link
    Networks (or graphs) appear as dominant structures in diverse domains, including sociology, biology, neuroscience and computer science. In most of the aforementioned cases graphs are directed - in the sense that there is directionality on the edges, making the semantics of the edges non symmetric. An interesting feature that real networks present is the clustering or community structure property, under which the graph topology is organized into modules commonly called communities or clusters. The essence here is that nodes of the same community are highly similar while on the contrary, nodes across communities present low similarity. Revealing the underlying community structure of directed complex networks has become a crucial and interdisciplinary topic with a plethora of applications. Therefore, naturally there is a recent wealth of research production in the area of mining directed graphs - with clustering being the primary method and tool for community detection and evaluation. The goal of this paper is to offer an in-depth review of the methods presented so far for clustering directed networks along with the relevant necessary methodological background and also related applications. The survey commences by offering a concise review of the fundamental concepts and methodological base on which graph clustering algorithms capitalize on. Then we present the relevant work along two orthogonal classifications. The first one is mostly concerned with the methodological principles of the clustering algorithms, while the second one approaches the methods from the viewpoint regarding the properties of a good cluster in a directed network. Further, we present methods and metrics for evaluating graph clustering results, demonstrate interesting application domains and provide promising future research directions.Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear
    • …
    corecore