
5 research outputs found

Clustering in Aggregated User Profiles Across Multiple Social Networks

Author: Juneja Dimple
Pillai Anuradha
Virmani Charu
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/12/2017

A social network is an abstraction of related groups that interact among themselves to develop relationships. However, to analyze any relationship and the psychology behind it, clustering plays a vital role. Clustering enhances the predictability and discovery of like-mindedness among users. This article uses ensemble K-means clustering to extract entities and their corresponding interests, according to skills and location, by aggregating user profiles across multiple online social networks. The proposed ensemble clustering uses the well-known K-means algorithm to improve results for the aggregated user profiles. The approach produces an ensemble similarity measure and provides 70% better results than taking a fixed value of K or guessing a value of K, without altering the clustering method. The paper states that good ensemble clusters can be generated to anticipate the discoverability of a user for a particular interest.
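The idea of replacing a single fixed K with an ensemble of K-means runs can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not define their ensemble similarity measure, so the sketch assumes a generic co-association score (the fraction of runs in which two points share a cluster). The function names `kmeans` and `co_association`, the deterministic initialization, and the toy points are all hypothetical.

```python
from itertools import combinations

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on tuples of floats; returns one label per point."""
    # Assumption: deterministic init from evenly spaced input points.
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def co_association(points, ks):
    """Fraction of runs (one per K in ks) in which each pair shares a cluster."""
    counts = {pair: 0 for pair in combinations(range(len(points)), 2)}
    for k in ks:
        labels = kmeans(points, k)
        for i, j in counts:
            counts[(i, j)] += labels[i] == labels[j]
    return {pair: c / len(ks) for pair, c in counts.items()}

# Toy "profiles" as 2-D feature vectors: two well-separated groups.
profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
similarity = co_association(profiles, ks=[2, 3])
```

Pairs that stay together for every K receive a score of 1, while pairs that are split at some K score lower, so the result does not hinge on one guessed value of K.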

EC3: Combining Clustering and Classification for Ensemble Learning

Author: Chakraborty Tanmoy
Publication date: 29/08/2017

Classification and clustering algorithms have each proved successful in different contexts, and each has its own advantages and limitations. For instance, although classification algorithms are more powerful than clustering methods at predicting the class labels of objects, they do not perform well when sufficient reliable, manually labeled data is lacking. On the other hand, although clustering algorithms do not produce label information for objects, they provide supplementary constraints (e.g., if two objects are clustered together, it is more likely that the same label is assigned to both of them) that one can leverage for label prediction of a set of unknown objects. Therefore, systematic use of both types of algorithms together can lead to better prediction performance. In this paper, we propose a novel algorithm, called EC3, that merges classification and clustering in order to support both binary and multi-class classification. EC3 is based on a principled combination of multiple classification and multiple clustering methods using an optimization function. We theoretically show the convexity and optimality of the problem and solve it by a block coordinate descent method. We additionally propose iEC3, a variant of EC3 that handles imbalanced training data. We perform an extensive experimental analysis by comparing EC3 and iEC3 with 14 baseline methods (7 well-known standalone classifiers, 5 ensemble classifiers, and 2 existing methods that merge classification and clustering) on 13 standard benchmark datasets. We show that our methods outperform the baselines on every single dataset, achieving at most 10% higher AUC. Moreover, our methods are faster (1.21 times faster than the best baseline) and more resilient to noise and class imbalance than the best baseline method. Comment: 14 pages, 7 figures, 11 tables.
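The constraint mentioned in the abstract (objects clustered together are more likely to share a label) can be illustrated with a very small sketch. This is not EC3 itself, whose optimization function is not given in the abstract; it is a simple assumed heuristic that blends each object's classifier probabilities with the mean probabilities of its cluster. The function name `smooth_with_clusters` and the mixing weight `alpha` are hypothetical.

```python
def smooth_with_clusters(probs, cluster_ids, alpha=0.5):
    """Blend each object's class probabilities with its cluster's mean.

    probs       -- list of per-object class-probability lists from a classifier
    cluster_ids -- one cluster id per object from any clustering method
    alpha       -- weight on the object's own prediction (assumed, not from EC3)
    """
    groups = {}
    for p, c in zip(probs, cluster_ids):
        groups.setdefault(c, []).append(p)
    # Column-wise mean of the probability vectors inside each cluster.
    means = {c: [sum(col) / len(ps) for col in zip(*ps)]
             for c, ps in groups.items()}
    return [
        [alpha * x + (1 - alpha) * m for x, m in zip(p, means[c])]
        for p, c in zip(probs, cluster_ids)
    ]

# The second object is weakly predicted as class 1 but shares a cluster
# with a confident class-0 object, so smoothing pulls it toward class 0.
probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
smoothed = smooth_with_clusters(probs, cluster_ids=[0, 0, 1])
```

An object that sits alone in its cluster keeps its original probabilities, since its cluster mean is its own prediction.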

Guaranteed clustering and biclustering via semidefinite programming

Author: Brendan P. W. Ames
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Identifying clusters of similar objects in data plays a significant role in a wide range of applications. As a model problem for clustering, we consider the densest k-disjoint-clique problem, whose goal is to identify the collection of k disjoint cliques of a given weighted complete graph maximizing the sum of the densities of the complete subgraphs induced by these cliques. In this paper, we establish conditions ensuring exact recovery of the densest k cliques of a given graph from the optimal solution of a particular semidefinite program. In particular, the semidefinite relaxation is exact for input graphs corresponding to data consisting of k large, distinct clusters and a smaller number of outliers. This approach also yields a semidefinite relaxation for the biclustering problem with similar recovery guarantees. Given a set of objects and a set of features exhibited by these objects, biclustering seeks to simultaneously group the objects and features according to their expression levels. This problem may be posed as partitioning the nodes of a weighted bipartite complete graph such that the sum of the densities of the resulting bipartite complete subgraphs is maximized. As in our analysis of the densest k-disjoint-clique problem, we show that the correct partition of the objects and features can be recovered from the optimal solution of a semidefinite program in the case that the given data consists of several disjoint sets of objects exhibiting similar features. Empirical evidence from numerical experiments supporting these theoretical guarantees is also provided.
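The clustering problem described above can be written out as follows. The abstract does not state the formulas, so this is a sketch under assumptions: the density of a clique is taken to be its total edge weight divided by its number of nodes, the symbols $W$, $C_i$, $v_i$, $X$ and $e$ (the all-ones vector) are introduced here for illustration, and the relaxation shown is one standard lifting of this objective rather than a quotation of the paper's program.

```latex
% Densest k-disjoint-clique objective for a weight matrix W = (w_{uv}):
\max_{C_1,\dots,C_k}\ \sum_{i=1}^{k} \frac{1}{|C_i|} \sum_{u,v \in C_i} w_{uv}
\quad \text{s.t.} \quad C_i \cap C_j = \emptyset \ \ (i \neq j).

% Lifting: with v_i the indicator vector of C_i, set
X = \sum_{i=1}^{k} \frac{1}{|C_i|}\, v_i v_i^{\mathsf T},
\qquad \text{so that} \qquad
\operatorname{Tr}(WX) = \sum_{i=1}^{k} \frac{v_i^{\mathsf T} W v_i}{|C_i|}.

% Dropping the rank and integrality structure of X gives a semidefinite program:
\max_{X}\ \operatorname{Tr}(WX)
\quad \text{s.t.} \quad
X e \le e, \quad \operatorname{Tr}(X) = k, \quad X \ge 0, \quad X \succeq 0.
```

Exact recovery, in the sense of the abstract, means that the optimal $X$ of the relaxed program coincides with the lifted matrix built from the true cliques.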

Optimized classification predictions with a new index combining machine learning algorithms

Author: Anagnostopoulos Christos-Nikolaos
Niros Antonios D.
Spatharis Sofie
Tamvakis Androniki
Tsirtsis George
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/05/2018

Voting is a commonly used ensemble method that aims to optimize classification predictions by combining results from individual base classifiers. However, the selection of appropriate classifiers to participate in the voting algorithm is currently an open issue. In this study we developed a novel Dissimilarity-Performance (DP) index, which incorporates two important criteria for selecting base classifiers to participate in voting: their differential response in classification (dissimilarity) when combined in triads, and their individual performance. To develop this empirical index, we first used a range of different datasets to evaluate the relationship between voting results and measures of dissimilarity among classifiers of different types (rules, trees, lazy classifiers, functions and Bayes). Second, we computed the combined effect on voting performance of classifiers with different individual performance and/or diverse results. Our DP index was able to rank the classifier combinations according to their voting performance and thus to suggest the optimal combination. The proposed index is recommended for individual machine learning users as a preliminary tool to identify which classifiers to combine in order to achieve more accurate classification predictions while avoiding a computationally intensive and time-consuming search.
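The two ingredients named in the abstract, dissimilarity within a triad and individual performance, can be tabulated alongside the voting result with a short sketch. The DP index formula is not given in the abstract, so the code below does not compute it; it assumes pairwise disagreement rate as the dissimilarity measure and plain accuracy as the performance measure. The function names and the toy predictions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def majority_vote(predictions):
    """Per-sample majority label across a list of classifiers' predictions."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def disagreement(a, b):
    """Fraction of samples on which two classifiers predict different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def rank_triads(preds, truth):
    """Rank every triad of classifiers by the accuracy of its majority vote."""
    rows = []
    for names in combinations(sorted(preds), 3):
        members = [preds[n] for n in names]
        mean_dis = sum(disagreement(a, b)
                       for a, b in combinations(members, 2)) / 3
        mean_acc = sum(accuracy(m, truth) for m in members) / 3
        vote_acc = accuracy(majority_vote(members), truth)
        rows.append((names, mean_dis, mean_acc, vote_acc))
    return sorted(rows, key=lambda r: r[3], reverse=True)

# Toy predictions from four hypothetical base classifiers on six samples.
truth = [1, 0, 1, 1, 0, 0]
preds = {
    "tree":  [1, 0, 1, 1, 0, 1],
    "rule":  [1, 0, 0, 1, 0, 0],
    "bayes": [0, 0, 1, 1, 0, 0],
    "lazy":  [1, 0, 1, 1, 0, 1],
}
ranking = rank_triads(preds, truth)
```

In this toy data every classifier has the same individual accuracy, yet the triads whose members err on different samples vote better than the triads containing two identical classifiers, which is the kind of relationship the abstract describes between dissimilarity and voting performance.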

Ensemble clustering using semidefinite programming with applications

Author: Jiming Peng
Jinhui Xu
Lopamudra Mukherjee
Vikas Singh
Publication venue: 'Springer Science and Business Media LLC'