Co-clustering of Fuzzy Lagged Data
The paper focuses on mining patterns that are characterized by a fuzzy lagged
relationship between the data objects that form them. Such a regulatory mechanism
is common in real-life settings and appears in a variety of fields:
finance, gene expression, neuroscience, crowds and collective movements are only
a few examples. Mining such patterns not only helps in
understanding the relationship between objects in the domain, but also assists in
forecasting their future behavior. For most interesting variants of this
problem, finding an optimal fuzzy lagged co-cluster is NP-complete.
We thus present a polynomial-time Monte-Carlo approximation algorithm for
mining fuzzy lagged co-clusters. We prove that, for any data matrix, the
algorithm mines, with fixed probability, a fuzzy lagged co-cluster that
encompasses the optimal fuzzy lagged co-cluster with a column overhead of at most
a factor of 2 and no row overhead. Moreover, the algorithm handles
noise, anti-correlations, missing values and overlapping patterns. The
algorithm was extensively evaluated using both artificial and real datasets.
The results not only corroborate the ability of the algorithm to efficiently
mine relevant and accurate fuzzy lagged co-clusters, but also illustrate the
importance of including the fuzziness in the lagged-pattern model.
Comment: Under consideration for publication in Knowledge and Information
Systems. The final publication is available at Springer via
http://dx.doi.org/10.1007/s10115-014-0758-
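The abstract guarantees success only with a fixed probability per run. For Monte-Carlo algorithms in general (a textbook fact, not a detail taken from the paper), repeating independent runs shrinks the chance that every run misses geometrically: after r runs it is (1 - p)^r, where p is the per-run success probability. The sketch computes the smallest r that pushes this below a target failure probability delta; the function name `runs_needed` and the example value of p are hypothetical, since the paper's actual per-run probability is not stated here.

```python
import math


def runs_needed(p: float, delta: float) -> int:
    """Smallest number r of independent runs such that the probability that
    all of them fail, (1 - p) ** r, is at most delta."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if p == 1.0:
        return 1  # a single run always succeeds
    # (1 - p) ** r <= delta  <=>  r >= ln(delta) / ln(1 - p), both logs negative.
    # log1p(-p) computes ln(1 - p) accurately even for very small p.
    # Boundary cases where the ratio is exactly an integer may be off by one
    # because of floating-point rounding.
    return max(1, math.ceil(math.log(delta) / math.log1p(-p)))


# Hypothetical per-run success probability of 0.1, target failure rate of 1%.
print(runs_needed(0.1, 0.01))  # 44
```

With a per-run success probability of 0.1, 44 runs bring the chance that no run succeeds below 1%; the number of runs grows only logarithmically in 1/delta.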
Mutual Clustering on Comparative Texts via Heterogeneous Information Networks
Currently, many intelligence systems contain texts from multiple sources,
e.g., bulletin board system (BBS) posts, tweets and news. These texts can be
"comparative" since they may be semantically correlated and thus provide us
with different perspectives toward the same topics or events. To better
organize the multi-sourced texts and obtain more comprehensive knowledge, we
propose to study the novel problem of Mutual Clustering on Comparative Texts
(MCCT), which aims to cluster the comparative texts simultaneously and
collaboratively. The MCCT problem is difficult to address because 1)
comparative texts usually have different data formats and structures, which
makes them hard to organize, and 2) there is no effective method to
connect the semantically correlated comparative texts so that they can be
clustered in a unified way. To this end, in this paper we propose a Heterogeneous
Information Network-based Text clustering framework HINT. HINT first models
multi-sourced texts (e.g. news and tweets) as heterogeneous information
networks by introducing the shared "anchor texts" to connect the comparative
texts. Next, two similarity matrices based on HINT as well as a transition
matrix for cross-text-source knowledge transfer are constructed. The comparative
texts are then clustered using the constructed matrices.
Finally, a mutual clustering algorithm is also proposed to further unify the
separate clustering results of the comparative texts by introducing a
clustering consistency constraint. We conduct extensive experiments on three
tweet-news datasets, and the results demonstrate the effectiveness and
robustness of the proposed method in addressing the MCCT problem.
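HINT connects news and tweets through shared anchor texts. A common way to turn such links into cross-source similarities in a heterogeneous information network is to count paths along a meta-path (here news, then anchor, then tweet) by multiplying incidence matrices. The following is a minimal sketch under that assumption only: the toy matrices, the function names `cross_source_similarity` and `row_normalize`, and the use of row normalization to obtain a transition-like matrix are all hypothetical and not taken from the paper, whose actual similarity and transition matrices may be defined differently.

```python
import numpy as np


def cross_source_similarity(a_src: np.ndarray, a_dst: np.ndarray) -> np.ndarray:
    """Number of anchor texts shared by each (source text, target text) pair,
    i.e. path counts along the meta-path source -> anchor -> target."""
    if a_src.shape[1] != a_dst.shape[1]:
        raise ValueError("both incidence matrices must index the same anchor texts")
    return a_src @ a_dst.T


def row_normalize(m: np.ndarray) -> np.ndarray:
    """Scale each row to sum to 1 (a transition-like matrix).
    Rows with no shared anchor stay all-zero instead of dividing by zero."""
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros(m.shape, dtype=float), where=sums > 0)


# Hypothetical toy data: rows are texts, columns are 3 shared anchor texts,
# and 1 means the text is linked to that anchor.
news = np.array([[1, 1, 0],
                 [0, 0, 1]])
tweets = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [0, 0, 0]])

sim = cross_source_similarity(news, tweets)  # [[1, 1, 0], [0, 1, 0]]
transition = row_normalize(sim)              # [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0]]
print(sim)
print(transition)
```

Row normalization makes each nonzero row a probability distribution over texts of the other source; a text that shares no anchor (the third tweet, when tweets are the source) keeps an all-zero row rather than causing a division by zero.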