Clustering-based approaches to SAGE data mining
Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically meaningful data mining and visualisation.
Dynamic load balancing in parallel KD-tree k-means
One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.
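As a rough illustration of the static-partitioning idea mentioned in this abstract, the sketch below runs one k-Means iteration over a data set that has been split into fixed blocks. It is not the paper's KD-Tree algorithm: it uses plain distance computations, and the blocks are processed in a sequential loop that stands in for separate processing nodes. All function names (`static_partition`, `partial_stats`, `kmeans_step`) are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid closest to point (squared Euclidean)."""
    best, best_dist = 0, float("inf")
    for i, centroid in enumerate(centroids):
        dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def static_partition(points: Sequence[Point], n_blocks: int) -> List[List[Point]]:
    """Split points into n_blocks contiguous blocks of near-equal size."""
    size, extra = divmod(len(points), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        end = start + size + (1 if b < extra else 0)
        blocks.append(list(points[start:end]))
        start = end
    return blocks


def partial_stats(block: Sequence[Point], centroids: Sequence[Point]):
    """Work of one (simulated) node: per-cluster coordinate sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in block:
        j = nearest(point, centroids)
        counts[j] += 1
        for axis in range(dim):
            sums[j][axis] += point[axis]
    return sums, counts


def kmeans_step(blocks: Sequence[Sequence[Point]],
                centroids: Sequence[Point]) -> List[Point]:
    """One k-Means iteration: reduce the partial statistics of all blocks."""
    dim = len(centroids[0])
    total_sums = [[0.0] * dim for _ in centroids]
    total_counts = [0] * len(centroids)
    for block in blocks:  # each pass stands in for one node's local work
        sums, counts = partial_stats(block, centroids)
        for j in range(len(centroids)):
            total_counts[j] += counts[j]
            for axis in range(dim):
                total_sums[j][axis] += sums[j][axis]
    new_centroids = []
    for j, old in enumerate(centroids):
        if total_counts[j] == 0:
            new_centroids.append(tuple(old))  # empty cluster keeps its centroid
        else:
            new_centroids.append(
                tuple(s / total_counts[j] for s in total_sums[j]))
    return new_centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
blocks = static_partition(points, 3)
updated = kmeans_step(blocks, [(1.0, 1.0), (9.0, 1.0)])
```

In this plain version every point costs the same amount of work, so equal-sized blocks are balanced. With KD-Tree pruning, the abstract expects the cost per block to be irregular, which is the imbalance a dynamic policy is meant to address.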
On the discovery of social roles in large scale social systems
The social role of a participant in a social system is a label conceptualizing the circumstances under which she interacts within it. Social roles may be used as a theoretical tool that explains why and how users participate in an online social system. Social role analysis also serves practical purposes, such as reducing the structure of complex systems to relationships among roles rather than alters, and enabling a comparison of social systems that emerge in similar contexts. This article presents a data-driven approach for the discovery of social roles in large scale social systems. Motivated by an analysis of the present art, the method discovers roles by the conditional triad censuses of user ego-networks, which are a promising tool because they capture the degree to which basic social forces push upon a user to interact with others. Clusters of censuses, inferred from samples of large scale networks carefully chosen to preserve local structural properties, define the social roles. The promise of the method is demonstrated by discussing and discovering the roles that emerge in both Facebook and Wikipedia. The article concludes with a discussion of the challenges and future opportunities in the discovery of social roles in large social systems.
Data mining as a tool for environmental scientists
Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.