Search CORE

24,552 research outputs found

Clustering-based approaches to SAGE data mining

Author: C Keime
D Porter
F Rioult
Francisco Azuaje
GM Boratyn
H Chen
H Thygesen
H Wang
H Wang
H Wang
H Zheng
Haiying Wang
Huiru Zheng
I Mechaly
J Handl
J Lu
J Sander
J Stollberg
JB Vos
JM Ruijter
K Kim
KA Baggerly
KA Baggerly
L Cai
MA El-Meanawy
MA Gilchrist
MB Eisen
MC Abba
MZ Man
N Bolshakova
P Buckhaults
P Divina
P Tamayo
RT Ng
RZ Vêncio
S Audic
S Blackshaw
S Mclntosh
S Saha
SD Zuyderduyn
T Beißbarth
T Chu
T Kohonen
T Lee
VE Velculescu
VR Akmaev
VR Akmaev
W Chan
W Yasui
WD Patino
X Jin
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically-meaningful data mining and visualisation

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Dynamic load balancing in parallel KD-tree k-means

Author: Di Fatta Giuseppe
Pettinger David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/06/2010
Field of study

One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy

Central Archive at the University of Reading

Crossref

On the discovery of social roles in large scale social systems

Author: Doran Derek
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/09/2015
Field of study

The social role of a participant in a social system is a label conceptualizing the circumstances under which she interacts within it. They may be used as a theoretical tool that explains why and how users participate in an online social system. Social role analysis also serves practical purposes, such as reducing the structure of complex systems to rela- tionships among roles rather than alters, and enabling a comparison of social systems that emerge in similar contexts. This article presents a data-driven approach for the discovery of social roles in large scale social systems. Motivated by an analysis of the present art, the method discovers roles by the conditional triad censuses of user ego-networks, which is a promising tool because they capture the degree to which basic social forces push upon a user to interact with others. Clusters of censuses, inferred from samples of large scale network carefully chosen to preserve local structural prop- erties, define the social roles. The promise of the method is demonstrated by discussing and discovering the roles that emerge in both Facebook and Wikipedia. The article con- cludes with a discussion of the challenges and future opportunities in the discovery of social roles in large social systems

arXiv.org e-Print Archive

CORE

Data mining as a tool for environmental scientists

Author: Athanasiadis Ioannis
Comas Joaquim
Frank Eibe
Gibert Karina
Letcher Rebecca
Spate Jessica
Sànchez-Marrè Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2006
Field of study

Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogenous

Research Commons@Waikato