A niching memetic algorithm for simultaneous clustering and feature selection

Author: Fairhurst M
Liu X
Sheng W
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/07/2008

Clustering is inherently a difficult task, and it becomes even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to avoid settling on less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both the feature selection and the cluster centers for different numbers of clusters. Further, local search operations are introduced to refine the feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data.
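To make the "variable composite representation" and the centre-refinement local search more concrete, here is a minimal Python sketch. It is not the authors' NMA_CFS implementation: the `Chromosome` class, the helper names (`random_chromosome`, `masked_sq_dist`, `assign`, `refine_centers`), the random initialisation and the k-means-style refinement step are all illustrative assumptions, and the fitness function, the feature-selection local search and the niching scheme are omitted because the abstract does not specify them.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Chromosome:
    # Hypothetical composite encoding: a binary feature mask plus a
    # variable-length list of cluster centres (one coordinate per feature).
    feature_mask: list[bool]
    centers: list[list[float]]


def random_chromosome(data, k_min=2, k_max=6, rng=random):
    """Create a chromosome with a random feature subset and a random number of clusters."""
    n_features = len(data[0])
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    if not any(mask):
        mask[rng.randrange(n_features)] = True  # keep at least one feature
    k = rng.randint(k_min, min(k_max, len(data)))
    centers = [list(p) for p in rng.sample(data, k)]  # seed centres on data points
    return Chromosome(mask, centers)


def masked_sq_dist(a, b, mask):
    """Squared Euclidean distance computed over the selected features only."""
    return sum((x - y) ** 2 for x, y, keep in zip(a, b, mask) if keep)


def assign(data, chrom):
    """Index of the nearest centre for every point, under the chromosome's feature mask."""
    return [
        min(range(len(chrom.centers)),
            key=lambda j: masked_sq_dist(p, chrom.centers[j], chrom.feature_mask))
        for p in data
    ]


def refine_centers(data, chrom):
    """Local search step (k-means style): move each centre to the mean of its members."""
    labels = assign(data, chrom)
    new_centers = []
    for j, center in enumerate(chrom.centers):
        members = [p for p, label in zip(data, labels) if label == j]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(center)  # leave an empty cluster's centre unchanged
    chrom.centers = new_centers
    return chrom
```

In a full memetic loop, refinement steps like this would be interleaved with crossover, mutation of the mask and of the number of clusters, and a diversity-preserving (niching) selection; the sketch only shows how one chromosome can carry both a feature subset and a variable number of centres.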

Brunel University Research Archive

A critical cluster analysis of 44 indicators of author-level performance

Author: Wildgaard Lorna
Publication venue: Elsevier BV
Publication date: 18/05/2015

This paper explores the relationship between author-level bibliometric indicators and the researchers they "measure", exemplified across five academic seniorities and four disciplines. Using cluster methodology, the disciplinary and seniority appropriateness of author-level indicators is examined. Publication and citation data for 741 researchers across Astronomy, Environmental Science, Philosophy and Public Health were collected from Web of Science (WoS). Forty-four indicators of individual performance were computed from these data. A two-step cluster analysis using IBM SPSS version 22 was performed, followed by a risk analysis and ordinal logistic regression to explore cluster membership. Indicator scores were contextualized using each researcher's curriculum vitae. Four clusters based on indicator scores ranked researchers as low, middle, high and extremely high performers. The results show that different indicators were appropriate for demarcating ranked performance in different disciplines: the h2 indicator in Astronomy, sum pp top prop in Environmental Science, Q2 in Philosophy and the e-index in Public Health. The regression and odds analysis showed that individual-level indicator scores depended primarily on the number of years since the researcher's first publication registered in WoS, the number of publications and the number of citations. Seniority classification was secondary; therefore, no seniority-appropriate indicators were confidently identified. Cluster methodology proved useful in identifying discipline-appropriate indicators, provided that the preliminary data preparation was thorough, but it needed to be supplemented by other analyses to validate the results. A general disconnection was observed between the performance of researchers as presented in their curricula vitae and their performance as measured by bibliometric indicators.
Comment: 28 pages, 7 tables, 2 figures, 2 appendices
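Several of the indicators named above have simple closed-form definitions. The sketch below computes three of them from a list of per-paper citation counts, assuming the standard definitions of the h-index and the e-index, and assuming that "h2" refers to Kosmulski's h(2) index. The paper's exact operationalisation (citation window, WoS data cleaning) is not reproduced, and the function names are hypothetical.

```python
import math


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # With counts sorted descending, c >= rank holds for a prefix only,
    # so counting the papers that satisfy it gives h.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def h2_index(citations):
    """Largest h2 such that h2 papers have at least h2**2 citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank ** 2)


def e_index(citations):
    """Square root of the h-core citations in excess of h**2."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    return math.sqrt(sum(ranked[:h]) - h ** 2)
```

For citation counts [10, 8, 5, 4, 3], the h-index is 4, the h(2) index is 2 (only two papers have at least 4 citations while 5 < 9), and the e-index is the square root of 27 - 16 = 11.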

arXiv.org e-Print Archive

Copenhagen University Research Information System

Optimal Data Split Methodology for Model Validation

Author: Bryant Corey
Miki Kenji
Morrison Rebecca
Prudhomme Serge
Terejanu Gabriel
Publication date: 01/01/2011

The decision to incorporate cross-validation into the validation process for mathematical models raises an immediate question: how should one partition the data into calibration and validation sets? We answer this question systematically by presenting an algorithm that finds the optimal partition of the data subject to certain constraints. In doing so, we address two critical issues: 1) the model should be evaluated with respect to both its predictions of a given quantity of interest and its ability to reproduce the data, and 2) the model should be highly challenged by the validation set, assuming it is properly informed by the calibration set. The framework also relies on interaction among the experimentalist and/or modeler, who understands the physical system and the limitations of the model; the decision-maker, who understands and can quantify the cost of model failure; and the computational scientists, who strive to determine whether the model satisfies both the modeler's and the decision-maker's requirements. The framework is quite general and may be applied to a wide range of problems. Here, we illustrate it through a specific example involving a data reduction model for an ICCD camera from a shock-tube experiment located at the NASA Ames Research Center (ARC).
Comment: Submitted to International Conference on Modeling, Simulation and Control 2011 (ICMSC'11), San Francisco, USA, 19-21 October, 201
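The idea that the validation set should highly challenge a model that the calibration set has properly informed can be illustrated with a toy brute-force search. This is a hedged sketch, not the authors' algorithm: it assumes a straight-line model, uses the maximum absolute error both as the "properly informed" check and as the challenge measure (the paper instead works with a quantity of interest and the decision-maker's cost of failure), and the names `most_challenging_split` and `max_calib_error` are hypothetical.

```python
from itertools import combinations


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_abs_error(model, points):
    """Largest absolute residual of the line on the given points."""
    slope, intercept = model
    return max(abs(y - (slope * x + intercept)) for x, y in points)


def most_challenging_split(data, n_val, max_calib_error):
    """Exhaustively search validation sets of size n_val.

    A split is admissible only if the line calibrated on the remaining points
    reproduces them within max_calib_error (the model is "properly informed").
    Among admissible splits, return the one with the largest validation error
    as (validation_error, validation_indices), or None if no split qualifies.
    """
    best = None
    for val_idx in combinations(range(len(data)), n_val):
        calib = [p for i, p in enumerate(data) if i not in val_idx]
        val = [data[i] for i in val_idx]
        if len({x for x, _ in calib}) < 2:
            continue  # a line needs at least two distinct x values
        model = fit_line(calib)
        if max_abs_error(model, calib) > max_calib_error:
            continue  # calibration set does not reproduce well enough
        challenge = max_abs_error(model, val)
        if best is None or challenge > best[0]:
            best = (challenge, val_idx)
    return best
```

Because the number of candidate splits grows combinatorially with the data size, exhaustive enumeration like this is only practical for small data sets; a realistic implementation would need a smarter search.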

arXiv.org e-Print Archive

CiteSeerX

Clustering Algorithms: Their Application to Gene Expression Data

Author: Agrawal R.
Alizadeh A.A.
Bandyopadhyay S.
Bezdek J.C.
Bhargavi M.S.
Blatt M.
Bochkov Y.A.
Brunet J.P.
Bryan K.
Buitinck L.
Bunnik E.M.
Caliński T.
Chandrasekhar T.
Cheng Y.
Costa I.G.
Cover T.M.
D'haeseleer P.
Dave R.N.
Davies D.L.
De Morsier F.
Dempster A.P.
Dharmarajan A.
Dhillon I.S.
Divina F.
Do C.B.
Domany E.
Du Z.
Dunn J.C.
Edla D.R.
Eisen M.B.
Ferguson T.S.
Frey B.J.
Fu L.
Fukuyama Y.
Galluccio L.
Gath I.
Getz G.
Gordon G.J.
Gu J.
Guha S.
Handhayani T.
Handl J.
Hatamlou A.
Heard N.A.
Heyer L.J.
Hinneburg A.
Hu X.
Hubert L.J.
Jain A.K.
Jiang D.
Jiang H.
Joopudi S.
Kao Y.T.
Karmilasari S.W.
Karypis G.
Kaufman L.
Kerr G.
Kluger Y.
Kohonen T.
Krzanowski W.J.
Leone M.
Lu Y.
Ma'sum M.A.
MacQueen J.
Madeira S.C.
Mann A.K.
Masciari E.
Maulik U.
Milligan G.W.
Mitra S.
Moon T.K.
Moore W.C.
Müllner D.
Nagpal A.
Nasser S.
Neal R.M.
Ng R.T.
Pakhira M.K.
Pal N.R.
Pedregosa F.
Pirim H.
Pitman J.
Prelić A.
Qin Z.S.
Raman S.
Rasmussen C.E.
Rezaee B.
Rezaee M.R.
Ruspini E.H.
Saha S.
Sathishkumar K.
Sheikholeslami G.
Sheng Q.
Sirinukunwattana K.
Sokal R.R.
Sun J.
Talaat A.M.
Tamayo P.
Tanay A.
Tang C.
Thalamuthu A.
Tibshirani R.
Wan M.
Wang L.
Wang W.
Williams G.
Wu J.
Wu K.L.
Wu S.
Xie X.L.
Xu R.
Xu Y.
Yu H.
Zhang D.
Zhang T.
Zhang Y.
Zhang Z.Y.
Zhao L.
Zhong C.
Zitnik M.
Řehůřek R.
Publication venue: SAGE Publications
Publication date: 01/01/2016

Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a powerful means of strengthening our understanding of functional genomics. The complexity of biological networks and the number of genes involved make it harder to comprehend and interpret the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Clustering is therefore a first step toward addressing these challenges, and it is essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. Clustering of gene expression data has proven useful for revealing the natural structure inherent in the data, understanding gene functions, cellular processes, and cell subtypes, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to provide useful knowledge about which clustering technique will guarantee stability and a high degree of accuracy in the analysis.
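As one concrete example of the algorithm families such a review covers, the sketch below performs agglomerative clustering of gene profiles with average linkage and a 1 - r (Pearson correlation) distance, so genes whose expression rises and falls together are grouped even if their absolute levels differ. The choice of algorithm and distance is an illustrative assumption, not a recommendation drawn from the review, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def average_linkage(profiles, k):
    """Merge clusters of gene indices until k remain, using the average 1 - r distance."""
    def dist(i, j):
        return 1.0 - pearson(profiles[i], profiles[j])

    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_dists = [dist(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(pair_dists) / len(pair_dists)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

This naive version recomputes all pairwise distances at every merge, which is fine for a handful of genes; for real expression matrices, library routines such as SciPy's `scipy.cluster.hierarchy.linkage` (with `method="average"` and `metric="correlation"`) are far more efficient.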

Covenant University Repository

Crossref

Directory of Open Access Journals

PubMed Central

Creating successful collaborative relationships

Author: Vanpoucke Evelyne
Vereecke Ann
Publication venue: Universiteit Gent. Faculteit Economie en Bedrijfskunde
Publication date: 01/01/2007

Vlerick Repository

Ghent University Academic Bibliography

Archivsystem Ask23

A validation of the Oswestry Spinal Risk Index

Author: A Leithner
A. Tambe
B Balain
C Wibmer
CS Lee
DA Karnofsky
HC Bauer
I Laufer
Irfan Siddique
J. Gregory
J. Stephenson
K Tomita
Mohammad Saeed
NA Quraishi
P Paholpak
PS Rose
R. Verma
S Yang
S. Whitehouse
V. Sinclair
Y Tokuhashi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Purpose: The purpose of this study was to validate the Oswestry Spinal Risk Index (OSRI) in an external population. The OSRI predicts survival in patients with metastatic spinal cord compression (MSCC).
Methods: We analysed the data of 100 patients undergoing surgical intervention for MSCC at a tertiary spinal unit and recorded the primary tumour pathology and Karnofsky performance status to calculate the OSRI. Logistic regression models and survival plots were applied to the data in accordance with the original paper.
Results: Lower OSRI scores predicted longer survival. The OSRI score predicted survival accurately in 74% of cases (p = 0.004).
Conclusions: Our study has found that the OSRI is a significant predictor of survival, at levels similar to those reported by the original authors, and is a useful and simple tool for aiding complex decision-making in patients presenting with MSC
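For readers unfamiliar with the index, here is a small sketch of how an OSRI score could be computed from the two recorded inputs. It assumes the form OSRI = PTP + (2 - GC) from the original OSRI paper, where PTP is a primary tumour pathology score from 1 to 5 and GC is a general-condition score derived from the Karnofsky performance status with the banding 10-40 to 0, 50-70 to 1 and 80-100 to 2. The PTP lookup table by tumour type is not reproduced, so the caller supplies the score directly. These details and the function names are assumptions and should be checked against the original publication.

```python
def general_condition(kps):
    """General-condition score from Karnofsky performance status (assumed banding)."""
    if not 0 <= kps <= 100:
        raise ValueError("Karnofsky performance status must be between 0 and 100")
    if kps >= 80:
        return 2  # 80-100: good general condition
    if kps >= 50:
        return 1  # 50-70: moderate general condition
    return 0      # 10-40: poor general condition


def osri(ptp_score, kps):
    """Assumed OSRI = PTP + (2 - GC); higher scores indicate a worse predicted prognosis."""
    if ptp_score not in (1, 2, 3, 4, 5):
        raise ValueError("primary tumour pathology score must be 1 to 5")
    return ptp_score + (2 - general_condition(kps))
```

Under this assumed form the score ranges from 1 to 7, and a higher score means a worse prognosis, which is consistent with the reported finding that lower OSRI scores predicted longer survival.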

Crossref

University of Huddersfield Repository

Huddersfield Research Portal