A niching memetic algorithm for simultaneous clustering and feature selection
Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data.
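The "variable composite representation" mentioned in the abstract above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the NMA_CFS encoding itself: the names `Chromosome`, `random_chromosome` and `assign` are hypothetical, the feature part is assumed to be a binary mask, and distances are assumed to be squared Euclidean over the selected features.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Chromosome:
    """Hypothetical composite encoding: a feature mask plus a variable number of centers."""
    feature_mask: List[int]     # 1 = feature is used, 0 = feature is ignored
    centers: List[List[float]]  # one center per cluster; the list length may differ per chromosome


def random_chromosome(data, k_min, k_max, rng):
    """Draw a random mask (at least one feature kept) and k random data points as centers."""
    n_features = len(data[0])
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    if not any(mask):
        mask[rng.randrange(n_features)] = 1
    k = rng.randint(k_min, k_max)  # the number of clusters is not fixed in advance
    centers = [list(point) for point in rng.sample(data, k)]
    return Chromosome(mask, centers)


def assign(chrom, data):
    """Label each point with its nearest center, measuring distance on selected features only."""
    selected = [j for j, bit in enumerate(chrom.feature_mask) if bit]
    labels = []
    for point in data:
        dists = [sum((point[j] - c[j]) ** 2 for j in selected) for c in chrom.centers]
        labels.append(dists.index(min(dists)))
    return labels
```

In this sketch the third feature of a toy data set can be pure noise and still leave the grouping untouched once the mask switches it off, which is the intuition behind searching over features and centers together.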
A critical cluster analysis of 44 indicators of author-level performance
This paper explores the relationship between author-level bibliometric
indicators and the researchers they "measure", exemplified across five academic
seniorities and four disciplines. Using cluster methodology, the disciplinary
and seniority appropriateness of author-level indicators is examined.
Publication and citation data for 741 researchers across Astronomy,
Environmental Science, Philosophy and Public Health was collected in Web of
Science (WoS). Forty-four indicators of individual performance were computed
using the data. A two-step cluster analysis using IBM SPSS version 22 was
performed, followed by a risk analysis and ordinal logistic regression to
explore cluster membership. Indicator scores were contextualized using the
individual researcher's curriculum vitae. Four different clusters based on
indicator scores ranked researchers as low, middle, high and extremely high
performers. The results show that different indicators were appropriate in
demarcating ranked performance in different disciplines: the h2 indicator in
Astronomy, sum pp top prop in Environmental Science, Q2 in Philosophy and the
e-index in Public Health. The regression and odds analysis showed individual
level indicator scores were primarily dependent on the number of years since
the researcher's first publication registered in WoS, number of publications
and number of citations. Seniority classification was secondary; therefore, no
seniority-appropriate indicators were confidently identified. Cluster
methodology proved useful in identifying disciplinary-appropriate indicators,
provided the preliminary data preparation was thorough, but needed to be
supplemented by other analyses to validate the results. A general disconnection
between the performance of the researcher on their curriculum vitae and the
performance of the researcher based on bibliometric indicators was observed.
Comment: 28 pages, 7 tables, 2 figures, 2 appendices
Optimal Data Split Methodology for Model Validation
The decision to incorporate cross-validation into validation processes of
mathematical models raises an immediate question - how should one partition the
data into calibration and validation sets? We answer this question
systematically: we present an algorithm to find the optimal partition of the
data subject to certain constraints. While doing this, we address two critical
issues: 1) that the model be evaluated with respect to predictions of a given
quantity of interest and its ability to reproduce the data, and 2) that the
model be highly challenged by the validation set, assuming it is properly
informed by the calibration set. This framework also relies on the interaction
between the experimentalist and/or modeler, who understand the physical system
and the limitations of the model; the decision-maker, who understands and can
quantify the cost of model failure; and the computational scientists, who
strive to determine if the model satisfies both the modeler's and
decision-maker's requirements. We also note that our framework is quite general, and may
be applied to a wide range of problems. Here, we illustrate it through a
specific example involving a data reduction model for an ICCD camera from a
shock-tube experiment located at the NASA Ames Research Center (ARC).
Comment: Submitted to International Conference on Modeling, Simulation and
Control 2011 (ICMSC'11), San Francisco, USA, 19-21 October, 2011
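The abstract above does not spell out the algorithm or its constraints, so the following is only a hypothetical reading of the idea of choosing a split whose validation set challenges the calibrated model. It assumes a brute-force search over all validation sets of a fixed size, a straight-line model fitted by ordinary least squares, and "most challenging" taken to mean the largest absolute prediction error on the held-out points. The function names `fit_line` and `most_challenging_split` are illustrative, not from the paper.

```python
from itertools import combinations


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def most_challenging_split(points, n_validation):
    """Return (error, validation_indices) for the split with the largest held-out error.

    Assumption: every split of the requested size is admissible as long as the
    calibration set has at least two distinct x values (needed to fit a line).
    """
    best = None
    indices = range(len(points))
    for val_idx in combinations(indices, n_validation):
        held_out = set(val_idx)
        calibration = [points[i] for i in indices if i not in held_out]
        if len({x for x, _ in calibration}) < 2:
            continue  # a line cannot be calibrated on this subset
        slope, intercept = fit_line(calibration)
        error = max(abs(points[i][1] - (slope * points[i][0] + intercept)) for i in val_idx)
        if best is None or error > best[0]:
            best = (error, val_idx)
    return best
```

Exhaustive enumeration grows combinatorially with the number of data points, so this form is only practical for small data sets; any real use would need the constraints and quantity of interest described by the authors.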
Clustering Algorithms: Their Application to Gene Expression Data
Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a major opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. The other benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.
A validation of the Oswestry Spinal Risk Index
Purpose
The purpose of this study was to validate the Oswestry Spinal Risk Index (OSRI) in an external population. The OSRI predicts survival in patients with metastatic spinal cord compression (MSCC).
Methods
We analysed the data of 100 patients undergoing surgical intervention for MSCC at a tertiary spinal unit and recorded the primary tumour pathology and Karnofsky performance status to calculate the OSRI. Logistic regression models and survival plots were applied to the data in accordance with the original paper.
Results
Lower OSRI scores predicted longer survival. The OSRI score predicted survival accurately in 74% of cases (p = 0.004).
Conclusions
Our study has found that the OSRI is a significant predictor of survival at levels similar to those of the original authors and is a useful and simple tool in aiding complex decision making in patients presenting with MSCC.