37 research outputs found

    Adjusted Measures for Feature Selection Stability for Data Sets with Similar Features

    For data sets with similar features, for example highly correlated features, most existing stability measures behave in an undesired way: they consider features that are almost identical but have different identifiers as different features. Existing adjusted stability measures, that is, stability measures that take into account the similarities between features, have major theoretical drawbacks. We introduce new adjusted stability measures that overcome these drawbacks. We compare them to each other and to existing stability measures based on both artificial and real sets of selected features. Based on the results, we suggest using one new stability measure that considers highly similar features as exchangeable.
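The problem described above can be illustrated with a small sketch. It uses the mean pairwise Jaccard index as a stand-in for an unadjusted stability measure, and a simple "collapse similar features into one group" step as a stand-in for an adjustment. Both choices, the helper names, and the `group_of` mapping are assumptions made for illustration; they are not the measures proposed in the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two feature sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(selections):
    """Average Jaccard index over all pairs of selected feature sets."""
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def collapse_similar(selection, group_of):
    """Replace each feature by its similarity group (hypothetical adjustment)."""
    return {group_of.get(feature, feature) for feature in selection}


# Three runs of a feature selector; x1 and x2 are assumed to be nearly identical.
selections = [{"x1", "x3"}, {"x2", "x3"}, {"x1", "x3"}]
group_of = {"x1": "g1", "x2": "g1"}  # assumption: x1 and x2 are highly correlated

unadjusted = mean_pairwise_jaccard(selections)
adjusted = mean_pairwise_jaccard(
    [collapse_similar(s, group_of) for s in selections]
)
print(f"unadjusted={unadjusted:.3f} adjusted={adjusted:.3f}")
```

In this toy case the unadjusted score penalizes the swap between x1 and x2, while the score that treats them as exchangeable does not.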

    Predicting disease progression in behavioral variant frontotemporal dementia

    Introduction: The behavioral variant of frontotemporal dementia (bvFTD) is a rare neurodegenerative disease. Reliable predictors of disease progression have not been sufficiently identified. We investigated multivariate magnetic resonance imaging (MRI) biomarker profiles for their value in predicting individual decline. Methods: One hundred five bvFTD patients were recruited from the German frontotemporal lobar degeneration (FTLD) consortium study. After defining two groups ("fast progressors" vs. "slow progressors"), we investigated the predictive value of MR brain volumes for disease progression rates by performing exhaustive screenings with multivariate classification models. Results: We identified areas that predict disease progression rate within 1 year. Prediction measures revealed an overall accuracy of 80% across our 50 top classification models. In particular, the pallidum, middle temporal gyrus, inferior frontal gyrus, cingulate gyrus, middle orbitofrontal gyrus, and insula occurred in these models. Discussion: Based on the revealed marker combinations, an individual prognosis seems to be feasible. This might be used in clinical studies on an individualized progression model.

    Ensemble of a subset of kNN classifiers

    Combining multiple classifiers, known as ensemble methods, can substantially improve the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of a subset of kNN classifiers, ESkNN, for classification tasks, built in two steps. First, we choose classifiers based on their individual performance using out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We use benchmark data sets with their original features and with some added non-informative features to evaluate our method. The results are compared with the usual kNN, bagged kNN, random kNN, the multiple feature subset method, random forest, and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forest and support vector machines.
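The two-step procedure above can be sketched roughly as follows. The abstract does not say how the candidate kNN models are generated or what rule decides whether a model stays in the ensemble, so this sketch assumes random feature subsets and a "keep only if validation accuracy strictly improves" rule. The function names, parameters, and synthetic data are all hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training points, using only `features`."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def accuracy(preds, y):
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def ensemble_predict(models, train_X, train_y, X, k=3):
    """Majority vote over kNN models, each defined by its feature subset."""
    return [
        Counter(knn_predict(train_X, train_y, x, f, k) for f in models)
        .most_common(1)[0][0]
        for x in X
    ]


def esknn_sketch(train, select, valid, n_models=20, subset_size=2, keep=10, k=3, seed=0):
    rng = random.Random(seed)
    (trX, trY), (seX, seY), (vaX, vaY) = train, select, valid
    n_features = len(trX[0])
    # Assumption: candidate models differ by a random feature subset.
    candidates = [
        tuple(sorted(rng.sample(range(n_features), subset_size)))
        for _ in range(n_models)
    ]
    # Step 1: rank candidates by individual out-of-sample accuracy, keep the best.
    ranked = sorted(
        candidates,
        key=lambda f: accuracy([knn_predict(trX, trY, x, f, k) for x in seX], seY),
        reverse=True,
    )[:keep]
    # Step 2: add models one at a time, starting from the best one.
    ensemble = [ranked[0]]
    best = accuracy(ensemble_predict(ensemble, trX, trY, vaX, k), vaY)
    for feats in ranked[1:]:
        trial = ensemble + [feats]
        acc = accuracy(ensemble_predict(trial, trX, trY, vaX, k), vaY)
        if acc > best:  # assumption: keep a model only if it strictly helps
            ensemble, best = trial, acc
    return ensemble, best


def make_data(n, rng):
    """Synthetic data: 2 informative features followed by 4 noise features."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y


data_rng = random.Random(1)
train, select, valid = (make_data(n, data_rng) for n in (60, 40, 40))
ensemble, best = esknn_sketch(train, select, valid)
print("feature subsets kept:", ensemble, "validation accuracy:", best)
```

Using separate data for ranking and for the sequential step keeps the two selection stages from being scored on the same observations; whether the paper splits the data this way is not stated in the abstract.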

    A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

    Background: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called the proportional overlapping score (POS), of a feature's relevance to a classification task. Results: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves better performance. Conclusions: A novel gene selection method, POS, is proposed. POS analyzes the expression overlap across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.

    Differentiation of multiple types of pancreatico-biliary tumors by molecular analysis of clinical specimens

    Timely and accurate diagnosis of pancreatic ductal adenocarcinoma (PDAC) is critical in order to provide adequate treatment to patients. However, the clinical signs and symptoms of PDAC are shared by several types of malignant or benign tumors which may be difficult to differentiate from PDAC with conventional diagnostic procedures. Among others, these include ampullary cancers, solid pseudopapillary tumors, and adenocarcinomas of the distant bile duct, as well as inflammatory masses developing in chronic pancreatitis. Here, we report an approach to accurately differentiate between these different types of pancreatic masses based on molecular analysis of biopsy material. A total of 156 bulk tissue and fine needle aspiration biopsy samples were analyzed using a dedicated diagnostic cDNA array and a composite classification algorithm developed based on linear support vector machines. All five histological subtypes of pancreatic masses were clearly separable with 100% accuracy when using all 156 individual samples for classification. Generalized performance of the classification system was tested by 10x10-fold cross-validation (100 test runs). Correct classification into the five diagnostic groups was demonstrated for 81.5% of 1,560 test set predictions. Performance increased to 85.3% accuracy when PDAC and distant bile duct carcinomas were combined in a single diagnostic class. Importantly, overall sensitivity of detection of malignant disease was 92.2%. The molecular diagnostic approach presented here is suitable to significantly aid in the differential diagnosis of undetermined pancreatic masses. To our knowledge, this is the first study reporting accurate differentiation between several types of pancreatico-biliary tumors in a single molecular analytical procedure.
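The figures of 100 test runs and 1,560 test set predictions follow from the cross-validation design: each of the 10 repetitions splits the 156 samples into 10 folds, and every sample is predicted exactly once per repetition. A minimal sketch of that bookkeeping is given below; the splitting function is a plain, unstratified illustration with hypothetical names, not the procedure used in the study.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (train_indices, test_indices) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # new random partition for every repetition
        for fold in range(n_folds):
            test = idx[fold::n_folds]  # every n_folds-th sample forms one fold
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test


splits = list(repeated_kfold(156))
n_runs = len(splits)
n_test_predictions = sum(len(test) for _, test in splits)
print(n_runs, n_test_predictions)  # 100 runs, 1560 test predictions
```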