62 research outputs found

    Enhancing random forests performance in microarray data classification

    Random forests are receiving increasing attention for classification of microarray datasets. We evaluate the effects of a feature selection process on the performance of a random forest classifier as well as on the choice of two critical parameters, i.e. the forest size and the number of features chosen at each split in growing trees. Results of our experiments suggest that parameter values lower than popular defaults can lead to effective and more parsimonious classification models. Growing few trees on small subsets of selected features, while randomly choosing a single variable at each split, results in classification performance that compares well with state-of-the-art studies

    Assessing similarity of feature selection techniques in high-dimensional domains

    Recent research efforts attempt to combine multiple feature selection techniques instead of using a single one. However, this combination is often made on an “ad hoc” basis, depending on the specific problem at hand, without considering the degree of diversity/similarity of the involved methods. Moreover, though it is recognized that different techniques may return quite dissimilar outputs, especially in high dimensional/small sample size domains, few direct comparisons exist that quantify these differences and their implications on classification performance. This paper aims to provide a contribution in this direction by proposing a general methodology for assessing the similarity between the outputs of different feature selection methods in high dimensional classification problems. Using as benchmark the genomics domain, an empirical study has been conducted to compare some of the most popular feature selection methods, and useful insight has been obtained about their pattern of agreement
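One way to picture the kind of comparison described above is a pairwise overlap score between the feature subsets returned by different selectors. The sketch below uses the Jaccard index, which is an assumption made for illustration: the abstract does not state which similarity measure the paper adopts. The selector names and gene identifiers are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two feature subsets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty subsets are treated as identical
    return len(a & b) / len(a | b)


def pairwise_similarity(selections):
    """Map each unordered pair of method names to the Jaccard index of their subsets."""
    return {
        (m1, m2): jaccard(selections[m1], selections[m2])
        for m1, m2 in combinations(sorted(selections), 2)
    }


# Hypothetical outputs of three selectors (names and genes are placeholders).
selections = {
    "chi2": ["g1", "g2", "g3", "g4"],
    "info_gain": ["g2", "g3", "g4", "g5"],
    "relief": ["g1", "g6", "g7", "g8"],
}
similarity = pairwise_similarity(selections)
for pair, value in similarity.items():
    print(pair, round(value, 3))
```

A score near 1 means two methods select almost the same features, and a score near 0 means their outputs barely overlap. Other measures (for example ones that correct for subset size) could be swapped in without changing the structure.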

    BioCloud Search EnGene: Surfing Biological Data on the Cloud

    The massive production and spread of biomedical data around the web introduces new challenges related to identifying computational approaches for providing quality search and browsing of web resources. This paper presents BioCloud Search EnGene (BSE), a cloud application that facilitates searching and integration of the many layers of biological information offered by public large-scale genomic repositories. Grounded on the concept of dataspace, BSE is built on top of a cloud platform that severely curtails issues associated with scalability and performance. Like popular online gene portals, BSE adopts a gene-centric approach: researchers can find their information of interest by means of a simple “Google-like” query interface that accepts standard gene identifiers as keywords. We present the BSE architecture and functionality and discuss how our strategies contribute to successfully tackling big data problems in querying gene-based web resources. BSE is publicly available at: http://biocloud-unica.appspot.com/

    Exploiting biomedical web resources: a case study

    An increasing number of web resources continue to be extensively used by healthcare operators to obtain more accurate diagnostic results. In particular, health care is reaping the benefits of technological advances in genomics to meet the demand for genetic tests that allow a better comprehension of diagnostic results. Within this context, Gene Ontology (GO) is a popular and effective means for extracting knowledge from a list of genes and evaluating their semantic similarity. This paper investigates the potential and limits of the GO ontology as a support for capturing information about a set of genes which are supposed to play a significant role in a pathological condition. In particular, we present a case study that exploits some biomedical web resources for devising several groups of functionally coherent genes, together with experiments on the evaluation of their semantic similarity over GO. Due to the GO structure and content, results reveal limitations that do not affect the evaluation of the semantic similarity when genes exhibit simple correlations but influence the estimation of the relatedness of genes belonging to complex organizations

    A comparative analysis of biomarker selection techniques

    Feature selection has become an essential step in biomarker discovery from high-dimensional genomics data. It is recognized that different feature selection techniques may result in different sets of biomarkers, i.e. different groups of genes highly correlated to a given pathological condition, but few direct comparisons exist that quantify these differences in a systematic way. In this paper, we propose a general methodology for comparing the outcomes of different selection techniques in the context of biomarker discovery. The comparison is carried out along two dimensions: (i) measuring the similarity/dissimilarity of selected gene sets, (ii) evaluating the implications of these differences in terms of both predictive performance and stability of selected gene sets. As a case study, we considered three benchmarks deriving from DNA micro-array experiments and conducted a comparative analysis among eight selection methods, representative of different classes of feature selection techniques. Our results show that the proposed approach can provide useful insight about the pattern of agreement of biomarker discovery techniques

    Exploiting the ensemble paradigm for stable feature selection: A case study on high-dimensional genomic data

    Ensemble classification is a well-established approach that involves fusing the decisions of multiple predictive models. A similar “ensemble logic” has been recently applied to challenging feature selection tasks aimed at identifying the most informative variables (or features) for a given domain of interest. In this work, we discuss the rationale of ensemble feature selection and evaluate the effects and the implications of a specific ensemble approach, namely the data perturbation strategy. It consists of combining multiple selectors that exploit the same core algorithm but are trained on different perturbed versions of the original data. The real potential of this approach, still an object of debate in the feature selection literature, is here investigated in conjunction with different kinds of core selection algorithms (both univariate and multivariate). In particular, we evaluate the extent to which the ensemble implementation improves the overall performance of the selection process, in terms of predictive accuracy and stability (i.e., robustness with respect to changes in the training data). Furthermore, we measure the impact of the ensemble approach on the final selection outcome, i.e. on the composition of the selected feature subsets. The results obtained on ten public genomic benchmarks provide useful insight on both the benefits and the limitations of such an ensemble approach, paving the way to the exploration of new and wider ensemble schemes
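The data perturbation strategy described above can be sketched as follows: the same core selector is run on several resampled versions of the training set, and the resulting rankings are combined. Everything specific here is an assumption made for illustration and is not taken from the paper: bootstrap resampling as the perturbation, the absolute difference of class means as the univariate core score, rank-sum aggregation, and the toy data. The function names are hypothetical.

```python
import random


def score_features(X, y):
    """Univariate score per feature: |mean(class 1) - mean(class 0)|."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:
            scores.append(0.0)  # a resample may miss one class entirely
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def rank_features(scores):
    """Return the rank position of each feature (0 = highest score)."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks


def ensemble_select(X, y, k, n_bootstraps=20, seed=0):
    """Rank features on bootstrap resamples and keep the k with the lowest rank sum."""
    rng = random.Random(seed)
    n = len(X)
    rank_sums = [0] * len(X[0])
    for _ in range(n_bootstraps):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        for j, rank in enumerate(rank_features(score_features(Xb, yb))):
            rank_sums[j] += rank
    return sorted(range(len(rank_sums)), key=lambda j: rank_sums[j])[:k]


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [
    [5.0, 1.0, 0.2], [5.1, 0.0, 0.1], [4.9, 1.0, 0.3], [5.2, 0.0, 0.2],
    [0.0, 1.0, 0.2], [0.1, 0.0, 0.1], [-0.1, 1.0, 0.3], [0.2, 0.0, 0.2],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]
selected = ensemble_select(X, y, k=1)
print(selected)
```

Stability in the sense used above could then be estimated by repeating the whole procedure on different training splits and comparing the selected subsets, for instance with an overlap measure.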

    An evolutionary approach for balancing effectiveness and representation level in gene selection

    As data mining develops and expands to new application areas, feature selection also reveals various aspects to be considered. This paper underlines two aspects that seem to categorize the large body of available feature selection algorithms: the effectiveness and the representation level. The effectiveness deals with selecting the minimum set of variables that maximize the accuracy of a classifier, and the representation level concerns discovering how relevant the variables are for the domain of interest. For balancing the above aspects, the paper proposes an evolutionary framework for feature selection that expresses a hybrid method, organized in layers, each of which exploits a specific model of search strategy. Extensive experiments on gene selection from DNA-microarray datasets are presented and discussed. Results indicate that the framework compares well with different hybrid methods proposed in the literature, as it has the capability of finding well-suited subsets of informative features while improving classification accuracy

    Care pathways models and clinical outcomes in disorders of consciousness

    Objective: Patients with disorders of consciousness are persons with extremely low functioning levels and represent a challenge for health care systems due to their high needs of facilitating environmental factors. Despite a common Italian health care pathway for these patients, no studies have analyzed information on how each region has implemented it in its welfare system, correlating data with patients’ clinical outcomes. Materials and Methods: A multicenter observational pilot study was realized. Clinicians collected data on the care pathways of patients with disorders of consciousness by asking 90 patients’ caregivers to complete an ad hoc questionnaire through a structured phone interview. The questionnaire consisted of three sections: sociodemographic data, description of the care pathway followed by the patient, and caregiver evaluation of health services and information received. Results: Seventy-three patients were analyzed. Length of hospital stay was different across the health care models and it was associated with improvement in clinical diagnosis. In long-term care units, the diagnosis at admission and the number of caregivers available for each patient (median value=3) showed an indirect relationship with worsening probability in clinical outcome. Caregivers reported that communication with professionals (42%) and the answer to the need of information were the most critical points in the acute phase, whereas presence of Non-Governmental Organizations (25%) and availability of psychologists for caregivers (21%) were often missing during long-term care. Sixty-five percent of caregivers reported they did not know the UN Convention on the Rights of Persons with Disabilities. Conclusion: This study highlights relevant differences in analyzed models, despite a recommended national pathway of care. Future public health considerations and actions are needed to guarantee equity and standardization of the care process in all European countries

    ACORN (A Clinically-Oriented Antimicrobial Resistance Surveillance Network) II: protocol for case based antimicrobial resistance surveillance

    Background: Antimicrobial resistance surveillance is essential for empiric antibiotic prescribing, infection prevention and control policies and to drive novel antibiotic discovery. However, most existing surveillance systems are isolate-based without supporting patient-based clinical data, and not widely implemented especially in low- and middle-income countries (LMICs). Methods: A Clinically-Oriented Antimicrobial Resistance Surveillance Network (ACORN) II is a large-scale multicentre protocol which builds on the WHO Global Antimicrobial Resistance and Use Surveillance System to estimate syndromic and pathogen outcomes along with associated health economic costs. ACORN-healthcare associated infection (ACORN-HAI) is an extension study which focuses on healthcare-associated bloodstream infections and ventilator-associated pneumonia. Our main aim is to implement an efficient clinically-oriented antimicrobial resistance surveillance system, which can be incorporated as part of routine workflow in hospitals in LMICs. These surveillance systems include hospitalised patients of any age with clinically compatible acute community-acquired or healthcare-associated bacterial infection syndromes, and who were prescribed parenteral antibiotics. Diagnostic stewardship activities will be implemented to optimise microbiology culture specimen collection practices. Basic patient characteristics, clinician diagnosis, empiric treatment, infection severity and risk factors for HAI are recorded on enrolment and during 28-day follow-up. An R Shiny application can be used offline and online for merging clinical and microbiology data, and generating collated reports to inform local antibiotic stewardship and infection control policies. Discussion: ACORN II is a comprehensive antimicrobial resistance surveillance activity which advocates pragmatic implementation and prioritises improving local diagnostic and antibiotic prescribing practices through patient-centred data collection. These data can be rapidly communicated to local physicians and infection prevention and control teams. Relative ease of data collection promotes sustainability and maximises participation and scalability. With ACORN-HAI as an example, ACORN II has the capacity to accommodate extensions to investigate further specific questions of interest