
    A comparative analysis of multi-level computer-assisted decision making systems for traumatic injuries

    Background: This paper focuses on the creation of a predictive computer-assisted decision making system for traumatic injury using machine learning algorithms. Trauma experts must make several difficult decisions based on a large number of patient attributes, usually in a short period of time. The aim is to compare the existing machine learning methods available for medical informatics, and to develop reliable, rule-based computer-assisted decision-making systems that provide recommendations for the course of treatment for new patients, based on previously seen cases in trauma databases. Datasets of traumatic brain injury (TBI) patients are used to train and test the decision making algorithm. The work is also applicable to patients with traumatic pelvic injuries.
    Methods: Decision-making rules are created by processing patterns discovered in the datasets, using machine learning techniques. More specifically, CART and C4.5 are used, as they provide grammatical expressions of knowledge extracted by applying logical operations to the available features. The resulting rule sets are tested against other machine learning methods, including AdaBoost and SVM. The rule creation algorithm is applied to multiple datasets, both with and without prior filtering to discover significant variables. This filtering is performed via logistic regression prior to the rule discovery process.
    Results: For survival prediction using all variables, CART outperformed the other machine learning methods. When using only significant variables, neural networks performed best. A reliable rule-base was generated using combined C4.5/CART. The average predictive rule performance was 82% when using all variables, and approximately 84% when using significant variables only. The average performance of the combined C4.5 and CART system using significant variables was 89.7% in predicting the exact outcome (home or rehabilitation), and 93.1% in predicting the ICU length of stay for airlifted TBI patients.
    Conclusion: This study creates an efficient computer-aided rule-based system that can be employed in decision making in TBI cases. The rule-bases apply methods that combine CART and C4.5 with logistic regression to improve rule performance and quality. For final outcome prediction for TBI cases, the resulting rule-bases outperform systems that utilize all available variables.
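    As a rough illustration of the rule-extraction step described above, the sketch below fits a CART-style decision tree with scikit-learn and prints the resulting if-then rules. It uses synthetic data and is not the authors' implementation: the attribute names, the outcome label and the tree depth are assumptions, and the combined C4.5/CART and logistic regression filtering steps are not reproduced.

```python
# Minimal sketch: CART-style rule extraction with scikit-learn
# (synthetic data; not the authors' trauma datasets or pipeline).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                    # hypothetical patient attributes
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)    # hypothetical outcome label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# scikit-learn's DecisionTreeClassifier implements an optimized CART variant.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("held-out accuracy:", tree.score(X_te, y_te))
# Grammatical if-then rules, analogous to the rule-bases described above.
print(export_text(tree, feature_names=[f"attr_{i}" for i in range(5)]))
```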

    Multiple Imputation Ensembles (MIE) for dealing with missing data

    Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely At Random. Firstly, we use a number of single/multiple imputation methods to recover the missing values, and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation using dissimilarity measures. We also evaluate MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach, combining multiple imputation with ensemble techniques, outperforms the others, particularly as missing data increases.
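    A minimal sketch of the multiple-imputation-plus-ensemble idea follows. It uses scikit-learn's IterativeImputer and a majority vote over per-imputation random forests, which is a simplified stand-in for the bagging/stacking ensembles described above, not the authors' exact MIE pipeline; the classifier choice and number of imputations are assumptions.

```python
# Simplified sketch of multiple imputation + ensembling (not the exact MIE pipeline):
# impute several times, train one classifier per imputed copy, combine by majority vote.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier

def mie_predict(X_train, y_train, X_test, n_imputations=5):
    votes = []
    for seed in range(n_imputations):
        imputer = IterativeImputer(random_state=seed)        # one imputation per seed
        Xtr = imputer.fit_transform(X_train)
        Xte = imputer.transform(X_test)
        clf = RandomForestClassifier(random_state=seed).fit(Xtr, y_train)
        votes.append(clf.predict(Xte))
    votes = np.array(votes)
    # Majority vote across the imputation-specific classifiers (assumes binary 0/1 labels).
    return (votes.mean(axis=0) >= 0.5).astype(int)
```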

    Dealing with Missing Data and Uncertainty in the Context of Data Mining

    Missing data is an issue in many real-world datasets, yet robust methods for dealing with it appropriately still need development. In this paper we investigate how some methods for handling missing data perform when the uncertainty increases. Using benchmark datasets from the UCI Machine Learning repository, we generate datasets for our experimentation with increasing amounts of data Missing Completely At Random (MCAR), both at the attribute level and at the record level. We then apply four classification algorithms: C4.5, Random Forest, Naïve Bayes and Support Vector Machines (SVMs). We measure the performance of each classifier on the basis of complete case analysis and simple imputation, and then study the performance of the algorithms that can handle missing data directly. We find that complete case analysis has a detrimental effect because it renders many datasets infeasible as missing data increases, particularly for high-dimensional data. We find that increasing missing data does have a negative effect on the performance of all the algorithms tested, but the algorithms do not differ significantly in performance whether the missing data is handled by preprocessing in the form of simple imputation or by the algorithm itself.
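    The MCAR corruption step can be illustrated with a small helper like the one below. This is a sketch under the assumption of cell-wise, attribute-level missingness; the exact generation procedure used in the experiments (including the record-level variant) may differ.

```python
# Hedged sketch: injecting Missing Completely At Random (MCAR) values into a
# complete numeric dataset before running the classifiers.
import numpy as np

def make_mcar(X, missing_rate, seed=0):
    """Return a copy of X with roughly `missing_rate` of entries set to NaN at random."""
    rng = np.random.default_rng(seed)
    X_miss = X.astype(float).copy()
    mask = rng.random(X_miss.shape) < missing_rate   # each cell independently missing
    X_miss[mask] = np.nan
    return X_miss

# Example: 20% of cells missing completely at random.
# X_mcar = make_mcar(X, missing_rate=0.2)
```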

    Foodways in transition: food plants, diet and local perceptions of change in a Costa Rican Ngäbe community

    Background: Indigenous populations are undergoing rapid ethnobiological, nutritional and socioeconomic transitions while being increasingly integrated into modernizing societies. To better understand the dynamics of these transitions, this article aims to characterize the cultural domain of food plants and analyze its relation to current diets and to local perceptions of change amongst the Ngäbe people of Southern Conte-Burica, Costa Rica, where local production of food plants is hypothesized to be in sharp decline as new conservation and development paradigms are implemented.
    Methods: Extensive freelisting, interviews and workshops were used to collect data from 72 participants on their knowledge of food plants, their current dietary practices and their perceptions of change in local foodways, while cultural domain analysis, descriptive statistical analyses and the development of fundamental explanatory themes were employed to analyze the data.
    Results: Results show a food plants domain composed of 140 species, of which 85% grow in the area, with a medium level of cultural consensus and some age-based variation. Although many plants still grow in the area, a decrease in local production (even abandonment) was found for many key species, with much reduced cultivation areas. Yet the domain appears to be largely theoretical, with little evidence of use, and the diet today depends predominantly on foods bought from the store (more than 50% of basic ingredients), many of which were not salient, or not even recognized as food plants, in the freelisting exercises. While changes in the importance of food plants were largely deemed a result of changing cultural preferences for store-bought processed foodstuffs and changing values associated with farming and being food self-sufficient, the Ngäbe were also aware that changing household livelihood activities, and the subsequent loss of knowledge and use of food plants, were in fact being driven by changes in social and political policies, despite increases in forest cover and biodiversity.
    Conclusions: Ngäbe foodways are changing in different and somewhat disconnected ways: knowledge of food plants is varied, reflecting the most relevant changes in dietary practices, such as smaller cultivation areas and greater dependence on food from stores by all families. We attribute dietary shifts to socioeconomic and political changes in recent decades, in particular to a reduction in local food production and to new economic structures and agents related to the State and globalization.

    Muscle and tendon adaptations to moderate load eccentric vs. concentric resistance exercise in young and older males.

    Resistance exercise training (RET) is well known to counteract negative age-related changes in both muscle and tendon tissue. Traditional RET consists of both concentric (CON) and eccentric (ECC) contractions; however, isolated ECC contractions are metabolically less demanding and thus may be more suitable for older populations. Whether submaximal (60% 1RM) CON or ECC contractions differ in their effectiveness is relatively unknown, as is the time course of the corresponding muscle and tendon adaptations. Therefore, this study aimed to establish the time course of muscle and tendon adaptations to submaximal CON and ECC RET. Twenty healthy young (24.5 ± 5.1 years) and 17 older males (68.1 ± 2.4 years) were randomly allocated to either isolated CON or ECC RET, performed three times per week for 8 weeks. Tendon biomechanical properties, muscle architecture and maximal voluntary contraction were assessed every 2 weeks, and quadriceps muscle volume every 4 weeks. Positive changes in tendon Young's modulus were observed after 4 weeks in all groups, after which adaptations plateaued in young males but continued to increase in older males, suggesting a dampened rate of adaptation with age. However, both CON and ECC resulted in similar overall changes in tendon Young's modulus in all groups. Muscle hypertrophy and strength increases were similar between CON and ECC in all groups. However, pennation angle increases were greater in CON, and fascicle length changes were greater in ECC. Notably, muscle and tendon adaptations appeared to occur in synergy, presumably to maintain the efficacy of the muscle-tendon unit.

    A particle swarm optimization approach using adaptive entropy-based fitness quantification of expert knowledge for high-level, real-time cognitive robotic control

    Abstract: High-level, real-time mission control of semi-autonomous robots deployed in remote and dynamic environments remains a challenge. Control models learnt from a knowledgebase quickly become obsolete when the environment or the knowledgebase changes. This research study introduces a cognitive reasoning process to select the optimal action, using the most relevant knowledge from the knowledgebase, subject to observed evidence. The approach introduces an adaptive entropy-based set-based particle swarm optimization (AE-SPSO) algorithm and a novel adaptive entropy-based fitness quantification (AEFQ) algorithm for evidence-based optimization of the knowledge. The performance of the AE-SPSO and AEFQ algorithms is experimentally evaluated with two unmanned aerial vehicle (UAV) benchmark missions: (1) relocating the UAV to a charging station and (2) collecting and delivering a package. Performance is measured by inspecting the success and completeness of the mission and the accuracy of autonomous flight control. The results show that the AE-SPSO/AEFQ approach successfully finds the optimal state transition for each mission task and that autonomous flight control is successfully achieved.
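    For readers unfamiliar with particle swarm optimization, the following is a generic global-best PSO sketch minimizing a simple continuous test function. The AE-SPSO of the paper is a set-based variant with an adaptive entropy-based fitness (AEFQ); neither is reproduced here, and all parameter values below are illustrative assumptions.

```python
# Generic global-best PSO sketch (illustrative only; not the paper's AE-SPSO/AEFQ).
import numpy as np

def pso(fitness, dim=2, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5, 5, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Standard velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

best, best_val = pso(lambda p: np.sum(p ** 2))   # minimize a simple sphere function
```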

    A new scoring system in Cystic Fibrosis: statistical tools for database analysis – a preliminary report

    Background: Cystic fibrosis is the most common fatal genetic disorder in the Caucasian population. Scoring systems for assessing Cystic fibrosis disease severity have been used for almost 50 years, without being adapted to the milder phenotype of the disease in the 21st century. The aim of this project is to develop a new scoring system using a database and employing various statistical tools. This study protocol reports the development of the statistical tools needed to create such a scoring system.
    Methods: The evaluation is based on the Cystic Fibrosis database from the cohort at the Royal Children's Hospital in Melbourne. Initially, unsupervised clustering of all data records was performed using a range of clustering algorithms, in particular incremental clustering algorithms. The clusters obtained were characterised using rules from decision trees and the results were examined by clinicians. To obtain a clearer definition of classes, expert opinion of each individual's clinical severity was sought. After data preparation, including expert opinion of each individual's clinical severity on a 3-point scale (mild, moderate and severe disease), two multivariate techniques were used throughout the analysis to establish a method with better success in feature selection and model derivation: Canonical Analysis of Principal Coordinates (CAP) and Linear Discriminant Analysis (DA). A 3-step procedure was performed: (1) selection of features, (2) extraction of 5 severity classes from the 3 classes defined by expert opinion, and (3) establishment of calibration datasets.
    Results: (1) Feature selection: CAP has a more effective "modelling" focus than DA. (2) Extraction of 5 severity classes: after variables were identified as important in discriminating contiguous CF severity groups on the 3-point scale (mild/moderate and moderate/severe), Discriminant Functions (DF) were used to determine the new groups: mild, intermediate moderate, moderate, intermediate severe and severe disease. (3) The generated confusion tables showed a misclassification rate of 19.1% for males and 16.5% for females, with the majority of misallocations into adjacent severity classes, particularly for males.
    Conclusion: Our preliminary data show that using CAP for feature selection and Linear DA to derive the actual model in a CF database might be helpful in developing a scoring system. However, there are several limitations; in particular, more data entry points are needed to finalize a score, and the statistical tools have to be further refined and validated by re-running the statistical methods on the larger dataset.
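    A minimal sketch of the Linear Discriminant Analysis step is shown below, using scikit-learn on synthetic stand-in data (the CF database is not public), to illustrate how a confusion table over severity classes like the one reported above can be produced; the feature count and class labels are assumptions.

```python
# Illustrative sketch of the discriminant-analysis step on synthetic data,
# mirroring the confusion-table analysis of severity classes.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))                    # hypothetical clinical features
y = rng.integers(0, 3, size=300)                 # 3 severity classes (mild/moderate/severe)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
print(confusion_matrix(y_te, lda.predict(X_te)))  # rows: true class, cols: predicted class
```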

    Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing

    Background: The rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences obtained. Current strategies for decision-making and error-correction are based on an initial analysis by Huse et al. in 2007 for the older GS20 system, based on experimental sequences. We analyze here the quality of 454 sequencing data and identify factors playing a role in sequencing error, through the use of an extensive dataset for Roche control DNA fragments.
    Results: We obtained a mean error rate for 454 sequences of 1.07%. More importantly, the error rate is not randomly distributed; it occasionally rose to more than 50% in certain positions, and its distribution was linked to several experimental variables. The main factors related to error are the presence of homopolymers, position in the sequence, size of the sequence, and, for insertion and deletion errors, spatial localization in PT plates. These factors can be described by seven variables. No single variable can account for the error rate distribution, but most of the variation is explained by the combination of all seven variables.
    Conclusions: The pattern identified here calls for the use of internal controls and error-correcting base callers to correct for errors, when available (e.g. when sequencing amplicons). For shotgun libraries, the use of both sequencing primers and deep coverage, combined with the use of random sequencing primer sites, should partly compensate for even high error rates, although it may prove more difficult than previously thought to distinguish between low-frequency alleles and errors.
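    As a toy illustration of positional error analysis, the sketch below computes a per-position mismatch rate for reads aligned against a known control sequence. It is a hypothetical helper, not the authors' pipeline, and it ignores insertions and deletions, which a real 454 analysis (where homopolymer indels dominate) would need to handle via proper alignment.

```python
# Hedged sketch: per-position mismatch rate for reads against a known control sequence.
import numpy as np

def positional_error_rate(reads, reference):
    """reads: list of read strings already aligned to (a prefix of) `reference`."""
    max_len = max(len(r) for r in reads)
    errors = np.zeros(max_len)
    coverage = np.zeros(max_len)
    for read in reads:
        for i, base in enumerate(read):
            coverage[i] += 1
            if base != reference[i]:
                errors[i] += 1
    return errors / np.maximum(coverage, 1)   # mismatch rate at each read position
```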

    Reconstructing cancer genomes from paired-end sequencing data

    Background: A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data.
    Results: By aligning paired reads from a cancer genome - and a matched germline genome, if available - to the human reference genome, we derive: (i) a partition of the reference genome into intervals; (ii) adjacencies between these intervals in the cancer genome; (iii) an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO), to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles.
    Conclusions: We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data. An implementation of the PREGO algorithm is available at http://compbio.cs.brown.edu/software/.
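    A toy sketch of the interval-adjacency idea, using networkx, is shown below: intervals become nodes, measured adjacencies become (multi-)edges, and an Eulerian tour, when one exists, spells out a candidate ordering of the intervals. This is only an illustration with made-up intervals and adjacencies; PREGO's actual graph distinguishes interval edges from adjacency edges and optimizes against copy numbers, as described in the paper and available at the URL above.

```python
# Toy sketch of an interval-adjacency graph and an Eulerian traversal
# (not PREGO itself; intervals and adjacencies below are hypothetical).
import networkx as nx

G = nx.MultiGraph()
intervals = ["A", "B", "C"]                  # hypothetical reference intervals
G.add_nodes_from(intervals)
# Hypothetical adjacencies; edge multiplicity stands in for copy number.
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("A", "B"), ("B", "A")])

if nx.is_eulerian(G):
    tour = list(nx.eulerian_circuit(G, source="A"))
    print(tour)                               # one traversal consistent with the adjacencies
```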