
    Empirical investigation of decision tree ensembles for monitoring cardiac complications of diabetes

    Cardiac complications of diabetes require continuous monitoring, since they may lead to increased morbidity or sudden death of patients. To monitor clinical complications of diabetes using wearable sensors, a small set of features has to be identified and effective algorithms for their processing need to be investigated. This article focuses on detecting and monitoring cardiac autonomic neuropathy (CAN) in diabetes patients. The authors investigate and compare the effectiveness of classifiers based on the following decision trees: ADTree, J48, NBTree, RandomTree, REPTree, and SimpleCart. They then compare several decision tree ensembles created by applying the following ensemble methods: AdaBoost, Bagging, Dagging, Decorate, Grading, MultiBoost, Stacking, and two multi-level combinations of AdaBoost and MultiBoost with Bagging, for processing data from diabetes patients for pervasive health monitoring of CAN. Experimental outcomes presented here show that applying decision tree ensembles to the detection and monitoring of CAN in diabetes patients achieved better performance than the results previously reported in the literature.
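
    As a rough illustration of the kind of comparison described (the study itself used Weka-style learners, not scikit-learn), the sketch below benchmarks a single decision tree against bagged, boosted, and multi-level boosting-over-bagging ensembles; the feature matrix is a synthetic stand-in for the sensor-derived CAN features, not the paper's data.

```python
# Minimal sketch, assuming scikit-learn is available: a single decision
# tree versus bagged, boosted, and multi-level ensembles. X and y are
# synthetic placeholders for the CAN monitoring features and labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

base = DecisionTreeClassifier(random_state=0)
models = {
    "single tree": base,
    "Bagging": BaggingClassifier(base, n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Two-level combination: boosting applied on top of bagged trees.
    "AdaBoost+Bagging": AdaBoostClassifier(
        BaggingClassifier(base, n_estimators=10, random_state=0),
        n_estimators=10,
        random_state=0,
    ),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold accuracy
    print(f"{name:16s} accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```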

    Algorithms for the description of molecular sequences

    Unambiguous sequence variant descriptions are important in reporting the outcome of clinical diagnostic DNA tests. The standard nomenclature of the Human Genome Variation Society (HGVS) describes the observed variant sequence relative to a given reference sequence. We propose an efficient algorithm for the extraction of HGVS descriptions from two DNA sequences. Our algorithm is able to compute the HGVS descriptions of complete chromosomes or other large DNA strings in a reasonable amount of computation time, and its resulting descriptions are relatively small. Additional applications include updating gene variant database contents and reference sequence liftovers. Next, we adapted our method to the extraction of descriptions for protein sequences, in particular for describing frame-shifted variants. We propose an addition to the HGVS nomenclature for accommodating the (complex) frame-shifted variants that can be described with our method. Finally, we applied our method to generate descriptions for Short Tandem Repeats (STRs), a form of self-similarity. We propose an alternative description of repeat variants that can be added to the existing HGVS nomenclature. The final chapter takes an explorative approach to classification in large cohort studies, providing a "cross-sectional" investigation of these data to see the relative power of the different groups.
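
    A toy version of this idea (not the paper's algorithm, which handles inversions, repeats, and description minimization): align a reference and an observed sequence, then emit HGVS-style descriptions from the alignment operations. Python's difflib serves only as a stand-in aligner here.

```python
# Toy illustration: derive simple HGVS-like variant descriptions
# (1-based coordinates) from a pairwise alignment of two DNA strings.
from difflib import SequenceMatcher

def hgvs_like(reference: str, observed: str) -> list[str]:
    out = []
    matcher = SequenceMatcher(None, reference, observed, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            out.append(f"{i1 + 1}{reference[i1]}>{observed[j1]}")  # substitution
        elif tag == "delete":
            out.append(f"{i1 + 1}_{i2}del" if i2 - i1 > 1 else f"{i1 + 1}del")
        elif tag == "insert":
            out.append(f"{i1}_{i1 + 1}ins{observed[j1:j2]}")  # between flanks
        else:  # multi-base replacement: deletion-insertion
            out.append(f"{i1 + 1}_{i2}delins{observed[j1:j2]}")
    return out

print(hgvs_like("ATGCTTGGA", "ATGATTGGCA"))  # ['4C>A', '8_9insC']
```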

    Balanced Filtering via Non-Disclosive Proxies

    We study the problem of non-disclosively collecting a sample of data that is balanced with respect to sensitive groups when group membership is unavailable or prohibited from use at collection time. Specifically, our collection mechanism does not reveal significantly more about the group membership of any individual sample than can be ascertained from base rates alone. To do this, we adopt a fairness pipeline perspective, in which a learner can use a small set of labeled data to train a proxy function that can later be used for this filtering task. We then associate the range of the proxy function with sampling probabilities; given a new candidate, we classify it using our proxy function, and then select it for our sample with probability proportional to the sampling probability corresponding to its proxy classification. Importantly, we require that the proxy classification itself not reveal significant information about the sensitive group membership of any individual sample (i.e., it should be sufficiently non-disclosive). We show that, under modest algorithmic assumptions, such a proxy can be found in a sample- and oracle-efficient manner. Finally, we experimentally evaluate our algorithm and analyze its generalization properties.
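
    The sampling mechanism itself is straightforward to sketch. Below, a proxy classifier is trained on a small labeled set, each value in its range gets a sampling probability, and unlabeled candidates are kept with the probability attached to their proxy label. The balancing rule used (inverse estimated bucket frequency) and the simulated data are illustrative choices, not the paper's calibrated, non-disclosive construction.

```python
# Sketch of the filtering pipeline, assuming numpy and scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical small labeled set: features X, sensitive group g in {0, 1}.
X = rng.normal(size=(200, 5))
g = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

proxy = LogisticRegression().fit(X, g)  # proxy for group membership

# Map each proxy label to a sampling probability chosen so the expected
# number of selected candidates per proxy bucket is equalized.
labels, counts = np.unique(proxy.predict(X), return_counts=True)
p_select = {lab: counts.min() / cnt for lab, cnt in zip(labels, counts)}

def keep(x: np.ndarray) -> bool:
    """Accept a candidate with the probability tied to its proxy bucket."""
    lab = int(proxy.predict(x.reshape(1, -1))[0])
    return rng.random() < p_select[lab]

candidates = rng.normal(size=(1000, 5))
sample = [x for x in candidates if keep(x)]
print(len(sample), "candidates kept")
```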

    Modeling Faceted Browsing with Category Theory for Reuse and Interoperability

    Faceted browsing (also called faceted search or faceted navigation) is an exploratory search model in which facets assist in the interactive navigation of search results. Facets are attributes that have been assigned to describe the resources being explored; a faceted taxonomy is the collection of facets provided by the interface and is often organized as sets, hierarchies, or graphs. Faceted browsing has become ubiquitous in modern digital libraries and online search engines, yet the process is still difficult to model abstractly in a manner that supports the development of interoperable and reusable interfaces. We propose category theory as a theoretical foundation for faceted browsing and demonstrate how the interactive process can be mathematically abstracted in order to support the development of reusable and interoperable faceted systems. Existing efforts in facet modeling are based upon set theory, formal concept analysis, and lightweight ontologies, but in many regards they are implementations of faceted browsing rather than a specification of the basic, underlying structures and interactions. We demonstrate that category theory allows us to specify faceted objects and study the relationships and interactions within a faceted browsing system. Resulting implementations can then be constructed through a category-theoretic lens using these models, allowing abstract comparison and communication that naturally support interoperability and reuse. In this context, reuse and interoperability operate at two levels: between discrete systems and within a single system. Our model works at both levels by leveraging category theory as a common language for representation and computation. We establish facets and faceted taxonomies as categories and demonstrate how the computational elements of category theory, including products, merges, pushouts, and pullbacks, extend the usefulness of our model. More specifically, we demonstrate that categorical constructions such as the pullback and pushout operations can help organize and reorganize facets; these operations in particular can produce faceted views containing relationships not found in the original source taxonomy. We show how our category-theoretic model of facets relates to database schemas and discuss how this relationship assists in implementing the abstractions presented. We give examples of interactive interfaces from the biomedical domain to help illustrate how our abstractions relate to real-world requirements while enabling systematic reuse and interoperability. We introduce DELVE (Document ExpLoration and Visualization Engine), our framework for developing interactive visualizations as modular web applications that assist researchers with exploratory literature search. We show how facets relate to and control visualizations, giving three examples of text visualizations that either contain or interact with facets; each of these visualizations can be represented with our model, and the model directly informs implementation. With this general framework for communicating consistently about facets at a high level of abstraction, we enable the construction of interoperable interfaces and the intelligent reuse of both existing and future efforts.
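
    As a small illustration of one of these constructions (with hypothetical facet values, not an example from the paper): the pushout of two facet value sets along a shared facet is the disjoint union with the shared values identified, which is how a pushout merges overlapping taxonomies in the category of sets.

```python
# Pushout of the span B <-f- A -g-> C in the category of sets,
# computed as (B ⊔ C) / ~ where f(a) ~ g(a) for each a in A.

def pushout(A, B, C, f, g):
    parent = {("B", b): ("B", b) for b in B}
    parent.update({("C", c): ("C", c) for c in C})

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in A:  # glue f(a) to g(a)
        parent[find(("B", f[a]))] = find(("C", g[a]))

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())

# Two hypothetical facet taxonomies sharing an "organism" facet.
A = {"human", "mouse"}
B = {"Homo sapiens", "Mus musculus", "heart"}
C = {"H. sapiens", "M. musculus", "diabetes"}
f = {"human": "Homo sapiens", "mouse": "Mus musculus"}
g = {"human": "H. sapiens", "mouse": "M. musculus"}

for cls in pushout(A, B, C, f, g):
    print(sorted(cls))  # merged facet values, shared organisms identified
```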

    A comparison of various approaches to the exponential random graph model: A reanalysis of 102 student networks in school classes

    This paper describes an empirical comparison of four specifications of the exponential family of random graph models (ERGM), distinguished by model specification (dyadic independence, Markov, partial conditional dependence) and, for the Markov model, by estimation method (Maximum Pseudolikelihood (PL) versus Maximum Likelihood (ML)). This was done by reanalyzing 102 student networks in 57 junior high school classes. At the level of all classes combined, earlier substantive conclusions were supported by all specifications. However, the different specifications led to different conclusions for individual classes. PL produced unreliable estimates (when ML is regarded as the standard) and had more convergence problems than ML. Furthermore, the estimates of covariate effects were affected considerably by controlling for network structure, although the precise specification of the structural part (Markov or partial conditional dependence) mattered less. (C) 2007 Elsevier B.V. All rights reserved.
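
    To make the PL-versus-ML distinction concrete: maximum pseudolikelihood reduces ERGM estimation to a logistic regression of each dyad's tie indicator on its change statistics. The sketch below does this for a toy Markov ERGM with edge and triangle statistics; the network and statistics are illustrative stand-ins, not the paper's specifications.

```python
# MPLE for a toy undirected ERGM, assuming networkx and scikit-learn.
import itertools
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression

G = nx.karate_club_graph()  # stand-in for one classroom network
rows, ties = [], []
for i, j in itertools.combinations(G.nodes, 2):
    # Change statistics for toggling dyad (i, j): edges change by 1,
    # triangles change by the number of common neighbors of i and j.
    common = len(list(nx.common_neighbors(G, i, j)))
    rows.append([1.0, common])
    ties.append(int(G.has_edge(i, j)))

X, y = np.array(rows), np.array(ties)
# fit_intercept=False: the edge term plays the intercept's role;
# a huge C makes the logistic fit effectively unpenalized.
mple = LogisticRegression(C=1e6, fit_intercept=False).fit(X, y)
print("theta_edges, theta_triangle:", mple.coef_[0])
```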

    Machine learning suggests sleep as a core factor in chronic pain

    Patients with chronic pain have complex pain profiles and associated problems. Subgroup analysis can help identify key problems. We used a data-based approach to define pain phenotypes and their most relevant associated problems in 320 patients undergoing tertiary pain management. Unsupervised machine learning analysis of the parameters "pain intensity," "number of pain areas," "pain duration," "activity pain interference," and "affective pain interference," implemented as emergent self-organizing maps, identified 3 patient phenotype clusters. Supervised analyses, implemented as different types of decision rules, identified "affective pain interference" and the "number of pain areas" as most relevant for cluster assignment; these appeared 698 and 637 times, respectively, among the most relevant characteristics in 1000 cross-validation runs of an item categorization approach (computed ABC analysis). Cluster assignment was achieved with a median balanced accuracy of 79.9%, a sensitivity of 74.1%, and a specificity of 87.7%. In addition, among 59 demographic, pain etiology, comorbidity, lifestyle, psychological, and treatment-related variables, sleep problems appeared 638 and 439 times among the most important characteristics in 1000 cross-validation runs in which patients were assigned to the 2 extreme pain phenotype clusters. Also important were the parameters "fear of pain," "self-rated poor health," and "systolic blood pressure." Decision trees trained with this information assigned patients to the extreme pain phenotype with an accuracy of 67%. Machine learning suggested sleep problems as key factors in the most difficult pain presentations, therefore deserving priority in the treatment of chronic pain.
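
    A rough sketch of this two-stage pipeline, with standard substitutes: KMeans stands in for the emergent self-organizing map, and decision-tree feature importances counted over repeated splits stand in for the computed ABC item categorization. The data are simulated, not the study's patient records.

```python
# Two-stage sketch, assuming numpy and scikit-learn.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
features = ["pain intensity", "number of pain areas", "pain duration",
            "activity pain interference", "affective pain interference"]
X = rng.normal(size=(320, len(features)))  # simulated patients

# Stage 1: unsupervised phenotype clusters (3, as in the study).
clusters = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

# Stage 2: count how often each feature ranks among the top two for
# cluster assignment across resampling runs (cf. the 1000 CV runs).
top_counts = dict.fromkeys(features, 0)
splitter = StratifiedShuffleSplit(n_splits=100, test_size=0.2, random_state=1)
for train_idx, _ in splitter.split(X, clusters):
    tree = DecisionTreeClassifier(random_state=1).fit(X[train_idx],
                                                      clusters[train_idx])
    for k in np.argsort(tree.feature_importances_)[-2:]:
        top_counts[features[k]] += 1
print(top_counts)
```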