1,027 research outputs found
Empirical investigation of decision tree ensembles for monitoring cardiac complications of diabetes
Cardiac complications of diabetes require continuous monitoring because they may lead to increased morbidity or sudden death. To monitor these complications using wearable sensors, a small set of features has to be identified and effective algorithms for processing them need to be investigated. This article focuses on detecting and monitoring cardiac autonomic neuropathy (CAN) in diabetes patients. The authors investigate and compare the effectiveness of classifiers based on the following decision trees: ADTree, J48, NBTree, RandomTree, REPTree, and SimpleCart. They also perform a thorough comparison of several decision tree ensembles created by applying the ensemble methods AdaBoost, Bagging, Dagging, Decorate, Grading, MultiBoost, and Stacking, as well as two multi-level combinations of AdaBoost and MultiBoost with Bagging, to data from diabetes patients for pervasive health monitoring of CAN. Experimental outcomes presented here show that the authors' application of decision tree ensembles to the detection and monitoring of CAN in diabetes patients achieved better performance parameters than the results obtained previously in the literature.
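A comparison of this kind can be sketched with scikit-learn. The specific Weka classifiers named above (ADTree, J48, and so on) are replaced here with standard CART trees, and the CAN feature data is simulated, so this is only an illustration of the single-tree-versus-ensemble setup, not the authors' experiment.

```python
# Sketch: compare a single decision tree against bagging and boosting
# ensembles under cross-validation, in the spirit of the study above.
# Data is synthetic; scikit-learn trees stand in for the Weka classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the CAN feature set measured by wearable sensors.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "Bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                 random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

On most datasets the two ensembles outperform the single tree, which is the pattern the abstract reports for CAN detection.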
Algorithms for the description of molecular sequences
Unambiguous sequence variant descriptions are important in reporting
the outcome of clinical diagnostic DNA tests. The standard
nomenclature of the Human Genome Variation Society (HGVS) describes
the observed variant sequence relative to a given reference sequence.
We propose an efficient algorithm for the extraction of
HGVS descriptions from two DNA sequences.
Our algorithm is able to compute the HGVS descriptions of complete
chromosomes or other large DNA strings in a reasonable amount of
computation time, and the resulting descriptions are relatively small.
Additional applications include updating of gene variant database
contents and reference sequence liftovers.
Next, we adapted our method to extract descriptions for protein sequences, in particular for describing frameshift variants. We propose an addition to the HGVS nomenclature to accommodate the (complex) frameshift variants that can be described with our method.
Finally, we applied our method to generate descriptions for Short Tandem Repeats (STRs), a form of self-similarity, and we propose an alternative description of repeat variants that can be added to the existing HGVS nomenclature.
The final chapter takes an exploratory approach to classification in large cohort studies. We provide a "cross-sectional" investigation of these data to assess the relative power of the different groups.
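The core task described above, deriving a variant description from a reference and an observed sequence, amounts to computing an edit script and rendering each edit in HGVS-like notation. Below is a minimal sketch using Python's difflib rather than the efficient algorithm the thesis proposes, so it will not scale to chromosomes, and it omits the HGVS `g.`/`c.` prefixes.

```python
# Minimal sketch of deriving HGVS-style variant descriptions from a
# reference and an observed DNA sequence. Illustrative only: difflib's
# generic diff replaces the thesis's efficient, HGVS-aware algorithm.
from difflib import SequenceMatcher

def hgvs_describe(reference: str, observed: str) -> list[str]:
    """Render each difference as a simplified HGVS-like description."""
    out = []
    for op, i1, i2, j1, j2 in SequenceMatcher(
            None, reference, observed).get_opcodes():
        if op == "replace":
            if i2 - i1 == 1 and j2 - j1 == 1:
                # single-base substitution, e.g. 3G>T
                out.append(f"{i1 + 1}{reference[i1]}>{observed[j1]}")
            else:
                out.append(f"{i1 + 1}_{i2}delins{observed[j1:j2]}")
        elif op == "delete":
            out.append(f"{i1 + 1}del" if i2 - i1 == 1 else f"{i1 + 1}_{i2}del")
        elif op == "insert":
            # insertion between the two flanking reference positions
            out.append(f"{i1}_{i1 + 1}ins{observed[j1:j2]}")
    return out

print(hgvs_describe("AAGAA", "AATAA"))  # ['3G>T']
```

A production implementation would additionally canonicalize ambiguous placements (e.g. the HGVS 3'-rule for repeats), which is where the repeat-variant extension proposed above comes in.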
Algorithms and the Foundations of Software Technology
Balanced Filtering via Non-Disclosive Proxies
We study the problem of non-disclosively collecting a sample of data that is
balanced with respect to sensitive groups when group membership is unavailable
or prohibited from use at collection time. Specifically, our collection
mechanism does not reveal significantly more about group membership of any
individual sample than can be ascertained from base rates alone. To do this, we
adopt a fairness pipeline perspective, in which a learner can use a small set
of labeled data to train a proxy function that can later be used for this
filtering task. We then associate the range of the proxy function with sampling
probabilities; given a new candidate, we classify it using our proxy function,
and then select it for our sample with probability proportional to the sampling
probability corresponding to its proxy classification. Importantly, we require
that the proxy classification itself not reveal significant information about
the sensitive group membership of any individual sample (i.e., it should be
sufficiently non-disclosive). We show that, under modest algorithmic
assumptions, such a proxy can be found in a sample- and oracle-efficient manner.
Finally, we experimentally evaluate our algorithm and analyze its generalization
properties.
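The accept/reject step described above can be illustrated with a simplified rejection-sampling sketch: estimate the base rate of each proxy label on a pilot sample, then accept new candidates with probability inversely proportional to that rate. This illustrates only the sampling mechanism, not the authors' sample- and oracle-efficient construction, and the proxy here is a hypothetical threshold rule rather than a learned, non-disclosive function.

```python
# Simplified sketch of proxy-based balanced filtering: each proxy label
# gets an acceptance probability chosen so the collected sample is
# roughly balanced across labels. Plain rejection sampling; the proxy
# is a hypothetical stand-in for the learned non-disclosive function.
import random

random.seed(0)

def proxy(x: float) -> int:
    """Hypothetical proxy classifier: buckets a candidate by one feature."""
    return 1 if x > 0.7 else 0

# Candidate stream whose proxy labels are unbalanced (~70% vs ~30%).
candidates = [random.random() for _ in range(10_000)]

# Estimate proxy-label base rates on a small pilot sample.
pilot = candidates[:1000]
rates = {lbl: sum(proxy(x) == lbl for x in pilot) / len(pilot)
         for lbl in (0, 1)}

# Accept with probability inversely proportional to the base rate,
# capped at 1 by normalizing with the rarest label's rate.
min_rate = min(rates.values())
accept_prob = {lbl: min_rate / r for lbl, r in rates.items()}

sample = [x for x in candidates if random.random() < accept_prob[proxy(x)]]
counts = {lbl: sum(proxy(x) == lbl for x in sample) for lbl in (0, 1)}
print(counts)  # roughly equal counts for both proxy labels
```

Because acceptance depends only on the proxy label, an observer of the filter's decisions learns no more about any individual's sensitive group than the proxy itself reveals, which is the non-disclosure property the abstract requires of the proxy.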
Modeling Faceted Browsing with Category Theory for Reuse and Interoperability
Faceted browsing (also called faceted search or faceted navigation) is an exploratory search model where facets assist in the interactive navigation of search results. Facets are attributes that have been assigned to describe resources being explored; a faceted taxonomy is a collection of facets provided by the interface and is often organized as sets, hierarchies, or graphs. Faceted browsing has become ubiquitous with modern digital libraries and online search engines, yet the process is still difficult to abstractly model in a manner that supports the development of interoperable and reusable interfaces. We propose category theory as a theoretical foundation for faceted browsing and demonstrate how the interactive process can be mathematically abstracted in order to support the development of reusable and interoperable faceted systems.
Existing efforts in facet modeling are based upon set theory, formal concept analysis, and light-weight ontologies, but in many regards they are implementations of faceted browsing rather than a specification of the basic, underlying structures and interactions. We will demonstrate that category theory allows us to specify faceted objects and study the relationships and interactions within a faceted browsing system. Resulting implementations can then be constructed through a category-theoretic lens using these models, allowing abstract comparison and communication that naturally support interoperability and reuse.
In this context, reuse and interoperability are at two levels: between discrete systems and within a single system. Our model works at both levels by leveraging category theory as a common language for representation and computation. We will establish facets and faceted taxonomies as categories and will demonstrate how the computational elements of category theory, including products, merges, pushouts, and pullbacks, extend the usefulness of our model. More specifically, we demonstrate that categorical constructions such as the pullback and pushout operations can help organize and reorganize facets; these operations in particular can produce faceted views containing relationships not found in the original source taxonomy. We show how our category-theoretic model of facets relates to database schemas and discuss how this relationship assists in implementing the abstractions presented.
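At the level of underlying sets, the pullback operation mentioned above can be made concrete: given two facet assignments into a common facet, the pullback collects the pairs that agree on their facet value. The sketch below shows only this set-level shadow of the categorical construction; the facet names and data are hypothetical.

```python
# Toy illustration of a categorical pullback at the level of sets: given
# facet-assignment maps f: Documents -> Topic and g: Authors -> Topic,
# the pullback is the set of (document, author) pairs agreeing on topic.
# Hypothetical names/data; the full model works with categories, not sets.
def pullback(f: dict, g: dict) -> set:
    """Fiber product of f and g over their common codomain."""
    return {(a, b) for a, fa in f.items() for b, gb in g.items() if fa == gb}

doc_topic = {"doc1": "genomics", "doc2": "proteomics", "doc3": "genomics"}
author_topic = {"alice": "genomics", "bob": "proteomics"}

print(sorted(pullback(doc_topic, author_topic)))
# [('doc1', 'alice'), ('doc2', 'bob'), ('doc3', 'alice')]
```

This is the sense in which a pullback produces a faceted view containing relationships (document-author pairs) not present in either source taxonomy on its own.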
We give examples of interactive interfaces from the biomedical domain to help illustrate how our abstractions relate to real-world requirements while enabling systematic reuse and interoperability. We introduce DELVE (Document ExpLoration and Visualization Engine), our framework for developing interactive visualizations as modular Web-applications in order to assist researchers with exploratory literature search. We show how facets relate to and control visualizations; we give three examples of text visualizations that either contain or interact with facets. We show how each of these visualizations can be represented with our model and demonstrate how our model directly informs implementation.
With our general framework for communicating consistently about facets at a high level of abstraction, we enable the construction of interoperable interfaces and enable the intelligent reuse of both existing and future efforts
A comparison of various approaches to the exponential random graph model: A reanalysis of 102 student networks in school classes
This paper describes an empirical comparison of four specifications of the exponential family of random graph models (ERGM), distinguished by model specification (dyadic independence, Markov, partial conditional dependence) and, for the Markov model, by estimation method (Maximum Pseudolikelihood, PL, versus Maximum Likelihood, ML). This was done by reanalyzing 102 student networks in 57 junior high school classes. At the level of all classes combined, earlier substantive conclusions were supported by all specifications. However, the different specifications led to different conclusions for individual classes. PL produced unreliable estimates (when ML is regarded as the standard) and had more convergence problems than ML. Furthermore, the estimates of covariate effects were affected considerably by controlling for network structure, although the precise specification of the structural part (Markov or partial conditional dependence) mattered less. © 2007 Elsevier B.V. All rights reserved.
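For dyadic-independence models, the pseudolikelihood estimation compared above reduces to a logistic regression of tie indicators on change statistics. The sketch below illustrates that mechanic on simulated data with a single hypothetical homophily covariate; real analyses would use a dedicated ERGM package, and for Markov models MPLE is no longer equivalent to MLE, which is where the unreliability discussed above arises.

```python
# Sketch of Maximum Pseudolikelihood estimation (MPLE) for a
# dyadic-independence ERGM: regress each dyad's tie indicator on its
# change statistics via logistic regression. Toy simulated network;
# the attribute and effect are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 30
sex = rng.integers(0, 2, n)          # binary node attribute

# Simulate ties with homophily: same-attribute dyads tie more often.
X, y = [], []
for i in range(n):
    for j in range(i + 1, n):
        same = int(sex[i] == sex[j])
        p = 0.4 if same else 0.1
        X.append([same])             # change statistic of the homophily term
        y.append(int(rng.random() < p))

# The edge term is absorbed by the intercept; the coefficient on `same`
# is the MPLE homophily estimate on the log-odds scale.
model = LogisticRegression().fit(np.array(X), np.array(y))
print(f"homophily estimate: {model.coef_[0][0]:.2f}")
```

The true log-odds ratio implied by the simulation is log((0.4/0.6)/(0.1/0.9)) ≈ 1.79, so a positive estimate near that value is expected.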
Machine learning suggests sleep as a core factor in chronic pain
Patients with chronic pain have complex pain profiles and associated problems. Subgroup analysis can help identify key problems. We used a data-based approach to define pain phenotypes and their most relevant associated problems in 320 patients undergoing tertiary pain management. Unsupervised machine learning analysis of the parameters "pain intensity," "number of pain areas," "pain duration," "activity pain interference," and "affective pain interference," implemented as emergent self-organizing maps, identified 3 patient phenotype clusters. Supervised analyses, implemented as different types of decision rules, identified "affective pain interference" and the "number of pain areas" as most relevant for cluster assignment. These appeared 698 and 637 times, respectively, in 1000 cross-validation runs among the most relevant characteristics in an item categorization approach in a computed ABC analysis. Cluster assignment was achieved with a median balanced accuracy of 79.9%, a sensitivity of 74.1%, and a specificity of 87.7%. In addition, among 59 demographic, pain etiology, comorbidity, lifestyle, psychological, and treatment-related variables, sleep problems appeared 638 and 439 times among the most important characteristics in 1000 cross-validation runs where patients were assigned to the 2 extreme pain phenotype clusters. Also important were the parameters "fear of pain," "self-rated poor health," and "systolic blood pressure." Decision trees trained with this information assigned patients to the extreme pain phenotype with an accuracy of 67%. Machine learning suggested sleep problems as key factors in the most difficult pain presentations, which therefore deserve priority in the treatment of chronic pain.
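The two-stage design described above, unsupervised phenotype clusters followed by supervised rules that explain cluster assignment, can be sketched as follows. KMeans stands in for the emergent self-organizing maps used in the study, the data is synthetic, and the variable names are only placeholders for the study's pain parameters.

```python
# Sketch of the two-stage pipeline: cluster patients on pain parameters,
# then train a decision tree to rank which variables discriminate the
# clusters. KMeans replaces the study's emergent self-organizing maps;
# data and feature names are synthetic placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
features = ["pain intensity", "pain areas", "pain duration",
            "activity interference", "affective interference"]
X = rng.normal(size=(320, len(features)))     # 320 synthetic patients

# Stage 1: unsupervised phenotype clusters.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Stage 2: supervised rules explaining cluster assignment; feature
# importances play the role of the study's item-categorization ranking.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, clusters)
for name, weight in sorted(zip(features, tree.feature_importances_),
                           key=lambda fw: -fw[1]):
    print(f"{name}: {weight:.2f}")
```

In the study itself this ranking step was repeated over 1000 cross-validation runs and summarized with an ABC analysis, which is how "affective pain interference" and "number of pain areas" emerged as the dominant characteristics.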