
    Good methods for coping with missing data in decision trees

    We propose a simple and effective method for dealing with missing data in decision trees used for classification. We call this approach 'missingness incorporated in attributes' (MIA). It is very closely related to the technique of treating 'missing' as a category in its own right, generalizing it for use with continuous as well as categorical variables. We show through a substantial data-based study of classification accuracy that MIA exhibits consistently good performance across a broad range of data types and of sources and amounts of missingness. It is competitive with the best of the rest (particularly a multiple imputation EM algorithm method, EMMI) while being conceptually and computationally simpler. A simple combination of MIA and EMMI is slower but even more accurate.
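
A minimal Python sketch of how a MIA-style split could be scored for one continuous attribute. It assumes the three candidate rules usually associated with MIA (missing values sent with the left branch, sent with the right branch, or separated from all observed values); the abstract itself does not spell these out. The function names, the Gini criterion, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini(left, right):
    """Size-weighted impurity of a two-way partition."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def is_missing(v):
    """Treat None and NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def best_mia_split(values, labels):
    """Return (impurity, threshold, rule) for the lowest-impurity candidate.

    rule is 'missing_left', 'missing_right', or 'missing_vs_observed'
    (hypothetical names); threshold is None for 'missing_vs_observed'.
    """
    observed = sorted({v for v in values if not is_missing(v)})
    thresholds = [(a + b) / 2 for a, b in zip(observed, observed[1:])]
    best = None
    for t in thresholds:
        for rule in ("missing_left", "missing_right"):
            left, right = [], []
            for v, y in zip(values, labels):
                if is_missing(v):
                    go_left = rule == "missing_left"
                else:
                    go_left = v <= t
                (left if go_left else right).append(y)
            candidate = (weighted_gini(left, right), t, rule)
            if best is None or candidate[0] < best[0]:
                best = candidate
    # Third candidate: missing values on one side, all observed on the other.
    left = [y for v, y in zip(values, labels) if is_missing(v)]
    right = [y for v, y in zip(values, labels) if not is_missing(v)]
    if left and right:
        candidate = (weighted_gini(left, right), None, "missing_vs_observed")
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best


# Toy data (made up): the two missing values behave like the large values.
result = best_mia_split(
    [1.0, 2.0, None, 8.0, 9.0, None],
    ["a", "a", "b", "b", "b", "b"],
)
print(result)
```

Because missingness is handled inside the split search, no imputation step is needed before growing the tree, which is consistent with the abstract's claim that the approach is computationally simple.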

    Dynamic transitions between metastable states in a superconducting ring

    Using the time-dependent Ginzburg-Landau equations, we investigate transitions between metastable states of a superconducting ring in the presence of an external magnetic field. It is shown that if the ring exhibits several metastable states at a particular magnetic field, the transition from one metastable state to another is governed by both the relaxation time of the absolute value of the order parameter, $\tau_{|\psi|}$, and the relaxation time of the phase of the order parameter, $\tau_{\phi}$. We found that the larger the ratio $\tau_{|\psi|}/\tau_{\phi}$, the closer the final state will be to the absolute minimum of the free energy, i.e. the thermodynamic equilibrium. The transition to the final state occurs through a subsequent set of single phase slips at a particular point along the ring. Comment: 7 pages, 6 figures, RevTeX 4.0 style

    E-Voting in an ubicomp world: trust, privacy, and social implications

    Advances in technology have unchained the user from the desktop, enabling interactions where access is anywhere, anytime. In addition, the introduction of ubiquitous computing (ubicomp) will see further changes in how we interact with technology and with each other. Ubicomp evokes a near future in which humans will be surrounded by “always-on,” unobtrusive, interconnected intelligent objects where information is exchanged seamlessly. This seamless exchange of information has vast social implications, in particular for the protection and management of personal information. This research project investigates the concepts of trust and privacy specifically related to the exchange of e-voting information when using a ubicomp-type system.

    Jahn-Teller polarons and their superconductivity in a molecular conductor

    We present a theoretical study of the possibility of superconductivity in a three-dimensional molecular conductor in which the interaction between electrons in doubly degenerate molecular orbitals and an intramolecular vibration mode is large enough to lead to the formation of $E \otimes \beta$ Jahn-Teller small polarons. We argue that the effective polaron-polaron interaction can be attractive for material parameters realizable in molecular conductors. This interaction is the source of superconductivity in our model. On analyzing the superconducting instability in the weak and strong coupling regimes of this attractive interaction, we find that superconducting transition temperatures up to 100 K are achievable in molecular conductors within this mechanism. We also find, for two particles per molecular site, a novel Mott insulating state in which a polaron singlet occupies one of the doubly degenerate orbitals on each site. The relevance of this study to the search for new molecular superconductors is pointed out. Comment: Submitted to Phys. Rev.

    Taking things public: a contribution to address human dimensions of environmental change

    This paper addresses the question of environmental change in Amazônia by looking at the experiences of the large-scale biosphere–atmosphere (LBA) experiment in the Amazon and three other enterprises (the extractive reserves, the Pilot Programme to Conserve the Brazilian Rain Forest (PPG7), and ecological-economic zoning) that address questions of sustainable development in the region. The LBA experience shows how integration with the social sciences can be critical for science to explore its own outcomes for society, while the other programmes expose environmental change as a problem with too many intersections within society, so that the outcomes of any initiative depend on placing it before a complex, tense and wide arena.

    Potential application of item-response theory to interpretation of medical codes in electronic patient records

    Background: electronic patient records are generally coded using extensive sets of codes, but the significance of the utilisation of individual codes may be unclear. Item response theory (IRT) models are used to characterise the psychometric properties of items included in tests and questionnaires. This study asked whether the properties of medical codes in electronic patient records may be characterised through the application of item response theory models. Methods: data were provided by a cohort of 47,845 participants from 414 family practices in the UK General Practice Research Database (GPRD) with a first stroke between 1997 and 2006. Each eligible stroke code, out of a set of 202 OXMIS and Read codes, was coded as either recorded or not recorded for each participant. A two-parameter IRT model was fitted using marginal maximum likelihood estimation. Estimated parameters from the model were considered to characterise each code with respect to the latent trait of stroke diagnosis. The location parameter is referred to as a calibration parameter, while the slope parameter is referred to as a discrimination parameter. Results: there were 79,874 stroke code occurrences available for analysis. Utilisation of codes varied between family practices, with intraclass correlation coefficients of up to 0.25 for the most frequently used codes. IRT analyses were restricted to 110 Read codes. Calibration and discrimination parameters were estimated for 77 (70%) codes that were endorsed for 1,942 stroke patients. Parameters were not estimated for the remaining more frequently used codes. Discrimination parameter values ranged from 0.67 to 2.78, while calibration parameter values ranged from 4.47 to 11.58. The two-parameter model gave a better fit to the data than either the one- or three-parameter models. However, high chi-square values for about a fifth of the stroke codes were suggestive of poor item fit. Conclusion: the application of item response theory models to coded electronic patient records might potentially contribute to identifying medical codes that offer poor discrimination or low calibration. This might indicate the need for improved coding sets or a requirement for improved clinical coding practice. However, in this study estimates were only obtained for a small proportion of participants and there was some evidence of poor model fit. There was also evidence of variation in the utilisation of codes between family practices, raising the possibility that, in practice, properties of codes may vary for different coders
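
To make the roles of the two parameters concrete, here is a small Python sketch of the standard two-parameter logistic item response function. It assumes the usual form P = 1 / (1 + exp(-a(theta - b))), with a as the discrimination (slope) and b as the calibration (location); the abstract does not state the exact parameterization or software used, and the function name and example values are illustrative only (they are not estimates for any particular code).

```python
import math


def p_code_recorded(theta, discrimination, calibration):
    """Probability that a code is recorded for a participant at latent level
    theta, under an assumed two-parameter logistic (2PL) model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - calibration)))


# At theta equal to the calibration (location), the probability is 0.5.
print(p_code_recorded(5.0, 2.0, 5.0))

# One unit above the location, a steeper slope separates participants more
# sharply. The two slopes are the ends of the range quoted in the abstract,
# used here only as example inputs.
print(p_code_recorded(6.0, 2.78, 5.0))
print(p_code_recorded(6.0, 0.67, 5.0))
```

Under this form, a code with low discrimination changes its recording probability only slowly as the latent trait increases, which is the sense in which such a code would distinguish poorly between participants.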

    Heritability of attention problems in children II: longitudinal results from a study of twins age 3 to 12.

    In this paper we present data of large samples of twin families, with an equal number of girls and boys. The well-known gender difference with boys displaying more OA and AP was observed at each age. Even at the age of 3, boys display more OA problems than girls. Clinical studies have indicated that severe problem behavior can be identified in very young children (see for review, Campbell, 1995; Keenan & Wakschlag, 2000; Shaw, Owens, Giovannelli, & Winslow, 2001) and that the onset of ADHD is during the pre-school period (Barkley, Fisher, Edelbrock, & Smallish, 1990;

    Table 6. Top part includes percentages of total variances (diagonal) and covariances (off-diagonal) explained by additive genetic (A), genetic dominance (D), and unique environmental (E) components based on best fitting models. Percentages for boys and girls are reported below and above diagonal, respectively. Lower part includes correlations calculated for additive genetic, genetic dominance, and unique environmental sources of variance between different ages. Correlations for boys and girls are reported below and above diagonal, respectively.

    Relative proportions of variance and covariance (Boys\Girls)

           A%                            D%                            E%
           OA 3   AP 7   AP 10  AP 12    OA 3   AP 7   AP 10  AP 12    OA 3   AP 7   AP 10  AP 12
    OA 3   50\41  73     79     75       22\33  17     13     14       28\26  10     8      11
    AP 7   59     33\57  50     53       31     39\16  31     28       10     28\27  19     19
    AP 10  86     31     41\48  47       6      51     31\25  32       8      18     28\27  21
    AP 12  71     24     31     40\54    16     55     45     30\18    13     21     24     30\28

    Correlations between different ages (Boys\Girls)

           A                             D                             E
           OA 3   AP 7   AP 10  AP 12    OA 3   AP 7   AP 10  AP 12    OA 3   AP 7   AP 10  AP 12
    OA 3   1.00   .60    .66    .57      1.00   .30    .16    .20      1.00   .15    .12    .14
    AP 7   .57    1.00   .62    .57      .41    1.00   .99    1.00     .15    1.00   .46    .41
    AP 10  .68    .56    1.00   .61      .08    .94    1.00   1.00     .11    .42    1.00   .50
    AP 12  .49    .42    .53    1.00     .20    .98    .99    1.00     .14    .45    .58    1.00

    ...

    Determination of dichlobenil and its major metabolite (BAM) in onions by PTV–GC–MS using PARAFAC2 and experimental design methodology

    A GC–MS analytical procedure that includes derivatization, Quick Easy Cheap Effective Rugged and Safe (QuEChERS) extraction and programmed temperature vaporization (PTV) is optimized using design of experiments to determine 2,6-dichlorobenzonitrile (dichlobenil) and 2,6-dichlorobenzamide (BAM) in onions, using 3,5-dichlorobenzonitrile and 2,4-dichlorobenzamide as internal standards. The use of a central composite design and two D-optimal designs, together with the desirability function, makes it possible to significantly reduce the economic, time and environmental cost of the study. The usefulness of PARAFAC2 is shown for solving problems such as the interference of unexpected derivatization artifacts unavoidably linked to some derivatization agents, or the presence of coeluents from the complex matrix that share m/z ratios with the target compounds. The limits of decision (CCα) of the optimized procedure, 5.00 μg kg⁻¹ for dichlobenil and 1.55 μg kg⁻¹ for BAM (α = 0.05), are below the maximum residue limit (MRL) established by the EU for dichlobenil (20 μg kg⁻¹) in this commodity. Ministerio de Economía y Competitividad (CTQ2011-26022) and Junta de Castilla y León (BU108A11-2
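
The abstract names "the desirability function" without giving its form. As a rough illustration only, the Python sketch below assumes a Derringer-Suich style desirability, in which each response is mapped to [0, 1] and the individual values are combined by a geometric mean. The function names, limits, and response values are hypothetical and are not taken from the paper.

```python
def desirability_smaller_is_better(y, target, upper, weight=1.0):
    """Map a response to be minimized onto [0, 1].

    Returns 1.0 at or below `target`, 0.0 at or above `upper`,
    and a power-law ramp in between (assumed Derringer-Suich form).
    """
    if y <= target:
        return 1.0
    if y >= upper:
        return 0.0
    return ((upper - y) / (upper - target)) ** weight


def overall_desirability(individual):
    """Geometric mean of individual desirabilities (0.0 if any is 0.0)."""
    product = 1.0
    for d in individual:
        product *= d
    return product ** (1.0 / len(individual))


# Hypothetical example: two responses to be minimized in one experiment.
d1 = desirability_smaller_is_better(5.0, target=0.0, upper=10.0)
d2 = desirability_smaller_is_better(2.0, target=2.0, upper=8.0)
print(d1, d2, overall_desirability([d1, d2]))
```

The geometric mean is what lets one number stand in for several responses at once: a single unacceptable response drives the overall value to zero, so the optimum has to be acceptable on every criterion.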

    UNMASC: Tumor-only variant calling with unmatched normal controls

    Despite years of progress, mutation detection in cancer samples continues to require significant manual review as a final step. Expert review is particularly challenging in cases where tumors are sequenced without matched normal control DNA. Attempts have been made to call somatic point mutations without a matched normal sample by removing well-known germline variants, utilizing unmatched normal controls, and constructing decision rules to classify sequencing errors and private germline variants. With budgetary constraints related to computational and sequencing costs, finding the appropriate number of controls is a crucial step in identifying somatic variants. Our approach utilizes public databases for canonical somatic variants as well as germline variants and leverages information gathered about nearby positions in the normal controls. Our cohort of targeted capture panel sequencing of tumor and normal samples, with varying tumor types and demographics, served as a benchmark for our tumor-only variant calling pipeline, allowing us to observe the relationship between our ability to correctly classify variants and the number of unmatched normals. With our benchmarked samples, approximately ten normal controls were needed to maintain 94% sensitivity, 99% specificity and 76% positive predictive value, far outperforming comparable methods. Our approach, called UNMASC, also serves as a supplement to traditional tumor with matched normal variant calling workflows and can potentially extend to other concerns arising from analyzing next generation sequencing data.
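
The three reported metrics follow from a confusion matrix, and a short Python sketch shows how high specificity can coexist with a much lower positive predictive value. The counts below are hypothetical, chosen only so that the rates come out close to those quoted in the abstract; they are not the study's data, and the function name is illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and positive predictive value from counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true somatic variants that are called
        "specificity": tn / (tn + fp),  # non-somatic variants correctly rejected
        "ppv": tp / (tp + fp),          # called variants that are truly somatic
    }


# Hypothetical counts: 100 true somatic variants among 3,100 candidates.
metrics = classification_metrics(tp=94, fp=30, tn=2970, fn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this made-up example, true somatic variants are heavily outnumbered by germline variants and artifacts, so even a 1% false positive rate contributes enough false calls to pull the positive predictive value well below the specificity.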

    The prognostic significance of low-frequency somatic mutations in metastatic cutaneous melanoma

    Background: Little is known about the prognostic significance of somatically mutated genes in metastatic melanoma (MM). We have employed a combined clinical and bioinformatics approach on tumor samples from cutaneous melanoma (SKCM) as part of The Cancer Genome Atlas project (TCGA) to identify mutated genes with potential clinical relevance. Methods: After limiting our DNA sequencing analysis to MM samples (n = 356) and to the CANCER CENSUS gene list, we filtered out mutations with low functional significance (snpEFF). We performed Cox analysis on 53 genes that were mutated in ≥3% of samples, and had ≥50% difference in incidence of mutations in deceased subjects versus alive subjects. Results: Four genes were potentially prognostic [RAC1, FGFR1, CARD11, CIITA; false discovery rate (FDR) 75% of the samples that exhibited corresponding DNA mutations. The low frequency, UV signature type and RNA expression of the 22 genes in MM samples were confirmed in a separate multi-institution validation cohort (n = 413). An underpowered analysis within a subset of this validation cohort with available patient follow-up (n = 224) showed that somatic mutations in SPEN and RAC1 reached borderline prognostic significance [log-rank favorable (p = 0.09) and adverse (p = 0.07), respectively]. Somatic mutations in SPEN, and to a lesser extent RAC1, were not associated with definite gene copy number or RNA expression alterations. High (>2+) nuclear plus cytoplasmic expression intensity for SPEN was associated with longer melanoma-specific overall survival (OS) compared to lower (≤ 2+) nuclear intensity (p = 0.048). We conclude that expressed somatic mutations in infrequently mutated genes beyond the well-characterized ones (e.g., BRAF, RAS, CDKN2A, PTEN, TP53), such as RAC1 and SPEN, may have prognostic significance in MM