438 research outputs found

    Improving Risk Predictions by Preprocessing Imbalanced Credit Data

    Imbalanced credit data sets refer to databases in which the class of defaulters is heavily under-represented in comparison to the class of non-defaulters. This is a very common situation in real-life credit scoring applications, but it has still received little attention. This paper investigates whether data resampling can be used to improve the performance of learners built from imbalanced credit data sets, and whether the effectiveness of resampling is related to the type of classifier. Experimental results demonstrate that learning with the resampled sets consistently outperforms the use of the original imbalanced credit data, independently of the classifier used.

    Effective interaction between helical bio-molecules

    The effective interaction between two parallel strands of helical bio-molecules, such as deoxyribose nucleic acids (DNA), is calculated using computer simulations of the "primitive" model of electrolytes. In particular we study a simple model for B-DNA incorporating explicitly its charge pattern as a double-helix structure. The effective force and the effective torque exerted onto the molecules depend on the central distance and on the relative orientation. The contributions of nonlinear screening by monovalent counterions to these forces and torques are analyzed and calculated for different salt concentrations. As a result, we find that the sign of the force depends sensitively on the relative orientation. For intermolecular distances smaller than 6 Å it can be both attractive and repulsive. Furthermore we report a nonmonotonic behaviour of the effective force for increasing salt concentration. Both features cannot be described within linear screening theories. For large distances, on the other hand, the results agree with linear screening theories provided the charge of the bio-molecules is suitably renormalized. Comment: 18 pages, 18 figures included in text, 100 bibliog

    Lattice permutations and Poisson-Dirichlet distribution of cycle lengths

    We study random spatial permutations on Z^3 where each jump x -> \pi(x) is penalized by a factor exp(-T ||x-\pi(x)||^2). The system is known to exhibit a phase transition for low enough T where macroscopic cycles appear. We observe that the lengths of such cycles are distributed according to Poisson-Dirichlet. This can be explained heuristically using a stochastic coagulation-fragmentation process for long cycles, which is supported by numerical data. Comment: 18 pages, 14 figures

    Search for heavy neutrinos mixing with tau neutrinos

    We report on a search for heavy neutrinos (\nus) produced in the decay D_s\to \tau \nus at the SPS proton target followed by the decay \nudecay in the NOMAD detector. Both decays are expected to occur if \nus is a component of \nu_{\tau}. From the analysis of the data collected during the 1996-1998 runs with 4.1\times10^{19} protons on target, a single candidate event consistent with background expectations was found. This allows us to derive an upper limit on the mixing strength between the heavy neutrino and the tau neutrino in the \nus mass range from 10 to 190 MeV. Windows between the SN1987a and Big Bang Nucleosynthesis lower limits and our result are still open for future experimental searches. The results obtained are used to constrain an interpretation of the time anomaly observed in the KARMEN1 detector. Comment: 20 pages, 7 figures, a few comments added

    An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics

    For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program leads to the generation of TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types.

    Mapping disparities in education across low- and middle-income countries

    Educational attainment is an important social determinant of maternal, newborn, and child health [1–3]. As a tool for promoting gender equity, it has gained increasing traction in popular media, international aid strategies, and global agenda-setting [4–6]. The global health agenda is increasingly focused on evidence of precision public health, which illustrates the subnational distribution of disease and illness [7,8]; however, an agenda focused on future equity must integrate comparable evidence on the distribution of social determinants of health [9–11]. Here we expand on the available precision SDG evidence by estimating the subnational distribution of educational attainment, including the proportions of individuals who have completed key levels of schooling, across all low- and middle-income countries from 2000 to 2017. Previous analyses have focused on geographical disparities in average attainment across Africa or for specific countries, but, to our knowledge, no analysis has examined the subnational proportions of individuals who completed specific levels of education across all low- and middle-income countries [12–14]. By geolocating subnational data for more than 184 million person-years across 528 data sources, we precisely identify inequalities across geography as well as within populations.

    Star clusters near and far; tracing star formation across cosmic time

    © 2020 Springer-Verlag. The final publication is available at Springer via https://doi.org/10.1007/s11214-020-00690-x. Star clusters are fundamental units of stellar feedback and unique tracers of their host galactic properties. In this review, we will first focus on their constituents, i.e. detailed insight into their stellar populations and their surrounding ionised, warm, neutral, and molecular gas. We then move beyond the Local Group to review star cluster populations at various evolutionary stages, and in diverse galactic environmental conditions accessible in the local Universe. At high redshift, where conditions for cluster formation and evolution are more extreme, we are only able to observe the integrated light of a handful of objects that we believe will become globular clusters. We therefore discuss how numerical and analytical methods, informed by the observed properties of cluster populations in the local Universe, are used to develop sophisticated simulations potentially capable of disentangling the genetic map of galaxy formation and assembly that is carried by globular cluster populations. Peer reviewed. Final Accepted Version