Improving Risk Predictions by Preprocessing Imbalanced Credit Data
Imbalanced credit data sets are databases in which the class of defaulters is heavily under-represented in comparison to the class of non-defaulters. This is a very common situation in real-life credit scoring applications, but it has still received little attention. This paper investigates whether data resampling can be used to improve the performance of learners built from imbalanced credit data sets, and whether the effectiveness of resampling is related to the type of classifier. Experimental results demonstrate that learning with the resampled sets consistently outperforms the use of the original imbalanced credit data, independently of the classifier used.
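The abstract above does not say which resampling methods were evaluated. As an illustration only, the following minimal sketch shows random oversampling, one common resampling technique, for a binary problem. The function name `random_oversample` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size.

    Assumes a binary problem: every label other than `minority_label`
    is treated as the majority class.
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority examples to resample")
    majority_count = len(labels) - len(minority)
    # Number of extra minority copies needed to balance the two classes.
    extra = max(0, majority_count - len(minority))
    sampled = [rng.choice(minority) for _ in range(extra)]
    new_rows = list(rows) + sampled
    new_labels = list(labels) + [minority_label] * extra
    return new_rows, new_labels


# Toy data: 2 defaulters (label 1) and 8 non-defaulters (label 0).
rows = [[i] for i in range(10)]
labels = [1, 1] + [0] * 8
balanced_rows, balanced_labels = random_oversample(rows, labels, minority_label=1)
```

Resampling of this kind would normally be applied to the training split only, so that the evaluation data keeps the original class distribution.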
Effective interaction between helical bio-molecules
The effective interaction between two parallel strands of helical
bio-molecules, such as deoxyribose nucleic acids (DNA), is calculated using
computer simulations of the "primitive" model of electrolytes. In particular we
study a simple model for B-DNA incorporating explicitly its charge pattern as a
double-helix structure. The effective force and the effective torque exerted
onto the molecules depend on the central distance and on the relative
orientation. The contributions of nonlinear screening by monovalent counterions
to these forces and torques are analyzed and calculated for different salt
concentrations. As a result, we find that the sign of the force depends
sensitively on the relative orientation. For intermolecular distances smaller
than it can be both attractive and repulsive. Furthermore we report a
nonmonotonic behaviour of the effective force for increasing salt
concentration. Both features cannot be described within linear screening
theories. For large distances, on the other hand, the results agree with linear
screening theories provided the charge of the bio-molecules is suitably
renormalized.
Comment: 18 pages, 18 figures included in text, 100 bibliog
Lattice permutations and Poisson-Dirichlet distribution of cycle lengths
We study random spatial permutations on Z^3 where each jump x -> \pi(x) is
penalized by a factor exp(-T ||x-\pi(x)||^2). The system is known to exhibit a
phase transition for low enough T where macroscopic cycles appear. We observe
that the lengths of such cycles are distributed according to Poisson-Dirichlet.
This can be explained heuristically using a stochastic
coagulation-fragmentation process for long cycles, which is supported by
numerical data.
Comment: 18 pages, 14 figures
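For readers unfamiliar with the Poisson-Dirichlet law mentioned in the abstract above, the sketch below draws one sample from it by stick-breaking: a unit interval is repeatedly broken with Beta(1, theta) fractions, and the pieces are sorted in decreasing order. The abstract does not state the parameter, so the default `theta=1.0` is an assumption, and the function name and the truncation tolerance `tol` are hypothetical. This is not the authors' simulation of spatial permutations.

```python
import random


def sample_poisson_dirichlet(theta=1.0, tol=1e-12, seed=None):
    """Return one approximate Poisson-Dirichlet(theta) sample as a decreasing list.

    Stick-breaking: break off a Beta(1, theta) fraction of the remaining
    stick until less than `tol` is left, then rank the pieces by size.
    """
    rng = random.Random(seed)
    remaining = 1.0
    pieces = []
    while remaining > tol:
        v = rng.betavariate(1.0, theta)
        pieces.append(remaining * v)   # length of the piece broken off
        remaining *= 1.0 - v           # what is left of the stick
    return sorted(pieces, reverse=True)


# The largest entries play the role of rescaled lengths of the longest cycles.
fractions = sample_poisson_dirichlet(theta=1.0, seed=1)
```

The pieces sum to one up to the truncation tolerance, so they can be compared with cycle lengths divided by the total number of sites in macroscopic cycles.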
Search for heavy neutrinos mixing with tau neutrinos
We report on a search for heavy neutrinos (\nus) produced in the decay
D_s\to \tau \nus at the SPS proton target followed by the decay \nudecay in
the NOMAD detector. Both decays are expected to occur if \nus is a component
of .
From the analysis of the data collected during the 1996-1998 runs with
protons on target, a single candidate event consistent with
background expectations was found. This allows us to derive an upper limit on the
mixing strength between the heavy neutrino and the tau neutrino in the \nus
mass range from 10 to 190 . Windows between the SN1987a and Big Bang
Nucleosynthesis lower limits and our result are still open for future
experimental searches. The results obtained are used to constrain an
interpretation of the time anomaly observed in the KARMEN1 detector.
Comment: 20 pages, 7 figures, a few comments added
An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics
For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program leads to the generation of the TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types.
Mapping disparities in education across low- and middle-income countries
Educational attainment is an important social determinant of maternal, newborn, and child health [1–3]. As a tool for promoting gender equity, it has gained increasing traction in popular media, international aid strategies, and global agenda-setting [4–6]. The global health agenda is increasingly focused on evidence of precision public health, which illustrates the subnational distribution of disease and illness [7,8]; however, an agenda focused on future equity must integrate comparable evidence on the distribution of social determinants of health [9–11]. Here we expand on the available precision SDG evidence by estimating the subnational distribution of educational attainment, including the proportions of individuals who have completed key levels of schooling, across all low- and middle-income countries from 2000 to 2017. Previous analyses have focused on geographical disparities in average attainment across Africa or for specific countries, but, to our knowledge, no analysis has examined the subnational proportions of individuals who completed specific levels of education across all low- and middle-income countries [12–14]. By geolocating subnational data for more than 184 million person-years across 528 data sources, we precisely identify inequalities across geography as well as within populations.
Star clusters near and far; tracing star formation across cosmic time
© 2020 Springer-Verlag. The final publication is available at Springer via https://doi.org/10.1007/s11214-020-00690-x.
Star clusters are fundamental units of stellar feedback and unique tracers of their host galactic properties. In this review, we first focus on their constituents, i.e. detailed insight into their stellar populations and their surrounding ionised, warm, neutral, and molecular gas. We then move beyond the Local Group to review star cluster populations at various evolutionary stages, and in diverse galactic environmental conditions accessible in the local Universe. At high redshift, where conditions for cluster formation and evolution are more extreme, we are only able to observe the integrated light of a handful of objects that we believe will become globular clusters. We therefore discuss how numerical and analytical methods, informed by the observed properties of cluster populations in the local Universe, are used to develop sophisticated simulations potentially capable of disentangling the genetic map of galaxy formation and assembly that is carried by globular cluster populations.
Peer reviewed
Final Accepted Version