Evolutionary undersampling for extremely imbalanced big data classification under Apache Spark
The classification of datasets with a skewed class distribution is an important problem in data mining. Evolutionary undersampling of the majority class has proved to be a successful approach to tackle this issue. Such a challenging task may become even more difficult when the number of majority class examples is very large. In this scenario, the use of the evolutionary model becomes impractical due to memory and time constraints. Divide-and-conquer approaches based on the MapReduce paradigm have already been proposed to handle this type of problem by dividing data into multiple subsets. However, in extremely imbalanced cases, these models may suffer from a lack of density of the minority class in the subsets considered. Aiming at addressing this problem, in this contribution we provide a new big data scheme based on the emerging technology Apache Spark to tackle highly imbalanced datasets. We take advantage of its in-memory operations to diminish the effect of the small sample size. The key point of this proposal lies in the independent management of majority and minority class examples, allowing us to keep a higher number of minority class examples in each subset. In our experiments, we analyze the proposed model with several data sets with up to 17 million instances. The results show the effectiveness of this evolutionary undersampling model for extremely imbalanced big data classification.
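The independent handling of the two classes described in this abstract can be illustrated with a minimal, single-machine sketch. It is not the authors' Spark implementation: the function names are hypothetical, every subset is assumed to receive all minority examples, and plain random undersampling stands in for the evolutionary search.

```python
import random


def partition_with_full_minority(majority, minority, n_partitions, seed=0):
    """Split only the majority class; copy every minority example into each subset."""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    # Round-robin split of the majority examples across the partitions.
    chunks = [shuffled[i::n_partitions] for i in range(n_partitions)]
    return [(chunk, list(minority)) for chunk in chunks]


def random_undersample(majority_chunk, minority, rng):
    """Placeholder for the evolutionary search: keep as many majority
    examples as there are minority examples (assumption, not the paper's method)."""
    k = min(len(majority_chunk), len(minority))
    return rng.sample(majority_chunk, k) + list(minority)


def balanced_subsets(majority, minority, n_partitions, seed=0):
    """Build one balanced training subset per partition."""
    rng = random.Random(seed)
    parts = partition_with_full_minority(majority, minority, n_partitions, seed)
    return [random_undersample(maj, mino, rng) for maj, mino in parts]
```

Because the minority class is never split, each subset keeps its full minority density no matter how many partitions the majority class is divided into.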
Octupole transitions in the 208Pb region
The 208Pb region is characterised by the existence of collective octupole states.
Here we populated such states in 208Pb + 208Pb deep-inelastic reactions. γ-ray angular
distribution measurements were used to infer the octupole character of several E3 transitions.
The octupole character of the 2318 keV 17− → 14+ transition in 208Pb, the 2485 keV
19/2− → 13/2+ transition in 207Pb, the 2419 keV 15/2− → 9/2+ transition in 209Pb and
the 2465 keV 17/2+ → 11/2− transition in 207Tl was
demonstrated for the first time. In addition, shell model calculations were performed using two
different sets of two-body matrix elements. Their predictions were compared with emphasis on
collective octupole states. This work is supported by the Science and Technology Facilities Council
(STFC), UK; the US Department of Energy, Office of Nuclear Physics, under Contract Nos. DE-AC02-06CH11357
and DE-FG02-94ER40834; and NSF grant PHY-1404442.
Urban and Transport Planning Related Exposures and Mortality: A Health Impact Assessment for Cities
BACKGROUND: By 2050, almost 70% of people globally are projected to live in urban areas. As the environments we inhabit affect our health, urban and transport designs that promote healthy living are needed. OBJECTIVE: We estimated the number of premature deaths preventable under compliance with international exposure recommendations for physical activity (PA), air pollution, noise, heat, and access to green spaces. METHODS: We developed and applied the Urban and TranspOrt Planning Health Impact Assessment (UTOPHIA) tool to Barcelona. Exposure estimates and mortality data were available for 1,357,361 residents. We compared recommended with current exposure levels. We quantified the associations between exposures and mortality and calculated population attributable fractions to estimate the number of premature deaths preventable. We also modeled life-expectancy and economic impacts. RESULTS: We estimated that annually almost 20% of mortality could be prevented if international recommendations for performance of PA, exposure to air pollution, noise, heat, and access to green space were complied with. Estimations showed that the biggest share in preventable deaths was attributable to increases in PA, followed by exposure reductions in air pollution, traffic noise and heat. Access to green spaces had smaller effects on mortality. Compliance was estimated to increase the average life expectancy by 360 (95% CI: 219, 493) days and result in economic savings of 9.3 (95% CI: 4.9, 13.2) billion euro per year. CONCLUSIONS: PA factors and environmental exposures can be modified by changes in urban and transport planning. We emphasize the need for (1) the reduction of motorized traffic through the promotion of active and public transport and (2) the provision of green infrastructure, which are both suggested to provide PA opportunities and mitigation of air pollution, noise, and heat.
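The population attributable fraction step mentioned in the METHODS can be sketched as follows. This is a common textbook formulation, not necessarily the exact computation inside the UTOPHIA tool: it assumes a log-linear exposure-response function and that the whole population shifts from the current to the recommended exposure level. All function and parameter names are hypothetical.

```python
import math


def scaled_relative_risk(rr_per_unit, unit, exposure_difference):
    """Rescale a relative risk reported per `unit` of exposure to the
    difference between current and recommended exposure (log-linear assumption)."""
    return math.exp(math.log(rr_per_unit) / unit * exposure_difference)


def population_attributable_fraction(rr):
    """Fraction of deaths attributable to the exposure difference: (RR - 1) / RR."""
    return (rr - 1.0) / rr


def preventable_deaths(rr_per_unit, unit, exposure_difference, annual_deaths):
    """Deaths per year that would be avoided at the recommended exposure level."""
    rr = scaled_relative_risk(rr_per_unit, unit, exposure_difference)
    return population_attributable_fraction(rr) * annual_deaths
```

Any inputs passed to these functions are illustrative; the abstract does not report the exposure-response coefficients used.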
Resonant Lifetime of Core-Excited Organic Adsorbates from First Principles
We investigate by first-principles simulations the resonant electron-transfer
lifetime from the excited state of an organic adsorbate to a semiconductor
surface, namely isonicotinic acid on rutile TiO2(110). The
molecule-substrate interaction is described using density functional theory,
while the effect of a truly semi-infinite substrate is taken into account by
Green's function techniques. Excitonic effects due to the presence of
core-excited atoms in the molecule are shown to be instrumental to understand
the electron-transfer times measured using the so-called core-hole-clock
technique. In particular, for isonicotinic acid on TiO2(110), we find
that the charge injection from the LUMO is quenched since this state lies
within the substrate band gap. We compute the resonant charge-transfer times
from LUMO+1 and LUMO+2, and systematically investigate the dependence of the
elastic lifetimes of these states on the alignment between adsorbate and
substrate states.
Comment: 24 pages, 6 figures, to appear in Journal of Physical Chemistry
Blockade of CNG channels abrogates urethral relaxation induced by soluble guanylate cyclase activation
An ant colony-based semi-supervised approach for learning classification rules
Semi-supervised learning methods create models from a few labeled instances and a great number of unlabeled instances. They are a good option in scenarios where there is a lot of unlabeled data and the process of labeling instances is expensive, as is the case for most Web applications. This paper proposes a semi-supervised self-training algorithm called Ant-Labeler. Self-training algorithms take advantage of supervised learning algorithms to iteratively learn a model from the labeled instances and then use this model to classify unlabeled instances. The instances that receive labels with high confidence are moved from the unlabeled to the labeled set, and this process is repeated until a stopping criterion is met, such as labeling all unlabeled instances. Ant-Labeler uses an ACO algorithm as the supervised learning method in the self-training procedure to generate interpretable rule-based models, which are used as an ensemble to ensure accurate predictions. The pheromone matrix is reused across different executions of the ACO algorithm to avoid rebuilding the models from scratch every time the labeled set is updated. Results showed that the proposed algorithm obtains better predictive accuracy than three state-of-the-art algorithms in roughly half of the datasets on which it was tested, and the smaller the number of labeled instances, the better the Ant-Labeler performance.
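The generic self-training loop described in this abstract can be sketched as below. This is not Ant-Labeler itself: a nearest-centroid classifier stands in for the ACO rule learner, the confidence score (a softmax over negative distances) and the `threshold` and `max_rounds` parameters are assumptions made for illustration, and no pheromone reuse is modeled.

```python
import math


def fit_centroids(X, y):
    """Stand-in supervised learner: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_with_confidence(centroids, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    dists = {c: math.dist(x, mu) for c, mu in centroids.items()}
    nearest = min(dists.values())
    # Subtract the smallest distance so the exponentials cannot all underflow.
    weights = {c: math.exp(-(d - nearest)) for c, d in dists.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=20):
    """Iteratively move confidently labeled instances into the labeled set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        if not pool:
            break  # stopping criterion: every instance has been labeled
        model = fit_centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            label, confidence = predict_with_confidence(model, x)
            if confidence >= threshold:
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:
            break  # no prediction was confident enough to add
    return fit_centroids(X_lab, y_lab), pool
```

The function returns the final model together with any instances that never reached the confidence threshold, so a caller can see how much of the unlabeled pool was absorbed.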
A Taxonomy of Traffic Forecasting Regression Problems From a Supervised Learning Perspective
One contemporary policy to deal with traffic congestion is the design and implementation of forecasting methods that allow users to plan ahead of time and decision makers to improve traffic management. Current data availability and growing computational capacities have increased the use of machine learning (ML) to address traffic prediction, which is mostly modeled as a supervised regression problem. Although some studies have presented taxonomies to sort the literature in this field, they are mostly oriented to classify the ML methods applied, and little effort has been directed to categorize the traffic forecasting problems approached by them. As far as we know, there is no comprehensive taxonomy that classifies these problems from the point of view of both traffic and ML. In this paper, we propose a taxonomy to categorize the aforementioned problems from both a traffic and a supervised regression learning perspective. The taxonomy aims at unifying and consolidating categorization criteria related to traffic and it introduces new criteria to classify the problems in terms of how they are modeled from a supervised regression approach. The traffic forecasting literature, from 2000 to 2019, is categorized using this taxonomy to illustrate its descriptive power. From this categorization, different remarks are discussed regarding the current gaps and trends in the addressed traffic forecasting area.
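One common way to cast a traffic time series as a supervised regression problem, as this abstract describes in general terms, is a sliding window of lagged values. The sketch below is a generic illustration, not a construction taken from the paper; the function name and the `n_lags` and `horizon` parameters are hypothetical.

```python
def make_supervised(series, n_lags, horizon=1):
    """Turn a univariate series into (features, target) pairs.

    Each feature row holds `n_lags` consecutive observations; the target is
    the value `horizon` steps after the last observation in that row.
    """
    X, y = [], []
    last_start = len(series) - n_lags - horizon + 1
    for start in range(last_start):
        X.append(series[start:start + n_lags])
        y.append(series[start + n_lags + horizon - 1])
    return X, y
```

The choice of window length and forecasting horizon is the kind of modeling decision that a taxonomy of such problems would need to record.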
Quantum chemical calculations of X-ray emission spectroscopy
The calculation of X-ray emission spectroscopy with equation of motion coupled cluster theory (EOM-CCSD), time dependent density functional theory (TDDFT) and resolution of the identity single excitation configuration interaction with second order perturbation theory (RI-CIS(D)) is studied. These methods can be applied to calculate X-ray emission transitions by using a reference determinant with a core-hole, and they provide a convenient approach to compute the X-ray emission spectroscopy of large systems since all of the required states can be obtained within a single calculation, removing the need to perform a separate calculation for each state. For all of the methods, basis sets with the inclusion of additional basis functions to describe core orbitals are necessary, particularly when studying transitions involving the 1s orbitals of heavier nuclei. EOM-CCSD predicts accurate transition energies when compared with experiment; however, its application to larger systems is restricted by its computational cost and difficulty in converging the CCSD equations for a core-hole reference determinant, which become increasingly problematic as the size of the system studied increases. While RI-CIS(D) gives accurate transition energies for small molecules containing first row nuclei, its application to larger systems is limited by the CIS states providing a poor zeroth order reference for perturbation theory, which leads to very large errors in the computed transition energies for some states. TDDFT with standard exchange-correlation functionals predicts transition energies that are much larger than experiment. Optimization of a hybrid and short-range corrected functional to predict the X-ray emission transitions results in much closer agreement with EOM-CCSD.
The most accurate exchange-correlation functional identified is a modified B3LYP hybrid functional with 66% Hartree-Fock exchange, denoted B66LYP, which predicts X-ray emission spectra for a range of molecules including fluorobenzene, nitrobenzene, acetone, dimethyl sulfoxide and CF3Cl in good agreement with experiment.
- …
