254 research outputs found

    An Ensemble Self-Structuring Neural Network Approach to Solving Classification Problems with Virtual Concept Drift and its Application to Phishing Websites

    Classification is one of the best-known data mining tasks; it aims to construct a classification model from a labelled input data set. Most classification models are designed for a static environment in which the complete training data set is presented to the classification algorithm. This data set is assumed to contain all the information needed to learn the pertinent concepts (rules and patterns) for assigning unseen examples to predefined classes. However, in dynamic (non-stationary) domains, the set of features (input data attributes) may change over time. For instance, some features that are significant at time Ti might become useless or irrelevant at time Ti+j. This situation gives rise to a phenomenon called Virtual Concept Drift. Moreover, features that are dropped at time Ti+j might become significant again in the future. Such a situation gives rise to the so-called Cyclical Concept Drift, which is directly related to the well-known catastrophic forgetting dilemma. Catastrophic forgetting happens when learning new knowledge completely erases previously learned knowledge. Phishing is a dynamic classification problem in which virtual concept drift might occur. However, the virtual concept drift that occurs in phishing might be driven by a malevolent intelligent agent rather than occurring naturally. One reason phishers keep changing the combination of features when creating phishing websites might be that they are able to interpret the anti-phishing tool and therefore pick a new set of features that can circumvent it. Besides its generalisation capability, fault tolerance, and strong ability to learn, a Neural Network (NN) classification model is considered a black box. 
Hence, even someone with the skills to hack into an NN-based classification model might find it difficult to interpret how the NN processes the input data to produce the final decision (assign a class value). In this thesis, we investigate the problem of virtual concept drift by proposing a framework that can keep pace with continuous changes in the input features. The proposed framework has been applied to the phishing website classification problem, and it shows competitive results on various evaluation measures (Harmonic Mean (F1-score), precision, accuracy, etc.) compared with several other data mining techniques. The framework creates an ensemble of classifiers (a group of classifiers) that balances stability (maintaining previously learned knowledge) and plasticity (learning knowledge from the newly offered training data set). Hence, the framework can also handle cyclical concept drift. The classifiers that constitute the ensemble are created using an improved Self-Structuring Neural Networks (SSNN) algorithm. Traditionally, NN modelling techniques rely on trial and error, which is a tedious and time-consuming process. The SSNN simplifies structuring NN classifiers with minimal intervention from the user. The framework evaluates the ensemble whenever a new data set chunk is collected. If the overall accuracy of the ensemble's combined results drops significantly, a new classifier is created using the SSNN and added to the ensemble. Overall, the experimental results show that the proposed framework balances stability and plasticity and can effectively handle virtual concept drift when applied to the phishing website classification problem. Most of the chapters of this thesis have been subject to publication.
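
The chunk-wise update rule described in this abstract (evaluate the ensemble on each new chunk, add a classifier only after a significant accuracy drop) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: a hypothetical nearest-centroid learner stands in for the SSNN, members vote with weights equal to their accuracy on the newest chunk (an assumed weighting scheme), and `drop_threshold` is an assumed value for a "significant" drop.

```python
from collections import defaultdict
from statistics import mean


class NearestCentroid:
    """Hypothetical stand-in base learner; the thesis builds members with SSNN instead."""

    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        # Assign the class whose centroid is closest in squared Euclidean distance
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])))


def member_accuracy(model, X, y):
    return sum(model.predict_one(x) == t for x, t in zip(X, y)) / len(y)


class DriftAwareEnsemble:
    def __init__(self, make_learner, drop_threshold=0.1):
        self.make_learner = make_learner
        self.drop_threshold = drop_threshold  # assumed size of a "significant" drop
        self.members, self.weights = [], []
        self.reference_acc = None

    def predict_one(self, x):
        # Weighted majority vote over all ensemble members
        scores = defaultdict(float)
        for model, w in zip(self.members, self.weights):
            scores[model.predict_one(x)] += w
        return max(scores, key=scores.get)

    def accuracy(self, X, y):
        return member_accuracy(self, X, y)

    def process_chunk(self, X, y):
        """Evaluate the ensemble on a new chunk; grow it only after a significant drop."""
        # Re-weight existing members by their accuracy on the newest chunk (assumption)
        self.weights = [member_accuracy(m, X, y) for m in self.members]
        if self.members and self.accuracy(X, y) >= self.reference_acc - self.drop_threshold:
            return False  # no significant drop: keep the current members unchanged
        new_member = self.make_learner().fit(X, y)
        self.members.append(new_member)
        self.weights.append(member_accuracy(new_member, X, y))
        self.reference_acc = self.accuracy(X, y)
        return True


X = [[0, 0], [1, 0], [0, 1], [1, 1]]
ens = DriftAwareEnsemble(NearestCentroid)
print(ens.process_chunk(X, [0, 1, 0, 1]))  # class follows feature 0 → True (first member)
print(ens.process_chunk(X, [0, 0, 1, 1]))  # virtual drift to feature 1 → True (member added)
print(ens.process_chunk(X, [0, 1, 0, 1]))  # feature 0 relevant again → False
```

In the last call the old member regains weight, so the ensemble recovers without adding a third classifier; this is one simple way to keep earlier knowledge available for cyclical drift, chosen here for illustration only.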

    The adoption of bitcoins technology: The difference between perceived future expectation and intention to use bitcoins: Does social influence matter?

    Bitcoin is a decentralized system that aims to address the shortcomings of fiat and gold-based currencies. Given its novelty, the adoption of bitcoin is not yet well understood. Hence, this work proposes several variables to examine user perceptions of performance expectancy, effort expectancy, trust, adoption risk, decentralization, and social influence, and how these interplay with users' future expectations and behavioral intentions to use bitcoins. Data were gathered from 293 completed questionnaires and analyzed using AMOS 18. The results support the predictive power of the proposed model regarding users' future expectations and intentions toward bitcoins. All hypotheses were supported, with each variable significantly affecting the dependent variables. Social influence was found to be the strongest predictor of behavioral intention, with a negative effect on the intention to use bitcoins. This study demonstrates that social influence, adoption risk, and effort expectancy have the greatest effect on behavioral intention to use bitcoins. Bitcoin should therefore offer an effective, feasible, and personalized program that supports efficient usage among users. Additionally, the impacts of social influence, adoption risk, and perceived trust on behavioral intention to use new technology were compared, and their direct paths were tested together for the first time in this context.

    Hybrid feature selection method based on particle swarm optimization and adaptive local search method

    Machine learning has been extensively studied, with data classification as its most widely researched subject. The accuracy of prediction depends on the data provided to the classification algorithm. Meanwhile, using a large amount of data may incur costs, especially in data collection and preprocessing. Studies on feature selection have mainly aimed to develop techniques that reduce the number of features (attributes) used in classification while still producing accurate predictions. Hence, a particle swarm optimization (PSO) algorithm is proposed in the current article for selecting the optimal set of features. PSO has proved superior in different domains at exploring the search space, while local search algorithms are good at exploiting search regions. Thus, we propose a hybrid of PSO with an adaptive local search technique that operates based on the current PSO search state and is used to decide whether to accept a candidate solution. This combination balances the local intensification and the global diversification of the search process. As a result, the proposed algorithm outperforms the original PSO algorithm and other comparable approaches.
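
A minimal binary-PSO sketch of the hybrid idea follows; it is not the authors' algorithm. Everything specific is an assumption: the sigmoid transfer function, the coefficient values, the toy `fitness` (which rewards covering a hypothetical set of `relevant` features and penalises subset size instead of training a classifier), and the adaptive rule that doubles the hill-climbing budget around the global best whenever the swarm fails to improve it.

```python
import math
import random


def fitness(mask, relevant, alpha=0.9):
    """Hypothetical objective: reward covering `relevant` features, penalise subset size."""
    selected = {i for i, bit in enumerate(mask) if bit}
    if not selected:
        return 0.0
    coverage = len(selected & relevant) / len(relevant)
    size_ratio = len(selected) / len(mask)
    return alpha * coverage + (1 - alpha) * (1 - size_ratio)


def local_search(mask, relevant, n_flips, rng):
    """Bit-flip hill climbing: a neighbour is accepted only if it improves fitness."""
    best, best_fit = mask[:], fitness(mask, relevant)
    for _ in range(n_flips):
        cand = best[:]
        j = rng.randrange(len(cand))
        cand[j] = 1 - cand[j]
        cand_fit = fitness(cand, relevant)
        if cand_fit > best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit


def pso_feature_selection(n_features, relevant, n_particles=20, iters=50, seed=0):
    rng = random.Random(seed)
    w, c1, c2, vmax = 0.7, 1.5, 1.5, 4.0  # assumed inertia, acceleration, velocity clamp
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, relevant) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    n_flips = 5
    for _ in range(iters):
        improved = False
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))
                # Sigmoid transfer: velocity sets the probability that bit d is 1
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i], relevant)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
            if f > gbest_fit:
                gbest, gbest_fit = pos[i][:], f
                improved = True
        # Adaptive local search (assumption): double the budget while the swarm stagnates
        n_flips = 5 if improved else min(2 * n_flips, 4 * n_features)
        cand, cand_fit = local_search(gbest, relevant, n_flips, rng)
        if cand_fit > gbest_fit:
            gbest, gbest_fit = cand, cand_fit
    return gbest, gbest_fit


mask, score = pso_feature_selection(10, relevant={1, 3, 5})
print([i for i, bit in enumerate(mask) if bit], round(score, 3))
```

The swarm provides global diversification while the hill climber intensifies around the current best subset; swapping in a real classifier's validation accuracy for `fitness` would turn this toy into a wrapper feature selector.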

    The global burden of cancer attributable to risk factors, 2010-19 : a systematic analysis for the Global Burden of Disease Study 2019

    Background Understanding the magnitude of cancer burden attributable to potentially modifiable risk factors is crucial for development of effective prevention and mitigation strategies. We analysed results from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2019 to inform cancer control planning efforts globally. Methods The GBD 2019 comparative risk assessment framework was used to estimate cancer burden attributable to behavioural, environmental and occupational, and metabolic risk factors. A total of 82 risk-outcome pairs were included on the basis of the World Cancer Research Fund criteria. Estimated cancer deaths and disability-adjusted life-years (DALYs) in 2019 and change in these measures between 2010 and 2019 are presented. Findings Globally, in 2019, the risk factors included in this analysis accounted for 4.45 million (95% uncertainty interval 4.01-4.94) deaths and 105 million (95.0-116) DALYs for both sexes combined, representing 44.4% (41.3-48.4) of all cancer deaths and 42.0% (39.1-45.6) of all DALYs. There were 2.88 million (2.60-3.18) risk-attributable cancer deaths in males (50.6% [47.8-54.1] of all male cancer deaths) and 1.58 million (1.36-1.84) risk-attributable cancer deaths in females (36.3% [32.5-41.3] of all female cancer deaths). The leading risk factors at the most detailed level globally for risk-attributable cancer deaths and DALYs in 2019 for both sexes combined were smoking, followed by alcohol use and high BMI. Risk-attributable cancer burden varied by world region and Socio-demographic Index (SDI), with smoking, unsafe sex, and alcohol use being the three leading risk factors for risk-attributable cancer DALYs in low SDI locations in 2019, whereas DALYs in high SDI locations mirrored the top three global risk factor rankings. 
From 2010 to 2019, global risk-attributable cancer deaths increased by 20.4% (12.6-28.4) and DALYs by 16.8% (8.8-25.0), with the greatest percentage increase in metabolic risks (34.7% [27.9-42.8] and 33.3% [25.8-42.0]). Interpretation The leading risk factors contributing to global cancer burden in 2019 were behavioural, whereas metabolic risk factors saw the largest increases between 2010 and 2019. Reducing exposure to these modifiable risk factors would decrease cancer mortality and DALY rates worldwide, and policies should be tailored appropriately to local cancer risk factor burden. Copyright (C) 2022 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license. Peer reviewed
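
The reported counts and percentages can be cross-checked with simple arithmetic: dividing an attributable count by its share gives the implied total. The sketch uses only the point estimates quoted in this abstract and ignores the uncertainty intervals, so the implied totals are rough approximations, not figures reported by the study; `implied_total` is a hypothetical helper name.

```python
def implied_total(attributable, share):
    """Back out a total from an attributable count and its share of that total."""
    return attributable / share


# Point estimates from the abstract (millions); uncertainty intervals are ignored.
all_deaths = implied_total(4.45, 0.444)
male_deaths = implied_total(2.88, 0.506)
female_deaths = implied_total(1.58, 0.363)
all_dalys = implied_total(105, 0.420)

print(f"all cancer deaths: {all_deaths:.2f} M")       # → 10.02 M
print(f"male cancer deaths: {male_deaths:.2f} M")     # → 5.69 M
print(f"female cancer deaths: {female_deaths:.2f} M")  # → 4.35 M
print(f"all cancer DALYs: {all_dalys:.1f} M")          # → 250.0 M
```

Summing the male and female totals gives about 10.04 million rather than 10.02 million; the small gap reflects rounding in the published figures.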

    An embedding technique to determine ττ backgrounds in proton-proton collision data

    An embedding technique is presented to estimate standard model ττ backgrounds from data with minimal simulation input. In the data, the muons are removed from reconstructed μμ events and replaced with simulated τ leptons with the same kinematic properties. In this way, a set of hybrid events is obtained that does not rely on simulation except for the decay of the τ leptons. The challenges in describing the underlying event or the production of associated jets in the simulation are avoided. The technique described in this paper was developed for CMS. Its validation and the inherent uncertainties are also discussed. The demonstration of the performance of the technique is based on a sample of proton-proton collisions collected by CMS in 2017 at √s = 13 TeV corresponding to an integrated luminosity of 41.5 fb⁻¹. Peer reviewed
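
To make the hybrid-event idea concrete, the toy sketch below swaps muons for tau leptons in a list of particles while leaving every other object untouched. It is an illustration under assumptions, not the CMS implementation: "same kinematic properties" is interpreted here as keeping each muon's 3-momentum and recomputing the energy with the tau mass, the `Particle` class and `embed_taus` function are hypothetical, and the subsequent simulation of the tau decays is not shown.

```python
import math
from dataclasses import dataclass

TAU_MASS = 1.77686  # GeV, PDG value of the tau lepton mass


@dataclass
class Particle:
    pdg_id: int  # PDG Monte Carlo particle code (13 = mu-, 15 = tau-)
    px: float    # momentum components in GeV
    py: float
    pz: float
    e: float     # energy in GeV


def embed_taus(event):
    """Swap every muon for a tau lepton carrying the same 3-momentum.

    All other objects (jets, underlying event) stay exactly as observed in
    data; only the tau decays would subsequently be simulated.
    """
    hybrid = []
    for p in event:
        if abs(p.pdg_id) == 13:
            p2 = p.px ** 2 + p.py ** 2 + p.pz ** 2
            tau_id = 15 if p.pdg_id > 0 else -15  # same sign convention keeps the charge
            hybrid.append(Particle(tau_id, p.px, p.py, p.pz, math.sqrt(p2 + TAU_MASS ** 2)))
        else:
            hybrid.append(p)
    return hybrid


# Hypothetical μμ event with one extra hadronic object (PDG 21, a gluon)
event = [Particle(13, 30.0, 0.0, 40.0, 50.0001),
         Particle(-13, -20.0, 5.0, 0.0, 20.6159),
         Particle(21, 10.0, -5.0, 3.0, 12.0)]
print([p.pdg_id for p in embed_taus(event)])  # → [15, -15, 21]
```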

    Studies of Beauty Suppression via Nonprompt D⁰ Mesons in Pb-Pb Collisions at √sNN = 5.02 TeV

    The transverse momentum spectra of D⁰ mesons from b hadron decays are measured at midrapidity (|y| … D⁰ yield is found to be suppressed in the measured pT range from 2 to 100 GeV/c as compared to pp collisions. The suppression is weaker than that of prompt D⁰ mesons and charged hadrons for pT around 10 GeV/c. While theoretical calculations incorporating partonic energy loss in the quark-gluon plasma can successfully describe the measured B → D⁰ suppression at higher pT, the data show an indication of larger suppression than the model predictions in the range 2 < pT < 5 GeV/c. Peer reviewed

    Calibration of the CMS hadron calorimeters using proton-proton collision data at √s = 13 TeV

    Methods are presented for calibrating the hadron calorimeter system of the CMS detector at the LHC. The hadron calorimeters of the CMS experiment are sampling calorimeters of brass and scintillator, and are in the form of one central detector and two endcaps. These calorimeters cover pseudorapidities |η| … ee data. The energy scale of the outer calorimeters has been determined with test beam data and is confirmed through data with high transverse momentum jets. In this paper, we present the details of the calibration methods and their accuracy. Peer reviewed

    Measurement of nuclear modification factors of ϒ(1S), ϒ(2S), and ϒ(3S) mesons in PbPb collisions at √sNN = 5.02 TeV

    The cross sections for ϒ(1S), ϒ(2S), and ϒ(3S) production in lead-lead (PbPb) and proton-proton (pp) collisions at √sNN = 5.02 TeV have been measured using the CMS detector at the LHC. The nuclear modification factors, RAA, derived from the PbPb-to-pp ratio of yields for each state, are studied as functions of meson rapidity and transverse momentum, as well as PbPb collision centrality. The yields of all three states are found to be significantly suppressed, and compatible with a sequential ordering of the suppression, RAA(ϒ(1S)) > RAA(ϒ(2S)) > RAA(ϒ(3S)). The suppression of ϒ(1S) is larger than that seen at √sNN = 2.76 TeV, although the two are compatible within uncertainties. The upper limit on the RAA of ϒ(3S) integrated over pT, rapidity and centrality is 0.096 at 95% confidence level, which is the strongest suppression observed for a quarkonium state in heavy ion collisions to date. © 2019 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). Funded by SCOAP3. Peer reviewed
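
For reference, the nuclear modification factor is conventionally defined by normalising the PbPb yield to the pp cross section scaled by the average nuclear overlap. The generic form below is the textbook definition; the symbols are assumed here rather than taken from the paper, and the analysis evaluates the ratio separately for each ϒ state and in bins of rapidity, pT, or centrality.

```latex
R_{AA} = \frac{N^{\mathrm{PbPb}}}{N_{\mathrm{MB}}\,\langle T_{AA} \rangle\,\sigma^{pp}}
```

Here $N^{\mathrm{PbPb}}$ is the efficiency-corrected yield in PbPb, $N_{\mathrm{MB}}$ the number of minimum-bias PbPb events, $\langle T_{AA}\rangle$ the average nuclear overlap function, and $\sigma^{pp}$ the pp production cross section. $R_{AA} = 1$ means no nuclear modification, so the quoted limit $R_{AA}(\Upsilon(3S)) < 0.096$ means the ϒ(3S) yield is below about 10% of the scaled pp expectation.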

    Search for anomalous couplings in boosted WW/WZ → ℓνqq̄ production in proton-proton collisions at √s = 8 TeV

    Peer reviewed

    Global, regional, and national burden of disorders affecting the nervous system, 1990–2021: a systematic analysis for the Global Burden of Disease Study 2021

    Background Disorders affecting the nervous system are diverse and include neurodevelopmental disorders, late-life neurodegeneration, and newly emergent conditions, such as cognitive impairment following COVID-19. Previous publications from the Global Burden of Disease, Injuries, and Risk Factor Study estimated the burden of 15 neurological conditions in 2015 and 2016, but these analyses did not include neurodevelopmental disorders, as defined by the International Classification of Diseases (ICD)-11, or a subset of cases of congenital, neonatal, and infectious conditions that cause neurological damage. Here, we estimate nervous system health loss caused by 37 unique conditions and their associated risk factors globally, regionally, and nationally from 1990 to 2021. Methods We estimated mortality, prevalence, years lived with disability (YLDs), years of life lost (YLLs), and disability-adjusted life-years (DALYs), with corresponding 95% uncertainty intervals (UIs), by age and sex in 204 countries and territories, from 1990 to 2021. We included morbidity and deaths due to neurological conditions, for which health loss is directly due to damage to the CNS or peripheral nervous system. We also isolated neurological health loss from conditions for which nervous system morbidity is a consequence, but not the primary feature, including a subset of congenital conditions (ie, chromosomal anomalies and congenital birth defects), neonatal conditions (ie, jaundice, preterm birth, and sepsis), infectious diseases (ie, COVID-19, cystic echinococcosis, malaria, syphilis, and Zika virus disease), and diabetic neuropathy. By conducting a sequela-level analysis of the health outcomes for these conditions, only cases where nervous system damage occurred were included, and YLDs were recalculated to isolate the non-fatal burden directly attributable to nervous system health loss. 
A comorbidity correction was used to calculate total prevalence of all conditions that affect the nervous system combined. Findings Globally, the 37 conditions affecting the nervous system were collectively ranked as the leading group cause of DALYs in 2021 (443 million, 95% UI 378–521), affecting 3·40 billion (3·20–3·62) individuals (43·1%, 40·5–45·9 of the global population); global DALY counts attributed to these conditions increased by 18·2% (8·7–26·7) between 1990 and 2021. Age-standardised rates of deaths per 100 000 people attributed to these conditions decreased from 1990 to 2021 by 33·6% (27·6–38·8), and age-standardised rates of DALYs attributed to these conditions decreased by 27·0% (21·5–32·4). Age-standardised prevalence was almost stable, with a change of 1·5% (0·7–2·4). The ten conditions with the highest age-standardised DALYs in 2021 were stroke, neonatal encephalopathy, migraine, Alzheimer's disease and other dementias, diabetic neuropathy, meningitis, epilepsy, neurological complications due to preterm birth, autism spectrum disorder, and nervous system cancer. Interpretation As the leading cause of overall disease burden in the world, with increasing global DALY counts, effective prevention, treatment, and rehabilitation strategies for disorders affecting the nervous system are needed.
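
Two of the standard calculations behind estimates like these can be sketched in Python. Direct age standardisation weights age-specific rates by a standard population's age structure; the rates and weights in the example are hypothetical (GBD uses its own world standard population). The combined-prevalence function is a simplified comorbidity correction that assumes conditions occur independently; GBD's actual procedure is more elaborate, and both function names are illustrative only.

```python
from math import prod


def age_standardised_rate(rates_per_100k, standard_weights):
    """Direct standardisation: average age-specific rates using standard-population weights."""
    if len(rates_per_100k) != len(standard_weights):
        raise ValueError("one weight per age group is required")
    weighted = sum(r * w for r, w in zip(rates_per_100k, standard_weights))
    return weighted / sum(standard_weights)


def combined_prevalence(prevalences):
    """Share of people with at least one condition, assuming the conditions are independent."""
    return 1 - prod(1 - p for p in prevalences)


# Hypothetical inputs: three age groups, and two conditions
print(age_standardised_rate([10, 50, 200], [0.5, 0.25, 0.25]))  # → 67.5
print(combined_prevalence([0.5, 0.25]))  # → 0.625
```

Under independence, the combined prevalence is always below the simple sum of the individual prevalences (0.625 versus 0.75 here), which is why prevalences cannot just be added across conditions.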