428 research outputs found

    Fraud Dataset Benchmark and Applications

    Standardized datasets and benchmarks have spurred innovations in computer vision, natural language processing, multi-modal and tabular settings. We note that, compared to other well-researched fields, fraud detection has unique challenges: high class imbalance, diverse feature types, frequently changing fraud patterns, and the adversarial nature of the problem. Because of these, modeling approaches evaluated on datasets from other research fields may not work well for fraud detection. In this paper, we introduce Fraud Dataset Benchmark (FDB), a compilation of publicly available datasets catered to fraud detection. FDB comprises a variety of fraud-related tasks, ranging from identifying fraudulent card-not-present transactions, detecting bot attacks, classifying malicious URLs, and estimating risk of loan default to content moderation. The Python-based library for FDB provides a consistent API for data loading with standardized training and testing splits. We demonstrate several applications of FDB that are of broad interest for fraud detection, including feature engineering, comparison of supervised learning algorithms, label noise removal, class-imbalance treatment and semi-supervised learning. We hope that FDB provides a common playground for researchers and practitioners in the fraud detection domain to develop robust and customized machine learning techniques targeting various fraud use cases.

    Differential cross section measurements for the production of a W boson in association with jets in proton–proton collisions at √s = 7 TeV

    Measurements are reported of differential cross sections for the production of a W boson, which decays into a muon and a neutrino, in association with jets, as a function of several variables, including the transverse momenta (pT) and pseudorapidities of the four leading jets, the scalar sum of jet transverse momenta (HT), and the difference in azimuthal angle between the directions of each jet and the muon. The data sample of pp collisions at a centre-of-mass energy of 7 TeV was collected with the CMS detector at the LHC and corresponds to an integrated luminosity of 5.0 fb⁻¹. The measured cross sections are compared to predictions from Monte Carlo generators, MadGraph + pythia and sherpa, and to next-to-leading-order calculations from BlackHat + sherpa. The differential cross sections are found to be in agreement with the predictions, apart from the pT distributions of the leading jets at high pT values, the distributions of the HT at high-HT and low jet multiplicity, and the distribution of the difference in azimuthal angle between the leading jet and the muon at low values. United States. Dept. of Energy; National Science Foundation (U.S.); Alfred P. Sloan Foundatio

    Juxtaposing BTE and ATE – on the role of the European insurance industry in funding civil litigation