Classification of Credit Card Fraud Detection Using Machine Learning Techniques
Credit card fraud refers to unauthorized card transactions carried out by criminals. In this research paper, we explore four different approaches to analyzing fraud: decision trees, logistic regression, support vector machines, and random forests. Our proposed technique encompasses four stages: inputting the dataset, balancing the data through sampling, training classifier models, and detecting fraud. To analyze the data, we used forward stepwise logistic regression (LR) and decision tree (DT) analysis, in addition to random forest and support vector machine classifiers. Based on the outcomes of our analysis, the decision tree algorithm produced the highest AUC and accuracy, achieving a perfect score of 1, while logistic regression yielded the lowest values of 0.33 and 0.2933 for AUC and accuracy, respectively. Moreover, the random forest achieved an accuracy of 99.5%, which represents a significant advance in automating the detection of credit card fraud.
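The four-stage pipeline described in the abstract (dataset in, resample for balance, train classifiers, detect) can be sketched as follows. This is a minimal illustration, not the authors' code: their dataset is not available here, so synthetic imbalanced data stands in, the undersampling strategy and all model settings are assumptions, and the resulting scores will not match the paper's reported values.

```python
# Sketch of the paper's four-stage pipeline on synthetic stand-in data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stage 1: input the dataset (synthetic stand-in with a ~1% "fraud" rate).
X, y = make_classification(n_samples=5000, weights=[0.99], flip_y=0,
                           random_state=0)

# Stage 2: balance the data by randomly undersampling the majority class.
rng = np.random.default_rng(0)
fraud = np.flatnonzero(y == 1)
legit = rng.choice(np.flatnonzero(y == 0), size=len(fraud), replace=False)
idx = np.concatenate([fraud, legit])
Xb, yb = X[idx], y[idx]

# Stage 3: train the four classifiers compared in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(Xb, yb, test_size=0.3,
                                          random_state=0, stratify=yb)
models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
}

# Stage 4: detect fraud on held-out data and report AUC per model.
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```

Undersampling is only one of several possible balancing strategies; oversampling the minority class (e.g. SMOTE) is a common alternative at stage 2.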
Penalized Likelihood and Bayesian Function Selection in Regression Models
Challenging research in various fields has driven a wide range of
methodological advances in variable selection for regression models with
high-dimensional predictors. In comparison, selection of nonlinear functions in
models with additive predictors has been considered only more recently. Several
competing suggestions have been developed at about the same time and often do
not refer to each other. This article provides a state-of-the-art review on
function selection, focusing on penalized likelihood and Bayesian concepts,
relating various approaches to each other in a unified framework. In an
empirical comparison, also including boosting, we evaluate several methods
through applications to simulated and real data, thereby providing some
guidance on their performance in practice.
Discriminant Stepwise Procedure
The stepwise procedure is now probably the most popular tool for automatic feature selection. In most cases it represents a model selection approach that evaluates various feature subsets (a so-called wrapper). In effect it is a heuristic search technique that examines the space of all possible feature subsets. The method is known in the literature under different names and in different variants. We organize the concepts and terminology, and present several variants of stepwise feature selection from a search-strategy point of view. A short review of implementations in R is also given.
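A wrapper-style forward stepwise search, as described in the abstract, can be sketched as a greedy loop: at each step, add the candidate feature whose inclusion most improves cross-validated performance, and stop when no candidate helps. This is a generic illustration in Python rather than one of the R implementations the abstract reviews; the classifier, dataset, and stopping rule are all assumptions.

```python
# Greedy forward stepwise feature selection (wrapper) around a classifier.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    # Evaluate every one-feature extension of the current subset.
    scores = {f: cross_val_score(model, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f, score = max(scores.items(), key=lambda kv: kv[1])
    if score <= best_score:  # stop: no candidate improves the model
        break
    selected.append(f)
    remaining.remove(f)
    best_score = score

print("selected features:", selected, "cv accuracy:", round(best_score, 3))
```

Because the wrapper re-fits the model for every candidate subset, the search is heuristic by necessity: exhaustive evaluation of all 2^p subsets is infeasible for more than a handful of features.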
Bayesian model search and multilevel inference for SNP association studies
Technological advances in genotyping have given rise to hypothesis-based
association studies of increasing scope. As a result, the scientific hypotheses
addressed by these studies have become more complex and more difficult to
address using existing analytic methodologies. Obstacles to analysis include
inference in the face of multiple comparisons, complications arising from
correlations among the SNPs (single nucleotide polymorphisms), choice of their
genetic parametrization and missing data. In this paper we present an efficient
Bayesian model search strategy that searches over the space of genetic markers
and their genetic parametrization. The resulting method for Multilevel
Inference of SNP Associations, MISA, allows computation of multilevel posterior
probabilities and Bayes factors at the global, gene and SNP level, with the
prior distribution on SNP inclusion in the model providing an intrinsic
multiplicity correction. We use simulated data sets to characterize MISA's
statistical power, and show that MISA has higher power to detect association
than standard procedures. Using data from the North Carolina Ovarian Cancer
Study (NCOCS), MISA identifies variants that were not identified by standard
methods and have been externally ``validated'' in independent studies. We
examine sensitivity of the NCOCS results to prior choice and method for
imputing missing data. MISA is available in an R package on CRAN.
Published in the Annals of Applied Statistics (http://dx.doi.org/10.1214/09-AOAS322; http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
Feature Selection of Post-Graduation Income of College Students in the United States
This study investigated the most important attributes of the 6-year
post-graduation income of college graduates who used financial aid during their
time at college in the United States. The latest data released by the United
States Department of Education was used. Specifically, 1,429 cohorts of
graduates from three years (2001, 2003, and 2005) were included in the data
analysis. Three attribute selection methods, including filter methods, forward
selection, and Genetic Algorithm, were applied to the attribute selection from
30 relevant attributes. Five groups of machine learning algorithms were applied
to the dataset for classification using the best selected attribute subsets.
Based on our findings, we discuss the role of neighborhood professional degree
attainment, parental income, SAT scores, and family college education in
post-graduation incomes and the implications for social stratification.
14 pages, 6 tables, 3 figures.
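Of the three attribute selection methods the study applied, the filter approach is the simplest: score each attribute against the target independently of any classifier and keep the top-ranked ones. The sketch below illustrates this with mutual information on synthetic data; the real College Scorecard attributes are not reproduced here, and the scoring function and cutoff are assumptions rather than the study's exact configuration.

```python
# Filter-style attribute selection: rank 30 attributes by mutual
# information with the class label and keep the top 5.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in: 1,429 rows (matching the study's cohort count),
# 30 candidate attributes, 5 of them actually informative.
X, y = make_classification(n_samples=1429, n_features=30,
                           n_informative=5, random_state=0)

selector = SelectKBest(mutual_info_classif, k=5).fit(X, y)
kept = selector.get_support(indices=True)
print("top attributes by mutual information:", kept)
```

Unlike wrapper methods (forward selection) or the genetic algorithm, a filter method never fits the downstream classifier, which makes it fast but blind to interactions among attributes.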
Responder Identification in Clinical Trials with Censored Data
We present a newly developed technique for identifying positive and negative responders to a new treatment that was compared to a classical treatment (or placebo) in a randomized clinical trial. This bump-hunting-based method was developed for trials in which the two treatment arms do not differ in overall survival. It checks in a systematic manner whether certain subgroups, described by predictive factors, show a difference in survival due to the new treatment. Several versions of the method are discussed and compared in a simulation study. The best version of the responder identification method uses martingale residuals from a prognostic model as the response in a bump-hunting procedure stabilized through bootstrapping. On average it identifies the correct positive responder group 90% of the time and the correct negative responder group 99% of the time.