1,691 research outputs found
CREDIT SCORING USING LOGISTIC REGRESSION
This report presents an approach to predicting customers' credit scores using the Logistic Regression machine learning algorithm. The research objective is a comparative study of feature selection and feature extraction on the same dataset, with Logistic Regression as the classifier in both cases. For feature selection, we used Stepwise Logistic Regression. For feature extraction, we used Singular Value Decomposition (SVD) and Weighted SVD. To test the accuracy obtained with each approach, we used a public credit dataset with 11 features and 150,000 records. After feature reduction, the Logistic Regression algorithm was used for classification. Stepwise Logistic Regression gave a 14% increase in accuracy compared to SVD and a 10% increase compared to Weighted SVD. We therefore conclude that Stepwise Logistic Regression performed significantly better than both SVD and Weighted SVD. The benefit of feature selection was that it helped identify important features, which improved the prediction accuracy of the classifier.
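As a rough illustration of the feature-selection idea in the abstract above, the sketch below runs a greedy forward selection around a small logistic regression. It is not the report's procedure: classical stepwise logistic regression usually adds or drops features based on significance tests or an information criterion, whereas this simplified version scores candidate feature sets by training accuracy. The toy data and the names `fit_logistic`, `accuracy`, and `forward_select` are hypothetical.

```python
import math

def fit_logistic(rows, labels, epochs=200, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n = len(rows[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wj * xj for wj, xj in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            for j in range(n):
                grad_w[j] += (p - y) * x[j]
            grad_b += p - y
        w = [wj - lr * g / len(rows) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def accuracy(rows, labels, cols):
    """Training accuracy of a logistic model restricted to the columns in cols."""
    sub = [[r[c] for c in cols] for r in rows]
    w, b = fit_logistic(sub, labels)
    hits = 0
    for x, y in zip(sub, labels):
        z = b + sum(wj * xj for wj, xj in zip(w, x))
        hits += int((z > 0) == (y == 1))
    return hits / len(rows)

def forward_select(rows, labels):
    """Greedily add the feature that most improves accuracy; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining:
        score, col = max((accuracy(rows, labels, selected + [c]), c) for c in remaining)
        if score <= best:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best

# Toy data (assumed): column 0 determines the label, column 1 is uninformative.
ROWS = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
LABELS = [0, 0, 1, 1]
selected, best = forward_select(ROWS, LABELS)
print(selected, best)
```

On this toy data the uninformative column is never added, which mirrors the abstract's point that selection identifies the features that matter.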
Malware Detection Using Dynamic Analysis
In this research, we explore the field of dynamic analysis, which has shown promising results in the field of malware detection. Here, we extract dynamic software birthmarks during malware execution and apply machine learning based detection techniques to the resulting feature set. Specifically, we consider Hidden Markov Models and Profile Hidden Markov Models. To determine the effectiveness of this dynamic analysis approach, we compare our detection results to the results obtained by using static analysis. We show that in some cases, significantly stronger results can be obtained using our dynamic approach.
Detecting money laundering using hidden Markov model
Recent money laundering scandals, such as the failure of Danske Bank and Swedbank to mitigate money laundering risks (Kim, 2019), have made "anti money laundering" (AML) a much discussed topic. Governments are making AML regulations tougher and financial institutions are struggling to comply. One of the requirements is to actively monitor financial transactions to detect suspicious ones. Most of the financial industry applies simple rule-based methods for monitoring. This thesis provides a practical model to detect suspicious transactions using the hidden Markov model (HMM). The use of HMM is justified because the criminal nature of a transaction is hidden from the financial institution; only transaction parameters can be observed. Using past data, a model is built to detect whether the current transaction is suspicious. The model is assessed with artificial and real transaction data. It was concluded that this model performs better than a classical k-means clustering algorithm.
PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures.
Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions
Hidden Markov models and neural networks for speech recognition
The Hidden Markov Model (HMM) is one of the most successful modeling approaches for acoustic events in speech recognition, and more recently it has proven useful for several problems in biological sequence analysis. Although the HMM is good at capturing the temporal nature of processes such as speech, it has a very limited capacity for recognizing complex patterns involving more than first order dependencies in the observed data sequences. This is due to the first order state process and the assumption of state conditional independence between observations. Artificial Neural Networks (NNs) are almost the opposite: they cannot model dynamic, temporally extended phenomena very well, but are good at static classification and regression tasks. Combining the two frameworks in a sensible way can therefore lead to a more powerful model with better classification abilities. The overall aim of this work has been to develop a probabilistic hybrid of hidden Markov models and neural networks and ..
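The two assumptions named in the abstract above (a first order state process and state conditional independence of observations) can be written as the standard HMM factorization. The notation is an assumption for illustration and is not taken from the source: $q_t$ is the hidden state and $x_t$ the observation at time $t$, for a sequence of length $T$.

```latex
P(x_{1:T}, q_{1:T})
  = P(q_1)\,\prod_{t=2}^{T} P(q_t \mid q_{t-1})\,\prod_{t=1}^{T} P(x_t \mid q_t)
```

Each observation depends only on the current state, and each state only on its predecessor, which is why dependencies spanning more than one step in the observed sequence are hard for the model to capture.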
Detecting money laundering in transaction monitoring using hidden Markov model
The purpose of the thesis is to introduce, build and test HMM as a method of detecting suspicious financial transactions that might be correlated with money laundering. HMM is a statistical Markov model in which the system being modelled is assumed to be a Markov process with unobserved (i.e., hidden) states. These hidden states, however, generate observable outcomes. HMM fits the context of transaction monitoring in the fight against money laundering because the intent of a transaction (part of a money laundering scheme or not) is hidden and only some parameters of the transaction can be observed. The model was built and tested on artificial datasets provided by Salv Technologies, and the commonly used k-means clustering model was chosen for comparison. Analysis and testing showed that, overall, HMM outperforms k-means clustering. Based on the analysis, it can be concluded that HMM can be used in transaction monitoring, but achieving high precision requires expert knowledge and practical testing. A brief overview of money laundering, anomaly detection methods and HMM is given. The empirical part includes application of HMM to 3 different study cases using R software.
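To make the "hidden states generate observable outcomes" idea concrete, here is a minimal sketch of the forward algorithm, which computes the probability of an observed sequence under a discrete HMM. It is written in Python rather than the R used in the thesis, and it is not the thesis's model: the two states (legitimate, suspicious), the two observation symbols (small amount, large amount), and every probability below are made-up assumptions.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs) under a discrete HMM, summing over all hidden state paths."""
    n_states = len(start)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            emit[s][o] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical parameters: state 0 = legitimate, state 1 = suspicious.
START = [0.9, 0.1]                   # initial state probabilities
TRANS = [[0.95, 0.05], [0.3, 0.7]]   # TRANS[r][s] = P(next state s | state r)
EMIT = [[0.8, 0.2], [0.3, 0.7]]      # EMIT[s][o], o: 0 = small, 1 = large amount

usual = forward_likelihood([0, 0, 0], START, TRANS, EMIT)
unusual = forward_likelihood([1, 1, 1], START, TRANS, EMIT)
print(usual, unusual)
```

Under these assumed parameters, a run of large amounts is less likely than a run of small ones. A monitoring rule could flag low-likelihood sequences, though the threshold would have to be chosen and validated on real data.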
Disambiguating Nouns, Verbs, and Adjectives Using Automatically Acquired Selectional Preferences
Selectional preferences have been used by word sense disambiguation (WSD) systems as one source of disambiguating information. We evaluate WSD using selectional preferences acquired for English adjective-noun, subject, and direct object grammatical relationships with respect to a standard test corpus. The selectional preferences are specific to verb or adjective classes, rather than individual word forms, so they can be used to disambiguate the co-occurring adjectives and verbs, rather than just the nominal argument heads. We also investigate use of the one-sense-per-discourse heuristic to propagate a sense tag for a word to other occurrences of the same word within the current document in order to increase coverage. Although the preferences perform well in comparison with other unsupervised WSD systems on the same corpus, the results show that for many applications, further knowledge sources would be required to achieve an adequate level of accuracy and coverage. In addition to quantifying performance, we analyze the results to investigate the situations in which the selectional preferences achieve the best precision and in which the one-sense-per-discourse heuristic increases performance.
Integration of speech biometrics in a phone payment system: text-independent speaker verification
Integration of a speaker recognition system in a payment system by phone. Nowadays, the integration of biometrics in security systems is a prominent field of research and application. Speech is the most common form of communication, which makes it a good candidate. When speech is used as a biometric, two types of systems should be analyzed: those that know what the speaker is going to say upon verification and those that do not. This degree thesis offers an overview of both, focusing on those that do not know what the speaker is going to say beforehand, also known as text-independent systems. To determine the best approach for integrating speech biometrics into a security system, both types of systems are compared, and two methodologies are analyzed for the text-independent system. To conclude, one of those methodologies is implemented in a software library that allows the creation of a text-independent speaker verification system.
Generalized Hidden Filter Markov Models Applied to Speaker Recognition
Classification of time series has wide Air Force, DoD and commercial interest, from automatic target recognition systems on munitions to recognition of speakers in diverse environments. The ability to effectively model the temporal information contained in a sequence is of paramount importance. Toward this goal, this research develops theoretical extensions to a class of stochastic models and demonstrates their effectiveness on the problem of text-independent (language constrained) speaker recognition. Specifically within the hidden Markov model architecture, additional constraints are implemented which better incorporate observation correlations and context, where standard approaches fail. Two methods of modeling correlations are developed, and their mathematical properties of convergence and reestimation are analyzed. These differ in modeling correlation present in the time samples and those present in the processed features, such as Mel frequency cepstral coefficients. The system models speaker dependent phonemes, making use of word dictionary grammars, and recognition is based on normalized log-likelihood Viterbi decoding. Both closed set identification and speaker verification using cohorts are performed on the YOHO database. YOHO is the only large scale, multiple-session, high-quality speech database for speaker authentication and contains over one hundred speakers stating combination locks. Equal error rates of 0.21% for males and 0.31% for females are demonstrated. A critical error analysis using a hypothesis test formulation provides the maximum number of errors observable while still meeting the goal error rates of 1% False Reject and 0.1% False Accept. Our system achieves this goal
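The abstract above bases recognition on normalized log-likelihood Viterbi decoding. The sketch below shows plain log-domain Viterbi decoding for a discrete HMM and returns the best-path log-likelihood divided by the number of frames. That per-frame division is only one possible reading of "normalized" and is an assumption here; the source does not specify the normalization, and the constrained models it develops are not reproduced. The toy parameters are hypothetical.

```python
import math

def viterbi_log(obs, log_start, log_trans, log_emit):
    """Return (best state path, best-path log-likelihood per frame)."""
    n = len(log_start)
    score = [log_start[s] + log_emit[s][obs[0]] for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, pointers, score = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[s][o])
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last] / len(obs)

def logs(matrix):
    """Element-wise natural log of a nested list of probabilities."""
    return [[math.log(p) for p in row] for row in matrix]

# Hypothetical two-state model with two observation symbols.
LOG_START = [math.log(0.6), math.log(0.4)]
LOG_TRANS = logs([[0.9, 0.1], [0.1, 0.9]])
LOG_EMIT = logs([[0.9, 0.1], [0.1, 0.9]])

path, per_frame = viterbi_log([0, 0, 1, 1], LOG_START, LOG_TRANS, LOG_EMIT)
print(path, per_frame)
```

Dividing by the sequence length makes scores from utterances of different durations comparable, which is one common motivation for normalizing before applying a verification threshold.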