Evolutionary Optimization Of Support Vector Machines
Support vector machines are a relatively new approach to building classifiers that has become increasingly popular in the machine learning community. They offer several advantages over other methods such as neural networks, including training speed, convergence, and complexity control of the classifier, as well as a stronger mathematical foundation in optimization and statistical learning theory. This thesis deals with the problem of model selection for support vector machines, that is, the problem of finding the parameter values that give the algorithm its best performance. It is shown that genetic algorithms provide an effective way to find these parameters. The proposed algorithm is compared with a backpropagation neural network on a dataset that represents individual models for electronic commerce.
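The abstract above does not give the thesis's actual encoding, operators, or fitness function, so the following is only a minimal sketch of how a genetic algorithm could search SVM parameters. It assumes two real-valued genes (log10 of C and log10 of gamma) and uses a hypothetical `toy_cv_accuracy` function as a stand-in for the cross-validated accuracy of a trained SVM; the names `evolve` and `toy_cv_accuracy`, the bounds, and the operator choices (tournament selection, blend crossover, Gaussian mutation, elitism) are all illustrative assumptions.

```python
import random


def evolve(fitness, bounds, pop_size=20, generations=30,
           mutation_scale=0.3, seed=0):
    """Maximize fitness(params) over real-valued params with a simple GA."""
    rng = random.Random(seed)

    def clip(value, low, high):
        return max(low, min(high, value))

    def tournament(scored):
        # Draw two individuals at random and keep the fitter one.
        first, second = rng.sample(scored, 2)
        return first[1] if first[0] >= second[0] else second[1]

    population = [[rng.uniform(low, high) for low, high in bounds]
                  for _ in range(pop_size)]

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        elite = max(scored, key=lambda pair: pair[0])[1]
        children = [list(elite)]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            w = rng.random()  # blend crossover weight
            child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
            # Gaussian mutation, clipped back into the search box.
            child = [clip(g + rng.gauss(0, mutation_scale), low, high)
                     for g, (low, high) in zip(child, bounds)]
            children.append(child)
        population = children

    best = max(population, key=fitness)
    return best, fitness(best)


def toy_cv_accuracy(params):
    """Hypothetical stand-in for cross-validated SVM accuracy.

    Peaks at log10(C) = 1, log10(gamma) = -2. In practice this would
    train and validate an SVM with C = 10**params[0], gamma = 10**params[1].
    """
    log_c, log_gamma = params
    return 1.0 - ((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2) / 50.0


SEARCH_BOUNDS = [(-3.0, 3.0), (-5.0, 1.0)]  # assumed ranges for log10(C), log10(gamma)
best_params, best_score = evolve(toy_cv_accuracy, SEARCH_BOUNDS)
```

Searching in log space is a common choice because useful values of C and gamma typically span several orders of magnitude; replacing `toy_cv_accuracy` with a real cross-validation routine is the only change needed to apply the sketch to an actual classifier.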
PhysicsGP: A Genetic Programming Approach to Event Selection
We present a novel multivariate classification technique based on Genetic
Programming. The technique is distinct from Genetic Algorithms and offers
several advantages compared to Neural Networks and Support Vector Machines. The
technique optimizes a set of human-readable classifiers with respect to some
user-defined performance measure. We calculate the Vapnik-Chervonenkis
dimension of this class of learning machines and consider a practical example:
the search for the Standard Model Higgs Boson at the LHC. The resulting
classifier is very fast to evaluate, human-readable, and easily portable. The
software may be downloaded at: http://cern.ch/~cranmer/PhysicsGP.html
Comment: 16 pages, 9 figures, 1 table. Submitted to Comput. Phys. Commu
Feature Selection of Post-Graduation Income of College Students in the United States
This study investigated the most important attributes of the 6-year
post-graduation income of college graduates who used financial aid during their
time at college in the United States. The latest data released by the United
States Department of Education was used. Specifically, 1,429 cohorts of
graduates from three years (2001, 2003, and 2005) were included in the data
analysis. Three attribute selection methods, including filter methods, forward
selection, and Genetic Algorithm, were applied to the attribute selection from
30 relevant attributes. Five groups of machine learning algorithms were applied
to the dataset for classification using the best selected attribute subsets.
Based on our findings, we discuss the role of neighborhood professional degree
attainment, parental income, SAT scores, and family college education in
post-graduation incomes and the implications for social stratification.
Comment: 14 pages, 6 tables, 3 figures
Chi-square-based scoring function for categorization of MEDLINE citations
Objectives: Text categorization has been used in biomedical informatics for
identifying documents containing relevant topics of interest. We developed a
simple method that uses a chi-square-based scoring function to determine the
likelihood of MEDLINE citations containing genetics-relevant topics. Methods: Our
procedure requires construction of a genetic and a nongenetic domain document
corpus. We used MeSH descriptors assigned to MEDLINE citations for this
categorization task. We compared frequencies of MeSH descriptors between the two
corpora using a chi-square test. A MeSH descriptor was considered to be a
positive indicator if its relative observed frequency in the genetic domain
corpus was greater than its relative observed frequency in the nongenetic
domain corpus. The output of the proposed method is a list of scores for all
the citations, with the highest score given to those citations containing MeSH
descriptors typical for the genetic domain. Results: Validation was done on a
set of 734 manually annotated MEDLINE citations. It achieved predictive
accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method
by comparing it to three machine learning algorithms (support vector machines,
decision trees, naïve Bayes). Although the differences were not statistically
significant, results showed that our chi-square scoring performs as
well as the compared machine learning algorithms. Conclusions: We suggest that the
chi-square scoring is an effective solution to help categorize MEDLINE
citations. The algorithm is implemented in the BITOLA literature-based
discovery support system as a preprocessor for the gene symbol disambiguation
process.
Comment: 34 pages, 2 figures
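The scoring procedure described in the Methods can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how per-descriptor statistics are combined into a citation score, so the sketch assumes a signed sum (chi-square added for positive indicators, subtracted otherwise). The function names and the tiny corpora are hypothetical.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def descriptor_weights(genetic_docs, nongenetic_docs):
    """Map each MeSH descriptor to a signed chi-square weight.

    Each corpus is a non-empty list of descriptor sets, one per citation.
    A descriptor is a positive indicator when its relative frequency in
    the genetic corpus exceeds that in the nongenetic corpus.
    """
    n_gen, n_non = len(genetic_docs), len(nongenetic_docs)
    gen_counts = Counter(d for doc in genetic_docs for d in set(doc))
    non_counts = Counter(d for doc in nongenetic_docs for d in set(doc))
    weights = {}
    for desc in set(gen_counts) | set(non_counts):
        a, c = gen_counts[desc], non_counts[desc]
        chi2 = chi_square_2x2(a, n_gen - a, c, n_non - c)
        positive = a / n_gen > c / n_non
        # Assumption: negative indicators count against the citation.
        weights[desc] = chi2 if positive else -chi2
    return weights


def score_citation(descriptors, weights):
    """Sum the signed weights of a citation's descriptors (unknown ones add 0)."""
    return sum(weights.get(d, 0.0) for d in set(descriptors))


# Hypothetical toy corpora, for illustration only.
genetic = [{"Genes", "Mutation"}, {"Genes", "Humans"}, {"Genes"}]
nongenetic = [{"Humans", "Surgery"}, {"Humans"}, {"Surgery"}]
weights = descriptor_weights(genetic, nongenetic)
```

Ranking citations by `score_citation` then puts those with descriptors typical of the genetic corpus at the top, which matches the output the abstract describes.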
Algorithms Implemented for Cancer Gene Searching and Classifications
Understanding gene expression is an important factor in cancer diagnosis. One goal of this understanding is to implement cancer gene search and classification methods. However, cancer gene search and classification is challenging because no single exact algorithm is obviously applicable to the various types of cancer cells. This paper surveys the most common top-ranked algorithms used for cancer gene search and classification, and how they are implemented to achieve better performance. It distinguishes algorithms for bioimage analysis of cancer cells from algorithms based on DNA array data. The main purpose of this paper is to lay out a road map of the most current algorithms for cancer gene search and classification.
Intelligent Financial Fraud Detection Practices: An Investigation
Financial fraud is an issue with far reaching consequences in the finance
industry, government, corporate sectors, and for ordinary consumers. Increasing
dependence on new technologies such as cloud and mobile computing in recent
years has compounded the problem. Traditional methods of detection involve
extensive use of auditing, where a trained individual manually observes reports
or transactions in an attempt to discover fraudulent behaviour. This method is
not only time consuming, expensive and inaccurate, but in the age of big data
it is also impractical. Not surprisingly, financial institutions have turned to
automated processes using statistical and computational methods. This paper
presents a comprehensive investigation on financial fraud detection practices
using such data mining methods, with a particular focus on computational
intelligence-based techniques. The practices are classified by key
aspects such as the detection algorithm used, the fraud type investigated, and the
success rate. Issues and challenges associated with current
practices and potential future directions of research are also identified.
Comment: Proceedings of the 10th International Conference on Security and
Privacy in Communication Networks (SecureComm 2014)