72,373 research outputs found

    Evolutionary Optimization Of Support Vector Machines

    Get PDF
    Support vector machines are a relatively new approach for creating classifiers that have become increasingly popular in the machine learning community. They present several advantages over other methods like neural networks in areas like training speed, convergence, complexity control of the classifier, as well as a stronger mathematical background based on optimization and statistical learning theory. This thesis deals with the problem of model selection with support vector machines, that is, the problem of finding the optimal parameters that will improve the performance of the algorithm. It is shown that genetic algorithms provide an effective way to find the optimal parameters for support vector machines. The proposed algorithm is compared with a backpropagation Neural Network in a dataset that represents individual models for electronic commerce

    PhysicsGP: A Genetic Programming Approach to Event Selection

    Full text link
    We present a novel multivariate classification technique based on Genetic Programming. The technique is distinct from Genetic Algorithms and offers several advantages compared to Neural Networks and Support Vector Machines. The technique optimizes a set of human-readable classifiers with respect to some user-defined performance measure. We calculate the Vapnik-Chervonenkis dimension of this class of learning machines and consider a practical example: the search for the Standard Model Higgs Boson at the LHC. The resulting classifier is very fast to evaluate, human-readable, and easily portable. The software may be downloaded at: http://cern.ch/~cranmer/PhysicsGP.htmlComment: 16 pages 9 figures, 1 table. Submitted to Comput. Phys. Commu

    Feature Selection of Post-Graduation Income of College Students in the United States

    Full text link
    This study investigated the most important attributes of the 6-year post-graduation income of college graduates who used financial aid during their time at college in the United States. The latest data released by the United States Department of Education was used. Specifically, 1,429 cohorts of graduates from three years (2001, 2003, and 2005) were included in the data analysis. Three attribute selection methods, including filter methods, forward selection, and Genetic Algorithm, were applied to the attribute selection from 30 relevant attributes. Five groups of machine learning algorithms were applied to the dataset for classification using the best selected attribute subsets. Based on our findings, we discuss the role of neighborhood professional degree attainment, parental income, SAT scores, and family college education in post-graduation incomes and the implications for social stratification.Comment: 14 pages, 6 tables, 3 figure

    Chi-square-based scoring function for categorization of MEDLINE citations

    Full text link
    Objectives: Text categorization has been used in biomedical informatics for identifying documents containing relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood of MEDLINE citations containing genetic relevant topic. Methods: Our procedure requires construction of a genetic and a nongenetic domain document corpus. We used MeSH descriptors assigned to MEDLINE citations for this categorization task. We compared frequencies of MeSH descriptors between two corpora applying chi-square test. A MeSH descriptor was considered to be a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest score given to those citations containing MeSH descriptors typical for the genetic domain. Results: Validation was done on a set of 734 manually annotated MEDLINE citations. It achieved predictive accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method by comparing it to three machine learning algorithms (support vector machines, decision trees, na\"ive Bayes). Although the differences were not statistically significantly different, results showed that our chi-square scoring performs as good as compared machine learning algorithms. Conclusions: We suggest that the chi-square scoring is an effective solution to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for gene symbol disambiguation process.Comment: 34 pages, 2 figure

    Algorithms Implemented for Cancer Gene Searching and Classifications

    Get PDF
    Understanding the gene expression is an important factor to cancer diagnosis. One target of this understanding is implementing cancer gene search and classification methods. However, cancer gene search and classification is a challenge in that there is no an obvious exact algorithm that can be implemented individually for various cancer cells. In this paper a research is con-ducted through the most common top ranked algorithms implemented for cancer gene search and classification, and how they are implemented to reach a better performance. The paper will distinguish algorithms implemented for Bio image analysis for cancer cells and algorithms implemented based on DNA array data. The main purpose of this paper is to explore a road map towards presenting the most current algorithms implemented for cancer gene search and classification

    Intelligent Financial Fraud Detection Practices: An Investigation

    Full text link
    Financial fraud is an issue with far reaching consequences in the finance industry, government, corporate sectors, and for ordinary consumers. Increasing dependence on new technologies such as cloud and mobile computing in recent years has compounded the problem. Traditional methods of detection involve extensive use of auditing, where a trained individual manually observes reports or transactions in an attempt to discover fraudulent behaviour. This method is not only time consuming, expensive and inaccurate, but in the age of big data it is also impractical. Not surprisingly, financial institutions have turned to automated processes using statistical and computational methods. This paper presents a comprehensive investigation on financial fraud detection practices using such data mining methods, with a particular focus on computational intelligence-based techniques. Classification of the practices based on key aspects such as detection algorithm used, fraud type investigated, and success rate have been covered. Issues and challenges associated with the current practices and potential future direction of research have also been identified.Comment: Proceedings of the 10th International Conference on Security and Privacy in Communication Networks (SecureComm 2014
    • …
    corecore