3,620 research outputs found

    A generic optimising feature extraction method using multiobjective genetic programming

    Get PDF
    In this paper, we present a generic, optimising feature extraction method using multiobjective genetic programming. We re-examine the feature extraction problem and show that effective feature extraction can significantly enhance the performance of pattern recognition systems with simple classifiers. A framework is presented to evolve optimised feature extractors that transform an input pattern space into a decision space in which maximal class separability is obtained. We have applied this method to real world datasets from the UCI Machine Learning and StatLog databases to verify our approach and compare our proposed method with other reported results. We conclude that our algorithm is able to produce classifiers of superior (or equivalent) performance to the conventional classifiers examined, suggesting removal of the need to exhaustively evaluate a large family of conventional classifiers on any new problem. (C) 2010 Elsevier B.V. All rights reserved

    Machine Learning for Fluid Mechanics

    Full text link
    The field of fluid mechanics is rapidly advancing, driven by unprecedented volumes of data from field measurements, experiments and large-scale simulations at multiple spatiotemporal scales. Machine learning offers a wealth of techniques to extract information from data that could be translated into knowledge about the underlying fluid mechanics. Moreover, machine learning algorithms can augment domain knowledge and automate tasks related to flow control and optimization. This article presents an overview of past history, current developments, and emerging opportunities of machine learning for fluid mechanics. It outlines fundamental machine learning methodologies and discusses their uses for understanding, modeling, optimizing, and controlling fluid flows. The strengths and limitations of these methods are addressed from the perspective of scientific inquiry that considers data as an inherent part of modeling, experimentation, and simulation. Machine learning provides a powerful information processing framework that can enrich, and possibly even transform, current lines of fluid mechanics research and industrial applications.Comment: To appear in the Annual Reviews of Fluid Mechanics, 202

    Operational Research in Education

    Get PDF
    Operational Research (OR) techniques have been applied, from the early stages of the discipline, to a wide variety of issues in education. At the government level, these include questions of what resources should be allocated to education as a whole and how these should be divided amongst the individual sectors of education and the institutions within the sectors. Another pertinent issue concerns the efficient operation of institutions, how to measure it, and whether resource allocation can be used to incentivise efficiency savings. Local governments, as well as being concerned with issues of resource allocation, may also need to make decisions regarding, for example, the creation and location of new institutions or closure of existing ones, as well as the day-to-day logistics of getting pupils to schools. Issues of concern for managers within schools and colleges include allocating the budgets, scheduling lessons and the assignment of students to courses. This survey provides an overview of the diverse problems faced by government, managers and consumers of education, and the OR techniques which have typically been applied in an effort to improve operations and provide solutions

    Application of support vector machines on the basis of the first Hungarian bankruptcy model

    Get PDF
    In our study we rely on a data mining procedure known as support vector machine (SVM) on the database of the first Hungarian bankruptcy model. The models constructed are then contrasted with the results of earlier bankruptcy models with the use of classification accuracy and the area under the ROC curve. In using the SVM technique, in addition to conventional kernel functions, we also examine the possibilities of applying the ANOVA kernel function and take a detailed look at data preparation tasks recommended in using the SVM method (handling of outliers). The results of the models assembled suggest that a significant improvement of classification accuracy can be achieved on the database of the first Hungarian bankruptcy model when using the SVM method as opposed to neural networks

    Challenges of Big Data Analysis

    Full text link
    Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottleneck, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinguished and require new computational and statistical paradigm. This article give overviews on the salient features of Big Data and how these features impact on paradigm change on statistical and computational methods as well as computing architectures. We also provide various new perspectives on the Big Data analysis and computation. In particular, we emphasis on the viability of the sparsest solution in high-confidence set and point out that exogeneous assumptions in most statistical methods for Big Data can not be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions

    GPPE: a method to generate ad-hoc feature extractors for prediction in financial domains

    Get PDF
    When dealing with classification and regression problems, there is a strong need for high-quality attributes. This is a capital issue not only in financial problems, but in many Data Mining domains. Constructive Induction methods help to overcome this problem by mapping the original representation into a new one, where prediction becomes easier. In this work we present GPPE: a GP-based method that projects data from an original data space into another one where data approaches linear behavior (linear separability or linear regression). Also, GPPE is able to reduce the dimensionality of the problem by recombining related attributes and discarding irrelevant ones. We have applied GPPE to two financial domains: Bankruptcy prediction and IPO Underpricing prediction. In both cases GPPE automatically generated a new data representation that obtained competitive prediction rates and drastically reduced the dimensionality of the problem.Publicad

    Feature-based time-series analysis

    Full text link
    This work presents an introduction to feature-based time-series analysis. The time series as a data type is first described, along with an overview of the interdisciplinary time-series analysis literature. I then summarize the range of feature-based representations for time series that have been developed to aid interpretable insights into time-series structure. Particular emphasis is given to emerging research that facilitates wide comparison of feature-based representations that allow us to understand the properties of a time-series dataset that make it suited to a particular feature-based representation or analysis algorithm. The future of time-series analysis is likely to embrace approaches that exploit machine learning methods to partially automate human learning to aid understanding of the complex dynamical patterns in the time series we measure from the world.Comment: 28 pages, 9 figure

    Data-driven Soft Sensors in the Process Industry

    Get PDF
    In the last two decades Soft Sensors established themselves as a valuable alternative to the traditional means for the acquisition of critical process variables, process monitoring and other tasks which are related to process control. This paper discusses characteristics of the process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, like the chemical industry, bioprocess industry, steel industry, etc. The focus of this work is put on the data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though yet not completely realised, potential. A comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques as well as a discussion of some open issues in the Soft Sensor development and maintenance and their possible solutions are the main contributions of this work

    Transfer Learning with Label Adaptation for Counterparty Rating Prediction

    Get PDF
    Credit rating is one of the core tools for risk management within financial firms. Ratings are usually provided by specialized agencies which perform an overall study and diagnosis on a given firm’s financial health. Dealing with unrated entities is a common problem, as several risk models rely on the ratings’ completeness, and agencies can not realistically rate every existing company. To solve this, credit rating prediction has been widely studied in academia. However, research in this topic tends to separate models amongst the different rating agencies due to the difference in both rating scales and composition. This work uses transfer learning, via label adaptation, to increase the number of samples for feature selection, and appends these adapted labels as an additional feature to improve the predictive power and stability of previously proposed methods. Accuracy on exact label prediction was improved from 0.30, in traditional models, up to 0.33 in the transfer learning setting. Furthermore, when measuring accuracy with a tolerance of 3 grade notches, accuracy increased almost 0.10, from 0.87 to 0.96. Overall, transfer learning displayed better out-of-sample generalization
    corecore