1,238 research outputs found

    Application of decision trees and multivariate regression trees in design and optimization

    Get PDF
    Induction of decision trees and regression trees is a powerful technique not only for performing ordinary classification and regression analysis but also for discovering the often complex knowledge which describes the input-output behavior of a learning system in qualitative forms;In the area of classification (discrimination analysis), a new technique called IDea is presented for performing incremental learning with decision trees. It is demonstrated that IDea\u27s incremental learning can greatly reduce the spatial complexity of a given set of training examples. Furthermore, it is shown that this reduction in complexity can also be used as an effective tool for improving the learning efficiency of other types of inductive learners such as standard backpropagation neural networks;In the area of regression analysis, a new methodology for performing multiobjective optimization has been developed. Specifically, we demonstrate that muitiple-objective optimization through induction of multivariate regression trees is a powerful alternative to the conventional vector optimization techniques. Furthermore, in an attempt to investigate the effect of various types of splitting rules on the overall performance of the optimizing system, we present a tree partitioning algorithm which utilizes a number of techniques derived from diverse fields of statistics and fuzzy logic. These include: two multivariate statistical approaches based on dispersion matrices, an information-theoretic measure of covariance complexity which is typically used for obtaining multivariate linear models, two newly-formulated fuzzy splitting rules based on Pearson\u27s parametric and Kendall\u27s nonparametric measures of association, Bellman and Zadeh\u27s fuzzy decision-maximizing approach within an inductive framework, and finally, the multidimensional extension of a widely-used fuzzy entropy measure. The advantages of this new approach to optimization are highlighted by presenting three examples which respectively deal with design of a three-bar truss, a beam, and an electric discharge machining (EDM) process

    Clustering of protein structures

    Get PDF
    Proteins are very complex and important molecules that carry out a wide range of functions essential to life. The role of any given protein within a cell is heavily determined by its structure, which is recognized as a valuable resource of information when studying proteins. In applications such protein docking or phylogenetics, there is the need to compare such structures in order to obtain relevant information about the proteins that are being considered. As such, there is also a need to specify a some kind of measure that is able to indicate if two protein structures are similar or not. Currently, there are a few different measures that can be used for this task, however, due to the inherent complexity of protein structures it is very hard for a measure to take into account the numerous possible variations and perfectly quantify the dissimilarity among them. Considering this issue, with this work we aim to use clustering algorithms and exper- iment with different structure similarity measures, in an attempt to find effective ways of grouping protein structures with the goal of obtain useful information that can be used in the previously mentioned applications.As proteínas são moléculas de alta importância e complexidade que desempenham uma grande diversidade de funções essenciais à vida. O papel de uma dada proteína den- tro de uma célula é fortemente influenciado pela sua estrutura, que é reconhecida como um valioso recurso de informação no estudo de proteínas. Em aplicações como o docking ou a análise filogenética de proteínas, há uma necessidade de comparar tais estruturas de forma a obter informação relevante sobre as proteínas que estamos a considerar. Como tal, também há a necessidade de especificar uma medida que seja capaz de indicar se duas estruturas de proteínas são ou não semelhantes. Atualmente, há medidas diferentes que podem ser usadas para esta tarefa, no entanto, devido à inerente complexidade das estru- turas de proteínas é muito difícil para uma medida ter em conta as numerosas variações possíveis e quantificar perfeitamente as diferenças entre elas. Tendo estes problemas em consideração, neste trabalho vamos usar algoritmos de clustering e experimentar diferentes medidas de semelhança, numa tentativa de encontrar maneiras efetivas de agrupar estruturas de proteínas com o objectivo de obter informação útil, que possa ser usada nas aplicações mencionadas anteriormente

    Adaptive algorithms for history matching and uncertainty quantification

    Get PDF
    Numerical reservoir simulation models are the basis for many decisions in regard to predicting, optimising, and improving production performance of oil and gas reservoirs. History matching is required to calibrate models to the dynamic behaviour of the reservoir, due to the existence of uncertainty in model parameters. Finally a set of history matched models are used for reservoir performance prediction and economic and risk assessment of different development scenarios. Various algorithms are employed to search and sample parameter space in history matching and uncertainty quantification problems. The algorithm choice and implementation, as done through a number of control parameters, have a significant impact on effectiveness and efficiency of the algorithm and thus, the quality of results and the speed of the process. This thesis is concerned with investigation, development, and implementation of improved and adaptive algorithms for reservoir history matching and uncertainty quantification problems. A set of evolutionary algorithms are considered and applied to history matching. The shared characteristic of applied algorithms is adaptation by balancing exploration and exploitation of the search space, which can lead to improved convergence and diversity. This includes the use of estimation of distribution algorithms, which implicitly adapt their search mechanism to the characteristics of the problem. Hybridising them with genetic algorithms, multiobjective sorting algorithms, and real-coded, multi-model and multivariate Gaussian-based models can help these algorithms to adapt even more and improve their performance. Finally diversity measures are used to develop an explicit, adaptive algorithm and control the algorithm’s performance, based on the structure of the problem. Uncertainty quantification in a Bayesian framework can be carried out by resampling of the search space using Markov chain Monte-Carlo sampling algorithms. Common critiques of these are low efficiency and their need for control parameter tuning. A Metropolis-Hastings sampling algorithm with an adaptive multivariate Gaussian proposal distribution and a K-nearest neighbour approximation has been developed and applied

    Data Mining

    Get PDF
    Data mining is a branch of computer science that is used to automatically extract meaningful, useful knowledge and previously unknown, hidden, interesting patterns from a large amount of data to support the decision-making process. This book presents recent theoretical and practical advances in the field of data mining. It discusses a number of data mining methods, including classification, clustering, and association rule mining. This book brings together many different successful data mining studies in various areas such as health, banking, education, software engineering, animal science, and the environment

    Text Classification Aided by Clustering: a Literature Review

    Get PDF

    Advances in Evolutionary Algorithms

    Get PDF
    With the recent trends towards massive data sets and significant computational power, combined with evolutionary algorithmic advances evolutionary computation is becoming much more relevant to practice. Aim of the book is to present recent improvements, innovative ideas and concepts in a part of a huge EA field

    An academic review: applications of data mining techniques in finance industry

    Get PDF
    With the development of Internet techniques, data volumes are doubling every two years, faster than predicted by Moore’s Law. Big Data Analytics becomes particularly important for enterprise business. Modern computational technologies will provide effective tools to help understand hugely accumulated data and leverage this information to get insights into the finance industry. In order to get actionable insights into the business, data has become most valuable asset of financial organisations, as there are no physical products in finance industry to manufacture. This is where data mining techniques come to their rescue by allowing access to the right information at the right time. These techniques are used by the finance industry in various areas such as fraud detection, intelligent forecasting, credit rating, loan management, customer profiling, money laundering, marketing and prediction of price movements to name a few. This work aims to survey the research on data mining techniques applied to the finance industry from 2010 to 2015.The review finds that Stock prediction and Credit rating have received most attention of researchers, compared to Loan prediction, Money Laundering and Time Series prediction. Due to the dynamics, uncertainty and variety of data, nonlinear mapping techniques have been deeply studied than linear techniques. Also it has been proved that hybrid methods are more accurate in prediction, closely followed by Neural Network technique. This survey could provide a clue of applications of data mining techniques for finance industry, and a summary of methodologies for researchers in this area. Especially, it could provide a good vision of Data Mining Techniques in computational finance for beginners who want to work in the field of computational finance
    corecore