Data mining as a tool for environmental scientists

Author: Athanasiadis Ioannis
Comas Joaquim
Frank Eibe
Gibert Karina
Letcher Rebecca
Spate Jessica
Sànchez-Marrè Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2006

Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.

Research Commons@Waikato

A knowledge engineering approach to the recognition of genomic coding regions

Author: กิตติศักดิ์ เกิดประสพ
นิตยา เกิดประสพ
Publication venue: School of Computer Engineering, Institute of Engineering, Suranaree University of Technology
Publication date: 01/01/2560 (Buddhist Era; 2017 CE)

Funded by a research grant from Suranaree University of Technology, fiscal years B.E. 2556-255

Suranaree University of Technology Intellectual Repository

Machine learning and its applications in reliability analysis systems

Author: Hong Hui-ling
Publication date: 01/01/1994

In this thesis, we are interested in exploring some aspects of Machine Learning (ML) and its application in Reliability Analysis systems (RAs). We begin by investigating some ML paradigms and their techniques, go on to discuss the possible applications of ML in improving RAs performance, and lastly give guidelines for the architecture of learning RAs. Our survey of ML covers both levels of Neural Network learning and Symbolic learning. In symbolic process learning, five types of learning and their applications are discussed: rote learning, learning from instruction, learning from analogy, learning from examples, and learning from observation and discovery. The Reliability Analysis systems (RAs) presented in this thesis are mainly designed for maintaining plant safety, supported by two functions: a risk analysis function, i.e., failure mode effect analysis (FMEA); and a diagnosis function, i.e., real-time fault location (RTFL). Three approaches to creating the RAs are discussed. According to the result of our survey, we suggest that currently the best design of RAs is to embed model-based RAs, i.e., MORA (as software), in a neural network based computer system (as hardware). However, there are still some improvements that can be made through the application of Machine Learning. By implanting the 'learning element', the MORA will become a learning MORA (La MORA) system, a learning Reliability Analysis system with the power of automatic knowledge acquisition, inconsistency checking, and more. To conclude our thesis, we propose an architecture of La MORA.

Durham e-Theses

Tree Induction of Spatial Choice Behavior

Author: Thill Jean-Claude
Wheeler Aaron

Research Papers in Economics

Competitively Evolving Decision Trees Against Fixed Training Cases for Natural Language Processing

Author: Siegel Eric V.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1994

Competitive fitness functions can generate performance superior to absolute fitness functions [Angeline and Pollack 1993], [Hillis 1992]. This chapter describes a method by which competition can be implemented when training over a fixed (static) set of examples. Since new training cases cannot be generated by mutation or crossover, the probabilistic frequencies by which individual training cases are selected competitively adapt. We evolve decision trees for the problem of word sense disambiguation. The decision trees contain embedded bit strings; bit string crossover is intermingled with subtree-swapping. To approach the problem of overlearning, we have implemented a fitness penalty function specialized for decision trees which is dependent on the partition of the set of training cases implied by a decision tree.
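The idea of selection frequencies that adapt over a fixed set of training cases can be illustrated with a minimal Python sketch. It is not the chapter's method: the multiplicative update rule, the `rate` step size, and the function names are hypothetical choices made only for illustration. Cases that more of the population misclassifies receive a higher probability of being drawn in the next round.

```python
import random


def update_case_weights(weights, failures, rate=0.5):
    """Raise the sampling weight of cases the population tends to get wrong.

    weights  -- current non-negative weight per training case
    failures -- fraction of the population misclassifying each case (0..1)
    rate     -- hypothetical step size (an assumption, not from the source)
    """
    raw = [w * (1.0 + rate * f) for w, f in zip(weights, failures)]
    total = sum(raw)
    # Renormalise so the weights form a probability distribution.
    return [r / total for r in raw]


def sample_cases(weights, k, rng):
    """Draw k training-case indices with probability proportional to weight."""
    return rng.choices(range(len(weights)), weights=weights, k=k)


# Four fixed training cases, initially equally likely to be selected.
weights = [0.25, 0.25, 0.25, 0.25]
# Assumed failure rates of the current population on each case.
failures = [0.0, 0.1, 0.9, 0.5]

new_weights = update_case_weights(weights, failures)
batch = sample_cases(new_weights, k=10, rng=random.Random(0))
```

In this sketch the hardest case (index 2) ends up with the largest weight, so it is sampled most often when the next generation is evaluated.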

Columbia University Academic Commons

evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R

Author: Achim Zeileis
Karl-Peter Pfeiffer
Thomas Grubinger

Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. This paper describes the "evtree" package, which implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. Computationally intensive tasks are fully computed in C++ while the "partykit" (Hothorn and Zeileis 2011) package is leveraged for representing the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions. "evtree" is compared to "rpart" (Therneau and Atkinson 1997), the open-source CART implementation, and conditional inference trees ("ctree", Hothorn, Hornik, and Zeileis 2006). The usefulness of "evtree" is illustrated in a textbook customer classification task and a benchmark study of predictive accuracy in which "evtree" achieved at least similar and most of the time better results compared to the recursive algorithms "rpart" and "ctree".

Keywords: machine learning, classification trees, regression trees, evolutionary algorithms, R
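The local optimality of forward stepwise splitting can be seen in a small Python sketch. This is an illustration only, not code from "evtree" or "rpart": the function names and the XOR-style toy data are assumptions. The search picks the single split with the lowest weighted Gini impurity, which is the one-step-ahead criterion the abstract describes.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_greedy_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best single split."""
    n = len(rows)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# XOR-style toy data: no single split reduces impurity, although a
# depth-2 tree (split on one feature, then the other) separates it perfectly.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

parent_impurity = gini(labels)
split = best_greedy_split(rows, labels)
```

On this data the best one-step split leaves the impurity unchanged, so a greedy learner has no reason to prefer any split, whereas a search over whole trees could still find the perfect two-level tree.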

Research Papers in Economics

The use of vicinal-risk minimization for training decision trees

Author: Cao Y.
Rockett P.I.
Publication venue: 'Elsevier BV'
Publication date: 01/06/2015

We propose the use of Vapnik's vicinal risk minimization (VRM) for training decision trees to approximately maximize decision margins. We implement VRM by propagating uncertainties in the input attributes into the labeling decisions. In this way, we perform a global regularization over the decision tree structure. During a training phase, a decision tree is constructed to minimize the total probability of misclassifying the labeled training examples, a process which approximately maximizes the margins of the resulting classifier. We perform the necessary minimization using an appropriate meta-heuristic (genetic programming) and present results over a range of synthetic and benchmark real datasets. We demonstrate the statistical superiority of VRM training over conventional empirical risk minimization (ERM) and the well-known C4.5 algorithm on these datasets. We also conclude that there is no statistical difference between trees trained by ERM and using C4.5. Training with VRM is shown to be more stable and repeatable than training with ERM.
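How propagating input uncertainty into the labeling decision favours larger margins can be sketched in Python for the simplest possible case. The setup is an assumption for illustration and not the paper's implementation: a single-threshold stump instead of a full tree, Gaussian noise with a hypothetical standard deviation `sigma` on the one input attribute, and no genetic programming search.

```python
import math


def prob_left(x, threshold, sigma):
    """P(x + noise <= threshold) for noise ~ N(0, sigma^2) (assumed noise model)."""
    z = (threshold - x) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def expected_error(xs, ys, threshold, left_label, right_label, sigma):
    """Mean probability of misclassifying each example under input noise."""
    total = 0.0
    for x, y in zip(xs, ys):
        p_left = prob_left(x, threshold, sigma)
        # Probability that the noisy input lands in the branch with the true label.
        p_correct = p_left * (y == left_label) + (1.0 - p_left) * (y == right_label)
        total += 1.0 - p_correct
    return total / len(xs)


# Toy one-dimensional data; any threshold between 2.0 and 4.0 has zero
# empirical error, so ERM cannot distinguish between such thresholds.
xs = [1.0, 2.0, 4.0, 5.0]
ys = [0, 0, 1, 1]

mid = expected_error(xs, ys, threshold=3.0, left_label=0, right_label=1, sigma=0.5)
edge = expected_error(xs, ys, threshold=2.1, left_label=0, right_label=1, sigma=0.5)
```

Both thresholds classify the noise-free points perfectly, but the expected misclassification probability is lower for the threshold in the middle of the gap, which is the sense in which minimizing this quantity approximately maximizes the margin.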

White Rose Research Online