Analysis of traffic accident severity using Decision Rules via Decision Trees
A Decision Tree (DT) is a potential method for studying traffic accident severity. One of its main advantages is that Decision Rules can be extracted from its structure and used to identify safety problems and establish certain measures of performance. However, when only one DT is used, rule extraction is limited to the structure of that DT, and some important relationships between variables cannot be extracted. This paper presents a method for extracting rules from a DT more effectively. The method's effectiveness when applied to a particular traffic accident dataset is shown. Specifically, our study focuses on traffic accident data from rural roads in Granada (Spain) from 2003 to 2009 (inclusive). The results show that we can obtain more than 70 relevant rules from our data using the new method, whereas with only one DT we would have extracted only 5 rules from the same dataset.
Abellán, J.; López-Maldonado, G.; De Oña, J. (2013). Analysis of traffic accident severity using Decision Rules via Decision Trees. Expert Systems with Applications. 40(15):6047-6054. doi:10.1016/j.eswa.2013.05.027
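As a rough illustration of what "extracting Decision Rules from the structure of a DT" means, the sketch below walks every root-to-leaf path of a tiny hand-built tree and emits one IF-THEN rule per path. It covers only the single-tree case that the abstract describes as limited; it is not the authors' method. The tree, the attribute names (`lighting`, `accident_type`) and the severity labels are hypothetical and are not taken from the Granada dataset.

```python
# Minimal sketch: one decision rule per root-to-leaf path of a single tree.
# A node is either a leaf (a class label string) or a tuple
# (attribute, {value: subtree}). All names and values are hypothetical.

toy_tree = (
    "lighting",
    {
        "daylight": "slight",
        "night": (
            "accident_type",
            {"rollover": "severe", "collision": "slight"},
        ),
    },
)


def extract_rules(tree, conditions=()):
    """Return a list of (conditions, label) pairs, one per leaf."""
    if isinstance(tree, str):
        # Leaf reached: the accumulated path is the rule antecedent.
        return [(list(conditions), tree)]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((attribute, value),)))
    return rules


def format_rule(conditions, label):
    """Render a rule as readable IF-THEN text."""
    body = " AND ".join(f"{attr} = {val}" for attr, val in conditions) or "TRUE"
    return f"IF {body} THEN severity = {label}"


rules = extract_rules(toy_tree)
for conds, label in rules:
    print(format_rule(conds, label))
```

Because a single tree yields exactly one rule per leaf, the number of rules is bounded by the tree's structure, which is the limitation the abstract points to.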
Expert System for Crop Disease based on Graph Pattern Matching: A proposal
For agroindustry, crop diseases are one of the most common problems, causing large economic losses and low production quality. Meanwhile, computer science has produced several tools intended to improve the prevention and treatment of these diseases. Recent research proposes developing expert systems to address this problem, using data mining and artificial intelligence techniques such as rule-based inference, decision trees and Bayesian networks, among others. Furthermore, graphs can be used to store the different types of variables present in a crop environment, allowing the application of graph data mining techniques such as graph pattern matching. In this paper we present an overview of these topics and a proposal for an expert system for crop diseases based on graph pattern matching.
Multi-class protein fold classification using a new ensemble machine learning approach.
Protein structure classification represents an important process in understanding the associations
between sequence and structure as well as possible functional and evolutionary relationships.
Recent structural genomics initiatives and other high-throughput experiments have populated the
biological databases at a rapid pace. The amount of structural data has made traditional methods
such as manual inspection of protein structures impossible. Machine learning has been
widely applied to bioinformatics and has gained a lot of success in this research area. This work
proposes a novel ensemble machine learning method that improves the coverage of the classifiers
under the multi-class imbalanced sample sets by integrating knowledge induced from different base
classifiers, and we illustrate this idea in classifying multi-class SCOP protein fold data. We have
compared our approach with PART and show that our method improves the sensitivity of the
classifier in protein fold classification. Furthermore, we have extended this method to learning over
multiple data types, preserving the independence of their corresponding data sources, and show
that our new approach performs at least as well as the traditional technique over a single joined
data source. These experimental results are encouraging, and can be applied to other bioinformatics
problems similarly characterised by multi-class imbalanced data sets held in multiple data
sources.
Rule-based Machine Learning Methods for Functional Prediction
We describe a machine learning method for predicting the value of a
real-valued function, given the values of multiple input variables. The method
induces solutions from samples in the form of ordered disjunctive normal form
(DNF) decision rules. A central objective of the method and representation is
the induction of compact, easily interpretable solutions. This rule-based
decision model can be extended to search efficiently for similar cases prior to
approximating function values. Experimental results on real-world data
demonstrate that the new techniques are competitive with existing machine
learning and statistical methods and can sometimes yield superior regression
performance.
Comment: See http://www.jair.org/ for any accompanying file
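To make the representation concrete, here is a minimal sketch of how an ordered list of DNF decision rules could be evaluated for a real-valued prediction. The abstract does not say how a rule's output value is assigned, so the sketch assumes each rule carries a constant value and that the first matching rule wins, with a default when none match. The feature names (`rooms`, `age`), thresholds and output values are hypothetical.

```python
# Minimal sketch: evaluating ordered DNF rules for regression.
# A rule is (dnf, value): dnf is a list of conjunctions, and each
# conjunction is a list of predicates over an input dict.
# Assumptions (not from the abstract): constant value per rule,
# first matching rule wins, default value otherwise.

def matches(dnf, x):
    """True if any conjunction has all of its predicates satisfied by x."""
    return any(all(pred(x) for pred in conj) for conj in dnf)


def predict(rules, default, x):
    """Return the value of the first rule whose DNF condition matches x."""
    for dnf, value in rules:
        if matches(dnf, x):
            return value
    return default


# Hypothetical ordered rule list.
rules = [
    # (rooms > 6 AND age < 20) OR (rooms > 8)  ->  45.0
    ([[lambda x: x["rooms"] > 6, lambda x: x["age"] < 20],
      [lambda x: x["rooms"] > 8]], 45.0),
    # (rooms > 4)  ->  25.0
    ([[lambda x: x["rooms"] > 4]], 25.0),
]
DEFAULT_VALUE = 12.0

print(predict(rules, DEFAULT_VALUE, {"rooms": 7, "age": 10}))
print(predict(rules, DEFAULT_VALUE, {"rooms": 5, "age": 50}))
print(predict(rules, DEFAULT_VALUE, {"rooms": 3, "age": 5}))
```

Rule order matters here: an input that satisfies both rules receives the value of the earlier one, which keeps each rule short and readable.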
Mining learning preferences in web-based instruction: Holists vs. Serialists
Web-based instruction programs are used by learners with diverse knowledge, skills and needs. These differences determine their preferences for the design of Web-based instruction programs and ultimately influence learners' success in using them. Cognitive style has been found to significantly affect learners' preferences for Web-based instruction programs. However, the majority of previous studies focus on Field Dependence/Independence. Pask's Holist/Serialist dimension has conceptual links with Field Dependence/Independence but has been left mostly unstudied. Therefore, this study focuses on identifying how this dimension of cognitive style affects learner preferences for Web-based instruction programs. A data mining approach is used to illustrate the difference in preferences between Holists and Serialists. The findings show that there are clear differences in regard to content presentation and navigation support. A set of design features was then produced to help designers incorporate cognitive styles into the development of Web-based instruction programs to ensure that they can accommodate learners' different preferences.
This work is partially funded by the National Science Council, Taiwan, ROC (NSC 98-2511-S-008-012-MY3; NSC 99-2511-S-008-003-MY2; NSC 99-2631-S-008-001).
A survey of induction algorithms for machine learning
Central to all systems for machine learning from examples is an induction algorithm. The purpose of the algorithm is to generalize from a finite set of training examples a description consistent with the examples seen, and, hopefully, with the potentially infinite set of examples not seen. This paper surveys four machine learning induction algorithms. The knowledge representation schemes and a PDL description of algorithm control are emphasized. System characteristics that are peculiar to a domain of application are de-emphasized. Finally, a comparative summary of the learning algorithms is presented.
Characterisation of FAD-family folds using a machine learning approach
Flavin adenine dinucleotide (FAD) and its derivatives play a crucial role in
biological processes. They are major organic cofactors and electron carriers
in both enzymatic activities and biochemical pathways. We have analysed
the relationships between sequence and structure of FAD-containing proteins
using a machine learning approach. Decision trees were generated using the
C4.5 algorithm as a means of automatically generating rules from biological
databases (TOPS, CATH and PDB). These rules were then used as
background knowledge for an ILP system to characterise the four different
classes of FAD-family folds classified in Dym and Eisenberg (2001). These
FAD-family folds are: glutathione reductase (GR), ferredoxin reductase (FR),
p-cresol methylhydroxylase (PCMH) and pyruvate oxidase (PO). Each FAD-family
was characterised by a set of rules. The “knowledge patterns”
generated from this approach are a set of rules containing conserved sequence
motifs, secondary structure sequence elements and folding information.
Every rule was then verified using statistical evaluation on the measured
significance of each rule. We show that this machine learning approach is
capable of learning and discovering interesting patterns from large biological
databases and can generate “knowledge patterns” that characterise the FAD-containing
proteins, and at the same time classify these proteins into four
different families.
Using Decision Trees for Coreference Resolution
This paper describes RESOLVE, a system that uses decision trees to learn how
to classify coreferent phrases in the domain of business joint ventures. An
experiment is presented in which the performance of RESOLVE is compared to the
performance of a manually engineered set of rules for the same task. The
results show that decision trees achieve higher performance than the rules in
two of three evaluation metrics developed for the coreference task. In addition
to achieving better performance than the rules, RESOLVE provides a framework
that facilitates the exploration of the types of knowledge that are useful for
solving the coreference problem.
Comment: 6 pages; LaTeX source; 1 uuencoded compressed EPS file (separate);
uses ijcai95.sty, named.bst, epsf.tex; to appear in Proc. IJCAI '9