
Data mining techniques as an alternative to classical multivariate statistical techniques for discrimination and classification

Author: Bolaño Vanina Celeste
Cavero Lorena Verónica
Dieser María Paula
Irribarra María de los Ángeles
Martín María Cristina
Schlaps Erica
Solaro Claudina
Titionik Diamela
Wagner Laura
Publication date: 06/05/2015

This paper briefly describes one of the lines of research being carried out in the Department of Mathematics of the Faculty of Exact and Natural Sciences at the Universidad Nacional de La Pampa on multivariate discrimination and classification methods, and on their sensitivity and reliability when applied to various real or simulated problems. Although the study may focus on certain methods that could be regarded as classical and more statistical in nature, there has undoubtedly been a great increase in recent years in the capacity to generate and collect data. These enormous volumes of data contain a great deal of information that would be difficult, if not impossible, to access with classical methods. Data mining techniques make it possible to analyze these masses of data in search of patterns and predictions that yield useful information. The aim, then, is to compare the classical statistical techniques with data mining techniques on discrimination and classification tasks, establishing their similarities and differences and analyzing the estimates they produce when applied to real or simulated problems. Track: Databases and Data Mining. Red de Universidades con Carreras en Informática (RedUNCI)

Servicio de Difusión de la Creación Intelectual
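
To make the comparison described above concrete, here is a minimal sketch on simulated data. Using Fisher's linear discriminant as the "classical" technique, 1-nearest-neighbour as the "data mining" technique, two-dimensional Gaussian classes, and leave-one-out error estimation are all assumptions for illustration, not the methods or data used by the authors; every function name is hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Classifier = Callable[[Point], int]


def fisher_lda(X: Sequence[Point], y: Sequence[int]) -> Classifier:
    """Two-class Fisher discriminant in 2-D: pooled covariance, equal priors."""
    groups = {k: [p for p, label in zip(X, y) if label == k] for k in (0, 1)}
    means = {k: (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
             for k, g in groups.items()}
    sxx = sxy = syy = 0.0
    for k, g in groups.items():
        mx, my = means[k]
        for px, py in g:
            dx, dy = px - mx, py - my
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(X) - 2  # pooled within-class covariance estimate
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    dmx = means[1][0] - means[0][0]
    dmy = means[1][1] - means[0][1]
    # w = S^{-1} (m1 - m0), using the closed-form inverse of a 2x2 matrix
    wx = (syy * dmx - sxy * dmy) / det
    wy = (sxx * dmy - sxy * dmx) / det
    # Cut-off: projection of the midpoint between the two class means
    c = wx * (means[0][0] + means[1][0]) / 2 + wy * (means[0][1] + means[1][1]) / 2
    return lambda p: 1 if wx * p[0] + wy * p[1] > c else 0


def nearest_neighbour(X: Sequence[Point], y: Sequence[int]) -> Classifier:
    """1-nearest-neighbour rule with squared Euclidean distance."""
    def predict(p: Point) -> int:
        i = min(range(len(X)),
                key=lambda j: (X[j][0] - p[0]) ** 2 + (X[j][1] - p[1]) ** 2)
        return y[i]
    return predict


def loo_error(fit: Callable[[Sequence[Point], Sequence[int]], Classifier],
              X: List[Point], y: List[int]) -> float:
    """Leave-one-out estimate of the misclassification rate."""
    errors = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors += model(X[i]) != y[i]
    return errors / len(X)


def simulate(n: int, shift: float, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Two Gaussian classes with unit variance, separated by `shift` along x."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n):
            X.append((rng.gauss(shift * label, 1.0), rng.gauss(0.0, 1.0)))
            y.append(label)
    return X, y


X, y = simulate(30, 2.0)
print("LDA:", loo_error(fisher_lda, X, y), "1-NN:", loo_error(nearest_neighbour, X, y))
```

Varying the hypothetical `shift` parameter changes how hard the simulated problem is, which shows how the two error estimates respond as the classes overlap more or less.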

Tight Combinatorial Generalization Bounds for Threshold Conjunction Rules

Author: J. Fürnkranz
J.K. Martin
J.R. Quinlan
K.V. Vorontsov
W.W. Cohen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

We propose a combinatorial technique for obtaining tight data-dependent generalization bounds based on a splitting and connectivity graph (SC-graph) of the set of classifiers. We apply this approach to a parametric set of conjunctive rules and propose an algorithm for effective SC-bound computation. Experiments on six data sets from the UCI ML Repository show that the SC-bound helps to learn more reliable rule-based classifiers as compositions of less overfitted rules.

CiteSeerX

Crossref
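
A member of the kind of parametric rule family the abstract refers to can be sketched as a conjunction of threshold conditions. The condition form x_j ≥ θ_j or x_j ≤ θ_j and the names `Condition`, `ConjunctionRule`, and `coverage` are assumptions; the sketch does not compute the SC-bound itself, only the counts of covered positives p and negatives n that rule-learning heuristics commonly work with.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Condition:
    """One threshold condition: x[feature] >= threshold, or <= if not `greater`."""
    feature: int
    threshold: float
    greater: bool = True

    def holds(self, x: Sequence[float]) -> bool:
        value = x[self.feature]
        return value >= self.threshold if self.greater else value <= self.threshold


@dataclass(frozen=True)
class ConjunctionRule:
    """A rule fires only when every condition in the conjunction holds."""
    conditions: Tuple[Condition, ...]

    def covers(self, x: Sequence[float]) -> bool:
        return all(c.holds(x) for c in self.conditions)


def coverage(rule: ConjunctionRule, X: Sequence[Sequence[float]],
             y: Sequence[int], target: int) -> Tuple[int, int]:
    """Count covered objects: (p, n) = (target class, all other classes)."""
    p = n = 0
    for x, label in zip(X, y):
        if rule.covers(x):
            if label == target:
                p += 1
            else:
                n += 1
    return p, n


X = [[1.0, 5.0], [2.0, 3.0], [3.0, 1.0], [0.5, 4.0]]
y = [1, 1, 0, 0]
rule = ConjunctionRule((Condition(0, 1.0), Condition(1, 2.0)))
print(coverage(rule, X, y, target=1))  # (2, 0)
```

Sweeping the thresholds θ_j over the observed feature values generates the finite set of distinct rules over which a combinatorial bound of this kind would be computed.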

Rough set methodology in meta-analysis: a comparative and exploratory analysis

Author: Rupp Thomas

We study the applicability of the pattern recognition methodology "rough set data analysis" (RSDA) to meta-analysis. We summarize the mathematical and statistical background and then apply the theory to a meta-analysis of empirical studies of the deterrent effect introduced by Becker and Ehrlich. The results are compared with those of a previously devised meta-regression analysis. We find that RSDA can be used to discover information overlooked by other methods, to preprocess the data for further study, and to strengthen results previously found by other methods. Keywords: Rough Data Set, RSDA, Meta Analysis, Data Mining, Pattern Recognition, Deterrence, Criminometrics

Research Papers in Economics
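
The core rough set notions behind RSDA can be sketched briefly: objects with identical attribute values are indiscernible, and a target set is bracketed by a lower approximation (blocks certainly inside it) and an upper approximation (blocks that possibly belong to it). This follows Pawlak's standard definitions and is not the paper's full procedure; the study table, attribute names, and target set below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set, Tuple

Table = Dict[str, Dict[str, Hashable]]  # object id -> {attribute: value}


def indiscernibility_classes(table: Table, attributes: Sequence[str]) -> List[FrozenSet[str]]:
    """Group objects whose values agree on every attribute in `attributes`."""
    classes: Dict[tuple, Set[str]] = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return [frozenset(c) for c in classes.values()]


def approximations(table: Table, attributes: Sequence[str],
                   target: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return the (lower, upper) approximations of `target`."""
    lower: Set[str] = set()
    upper: Set[str] = set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:  # block lies entirely inside the target set
            lower |= block
        if block & target:   # block overlaps the target set
            upper |= block
    return lower, upper


def accuracy(lower: Set[str], upper: Set[str]) -> float:
    """Accuracy of approximation |lower| / |upper| (1.0 for an empty upper set)."""
    return len(lower) / len(upper) if upper else 1.0


# Hypothetical studies described by two condition attributes
studies: Table = {
    "s1": {"method": "ols", "data": "panel"},
    "s2": {"method": "ols", "data": "panel"},
    "s3": {"method": "iv", "data": "cross"},
    "s4": {"method": "iv", "data": "panel"},
}
finds_effect = {"s1", "s3"}  # hypothetical: studies reporting a deterrent effect
lower, upper = approximations(studies, ["method", "data"], finds_effect)
print(sorted(lower), sorted(upper))  # ['s3'] ['s1', 's2', 's3']
```

Here s1 and s2 are indiscernible but disagree on the outcome, so they fall in the boundary region (upper minus lower); that boundary is where such an analysis would flag the attributes as insufficient to explain the result.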

Combinatorial probability and the tightness of generalization bounds

Author: A. A. Ivakhnenko
D. A. Kochedykov
G. S. Lbov
J. K. Martin
J. Quinlan
K. V. Vorontsov
L. N. Bol’shev
M. Marchand
R. L. Rivest
V. N. Vapnik
V. Vapnik
W. W. Cohen
Yu. K. Belyaev
Publication venue: Pleiades Publishing Ltd

Crossref

An exact probability metric for decision tree splitting and stopping

Author: Martin J. Kent
Publication venue: eScholarship, University of California
Publication date: 18/05/1995

ID3's information gain heuristic is well known to be biased towards multi-valued attributes. This bias is only partially compensated by the gain ratio used in C4.5. Several other alternatives have been proposed and are examined here (distance, orthogonality, a Beta function, and two chi-squared tests). Gain ratio and orthogonality are strongly correlated, and all of these metrics are biased towards splits with one or more small expected values, under circumstances where the split likely occurred by chance. Both classical and Bayesian statistics lead to the multiple hypergeometric distribution as the exact posterior probability of the null hypothesis. Both gain and the chi-squared tests are shown to arise as asymptotic approximations to the hypergeometric, revealing similar criteria for admissibility and showing the nature of their biases. Previous failures to find admissible stopping rules in CART and ID3 are traced to coupling these biased approximations with one another or with arbitrary thresholds, problems that the hypergeometric overcomes. Empirical results show that hypergeometric pre-pruning is worth doing: trees pruned in this way are more practical, simpler, more efficient, and generally no less accurate than unpruned or post-pruned trees.

eScholarship - University of California
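
The quantities the abstract compares can all be computed from a split's contingency table (rows are branches, columns are classes). This is a minimal sketch under stated assumptions: `hypergeometric_probability` returns only the point probability of the observed table given its row and column totals under the null hypothesis of no association, while the exact metric in the paper (for example a cumulative form) may differ; all function names are hypothetical.

```python
import math
from typing import List, Sequence

Table = List[List[int]]  # rows: branches of the split, columns: classes


def entropy(counts: Sequence[int]) -> float:
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(table: Table) -> float:
    """ID3 gain: class entropy minus the weighted entropy of each branch."""
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    remainder = sum(sum(row) / n * entropy(row) for row in table if sum(row) > 0)
    return entropy(col_totals) - remainder


def gain_ratio(table: Table) -> float:
    """C4.5 gain ratio: gain divided by the entropy of the branch sizes."""
    split_info = entropy([sum(row) for row in table])
    return information_gain(table) / split_info if split_info > 0 else 0.0


def hypergeometric_probability(table: Table) -> float:
    """P(table | margins) = prod(R_i!) prod(C_j!) / (N! prod(n_ij!)), via lgamma."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    log_p = (sum(math.lgamma(r + 1) for r in row_totals)
             + sum(math.lgamma(c + 1) for c in col_totals)
             - math.lgamma(n + 1)
             - sum(math.lgamma(x + 1) for row in table for x in row))
    return math.exp(log_p)


two_way = [[4, 0], [0, 4]]
eight_way = [[1, 0]] * 4 + [[0, 1]] * 4  # one example per branch
print(information_gain(two_way), gain_ratio(two_way))  # 1.0 1.0
print(information_gain(eight_way), round(gain_ratio(eight_way), 4))  # 1.0 0.3333
print(round(hypergeometric_probability(two_way), 4))  # 0.0143
```

Both toy splits separate the classes perfectly and earn the same information gain of 1 bit, although the eight-way split merely memorizes the examples; gain ratio lowers it to 1/3, which illustrates the partial compensation for multi-valued attributes that the abstract attributes to C4.5.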

Implementation of decision trees for embedded systems

Author: Bashar Badr
Publication date: 01/01/2014

This research develops real-time incremental-learning decision tree solutions suitable for real-time embedded systems, by virtue of having both a defined memory requirement and an upper bound on the computation time per training vector. In addition, the work provides embedded systems with the capability of rapidly processing and training on streamed-data problems, and adopts electronic hardware solutions to improve the performance of the developed algorithm. Two novel decision tree approaches, the Multi-Dimensional Frequency Table (MDFT) and the Hashed Frequency Table Decision Tree (HFTDT), form the core of this research. Both methods successfully incorporate a frequency table technique to produce a complete decision tree. The MDFT and HFTDT learning methods were designed to generate application-specific code for both training and classification, according to the requirements of the targeted application. The MDFT allows the memory architecture to be specified statically before learning, which then takes place within a deterministic execution time. The HFTDT method extends the MDFT, reducing the memory requirements while keeping a deterministic execution time. The HFTDT achieved low memory usage compared with existing decision tree methods, and hardware acceleration improved execution-time performance by up to 10 times.

Loughborough University Institutional Repository
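
The frequency-table idea can be illustrated with a small, hypothetical sketch: counts for every (attribute, value, class) triple are preallocated, so memory is fixed before learning starts and each streamed training vector costs one increment per attribute. This is not the thesis's MDFT or HFTDT (there is no hashing, no code generation, and no hardware acceleration); it only picks the root split by information gain, and attributes are assumed to be pre-discretized into small integers.

```python
import math
from typing import List, Sequence


def entropy(counts: Sequence[int]) -> float:
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


class FrequencyTable:
    """Counts indexed [attribute][value][class], allocated once up front.

    Attribute values are assumed to be integers in 0..n_values-1.
    """

    def __init__(self, n_attributes: int, n_values: int, n_classes: int) -> None:
        self.counts: List[List[List[int]]] = [
            [[0] * n_classes for _ in range(n_values)] for _ in range(n_attributes)
        ]
        self.class_counts = [0] * n_classes

    def update(self, x: Sequence[int], label: int) -> None:
        # One increment per attribute: work per training vector is bounded by
        # the number of attributes, and no memory is allocated here.
        self.class_counts[label] += 1
        for attribute, value in enumerate(x):
            self.counts[attribute][value][label] += 1

    def information_gain(self, attribute: int) -> float:
        n = sum(self.class_counts)
        remainder = sum(sum(c) / n * entropy(c)
                        for c in self.counts[attribute] if sum(c) > 0)
        return entropy(self.class_counts) - remainder

    def best_attribute(self) -> int:
        return max(range(len(self.counts)), key=self.information_gain)


table = FrequencyTable(n_attributes=2, n_values=2, n_classes=2)
for x, label in [([0, 0], 0), ([0, 1], 0), ([1, 0], 1), ([1, 1], 1)]:
    table.update(x, label)
print(table.best_attribute())  # 0: attribute 0 separates the classes perfectly
```

One way to grow a full tree in this style would be to keep such a table at each leaf and split a leaf once its counts justify it; whether the thesis does exactly this is not stated in the abstract.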

Flood hazard hydrology: interdisciplinary geospatial preparedness and policy

Author: Petty Timothy R.
Publication date: 01/05/2017

Thesis (Ph.D.), University of Alaska Fairbanks, 2017. Floods rank as the deadliest and most frequently occurring natural hazard worldwide, and in 2013 floods in the United States ranked second only to wind storms in loss of life and property damage. While flood disasters remain difficult to predict accurately, more precise forecasts and a better understanding of the frequency, magnitude, and timing of floods can help reduce the loss of life and the costs associated with flood events. There is a common perception that 1) local-to-national-level decision makers do not have the accurate, reliable, and actionable data and knowledge they need to make informed flood-related decisions, and 2) because of science-policy disconnects, critical flood and scientific analyses and insights are failing to influence policymakers in national water resource and flood-related decisions that have significant local impact. This dissertation explores these perceived information gaps and disconnects and asks whether flood data can be accurately generated, transformed into useful actionable knowledge for local flood event decision makers, and then effectively communicated to influence policy. Using an interdisciplinary mixed-methods research design, this thesis develops a methodological framework and interpretive lens for each of three distinct stages of flood-related information interaction: 1) data generation, using machine learning to estimate streamflow flood data for forecasting and response; 2) knowledge development and sharing, creating a geoanalytic visualization decision support system for flood events; and 3) knowledge actualization, using heuristic toolsets to translate scientific knowledge into policy action.
Each stage is developed in a separate research paper, incorporated as a chapter of this dissertation, focusing on practical data and methodologies that are useful to scientists, local flood event decision makers, and policymakers. The data and analytical results of this research indicate that, if certain conditions are met, it is possible to provide local decision makers and policymakers with the actionable knowledge they need to make timely and informed decisions.

ScholarWorks@UA

Speech technologies for the audiovisual and multimedia interaction environments

Author: Alvarez Muniain Aitor
Publication date: 22/07/2016

361 p

Archivo Digital para la Docencia y la Investigación