Search CORE

146,708 research outputs found

Data mining as a tool for environmental scientists

Author: Athanasiadis Ioannis
Comas Joaquim
Frank Eibe
Gibert Karina
Letcher Rebecca
Spate Jessica
Sànchez-Marrè Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2006
Field of study

Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogenous

Research Commons@Waikato

On the role of pre and post-processing in environmental data mining

Author: Athanasiadis Ioannis
Comas Joaquim
Gibert Karina
Holmes Geoffrey
Izquierdo Joaquin
Sanchez-Marre Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2008
Field of study

The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed

Research Commons@Waikato

Binarized support vector machines

Author: Belén Martín-Barragán
Dolores Romero Morales
Emilio Carrizosa
Publication venue
Publication date
Field of study

The widely used Support Vector Machine (SVM) method has shown to yield very good results in Supervised Classification problems. Other methods such as Classification Trees have become more popular among practitioners than SVM thanks to their interpretability, which is an important issue in Data Mining. In this work, we propose an SVM-based method that automatically detects the most important predictor variables, and the role they play in the classifier. In particular, the proposed method is able to detect those values and intervals which are critical for the classification. The method involves the optimization of a Linear Programming problem, with a large number of decision variables. The numerical experience reported shows that a rather direct use of the standard Column-Generation strategy leads to a classification method which, in terms of classification ability, is competitive against the standard linear SVM and Classification Trees. Moreover, the proposed method is robust, i.e., it is stable in the presence of outliers and invariant to change of scale or measurement units of the predictor variables. When the complexity of the classifier is an important issue, a wrapper feature selection method is applied, yielding simpler, still competitive, classifiers.Supervised classification, Binarization, Column generation, Support vector machines

Research Papers in Economics

Ontology population for open-source intelligence: A GATE-based solution

Author: Ganino Giulio
Lembo Domenico
Mecella Massimo
Scafoglieri Federico
Publication venue: 'Wiley'
Publication date: 01/01/2018
Field of study

Open-Source INTelligence is intelligence based on publicly available sources such as news sites, blogs, forums, etc. The Web is the primary source of information, but once data are crawled, they need to be interpreted and structured. Ontologies may play a crucial role in this process, but because of the vast amount of documents available, automatic mechanisms for their population are needed, starting from the crawled text. This paper presents an approach for the automatic population of predefined ontologies with data extracted from text and discusses the design and realization of a pipeline based on the General Architecture for Text Engineering system, which is interesting for both researchers and practitioners in the field. Some experimental results that are encouraging in terms of extracted correct instances of the ontology are also reported. Furthermore, the paper also describes an alternative approach and provides additional experiments for one of the phases of our pipeline, which requires the use of predefined dictionaries for relevant entities. Through such a variant, the manual workload required in this phase was reduced, still obtaining promising results

Archivio della ricerca- Università di Roma La Sapienza

Binarized support vector machines

Author: Carrizosa Emilio
Martin-Barragan Belen
Romero Morales Dolores
Publication venue
Publication date: 01/11/2007
Field of study

Universidad Carlos III de Madrid e-Archivo

Knowledge-based gene expression classification via matrix factorization

Author: A. M. Tomé
Affymetrix
Allison
Baldi
Barnhill
Bolstad
Breiman
Cardoso
Cardoso
Chen
D. Lutter
Diaz-Uriarte
Diaz-Uriarte
Dougherty
Dougherty
Dudoit
E. W. Lang
F. J. Theis
G. Schmitz
Galton
Galton
Golub
Guyon
Hochreiter
Irrizarry
Lee
Li
Liebermeister
Liu
Lutter
M. Stetter
Mangasarian
P. Gómez Vilda
P. Knollmüller
Pearson
Quackenbush
R. Schachtner
Saidi
Schachtner
Schachtner
Schölkopf
Simon
Spang
Talloen
Troyanskaya
Tusher
Wu
Wu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2008
Field of study

Motivation: Modern machine learning methods based on matrix decomposition techniques, like independent component analysis (ICA) or non-negative matrix factorization (NMF), provide new and efficient analysis tools which are currently explored to analyze gene expression profiles. These exploratory feature extraction techniques yield expression modes (ICA) or metagenes (NMF). These extracted features are considered indicative of underlying regulatory processes. They can as well be applied to the classification of gene expression datasets by grouping samples into different categories for diagnostic purposes or group genes into functional categories for further investigation of related metabolic pathways and regulatory networks. Results: In this study we focus on unsupervised matrix factorization techniques and apply ICA and sparse NMF to microarray datasets. The latter monitor the gene expression levels of human peripheral blood cells during differentiation from monocytes to macrophages. We show that these tools are able to identify relevant signatures in the deduced component matrices and extract informative sets of marker genes from these gene expression profiles. The methods rely on the joint discriminative power of a set of marker genes rather than on single marker genes. With these sets of marker genes, corroborated by leave-one-out or random forest cross-validation, the datasets could easily be classified into related diagnostic categories. The latter correspond to either monocytes versus macrophages or healthy vs Niemann Pick C disease patients.Siemens AG, MunichDFG (Graduate College 638)DAAD (PPP Luso - Alem˜a and PPP Hispano - Alemanas

Crossref

University of Regensburg Publication Server

Repositório Institucional da Universidade de Aveiro

PubMed Central

PuSH

Are there effects of mother's post-16 education on the next generation? Effects on children's development and mothers' parenting [Wider Benefits of Learning Research Report No. 19]

Author: Duckworth Kathryn
Feinstein Leon
Publication venue: Centre for Research on the Wider Benefits of Learning, Institute of Education, University of London
Publication date: 01/06/2006
Field of study

There is an extensive body of research which shows that the children of parents with longer participation in education do better in standard tests of school attainment than those whose parents have had less education. One of the mechanisms put forward for explaining the intergenerational transmission of educational success is parenting. This report adds to a growing body of research from the Centre for Research on the Wider Benefits of Learning on the inter-generational transmission of educational success and issues of parenting skills, behaviours and attitudes. The report seeks to establish whether the strong correlation between mothers' participation in education and both her child's development and her parenting results from a primarily causal relationship, or from selection effects. Using longitudinal data spanning three generations, we find that while mothers' participation in post-compulsory education has some small positive causal effects, much of the apparent relationship between a mother's post-16 educational participation and measures of her children's cognitive ability and her parenting skills is driven by the selection bias – it is largely other factors, such as her aspirations, motivation and prior achievement, which determine her child's attainment and affect her decision to stay on in education. Much of the developmental literature tends towards a causal interpretation of the relationship between parents' education and the development and ability of their children. However, the results of this report suggest that such assumptions should be made with considerable caution. Our findings suggest that simply extending the length of time that women spend in education may do little to directly affect the educational attainment of their children. Rather, it is the ability and aspirations of women which inform their participation in post-16 education, their parenting ability and the attainment of their children. It may be through inter-generational continuities in factors such as these that inequalities in educational success are transmitted through the generations. This suggests that supporting children in learning through early and continued investment in quality education and developmental opportunities is more important in addressing social immobility than simply extending the average length of participation, important though that may be

UCL Discovery

BlogForever D2.6: Data Extraction Methodology

Author: Banos V.
Davis R.
Gkotsis G.
Pincent E.
Stepanyan K.
Publication venue
Publication date: 25/10/2013
Field of study

This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

A hierarchical Mamdani-type fuzzy modelling approach with new training data selection and multi-objective optimisation mechanisms: A special application for the prediction of mechanical properties of alloy steels

Author: Alcala
Bakshi
Bezdek
Chan
Chen
Chen
Chen
Cococcioni
Cordon
De Castro
Delgado
Dieter
Dorigo
Eberhart
Gacto
Glover
Goldberg
Gomez-Skarmeta
Ishibuchi
Ishibuchi
Jain
Jang
Jin
Jin
Johansen
Kennedy
Kwong
Mahdi Mahfouf
Mamdani
Pickering
Qian Zhang
Rojas
Setnes
Setnes
Sugeno
Takagi
Wang
Wang
Wang
Wang
Yen
Yen
Yoshinari
Zadeh
Zadeh
Zadeh
Zhang
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/03/2011
Field of study

In this paper, a systematic data-driven fuzzy modelling methodology is proposed, which allows to construct Mamdani fuzzy models considering both accuracy (precision) and transparency (interpretability) of fuzzy systems. The new methodology employs a fast hierarchical clustering algorithm to generate an initial fuzzy model efficiently; a training data selection mechanism is developed to identify appropriate and efficient data as learning samples; a high-performance Particle Swarm Optimisation (PSO) based multi-objective optimisation mechanism is developed to further improve the fuzzy model in terms of both the structure and the parameters; and a new tolerance analysis method is proposed to derive the confidence bands relating to the final elicited models. This proposed modelling approach is evaluated using two benchmark problems and is shown to outperform other modelling approaches. Furthermore, the proposed approach is successfully applied to complex high-dimensional modelling problems for manufacturing of alloy steels, using ‘real’ industrial data. These problems concern the prediction of the mechanical properties of alloy steels by correlating them with the heat treatment process conditions as well as the weight percentages of the chemical compositions

Crossref

Kent Academic Repository