18 research outputs found

    Rule-based Machine Learning Methods for Functional Prediction

    Full text link
    We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases before approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance.
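
    The abstract stops short of showing the representation, so the sketch below illustrates an ordered rule list for regression: the first rule whose conditions all hold supplies the predicted value. The rules, attribute names, and default value are hypothetical; this shows only the representation, not the paper's induction algorithm.

```python
# Minimal sketch of an ordered rule list for regression (hypothetical rules).
# Each rule is a conjunction of attribute tests; the first matching rule
# supplies the prediction, mirroring an ordered-DNF decision list.

def make_rule(conditions, value):
    """conditions: list of (attribute, op, threshold); value: predicted output."""
    ops = {"<=": lambda a, b: a <= b, ">": lambda a, b: a > b}
    def matches(x):
        return all(ops[op](x[attr], thr) for attr, op, thr in conditions)
    return matches, value

# Hypothetical rule list, as might be induced from housing-style data.
rules = [
    make_rule([("rooms", ">", 6.0), ("age", "<=", 20.0)], 31.5),
    make_rule([("rooms", ">", 6.0)], 24.0),
    make_rule([("crime", "<=", 0.1)], 21.0),
]
DEFAULT = 18.7  # fallback, e.g. the mean of the training targets

def predict(x):
    for matches, value in rules:
        if matches(x):
            return value  # first matching rule wins
    return DEFAULT

print(predict({"rooms": 6.5, "age": 15.0, "crime": 0.3}))  # -> 31.5
```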

    Regression on feature projections

    Get PDF
    This paper describes a machine learning method, called Regression on Feature Projections (RFP), for predicting a real-valued target feature, given the values of multiple predictive features. In RFP, training consists simply of storing the projections of the training instances on each feature separately. The target value for a query point is predicted through two averaging procedures executed sequentially. The first averaging step finds the individual prediction of each feature by using the K-Nearest Neighbor (KNN) algorithm. The second averaging step combines the predictions of all features. During the first step, each feature is assigned a weight that reflects its predictive ability at the local query point. The weights, found for each local query point, are used in the second step and give the method an adaptive, context-sensitive character. We have compared RFP with KNN and rule-based regression algorithms. Results on real data sets show that RFP achieves better or comparable accuracy and is faster than both.
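
    As a rough illustration of the two sequential averaging steps described above, here is a minimal sketch of RFP-style prediction. The inverse-distance local weighting is an assumption standing in for the paper's weighting scheme, which the abstract does not specify.

```python
import numpy as np

# Sketch of Regression on Feature Projections (RFP) as described above:
# training stores each feature's projection of the instances; prediction
# averages per-feature KNN estimates, then combines them with local
# per-feature weights. The inverse-mean-distance weight is a stand-in.

def rfp_predict(X_train, y_train, x_query, k=3):
    preds, weights = [], []
    for f in range(X_train.shape[1]):
        dist = np.abs(X_train[:, f] - x_query[f])  # 1-D distances on feature f
        nn = np.argsort(dist)[:k]                  # K nearest on this projection
        preds.append(y_train[nn].mean())           # first averaging step
        weights.append(1.0 / (dist[nn].mean() + 1e-9))  # local weight (assumed)
    preds, weights = np.array(preds), np.array(weights)
    return float((weights * preds).sum() / weights.sum())  # second step

# Toy usage
X = np.array([[1.0, 10.0], [2.0, 8.0], [3.0, 6.0], [4.0, 4.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
print(rfp_predict(X, y, np.array([2.5, 7.0])))
```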

    An overview of regression techniques for knowledge discovery

    Get PDF
    Predicting or learning numeric features is called regression in the statistical literature, and it is the subject of research in both machine learning and statistics. This paper reviews the important techniques and algorithms for regression developed by both communities. Regression matters for many applications, since many real-life problems can be modeled as regression problems. The review covers Locally Weighted Regression (LWR), rule-based regression, Projection Pursuit Regression (PPR), instance-based regression, Multivariate Adaptive Regression Splines (MARS), and recursive partitioning methods that induce regression trees (CART, RETIS, and M5).
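
    To make one of the reviewed techniques concrete, here is a minimal sketch of Locally Weighted Regression: a weighted least-squares fit around the query point, with Gaussian kernel weights. The kernel form and bandwidth are illustrative choices, not details taken from the review.

```python
import numpy as np

# Minimal sketch of Locally Weighted Regression (LWR): fit a weighted
# least-squares model around each query point, weighting training points
# by a Gaussian kernel of their distance to the query. tau is the bandwidth.

def lwr_predict(X, y, x_query, tau=1.0):
    Xb = np.c_[np.ones(len(X)), X]        # design matrix with intercept
    xq = np.r_[1.0, x_query]
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)  # weighted normal equations
    return float(xq @ theta)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 0.9, 2.1, 2.9])
print(lwr_predict(X, y, np.array([1.5])))  # close to 1.5
```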

    Reconstruction of Three-Dimensional Objects Based on Fuzzy Systems from Points Acquired by Scanner

    Get PDF
    This master's thesis addresses the problem of reconstructing three-dimensional objects with fuzzy rule-based systems, starting from unorganized point clouds acquired by a scanner. To handle scans composed of millions of points, it was necessary to introduce the concept of fuzzy inference with truncation, which considerably reduces both memory usage and computation time.
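
    The abstract names fuzzy inference with truncation as the device that makes million-point scans tractable. The sketch below shows one plausible reading, assuming truncation means skipping rules whose firing strength falls below a cutoff; the Gaussian memberships and cutoff value are assumptions, not the thesis's actual design.

```python
import numpy as np

# Sketch of zero-order Takagi-Sugeno fuzzy inference with truncation:
# rules whose firing strength falls below a cutoff are skipped, so only
# the few rules near the input contribute to the output (and to the cost).

def fuzzy_infer(x, centers, sigmas, outputs, cutoff=1e-3):
    # Firing strength of each rule: product of Gaussian memberships.
    strength = np.exp(-np.sum(((x - centers) / sigmas) ** 2, axis=1))
    keep = strength > cutoff          # truncation: drop weakly firing rules
    if not keep.any():
        return float(outputs.mean())  # fallback when no rule fires
    s = strength[keep]
    return float((s * outputs[keep]).sum() / s.sum())

# Toy usage: three rules over a 2-D input
centers = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
sigmas = np.full((3, 2), 0.5)
outputs = np.array([0.0, 1.0, 5.0])
print(fuzzy_infer(np.array([0.9, 1.1]), centers, sigmas, outputs))  # ~1.0
```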

    Document analysis by means of data mining techniques

    Get PDF
    The huge amount of textual data produced every day by scientists, journalists, and Web users makes it possible to investigate many different aspects of the information stored in published documents. Data mining and information retrieval techniques are exploited to manage and extract information from huge amounts of unstructured textual data. Text mining, also known as text data mining, is the process of extracting high-quality information (in terms of relevance, novelty, and interestingness) from text by identifying patterns. It typically involves structuring the input text by means of parsing and other linguistic analysis, or sometimes by removing extraneous data, and then finding patterns in the structured data. The patterns are then evaluated, and the output is interpreted to accomplish the desired task. Recently, text mining has received attention in several fields, such as security (analysis of Internet news), commerce (search and indexing), and academia (query answering). Beyond retrieving the documents that contain the words of a user query, text mining can provide direct answers through semantic, content-based analysis that considers the meaning of the content and its context. It can also act as an intelligence analyst and is used in some email spam filters to screen out unwanted material. Text mining usually includes tasks such as clustering, categorization, sentiment analysis, entity recognition, entity relation modeling, and document summarization. In particular, summarization approaches are suitable for identifying the relevant sentences that describe the main concepts presented in a document collection. Furthermore, the knowledge contained in the most informative sentences can be employed to improve the understanding of user and/or community interests. Different approaches have been proposed to extract summaries from unstructured text documents. Some of them are based on the statistical analysis of linguistic features by means of supervised machine learning or data mining methods, such as hidden Markov models, neural networks, and Naive Bayes methods. An appealing research field is the extraction of summaries tailored to the major user interests. In this context, the problem of extracting useful information according to domain knowledge related to the user interests is a challenging task. The main topics have been the study and design of novel data representations and data mining algorithms useful for managing and extracting knowledge from unstructured documents. This thesis describes an effort to apply data mining approaches firmly established for transactional data (e.g., frequent itemset mining) to textual documents. Frequent itemset mining is a widely used exploratory technique for discovering hidden correlations that frequently occur in the source data. Although its application to transactional data is well established, the use of frequent itemsets in textual document summarization had never been investigated before. This work exploits frequent itemsets for multi-document summarization: a novel multi-document summarizer, named ItemSum (Itemset-based Summarizer), is presented, based on an itemset-based model, i.e., a framework comprising frequent itemsets extracted from the document collection.

Highly representative, non-redundant sentences are selected for the summary by considering both sentence coverage, with respect to a sentence relevance score based on tf-idf statistics, and a concise, highly informative itemset-based model. To evaluate ItemSum, a suite of experiments on a collection of news articles was performed. The results show that ItemSum significantly outperforms widely used previous summarizers in terms of precision, recall, and F-measure. We also validated our approach against a large number of approaches on the DUC’04 document collection; performance comparisons, in terms of precision, recall, and F-measure, were carried out by means of the ROUGE toolkit. In most cases, ItemSum significantly outperforms the considered competitors. Furthermore, the impact of the main algorithm parameters and of the adopted model coverage strategy on summarization performance is investigated as well. In some cases, the soundness and readability of the generated summaries are unsatisfactory, because the summaries do not effectively cover all the semantically relevant data facets. A step towards the generation of more accurate summaries has been made with semantics-based summarizers. Such approaches combine general-purpose summarization strategies with ad-hoc linguistic analysis. The key idea is to also consider the semantics behind the document content, to overcome the limitations of general-purpose strategies in differentiating between sentences based on their actual meaning and context. Most previously proposed approaches perform the semantics-based analysis as a preprocessing step that precedes the main summarization process, so the generated summaries may not entirely reflect the actual meaning and context of the key document sentences. In contrast, we aim at tightly integrating ontology-based document analysis into the summarization process, in order to take the semantic meaning of the document content into account during sentence evaluation and selection. With this in mind, we propose a new multi-document summarizer, namely the Yago-based Summarizer, which integrates an established ontology-based entity recognition and disambiguation step. Named Entity Recognition (NER) against the Yago ontology is used for the summarization task: NER marks occurrences of specific objects mentioned in the text and classifies the mentions into a set of predefined categories, such as “person”, “location”, “geo-political organization”, “facility”, “organization”, and “time”. The use of NER in text summarization improves the process by raising the rank of informative sentences. To demonstrate the effectiveness of the proposed approach, we compared its performance on the DUC’04 benchmark document collections with that of a large number of state-of-the-art summarizers. Furthermore, we performed a qualitative evaluation of the soundness and readability of the generated summaries, along with a comparison against the results produced by the most effective summarizers. A parallel effort has been devoted to integrating semantics-based models and knowledge acquired from social networks into a document summarization model named SociONewSum.

This effort addresses the sentence-based generic multi-document summarization problem, which can be formulated as follows: given a collection of news articles on the same topic, extract a concise yet informative summary consisting of the most salient document sentences. An established ontological model is used to improve summarization performance by integrating a textual entity recognition and disambiguation step. Furthermore, the analysis of user-generated content from Twitter is exploited to discover current social trends and improve the appeal of the generated summaries. An experimental evaluation of SociONewSum was conducted on real English-language news article collections and Twitter posts. The achieved results demonstrate the effectiveness of the proposed summarizer, in terms of different ROUGE scores, compared with state-of-the-art open-source summarizers as well as with a baseline version of SociONewSum that does not perform any UGC analysis. The readability of the generated summaries has also been analyzed
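
To make the itemset-based selection idea concrete, here is a minimal sketch in the spirit of ItemSum: each sentence is treated as a transaction of terms, frequent term pairs are mined, and sentences are greedily selected to cover as many uncovered itemsets as possible. The support threshold, pair-only itemsets, and greedy scheme are simplifications; the actual summarizer also weighs sentences by a tf-idf relevance score.

```python
from itertools import combinations
from collections import Counter

# Sketch of itemset-based multi-document summarization in the spirit of
# ItemSum: mine frequent term pairs across sentences, then greedily pick
# sentences that cover the most not-yet-covered frequent itemsets.

def summarize(sentences, min_support=2, budget=2):
    transactions = [set(s.lower().split()) for s in sentences]
    counts = Counter(pair for t in transactions
                     for pair in combinations(sorted(t), 2))
    frequent = {p for p, c in counts.items() if c >= min_support}
    covers = [{p for p in frequent if set(p) <= t} for t in transactions]
    covered, chosen = set(), []
    for _ in range(budget):
        # Greedy step: maximize newly covered frequent itemsets.
        best = max((i for i in range(len(sentences)) if i not in chosen),
                   key=lambda i: len(covers[i] - covered))
        chosen.append(best)
        covered |= covers[best]
    return [sentences[i] for i in sorted(chosen)]

docs = ["the model mines frequent itemsets",
        "frequent itemsets summarize the collection",
        "the weather was pleasant today",
        "itemsets from the collection rank sentences"]
print(summarize(docs))
```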

    An Investigation into the Data Collection Process for the Development of Cost Models

    Get PDF
    This thesis is the result of many years of research in the field of manufacturing cost modelling. It focuses in particular on the data collection process for the development of manufacturing cost models in the UK aerospace industry, with no less important contributions from other areas such as construction, process engineering, and software development. The importance of adopting an effective model development process is discussed, and a new Cost Model Development (CMD) Methodology is proposed. Little previous research has considered the development of the cost model from the point of view of a standard and systematic methodology, which is essential if an optimum process is to be achieved. A Model Scoping Framework, a functional Data Source and Data Collection Library, and a referential Data Type Library are the core elements of the proposed Cost Model Development Methodology. The research identified a number of individual data collection methods, along with a comprehensive list of data sources and data types, from which the essential data for developing cost models can be collected. A taxonomy based on sets of generic characteristics for describing the individual data collection methods, data sources, and data types was developed, and the methods, tools, and techniques were categorised according to these characteristics; this provides information for selecting between alternative methods, tools, and techniques. The need to perform frequent iterations of data collection, data identification, data analysis, and decision-making tasks until an acceptable cost model has been developed is an inherent feature of the CMDP. The proposed model scoping framework is expected to assist cost engineering and estimating practitioners in defining the features and activities of the process and the attributes of the product for which a cost model is required, and in identifying the cost model characteristics before the tasks of data identification and collection start. It offers a structured way of looking at the relationship between data sources, cost model characteristics, and data collection tools and procedures. The aim was to make the planning process for developing cost models more effective and efficient, and consequently to reduce the time needed to generate cost models

    Personality within the framework of a theory of human behavior

    Full text link
    This work offers an alternative to personality studies conducted from the perspective of trait psychology, proposing an approach that frames the psychology of personality within a theory of behavior. Its aim is to interpret the existing knowledge about the personality of individuals on the basis of the postulates of behavioral psychology