PRESISTANT: Learning based assistant for data pre-processing
Data pre-processing is one of the most time-consuming and relevant steps in a
data analysis process (e.g., classification task). A given data pre-processing
operator (e.g., transformation) can have positive, negative or zero impact on
the final result of the analysis. Expert users have the required knowledge to
find the right pre-processing operators. Non-experts, however, are overwhelmed
by the number of pre-processing operators and find it challenging to identify
those that would positively impact their analysis (e.g., increase the
predictive accuracy of a classifier). Existing
solutions either assume that users have expert knowledge, or they recommend
pre-processing operators that are only "syntactically" applicable to a dataset,
without taking into account their impact on the final analysis. In this work,
we aim at providing assistance to non-expert users by recommending data
pre-processing operators that are ranked according to their impact on the final
analysis. We developed a tool, PRESISTANT, which uses Random Forests to learn
the impact of pre-processing operators on the performance (e.g., predictive
accuracy) of five classification algorithms, namely J48, Naive Bayes, PART,
Logistic Regression, and Nearest Neighbor. Extensive evaluations of the
recommendations provided by our tool show that PRESISTANT can effectively help
non-experts achieve improved results in their analytical tasks.
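As a rough illustration of the ranking idea described above, the following sketch (not the authors' implementation; the meta-features, operator names, and data are placeholder assumptions) trains a Random Forest to predict the change in accuracy an operator would produce and ranks candidate operators by that prediction:

```python
# Minimal sketch, assuming synthetic meta-features and made-up operator names.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: meta-features of (dataset, operator) pairs and
# the observed change in predictive accuracy after applying the operator.
meta_features = np.random.rand(200, 10)        # stand-in for real meta-features
accuracy_delta = np.random.randn(200) * 0.05   # stand-in for measured impact

impact_model = RandomForestRegressor(n_estimators=100, random_state=0)
impact_model.fit(meta_features, accuracy_delta)

# At recommendation time: score each candidate operator for a new dataset
# and present them ranked by predicted impact on the final analysis.
candidate_operators = ["standardize", "discretize", "log-transform"]
new_dataset_meta = np.random.rand(len(candidate_operators), 10)
predicted_impact = impact_model.predict(new_dataset_meta)
ranking = sorted(zip(candidate_operators, predicted_impact),
                 key=lambda pair: pair[1], reverse=True)
for op, score in ranking:
    print(f"{op}: predicted accuracy change {score:+.3f}")
```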
Knowledge discovery through creating formal contexts
Knowledge discovery is important for systems with computational intelligence,
helping them learn and adapt to changing environments. By representing, in
a formal way, the context in which an intelligent system
operates, it is possible to discover knowledge through an
emerging data technology called Formal Concept Analysis
(FCA). This paper describes a tool called FcaBedrock that
converts data into Formal Contexts for FCA. The paper
describes how, through a process of guided automation,
data preparation techniques such as attribute exclusion and
value restriction allow data to be interpreted to meet the requirements
of the analysis. Creating Formal Contexts using
FcaBedrock is shown to be straightforward and versatile.
Large data sets are easily converted into a standard FCA
format.
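To make the notion of a formal context concrete, here is a small sketch (independent of FcaBedrock; the records, kept attributes, and value restriction are illustrative assumptions) that turns categorical records into the binary objects-by-attributes cross table FCA expects:

```python
# Minimal sketch: build a formal context (binary cross table) from records.
records = [
    {"name": "lion",  "diet": "carnivore", "habitat": "savanna"},
    {"name": "zebra", "diet": "herbivore", "habitat": "savanna"},
    {"name": "owl",   "diet": "carnivore", "habitat": "forest"},
]

# Attribute exclusion / value restriction: keep only the attributes
# relevant to the analysis (here: diet and habitat).
kept_attributes = ["diet", "habitat"]

# Each (attribute, value) pair becomes one column of the formal context.
columns = sorted({(a, r[a]) for r in records for a in kept_attributes})
context = {r["name"]: [r[a] == v for (a, v) in columns] for r in records}

print([f"{a}={v}" for (a, v) in columns])
for obj, row in context.items():
    print(obj, ["X" if x else "." for x in row])
```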
Analysing similarity assessment in feature-vector case representations
Case-Based Reasoning (CBR) is a good technique for solving new problems based on previous experience. The main assumption in CBR is the hypothesis that similar problems should have similar solutions. CBR systems retrieve the most similar cases or experiences among those stored in the case base. Previous solutions given to these most similar past-solved cases can then be adapted to fit new solutions for new cases or problems in a particular domain, instead of deriving them from scratch. Thus, similarity measures are key elements in obtaining reliable similar cases, which will be used to derive solutions for new cases. This paper describes a comparative analysis of several commonly used similarity measures, including a measure previously developed by the authors, and a study of their performance in the CBR retrieval step for feature-vector case representations. The testing has been done using sixteen data sets from the UCI Machine Learning Database Repository, plus two complex environmental databases.
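As a concrete illustration of retrieval with feature-vector cases, the following sketch (an assumed example, not the authors' measure; the case base and solutions are made up) scores stored cases with two common similarity measures and reuses the solution of the best match:

```python
# Minimal sketch, assuming a tiny synthetic case base of feature vectors.
import numpy as np

case_base = np.array([[0.2, 0.8, 1.0],
                      [0.9, 0.1, 0.3],
                      [0.4, 0.7, 0.9]])
solutions = ["solution A", "solution B", "solution C"]
new_case = np.array([0.3, 0.75, 0.95])

# Similarity derived from Euclidean distance.
dist = np.linalg.norm(case_base - new_case, axis=1)
euclidean_sim = 1.0 / (1.0 + dist)

# Cosine similarity as an alternative measure.
cosine_sim = (case_base @ new_case) / (
    np.linalg.norm(case_base, axis=1) * np.linalg.norm(new_case))

# Retrieval step: reuse the solution of the most similar stored case.
best = int(np.argmax(euclidean_sim))
print("retrieved:", solutions[best], euclidean_sim[best], cosine_sim[best])
```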
Conversion of Artificial Recurrent Neural Networks to Spiking Neural Networks for Low-power Neuromorphic Hardware
In recent years, the field of neuromorphic low-power systems, which consume
orders of magnitude less power than conventional hardware, has gained
significant momentum. However, their
wider use is still hindered by the lack of algorithms that can harness the
strengths of such architectures. While neuromorphic adaptations of
representation learning algorithms are now emerging, efficient processing of
temporal sequences or variable-length inputs remains difficult. Recurrent neural
networks (RNN) are widely used in machine learning to solve a variety of
sequence learning tasks. In this work we present a train-and-constrain
methodology that enables the mapping of machine learned (Elman) RNNs on a
substrate of spiking neurons, while being compatible with the capabilities of
current and near-future neuromorphic systems. This "train-and-constrain" method
consists of first training RNNs using backpropagation through time, then
discretizing the weights and finally converting them to spiking RNNs by
matching the responses of artificial neurons with those of the spiking neurons.
We demonstrate our approach by mapping a natural language processing task
(question classification), showing the entire mapping process of the recurrent
layer of the network onto IBM's Neurosynaptic System "TrueNorth", a
spike-based digital neuromorphic hardware architecture. TrueNorth imposes
specific constraints on connectivity, neural and synaptic parameters. To
satisfy these constraints, it was necessary to discretize the synaptic weights
and neural activities to 16 levels, and to limit fan-in to 64 inputs. We find
that short synaptic delays are sufficient to implement the dynamical (temporal)
aspect of the RNN in the question classification task. The hardware-constrained
model achieved 74% accuracy in question classification while using less than
0.025% of the cores on one TrueNorth chip, resulting in an estimated power
consumption of ~17 µW.
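To illustrate the "constrain" half of the pipeline, the sketch below (a simplified assumption, not the published conversion code; the weight values are synthetic) discretizes trained recurrent weights to 16 levels and enforces the 64-input fan-in limit mentioned above:

```python
# Minimal sketch of the constrain step, assuming synthetic trained weights.
import numpy as np

rng = np.random.default_rng(0)
trained_weights = rng.normal(0.0, 0.5, size=(100, 200))  # stand-in Elman RNN weights

# Quantize every weight to the nearest of 16 evenly spaced levels.
levels = 16
w_max = np.abs(trained_weights).max()
grid = np.linspace(-w_max, w_max, levels)
nearest = np.abs(trained_weights[..., None] - grid).argmin(axis=-1)
discrete_weights = grid[nearest]
print("distinct levels after quantization:", np.unique(discrete_weights).size)

# Enforce the fan-in limit: keep only the 64 strongest incoming weights per
# neuron (one row per neuron here) and zero out the rest.
fan_in = 64
for row in discrete_weights:
    order = np.argsort(np.abs(row))   # ascending by magnitude
    row[order[:-fan_in]] = 0.0
print("max fan-in:", int((discrete_weights != 0).sum(axis=1).max()))
```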
Rough sets for predicting the Kuala Lumpur Stock Exchange Composite Index returns
This study aims to prove the usability of the rough set approach in capturing the relationship between technical indicators and the level of the Kuala Lumpur Stock Exchange Composite Index (KLCI) over time. Stock markets are affected by many interrelated economic, political, and even psychological factors. Therefore, it is generally very difficult to predict their movements. There is an extensive literature describing attempts to use artificial intelligence techniques, in particular neural networks and genetic algorithms, for analyzing stock market variations. However, drawbacks are found: neural networks are hard to interpret, and genetic algorithms create large data redundancies. A relatively new approach, rough sets, is suggested for its simple knowledge representation, ability to deal with uncertainties, and lower data redundancy. In this study, a few different discretization algorithms were used during data preprocessing. From the simulations and results produced, the rough set approach can be a promising alternative to existing methods for stock market prediction.
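As an illustration of the preprocessing step, the sketch below (an assumed example, not the study's pipeline; indicator names and values are synthetic) applies equal-frequency discretization, one common discretization algorithm, to technical indicators before a rough-set analysis:

```python
# Minimal sketch, assuming synthetic technical-indicator series.
import numpy as np

rng = np.random.default_rng(1)
indicators = {
    "momentum": rng.normal(0, 1, 500),
    "rsi": rng.uniform(0, 100, 500),
}

def equal_frequency_bins(values, n_bins=3):
    """Assign each value to one of n_bins bins with roughly equal counts."""
    cuts = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, cuts)  # bin labels 0 .. n_bins-1

discretized = {name: equal_frequency_bins(v) for name, v in indicators.items()}
for name, bins in discretized.items():
    print(name, np.bincount(bins))    # roughly equal bin sizes
```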