Search CORE

3,432 research outputs found

Scalable aggregation predictive analytics: a query-driven machine learning approach

Author: Anagnostopoulos Christos
Savva Fotis
Triantafillou Peter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/12/2017

We introduce a predictive modeling solution that provides high-quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology uses historical queries and their answers to accurately predict the answers to ad-hoc queries. We focus on the widely used set-cardinality (COUNT) aggregation query, as COUNT is a fundamental operator both for internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from previously issued queries, (ii) associate the query space with local linear regression and associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to as the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms to ensure high-quality prediction results. The significance of the contribution lies in that the model (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) superior to that of data-centric approaches. We provide a comprehensive performance evaluation of our model, assessing its sensitivity, scalability, and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark, showing its superior performance compared to Spark’s COUNT method.
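To make the query-driven idea concrete, here is a minimal sketch under stated assumptions, not the authors' model: each past query is encoded as a numeric vector (for example, a hypothetical (center, radius) pair for a range COUNT), the k nearest past queries by Euclidean distance stand in for "similar" queries, and a ridge-regularised local linear regression over those neighbours predicts the new query's COUNT. The names `predict_count`, `_solve`, `k`, and `ridge` are hypothetical; the paper's query-space quantization, similarity definition, and incremental updates are not reproduced here.

```python
import math


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def predict_count(history, query, k=5, ridge=1e-6):
    """Predict COUNT for `query` from the k most similar past queries (hypothetical helper).

    history: list of (query_vector, observed_count) pairs from previously answered queries.
    Fits count ~ w0 + w . q on the neighbours by ridge-regularised least squares.
    """
    neighbours = sorted(history, key=lambda h: math.dist(h[0], query))[:k]
    rows = [[1.0, *q] for q, _ in neighbours]  # design matrix with an intercept column
    ys = [float(y) for _, y in neighbours]
    d = len(rows[0])
    # Normal equations: (X^T X + ridge * I) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    w = _solve(xtx, xty)
    estimate = w[0] + sum(wi * qi for wi, qi in zip(w[1:], query))
    return max(0.0, estimate)  # a COUNT cannot be negative
```

The local fit only uses the neighbourhood of the incoming query, so answers in different regions of the query space can follow different linear trends; the small ridge term keeps the normal equations solvable when neighbours are nearly collinear.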

Crossref

Warwick Research Archives Portal Repository

Enlighten

Strategies and algorithms for clustering large datasets: a review

Author: Béjar Alonso Javier
Publication date: 01/01/2013

The exploratory nature of data analysis and data mining makes clustering one of the most common tasks in these kinds of projects. Increasingly, these projects come from many different application areas, such as biology, text analysis, and signal analysis, that involve ever larger datasets in both the number of examples and the number of attributes. Classical clustering methods like K-means or hierarchical clustering are beginning to reach the limit of their capability to cope with this growth in dataset size. The limitations of these algorithms come either from the need to store all the data in memory or from their computational time complexity. These problems have opened a research area searching for algorithms able to reduce this data overload. Some solutions come from the side of data preprocessing, either by transforming the data to a lower-dimensional manifold that represents its structure or by summarizing the dataset with a smaller subset of examples that represents equivalent information. A different perspective is to modify the classical clustering algorithms, or to derive new ones, able to cluster larger datasets. This perspective relies on many different strategies. Techniques such as sampling, on-line processing, summarization, data distribution, and efficient data structures have been applied to the problem of scaling clustering algorithms. This paper presents a review of different strategies and clustering algorithms that apply these techniques. The aim is to cover the range of methodologies applied for clustering data and how they can be scaled.

Preprint
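As an illustration of the on-line processing strategy mentioned above, here is a minimal sketch of single-pass (MacQueen-style) sequential k-means, chosen by us as a representative technique rather than taken from the review: each arriving point updates only its nearest centroid with a running-mean step, so memory stays proportional to k instead of to the dataset size. The function name `online_kmeans` and its arguments are hypothetical.

```python
import math


def online_kmeans(stream, initial_centroids):
    """Single-pass sequential k-means (hypothetical helper).

    Each point is assigned to its nearest centroid, which then moves to the
    running mean of the points assigned to it so far. Only the centroids and
    their counts are kept in memory.
    """
    centroids = [list(c) for c in initial_centroids]
    counts = [0] * len(centroids)
    for point in stream:
        j = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
        counts[j] += 1
        eta = 1.0 / counts[j]  # step size 1/n turns the update into a running mean
        centroids[j] = [c + eta * (p - c) for c, p in zip(centroids[j], point)]
    return centroids, counts
```

Because the first point assigned to a centroid gets step size 1, each centroid ends up as the exact mean of the points assigned to it, though the assignments themselves depend on the arrival order, which is the usual price of a single pass.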

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Platonic model of mind as an approximation to neurodynamics

Author: Duch Wlodzislaw
Publication venue: Springer, Singapore
Publication date: 01/01/1997

A hierarchy of approximations involved in simplifying microscopic theories, from the sub-cellular to the whole-brain level, is presented. A new approximation to neural dynamics is described, leading to a Platonic-like model of mind based on psychological spaces. Objects and events in these spaces correspond to quasi-stable states of brain dynamics and may be interpreted from a psychological point of view. The Platonic model bridges the gap between the neurosciences and the psychological sciences. Static and dynamic versions of this model are outlined, and Feature Space Mapping, a neurofuzzy realization of the static version of the Platonic model, is described. Categorization experiments with human subjects are analyzed from the neurodynamical and Platonic model points of view.

CiteSeerX

CogPrints Cognitive Sciences Eprint Archive

Interactive image search using similarity-based visualization

Author: Nguyen G.P.
Publication venue: 'Betasciencepress Publishing'
Publication date: 01/01/2006

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Renewing the respect for similarity

Author: Reza Shahbazi
Shimon Edelman
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2012

In psychology, the concept of similarity has traditionally evoked a mixture of respect, stemming from its ubiquity and intuitive appeal, and concern, due to its dependence on the framing of the problem at hand and on its context. We argue for a renewed focus on similarity as an explanatory concept, by surveying established results and new developments in the theory and methods of similarity-preserving associative lookup and dimensionality reduction, which are critical components of many cognitive functions as well as of intelligent data management in computer vision. We focus in particular on the growing family of algorithms that support associative memory by performing hashing that respects local similarity, and on the uses of similarity in representing structured objects and scenes. Insofar as these similarity-based ideas and methods are useful in cognitive modeling and in AI applications, they should be included in the core conceptual toolkit of computational neuroscience. In support of this stance, the present paper (1) offers a discussion of conceptual, mathematical, computational, and empirical aspects of similarity, as applied to the problems of visual object and scene representation, recognition, and interpretation, (2) mentions some key computational problems arising in attempts to put similarity to use, along with their possible solutions, (3) briefly states a previously developed similarity-based framework for visual object representation, the Chorus of Prototypes, along with the empirical support it enjoys, (4) presents new mathematical insights into the effectiveness of this framework, derived from its relationship to locality-sensitive hashing (LSH) and to concomitant statistics, (5) introduces a new model, the Chorus of Relational Descriptors (ChoRD), that extends this framework to scene representation and interpretation, (6) describes its implementation and testing, and finally (7) suggests possible directions in which the present research program can be extended in the future.
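For readers unfamiliar with hashing that respects local similarity, here is a minimal sketch of one textbook LSH family, random-hyperplane (sign) hashing for cosine similarity. It is offered only as an illustration of LSH in general; it is not the Chorus of Prototypes or ChoRD, and the names `make_hyperplane_hash` and `hamming` and the parameters `n_bits` and `seed` are hypothetical.

```python
import random


def make_hyperplane_hash(dim, n_bits, seed=0):
    """Random-hyperplane LSH for cosine similarity (hypothetical helper).

    Each bit records on which side of a random hyperplane (Gaussian normal
    vector) the input falls, so vectors separated by a small angle tend to
    agree on most bits.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def h(v):
        # One bit per hyperplane: 1 if the dot product is non-negative, else 0.
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0.0) for plane in planes)

    return h


def hamming(a, b):
    """Number of bit positions where two hash codes differ."""
    return sum(x != y for x, y in zip(a, b))
```

For this family, the probability that a single bit differs equals the angle between the two vectors divided by pi, so the Hamming distance between codes estimates angular distance; scaling a vector by a positive constant leaves its code unchanged.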

Crossref

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

Non-Convex and Geometric Methods for Tomography and Label Learning

Author: Zisler Matthias
Publication date: 01/01/2020

Data labeling is a fundamental problem of mathematical data analysis in which each data point is assigned exactly one label (prototype) from a finite predefined set. In this thesis, we study two challenging extensions, in which either the input data cannot be observed directly or prototypes are not available beforehand. The main application of the first setting is discrete tomography. We propose several non-convex variational approaches, as well as smooth geometric ones, to joint image label assignment and reconstruction from indirect measurements with known prototypes. In particular, we consider spatial regularization of assignments based on the KL divergence, which takes into account the smooth geometry of discrete probability distributions endowed with the Fisher-Rao (information) metric, i.e., the assignment manifold. Finally, the geometric point of view leads to a smooth flow evolving on a Riemannian submanifold that incorporates the tomographic projection constraints directly into the geometry of assignments. Furthermore, we investigate corresponding implicit numerical schemes, which amount to solving a sequence of convex problems. Likewise, for the second setting, when the prototypes are absent, we introduce and study a smooth dynamical system for unsupervised data labeling that evolves by geometric integration on the assignment manifold. Rigorously abstracting from "data-label" to "data-data" decisions leads to interpretable low-rank data representations, which themselves are parameterized by label assignments. The resulting self-assignment flow simultaneously learns latent prototypes, within the very same framework, while they are used for inference. Moreover, a single parameter, the scale of regularization in terms of spatial context, drives the entire process. By smooth geodesic interpolation between different normalizations of self-assignment matrices on the positive definite matrix manifold, a one-parameter family of self-assignment flows is defined.
Accordingly, the proposed approach can be characterized from different viewpoints, such as discrete optimal transport, normalized spectral cuts, and combinatorial optimization by completely positive factorizations, each with additional built-in spatial regularization.

Heidelberger Dokumentenserver

Similarity-based methods: a general framework for classification, approximation and association

Author: Duch Włodzisław
Publication venue: 'Institute of Systematics and Evolution of Animals, Polish Academy of Sciences'
Publication date: 01/01/2000

Similarity-based methods (SBM) are a generalization of the minimal distance (MD) methods that form the basis of several machine learning and pattern recognition methods. Investigation of similarity leads to a fruitful framework in which many classification, approximation, and association methods are accommodated. The probability p(C|X;M) of assigning class C to a vector X, given a classification model M, depends on the adaptive parameters and procedures used in constructing the model. A systematic overview of the choices available for model building is given, and numerous improvements are suggested. Similarity-based methods have natural neural-network-type realizations. Neural network models such as Radial Basis Functions (RBF) and Multilayer Perceptrons (MLPs) are included in this framework as special cases. An SBM may also include several different submodels and a procedure to combine their results. Many new versions of similarity-based methods are derived from this framework. A search in the space of all methods belonging to the SBM framework finds the particular combination of parameterizations and procedures that is most appropriate for the given data. No single classification method can beat this approach. A preliminary implementation of SBM elements tested on real-world datasets gave very good results.
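The simplest member of the minimal-distance family that SBM generalizes is the k-nearest-neighbour classifier. Here is a minimal sketch, assuming Euclidean distance and estimating p(C|X;M) as the fraction of the k nearest training vectors labelled C; the function names and the parameter `k` are hypothetical, and none of SBM's search over parameterizations, distance functions, or submodel combinations is shown.

```python
import math
from collections import Counter


def knn_class_probabilities(train, x, k=3):
    """Estimate p(C | x; M) as the share of the k nearest training vectors labelled C.

    train: list of (vector, label) pairs. The "model" M here is just the
    training set, the Euclidean distance, and the choice of k.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    # Divide by the number of neighbours actually found, in case k > len(train).
    return {label: n / len(nearest) for label, n in votes.items()}


def classify(train, x, k=3):
    """Return the class with the highest estimated probability."""
    probs = knn_class_probabilities(train, x, k)
    return max(probs, key=probs.get)
```

Swapping the distance function, weighting neighbours by similarity, or replacing the training vectors by a few prototypes yields other members of the same framework, which is the kind of design space the abstract describes searching over.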

CiteSeerX

Biblioteka Nauki - repozytorium artykułów

Repository of Nicolaus Copernicus University

Towards Comprehensive Foundations of Computational Intelligence

Author: A Cichocki
A Gifi
A Gutkin
A Hyvärinen
A Konar
A Newell
A Pouget
A Roy
AM Callataÿ de
B Bakker
B Kégl
B Schölkopf
C Giraud-Carrier
C Jones
C Wendelken
CD Manning
CS Ong
D Michie
D Nauck
D Rousseau
D Wolpert
DL Wang
E Bauer
E Pekalska
E Salinas
E Simoncelli
EM Iyoda
F Corbacho
F Crestani
F Schwenker
FR Bach
G Giacinto
G-B Huang
GA Carpenter
GE Hinton
GRG Lanckriet
GS Cree
H Haas
H Leung
H Lodhi
I Guyon
J-P Vert
JA Anderson
JG Wolff
JH Friedman
JSR Jang
K Grabczewski
K Torkkola
K Tsuda
KP Unnikrishnan
KS Fu
L Goldfarb
L Györfi
L Shastri
LI Kuncheva
M Blachnik
M Grochowski
M Kordos
M Leshno
MJ Kearns
MJD Powell
N Chater
N Jankowski
N Kunstman
NI Achieser
O Chapelle
P Dayan
P Matykiewicz
P Smyth
PH Winston
PM Baggenstoss
R Avnimelech
R Hecht-Nielsen
R Raizada
RE Schapire
RF Thompson
RL Gorsuch
RO Duda
RS Sutton
S Anuj
S Deneve
S Grossberg
S Haykin
S Mitra
S Roweis
SF Walker
SJ Russell
SK Pal
T Bilgiç
T Kohonen
T Poggio
T Wieczorek
TG Dietterich
TJ McCabe
TM Cover
V Kecman
W Duch
W Maass
W Shoujue
Y Bengio
Y Burnod
YH Pao
YJ Lee
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

Although computational intelligence (CI) covers a vast variety of methods, it still lacks an integrative theory. Several proposals for CI foundations are discussed: computing and cognition as compression, meta-learning as search in the space of data models, (dis)similarity-based methods providing a framework for such meta-learning, and a more general approach based on chains of transformations. Many useful transformations that extract information from features are discussed. Heterogeneous adaptive systems are presented as a particular example of transformation-based systems, and the goal of learning is redefined to facilitate the creation of simpler data models. The need to understand data structures leads to techniques for logical and prototype-based rule extraction and to the generation of multiple alternative models, while the need to increase the predictive power of adaptive models leads to committees of competent models. Learning from partial observations is a natural extension towards reasoning based on perceptions, and an approach to intuitive solving of such problems is presented. Throughout the paper, neurocognitive inspirations are frequently used; they are especially important in modeling higher cognitive functions. Promising directions such as liquid and laminar computing are identified, and many open problems are presented.

CiteSeerX

Crossref

Fuzzy rough and evolutionary approaches to instance selection

Author: Verbiest Nele
Publication venue: Ghent University. Faculty of Sciences
Publication date: 01/01/2014

Ghent University Academic Bibliography