1,213 research outputs found

One-to-many data transformations through data mappers

Author: Abiteboul
Aho
Antónia Lopes
Atzeni
Chaudhuri
Codd
Davidson
Fagin
Galhardas
Graefe
Helena Galhardas
Hellerstein
Jaeschke
João Pereira
Klug
Miller
Miller
Paredaens
Paulo Carreira
Shu
Shu
Silberschatz
Suciu
Thomas
van den Bercken
Publication venue: 'Elsevier BV'

Feature selection in high-dimensional dataset using MapReduce

Author: Bontempi Gianluca
Borgne Yann-Aël Le
Reggiani Claudio
Publication date: 07/09/2017

This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.

arXiv.org e-Print Archive

DI-fusion

Spatial and verbal routes to number comparison in young children

Author: Lucangeli Daniela
Sella Francesco
Zorzi Marco
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2018

The ability to compare the numerical magnitude of symbolic numbers represents a milestone in the development of numerical skills. However, it remains unclear how basic numerical abilities contribute to the understanding of symbolic magnitude and whether the impact of these abilities may vary when symbolic numbers are presented as number words (e.g., “six vs. eight”) vs. Arabic numbers (e.g., 6 vs. 8). In the present study on preschool children, we show that comparison of number words is related to cardinality knowledge whereas the comparison of Arabic digits is related to both cardinality knowledge and the ability to spatially map numbers. We conclude that comparison of symbolic numbers in preschool children relies on multiple numerical skills and representations, which can be differentially weighted depending on the presentation format. In particular, the spatial arrangement of digits on the number line seems to scaffold the development of a “spatial route” to understanding the exact magnitude of numerals.

Directory of Open Access Journals

Archivio istituzionale della ricerca - Università di Padova

On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

Author: Bustince Humberto
Elkano Mikel
Galar Mikel
Uriz Mikel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/02/2019

We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform. Since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite the aforementioned transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves. Comment: Appeared in 2018 IEEE International Congress on Big Data (BigData Congress). arXiv admin note: text overlap with arXiv:1902.0935
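
The two-step process in this abstract can be illustrated with a small single-machine sketch. It is not the authors' distributed implementation: the function names (`quantile_cuts`, `approx_cdf`, `approx_quantile`, `memberships`), the nearest-rank choice of cut points, and the piecewise-linear interpolation between them are all assumptions made here for illustration.

```python
from bisect import bisect_right


def quantile_cuts(values, q):
    """Return q+1 cut points (minimum, interior q-quantiles, maximum).

    Assumption: cut points are picked by rank from the sorted training values.
    """
    s = sorted(values)
    n = len(s)
    return [s[i * (n - 1) // q] for i in range(q + 1)]


def approx_cdf(x, cuts):
    """Step 1: approximate the CDF by linear interpolation between cut points."""
    q = len(cuts) - 1
    if x <= cuts[0]:
        return 0.0
    if x >= cuts[-1]:
        return 1.0
    j = bisect_right(cuts, x) - 1  # cuts[j] <= x < cuts[j + 1]
    lo, hi = cuts[j], cuts[j + 1]
    return (j + (x - lo) / (hi - lo)) / q


def approx_quantile(u, cuts):
    """Inverse of approx_cdf: map a point of [0, 1] back to the original space."""
    q = len(cuts) - 1
    u = min(max(u, 0.0), 1.0)
    j = min(int(u * q), q - 1)
    frac = u * q - j
    return cuts[j] + frac * (cuts[j + 1] - cuts[j])


def memberships(u, k):
    """Step 2: k equally spaced triangular fuzzy sets on [0, 1].

    Neighbouring triangles overlap so that the degrees always sum to 1,
    which is the defining property of a Ruspini strong partition.
    """
    width = 1.0 / (k - 1)
    return [max(0.0, 1.0 - abs(u - i * width) / width) for i in range(k)]


if __name__ == "__main__":
    training = [x * x for x in range(101)]  # deliberately skewed toy data
    cuts = quantile_cuts(training, q=4)
    u = approx_cdf(900, cuts)               # transformed value in [0, 1]
    print(u, memberships(u, k=3))
    # Peaks of the fuzzy sets, expressed in the original attribute space.
    print([approx_quantile(i / 2, cuts) for i in range(3)])
```

Because the partition is built in the transformed space, the fuzzy sets are evenly spaced there, while their peaks in the original space follow the data's quantiles.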

arXiv.org e-Print Archive

Crossref

The Family of MapReduce and Large Scale Data Processing Systems

Author: Anna Liu
Ayman G. Fayoumi
King Abdulaziz
Sherif Sakr
South Wales
Publication date: 12/02/2013

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey for a family of approaches and mechanisms of large scale data processing mechanisms that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of introduced systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions. Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author

arXiv.org e-Print Archive

CiteSeerX

Circuit Transformations for Quantum Architectures

Author: Childs Andrew M.
Schoute Eddie
Unsal Cem M.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 14th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2019)
Publication date: 01/01/2019

Quantum computer architectures impose restrictions on qubit interactions. We propose efficient circuit transformations that modify a given quantum circuit to fit an architecture, allowing for any initial and final mapping of circuit qubits to architecture qubits. To achieve this, we first consider the qubit movement subproblem and use the ROUTING VIA MATCHINGS framework to prove tighter bounds on parallel routing. In practice, we only need to perform partial permutations, so we generalize ROUTING VIA MATCHINGS to that setting. We give new routing procedures for common architecture graphs and for the generalized hierarchical product of graphs, which produces subgraphs of the Cartesian product. Secondly, for serial routing, we consider the TOKEN SWAPPING framework and extend a 4-approximation algorithm for general graphs to support partial permutations. We apply these routing procedures to give several circuit transformations, using various heuristic qubit placement subroutines. We implement these transformations in software and compare their performance for large quantum circuits on grid and modular architectures, identifying strategies that work well in practice.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server