    Some stylized facts of the Bitcoin market

    In recent years, a new type of tradable asset has appeared, generically known as cryptocurrencies. Among them, the most widespread is Bitcoin. Given its novelty, this paper investigates some statistical properties of the Bitcoin market. The study compares the dynamics of Bitcoin and standard currencies and focuses on the analysis of returns at different time scales. We test for the presence of long memory in return time series from 2011 to 2017, using transaction data from one Bitcoin platform. We compute the Hurst exponent with the Detrended Fluctuation Analysis (DFA) method, using a sliding window to measure long-range dependence. We find that the Hurst exponent changes significantly during the first years of Bitcoin's existence and tends to stabilize in recent times. Additionally, a multiscale analysis shows similar behavior of the Hurst exponent, implying a self-similar process.
    Affiliations:
    Fernández, Aurelio. Universitat Rovira i Virgili; Spain. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Basgall, María José. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Hasperué, Waldo. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Naiouf, Ricardo Marcelo. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina
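
    A minimal sketch of first-order DFA with NumPy, assuming the input is a series of returns: build the profile (cumulative sum of the mean-removed series), split it into non-overlapping segments of size s, subtract a least-squares line from each segment, and take the slope of log F(s) against log s as the Hurst exponent estimate. The scale range (10 to n/4, 12 log-spaced sizes) and the helper names `dfa_hurst` and `rolling_hurst` are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np


def dfa_hurst(x, min_scale=10, n_scales=12):
    """Estimate the Hurst exponent of a stationary series with first-order DFA."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Profile: cumulative sum of the mean-removed series
    profile = np.cumsum(x - x.mean())
    # Log-spaced segment sizes between min_scale and n // 4 (assumed range)
    scales = np.unique(
        np.logspace(np.log10(min_scale), np.log10(n // 4), n_scales).astype(int))
    fluctuations = []
    for s in scales:
        t = np.arange(s)
        sq_residuals = []
        for k in range(n // s):
            segment = profile[k * s:(k + 1) * s]
            # Remove the local linear trend of each non-overlapping segment
            trend = np.polyval(np.polyfit(t, segment, 1), t)
            sq_residuals.append(np.mean((segment - trend) ** 2))
        fluctuations.append(np.sqrt(np.mean(sq_residuals)))
    # F(s) grows like s**H, so H is the slope of log F(s) against log s
    slope, _intercept = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return slope


def rolling_hurst(returns, window, step):
    """DFA Hurst exponent over a sliding window of `window` returns, advanced by `step`."""
    returns = np.asarray(returns, dtype=float)
    return [dfa_hurst(returns[i:i + window])
            for i in range(0, len(returns) - window + 1, step)]
```

    For uncorrelated returns the estimate is close to 0.5; values persistently above 0.5 indicate long-range dependence. Applying `rolling_hurst` to successive windows is one way to track how the exponent evolves over time.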

    FDR2-BD: A fast data reduction recommendation tool for tabular big data classification problems

    This paper presents FDR2-BD, a methodological data condensation approach for reducing tabular big datasets in classification problems. The key idea of our proposal is to analyze the data in a dual way (vertically and horizontally), combining feature selection, which generates dense clusters of data, with uniform sampling reduction, which keeps only a few representative samples from each problem area. Its main advantage is that it keeps the model's predictive quality within a range determined by a user-defined threshold. Its robustness rests on a hyper-parametrization process in which all data are taken into consideration by following a k-fold procedure. Another significant capability is that it is fast and scalable, since it uses fully optimized parallel operations provided by Apache Spark. An extensive experimental study is performed over 25 big datasets with different characteristics. In most cases, the reduction percentages obtained are above 95%, outperforming state-of-the-art solutions such as FCNN_MR, which barely reach 70%. The most promising outcome is that the representativeness of the original data is maintained, with predictive quality values within about 1% of the baseline.
    Affiliations:
    Basgall, María José. Universidad de Granada; Spain. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
    Naiouf, Ricardo Marcelo. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina
    Fernández, Alberto. Universidad de Granada; Spain
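
    FDR2-BD itself runs on Apache Spark and tunes its parameters with a k-fold procedure against a user threshold. The sketch below only illustrates the general idea of the horizontal step on a single machine, assuming the features have already been selected: rows are grouped by the discretized values of the selected features plus the class label, and each group is uniformly subsampled. The equal-width binning, `reduce_by_cells`, `n_bins` and `fraction` are hypothetical and not part of FDR2-BD.

```python
import numpy as np
from collections import defaultdict


def reduce_by_cells(X, y, selected, n_bins=4, fraction=0.05, seed=0):
    """Hypothetical reduction step: bin the selected features, group rows by
    (cell, class label), and uniformly sample a fraction of every group."""
    rng = np.random.default_rng(seed)
    Xs = X[:, selected]
    lo, hi = Xs.min(axis=0), Xs.max(axis=0)
    # Equal-width bins per selected feature (constant features fall in bin 0)
    width = np.where(hi > lo, (hi - lo) / n_bins, 1.0)
    cells = np.minimum(((Xs - lo) / width).astype(int), n_bins - 1)
    groups = defaultdict(list)
    for i, (cell, label) in enumerate(zip(map(tuple, cells), y)):
        groups[(cell, label)].append(i)
    keep = []
    for rows in groups.values():
        # Keep at least one representative of every dense region
        k = max(1, round(fraction * len(rows)))
        keep.extend(rng.choice(rows, size=k, replace=False))
    keep = np.sort(np.array(keep))
    return X[keep], y[keep]
```

    In this sketch, the reduction percentage corresponds to `1 - len(y_reduced) / len(y)`.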

    Data stream treatment using sliding windows with MapReduce

    Knowledge Discovery in Databases (KDD) techniques have limitations when the volume of data to process is very large. KDD algorithms need to perform several iterations over the complete dataset to carry out their work. For continuous data stream processing, part of the stream must therefore be stored in a temporal window. In this paper, we present a technique that sizes the temporal window dynamically, based on the frequency of data arrival and the response time of the KDD task. The results show that this technique achieves a large window in which each example of the stream is used in more than one iteration of the KDD task.
    Affiliations:
    Basgall, María José. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
    Hasperué, Waldo. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
    Naiouf, Ricardo Marcelo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina
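
    The abstract does not give the exact sizing rule. One plausible reading, stated here only as an assumption, is that about λ·T examples arrive while one iteration of the KDD task runs (λ = arrival rate, T = response time), so a window of about r·λ·T examples lets each example take part in roughly r iterations. `AdaptiveWindow`, `reuse` and `simulate` are hypothetical names.

```python
import math
from collections import deque


class AdaptiveWindow:
    """Hypothetical temporal window whose capacity follows the data arrival
    rate and the measured response time of the KDD task."""

    def __init__(self, reuse=2, min_size=1):
        self.reuse = reuse          # target iterations per example (assumed)
        self.min_size = min_size
        self.buffer = deque(maxlen=min_size)

    def resize(self, arrival_rate, response_time):
        # About arrival_rate * response_time examples arrive during one
        # iteration; holding `reuse` times that many keeps each example in
        # the window for about `reuse` iterations.
        capacity = max(self.min_size,
                       math.ceil(self.reuse * arrival_rate * response_time))
        # deque(iterable, maxlen) keeps only the most recent `capacity` items
        self.buffer = deque(self.buffer, maxlen=capacity)
        return capacity

    def push(self, example):
        self.buffer.append(example)  # the oldest example is evicted when full


def simulate(n_iterations, arrival_rate, response_time, reuse):
    """Count how many KDD iterations see each stream example."""
    window = AdaptiveWindow(reuse=reuse)
    window.resize(arrival_rate, response_time)
    uses = {}
    next_id = 0
    for _ in range(n_iterations):
        # Examples that arrive while the previous iteration was running
        for _ in range(round(arrival_rate * response_time)):
            window.push(next_id)
            next_id += 1
        for example in window.buffer:   # one pass of the KDD task
            uses[example] = uses.get(example, 0) + 1
    return uses
```

    In practice `resize` would be called again whenever new measurements of the arrival rate or the task's response time become available.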

    SMOTE-BD: An exact and scalable oversampling method for imbalanced classification in big data

    The volume of data in today's applications has changed the way Machine Learning problems are addressed. Indeed, the Big Data scenario imposes scalability requirements that can only be met through intelligent model design and the use of distributed technologies. In this context, solutions based on the Spark platform have become a de facto standard. In this contribution, we focus on a very important area within Big Data Analytics, namely classification with imbalanced datasets. The main characteristic of this problem is that one of the classes is underrepresented, so it is usually harder to find a model that identifies it correctly. For this reason, it is common to apply preprocessing techniques such as oversampling to balance the distribution of examples among the classes. In this work, we present SMOTE-BD, a fully scalable preprocessing approach for imbalanced classification in Big Data. It is based on one of the most widespread preprocessing solutions for imbalanced classification, the SMOTE algorithm, which creates new synthetic instances according to the neighborhood of each example of the minority class. Our design is independent of the number of partitions or processes created, which yields a higher degree of efficiency. Experiments conducted on different standard and Big Data datasets show the quality of the proposed design and implementation.
    Affiliations:
    Basgall, María José. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
    Hasperué, Waldo. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
    Naiouf, Ricardo Marcelo. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina
    Fernández, Alberto. Universidad de Granada; Spain
    Herrera, Francisco. Universidad de Granada; Spain
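
    For reference, the classic single-machine SMOTE step that SMOTE-BD builds on can be sketched with NumPy: pick a minority example, choose one of its k nearest minority neighbours, and place a synthetic point at a random position on the segment between them. This is not SMOTE-BD's distributed Spark implementation; the brute-force distance matrix used here grows quadratically in memory, which is exactly what a Big Data version has to avoid. The function name `smote`, `k=5` and the seed are illustrative.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, seed=0):
    """Classic SMOTE on the minority-class matrix X_min (needs >= 2 rows)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise squared Euclidean distances among minority examples
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    # Random base example and one of its k nearest neighbours
    base = rng.integers(0, len(X_min), size=n_synthetic)
    nb = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    # Interpolate at a uniform random position on the segment base -> neighbour
    gap = rng.random((n_synthetic, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

    Each synthetic point is a convex combination of two minority examples, so it always lies inside the region spanned by the minority class.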

    Comparison of Communication/Synchronization Models in Parallel Programming on Multi-Core Cluster

    Given the increasing use of multi-core cluster architectures, in this paper we analyze the various communication models (message passing, shared memory, and their combination) in order to efficiently exploit the power of this architecture. The Smith-Waterman algorithm, which determines the degree of similarity between two DNA sequences and whose parallelization follows a pipeline scheme because of the problem's data dependences, is used as a test case. Finally, future lines of research are outlined, aimed at optimizing the use of the memory levels in the architecture.
    Red de Universidades con Carreras en Informática (RedUNCI)
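
    A minimal sketch of the Smith-Waterman recurrence with a linear gap penalty (the scores match=2, mismatch=-1, gap=-1 are illustrative assumptions). The second function evaluates the same recurrence one anti-diagonal at a time to make the data dependence visible: cell (i, j) needs (i-1, j-1), (i-1, j) and (i, j-1), so in a pipeline each process can own a band of rows and start a block as soon as its upstream neighbour has finished the corresponding block. The message-passing and shared-memory variants compared in the paper are not reproduced here.

```python
def _cell(H, a, b, i, j, match, mismatch, gap):
    """Smith-Waterman recurrence for cell (i, j) with a linear gap penalty."""
    s = match if a[i - 1] == b[j - 1] else mismatch
    return max(0,
               H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
               H[i - 1][j] + gap,     # gap in b
               H[i][j - 1] + gap)     # gap in a


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score, filling the matrix row by row."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = _cell(H, a, b, i, j, match, mismatch, gap)
            best = max(best, H[i][j])
    return best


def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Same score, filling one anti-diagonal d = i + j at a time. Cells on a
    diagonal depend only on the two previous diagonals, so they are mutually
    independent; this is the dependence pattern a pipeline exploits."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for d in range(2, rows + cols - 1):
        for i in range(max(1, d - cols + 1), min(rows, d)):
            j = d - i
            H[i][j] = _cell(H, a, b, i, j, match, mismatch, gap)
            best = max(best, H[i][j])
    return best
```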

    Power Characterisation of Shared-Memory HPC Systems

    Energy consumption has become one of the greatest challenges in High Performance Computing (HPC). Besides its environmental impact, energy is a limiting factor for HPC. Keeping a system's power consumption below a threshold is one of the main problems, and power prediction can help solve it. Power characterisation can be used to understand the power behaviour of the system under study and to support power prediction. Furthermore, it could be used to design power-aware application programs. In this article, we propose a methodology to characterise the power consumption of shared-memory HPC systems. Our methodology involves finding the factors that influence the power consumed by these systems. It is similar to previous works, but we propose an in-depth approach that can yield a better power characterisation of the system. We apply our methodology to characterise an Intel server platform, and the results show that it identifies a larger set of factors influencing power consumption.
    Red de Universidades con Carreras en Informática (RedUNCI)
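
    One common way to rank candidate influence factors, shown here only as an assumption and not as the paper's methodology, is to fit measured power to standardized factors by least squares and sort the factors by the magnitude of their coefficients. The function `rank_power_factors` is hypothetical, and so are any factor names used with it (for example `active_cores` or `freq_ghz`).

```python
import numpy as np


def rank_power_factors(factors, power):
    """Fit power ~ b0 + sum_k b_k * z_k by least squares, where z_k is factor k
    standardized to zero mean and unit variance, and rank factors by |b_k|.
    Constant factors must be removed first (their standard deviation is 0)."""
    names = list(factors)
    Z = np.column_stack([
        (np.asarray(factors[name], float) - np.mean(factors[name]))
        / np.std(factors[name])
        for name in names
    ])
    A = np.column_stack([np.ones(len(power)), Z])   # intercept plus factors
    coef, *_ = np.linalg.lstsq(A, np.asarray(power, float), rcond=None)
    ranking = sorted(zip(names, coef[1:]),
                     key=lambda item: abs(item[1]), reverse=True)
    return coef[0], ranking
```

    Standardizing makes the coefficients comparable across factors measured in different units; because the factors are centred, the intercept equals the mean measured power.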

    4-(N²-1) Puzzle: Parallelization and performance on clusters

    This paper presents an analysis of the 4-(N²-1) Puzzle, a generalization of the (N²-1) Puzzle. The problem is of interest because of its algorithmic and computational complexity and its applications to robot movements with several objectives. Taking the formal definition as a starting point, four heuristics are analyzed that can be used to predict the best achievable objective and to estimate the number of steps required to reach a solution state from a given configuration. Once the objective is selected, a sequential solution and a parallel solution over a cluster are presented for the (N²-1) Puzzle, based on the A* heuristic search algorithm. Variations of the classic heuristic are also analyzed. The experimental work focuses on analyzing possible superlinearity and the scalability of the parallel solution on clusters by varying the physical configuration and the problem size. Finally, the suitability of the heuristic used to assess the best achievable objective in the 4-(N²-1) Puzzle is analyzed.
    Affiliations:
    Sanz, Victoria María. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
    de Giusti, Armando Eduardo. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
    Naiouf, Ricardo Marcelo. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina
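
    A compact A* sketch for the classic (N²-1) Puzzle with the Manhattan-distance heuristic, which we assume is the "classic heuristic" the abstract refers to. It returns the optimal number of moves to the goal configuration 1, 2, ..., N²-1 followed by the blank; the multi-objective 4-(N²-1) variant, the paper's other heuristics, and the cluster-parallel version are not shown. States are tuples with 0 as the blank; `manhattan` and `astar` are illustrative names.

```python
import heapq


def manhattan(state, n):
    """Sum of the Manhattan distances of all tiles (blank excluded) to their goal cells."""
    dist = 0
    for idx, tile in enumerate(state):
        if tile:
            goal = tile - 1
            dist += abs(idx // n - goal // n) + abs(idx % n - goal % n)
    return dist


def astar(start, n):
    """Optimal move count from `start` to (1, ..., n*n-1, 0), or None if unsolvable."""
    goal = tuple(range(1, n * n)) + (0,)
    open_heap = [(manhattan(start, n), 0, start)]   # entries are (f = g + h, g, state)
    best_g = {start: 0}
    while open_heap:
        _f, g, state = heapq.heappop(open_heap)
        if state == goal:
            return g
        if g > best_g[state]:
            continue                                # stale heap entry
        blank = state.index(0)
        r, c = divmod(blank, n)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n:
                j = nr * n + nc
                nxt = list(state)
                nxt[blank], nxt[j] = nxt[j], nxt[blank]   # slide a tile into the blank
                nxt = tuple(nxt)
                if g + 1 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g + 1
                    heapq.heappush(open_heap, (g + 1 + manhattan(nxt, n), g + 1, nxt))
    return None
```

    Because the Manhattan distance never overestimates and is consistent, the first time the goal leaves the priority queue its cost is optimal.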