Bio-inspired computation for big data fusion, storage, processing, learning and visualization: state of the art and future directions
This overview gravitates around research achievements that have recently emerged from the confluence between Big Data technologies and bio-inspired computation. A manifold of reasons can be identified for the profitable synergy between these two paradigms, all rooted in the adaptability, intelligence and robustness that biologically inspired principles can provide to technologies aimed at managing, retrieving, fusing and processing Big Data efficiently. We delve into this research field by first analyzing in depth the existing literature, with a focus on advances reported in the last few years. This prior literature analysis is complemented by an identification of new trends and open challenges in Big Data that remain unsolved to date, and that can be effectively addressed by bio-inspired algorithms. As a second contribution, this work elaborates on how bio-inspired algorithms need to be adapted for use in a Big Data context, in which data fusion becomes crucial as a preliminary step that allows processing and mining multiple, potentially heterogeneous data sources. This analysis allows exploring and comparing the scope and efficiency of existing approaches across different problems and domains, with the purpose of identifying new potential applications and research niches. Finally, this survey highlights open issues that remain unsolved to date in this research avenue, alongside a prescription of recommendations for future research.
This work has received funding support from the Basque Government (Eusko Jaurlaritza) through the Consolidated Research Group MATHMODE (IT1294-19) and the EMAITEK and ELK ARTEK programs. D. Camacho also acknowledges support from the Spanish Ministry of Science and Education under grant PID2020-117263GB-100 (FightDIS), the Comunidad Autónoma de Madrid under grant S2018/TCS-4566 (CYNAMON), and the CHIST-ERA 2017 BDSI PACMEL project (PCI2019-103623, Spain).
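As a concrete illustration of the kind of bio-inspired algorithm this survey covers, the following is a minimal genetic algorithm in plain Python solving the classic OneMax toy problem (maximize the number of 1-bits). All names and parameter values here are illustrative choices, not taken from the survey:

```python
import random

def evolve(fitness, n_bits=16, pop_size=30, generations=60, seed=0):
    """Minimal genetic algorithm: tournament selection, one-point
    crossover and bit-flip mutation over fixed-length bitstrings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # Tournament of size 2: keep the fitter of two random individuals.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n_bits):                 # bit-flip mutation
                if rng.random() < 1.0 / n_bits:
                    child[i] = 1 - child[i]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# OneMax: fitness is the number of 1-bits; the optimum is the all-ones string.
best = evolve(fitness=sum)
print(sum(best))
```

Distributed variants of such algorithms, which the survey discusses in the Big Data setting, typically parallelize the fitness evaluations or evolve separate sub-populations that periodically exchange individuals.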
What does fault tolerant Deep Learning need from MPI?
Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithms for large-scale data analysis. DL algorithms are computationally expensive: even distributed DL implementations that use MPI require days of training (model learning) time on commonly studied datasets. Long-running DL applications thus become susceptible to faults, requiring the development of a fault tolerant system infrastructure in addition to fault tolerant DL algorithms. This raises an important question: what is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification through an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion of the suitability of different parallelism types (model, data and hybrid); the need (or lack thereof) for checkpointing of any critical data structures; and, most importantly, several fault tolerance proposals in MPI (user-level fault mitigation (ULFM), Reinit) and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe to use a ULFM-based implementation. Our evaluation using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI-based ULFM.
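The checkpointing concern raised in this abstract can be sketched without MPI at all. The toy `CheckpointedTrainer` below is a hypothetical class (not part of MaTEx-Caffe): it snapshots the critical state (weights, iteration) so that, after a permanent fault, training can resume from the last checkpoint. A real ULFM-based system would first revoke and shrink the damaged communicator before performing such a restore:

```python
import copy

class CheckpointedTrainer:
    """Toy trainer that snapshots its critical state so training can
    resume after a fault. Illustrative only: a real MPI implementation
    would repair the communicator (e.g. via ULFM) before restoring."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.step = 0
        self._checkpoint = None

    def save_checkpoint(self):
        # Snapshot the critical data structures: parameters and iteration.
        self._checkpoint = (copy.deepcopy(self.weights), self.step)

    def restore(self):
        # Roll back to the last snapshot after a (simulated) fault.
        saved_weights, saved_step = self._checkpoint
        self.weights = copy.deepcopy(saved_weights)
        self.step = saved_step

    def train_step(self):
        # Stand-in for an SGD update computed from real gradients.
        self.weights = [w * 0.9 for w in self.weights]
        self.step += 1

trainer = CheckpointedTrainer([1.0, 2.0])
for i in range(10):
    if i % 5 == 0:
        trainer.save_checkpoint()   # checkpoints taken at steps 0 and 5
    trainer.train_step()

trainer.restore()                   # simulate recovery: back to step 5
print(trainer.step)                 # -> 5
```

The checkpoint interval trades recomputation cost after a fault against snapshot overhead during normal operation, which is exactly the design question the paper weighs for DL workloads.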
Learning from accidents: machine learning for safety at railway stations
In railway systems, station safety is a critical aspect of the overall structure, and yet accidents at stations still occur. It is time to learn from these errors and improve conventional methods by utilizing the latest technology, such as machine learning (ML), to analyse accidents and enhance safety systems. ML has been employed in many fields, including engineering systems, and it interacts with us throughout our daily lives. Thus, we must consider the available technology in general, and ML in particular, in the context of safety in the railway industry. This paper explores the employment of the decision tree (DT) method in safety classification and in the analysis of accidents at railway stations to predict the traits of passengers affected by accidents. The critical contribution of this study is the presentation of ML and an explanation of how this technique is applied for ensuring safety, utilizing automated processes, and gaining benefits from this powerful technology. To apply and explore this method, a case study has been selected that focuses on the fatalities caused by accidents at railway stations. An analysis of some of these fatal accidents, as reported by the Rail Safety and Standards Board (RSSB), is performed and presented in this paper to provide a broader summary of the application of supervised ML for improving safety at railway stations. Finally, this research shows the vast potential of the innovative application of ML in safety analysis for the railway industry.
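The core of the decision tree machinery this abstract refers to can be sketched in a few lines of pure Python: at each node, a CART-style tree picks the feature/threshold split that minimizes weighted Gini impurity. The `best_split` helper and the toy accident records below are illustrative inventions, not the RSSB data used in the paper:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (feature, threshold, score): the split minimizing the
    weighted Gini impurity, as a CART decision tree does per node."""
    best = (None, None, float("inf"))
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best

# Hypothetical accident records: [age, hour_of_day] -> fatal (1) / non-fatal (0).
X = [[25, 8], [70, 22], [30, 9], [65, 23], [40, 10], [80, 21]]
y = [0, 1, 0, 1, 0, 1]
feature, threshold, impurity = best_split(X, y)
print(feature, threshold, impurity)  # splits on feature 0 (age <= 40) here
```

A full tree recurses on the two resulting partitions until nodes are pure or a depth limit is hit; library implementations add pruning and handling of categorical features.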
Next-generation big data analytics: state of the art, challenges, and future research topics
The term big data occurs more frequently now than ever before. A large number of fields and subjects, ranging from everyday life to traditional research fields (e.g., geography and transportation, biology and chemistry, medicine and rehabilitation), involve big data problems. The popularization of various types of networks has diversified the types, issues, and solutions for big data more than ever before. In this paper, we review recent research on data types, storage models, privacy, data security, analysis methods, and applications related to network big data. Finally, we summarize the challenges and development of big data to predict current and future trends.
This work was supported in part by "Open3D: Collaborative Editing for 3D Virtual Worlds" [EPSRC (EP/M013685/1)], in part by "Distributed Java Infrastructure for Real-Time Big-Data" (CAS14/00118), in part by eMadrid (S2013/ICE-2715), in part by HERMES-SMARTDRIVER (TIN2013-46801-C4-2-R), and in part by AUDACity (TIN2016-77158-C4-1-R). Paper no. TII-16-1.
Nearest Neighbors-Based Forecasting for Electricity Demand Time Series in Streaming
This paper presents a new forecasting algorithm for time series in streaming, named StreamWNN. The methodology has two well-differentiated stages. In the batch phase, the algorithm searches for the nearest neighbors to generate an initial prediction model. Then, an online phase is carried out as the time series arrives in streaming: the nearest neighbor in the training set of the incoming streaming data is computed, and the nearest neighbors of that neighbor, previously computed in the batch phase, are used to obtain the predictions. Results using electricity consumption time series are reported, showing a remarkable performance of the proposed algorithm in terms of forecasting errors when compared to a nearest neighbors-based benchmark algorithm. The running times for the predictions are also remarkable.
Ministerio de Ciencia, Innovación y Universidades TIN2017-88209-C
New Spark solutions for distributed frequent itemset and association rule mining algorithms
Funding for open access publishing: Universidad de Granada/CBUA. The research reported in this paper was partially supported by the BIGDATAMED project, which has received funding from the Andalusian Government (Junta de Andalucía) under grant agreement No. P18-RT-1765, by grants PID2021-123960OB-I00 and TED2021-129402B-C21 funded by Ministerio de Ciencia e Innovación, by ERDF A way of making Europe, and by the European Union NextGenerationEU. In addition, this work has been partially supported by the Ministry of Universities through the EU-funded Margarita Salas programme (NextGenerationEU).
The large amount of data generated every day makes necessary the re-implementation of new methods capable of handling massive data efficiently. This is the case of association rules, an unsupervised data mining tool capable of extracting information in the form of IF-THEN patterns. Although several methods have been proposed for the extraction of frequent itemsets (the phase preceding the mining of association rules) in very large databases, the high computational cost and lack of memory remain major problems to be solved when processing large data. Therefore, the aim of this paper is threefold: (1) to review existing algorithms for frequent itemset and association rule mining, (2) to develop new efficient frequent itemset Big Data algorithms using distributed computation, as well as a new association rule mining algorithm in Spark, and (3) to compare the proposed algorithms with existing proposals, varying the number of transactions and the number of items. For this purpose, we have used the Spark platform, which has been demonstrated to outperform existing distributed algorithmic implementations.
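The frequent itemset phase the abstract describes can be sketched with a classic level-wise (Apriori-style) miner in plain Python. The `frequent_itemsets` helper below is an illustrative single-machine baseline, not one of the proposed Spark algorithms; distributed versions partition the transactions across workers and aggregate the per-partition support counts:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise frequent itemset mining: count k-item candidates,
    keep those meeting min_support, then build (k+1)-candidates whose
    every k-subset is frequent (the Apriori pruning rule)."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    k = 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        frequent = []
        for c in candidates:
            support = sum(1 for t in transactions if c <= set(t))
            if support >= min_support:
                result[c] = support
                frequent.append(c)
        k += 1
        freq_set = set(frequent)
        candidates = sorted(
            {a | b for a in frequent for b in frequent
             if len(a | b) == k
             if all(frozenset(s) in freq_set
                    for s in combinations(a | b, k - 1))},
            key=sorted,
        )
    return result

txns = [{"bread", "milk"}, {"bread", "beer"},
        {"bread", "milk", "beer"}, {"milk"}]
freq = frequent_itemsets(txns, min_support=2)
print(freq[frozenset({"bread", "milk"})])  # -> 2
```

Association rules are then derived from these itemsets: for each frequent itemset, the IF-THEN rules whose confidence (rule support divided by antecedent support) exceeds a threshold are reported.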