
    Is intelligence the answer to deal with the 5 V’s of telemetry data?

    Telemetry data and big data share the volume, velocity, variety, veracity and value characteristics. We propose a distributed telemetry architecture and show how intelligence can help deal with the 5 V’s of optical network telemetry data. The research leading to these results has received funding from the HORIZON SEASON (G.A. 101096120) and the MICINN IBON (PID2020-114135RB-I00) projects and from the ICREA Institution.

    PIRE ExoGENI – ENVRI: Preparation for Big Data Science

    Big Data is a new field in both scientific research and the IT industry, focusing on collections of data sets that are so huge and complex that they create numerous difficulties not only in processing them but also in transferring and storing them. Big Data science tries to overcome these problems or optimize performance based on the “5V” concept: Volume, Variety, Velocity, Variability and Value. A Big Data infrastructure integrates advanced IT technologies such as Cloud computing, databases, networking and HPC, providing scientists with all the functionality required for performing high-level research activities. The EU project ENVRI is an example of developing a Big Data infrastructure for environmental scientists, with a special focus on issues like architecture, metadata frameworks, and data discovery. In Big Data infrastructures like ENVRI, aggregating huge amounts of data from different sources and transferring them between distributed locations are important processes in many experiments [5]. Efficient data transfer is thus a key service required in the Big Data infrastructure. At the same time, Software Defined Networking (SDN) is a promising new approach to networking. SDN decouples the control interface from network devices and allows high-level applications to manipulate network behavior. However, most existing high-level data transfer protocols treat the network as a black box and do not include control over network-level functionality. There is a scientific gap between Big Data science and Software Defined Networking, and until now no work has been done combining these two technologies. This gap drives our research in this project.
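    As a minimal sketch of the kind of coupling the project argues for, the example below shows a data-transfer client that, before moving a large dataset, asks an SDN controller to install a dedicated forwarding path rather than treating the network as a black box. The controller URL, the /paths endpoint, and the JSON fields are illustrative assumptions loosely modeled on REST-style SDN controllers; they are not part of the ENVRI or ExoGENI tooling.

```python
# Sketch: let a bulk data transfer reserve a network path via an SDN
# controller before starting. Endpoint and payload format are hypothetical.
import requests

CONTROLLER = "http://sdn-controller.example:8181"  # hypothetical controller address

def request_path(src_host: str, dst_host: str, bandwidth_mbps: int) -> str:
    """Ask the controller to install a forwarding path and return its id."""
    payload = {
        "source": src_host,
        "destination": dst_host,
        "min_bandwidth_mbps": bandwidth_mbps,
    }
    resp = requests.post(f"{CONTROLLER}/paths", json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()["path_id"]

def release_path(path_id: str) -> None:
    """Tear down the reserved path once the transfer has finished."""
    requests.delete(f"{CONTROLLER}/paths/{path_id}", timeout=10).raise_for_status()

def transfer_dataset(src: str, dst: str, copy_fn) -> None:
    """Wrap an existing copy routine (e.g. a GridFTP invocation) with path setup."""
    path_id = request_path(src, dst, bandwidth_mbps=1000)
    try:
        copy_fn(src, dst)      # the actual bulk transfer
    finally:
        release_path(path_id)  # always release network resources
```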

    Spatio-Temporal Multimedia Big Data Analytics Using Deep Neural Networks

    With the proliferation of online services and mobile technologies, the world has stepped into a multimedia big data era, where new opportunities and challenges appear with the high diversity multimedia data together with the huge amount of social data. Nowadays, multimedia data consisting of audio, text, image, and video has grown tremendously. With such an increase in the amount of multimedia data, the main question raised is how one can analyze this high volume and variety of data in an efficient and effective way. A vast amount of research work has been done in the multimedia area, targeting different aspects of big data analytics, such as the capture, storage, indexing, mining, and retrieval of multimedia big data. However, there is insufficient research that provides a comprehensive framework for multimedia big data analytics and management. To address the major challenges in this area, a new framework is proposed based on deep neural networks for multimedia semantic concept detection with a focus on spatio-temporal information analysis and rare event detection. The proposed framework is able to discover the pattern and knowledge of multimedia data using both static deep data representation and temporal semantics. Specifically, it is designed to handle data with skewed distributions. The proposed framework includes the following components: (1) a synthetic data generation component based on simulation and adversarial networks for data augmentation and deep learning training, (2) an automatic sampling model to overcome the imbalanced data issue in multimedia data, (3) a deep representation learning model leveraging novel deep learning techniques to generate the most discriminative static features from multimedia data, (4) an automatic hyper-parameter learning component for faster training and convergence of the learning models, (5) a spatio-temporal deep learning model to analyze dynamic features from multimedia data, and finally (6) a multimodal deep learning fusion model to integrate different data modalities. The whole framework has been evaluated using various large-scale multimedia datasets that include the newly collected disaster-events video dataset and other public datasets
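    The abstract does not give implementation details; the following is a minimal PyTorch sketch of the general pattern behind components (5) and (6): per-frame visual features passed through a recurrent layer for temporal semantics, then concatenated with a second modality (here, audio features) for late fusion. Layer sizes, feature dimensions, and the fusion scheme are illustrative assumptions, not the authors' architecture.

```python
# Sketch (PyTorch): spatio-temporal encoding of video frames plus late fusion
# with a second modality. Dimensions and fusion choice are illustrative only.
import torch
import torch.nn as nn

class SpatioTemporalFusion(nn.Module):
    def __init__(self, frame_feat_dim=2048, audio_feat_dim=128,
                 hidden_dim=256, num_classes=10):
        super().__init__()
        # Temporal model over pre-extracted per-frame CNN features.
        self.temporal = nn.LSTM(frame_feat_dim, hidden_dim, batch_first=True)
        # Simple projection for the second modality.
        self.audio_proj = nn.Linear(audio_feat_dim, hidden_dim)
        # Late fusion: concatenate both modality embeddings, then classify.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, frame_feats, audio_feats):
        # frame_feats: (batch, time, frame_feat_dim); audio_feats: (batch, audio_feat_dim)
        _, (h_n, _) = self.temporal(frame_feats)
        video_emb = h_n[-1]                      # last hidden state of the LSTM
        audio_emb = torch.relu(self.audio_proj(audio_feats))
        fused = torch.cat([video_emb, audio_emb], dim=1)
        return self.classifier(fused)

# Usage with random tensors standing in for extracted features.
model = SpatioTemporalFusion()
logits = model(torch.randn(4, 30, 2048), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 10])
```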

    Variable Selection In Big Data With Applications To Develop A New Epigenetic Clock

    Big data is known for its 5 V's: (1) volume, with a huge quantity/amount of data (large n) and/or a large number of variables (large p); (2) variety, with various types, natures, and formats; (3) velocity, with ultra-high speed of data generation/collection; (4) veracity, the trustworthiness and quality of big data; and (5) value, its insights, usefulness and impact. Current computational resources and traditional methodologies and techniques can hardly keep up with the extraordinary volume of data being generated, so it is challenging to extract useful information from big data with current computational resources. In this dissertation, we propose several strategies to address some of these issues for modern variable selection procedures. In particular, we evaluate (1) random sub-sampling, so that the sub-data are similar to the original big data; (2) random row partitions, so that all data are included; (3) random column partitions, to reduce the dimension size for feasible model building and/or variable selection while still including all columns; and (4) random matrix partitions, a natural extension using both row and column partitions. Results from each proposed procedure can be combined via ensemble methods. In aging biomarker studies, methylation of cytosine residues of cytosine-phosphate-guanine dinucleotides (CpGs) shows strong associations with aging. Several such epigenetic clocks have been proposed in the literature: the Hannum clock (2013) with 71 CpGs, the Horvath clock (2013) with 353 CpGs, the Levine clock (2015), and the GrimAge clock (2019) with 1,030 CpGs. We demonstrate that our proposed procedures can be useful in this research area for building a simpler but useful model for ultra-high-dimensional data. In our study, a total of 2,640 SJLIFE participants of European ancestry were included, consisting of 2,112 SJLIFE childhood cancer survivors as training data and a separate 528 cancer survivors as validation data. The data include 689,414 CpGs, a clear example of large p (p = 689,414) with a much smaller sample size n. We demonstrate that we can indeed develop a new DNA methylation-based epigenetic clock with a much smaller number of CpG sites using the proposed procedures.
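    As a rough illustration of strategy (3), the sketch below splits the columns into random partitions, runs a penalized selector on each partition, and pools the selected variables. Lasso is used as a stand-in selector, and the partition count, penalty, and threshold are arbitrary illustrative choices; the dissertation's actual selection procedures may differ.

```python
# Sketch: random column partitions for variable selection on wide data (p >> n).
# Lasso is a stand-in selector; n_parts, alpha and the threshold are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

def select_by_column_partitions(X, y, n_parts=10, alpha=0.01, seed=0):
    """Split columns into random partitions, select variables within each
    partition, and return the union of selected column indices."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[1])
    partitions = np.array_split(order, n_parts)

    selected = set()
    for cols in partitions:
        model = Lasso(alpha=alpha, max_iter=10000).fit(X[:, cols], y)
        kept = cols[np.abs(model.coef_) > 1e-8]   # columns with nonzero coefficients
        selected.update(kept.tolist())
    return sorted(selected)

# Toy example: 200 samples, 5,000 variables, 10 truly informative columns.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5000))
beta = np.zeros(5000)
beta[:10] = 2.0
y = X @ beta + rng.standard_normal(200)
print(select_by_column_partitions(X, y)[:20])
```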

    Data Placement And Task Mapping Optimization For Big Data Workflows In The Cloud

    Data-centric workflows naturally process and analyze a huge volume of datasets. In this new era of Big Data there is a growing need to enable data-centric workflows to perform computations at a scale far exceeding a single workstation's capabilities. Therefore, this type of application can benefit from distributed high-performance computing (HPC) infrastructures like cluster, grid or cloud computing. Although data-centric workflows have been applied extensively to structure complex scientific data analysis processes, they fail to address big data challenges and to leverage the capability of dynamic resource provisioning in the Cloud. The concept of “big data workflows” is proposed by our research group as the next generation of data-centric workflow technologies to address the limitations of existing workflow technologies in handling big data challenges. Executing big data workflows in the Cloud is a challenging problem, as workflow tasks and data are required to be partitioned, distributed and assigned to the cloud execution sites (multiple virtual machines). When running such big data workflows in a cloud distributed across several physical locations, the workflow execution time and the cloud resource utilization efficiency depend highly on the initial placement and distribution of the workflow tasks and datasets across the multiple virtual machines in the Cloud. Several workflow management systems have been developed to facilitate the use of workflows by scientists; however, the data and workflow task placement issue has not yet been sufficiently addressed. In this dissertation, I propose the BDAP (Big Data Placement) strategy for data placement and the TPS (Task Placement Strategy) for task placement, which improve workflow performance by minimizing data movement across multiple virtual machines in the Cloud during workflow execution. In addition, I propose CATS (Cultural Algorithm Task Scheduling) for workflow scheduling, which improves workflow performance by minimizing workflow execution cost. In this dissertation, I 1) formalize the data and task placement problems in workflows, 2) propose a data placement algorithm that considers both the initial input datasets and the intermediate datasets obtained during the workflow run, 3) propose a task placement algorithm that considers the placement of workflow tasks before the workflow run, 4) propose a workflow scheduling strategy to minimize the workflow execution cost once a deadline is provided by the user, and 5) perform extensive experiments in a distributed environment to validate that our proposed strategies provide an effective data and task placement solution to distribute and place big datasets and tasks into the appropriate virtual machines in the Cloud within reasonable time.
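    The abstract does not specify the BDAP or TPS algorithms; as a rough sketch of the underlying idea (placing work to minimize cross-VM data movement), the example below greedily assigns each task to the virtual machine that already holds the largest share of its input data. The data structures and the greedy rule are illustrative assumptions, not the dissertation's strategies.

```python
# Sketch: greedy task-to-VM mapping that favors data locality.
# Inputs and the greedy rule are illustrative, not BDAP/TPS/CATS themselves.
from collections import defaultdict

def greedy_task_mapping(task_inputs, dataset_location, dataset_size):
    """task_inputs: {task: [dataset, ...]}
    dataset_location: {dataset: vm}, dataset_size: {dataset: bytes}
    Returns {task: vm}, choosing per task the VM holding the most input bytes."""
    mapping = {}
    for task, datasets in task_inputs.items():
        local_bytes = defaultdict(int)
        for d in datasets:
            local_bytes[dataset_location[d]] += dataset_size[d]
        # Pick the VM where the largest volume of this task's inputs already resides.
        mapping[task] = max(local_bytes, key=local_bytes.get)
    return mapping

# Toy example: two VMs, three datasets, two tasks.
task_inputs = {"t1": ["d1", "d2"], "t2": ["d2", "d3"]}
dataset_location = {"d1": "vm1", "d2": "vm2", "d3": "vm2"}
dataset_size = {"d1": 50, "d2": 10, "d3": 40}
print(greedy_task_mapping(task_inputs, dataset_location, dataset_size))
# {'t1': 'vm1', 't2': 'vm2'}
```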

    Big Data Transformation in Agriculture: From Precision Agriculture Towards Smart Farming

    [EN] Big Data is a concept that has changed the way data and information are analysed in different environments such as industry and, more recently, agriculture. It is used to describe large volumes of data (structured or unstructured) that are difficult to obtain, process or parse in a reasonable time using conventional technologies and tools like relational databases or conventional statistics. However, Big Data is applied differently in each area to take advantage of its potential and capabilities. This is especially true in agriculture, which presents more demanding conditions due to its inherent uncertainty, so Big Data methods and models from other environments cannot be used straight away in this area. In this paper, we present a review/update of the term Big Data and analyse the evolution and role of Big Data in agriculture, outlining the element of collaboration. All authors acknowledge the partial support of Project 691249, RUC-APS: Enhancing and implementing Knowledge based ICT solutions within high Risk and Uncertain Conditions for Agriculture Production Systems, funded by the EU under its funding scheme H2020-MSCA-RISE-2015, and the project "Development of an integrated maturity model for agility, resilience and gender perspective in supply chains (MoMARGE). Application to the agricultural sector.", Ref. GV/2017/025, funded by the Generalitat Valenciana. The first author was supported by the Aid Programme of Research and Development of Universitat Politecnica de Valencia [PAID-01-18]. Rodríguez-Sánchez, MDLÁ.; Cuenca, L.; Ortiz Bas, Á. (2019). Big Data Transformation in Agriculture: From Precision Agriculture Towards Smart Farming. IFIP Advances in Information and Communication Technology 568:467-474. https://doi.org/10.1007/978-3-030-28464-0_40