Search CORE

223,859 research outputs found

Integrating R and Hadoop for Big Data Analysis

Author: Dragoescu Raluca Mariana
Oancea Bogdan
Publication date: 01/06/2014

Analyzing and working with big data can be very difficult using classical means such as relational database management systems or desktop software packages for statistics and visualization. Instead, big data requires large clusters with hundreds or even thousands of computing nodes. Official statistics is increasingly considering big data for deriving new statistics, because big data sources could produce more relevant and timely statistics than traditional sources. One of the software tools successfully and widely used for storing and processing big data sets on clusters of commodity hardware is Hadoop. The Hadoop framework contains libraries, a distributed file system (HDFS), and a resource-management platform, and it implements a version of the MapReduce programming model for large-scale data processing. In this paper we investigate the possibilities of integrating Hadoop with R, a popular software package for statistical computing and data visualization. We present three ways of integrating them (R with Streaming, Rhipe, and RHadoop) and emphasize the advantages and disadvantages of each solution.
Comment: Romanian Statistical Review no. 2 / 201

arXiv.org e-Print Archive

Directory of Open Access Journals
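Hadoop Streaming, the basis of the "R with Streaming" approach, runs any executable as a mapper or reducer: the mapper reads input lines on stdin and writes tab-separated key/value lines to stdout, the framework sorts them by key, and the reducer reads the sorted lines. The paper's examples use R; the sketch below uses Python only to illustrate that line-oriented contract. The word-count job is an assumed example, and `run_local` is a hypothetical helper (not part of Hadoop) that stands in for the shuffle-and-sort phase in a single process. In a real job the mapper and reducer would be separate scripts reading `sys.stdin`, passed to Hadoop Streaming's `-mapper` and `-reducer` options.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every word, as a Streaming mapper would print to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per key; assumes input is sorted by key, as Hadoop guarantees."""
    parsed = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_local(lines: Iterable[str]) -> List[str]:
    """Hypothetical local simulation: map, then sort (the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))
```

For example, `run_local(["big data big", "data R Hadoop"])` yields one tab-separated count per distinct lowercased word, in key order. Keeping the reducer dependent only on sorted input mirrors what the framework actually guarantees, so the same functions would behave identically behind real stdin/stdout pipes.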

A Multi-criteria Group Decision Making Method for Selecting Big Data Visualization Tools

Author: Grandhi S.
Wibowo S.
Publication venue: Journal of Telecommunication, Electronic and Computer Engineering (JTEC)
Publication date: 15/02/2018

Big data visualization tools provide opportunities for businesses to strengthen decision making and achieve competitive advantages. Evaluating and selecting the most suitable big data visualization tool, however, is challenging. To deal with this issue effectively, this paper presents a multicriteria group decision making method for evaluating and selecting big data visualization tools. Intuitionistic fuzzy numbers are used to handle the subjectivity and imprecision of the decision making process. The concept of ideal solutions is applied to produce a relative closeness coefficient for every big data visualization tool alternative across all evaluation criteria. A big data visualization tool selection problem is presented to demonstrate the applicability of the method.

Universiti Teknikal Malaysia Melaka: UTeM Open Journal System
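The abstract does not give its formulas, so the following is only a minimal sketch of a TOPSIS-style ideal-solution ranking over intuitionistic fuzzy numbers (IFNs), not the authors' exact method. It assumes that the decision makers' ratings have already been aggregated into one IFN (membership μ, non-membership ν) per tool and criterion, that all criteria are benefit criteria, that the criterion weights are given and sum to 1, and that distances use a normalised Euclidean measure including the hesitancy degree π = 1 − μ − ν. The relative closeness coefficient is then CC = d⁻ / (d⁺ + d⁻), where d⁺ and d⁻ are the weighted distances to the positive and negative ideal solutions. The function names, tool names, and ratings are hypothetical.

```python
import math
from typing import Dict, List, Tuple

IFN = Tuple[float, float]  # (membership mu, non-membership nu) with mu + nu <= 1


def ifn_sq_distance(a: IFN, b: IFN) -> float:
    """Half the sum of squared differences in mu, nu and hesitancy pi = 1 - mu - nu."""
    pi_a, pi_b = 1 - a[0] - a[1], 1 - b[0] - b[1]
    return 0.5 * ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (pi_a - pi_b) ** 2)


def closeness(ratings: Dict[str, List[IFN]], weights: List[float]) -> Dict[str, float]:
    """Relative closeness coefficient of each alternative (assumes benefit criteria)."""
    columns = [[row[j] for row in ratings.values()] for j in range(len(weights))]
    # Positive ideal: highest membership, lowest non-membership per criterion.
    pos = [(max(mu for mu, _ in col), min(nu for _, nu in col)) for col in columns]
    # Negative ideal: lowest membership, highest non-membership per criterion.
    neg = [(min(mu for mu, _ in col), max(nu for _, nu in col)) for col in columns]
    result = {}
    for name, row in ratings.items():
        d_pos = math.sqrt(sum(w * ifn_sq_distance(x, p) for w, x, p in zip(weights, row, pos)))
        d_neg = math.sqrt(sum(w * ifn_sq_distance(x, q) for w, x, q in zip(weights, row, neg)))
        # If every alternative is identical, both distances are 0; call it a tie.
        result[name] = d_neg / (d_pos + d_neg) if d_pos + d_neg > 0 else 0.5
    return result


# Hypothetical aggregated ratings for three tools on two criteria.
ratings = {
    "ToolA": [(0.8, 0.1), (0.7, 0.2)],
    "ToolB": [(0.5, 0.4), (0.4, 0.5)],
    "ToolC": [(0.6, 0.3), (0.6, 0.3)],
}
cc = closeness(ratings, weights=[0.6, 0.4])
ranking = sorted(cc, key=cc.get, reverse=True)
```

Here ToolA coincides with the positive ideal on every criterion, so its coefficient is 1, and ToolB coincides with the negative ideal, so its coefficient is 0. The alternatives are ranked by decreasing CC.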

Big Data Management in Education Sector: an Overview

Author: Nda Ramatu Muhammad
Tasmin Rosmaini Bin
Publication venue: 'Publishing Center Dialog'
Publication date: 30/06/2019

Advances in technological innovation have given rise to a new trend known today as Big Data. Given the soaring popularity of big data technology, organisations are strongly attracted to it as a way to transform themselves and improve their businesses. Big data is enabling organisations to outpace their competitors and save costs. Similarly, Big Data management is essential for universities and other institutions that have Big Data to manage, as the use of Big Data in the higher education sector increases day by day. Many studies have been carried out on big data and analytics, but few have focused on its management. Big Data management is a reality that presents a set of challenges involving Big Data modeling, storage, retrieval, analysis, and visualization across several areas in organizations. This paper contributes to the conceptual and theoretical understanding of Big Data management within higher education and outlines its relevance to higher education institutions. It describes the opportunities this growing research area brings to higher education as well as the major challenges associated with it.

Traektoria Nauki

Big Data Management in Education Sector: an Overview

Author: Nda Ramatu Muhammad
Tasmin Rosmaini Bin
Publication venue: 'Publishing Center Dialog'
Publication date: 01/01/2019

Traektoria Nauki

DIALNET

A Systematic Review of Knowledge Visualization Approaches Using Big Data Methodology for Clinical Decision Support

Author: Archer Norm
Gabrielyan Anait R.
Roham Mehrdad
Publication venue: 'IntechOpen'
Publication date: 03/12/2019

This chapter reports results from a systematic review of peer-reviewed studies related to big data knowledge visualization for clinical decision support (CDS). The aims were to identify and synthesize sources of big data in knowledge visualization, identify visualization interactivity approaches for CDS, and summarize outcomes. Searches were conducted in PubMed, Embase, Ebscohost, CINAHL, Medline, Web of Science, and IEEE Xplore in April 2019, using search terms representing the concepts of big data, knowledge visualization, and clinical decision support. A Google Scholar gray literature search was also conducted. All references were screened for eligibility. Our review returned 3252 references, with 17 studies remaining after screening. Data were extracted and coded from these studies and analyzed using a PICOS framework. The most common intended audience was healthcare providers (n = 16); the most common source of big data was electronic health records (EHRs) (n = 12), followed by microbiology/pathology laboratory data (n = 8). The most common intervention type was some form of analysis platform/tool (n = 7). We identified and classified the studies by visualization type, user intent, big data platforms and tools used, big data analytics methods, and outcomes from big data knowledge visualization of CDS applications.

IntechOpen

Crossref

Visualization of Big Spatial Data using Coresets for Kernel Density Estimates

Author: Lex Alexander
Ou Yi
Phillips Jeff M.
Zheng Yan
Publication date: 13/09/2017

Large geo-located datasets have reached scales where visualizing every data point is inefficient. Random sampling is one way to reduce the size of a dataset, yet it can introduce unwanted errors. We describe a method for subsampling spatial data that is suitable for creating kernel density estimates from very large data, and we demonstrate that it results in less error than random sampling. We also introduce a method to ensure that thresholding low values based on sampled data does not omit any regions above the desired threshold. We demonstrate the effectiveness of our approach using both artificial and real-world large geospatial datasets.

arXiv.org e-Print Archive

Crossref
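The abstract contrasts its subsampling method with random sampling and describes a thresholding safeguard. The sketch below does not reproduce the authors' coreset construction; it shows only the random-sampling baseline for a 2-D Gaussian kernel density estimate, plus one assumed way to make thresholding conservative: lower the threshold τ by an error bound ε, so that no grid cell whose full-data density is at least τ is dropped. The data, bandwidth `h`, grid size, sample size, and τ are all hypothetical. Here ε is measured directly against the full KDE for illustration, whereas in practice it would have to come from an error guarantee of the sampling or coreset method.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kde(points: List[Point], query: Point, bandwidth: float) -> float:
    """Average of unnormalised 2-D Gaussian kernels, so values lie in (0, 1]."""
    qx, qy = query
    total = 0.0
    for x, y in points:
        d2 = (x - qx) ** 2 + (y - qy) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    return total / len(points)


def grid(n: int, lo: float = 0.0, hi: float = 1.0) -> List[Point]:
    """n x n evaluation grid over [lo, hi]^2."""
    step = (hi - lo) / (n - 1)
    return [(lo + i * step, lo + j * step) for i in range(n) for j in range(n)]


rng = random.Random(0)
# Hypothetical data: two Gaussian clusters in the unit square.
data = [(rng.gauss(0.3, 0.05), rng.gauss(0.3, 0.05)) for _ in range(1000)]
data += [(rng.gauss(0.7, 0.1), rng.gauss(0.6, 0.1)) for _ in range(1000)]
sample = rng.sample(data, 200)  # random-sampling baseline

h = 0.05
cells = grid(20)
full = [kde(data, c, h) for c in cells]
approx = [kde(sample, c, h) for c in cells]
max_err = max(abs(f - a) for f, a in zip(full, approx))  # L-infinity error on the grid

tau = 0.2
eps = max_err  # assumed error bound; measured here only for illustration
# Keep every cell whose sampled density clears the lowered threshold tau - eps.
kept = {i for i, a in enumerate(approx) if a >= tau - eps}
# Cells that truly reach tau but were dropped (empty whenever eps >= max_err).
missed = [i for i, f in enumerate(full) if f >= tau and i not in kept]
```

Because every sampled value is within ε of the true value, any cell with true density at least τ has sampled density at least τ − ε and therefore survives; the cost is that some cells below τ may also be kept. A method with smaller error than random sampling would tighten ε and reduce those false positives.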