Synthetic Data Generation for the Internet of Things
The concept of the Internet of Things (IoT) is rapidly moving from a vision to being pervasive in our everyday lives, as can be observed in the integration of connected sensors across a multitude of devices such as mobile phones, healthcare equipment, and vehicles. There is a need for infrastructure support and analytical tools to handle IoT data, which are naturally big and complex. However, research on IoT data can be constrained by concerns about the release of privately owned data. In this paper, we present the design and implementation results of a synthetic IoT data generation framework. The framework enables research on synthetic data that exhibits the complex characteristics of original data without compromising proprietary information or personal privacy.
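The abstract does not describe the framework's internals, but the core idea of synthetic sensor data generation can be illustrated with a minimal sketch: produce readings that mimic the statistical shape of a real sensor (here, an assumed daily cycle plus Gaussian noise) without exposing any real measurements. The parameters and signal model below are illustrative assumptions, not the paper's method.

```python
import math
import random

def synthesize_readings(n, mean=21.0, std=1.5, period=288, seed=42):
    """Generate n synthetic temperature-like readings: a sinusoidal
    daily cycle (period = samples per day) plus Gaussian noise."""
    rng = random.Random(seed)
    readings = []
    for t in range(n):
        cycle = math.sin(2 * math.pi * t / period)  # daily pattern
        readings.append(mean + 2.0 * cycle + rng.gauss(0, std))
    return readings

data = synthesize_readings(1000)
```

A real framework would fit these parameters from the original data rather than hard-code them, so the synthetic stream preserves the source's distribution while revealing none of its records.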
A Systematic Review of Knowledge Visualization Approaches Using Big Data Methodology for Clinical Decision Support
This chapter reports results from a systematic review of peer-reviewed studies related to big data knowledge visualization for clinical decision support (CDS). The aims were to identify and synthesize sources of big data in knowledge visualization, identify visualization interactivity approaches for CDS, and summarize outcomes. Searches were conducted via PubMed, Embase, Ebscohost, CINAHL, Medline, Web of Science, and IEEE Xplore in April 2019, using search terms representing the concepts of big data, knowledge visualization, and clinical decision support. A Google Scholar gray literature search was also conducted. All references were screened for eligibility. Our review returned 3252 references, with 17 studies remaining after screening. Data were extracted and coded from these studies and analyzed using a PICOS framework. The most common intended audience of the studies was healthcare providers (n = 16); the most common source of big data was electronic health records (EHRs) (n = 12), followed by microbiology/pathology laboratory data (n = 8). The most common intervention type was some form of analysis platform/tool (n = 7). We identified and classified studies by visualization type, user intent, big data platforms and tools used, big data analytics methods, and outcomes from big data knowledge visualization of CDS applications.
Big Data Analysis: Apache Storm Perspective
Abstract: The boom in technology has resulted in the emergence of new concepts and challenges. Big data is one of the most talked-about terms today and is becoming a synonym for competitive advantage in business rivalries. Despite its enormous benefits, big data brings serious challenges, and analyzing it requires careful thought. This study explores big data terminology and its analysis concepts using a sample of Twitter data, with the help of Apache Storm, one of the industry's most trusted real-time, fault-tolerant processing tools.
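Storm structures a computation as a topology of spouts (sources) and bolts (processing steps). The following is not the Storm API itself but an in-process Python sketch of the same dataflow idea, applied to the kind of Twitter word-count analysis the study describes; the sample tweets are invented.

```python
from collections import Counter

def tweet_spout(tweets):
    """Emits tweets one at a time, like a Storm spout."""
    for tweet in tweets:
        yield tweet

def split_bolt(stream):
    """Splits each incoming tweet into words, like a Storm bolt."""
    for tweet in stream:
        for word in tweet.lower().split():
            yield word

def count_bolt(stream):
    """Maintains running word counts, like a terminal Storm bolt."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts

tweets = ["big data is big", "storm processes big data"]
counts = count_bolt(split_bolt(tweet_spout(tweets)))
```

In real Storm, each stage runs as distributed tasks with tuples acknowledged for fault tolerance; the pipeline of generators above only mirrors the logical wiring.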
A Comparative Study of Different Log Analyzer Tools to Analyze User Behaviors
With the explosive growth of information available on the internet, the WWW has become the most powerful platform to broadcast, store, and retrieve information. As many people turn to the internet to gather information, analyzing user behavior from web access logs can help create adaptive systems, recommender systems, and intelligent e-commerce applications. Web access log files record the interactions between users and websites over the internet; they contain details such as the user name, IP address, timestamp, access request, number of bytes transferred, result status, and referring URL. A variety of analyzer tools exist to analyze such user behavior. This paper provides a comparative study of well-known log analyzer tools based on their features and performance.
DOI: 10.17762/ijritcc2321-8169.150510
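The fields the abstract lists (IP address, user name, timestamp, request, result status, bytes transferred, referring URL) correspond closely to the Combined Log Format that most web servers can emit. As a minimal sketch of what any such analyzer does first, the snippet below parses one log line with a regular expression; the exact layout varies by server configuration, and the sample line is fabricated.

```python
import re

# Assumed Combined-Log-Format-like layout; real logs vary by server config.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referrer>[^"]*)"'
)

def parse_log_line(line):
    """Extract IP, user, timestamp, request, status, bytes, and
    referrer from a single access-log line; None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

line = ('203.0.113.7 - alice [10/Oct/2023:13:55:36 +0000] '
        '"GET /index.html HTTP/1.1" 200 2326 "http://example.com/"')
entry = parse_log_line(line)
```

Aggregating the parsed dictionaries (counts per IP, per URL, per status code) is the basis of the behavior analysis the compared tools automate.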
THE MODELING OF "MUSTAHIQ" DATA USING K-MEANS CLUSTERING ALGORITHM AND BIG DATA ANALYSIS (CASE STUDY: LAZ)
Today, Mustahiq data in LAZ (Lembaga Amil Zakat, the Amil Zakat Agency) are spread across many locations, and each LAZ holds Mustahiq data whose types differ from those of other LAZ. Because of these differences in data types, the very large datasets cannot be used together even though they serve the same purpose: determining Mustahiq. It is also very difficult to know whether Mustahiq data are still up to date, owing to the non-uniform data types, long time spans, and sheer volume of data. Distributing zakat to Mustahiq requires timely information, so LAZ find it difficult to monitor the progress of each Mustahiq; it is even possible for a Mustahiq's condition to change such that they become a Muzaki. This motivated the researchers to take up this theme, in order to help existing LAZ cluster Mustahiq data more easily. The clustered data can then be used by LAZ managers to develop the organization, and can serve as a reference for determining which cluster of recipients is entitled to zakat. The research is "Modeling using the K-Means Algorithm and Big Data analysis to determine Mustahiq data". We obtained Mustahiq data from a random sample collected through online and offline surveys: the online survey used Google Forms, and the offline survey data came from BAZNAS (the National Amil Zakat Agency) in Indonesia and another zakat agency (LAZ) in Jakarta. We combined the data and analyzed it using big data techniques and the K-Means algorithm, which clusters n objects into k partitions based on their attributes, according to criteria determined from the large and diverse Mustahiq data. This research focuses on modeling that applies the K-Means algorithm and big data analysis. We first built tools for grouping the simulation test data.
We ran several experimental and simulation scenarios to find a model for mapping Mustahiq data and to develop the best model for processing the data. The results of this study are presented in tabular and graphical form as the proposed Mustahiq data processing model for the Zakat Agency (LAZ). In the simulation, of a total of 1109 respondents, 300 fell into the Mustahiq cluster and 809 into the non-Mustahiq cluster, with an accuracy rate of 83.40%, meaning the system model is able to determine Mustahiq data accurately. Filtering the results by gender ("Male") gives an accuracy of 83.93%; by age ("30-39"), 71.03%; by job ("PNS"), 83.39%; and by education ("S1"), 83.79%. The research is expected to make it possible to determine quickly whether a person meets the criteria of a Mustahiq or a Muzaki for the LAZ (Amil Zakat Agency). The resulting K-Means clustering application program can also be used if UIN Syarif Hidayatullah Jakarta wants to develop its own LAZ (Amil Zakat Agency).
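The K-Means procedure the abstract relies on (assign each of n objects to the nearest of k centroids, recompute centroids as cluster means, repeat) can be sketched in a few lines. The feature names and sample values below are hypothetical stand-ins, not the paper's survey data.

```python
def kmeans(points, k, iters=10):
    """Plain k-means: seed centroids with the first k points, assign each
    point to its nearest centroid, recompute centroids as cluster means."""
    def nearest(p, cents):
        return min(range(len(cents)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, cents[c])))

    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, cluster in enumerate(clusters):
            if cluster:  # keep old centroid if a cluster empties out
                centroids[i] = [sum(xs) / len(xs) for xs in zip(*cluster)]
    return [nearest(p, centroids) for p in points]

# Hypothetical (income, dependents) features for six respondents.
points = [(1.0, 4.0), (8.0, 1.0), (1.2, 5.0), (0.9, 4.5), (8.2, 0.5), (7.8, 1.5)]
labels = kmeans(points, 2)
```

With k = 2 this separates the low-income/many-dependents group from the rest, which mirrors the paper's Mustahiq versus non-Mustahiq split; in practice initialization is usually randomized and repeated rather than taken from the first k points.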
Experimental evaluation of big data querying tools
In recent years, the term Big Data has become a widely debated topic in several
business areas. One of the main challenges related to this concept is how to
handle the enormous volume and variety of data efficiently. Given the notorious
complexity and volume of data associated with the concept of Big Data, efficient
querying mechanisms are needed for data analysis purposes. Motivated by the
rapid development of Big Data tools and frameworks, there is much discussion
about querying tools and, more specifically, about which are the most
appropriate for specific analytical needs. This dissertation describes and
compares the main features and architectures of the following well-known Big
Data analytical tools: Drill, HAWQ, Hive, Impala, Presto, and Spark. To test the
performance of these tools, we also describe the process of preparing,
configuring, and administering a Hadoop cluster on which to install and use
them, providing an environment capable of evaluating their performance and
identifying the scenarios best suited to each. To carry out this evaluation, we
used the TPC-H and TPC-DS benchmarks, whose results showed that in-memory
processing tools such as HAWQ, Impala, and Presto deliver better results and
performance on small and medium-sized datasets. However, the tools with slower
execution times, especially Hive, appear to catch up with the better-performing
tools as the benchmark datasets grow.
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance
of graph data has driven the development of numerous new graph-parallel systems
(e.g., Pregel, GraphLab). By restricting the computation that can be expressed
and introducing new techniques to partition and distribute the graph, these
systems can efficiently execute iterative graph algorithms orders of magnitude
faster than more general data-parallel systems. However, the same restrictions
that enable the performance gains also make it difficult to express many of the
important stages in a typical graph-analytics pipeline: constructing the graph,
modifying its structure, or expressing computation that spans multiple graphs.
As a consequence, existing graph analytics pipelines compose graph-parallel and
data-parallel systems using external storage systems, leading to extensive data
movement and a complicated programming model.
To address these challenges we introduce GraphX, a distributed graph
computation framework that unifies graph-parallel and data-parallel
computation. GraphX provides a small, core set of graph-parallel operators
expressive enough to implement the Pregel and PowerGraph abstractions, yet
simple enough to be cast in relational algebra. GraphX uses a collection of
query optimization techniques such as automatic join rewrites to efficiently
implement these graph-parallel operators. We evaluate GraphX on real-world
graphs and workloads and demonstrate that GraphX achieves performance
comparable to specialized graph computation systems, while outperforming them
in end-to-end graph pipelines. Moreover, GraphX strikes a balance between
expressiveness, performance, and ease of use.
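The Pregel abstraction that GraphX's operators can express boils down to an iterate-until-quiescent loop of message sending and vertex updates. The sketch below is not the GraphX Scala API but a single-machine Python illustration of that abstraction, computing connected components (each vertex repeatedly adopts the smallest label heard from a neighbor); the example graph is invented.

```python
def pregel_components(vertices, edges, max_iters=10):
    """Pregel-style connected components: vertices exchange labels along
    edges and keep the minimum, iterating until no label changes."""
    label = {v: v for v in vertices}
    for _ in range(max_iters):
        # "Send" phase: each edge delivers its endpoints' labels across.
        inbox = {v: [] for v in vertices}
        for u, v in edges:
            inbox[u].append(label[v])
            inbox[v].append(label[u])
        # "Compute" phase: each vertex keeps the minimum label it has seen.
        changed = False
        for v in vertices:
            new = min([label[v]] + inbox[v])
            if new != label[v]:
                label[v] = new
                changed = True
        if not changed:  # quiescence: no messages would change anything
            break
    return label

labels = pregel_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

In GraphX the send and compute phases become joins and aggregations over vertex and edge collections, which is what lets the same loop be optimized with relational techniques such as automatic join rewrites.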