179,913 research outputs found
Big Data Refinement
"Big data" has become a major area of research and associated funding, as well as a focus of utopian thinking. In the still growing research community, one of the favourite optimistic analogies for data processing is that of the oil refinery, extracting the essence out of the raw data. Pessimists look for their imagery to the other end of the petrol cycle, and talk about the "data exhausts" of our society.
Obviously, the refinement community knows how to do "refining". This paper explores the extent to which notions of refinement and data in the formal methods community relate to the core concepts in "big data". In particular, can the data refinement paradigm be used to explain aspects of big data processing?
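To make the formal-methods sense of "data refinement" concrete, here is a minimal illustrative sketch (names and representations are my own, not the paper's): an abstract specification whose state is a set is implemented by a concrete store whose state is a list, and a retrieval (abstraction) function links the two so the implementation can be checked against the spec.

```python
# Hypothetical sketch of classical data refinement: an abstract
# specification (state = a set) is implemented by a concrete
# representation (state = a list, duplicates allowed), linked by a
# retrieval/abstraction function.

class AbstractStore:
    """Abstract spec: state is a set; operations are defined on sets."""
    def __init__(self):
        self.state = set()

    def add(self, x):
        self.state = self.state | {x}

class ConcreteStore:
    """Concrete implementation: state is a list (may hold duplicates)."""
    def __init__(self):
        self.state = []

    def add(self, x):
        self.state.append(x)

    def retrieve(self):
        # Abstraction function: maps the concrete state to the abstract one.
        return set(self.state)

# Refinement check: after the same sequence of operations, the retrieved
# concrete state must equal the abstract state (the implementation
# simulates the specification).
a, c = AbstractStore(), ConcreteStore()
for x in [3, 1, 3, 2]:
    a.add(x)
    c.add(x)
assert c.retrieve() == a.state
```

The question the paper raises is whether this picture (abstract data, concrete representation, abstraction function) transfers to "refining" raw big data into useful form.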
A COMPREHENSIVE SURVEY ON BIG-DATA RESEARCH AND ITS IMPLICATIONS - WHAT IS REALLY "NEW" IN BIG DATA? - IT'S COGNITIVE BIG DATA!
What is really "new" in Big Data? Big Data seems to be a hype that has emerged over the past years, but it requires a more thorough discussion beyond the very common 3V (velocity, volume, and variety) approach. We established an expert group to re-discuss the notion of Big Data, identify new characteristics, and re-think what is actually new in the idea of Big Data by analysing over 100 literature resources. We identified typical baseline scenarios (traffic, business processes, retail, health, and social media) as a starting point, from which we explored the notion of Big Data from a different viewpoint. We concluded that the idea of Big Data is not new in itself, and that we need to re-think our approach towards it. We introduce a fully new way of thinking about Big Data, and coin it the trend of "Cognitive Big Data". The publication introduces a basic framework for our research results. However, this work remains work-in-progress, and we will continue with a refinement of the Cognitive Big Data Framework in a future publication
Multi-Resolution Texture Coding for Multi-Resolution 3D Meshes
We present an innovative system to encode and transmit textured multi-resolution 3D meshes in a progressive way, with no need to send several texture images, one for each mesh LOD (Level Of Detail). All texture LODs are created from the finest one (associated to the finest mesh), but can be reconstructed progressively from the coarsest thanks to refinement images calculated in the encoding process, and transmitted only if needed. This allows us to adjust the LOD/quality of both 3D mesh and texture according to the rendering power of the device that will display them, and to the network capacity. Additionally, we achieve big savings in data transmission by avoiding texture coordinates altogether, as they are generated automatically by an unwrapping system agreed upon by both encoder and decoder
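The coarse-plus-refinement idea in this abstract can be sketched in a few lines of numpy (this is an illustrative toy, not the paper's codec): the encoder downsamples the finest texture LOD to a coarse one and computes a residual "refinement image"; the decoder sends the coarse LOD first and adds the residual only when the finer LOD is needed.

```python
import numpy as np

# Toy sketch of progressive texture refinement: a fine texture LOD is
# reconstructed from the coarsest LOD plus a transmitted refinement
# (residual) image. Grayscale, power-of-two sizes, for simplicity.

def downsample(tex):
    # Coarser LOD by 2x2 block averaging.
    h, w = tex.shape
    return tex.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(tex):
    # Nearest-neighbour upsampling back to the finer resolution.
    return tex.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
fine = rng.random((8, 8))          # finest texture LOD
coarse = downsample(fine)          # coarsest LOD, transmitted first

# Encoder computes the refinement image; it is transmitted only if the
# client actually needs the finer LOD.
refinement = fine - upsample(coarse)

# Decoder reconstructs the fine LOD progressively.
reconstructed = upsample(coarse) + refinement
assert np.allclose(reconstructed, fine)
```

With a chain of such residuals, one per LOD, the client can stop at whatever resolution its rendering power and network capacity allow.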
A CRIS Data Science Investigation of Scientific Workflows of Agriculture Big Data and its Data Curation Elements
This joint collaboration between the Purdue Libraries and Cyber Center demonstrates the next generation of computational platforms supporting interdisciplinary collaborative research. Such platforms are necessary for rapid advancements of technology, industry demand and scholarly congruence towards open data, open access, big data and cyber-infrastructure data science training. Our approach will utilize a Discovery Undergraduate Research Investigation effort as a preliminary research means to further joint library and computer science data curation research, tool development and refinement
The Dual JL Transforms and Superfast Matrix Algorithms
We call a matrix algorithm superfast (aka running at sublinear cost) if it
involves much fewer flops and memory cells than the matrix has entries. Using
such algorithms is highly desired or even imperative in computations for Big
Data, which involve immense matrices and are quite typically reduced to solving
linear least squares problem and/or computation of low rank approximation of an
input matrix. The known algorithms for these problems are not superfast, but we
prove that their certain superfast modifications output reasonable or even
nearly optimal solutions for large input classes. We also propose, analyze, and
test a novel superfast algorithm for iterative refinement of any crude but
sufficiently close low rank approximation of a matrix. The results of our
numerical tests are in good accordance with our formal study.Comment: 36.1 pages, 5 figures, and 1 table. arXiv admin note: text overlap
with arXiv:1710.07946, arXiv:1906.0411
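The iterative-refinement idea can be illustrated with a dense numpy sketch: starting from a crude basis for the dominant column space, a few subspace (power) iteration steps improve the rank-r approximation. Note this shows only the refinement loop; the paper's contribution is doing such refinement at sublinear ("superfast") cost, which this dense sketch makes no attempt at.

```python
import numpy as np

# Illustrative (dense, NOT superfast) sketch: refine a crude low-rank
# approximation of A by subspace/power iteration on an orthonormal
# basis Q of the approximate dominant column space.

def refine_low_rank(A, Q, steps=3):
    """Improve basis Q for A's dominant column space, return Q Q^T A."""
    for _ in range(steps):
        Q, _ = np.linalg.qr(A @ (A.T @ Q))  # one power-iteration step
    return Q @ (Q.T @ A)                    # refined rank-r approximation

rng = np.random.default_rng(1)
# Test matrix with rapidly decaying singular values.
U, _ = np.linalg.qr(rng.standard_normal((60, 60)))
V, _ = np.linalg.qr(rng.standard_normal((40, 40)))
s = 2.0 ** -np.arange(40)
A = U[:, :40] @ np.diag(s) @ V.T

r = 5
Q0, _ = np.linalg.qr(rng.standard_normal((60, r)))  # crude starting basis
crude = Q0 @ (Q0.T @ A)
refined = refine_low_rank(A, Q0)

# Refinement should shrink the approximation error.
assert np.linalg.norm(A - refined) < np.linalg.norm(A - crude)
```

Because the singular values decay geometrically, each iteration contracts the error toward that of the optimal rank-r truncation.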
FlashProfile: A Framework for Synthesizing Data Profiles
We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios.

Prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words, etc.), and provide no control over granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns, that also allows for interactive refinement by requesting a desired number of clusters.

Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across tasks over large real datasets, we observe a median profiling time of only s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as FlashFill.

Comment: 28 pages, SPLASH (OOPSLA) 201
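A toy version of syntactic profiling can convey the idea (this is a simplification in the spirit of the abstract, not FlashProfile's actual algorithm): abstract each string to a regex-like pattern over a tiny fixed token language, then cluster strings that share a pattern. FlashProfile instead synthesizes patterns over a configurable pattern language and supports refining the cluster count interactively.

```python
import re
from collections import defaultdict

# Toy syntactic profiler: map each string to a regex-like pattern over
# a tiny token language (digit runs, letter runs, literal characters),
# then group strings by pattern to form the profile's clusters.

TOKENS = [(r'\d+', r'\d+'), (r'[A-Za-z]+', r'[A-Za-z]+'), (r'.', None)]

def to_pattern(s):
    out, i = [], 0
    while i < len(s):
        for rx, name in TOKENS:
            m = re.match(rx, s[i:])
            if m:
                # Fixed token class, or an escaped literal character.
                out.append(name if name else re.escape(m.group()))
                i += m.end()
                break
    return ''.join(out)

def profile(strings):
    clusters = defaultdict(list)
    for s in strings:
        clusters[to_pattern(s)].append(s)
    return dict(clusters)

data = ["2018-01-02", "1999-12-31", "Jan 2, 2018", "N/A"]
p = profile(data)
# The two ISO-style dates share a pattern; the other formats each get
# their own cluster, so the profile has three patterns.
assert p[to_pattern("2018-01-02")] == ["2018-01-02", "1999-12-31"]
assert len(p) == 3
```

Real profilers must additionally choose pattern granularity (e.g. `\d{4}` vs `\d+`) and trade off cluster count against pattern specificity, which is where synthesis over a pattern language comes in.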
focusing on prevention of water rates delinquency in local waterworks
Thesis (Master) -- KDI School: Master of Public Management, 2018. Recently, there has been growing interest in intelligent technologies such as Big Data, which has become an emerging technology and a key issue in major trends at home and abroad. The purpose of this research is to study and propose a Big Data analysis performance plan for minimizing the occurrence of delinquent customers by analyzing their past payment patterns in a local waterworks operation efficiency project. To accomplish this purpose, the research establishes an analysis performance plan covering an infrastructure POC (proof of concept) as a preliminary step, data collection and extraction, data refinement and transformation, and data analysis and verification in the local waterworks. This paper will serve as an initial guide for Big Data analysis in this area, and will be of interest to policy makers including K-water and municipal officers. The research makes use of existing databases, Korean public data, Korean Statistical Information Service data, and other working papers and journals.

I. Introduction
II. Literature Review
III. Methods
IV. Analysis and Findings: Analysis Performance Plan
V. Conclusion and Suggestion

Seon Ju, KIM (Master thesis, published)
A UML Profile for the Design, Quality Assessment and Deployment of Data-intensive Applications
Big Data or Data-Intensive applications (DIAs) seek to mine, manipulate, extract or otherwise exploit the potential intelligence hidden behind Big Data. However, several practitioner surveys remark that the potential of DIAs is still untapped because of very difficult and costly design, quality assessment and continuous refinement. To address the above shortcoming, we propose the use of a UML domain-specific modeling language or profile specifically tailored to support the design, assessment and continuous deployment of DIAs. This article illustrates our DIA-specific profile and outlines its usage in the context of DIA performance engineering and deployment. For DIA performance engineering, we rely on the Apache Hadoop technology, while for DIA deployment, we leverage the TOSCA language. We conclude that the proposed profile offers a powerful language for data-intensive software and systems modeling, quality evaluation and automated deployment of DIAs on private or public clouds