179,913 research outputs found
Big Data Refinement
"Big data" has become a major area of research and associated funding, as well as a focus of utopian thinking. In the still growing research community, one of the favourite optimistic analogies for data processing is that of the oil refinery, extracting the essence out of the raw data. Pessimists look for their imagery to the other end of the petrol cycle, and talk about the "data exhausts" of our society.
Obviously, the refinement community knows how to do "refining". This paper explores the extent to which notions of refinement and data in the formal methods community relate to the core concepts in "big data". In particular, can the data refinement paradigm be used to explain aspects of big data processing?
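To make the formal-methods sense of "data refinement" concrete, here is a minimal illustrative sketch (names and representations are my own, not the paper's): an abstract specification whose state is a set is implemented by a concrete store whose state is a list, and a retrieval (abstraction) function links the two so the implementation can be checked against the spec.

```python
# Hypothetical sketch of classical data refinement: an abstract
# specification (state = a set) is implemented by a concrete
# representation (state = a list, duplicates allowed), linked by a
# retrieval/abstraction function.

class AbstractStore:
    """Abstract spec: state is a set; operations are defined on sets."""
    def __init__(self):
        self.state = set()

    def add(self, x):
        self.state = self.state | {x}

class ConcreteStore:
    """Concrete implementation: state is a list (may hold duplicates)."""
    def __init__(self):
        self.state = []

    def add(self, x):
        self.state.append(x)

    def retrieve(self):
        # Abstraction function: maps the concrete state to the abstract one.
        return set(self.state)

# Refinement check: after the same sequence of operations, the retrieved
# concrete state must equal the abstract state (the implementation
# simulates the specification).
a, c = AbstractStore(), ConcreteStore()
for x in [3, 1, 3, 2]:
    a.add(x)
    c.add(x)
assert c.retrieve() == a.state
```

The question the paper raises is whether this picture (abstract data, concrete representation, abstraction function) transfers to "refining" raw big data into useful form.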
A COMPREHENSIVE SURVEY ON BIG-DATA RESEARCH AND ITS IMPLICATIONS - WHAT IS REALLY "NEW" IN BIG DATA? - IT'S COGNITIVE BIG DATA!
What is really "new" in Big Data? Big Data seems to be a hype that has emerged over the past years, but it requires a more thorough discussion beyond the very common 3V (velocity, volume, and variety) approach. We established an expert group to re-discuss the notion of Big Data, identify new characteristics, and re-think what is actually new in the idea of Big Data by analysing over 100 literature resources. We identified typical baseline scenarios (traffic, business processes, retail, health, and social media) as a starting point, from which we explored the notion of Big Data from a different viewpoint. We concluded that the idea of Big Data is not new in itself, and that we need to re-think our approach towards it. We introduce a fully new way of thinking about Big Data, and coin it the trend of "Cognitive Big Data". The publication introduces a basic framework for our research results. However, this work remains work-in-progress, and we will continue with a refinement of the Cognitive Big Data Framework in a future publication
Multi-Resolution Texture Coding for Multi-Resolution 3D Meshes
We present an innovative system to encode and transmit textured multi-resolution 3D meshes in a progressive way, with no need to send several texture images, one for each mesh LOD (Level Of Detail). All texture LODs are created from the finest one (associated to the finest mesh), but can be reconstructed progressively from the coarsest thanks to refinement images calculated in the encoding process, and transmitted only if needed. This allows us to adjust the LOD/quality of both 3D mesh and texture according to the rendering power of the device that will display them, and to the network capacity. Additionally, we achieve big savings in data transmission by avoiding texture coordinates altogether, as they are generated automatically by an unwrapping system agreed upon by both encoder and decoder
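The coarse-plus-refinement idea in this abstract can be sketched in a few lines of numpy (this is an illustrative toy, not the paper's codec): the encoder downsamples the finest texture LOD to a coarse one and computes a residual "refinement image"; the decoder sends the coarse LOD first and adds the residual only when the finer LOD is needed.

```python
import numpy as np

# Toy sketch of progressive texture refinement: a fine texture LOD is
# reconstructed from the coarsest LOD plus a transmitted refinement
# (residual) image. Grayscale, power-of-two sizes, for simplicity.

def downsample(tex):
    # Coarser LOD by 2x2 block averaging.
    h, w = tex.shape
    return tex.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(tex):
    # Nearest-neighbour upsampling back to the finer resolution.
    return tex.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
fine = rng.random((8, 8))          # finest texture LOD
coarse = downsample(fine)          # coarsest LOD, transmitted first

# Encoder computes the refinement image; it is transmitted only if the
# client actually needs the finer LOD.
refinement = fine - upsample(coarse)

# Decoder reconstructs the fine LOD progressively.
reconstructed = upsample(coarse) + refinement
assert np.allclose(reconstructed, fine)
```

With a chain of such residuals, one per LOD, the client can stop at whatever resolution its rendering power and network capacity allow.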
A CRIS Data Science Investigation of Scientific Workflows of Agriculture Big Data and its Data Curation Elements
This joint collaboration between the Purdue Libraries and Cyber Center demonstrates the next generation of computational platforms supporting interdisciplinary collaborative research. Such platforms are necessary for rapid advancements of technology, industry demand and scholarly congruence towards open data, open access, big data and cyber-infrastructure data science training. Our approach will utilize a Discovery Undergraduate Research Investigation effort as a preliminary research means to further joint library and computer science data curation research, tool development and refinement
The Dual JL Transforms and Superfast Matrix Algorithms
We call a matrix algorithm superfast (aka running at sublinear cost) if it
involves much fewer flops and memory cells than the matrix has entries. Using
such algorithms is highly desired or even imperative in computations for Big
Data, which involve immense matrices and are quite typically reduced to solving
linear least squares problem and/or computation of low rank approximation of an
input matrix. The known algorithms for these problems are not superfast, but we
prove that their certain superfast modifications output reasonable or even
nearly optimal solutions for large input classes. We also propose, analyze, and
test a novel superfast algorithm for iterative refinement of any crude but
sufficiently close low rank approximation of a matrix. The results of our
numerical tests are in good accordance with our formal study.Comment: 36.1 pages, 5 figures, and 1 table. arXiv admin note: text overlap
with arXiv:1710.07946, arXiv:1906.0411
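The iterative-refinement idea can be illustrated with a dense numpy sketch: starting from a crude basis for the dominant column space, a few subspace (power) iteration steps improve the rank-r approximation. Note this shows only the refinement loop; the paper's contribution is doing such refinement at sublinear ("superfast") cost, which this dense sketch makes no attempt at.

```python
import numpy as np

# Illustrative (dense, NOT superfast) sketch: refine a crude low-rank
# approximation of A by subspace/power iteration on an orthonormal
# basis Q of the approximate dominant column space.

def refine_low_rank(A, Q, steps=3):
    """Improve basis Q for A's dominant column space, return Q Q^T A."""
    for _ in range(steps):
        Q, _ = np.linalg.qr(A @ (A.T @ Q))  # one power-iteration step
    return Q @ (Q.T @ A)                    # refined rank-r approximation

rng = np.random.default_rng(1)
# Test matrix with rapidly decaying singular values.
U, _ = np.linalg.qr(rng.standard_normal((60, 60)))
V, _ = np.linalg.qr(rng.standard_normal((40, 40)))
s = 2.0 ** -np.arange(40)
A = U[:, :40] @ np.diag(s) @ V.T

r = 5
Q0, _ = np.linalg.qr(rng.standard_normal((60, r)))  # crude starting basis
crude = Q0 @ (Q0.T @ A)
refined = refine_low_rank(A, Q0)

# Refinement should shrink the approximation error.
assert np.linalg.norm(A - refined) < np.linalg.norm(A - crude)
```

Because the singular values decay geometrically, each iteration contracts the error toward that of the optimal rank-r truncation.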
FlashProfile: A Framework for Synthesizing Data Profiles
We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios.

Prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words, etc.), and provide no control over granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns, that also allows for interactive refinement by requesting a desired number of clusters.

Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across tasks over large real datasets, we observe a median profiling time of only s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as FlashFill.

Comment: 28 pages, SPLASH (OOPSLA) 201
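A toy version of syntactic profiling can convey the idea (this is a simplification in the spirit of the abstract, not FlashProfile's actual algorithm): abstract each string to a regex-like pattern over a tiny fixed token language, then cluster strings that share a pattern. FlashProfile instead synthesizes patterns over a configurable pattern language and supports refining the cluster count interactively.

```python
import re
from collections import defaultdict

# Toy syntactic profiler: map each string to a regex-like pattern over
# a tiny token language (digit runs, letter runs, literal characters),
# then group strings by pattern to form the profile's clusters.

TOKENS = [(r'\d+', r'\d+'), (r'[A-Za-z]+', r'[A-Za-z]+'), (r'.', None)]

def to_pattern(s):
    out, i = [], 0
    while i < len(s):
        for rx, name in TOKENS:
            m = re.match(rx, s[i:])
            if m:
                # Fixed token class, or an escaped literal character.
                out.append(name if name else re.escape(m.group()))
                i += m.end()
                break
    return ''.join(out)

def profile(strings):
    clusters = defaultdict(list)
    for s in strings:
        clusters[to_pattern(s)].append(s)
    return dict(clusters)

data = ["2018-01-02", "1999-12-31", "Jan 2, 2018", "N/A"]
p = profile(data)
# The two ISO-style dates share a pattern; the other formats each get
# their own cluster, so the profile has three patterns.
assert p[to_pattern("2018-01-02")] == ["2018-01-02", "1999-12-31"]
assert len(p) == 3
```

Real profilers must additionally choose pattern granularity (e.g. `\d{4}` vs `\d+`) and trade off cluster count against pattern specificity, which is where synthesis over a pattern language comes in.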
focusing on prevention of water rates delinquency in local waterworks
Thesis (Master) -- KDI School: Master of Public Management, 2018. Recently, there has been growing interest in intelligent technologies such as Big Data, which has become an emerging technology and a key issue in major trends at home and abroad. The purpose of this research is to study and propose a Big Data analysis performance plan for minimizing the occurrence of delinquent customers by analyzing their past payment patterns in a local waterworks operation efficiency project. To accomplish this purpose, the research establishes an analysis performance plan covering an infrastructure POC (proof of concept) as a preliminary step, data collection and extraction, data refinement and transformation, and data analysis and verification in the local waterworks. This paper will serve as an initial guide for Big Data analysis in this area, and will be of interest to policy makers including K-water and municipal officers. The research makes use of existing databases, Korean public data, Korean Statistical Information Service data, and other working papers and journals.

I. Introduction
II. Literature Review
III. Methods
IV. Analysis and Findings: Analysis Performance Plan
V. Conclusion and Suggestion

Seon Ju, KIM (Master thesis, published)
A UML Profile for the Design, Quality Assessment and Deployment of Data-intensive Applications
Big Data or Data-Intensive applications (DIAs) seek to mine, manipulate, extract or otherwise exploit the potential intelligence hidden behind Big Data. However, several practitioner surveys remark that the potential of DIAs is still untapped because of very difficult and costly design, quality assessment and continuous refinement. To address the above shortcoming, we propose the use of a UML domain-specific modeling language or profile specifically tailored to support the design, assessment and continuous deployment of DIAs. This article illustrates our DIA-specific profile and outlines its usage in the context of DIA performance engineering and deployment. For DIA performance engineering, we rely on the Apache Hadoop technology, while for DIA deployment, we leverage the TOSCA language. We conclude that the proposed profile offers a powerful language for data-intensive software and systems modeling, quality evaluation and automated deployment of DIAs on private or public clouds