
    Big Data Refinement

    "Big data" has become a major area of research and associated funding, as well as a focus of utopian thinking. In the still growing research community, one of the favourite optimistic analogies for data processing is that of the oil refinery, extracting the essence out of the raw data. Pessimists look for their imagery to the other end of the petrol cycle, and talk about the "data exhausts" of our society. Obviously, the refinement community knows how to do "refining". This paper explores the extent to which notions of refinement and data in the formal methods community relate to the core concepts in "big data". In particular, can the data refinement paradigm can be used to explain aspects of big data processing

    A COMPREHENSIVE SURVEY ON BIG-DATA RESEARCH AND ITS IMPLICATIONS – WHAT IS REALLY ‘NEW’ IN BIG DATA? - IT’S COGNITIVE BIG DATA!

    What is really ‘new’ in Big Data? Big Data seems to be a hype that has emerged in recent years, but it requires a more thorough discussion beyond the very common 3V (velocity, volume, and variety) approach. We established an expert group to re-discuss the notion of Big Data, identify new characteristics, and re-think what actually is new in the idea of Big Data by analysing over 100 literature resources. We identified typical baseline scenarios (traffic, business processes, retail, health, and social media) as a starting point, from which we explored the notion of Big Data from a different viewpoint. We conclude that the idea of Big Data is simply not new, and that we need to re-think our approach towards it. We introduce a fully new way of thinking about Big Data, and coin it the trend of ‘Cognitive Big Data’. The publication introduces a basic framework for our research results. However, this remains work in progress, and we will continue with a refinement of the Cognitive Big Data Framework in a future publication.

    Multi-Resolution Texture Coding for Multi-Resolution 3D Meshes

    We present an innovative system to encode and transmit textured multi-resolution 3D meshes in a progressive way, with no need to send several texture images, one for each mesh LOD (Level Of Detail). All texture LODs are created from the finest one (associated with the finest mesh), but can be reconstructed progressively from the coarsest thanks to refinement images calculated in the encoding process, and transmitted only if needed. This allows us to adjust the LOD/quality of both 3D mesh and texture according to the rendering power of the device that will display them, and to the network capacity. Additionally, we achieve big savings in data transmission by avoiding texture coordinates altogether, as they are generated automatically by an unwrapping system agreed upon by both encoder and decoder.
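
    A minimal sketch of the progressive idea described above, under two assumptions not taken from the paper (dyadic 2x LODs and nearest-neighbour upsampling as the prediction filter; the codec's actual filters and entropy coding are omitted): each refinement image is the residual between a texture LOD and the prediction upsampled from the next coarser LOD.

    import numpy as np

    def upsample2x(img):
        # Nearest-neighbour 2x upsampling as a stand-in prediction filter.
        return img.repeat(2, axis=0).repeat(2, axis=1)

    def encode(lods):
        # lods[0] is the coarsest texture, lods[-1] the finest; emit the
        # coarsest image plus one refinement (residual) image per level.
        residuals = [fine - upsample2x(coarse)
                     for coarse, fine in zip(lods, lods[1:])]
        return lods[0], residuals

    def decode(coarsest, residuals, upto):
        # Stop early ('upto') to match device rendering power / bandwidth.
        img = coarsest
        for r in residuals[:upto]:
            img = upsample2x(img) + r
        return img

    levels = [np.random.rand(16 << i, 16 << i) for i in range(3)]
    base, res = encode(levels)
    assert np.allclose(decode(base, res, 2), levels[2])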

    A CRIS Data Science Investigation of Scientific Workflows of Agriculture Big Data and its Data Curation Elements

    This joint collaboration between the Purdue Libraries and the Cyber Center demonstrates the next generation of computational platforms supporting interdisciplinary collaborative research. Such platforms are necessary given the rapid advancement of technology, industry demand, and the scholarly movement towards open data, open access, big data, and cyber-infrastructure data science training. Our approach will utilize a Discovery Undergraduate Research Investigation effort as a preliminary research means to further joint library and computer science data curation research, tool development, and refinement.

    The Dual JL Transforms and Superfast Matrix Algorithms

    We call a matrix algorithm superfast (aka running at sublinear cost) if it involves far fewer flops and memory cells than the matrix has entries. Such algorithms are highly desirable or even imperative in computations for Big Data, which involve immense matrices and are quite typically reduced to solving linear least squares problems and/or computing low-rank approximations of an input matrix. The known algorithms for these problems are not superfast, but we prove that certain superfast modifications of them output reasonable or even nearly optimal solutions for large input classes. We also propose, analyze, and test a novel superfast algorithm for iterative refinement of any crude but sufficiently close low-rank approximation of a matrix. The results of our numerical tests are in good accordance with our formal study.

    Comment: 36.1 pages, 5 figures, and 1 table. arXiv admin note: text overlap with arXiv:1710.07946, arXiv:1906.0411
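
    The paper's sublinear-cost method is not reproduced here, but the following sketch shows the general shape of iterative refinement of a crude low-rank approximation, using ordinary (not superfast) block power iteration with re-orthonormalization; all names and parameters are illustrative assumptions.

    import numpy as np

    def refine_low_rank(A, Q, steps=2):
        # Improve an orthonormal basis Q (n x k) for an approximate range
        # of A by alternating products with A and A^T, re-orthonormalizing.
        for _ in range(steps):
            Q, _ = np.linalg.qr(A @ (A.T @ Q))
        return Q @ (Q.T @ A)  # rank-k approximation Q Q^T A

    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 50)) @ rng.standard_normal((50, 100))
    Q0, _ = np.linalg.qr(rng.standard_normal((200, 10)))  # crude start
    before = np.linalg.norm(A - Q0 @ (Q0.T @ A))
    after = np.linalg.norm(A - refine_low_rank(A, Q0, steps=3))
    print(f"approximation error: {before:.2f} -> {after:.2f}")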

    FlashProfile: A Framework for Synthesizing Data Profiles

    We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios. Prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words, etc.), and provide no control over the granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns that also allows for interactive refinement by requesting a desired number of clusters. Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across 153 tasks over 75 large real datasets, we observe a median profiling time of only ~0.7 s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as FlashFill.

    Comment: 28 pages, SPLASH (OOPSLA) 201
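
    FlashProfile itself synthesizes patterns with the PROSE framework and hierarchical clustering; as a rough stand-in for the idea of a syntactic profile, the toy sketch below (all names hypothetical) collapses runs of character classes into regex-like tokens and groups strings by the resulting pattern.

    import re
    from collections import defaultdict

    def pattern_of(s):
        # Collapse maximal runs of digits/letters into fixed-width tokens.
        out = []
        for run in re.findall(r"\d+|[A-Za-z]+|.", s):
            if run.isdigit():
                out.append(rf"\d{{{len(run)}}}")
            elif run.isalpha():
                out.append(rf"[A-Za-z]{{{len(run)}}}")
            else:
                out.append(re.escape(run))
        return "".join(out)

    def profile(strings):
        # One cluster per distinct pattern; a real profiler merges clusters
        # hierarchically to reach a requested number of clusters.
        clusters = defaultdict(list)
        for s in strings:
            clusters[pattern_of(s)].append(s)
        return dict(clusters)

    # Two clusters: one date-like pattern, one covering "N/A".
    print(profile(["2023-01-05", "1999-12-31", "N/A"]))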

    Focusing on prevention of water rates delinquency in local waterworks

    Thesis (Master), KDI School: Master of Public Management, 2018.

    Recently, there has been growing interest in intelligent technologies such as Big Data, which has become a key issue in major trends both domestically and abroad. The purpose of this research is to study and propose a Big Data analysis performance plan for minimizing the occurrence of delinquent customers, by analyzing their past payment patterns, within a local waterworks operation efficiency project. To accomplish this purpose, the research establishes an analysis performance plan covering an infrastructure POC (proof of concept) as a preliminary step, data collection and extraction, data refinement and transformation, and data analysis and verification in the local waterworks. This paper will serve as an initial guide for Big Data analysis in this area, and will be of interest to policy makers including K-water and municipal officers. The research makes use of existing databases, Korean public data, Korean Statistical Information Service data, and other working papers and journals.

    Contents: I. Introduction; II. Literature Review; III. Methods; IV. Analysis and Findings: Analysis Performance Plan; V. Conclusion and Suggestions.

    Author: Seon Ju, KIM

    A UML Profile for the Design, Quality Assessment and Deployment of Data-intensive Applications

    Big Data or Data-Intensive Applications (DIAs) seek to mine, manipulate, extract or otherwise exploit the potential intelligence hidden behind Big Data. However, several practitioner surveys remark that the potential of DIAs is still untapped because of very difficult and costly design, quality assessment and continuous refinement. To address this shortcoming, we propose the use of a UML domain-specific modeling language, or profile, specifically tailored to support the design, assessment and continuous deployment of DIAs. This article illustrates our DIA-specific profile and outlines its usage in the context of DIA performance engineering and deployment. For DIA performance engineering we rely on the Apache Hadoop technology, while for DIA deployment we leverage the TOSCA language. We conclude that the proposed profile offers a powerful language for data-intensive software and systems modeling, quality evaluation and automated deployment of DIAs on private or public clouds.