
    Scalable structural index construction for JSON analytics

    JavaScript Object Notation (JSON) and its variants have gained great popularity in recent years. Unfortunately, the performance of their analytics is often dragged down by expensive JSON parsing. To address this, recent work has shown that building bitwise indices on JSON data, called structural indices, can greatly accelerate querying. Despite its promise, the existing structural index construction does not scale well as records become larger and more complex, due to its inherently sequential construction process and the involvement of costly memory copies that grow as the nesting level increases. To address these issues, this work introduces Pison, a more memory-efficient structural index constructor with support for intra-record parallelism. First, Pison features a redesign of the bottleneck step in the existing solution. The new design is not only simpler but also more memory-efficient. More importantly, Pison is able to build structural indices for a single bulky record in parallel, enabled by a group of customized parallelization techniques. Finally, Pison is also optimized for better data locality, which is especially critical in the scenario of bulky record processing. Our evaluation using real-world JSON datasets shows that Pison achieves a 9.8X average speedup over the existing structural index construction solution for bulky records and a 4.6X average speedup in end-to-end performance (indexing plus querying) over a state-of-the-art SIMD-based JSON parser on a 16-core machine.
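
    The bitmap construction step can be illustrated with a minimal sketch (not Pison's actual implementation): build word-level bitmaps that mark the positions of JSON structural characters. Real structural index builders use SIMD instructions, exclude characters that appear inside string literals, and maintain leveled colon/comma bitmaps; the plain-Python snippet below only shows the basic bitmap idea.

        def build_structural_bitmaps(json_text: str):
            """Return one 64-bit-word bitmap per structural character class."""
            classes = {"colon": ":", "comma": ",", "lbrace": "{", "rbrace": "}"}
            n_words = (len(json_text) + 63) // 64
            bitmaps = {name: [0] * n_words for name in classes}
            for i, ch in enumerate(json_text):
                for name, target in classes.items():
                    if ch == target:
                        bitmaps[name][i // 64] |= 1 << (i % 64)  # set bit i
            return bitmaps

        record = '{"name": "pison", "nested": {"level": 2}}'
        maps = build_structural_bitmaps(record)
        # Recover colon positions from the bitmap words:
        colons = [w * 64 + b for w, word in enumerate(maps["colon"])
                  for b in range(64) if word >> b & 1]
        print(colons)  # [7, 26, 36]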

    Parallel in situ indexing for data-intensive computing

    As computing power increases exponentially, vast amounts of data are created by many scientific research activities. However, the bandwidth for storing the data to disks and reading the data from disks has been improving at a much slower pace. These two trends produce an ever-widening data access gap. Our work brings together two distinct technologies to address this data access issue: indexing and in situ processing. From decades of database research literature, we know that indexing is an effective way to address the data access issue, particularly for accessing a relatively small fraction of the data records. As data sets increase in size, more and more analysts need to use selective data access, which makes indexing even more important for improving data access. The challenge is that most implementations of indexing technology are embedded in large database management systems (DBMS), but most scientific datasets are not managed by any DBMS. In this work, we choose to include indexes with the scientific data instead of requiring the data to be loaded into a DBMS. We use compressed bitmap indexes from the FastBit software, which are known to be highly effective for query-intensive workloads common to scientific data analysis. To use the indexes, we need to build them first. The index building procedure needs to access the whole data set and may also require a significant amount of compute time. In this work, we adapt the in situ processing technology to generate the indexes, thus removing the need to read data from disks and allowing the indexes to be built in parallel. The in situ data processing system used is ADIOS, a middleware for high-performance I/O. Our experimental results show that the indexes can improve data access time by up to 200 times depending on the fraction of data selected, and that using the in situ data processing system can effectively reduce the time needed to create the indexes, by up to 10 times with our in situ technique when using identical parallel settings.
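
    An equality-encoded (binned) bitmap index of the kind FastBit builds can be sketched as follows. FastBit additionally compresses each bitmap with Word-Aligned Hybrid (WAH) encoding; the bitmaps below are plain Python integers so the query logic stays visible, and the bin edges and temperature values are made-up illustration data.

        def build_bitmap_index(values, bin_edges):
            """One bitmap per bin; bit i is set if record i falls in that bin."""
            bitmaps = [0] * (len(bin_edges) + 1)
            for i, v in enumerate(values):
                b = sum(1 for edge in bin_edges if v >= edge)   # which bin record i falls in
                bitmaps[b] |= 1 << i
            return bitmaps

        def query_range(bitmaps, lo_bin, hi_bin):
            """OR together the bitmaps of the bins covered by the query range."""
            hits = 0
            for b in range(lo_bin, hi_bin + 1):
                hits |= bitmaps[b]
            return [i for i in range(hits.bit_length()) if hits >> i & 1]

        temps = [12.1, 35.7, 18.4, 41.0, 29.9, 33.2]
        index = build_bitmap_index(temps, bin_edges=[20.0, 30.0, 40.0])
        print(query_range(index, lo_bin=2, hi_bin=3))  # records with value >= 30.0: [1, 3, 5]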

    Integrated Reconfigurable Autonomous Architecture System

    Advances in state-of-the-art architectural robotics and artificially intelligent design algorithms have the potential not only to transform how we design and build architecture, but also to fundamentally change our relationship to the built environment. This system is situated within a larger body of research related to embedding autonomous agency directly into the built environment through the linkage of AI, computation, and robotics. It challenges the traditional separation between digital design and physical construction through the development of an autonomous architecture with an adaptive lifecycle. The Integrated Reconfigurable Autonomous Architecture System (IRAAS) is composed of three components: 1) an interactive platform for user and environmental data input, 2) an agent-based generative space planning algorithm with deep reinforcement learning for continuous spatial adaptation, and 3) a distributed robotic material system with bi-directional cyber-physical control protocols for simultaneous state alignment. The generative algorithm is a multi-agent system trained with deep reinforcement learning to learn adaptive policies for adjusting the scales, shapes, and relational organization of spatial volumes by processing changes in the environment and user requirements. The robotic material system was designed around a symbiotic relationship between active and passive modular components: distributed robots slide their bodies along tracks built into passive blocks, which enable their locomotion, while using a locking and unlocking system to reconfigure the assemblages they move across. The three subsystems have been developed in relation to each other to account for both the constraints of the AI-driven design algorithm and those of the robotic material system, enabling intelligent spatial adaptation with a continuous feedback chain.
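
    As a purely hypothetical illustration of the adaptation loop described above, the sketch below has a single agent repeatedly rescale one spatial volume toward a changing user requirement. The actual IRAAS policy is a trained multi-agent deep reinforcement learning system; the hand-written placeholder_policy here only stands in to show the observe-act-reward cycle.

        def environment_step(volume_area, target_area, action):
            volume_area = max(1.0, volume_area + action)      # apply the resize action
            reward = -abs(volume_area - target_area)          # closer fit = higher reward
            return volume_area, reward

        def placeholder_policy(volume_area, target_area):
            """Stand-in for a learned policy: step a fixed amount toward the target."""
            return 0.5 if target_area > volume_area else -0.5

        area, requirement = 12.0, 20.0
        for step in range(40):
            if step == 15:
                requirement = 9.0                             # user requirement changes mid-run
            action = placeholder_policy(area, requirement)
            area, reward = environment_step(area, requirement, action)
        print(round(area, 1))  # 9.0: the volume has tracked the new requirement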

    Low-latency, query-driven analytics over voluminous multidimensional, spatiotemporal datasets

    Ubiquitous data collection from sources such as remote sensing equipment, networked observational devices, location-based services, and sales tracking has led to the accumulation of voluminous datasets; IDC projects that by 2020 we will generate 40 zettabytes of data per year, while Gartner and ABI estimate that 20-35 billion new devices will be connected to the Internet in the same time frame. The storage and processing requirements of these datasets far exceed the capabilities of modern computing hardware, which has led to the development of distributed storage frameworks that can scale out by assimilating more computing resources as necessary. While challenging in its own right, storing and managing voluminous datasets is only the precursor to a broader field of study: extracting knowledge, insights, and relationships from the underlying datasets. The basic building block of this knowledge discovery process is analytic queries, encompassing both query instrumentation and evaluation. This dissertation is centered around query-driven exploratory and predictive analytics over voluminous, multidimensional datasets. Both of these types of analysis represent a higher-level abstraction over classical query models; rather than indexing every discrete value for subsequent retrieval, our framework autonomously learns the relationships and interactions between dimensions in the dataset (including time series and geospatial aspects), and makes the information readily available to users. This functionality includes statistical synopses, correlation analysis, hypothesis testing, probabilistic structures, and predictive models that not only enable the discovery of nuanced relationships between dimensions, but also allow future events and trends to be predicted. This requires specialized data structures and partitioning algorithms, along with adaptive reductions in the search space and management of the inherent trade-off between timeliness and accuracy. The algorithms presented in this dissertation were evaluated empirically on real-world geospatial time-series datasets in a production environment, and are broadly applicable across other storage frameworks.
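
    One building block named above, statistical synopses, can be sketched with Welford's online mean/variance maintained per spatial partition. The geohash-like partition keys and temperature readings below are invented for illustration and are a simplification of the framework's actual partitioning scheme; queries over a region can then be answered from the synopses without scanning the raw observations.

        from collections import defaultdict

        class Synopsis:
            def __init__(self):
                self.n, self.mean, self.m2 = 0, 0.0, 0.0
            def update(self, x):                    # Welford's online update
                self.n += 1
                delta = x - self.mean
                self.mean += delta / self.n
                self.m2 += delta * (x - self.mean)
            @property
            def variance(self):
                return self.m2 / (self.n - 1) if self.n > 1 else 0.0

        synopses = defaultdict(Synopsis)
        observations = [("9xj5", 21.5), ("9xj5", 23.1), ("9xj6", 18.0), ("9xj5", 22.4)]
        for partition_key, temperature in observations:
            synopses[partition_key].update(temperature)

        s = synopses["9xj5"]
        print(s.n, round(s.mean, 2), round(s.variance, 3))  # 3 22.33 0.643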

    Analysis and Design of Intelligent Logistics System Based on Internet of Things

    Based on Internet of Things, .NET software development, and GIS technologies, and guided by software engineering life-cycle theory, this paper analyzes and designs an intelligent distribution information system to address the high complexity and low efficiency of manual operation in logistics and distribution, raise the level of intelligent operation, and thereby improve operating efficiency. It analyzes the business requirements of the system; designs its physical architecture, software architecture, and system structure; and constructs a dynamic model of terminal-node distribution along the transmission route, realizing the system's main function modules and verifying the correctness and effectiveness of the results through systematic and comprehensive tests. DOI: 10.17762/ijritcc2321-8169.15065

    Maintenance of Automated Test Suites in Industry: An Empirical Study on Visual GUI Testing

    Context: Verification and validation (V&V) activities make up 20 to 50 percent of the total development costs of a software system in practice. Test automation is proposed to lower these V&V costs, but available research provides only limited empirical data from industrial practice about the maintenance costs of automated tests and the factors that affect these costs. In particular, these costs and factors are unknown for automated GUI-based testing. Objective: This paper addresses this lack of knowledge through analysis of the costs and factors associated with the maintenance of automated GUI-based tests in industrial practice. Method: An empirical study at two companies, Siemens and Saab, is reported in which interviews about, and empirical work with, Visual GUI Testing are performed to acquire data about the technique's maintenance costs and feasibility. Results: 13 factors are observed that affect maintenance, e.g. tester knowledge/experience and test case complexity. Further, statistical analysis shows that developing new test scripts is costlier than maintenance, but also that frequent maintenance is less costly than infrequent, big-bang maintenance. In addition, a cost model, based on previous work, is presented that estimates the time to positive return on investment (ROI) of test automation compared to manual testing. Conclusions: It is concluded that test automation can lower the overall software development costs of a project whilst also having positive effects on software quality. However, maintenance costs can still be considerable, and the less time a company currently spends on manual testing, the more time is required before positive, economic ROI is reached after automation.
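
    The cited cost model itself is not reproduced here, but a generic break-even sketch conveys the idea: the time to positive ROI is the month in which the cumulative cost of automation (one-off script development plus recurring maintenance and execution) drops below the cumulative cost of manual testing. All figures below are made-up illustration values.

        def months_to_positive_roi(dev_cost, monthly_maintenance, monthly_auto_run,
                                   monthly_manual_cost, horizon_months=120):
            cumulative_auto, cumulative_manual = dev_cost, 0.0
            for month in range(1, horizon_months + 1):
                cumulative_auto += monthly_maintenance + monthly_auto_run
                cumulative_manual += monthly_manual_cost
                if cumulative_auto < cumulative_manual:
                    return month
            return None  # no break-even within the horizon

        print(months_to_positive_roi(dev_cost=400, monthly_maintenance=20,
                                     monthly_auto_run=5, monthly_manual_cost=80))  # 8

    Lowering monthly_manual_cost in this sketch pushes the break-even month further out, which mirrors the conclusion that companies spending little on manual testing wait longer for positive ROI.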

    A spatial column-store to triangulate the Netherlands on the fly

    3D digital city models, important for urban planning, are currently constructed from massive point clouds obtained through airborne LiDAR (Light Detection and Ranging). They are semantically enriched with information obtained from auxiliary GIS data such as cadastral data, which contains information about property boundaries, road networks, rivers, lakes, etc. Technical advances in LiDAR data acquisition systems have made possible the rapid acquisition of high-resolution topographical information for an entire country. Such data sets are now reaching the trillion-point barrier. To cope with this data deluge and provide up-to-date 3D digital city models on demand, current geospatial data management strategies should be rethought. This work presents a column-oriented Spatial Database Management System which provides in-situ data access, effective data skipping, efficient spatial operations, and interactive data visualization. Its efficiency and scalability are demonstrated using a dense LiDAR scan of The Netherlands consisting of 640 billion points and the latest cadastral information, and compared with PostGIS.
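
    The data-skipping idea mentioned above can be sketched with per-block bounding boxes (min/max of the x and y point columns): any block whose box does not intersect the query window is skipped entirely. The block size and point data below are arbitrary illustration choices, not the system's actual layout.

        def build_zone_maps(points, block_size=4):
            """points: list of (x, y); returns [(lo_idx, hi_idx, xmin, xmax, ymin, ymax)]."""
            zones = []
            for lo in range(0, len(points), block_size):
                block = points[lo:lo + block_size]
                xs, ys = [p[0] for p in block], [p[1] for p in block]
                zones.append((lo, lo + len(block), min(xs), max(xs), min(ys), max(ys)))
            return zones

        def range_query(points, zones, qx0, qx1, qy0, qy1):
            hits = []
            for lo, hi, xmin, xmax, ymin, ymax in zones:
                if xmax < qx0 or xmin > qx1 or ymax < qy0 or ymin > qy1:
                    continue                                  # whole block skipped
                hits += [(x, y) for x, y in points[lo:hi]     # scan only surviving blocks
                         if qx0 <= x <= qx1 and qy0 <= y <= qy1]
            return hits

        pts = [(i * 1.0, (i * 37 % 100) * 1.0) for i in range(16)]
        zones = build_zone_maps(pts)
        print(range_query(pts, zones, 2, 5, 0, 100))  # points with x in [2, 5]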

    Scientific data analysis based on bitmap indexing algorithms (Análise de dados científicos baseada em algoritmos de indexação bitmap)

    Large-scale computer simulations often consume and produce large volumes of raw data files, which may be stored in different formats. Users usually need to analyze domain-specific data based on data elements related through multiple files generated over the course of a simulation's execution. Existing solutions, such as FastBit and NoDB, aim to support this analysis by indexing raw data in order to allow direct access to specific elements within regions of interest in raw data files. However, those solutions are limited to analyzing a single raw data file at a time and are applied only after the simulation has finished. The ARMFUL architecture proposes a solution capable of managing the dataflow, recording related raw data elements in a provenance database, and combining raw data file analysis techniques at runtime. Through a data model that supports the integration of simulation execution data and domain data, the architecture allows queries over data elements related across multiple files. This dissertation proposes implementations of the raw data indexing and query processing components of the ARMFUL architecture, aiming to reduce the elapsed time of data ingestion into the provenance database and to support exploratory analysis of raw data.
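
    A hypothetical sketch (not ARMFUL's actual schema) of the provenance-database idea: record, for each simulation task, which raw files it produced and where selected data elements live inside them, so that a single SQL query can relate elements across multiple files. SQLite is used here only for brevity; all table names and values are invented.

        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.executescript("""
        CREATE TABLE task(id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE raw_file(id INTEGER PRIMARY KEY, task_id INTEGER, path TEXT);
        CREATE TABLE element(file_id INTEGER, variable TEXT, byte_offset INTEGER, value REAL);
        """)
        conn.executemany("INSERT INTO task VALUES (?, ?)", [(1, "solver"), (2, "post")])
        conn.executemany("INSERT INTO raw_file VALUES (?, ?, ?)",
                         [(10, 1, "out/pressure.bin"), (11, 2, "out/velocity.bin")])
        conn.executemany("INSERT INTO element VALUES (?, ?, ?, ?)",
                         [(10, "pressure", 0, 101.3), (10, "pressure", 8, 99.8),
                          (11, "velocity", 0, 2.4), (11, "velocity", 8, 3.1)])

        # Relate elements produced by different tasks/files through the provenance tables.
        rows = conn.execute("""
            SELECT t.name, f.path, e.variable, e.value
            FROM element e JOIN raw_file f ON e.file_id = f.id JOIN task t ON f.task_id = t.id
            WHERE e.value > 3.0 OR e.variable = 'pressure'
        """).fetchall()
        print(rows)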