Search CORE

348 research outputs found

Hillview:A trillion-cell spreadsheet for big data

Author: Aguilera Marcos K.
Budiu Mihai
Gopalan Parikshit
Kruiger Han
Suresh Lalith
Wieder Udi
Publication venue: 'VLDB Endowment'
Publication date: 01/07/2019
Field of study

Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketches, as a simple way to produce compact data visualizations. Vizketches combine algorithmic techniques for data summarization with computer graphics principles for efficient rendering. While simple, vizketches are effective at scaling the spreadsheet by parallelizing computation, reducing communication, providing progressive visualizations, and offering precise accuracy guarantees. Using Hillview running on eight servers, we can navigate and visualize datasets of tens of billions of rows and trillions of cells, well beyond the published capabilities of competing systems.

arXiv.org e-Print Archive

Proceedings - University of Groningen

Dissertations of the University of Groningen
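The Hillview abstract credits vizketches with parallelizing computation, reducing communication and enabling progressive visualizations. Those properties follow from summaries that each worker computes over its own partition and that can then be merged. The following is a minimal Python sketch of that general idea, assuming a fixed-bucket histogram as the summary and a bucket count bounded by the chart's pixel width. The names `HistogramSketch` and `summarize_shard` are hypothetical; this is not Hillview's vizketch implementation, and it omits the sampling and accuracy guarantees the paper describes.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Iterable, List


@dataclass
class HistogramSketch:
    """Fixed-bucket histogram over [lo, hi): a small, mergeable summary.

    Hypothetical illustration only; not Hillview's vizketch code.
    """
    lo: float
    hi: float
    buckets: int  # assumption: bounded by the chart's width in pixels
    counts: List[int] = field(default_factory=list)
    out_of_range: int = 0

    def __post_init__(self) -> None:
        if not self.counts:
            self.counts = [0] * self.buckets

    def add(self, x: float) -> None:
        if self.lo <= x < self.hi:
            i = int((x - self.lo) / (self.hi - self.lo) * self.buckets)
            # Guard against floating-point rounding pushing i up to `buckets`.
            self.counts[min(i, self.buckets - 1)] += 1
        else:
            self.out_of_range += 1

    def merge(self, other: "HistogramSketch") -> "HistogramSketch":
        # Element-wise addition is associative and commutative, so partial
        # sketches from different workers can be combined in any order.
        if (self.lo, self.hi, self.buckets) != (other.lo, other.hi, other.buckets):
            raise ValueError("sketches must share the same bucket layout")
        return HistogramSketch(
            self.lo, self.hi, self.buckets,
            [a + b for a, b in zip(self.counts, other.counts)],
            self.out_of_range + other.out_of_range,
        )


def summarize_shard(values: Iterable[float], lo: float, hi: float,
                    buckets: int) -> HistogramSketch:
    """Runs on one worker; only the fixed-size sketch is sent back."""
    sketch = HistogramSketch(lo, hi, buckets)
    for v in values:
        sketch.add(v)
    return sketch


if __name__ == "__main__":
    shards = [[0.5, 1.5, 2.5], [2.7, 3.9, 9.0], [4.2, -1.0]]
    partials = [summarize_shard(s, 0.0, 4.0, 4) for s in shards]
    # Folding partials in one at a time gives progressively refined views.
    total = reduce(HistogramSketch.merge, partials)
    print(total.counts, total.out_of_range)  # [1, 1, 2, 1] 3
```

The size of each partial result depends only on the number of buckets, not on the number of rows, which is what keeps communication small in this simplified setting.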

The WFCAM Science Archive

Author: A. Adamson
A. Lawrence
Abazajian
Adelman-McCarthy
Barrett
Bertin
Bonifacio
Calabretta
Casali
Dye
E. T. W. Sutorius
Emerson
Epchtein
Foucaud
Hambly
Hanisch
Hewett
I. Bond
J. Bryant
J. M. Stewart
J. P. Emerson
Kendall
Klein
Kleinmann
L. Rimoldini
Lane
Lawrence
Lodieu
Lupton
M. A. Read
N. C. Hambly
N. J. G. Cross
P. Hirst
P. M. Williams
R. G. Mann
R. S. Collins
S. Dye
S. J. Warren
Schlegel
Skrutskie
Stoughton
Thakar
Venemans
Walton
Warren
York
Publication venue: 'Wiley'
Publication date: 22/11/2007
Field of study

We describe the WFCAM Science Archive (WSA), which is the primary point of access for users of data from the wide-field infrared camera WFCAM on the United Kingdom Infrared Telescope (UKIRT), especially science catalogue products from the UKIRT Infrared Deep Sky Survey (UKIDSS). We describe the database design with emphasis on those aspects of the system that enable users to fully exploit the survey datasets in a variety of different ways. We give details of the database-driven curation applications that take data from the standard nightly pipeline-processed and calibrated files for the production of science-ready survey datasets. We describe the fundamentals of querying relational databases with a set of astronomy usage examples, and illustrate the results.
Comment: 28 pages, 18 figures; accepted for publication in MNRAS (2007 November 8

arXiv.org e-Print Archive

Guiding legacy systems for evolution. PmatE: a case study of maintenance and engineering

Author: Monteiro André
Vieira Gonçalo
Publication venue: International Association for Digital Transformation and Technological Innovation
Publication date: 31/01/2022
Field of study

Even though software change is inevitable, careful maintenance can extend a system's lifespan when budget and time constraints get in the way of replacing it. At the University of Aveiro, the project PmatE, a quiz web platform created to encourage students to enjoy Math, emerged in the early 1990s and accumulated several applications over the decades without major planning, cleanup or upgrades. This resulted in a very large framework that had to be available online at all times and had a high operational cost, leading to a growing amount of technical debt. After three decades, the project was studied, refactored and refurbished, yielding a stable, consistent framework ready for evolution and software spinouts. This work shows how to manage and engineer solutions to maintain a legacy system and evolve it even under heavy constraints.

The VISTA Science Archive

Author: Blake Robert P.
Collins Ross S.
Cross Nicholas J. G.
Emerson Jim P.
Hambly Nigel C.
Holliman Mark S.
Lawrence Andrew
Mann Robert G.
Noddle Keith T.
Read Mike A.
Sutorius Eckhard T. W.
Publication venue: 'EDP Sciences'
Publication date: 03/12/2012
Field of study

We describe the VISTA Science Archive (VSA) and its first public release of data from five of the six VISTA Public Surveys. The VSA exists to support the VISTA Surveys through their lifecycle: the VISTA Public Survey consortia can use it during their quality control assessment of survey data products before submission to the ESO Science Archive Facility (ESO SAF); it supports their exploitation of survey data prior to its publication through the ESO SAF; and, subsequently, it provides the wider community with survey science exploitation tools that complement the data product repository functionality of the ESO SAF. This paper has been written in conjunction with the first public release of public survey data through the VSA and is designed to help its users understand the data products available and how the functionality of the VSA supports their varied science goals. We describe the design of the database and outline the database-driven curation processes that take data from nightly pipeline-processed and calibrated FITS files to create science-ready survey datasets. Much of this design, and the codebase implementing it, derives from our earlier WFCAM Science Archive (WSA), so this paper concentrates on the VISTA-specific aspects and on improvements made to the system in the light of experience gained in operating the WSA.
Comment: 22 pages, 16 figures. Minor edits to fonts and typos after sub-editing. Published in A&

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

VIOLA - A multi-purpose and web-based visualization tool for neuronal-network simulation output

Author: Carde Corto
Diesmann Markus
Hagen Espen
Kuhlen Torsten W.
Senk Johanna
Weyers Benjamin
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2018
Field of study

Neuronal network models and corresponding computer simulations are invaluable tools to aid the interpretation of the relationship between neuron properties, connectivity and measured activity in cortical tissue. Spatiotemporal patterns of activity propagating across the cortical surface, as observed experimentally, can for example be described by neuronal network models with layered geometry and distance-dependent connectivity. The interpretation of the resulting stream of multi-modal and multi-dimensional simulation data calls for integrating interactive visualization steps into existing simulation-analysis workflows. Here, we present a set of interactive visualization concepts called views for the visual analysis of activity data in topological network models, and a corresponding reference implementation, VIOLA (VIsualization Of Layer Activity). The software is a lightweight, open-source, web-based and platform-independent application that combines and adapts modern interactive visualization paradigms, such as coordinated multiple views, for massively parallel neurophysiological data. For a use-case demonstration, we consider spiking activity data of a two-population, layered point-neuron network model subject to a spatially confined excitation originating from an external population. With the multiple coordinated views, an explorative and qualitative assessment of the spatiotemporal features of neuronal activity can be performed ahead of a detailed quantitative analysis of specific aspects of the data. Furthermore, ongoing efforts, including the European Human Brain Project, aim to provide online user portals for integrated model development, simulation, analysis and provenance tracking, wherein interactive visual analysis tools are one component. Browser-compatible, web-technology-based solutions are therefore required. Within this scope, VIOLA provides a first prototype.
Comment: 38 pages, 10 figures, 3 tables

arXiv.org e-Print Archive

Frontiers - Publisher Connector

Juelich Shared Electronic Resources

FigShare

Development of a centralized log management system

Author: Abreu Joaquim Tomás Almada
Publication venue
Publication date: 10/07/2020
Field of study

Logs are a crucial part of any system: they give useful insight into what the system is doing and into what happened in case of failure. Every process running on a system generates logs in some format, and these logs are generally written to local storage. As systems evolved, the number of logs to analyze grew, and with it came the need for a standardized log format that minimizes dependencies and makes analysis easier. ams is a company that develops sensor solutions. With twenty-two design centers and three manufacturing sites, the company serves more than eight thousand clients worldwide. One design center is located in Funchal and includes a team of application engineers who design and develop software applications for internal clients. The application engineers' development process involves several applications and programs, each with its own logging system, and the log entries generated by different applications are kept in separate storage systems. A developer or administrator troubleshooting an issue that spans several applications therefore has to visit different database systems or locations to collect the logs and correlate them across requests. This is a tiresome process, and if the environment is auto-scaled, troubleshooting becomes practically impossible. This project aimed to solve these problems by creating a Centralized Log Management System capable of handling logs from a variety of sources and of providing services that help developers and administrators better understand the different affected environments.
The deployed solution was developed using a set of open-source technologies, such as the Elastic Stack (Elasticsearch, Logstash and Kibana), Node.js, GraphQL and Cassandra. The present document describes the process and the decisions taken to reach the solution.
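The troubleshooting problem described in this abstract, collecting logs kept in separate stores and correlating them across requests, can be illustrated with a small Python sketch. It assumes each application's entries have already been normalized to a common record (the `LogEntry` fields and the helper names `merge_sources` and `correlate` are hypothetical) and are sorted by time within each source. It merges the streams into one timeline and groups entries by a shared request id. This is not the thesis's implementation, which the abstract says was built on the Elastic Stack, Node.js, GraphQL and Cassandra.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True, order=True)
class LogEntry:
    """Hypothetical normalized log record; field names are assumptions."""
    timestamp: datetime  # compared first, so entries order by time
    source: str
    request_id: str
    message: str


def merge_sources(*sources: Iterable[LogEntry]) -> List[LogEntry]:
    """Merge per-application streams (each already time-ordered) into one timeline."""
    return list(heapq.merge(*sources))


def correlate(entries: Iterable[LogEntry]) -> Dict[str, List[LogEntry]]:
    """Group entries from all applications by their shared request id."""
    by_request: Dict[str, List[LogEntry]] = defaultdict(list)
    for entry in entries:
        by_request[entry.request_id].append(entry)
    return dict(by_request)


if __name__ == "__main__":
    api = [
        LogEntry(datetime(2020, 7, 10, 9, 0, 1), "api", "r1", "request received"),
        LogEntry(datetime(2020, 7, 10, 9, 0, 4), "api", "r2", "request received"),
    ]
    db = [LogEntry(datetime(2020, 7, 10, 9, 0, 2), "db", "r1", "query timeout")]
    timeline = merge_sources(api, db)
    trace = correlate(timeline)["r1"]
    print([f"{e.source}: {e.message}" for e in trace])
    # ['api: request received', 'db: query timeout']
```

`heapq.merge` streams the inputs lazily and relies on each source already being sorted; a production system would instead index entries centrally and query them, which is the role the abstract assigns to the Elastic Stack.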

Trade-offs between privacy and efficiency on databases

Author: Rogério António da Costa Pontes
Publication venue
Publication date: 04/05/2021
Field of study

DSpace Manual: Software version 1.5

Author: DSpace Foundation
Publication venue: The DSpace Foundation
Publication date: 01/05/2008
Field of study

DSpace is an open source software platform that enables organizations to:
- Capture and describe digital material using a submission workflow module or a variety of programmatic ingest options
- Distribute an organization's digital assets over the web through a search and retrieval system
- Preserve digital assets over the long term
This system documentation includes a functional overview of the system, which is a good introduction to its capabilities and should be readable by nontechnical personnel. Everyone should read this section first because it introduces terminology used throughout the rest of the documentation. For people actually running a DSpace service, there is an installation guide, along with sections on configuration and the directory structure. Note that as of DSpace 1.2, the administration user interface guide is available as online help within the DSpace system. Finally, for those interested in the details of how DSpace works, and those potentially interested in modifying the code for their own purposes, there is a detailed architecture and design section.

Boston University Institutional Repository (OpenBU)

Development of a parallel database environment

Author: Tranter Mette
Publication venue: The University of Edinburgh
Publication date: 01/01/2000
Field of study

Bridging OPC UA and DPWS for Industrial SOA

Author: Minor Johannes
Publication venue
Publication date: 07/03/2012
Field of study

Two web-service-based specifications, OPC Unified Architecture (OPC UA) and Devices Profile for Web Services (DPWS), have been proposed by various researchers and organizations as possible enabling technologies for an event-driven Service-Oriented Architecture (SOA) for monitoring and control in manufacturing applications. This paper proposes and demonstrates an approach for bridging these two technologies that is applicable to existing industrial applications. A merger of OPC UA and DPWS that effectively combines their complementary strengths could help pave the way toward future industrial event-driven SOA applications with the inherent modularity, agility, and interoperability envisioned by researchers today. A representation of DPWS devices, services, operations and events in the OPC UA data model is proposed, and a DPWS Module is developed for Ignition, a commercially available HMI/SCADA and MES platform with an integrated OPC UA Server. The module discovers DPWS devices on a local network, creates their representation in the address space, and handles subscriptions, input and output parameter values, and operation invocation. A Complex Event Processing component based on Microsoft’s StreamInsight is also integrated with the system, with input and output adapters that expose web service interfaces. The system prototype will be used as the basis for a use case demonstrator in the European Commission’s Framework Programme 7 Project, “Architecture for Service-Oriented Process Monitoring and Control (IMC AESOP).” The project aims to develop a system-of-systems approach to monitoring and control, based on SOA, for very large-scale systems in the process industries.