205 research outputs found

    Yavaa: supporting data workflows from discovery to visualization

    Recent years have witnessed an increasing number of data silos being opened up, both within organizations and to the general public: scientists publish their raw data as supplements to articles, or even as standalone artifacts, to enable others to verify and extend their work. Governments pass laws to open up formerly protected data treasures to improve accountability and transparency, and to enable new business ideas based on this public good. Even companies share structured information about their products and services to advertise their use and thus increase revenue. Exploiting this wealth of information holds many challenges for users, though. Oftentimes data is provided as tables whose seemingly endless rows of daunting numbers are barely accessible. Information visualization (InfoVis) can mitigate this gap. However, the visualization options offered are generally very limited, and next to no support is given in applying any of them. The same holds true for data wrangling: only very few options exist to adjust the data to current needs, and barely any safeguards are in place to prevent even the most obvious mistakes. When it comes to data from multiple providers, the situation gets even bleaker. Only recently have tools emerged that reasonably support searching for datasets across institutional borders; easy-to-use ways to combine these datasets are still missing, though. Finally, results generally lack proper documentation of their provenance, so even the most compelling visualizations can be called into question when how they came about remains unclear. The foundations for a vivid exchange and exploitation of open data are set, but the barrier of entry remains relatively high, especially for non-expert users. This thesis aims to lower that barrier by providing tools and assistance that reduce the amount of prior experience and skill required. It covers the whole workflow, from identifying suitable datasets, through possible transformations, up to exporting the result in the form of suitable visualizations.

    Spreadsheet smells

    Master's dissertation in Informatics Engineering. Viewing spreadsheets as a programming language makes them the most widely used programming language in the world. In fact, some studies show that so-called "end-user" programmers far outnumber professional programmers. Because of this, and because spreadsheets lack support for abstraction, testing, encapsulation, and structured programming, an estimated 90% of real-world spreadsheets contain errors. This dissertation presents an effort to help with this problem. Its main goal is to create a tool that detects probable problems in spreadsheets; these problems are called smells (a surface indication that usually points to a deeper issue). We first introduce theoretical concepts such as metrics and smells, for instance the Functional Dependency Smell, which was adapted from databases. We then present the study we performed, showing the results obtained by applying the tool to a large set of spreadsheets, the EUSES corpus.
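The Functional Dependency Smell mentioned in this abstract can be illustrated with a small sketch. This is not the dissertation's actual tool, and all names and data below are hypothetical: the idea is simply that if one column should functionally determine another, any key mapping to multiple values signals a likely error.

```python
from collections import defaultdict

def fd_violations(rows, det, dep):
    """Find values of column `det` that map to more than one value of
    column `dep`, breaking the functional dependency det -> dep."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[det]].add(row[dep])
    return {k: v for k, v in seen.items() if len(v) > 1}

# Toy spreadsheet region: a zip code should determine exactly one city.
cells = [
    {"zip": "4710", "city": "Braga"},
    {"zip": "4710", "city": "Braga"},
    {"zip": "4710", "city": "Porto"},   # likely a typo -> smell
    {"zip": "1000", "city": "Lisboa"},
]
print(fd_violations(cells, "zip", "city"))  # flags zip "4710"
```

A real detector would of course first have to discover which dependencies plausibly hold, rather than being told them.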

    Introduction to Data Science

    This book was developed for ICT/LIS 661: Introduction to Data Science, as offered in the University of Kentucky's School of Information Science. It adapts and expands on openly licensed materials to introduce readers to basic statistical concepts, the R programming language, and philosophical critique of data science. This open access textbook was supported by the University of Kentucky Libraries Alternative Textbook program.

    Multi-Hypothesis Parsing of Tabular Data in Comma-Separated Values (CSV) Files

    Tabular data on the web comes in various formats and shapes. Preparing data for analysis and integration requires manual steps which go beyond simple parsing of the data. The preparation includes steps like correct configuration of the parser, removal of meaningless rows, casting of data types, and reshaping of the table structure. The goal of this thesis is the development of a robust and modular system which is able to automatically transform messy CSV data sources into a tidy tabular data structure. The highly diverse corpus of CSV files from the UK open data hub serves as the basis for the evaluation of the system.
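As a rough illustration of the preparation steps this abstract lists (parser configuration, dropping meaningless rows, casting types), here is a minimal Python sketch built on the standard library's `csv.Sniffer`. It is not the thesis's system; the sample data and the decimal-comma assumption are invented for the example.

```python
import csv
import io

# Messy sample: semicolon-delimited, an empty row, decimal commas in quotes.
RAW = 'country;year;value\n;;\nUK;2019;"1,5"\nFR;2020;"2,0"\n'

def tidy(raw):
    # Hypothesis: let csv.Sniffer guess the dialect instead of assuming ','.
    dialect = csv.Sniffer().sniff(raw)
    rows = list(csv.reader(io.StringIO(raw), dialect))
    header, body = rows[0], rows[1:]
    # Drop rows that carry no data at all.
    body = [r for r in body if any(cell.strip() for cell in r)]
    def cast(cell):
        try:
            return float(cell.replace(",", "."))  # assumed decimal comma
        except ValueError:
            return cell
    return [dict(zip(header, map(cast, r))) for r in body]

print(tidy(RAW))
```

A multi-hypothesis parser would go further: try several dialects and table structures in parallel and rank the resulting candidate tables.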

    The design and evaluation of an interface and control system for a scariculated rehabilitation robot arm

    This thesis is concerned with the design and development of a prototype implementation of a rehabilitation robotic manipulator based on a novel kinematic configuration. The initial aim of the research was to identify appropriate design criteria for the design of a user interface and control system, and for the subsequent evaluation of the manipulator prototype. This led to a review of the field of rehabilitation robotics, focusing on user evaluations of existing systems. The review showed that the design objectives of individual projects were often contradictory, and that a requirement existed for a more general and complete set of design criteria. These were identified through an analysis of the strengths and weaknesses of existing systems, including an assessment of manipulator performance, commercial success and user feedback. The resulting criteria were used for the design and development of a novel interface and control system for the Middlesex Manipulator - the novel scariculated robotic system. A highly modular architecture was adopted, allowing the manipulator to provide a level of adaptability not approached by existing rehabilitation robotic systems. This allowed the interface to be configured to match the controlling ability and input device selections of individual users. A range of input devices was employed, offering variation in communication mode and bandwidth. These included a commercial voice recognition system and a novel gesture recognition device. The latter was designed using electrolytic tilt sensors, the outputs of which were encoded by artificial neural networks, allowing control of the manipulator through head or hand gestures. An individual with spinal-cord injury undertook a single-subject user evaluation of the Middlesex Manipulator over a period of four months. The evaluation provided evidence for the value of the adaptability offered by the user interface.
    It was also shown that the prototype did not yet conform to all the design criteria, but the evaluation allowed areas for design improvement to be identified. This work led to a second research objective, concerned with the problem of configuring an adaptable user interface for a specific individual. A novel form of task analysis is presented within the thesis that allows the relative usability of interface configurations to be predicted based upon individual user and input device characteristics. An experiment was undertaken with 6 subjects performing 72 task runs with 2 interface configurations controlled by user gestures. Task completion times fell within the predicted range, where the range was generated using confidence intervals (α = 0.05) on point estimates of user and device characteristics. This allowed successful prediction, over all task runs, of the relative task completion times of interface configurations for a given user.

    Engaging Learners: Differential Equations in Today's World

    Engaging Learners: Differential Equations in Today's World. CODEE Journal, Volume 14, Issue

    Parameter Identification for Thermal Reduced-Order Models in Electric Engines

    One part of the validation process of electric engines must check for thermal aging and damage of their components due to the high temperatures to which they are exposed. This way, the thermal requirements of the machine can be defined, and a specific minimum service life can be guaranteed. For this purpose, drives must be validated against the most critical cases identified through simulations of representative driving scenarios. Since the thermal models require a long computation time to determine the temperatures of the components in each time increment, reduced-order models (ROMs) that can estimate them quickly are preferred instead. Also, there are positions in the engine where the temperature in the thermal model is determined only by sensor data when performing calculations online, as with the coolant temperature. Since it is also relevant to compute these values when performing simulations offline, ROMs can be applied for this purpose as well. This project focuses on creating and comparing different types of ROMs for temperature estimation in electric engines. Several variants of discrete-time state-space models (SSMs) have been developed in the literature, showing promising results but requiring a high level of expert knowledge. This work introduces a set of SSMs that is entirely data-based and does not require knowledge of the physics and dynamics of the motor. They allow the user to adjust the parameters for different engines and create customized variants. Three models were developed for each engine temperature to be estimated. A preprocessing step divides the driving data into three possible domains, and each model estimates the temperatures in its respective domain. Model discretization based on different scenarios has shown an improvement in estimation accuracy. Finally, black-box approaches based on artificial neural networks (ANNs) were designed, since they showed high potential in the literature. Regression and Long Short-Term Memory (LSTM) models were created and their hyperparameters optimized, but the results showed low performance compared to the SSMs.
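The idea of an entirely data-based state-space ROM can be sketched in miniature. The model below is a hypothetical first-order thermal model, not one of the project's models: we generate data from a "true" discrete-time model, then recover its parameters by least squares using only the data, no physics.

```python
import random

# "True" model: T[k+1] = a*T[k] + (1-a)*T_amb + b*P[k]
a_true, b_true, T_amb = 0.95, 0.08, 40.0
random.seed(0)
power = [random.uniform(0, 50) for _ in range(500)]  # synthetic load profile
T = [T_amb]
for p in power:
    T.append(a_true * T[-1] + (1 - a_true) * T_amb + b_true * p)

# Identification: fit T[k+1] - T_amb = a*(T[k] - T_amb) + b*P[k]
xs = [t - T_amb for t in T[:-1]]
ys = [t - T_amb for t in T[1:]]
us = power
# Normal equations for the two unknowns (a, b), solved by hand.
sxx = sum(x * x for x in xs)
suu = sum(u * u for u in us)
sxu = sum(x * u for x, u in zip(xs, us))
sxy = sum(x * y for x, y in zip(xs, ys))
suy = sum(u * y for u, y in zip(us, ys))
det = sxx * suu - sxu * sxu
a_hat = (sxy * suu - suy * sxu) / det
b_hat = (suy * sxx - sxy * sxu) / det
print(a_hat, b_hat)  # recovers 0.95 and 0.08 up to float error
```

With noisy sensor data and multiple temperature nodes, the same least-squares idea generalizes to fitting the matrices of a multivariate SSM.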

    Visual programming in a heterogeneous multi-core environment

    It is well known that technology nowadays evolves rapidly. New architectures are created to overcome the limitations and problems of existing ones. Sometimes this evolution is painless and requires no adaptation; at other times it demands change. Programming languages have always been the main communication bridge between the programmer and the computer. New ones keep appearing and existing ones keep evolving to accommodate new concepts and paradigms. This requires an extra effort from the programmer, who must always be aware of these changes. Visual programming may be a solution to this problem. Expressing functions as module boxes that receive a given input and return a given output may help programmers by giving them the possibility to abstract away from low-level details of a specific architecture. This thesis not only shows how the capabilities of the Cell/B.E. (which has a heterogeneous multi-core architecture) can be combined with OpenDX (which has a visual programming environment), but also demonstrates that this can be done without significant loss of performance.

    Data integration support for offshore decommissioning waste management

    Offshore oil and gas platforms have a design life of about 25 years, whereas the techniques and tools used for managing their data are constantly evolving. Therefore, data captured about platforms during their lifetimes will be in varying forms. Additionally, due to the many stakeholders involved with a facility over its life cycle, the information representation of its components varies. These challenges make data integration difficult. Over the years, data integration technology in the oil and gas industry has focused on meeting the needs of asset life cycle stages other than decommissioning, because most assets are only now reaching the end of their design lives. Currently, limited work has been done on integrating life cycle data for offshore decommissioning purposes, and reports by industry stakeholders underscore this need. This thesis proposes a method for the integration of the common data types relevant in oil and gas decommissioning. The key features of the method are that it (i) ensures semantic homogeneity using knowledge representation languages (Semantic Web) and domain-specific reference data (ISO 15926); and (ii) allows stakeholders to continue to use their current applications. Prototypes of the framework have been implemented using open source software applications, and performance measures were made. The work of this thesis has been motivated by the business case of reusing offshore decommissioning waste items. The framework developed is generic and can be applied whenever there is a need to integrate and query disparate data involving oil and gas assets. The prototypes presented show how the data management challenges associated with assessing the suitability of decommissioned offshore facility items for reuse can be addressed. The performance of the prototypes shows that significant time and effort is saved compared to the state-of-the-art solution.
    The ability to do this effectively and efficiently during decommissioning will advance the oil and gas industry's transition toward a circular economy and help save on cost.
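As a toy illustration of the integration idea described above: shared reference classes plus identity links between stakeholders' local identifiers make heterogeneous records jointly queryable. The snippet only loosely mimics the role ISO 15926 reference data and Semantic Web identity links play in the thesis; all data, names, and mappings are invented.

```python
# Two stakeholders describe the same valve with different local schemas.
operator_data = [("valve-17", "type", "GateValve"),
                 ("valve-17", "weight_kg", 120)]
contractor_data = [("V17", "class", "gate_valve"),
                   ("V17", "condition", "reusable")]

# Toy stand-ins for reference data and identity links.
REF_CLASS = {"GateValve": "ref:Valve", "gate_valve": "ref:Valve"}
SAME_AS = {"V17": "valve-17"}

def integrate(*sources):
    """Normalize each source's triples onto shared IDs and classes."""
    triples = set()
    for src in sources:
        for s, p, o in src:
            s = SAME_AS.get(s, s)
            if p in ("type", "class"):
                p, o = "ref:classifiedAs", REF_CLASS.get(o, o)
            triples.add((s, p, o))
    return triples

def query(triples, pred, obj):
    """Return all subjects matching a (?, pred, obj) pattern."""
    return sorted(s for s, p, o in triples if p == pred and o == obj)

kb = integrate(operator_data, contractor_data)
print(query(kb, "ref:classifiedAs", "ref:Valve"))  # ['valve-17']
```

After integration, a single query can answer reuse-assessment questions ("which valves are marked reusable, and what do they weigh?") that previously required consulting each stakeholder's system separately.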