18 research outputs found

    Data curation: towards a tool for all

    Data science has become one of the most important skills one can have in the modern world, as data takes an increasingly meaningful role in our lives. The accessibility of data science is, however, limited: it requires complicated software or programming knowledge, both of which can be challenging and hard to master, even for simple tasks. With this in mind, we approach this issue by providing a new data science platform, termed DS4All.Curation, that attempts to reduce the knowledge necessary to perform data science tasks, in particular data cleaning and curation. By combining HCI concepts, this platform is simple to use through direct manipulation and transformation previews; saves users time by eliminating repetitive tasks and automatically computing many of the common analyses data scientists must perform; and suggests data transformations based on the contents of the data, allowing for a smarter environment. This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020.
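    As a rough illustration of the transformation-preview idea described above, the following Python/pandas sketch shows a before/after view of a cleaning step that is applied only once the user confirms it. The data, column name, and `preview` helper are made up for illustration; DS4All.Curation itself is a graphical tool and this is not its implementation.

```python
import pandas as pd

# Hypothetical messy data; only the preview-before-apply workflow is sketched.
df = pd.DataFrame({"city": [" porto", "Lisbon ", "porto", None]})

def preview(df, column, transform, n=5):
    """Show the first n before/after pairs without mutating the data."""
    before = df[column]
    after = transform(before)
    return pd.DataFrame({"before": before, "after": after}).head(n)

def clean(s):
    # Trim stray whitespace and normalize capitalization.
    return s.str.strip().str.title()

print(preview(df, "city", clean))   # inspect the effect first
df["city"] = clean(df["city"])      # apply only after confirmation
```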

    Hubble Spacer Telescope

    Visualizing a model checker’s run on a model can be useful when trying to gain a deeper understanding of the verification of that model. However, it can be difficult to formalize the problem that visualization solves, as it varies from person to person. Having a visualized form of a model checker’s run allows a user to pinpoint sections of the run without having to look through the entire log multiple times or knowing in advance what to look for. This thesis presents the Hubble Spacer Telescope (HST), a visualizer for Spacer, an SMT-based Horn-clause solver. HST combines multiple exploration graph views with customizable lemma transformations. It offers a variety of ways to transform lemmas so that users can pick and choose how lemmas are presented: through programming by example, users can change variable names, rearrange terms in a literal, and rearrange the placement of literals within a lemma. HST thus not only visually depicts a Spacer exploration log but also lets users transform the lemmas it produces in a way that, the user hopes, will make a Spacer model-checking run easier to understand. Given a Spacer exploration log, HST creates a raw exploration graph in which clicking on a node shows the state of the model as well as the lemmas learned from that state. A second graph view summarizes the exploration into its proof obligations. HST uses programming by example to simplify lemma transformations, so users only have to modify a few lemmas to transform all lemmas in an exploration log; they can also choose between multiple transformations to better suit their needs. The thesis evaluates HST through a case study, which demonstrates the extent of the grammar created for lemma transformations. Users can, for instance, transform disjunctions of literals produced by Spacer into a conditional statement customized by the contents of the predicate. Since lemma transformations are completely customizable, HST can be tailored to each individual user’s preferences.
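    A toy Python sketch of the programming-by-example idea mentioned above: from a single lemma edited by the user, a variable renaming is inferred and then applied to every other lemma. The lemma syntax, regular expression, and inference strategy are illustrative assumptions and do not reflect HST's actual transformation grammar.

```python
import re

VAR = re.compile(r"[A-Za-z_][\w!]*")  # crude token pattern for variable names

def infer_renaming(original: str, edited: str) -> dict:
    """Pair up variable tokens position by position in one example edit."""
    return {a: b
            for a, b in zip(VAR.findall(original), VAR.findall(edited))
            if a != b}

def apply_renaming(renaming: dict, lemma: str) -> str:
    """Replay the learned renaming on any other lemma."""
    return VAR.sub(lambda m: renaming.get(m.group(0), m.group(0)), lemma)

example_before = "(>= x!0 (+ y!1 1))"
example_after  = "(>= counter (+ limit 1))"
renaming = infer_renaming(example_before, example_after)

for lemma in ["(<= x!0 10)", "(= y!1 (+ x!0 x!0))"]:
    print(apply_renaming(renaming, lemma))
# (<= counter 10)
# (= limit (+ counter counter))
```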

    Humanized data cleaning

    Integrated master’s dissertation in Informatics Engineering. Data science has become one of the most important skills someone can have in the modern world, as data takes an increasingly meaningful role in our lives. The accessibility of data science is, however, limited: it requires complicated software or programming knowledge, both of which can be challenging and hard to master, even for simpler tasks. Currently, cleaning data requires a data scientist. The process of data cleaning, which consists of removing or correcting entries of a data set, usually requires programming knowledge, as it is mostly performed using programming languages such as Python and R (kag). However, if this barrier were removed, data cleaning could be performed by people who may possess better knowledge of the data domain but lack a programming background. We studied current solutions available on the market and the type of interface each one uses to interact with end users, such as control-flow interfaces, tabular interfaces, or block-based languages. With this in mind, we approach this issue by providing a new data science tool, termed Data Cleaning for All (DCA), that attempts to reduce the knowledge necessary to perform data science tasks, in particular data cleaning and curation. By combining Human-Computer Interaction (HCI) concepts, this tool is simple to use through direct manipulation and transformation previews; saves users time by eliminating repetitive tasks and automatically computing many of the common analyses data scientists must perform; and suggests data transformations based on the contents of the data, allowing for a smarter environment.
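    The content-based suggestions mentioned above could, in spirit, look like the following Python/pandas sketch, which scans each column and proposes a cleaning step. The thresholds, heuristics, and function name are assumptions made for illustration, not DCA's actual behaviour.

```python
import pandas as pd

def suggest_transformations(df: pd.DataFrame):
    """Propose cleaning steps based purely on the contents of each column."""
    suggestions = []
    for col in df.columns:
        series = df[col]
        # Many missing values -> suggest imputing or dropping rows.
        if series.isna().mean() > 0.2:
            suggestions.append((col, "fill or drop missing values"))
        if series.dtype == object:
            # Mostly parseable as dates -> suggest a type conversion.
            parsed = pd.to_datetime(series, errors="coerce")
            if parsed.notna().mean() > 0.8:
                suggestions.append((col, "convert to datetime"))
            # Stray surrounding whitespace -> suggest trimming.
            if (series.dropna() != series.dropna().str.strip()).any():
                suggestions.append((col, "trim surrounding whitespace"))
    return suggestions

df = pd.DataFrame({"date": ["2020-01-01", "2020-02-15", "2020-03-10"],
                   "name": ["Ana ", "Rui", " Eva"]})
print(suggest_transformations(df))
# [('date', 'convert to datetime'), ('name', 'trim surrounding whitespace')]
```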

    EqFix: Fixing LaTeX Equation Errors by Examples

    LaTeX is a widely used document preparation system. Its powerful ability to edit mathematical equations is perhaps the main reason for its popularity in academia. Sometimes, however, even an expert user may spend much time fixing an erroneous equation. In this paper, we present EqFix, a synthesis-based repairing system for LaTeX equations. It employs a set of fixing rules and can suggest possible repairs for common errors in LaTeX equations. A domain-specific language is proposed for formally expressing the fixing rules, and the rules can be automatically synthesized from a set of input-output examples. An extension of relaxers is also introduced to enhance the practicality of EqFix. We evaluate EqFix on real-world examples and find that it can synthesize rules with high generalization ability. Compared with a state-of-the-art string transformation synthesizer, EqFix solved 37% more cases and spent only one third of its synthesis time.
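    To give a flavour of rule synthesis from input-output examples, here is a toy Python sketch that "learns" one fix rule from a single example pair (an unbraced multi-character superscript) and reuses it on a new equation. The rule representation and the hard-coded generalisation are assumptions for illustration; EqFix's actual DSL and synthesis algorithm are more general.

```python
import re

def synthesize_rule(broken: str, fixed: str):
    """Guess a (pattern, template) fix rule from one example: here, that a
    multi-character superscript must be wrapped in braces."""
    candidate = (r"\^(\w{2,})", r"^{\1}")   # hypothesised error class
    if re.sub(candidate[0], candidate[1], broken) == fixed:
        return candidate
    raise ValueError("example not covered by this toy synthesizer")

def apply_rule(rule, equation: str) -> str:
    """Replay a learned rule on a new erroneous equation."""
    pattern, template = rule
    return re.sub(pattern, template, equation)

rule = synthesize_rule(r"x^10", r"x^{10}")
print(apply_rule(rule, r"a^2 + y^42 - z^{10}"))
# a^2 + y^{42} - z^{10}
```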

    Friends with benefits: implementing corecursion in foundational proof assistants

    We introduce AmiCo, a tool that extends a proof assistant, Isabelle/HOL, with flexible function definitions well beyond primitive corecursion. All definitions are certified by the assistant’s inference kernel to guard against inconsistencies. A central notion is that of friends: functions that preserve the productivity of their arguments and that are allowed in corecursive call contexts. As new friends are registered, corecursion benefits by becoming more expressive. We describe this process and its implementation, from the user’s specification to the synthesis of a higher-order definition to the registration of a friend. We show some substantial case studies where our approach makes a difference.
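    The notion of a "friend" can be approximated outside Isabelle/HOL: a friend consumes at most one element of each corecursive argument per element it produces, so it may surround corecursive calls without destroying productivity. The Python generator sketch below illustrates that intuition with a Fibonacci stream whose self-reference goes through such a function; it is an analogy only, not AmiCo's certified mechanism.

```python
from itertools import islice

def add_streams(xs, ys):
    # Plays the role of a "friend": producing one element consumes at most
    # one element from each argument, so productivity is preserved.
    for x, y in zip(xs, ys):
        yield x + y

def tail(stream):
    next(stream)          # drop the first element
    return stream

def fib():
    # Corecursive definition: the stream refers to itself, but only
    # through the productivity-preserving add_streams.
    yield 0
    yield 1
    yield from add_streams(fib(), tail(fib()))

print(list(islice(fib(), 10)))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```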