Data curation: towards a tool for all
Data science has started to become one of the most important skills one can have in the modern world, as data takes an increasingly central role in our lives. The accessibility of data science is, however, limited, requiring complicated software or programming knowledge. Both can be challenging and hard to master, even for simple tasks.
With this in mind, we have approached this issue by providing a new data science platform, termed DS4All.Curation, that attempts to reduce the knowledge necessary to perform data science tasks, in particular data cleaning and curation. By combining HCI concepts, this platform is simple to use, through direct manipulation and transformation previews; saves users time by eliminating repetitive tasks and automatically computing many of the common analyses data scientists must perform; and suggests data transformations based on the contents of the data, allowing for a smarter environment. This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020.
Hubble Spacer Telescope
Visualizing a model checker’s run on a model can be useful when trying to gain a deeper
understanding of the verification of the particular model. However, it can be difficult to
formalize the problem that visualization solves as it varies from person to person. Having
a visualized form of a model checker’s run allows a user to pinpoint sections of the run
without having to look through the entire log multiple times or having to know what to look
for. This thesis presents the Hubble Spacer Telescope (HST), a visualizer for Spacer, an
SMT-based Horn-clause solver. HST combines multiple exploration graph views along with
customizable lemma transformations. HST offers a variety of ways to transform lemmas
so that a user can pick and choose how they want lemmas to be presented. HST’s lemma
transformations allow a user to change variable names, rearrange terms in a literal, and
rearrange the placement of literals within the lemma through programming by example.
HST not only depicts a Spacer exploration log visually, but also lets users transform the
lemmas produced in a way that, the user hopes, will make a Spacer model checking run
easier to understand.
Given a Spacer exploration log, HST creates a raw exploration graph where clicking
on a node produces the state of the model as well as the lemmas learned from said state.
In addition, there is a second graph view which summarizes the exploration into its proof
obligations. HST uses programming by example to simplify lemma transformations so that
users only have to modify a few lemmas to transform all lemmas in an exploration log.
Users can also choose between multiple transformations to better suit their needs.
This thesis presents an evaluation of HST through a case study. The case study is used
to demonstrate the extent of the grammar created for lemma transformations. Users have
the opportunity to transform disjunctions of literals produced by Spacer into a conditional
statement, customized by the contents of the predicate. Since lemma transformations are
completely customizable, HST's presentation can be tailored to each individual user's preferences.
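The lemma transformations described above can be pictured as simple string rewrites. The sketch below is illustrative only: the function names and the representation of a lemma as a list of literal strings are assumptions, not HST's actual grammar or implementation.

```python
# Hypothetical sketch of two lemma transformations in the spirit of HST.
# A lemma is modeled here as a list of literal strings (an assumption).

def rename_vars(lemma, mapping):
    """Rename solver-generated variable names using `mapping`."""
    out = []
    for lit in lemma:
        for old, new in mapping.items():
            lit = lit.replace(old, new)
        out.append(lit)
    return out

def reorder_literals(lemma, key=len):
    """Rearrange literal placement within the lemma, e.g. shortest first."""
    return sorted(lemma, key=key)

lemma = ["x!0 >= 1", "y!1 <= x!0", "z!2 = 0"]
print(rename_vars(lemma, {"x!0": "x", "y!1": "y", "z!2": "z"}))
# ['x >= 1', 'y <= x', 'z = 0']
```

In HST, such rewrites are learned by programming by example: the user edits a few lemmas by hand, and the tool generalizes the edit to every lemma in the exploration log.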
Humanized data cleaning
Integrated master's dissertation in Informatics Engineering. Data science has started to become one of the most important skills someone can have
in the modern world, due to data taking an increasingly meaningful role in our lives.
The accessibility of data science is, however, limited, requiring complicated software or
programming knowledge. Both can be challenging and hard to master, even for simpler
tasks.
Currently, in order to clean data you need a data scientist. The process of data cleaning,
consisting of removing or correcting entries of a data set, usually requires programming
knowledge as it is mostly performed using programming languages such as Python and
R (kag). However, if this barrier were removed, data cleaning could be performed by people
who possess better knowledge of the data domain but lack the programming background.
We have studied the solutions currently available on the market and the type of interface
each one uses to interact with end users, such as a control-flow interface, a tabular
interface, or block-based languages. With this in mind, we have approached this issue
by providing a new data science tool, termed Data Cleaning for All (DCA), that attempts
to reduce the necessary knowledge to perform data science tasks, in particular for data
cleaning and curation. By combining Human-Computer Interaction (HCI) concepts, this tool
is simple to use, through direct manipulation and transformation previews; saves
users time by eliminating repetitive tasks and automatically computing many of the
common analyses data scientists must perform; and suggests data transformations based on
the contents of the data, allowing for a smarter environment.
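The direct-manipulation idea above can be roughly illustrated by a cleaning step that computes a non-destructive before/after preview. This is a minimal Python sketch; `strip_whitespace` and `preview` are hypothetical names for illustration, not part of DCA.

```python
# Minimal sketch of a transformation preview (hypothetical, not DCA's API).
def strip_whitespace(rows, column):
    """A cleaning step: trim surrounding whitespace in one column."""
    return [{**row, column: row[column].strip()} for row in rows]

def preview(transform, rows, column, n=3):
    """Show the first n before/after pairs without mutating the data."""
    after = transform(rows, column)
    return [(before[column], fixed[column])
            for before, fixed in zip(rows[:n], after[:n])]

data = [{"name": " Alice "}, {"name": "Bob"}]
print(preview(strip_whitespace, data, "name"))
# [(' Alice ', 'Alice'), ('Bob', 'Bob')]
```

The preview lets a user inspect a transformation's effect before committing it, which is the interaction pattern the abstract describes.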
EqFix: Fixing LaTeX Equation Errors by Examples
LaTeX is a widely-used document preparation system. Its powerful ability in
mathematical equation editing is perhaps the main reason for its popularity in
academia. Sometimes, however, even an expert user may spend much time on fixing
an erroneous equation. In this paper, we present EqFix, a synthesis-based
repairing system for LaTeX equations. It employs a set of fixing rules, and can
suggest possible repairs for common errors in LaTeX equations. A domain
specific language is proposed for formally expressing the fixing rules. The
fixing rules can be automatically synthesized from a set of input-output
examples. An extension of relaxer is also introduced to enhance the
practicality of EqFix. We evaluate EqFix on real-world examples and find that
it can synthesize rules with high generalization ability. Compared with a
state-of-the-art string transformation synthesizer, EqFix solved 37% more cases
while spending only one third of the synthesis time.
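A fixing rule can be pictured as a guarded string rewrite. The sketch below hardcodes one such rule in Python purely for illustration; EqFix instead expresses rules in its domain-specific language and synthesizes them automatically from input-output examples.

```python
import re

# Toy fixing rule (illustrative only, not EqFix's DSL): wrap a
# multi-character superscript in braces, a common LaTeX equation
# mistake (in "x^10", only the "1" ends up in the exponent).
def fix_superscripts(equation):
    return re.sub(r"\^(\w{2,})", r"^{\1}", equation)

print(fix_superscripts("x^10 + y^2"))
# x^{10} + y^2
```

An example-driven system would derive such a rewrite from a pair like ("x^10", "x^{10}") rather than from a hand-written regular expression.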
Friends with benefits: implementing corecursion in foundational proof assistants
We introduce AmiCo, a tool that extends a proof assistant, Isabelle/HOL, with flexible function definitions well beyond primitive corecursion. All definitions are certified by the assistant’s inference kernel to guard against inconsistencies. A central notion is that of friends: functions that preserve the productivity of their arguments and that are allowed in corecursive call contexts. As new friends are registered, corecursion benefits by becoming more expressive. We describe this process and its implementation, from the user’s specification to the synthesis of a higher-order definition to the registration of a friend. We show some substantial case studies where our approach makes a difference.
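AmiCo itself targets Isabelle/HOL, but the notion of a friendly function can be loosely illustrated with Python generators: a friend produces at least one output element per input element it consumes, so it preserves productivity and may safely appear under a corecursive call. This is an informal analogy, not AmiCo's mechanism.

```python
import itertools

def smap(f, stream):
    """A 'friend' in the loose sense: yields one element per element
    consumed, preserving the productivity of its argument stream."""
    for x in stream:
        yield f(x)

def nats():
    # Corecursive definition: nats = 0 : map (+1) nats.
    # Productive because smap never demands more than has been produced.
    yield 0
    yield from smap(lambda n: n + 1, nats())

print(list(itertools.islice(nats(), 5)))
# [0, 1, 2, 3, 4]
```

In Isabelle/HOL the analogous definition must be certified by the inference kernel; registering the map-like function as a friend is what licenses the corecursive call beneath it.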