    A Feature-Augmented Grammar for Automated Media Production

    The IST Polymnia project is creating a fully automated system for personalised video generation, including content creation, selection and composition. This paper presents a linguistically motivated solution that uses context-free, feature-augmented grammar rules to describe editing tasks and hence automate video editing. The solution is media- and application-independent.
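    The abstract does not give the rule formalism, so the following is only a minimal sketch of how a feature-augmented context-free rule for an editing task might look. The `Clip` and `Rule` types, the `framing` and `visitor` features, and the `Sequence` category are all hypothetical names chosen for illustration; they are not taken from the Polymnia system. A rule fires only when each child matches its category, unifies with the features the rule requires, and agrees with its siblings on the listed features.

```python
from dataclasses import dataclass


def unify(a, b):
    """Unify two flat feature dicts; return the merged dict, or None on conflict."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged


@dataclass
class Clip:
    category: str   # grammar category, e.g. "Shot" (hypothetical)
    features: dict  # flat feature structure attached to the clip


@dataclass
class Rule:
    lhs: str      # category produced by the rule
    rhs: list     # list of (category, required_features) pairs
    agree: tuple  # feature names that must have the same value in every child


def apply_rule(rule, clips):
    """Return the parent Clip if `clips` satisfy `rule`, otherwise None."""
    if len(clips) != len(rule.rhs):
        return None
    shared = {}
    for clip, (category, required) in zip(clips, rule.rhs):
        if clip.category != category:
            return None
        if unify(clip.features, required) is None:
            return None
        agreement = {k: clip.features[k] for k in rule.agree if k in clip.features}
        shared = unify(shared, agreement)
        if shared is None:
            return None
    return Clip(rule.lhs, shared)


# Hypothetical rule: a sequence is a wide shot followed by a close shot
# of the same visitor.
SEQUENCE_RULE = Rule(
    lhs="Sequence",
    rhs=[("Shot", {"framing": "wide"}), ("Shot", {"framing": "close"})],
    agree=("visitor",),
)

wide = Clip("Shot", {"framing": "wide", "visitor": "v1"})
close_same = Clip("Shot", {"framing": "close", "visitor": "v1"})
close_other = Clip("Shot", {"framing": "close", "visitor": "v2"})

accepted = apply_rule(SEQUENCE_RULE, [wide, close_same])
rejected = apply_rule(SEQUENCE_RULE, [wide, close_other])
```

    In this sketch the context-free part is the category sequence on the right-hand side, and the feature augmentation is the unification check that rejects the second pair because the `visitor` values disagree.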

    Vision-Based Production of Personalized Video

    In this paper we present a novel vision-based system for the automated production of personalized video souvenirs for visitors to leisure and cultural heritage venues. Visitors are visually identified and tracked through a camera network. At the end of a visitor's stay, the system produces a personalized DVD souvenir that allows the visitor to relive the experience. We analyze how we identify visitors by fusing facial and body features, how we track visitors, how the tracker recovers from failures due to occlusions, and how we annotate and compile the final product. Our experiments demonstrate the feasibility of the proposed approach.

    Learning to Prove Theorems via Interacting with Proof Assistants

    Humans prove theorems by relying on substantial high-level reasoning and problem-specific insights. Proof assistants offer a formalism that resembles human mathematical reasoning, representing theorems in higher-order logic and proofs as high-level tactics. However, human experts have to construct proofs manually by entering tactics into the proof assistant. In this paper, we study the problem of using machine learning to automate the interaction with proof assistants. We construct CoqGym, a large-scale dataset and learning environment containing 71K human-written proofs from 123 projects developed with the Coq proof assistant. We develop ASTactic, a deep learning-based model that generates tactics as programs in the form of abstract syntax trees (ASTs). Experiments show that ASTactic trained on CoqGym can generate effective tactics and can be used to prove new theorems not previously provable by automated methods. Code is available at https://github.com/princeton-vl/CoqGym. Comment: Accepted to ICML 201

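    A Model for Automated Support for the Recognition, Extraction, Customization and Reconstruction of Static Charts (Um modelo para suporte automatizado ao reconhecimento, extração, personalização e reconstrução de gráficos estáticos)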

    Data charts are widely used in our daily lives and appear in regular media such as newspapers, magazines, web pages, books, and many others. A well-constructed data chart leads to an intuitive understanding of its underlying data; likewise, when a chart reflects poor design choices, the representation may need to be redesigned. However, in most cases these charts are shown as static images, which means that the original data are not usually available. Automatic methods could therefore be applied to extract the underlying data from chart images and so allow these changes. The task of recognizing charts and extracting data from them is complex, largely due to the variety of chart types and their visual characteristics. Computer vision techniques for image classification and object detection are widely used for chart recognition, but only on images without any disturbance. Other features of real-world images that can make this task difficult, such as photo distortion, noise, and misalignment, are not addressed in most of the literature. Two computer vision techniques that can assist this task and have been little explored in this context are perspective detection and perspective correction. These methods transform a distorted and noisy chart into a clean chart, with its type ready for data extraction or other uses. Reconstructing a visualization is straightforward as long as the data are available, but reconstructing it in the same context is complex. Using a visualization grammar is a key component in this scenario, as these grammars usually have extensions for interaction, chart layers, and multiple views without requiring extra development effort. This work presents a model for automated support for the custom recognition and reconstruction of charts in images. The model automatically performs the process steps, such as reverse engineering, which turns a static chart back into its data table for later reconstruction, while allowing the user to make modifications in case of uncertainty. This work also features a model-based architecture along with prototypes for various use cases. Validation is performed step by step, with methods inspired by the literature. Three use cases provide proof of concept and validation of the model. The first use case applies chart recognition methods to real-world documents; the second focuses on the vocalization of charts, using a visualization grammar to reconstruct a chart in audio format; and the third presents an Augmented Reality application that recognizes and reconstructs charts in the same context (a piece of paper), overlaying the new chart and interaction widgets. The results showed that, with slight changes, chart recognition and reconstruction methods are now ready for real-world charts when time, accuracy, and precision are taken into consideration. Doctoral Programme in Informatics Engineering

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 table

    Book of abstracts: ISTAR-IUL Winter School 2018 Applied Transdisciplinary Research


    Spatially Aware Computing for Natural Interaction

    Spatial information refers to the location of an object in a physical or digital world. It also includes the position of an object relative to the other objects around it. In this dissertation, three systems are designed and developed, each applying spatial information in a different field. The ultimate goal is to increase the user-friendliness and efficiency of those applications by utilizing spatial information. The first system is a novel Web page data extraction application, which takes advantage of 2D spatial information to discover structured records in a Web page. The extracted information is useful for reorganizing the layout of a Web page to fit mobile browsing. The second application uses the 3D spatial information of a mobile device within a large paper-based workspace to implement interactive paper that combines the merits of paper documents and mobile devices. This application can overlay digital information on top of a paper document based on the location of a mobile device within the workspace. The third application further integrates 3D spatial information with sound detection to realize an automatic camera management system. This application automatically controls multiple cameras in a conference room and creates an engaging video by intelligently switching camera shots among meeting participants based on their activities. All three applications have been evaluated, and the results are promising. In summary, this dissertation comprehensively explores the use of spatial information in various applications to improve usability.

    Astraea: Grammar-based Fairness Testing

    Software often produces biased outputs. In particular, machine learning (ML) based software is known to produce erroneous predictions when processing discriminatory inputs. Such unfair program behavior can be caused by societal bias. In the last few years, Amazon, Microsoft and Google have provided software services that produce unfair outputs, mostly due to societal bias (e.g. gender or race). In such events, developers are saddled with the task of conducting fairness testing. Fairness testing is challenging; developers are tasked with generating discriminatory inputs that reveal and explain biases. We propose a grammar-based fairness testing approach (called ASTRAEA) which leverages context-free grammars to generate discriminatory inputs that reveal fairness violations in software systems. Using probabilistic grammars, ASTRAEA also provides fault diagnosis by isolating the cause of observed software bias. ASTRAEA's diagnoses facilitate the improvement of ML fairness. ASTRAEA was evaluated on 18 software systems that provide three major natural language processing (NLP) services. In our evaluation, ASTRAEA generated fairness violations at a rate of ~18%. ASTRAEA generated over 573K discriminatory test cases and found over 102K fairness violations. Furthermore, ASTRAEA improves software fairness by ~76% via model retraining.
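    To make the idea of grammar-based generation of discriminatory inputs concrete, here is a minimal sketch, not ASTRAEA's implementation. It assumes a toy context-free grammar in which one nonterminal (`<name>`) carries the sensitive attribute, generates pairs of sentences that differ only in that nonterminal, and flags a fairness violation when a model answers differently for the two members of a pair. The grammar, the function names, and the deliberately biased `toy_model` are all hypothetical, and the probabilistic diagnosis step mentioned in the abstract is not covered.

```python
from itertools import combinations, product

# Hypothetical toy grammar: each nonterminal maps to a list of alternatives,
# and each alternative is a sequence of terminals and nonterminals.
GRAMMAR = {
    "<sentence>": [["<name>", " is a ", "<occupation>", "."]],
    "<name>": [["John"], ["Mary"]],          # assumed sensitive attribute
    "<occupation>": [["doctor"], ["nurse"]],
}


def expand(symbol, grammar, keep=()):
    """Return every string derivable from `symbol`.

    Nonterminals listed in `keep` are left unexpanded as placeholders.
    """
    if symbol not in grammar or symbol in keep:
        return [symbol]
    results = []
    for alternative in grammar[symbol]:
        parts = [expand(s, grammar, keep) for s in alternative]
        results.extend("".join(combo) for combo in product(*parts))
    return results


def discriminatory_pairs(grammar, start, sensitive):
    """Build input pairs that differ only in the sensitive nonterminal."""
    templates = expand(start, grammar, keep=(sensitive,))
    values = expand(sensitive, grammar)
    return [
        (template.replace(sensitive, a), template.replace(sensitive, b))
        for template in templates
        for a, b in combinations(values, 2)
    ]


def find_violations(pairs, model):
    """Keep the pairs on which the model's output differs."""
    return [(a, b) for a, b in pairs if model(a) != model(b)]


def toy_model(text):
    # Deliberately biased stand-in for the software under test.
    return "negative" if "Mary" in text and "doctor" in text else "positive"


pairs = discriminatory_pairs(GRAMMAR, "<sentence>", "<name>")
violations = find_violations(pairs, toy_model)
```

    With this toy grammar there are two pairs, one per occupation, and only the "doctor" pair is reported as a violation, because the stand-in model treats the two names differently there.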