39,325 research outputs found
Learning to Prove Theorems via Interacting with Proof Assistants
Humans prove theorems by relying on substantial high-level reasoning and
problem-specific insights. Proof assistants offer a formalism that resembles
human mathematical reasoning, representing theorems in higher-order logic and
proofs as high-level tactics. However, human experts have to construct proofs
manually by entering tactics into the proof assistant. In this paper, we study
the problem of using machine learning to automate the interaction with proof
assistants. We construct CoqGym, a large-scale dataset and learning environment
containing 71K human-written proofs from 123 projects developed with the Coq
proof assistant. We develop ASTactic, a deep learning-based model that
generates tactics as programs in the form of abstract syntax trees (ASTs).
Experiments show that ASTactic trained on CoqGym can generate effective tactics
and can be used to prove new theorems not previously provable by automated
methods. Code is available at https://github.com/princeton-vl/CoqGym.Comment: Accepted to ICML 201
SenseCam image localisation using hierarchical SURF trees
The SenseCam is a wearable camera that automatically takes photos of the wearer's activities, generating thousands of images per day.
Automatically organising these images for efficient search and retrieval is a challenging task, but can be simplified by providing
semantic information with each photo, such as the wearer's location during capture time. We propose a method for automatically determining the wearer's location using an annotated image database, described using SURF interest point descriptors. We show that SURF out-performs SIFT in matching SenseCam images and that matching can be done efficiently using hierarchical trees of SURF descriptors. Additionally, by re-ranking the top images using bi-directional SURF matches, location matching performance is improved further
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provided a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques allow
to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We discuss also the potential of cross-fertilization, i.e., on the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.Comment: Knowledge-based System
Learning to Extract Keyphrases from Text
Many academic journals ask their authors to provide a list of about five to fifteen key words, to appear on the first page of each article. Since these key words are often phrases of two or more words, we prefer to call them keyphrases. There is a surprisingly wide variety of tasks for which keyphrases are useful, as we discuss in this paper. Recent commercial software, such as Microsoft?s Word 97 and Verity?s Search 97, includes algorithms that automatically extract keyphrases from documents. In this paper, we approach the problem of automatically extracting keyphrases from text as a supervised learning task. We treat a document as a set of phrases, which the learning algorithm must learn to classify as positive or negative examples of keyphrases. Our first set of experiments applies the C4.5 decision tree induction algorithm to this learning task. The second set of experiments applies the GenEx algorithm to the task. We developed the GenEx algorithm specifically for this task. The third set of experiments examines the performance of GenEx on the task of metadata generation, relative to the performance of Microsoft?s Word 97. The fourth and final set of experiments investigates the performance of GenEx on the task of highlighting, relative to Verity?s Search 97. The experimental results support the claim that a specialized learning algorithm (GenEx) can generate better keyphrases than a general-purpose learning algorithm (C4.5) and the non-learning algorithms that are used in commercial software (Word 97 and Search 97)
Recommended from our members
Arcadia, a software development environment research project
The research objectives of the Arcadia project are two-fold: discovery and development of environment architecture principles and creation of novel software development tools, particularly powerful analysis tools, which will function within an environment built upon these architectural principles.Work in the architecture area is concerned with providing the framework to support integration while also supporting the often conflicting goal of extensibility. Thus, this area of research is directed toward achieving external integration by providing a consistent, uniform user interface, while still admitting customization and addition of new tools and interface functions. In an effort to also attain internal integration, research is aimed at developing mechanisms for structuring and managing the tools and data objects that populate a software development environment, while facilitating the insertion of new kinds of tools and new classes of objects.The unifying theme of work in the tools area is support for effective analysis at every stage of a software development project. Research is directed toward tools suitable for analyzing pre-implementation descriptions of software, software itself, and towards the production of testing and debugging tools. In many cases, these tools are specifically tailored for applicability to concurrent, distributed, or real-time software systems.The initial focus of Arcadia research is on creating a prototype environment, embodying the architectural principles, which supports Ada1 software development. This prototype environment is itself being developed in Ada.Arcadia is being developed by a consortium of researchers from the University of California at Irvine, the University of Colorado at Boulder, the University of Massachusetts at Amherst, TRW, Incremental Systems Corporation, and The Aerospace Corporation. This paper delineates the research objectives and describes the approaches being taken, the organization of the research endeavor, and current status of the work
The 1990 progress report and future plans
This document describes the progress and plans of the Artificial Intelligence Research Branch (RIA) at ARC in 1990. Activities span a range from basic scientific research to engineering development and to fielded NASA applications, particularly those applications that are enabled by basic research carried out at RIA. Work is conducted in-house and through collaborative partners in academia and industry. Our major focus is on a limited number of research themes with a dual commitment to technical excellence and proven applicability to NASA short, medium, and long-term problems. RIA acts as the Agency's lead organization for research aspects of artificial intelligence, working closely with a second research laboratory at JPL and AI applications groups at all NASA centers
Simple identification tools in FishBase
Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and characters like fin ray meristics. Soon pictures and drawings were added as a further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further
development. It explores the possibility of a holistic and integrated computeraided strategy
- …