Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus.
Comment: 29 pages, 5 figures, research proposal
Automating quantitative information flow
PhD
Unprecedented quantities of personal and business data are collected, stored,
shared, and processed by countless institutions all over the world. Prominent
examples include sharing personal data on social networking sites, storing
credit card details in every store, tracking customer preferences of supermarket
chains, and storing key personal data on biometric passports.
Confidentiality issues naturally arise from this global data growth. There
are continual reports of private data leaking from confidential sources,
with implications ranging from embarrassment to serious damage to personal
privacy and business.
This dissertation addresses the problem of automatically quantifying the
amount of leaked information in programs. It presents multiple program analysis
techniques of different degrees of automation and scalability.
The contributions of this thesis are twofold: a theoretical result, and two
different methods for inferring and checking quantitative information flows.
The theoretical result relates the amount of possible leakage under any
probability distribution back to the order relation in Landauer and Redmond’s
lattice of partitions [35]. The practical results are split into two analyses: a first
analysis precisely infers the information leakage using SAT solving and model
counting; a second analysis defines quantitative policies which are reduced to
checking a k-safety problem. A novel feature allows reasoning independent of
the secret space.
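
A minimal sketch of the counting idea, under stated assumptions: a deterministic program partitions the secret space into blocks of secrets that produce the same output, and the base-2 logarithm of the number of blocks bounds the leakage in bits. The thesis uses SAT solving and model counting; this sketch substitutes brute-force enumeration over a small secret space, and the names `leakage_bits` and `threshold_check` are hypothetical.

```python
import math
from collections import Counter


def leakage_bits(program, secrets):
    """Leakage of a deterministic program over a finite secret space.

    Returns (max_leak, shannon_leak) in bits:
      max_leak     = log2(number of distinct outputs), an upper bound on
                     the leakage regardless of the prior on secrets
      shannon_leak = entropy of the output, assuming a uniform prior
    """
    # Each distinct output corresponds to one block of the partition
    # that the program induces on the secret space.
    blocks = Counter(program(s) for s in secrets)
    total = sum(blocks.values())
    max_leak = math.log2(len(blocks))
    shannon_leak = -sum(
        (size / total) * math.log2(size / total) for size in blocks.values()
    )
    return max_leak, shannon_leak


def threshold_check(secret):
    # Hypothetical program: reveals only whether the secret exceeds 9.
    return secret > 9


if __name__ == "__main__":
    print(leakage_bits(threshold_check, range(16)))
    print(leakage_bits(lambda s: s % 4, range(16)))
```

Enumeration only works for tiny secret spaces; the point of a model counter is to obtain the number of blocks without visiting every secret.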
The presented tools are applied to real, existing leakage vulnerabilities in
operating system code. This has to be understood and weighed within the
context of the information flow literature, which suffers from an apparent lack
of practical examples and applications. This thesis studies such “real leaks”,
which could influence future strategies for finding information leaks.
Leveraging Language to Learn Program Abstractions and Search Heuristics
Inductive program synthesis, or inferring programs from examples of desired
behavior, offers a general paradigm for building interpretable, robust, and
generalizable machine learning systems. Effective program synthesis depends on
two key ingredients: a strong library of functions from which to build
programs, and an efficient search strategy for finding programs that solve a
given task. We introduce LAPS (Language for Abstraction and Program Search), a
technique for using natural language annotations to guide joint learning of
libraries and neurally-guided search models for synthesis. When integrated into
a state-of-the-art library learning system (DreamCoder), LAPS produces
higher-quality libraries and improves search efficiency and generalization on
three domains -- string editing, image composition, and abstract reasoning
about scenes -- even when no natural language hints are available at test time.
Comment: appeared in Thirty-eighth International Conference on Machine
Learning (ICML 2021)
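
The two ingredients named in the abstract above, a library of functions and a search strategy, can be illustrated with a toy enumerative synthesizer for string editing. This is only a sketch of inductive program synthesis in general, not of LAPS or DreamCoder: the library contents, the function names, and the exhaustive shortest-first search are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical library of string-editing primitives.
LIBRARY = {
    "upper": str.upper,
    "reverse": lambda s: s[::-1],
    "drop_first": lambda s: s[1:],
    "double": lambda s: s + s,
}


def run(program, text):
    """Apply a program (a sequence of library function names) left to right."""
    for name in program:
        text = LIBRARY[name](text)
    return text


def synthesize(examples, max_length=3):
    """Return the first shortest program consistent with all examples.

    Programs are enumerated by increasing length; returns None if no
    program of at most max_length steps fits every (input, output) pair.
    """
    for length in range(1, max_length + 1):
        for program in product(LIBRARY, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None


if __name__ == "__main__":
    examples = [("abc", "CB"), ("hello", "OLLE")]
    print(synthesize(examples))
```

The search space grows exponentially with program length, which is why a better library (shorter programs) and a learned search heuristic (fewer candidates tried) both matter.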
Multimodal Representation Learning for Visual Reasoning and Text-to-Image Translation
Multimodal representation learning is a multidisciplinary research field that aims to integrate information from multiple communicative modalities in a meaningful manner to help solve a downstream task. These modalities can be visual, acoustic, linguistic, haptic, etc. What counts as a "meaningful integration of information from different modalities" remains modality- and task-dependent. The downstream task can range from understanding one modality in the presence of information from other modalities to translating input from one modality to another. This thesis investigates the utility of multimodal representation learning both for understanding one modality, namely image understanding for visual reasoning given corresponding information in other modalities, and for translating from one modality to another, specifically text-to-image translation.
Visual reasoning has been an active area of research in computer vision. It encompasses advanced image processing and artificial intelligence techniques to locate, characterize, and recognize objects, regions, and their attributes in an image in order to comprehend the image itself. One way of building a visual reasoning system is to ask the system to answer questions about the image that require attribute identification, counting, comparison, multi-step attention, and reasoning. An intelligent system is thought to have a proper grasp of the image if it can answer such questions correctly and provide valid reasoning for its answers. This work investigates how such a system can be built by learning a multimodal representation between the image and the questions. It also demonstrates how background knowledge, specifically scene-graph information, can be incorporated into existing image understanding models when it is available.
Multimodal learning provides an intuitive way of learning a joint representation between different modalities. Such a joint representation can be used to translate from one modality to the other. It also makes it possible to learn a shared representation between these varied modalities and to give meaning to what this shared representation should capture. Using the surrogate task of text-to-image translation, this work investigates neural-network-based architectures for learning a shared representation between these two modalities. It also proposes that such a shared representation can capture parts of different modalities that are equivalent in some sense. Specifically, given an image and a semantic description of certain objects present in the image, a shared representation between the text and image modalities is shown to capture the parts of the image mentioned in the text. This capability is showcased on a publicly available dataset.
Dissertation/Thesis
Masters Thesis Computer Engineering 201
Pseudo-contractions as Gentle Repairs
Updating a knowledge base to remove an unwanted consequence is a challenging task. Some of the original sentences must be either deleted or weakened in such a way that the sentence to be removed is no longer entailed by the resulting set. On the other hand, it is desirable that the existing knowledge be preserved as much as possible, minimising the loss of information. Several approaches to this problem can be found in the literature. In particular, when the knowledge is represented by an ontology, two different families of frameworks have been developed over the past decades, with numerous ideas in common but little interaction between the communities: applications of AGM-like Belief Change and justification-based Ontology Repair. In this paper, we investigate the relationship between pseudo-contraction operations and gentle repairs. Both aim to avoid the complete deletion of sentences when replacing them with weaker versions is enough to prevent the entailment of the unwanted formula. We show the correspondence between concepts on both sides and investigate under which conditions they are equivalent. Furthermore, we propose a unified notation for the two approaches, which might contribute to the integration of the two areas.
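
The difference between deleting a sentence and weakening it can be seen in a toy propositional setting. This is a sketch under stated assumptions, not the paper's formalism: the paper concerns ontologies, whereas the example below uses three propositional atoms, a truth-table entailment check, and a hand-picked weakening; all names are hypothetical.

```python
from itertools import product

ATOMS = ("p", "q", "r")


def entails(kb, goal):
    """Truth-table entailment: goal holds in every world satisfying kb."""
    for values in product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        if all(sentence(world) for sentence in kb) and not goal(world):
            return False
    return True


# Sentences are modelled as predicates over a truth assignment.
def p_and_r(w):
    return w["p"] and w["r"]


def p_implies_q(w):
    return (not w["p"]) or w["q"]


def q(w):
    return w["q"]


def r(w):
    return w["r"]


kb = [p_and_r, p_implies_q]        # entails the unwanted consequence q
deleted = [p_implies_q]            # repair by deleting "p and r"
weakened = [r, p_implies_q]        # repair by weakening "p and r" to "r"

if __name__ == "__main__":
    print("original entails q:", entails(kb, q))
    print("deletion entails q, r:", entails(deleted, q), entails(deleted, r))
    print("weakening entails q, r:", entails(weakened, q), entails(weakened, r))
```

Both repairs block the entailment of q, but only the weakened knowledge base still entails r, which is the information that plain deletion throws away.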