Limitations of Cross-Lingual Learning from Image Search
Cross-lingual representation learning is an important step in making NLP
scale to all the world's languages. Recent work on bilingual lexicon induction
suggests that it is possible to learn cross-lingual representations of words
based on similarities between images associated with these words. However, that
work focused on the translation of selected nouns only. In our work, we
investigate whether the meaning of other parts-of-speech, in particular
adjectives and verbs, can be learned in the same way. We also experiment with
combining the representations learned from visual data with embeddings learned
from textual data. Our experiments across five language pairs indicate that
previous work does not scale to the problem of learning cross-lingual
representations beyond simple nouns.
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs
Binary code analysis allows analyzing binary code without having access to
the corresponding source code. A binary, after disassembly, is expressed in an
assembly language. This inspires us to approach binary analysis by leveraging
ideas and techniques from Natural Language Processing (NLP), a rich area
focused on processing text of various natural languages. We notice that binary
code analysis and NLP share many analogous topics, such as semantics
extraction, summarization, and classification. This work utilizes these ideas
to address two important code similarity comparison problems: (I) given a pair
of basic blocks for different instruction set architectures (ISAs), determining
whether their semantics is similar or not; and (II) given a piece of code of
interest, determining if it is contained in another piece of assembly code for
a different ISA. The solutions to these two problems have many applications,
such as cross-architecture vulnerability discovery and code plagiarism
detection. We implement a prototype system INNEREYE and perform a comprehensive
evaluation. A comparison between our approach and existing approaches to
Problem I shows that our system outperforms them in terms of accuracy,
efficiency and scalability. Case studies utilizing the system
demonstrate that our solution to Problem II is effective. Moreover, this
research showcases how to apply ideas and techniques from NLP to large-scale
binary code analysis.
Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium
201
Program Transformations for Asynchronous and Batched Query Submission
The performance of database/Web-service backed applications can be
significantly improved by asynchronous submission of queries/requests well
ahead of the point where the results are needed, so that results are likely to
have been fetched already when they are actually needed. However, manually
writing applications to exploit asynchronous query submission is tedious and
error-prone. In this paper we address the issue of automatically transforming a
program written assuming synchronous query submission, to one that exploits
asynchronous query submission. Our program transformation method is based on
data flow analysis and is framed as a set of transformation rules. Our rules
can handle query executions within loops, unlike some of the earlier work in
this area. We also present a novel approach that, at runtime, can combine
multiple asynchronous requests into batches, thereby achieving the benefits of
batching in addition to that of asynchronous submission. We have built a tool
that implements our transformation techniques on Java programs that use JDBC
calls; our tool can be extended to handle Web service calls. We have carried
out a detailed experimental study on several real-life applications, which
shows the effectiveness of the proposed rewrite techniques, both in terms of
their applicability and the performance gains achieved.
Comment: 14 pages
EOSDB: The Database for Nuclear EoS
Nuclear equation of state (EoS) plays an important role in understanding the
formation of compact objects such as neutron stars and black holes. The true
nature of the EoS has been a matter of debate at all density ranges, not only in
nuclear physics but also in astronomy and astrophysics. We have
constructed a database of EoSs by compiling data from the literature. Our
database contains the basic properties of the nuclear EoS of symmetric nuclear
matter and of pure neutron matter. It also includes detailed information
about the theoretical models, for example the adopted methods and assumptions
in individual models. The novelty of the database is to consider new
experimental probes such as the symmetry energy, its slope relative to the
baryon density, and the incompressibility, which enables the users to check
their model dependences. We demonstrate the performance of the EOSDB through
the examinations of the model dependence among different nuclear EoSs. It is
revealed that some theoretical EoSs, which are commonly used in astrophysics, do
not satisfactorily agree with the experimental constraints.
Comment: 30 pages, 5 figures, submitted to Publications of the Astronomical
Society of Japan (revised
Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus
In Web search, entity-seeking queries often trigger a special Question
Answering (QA) system. It may use a parser to translate the question into a
structured query, execute that on a knowledge graph (KG), and return direct
entity responses. QA systems based on precise parsing tend to be brittle: minor
syntax variations may dramatically change the response. Moreover, KG coverage
is patchy. At the other extreme, a large corpus may provide broader coverage,
but in an unstructured, unreliable form. We present AQQUCN, a QA system that
gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of
query syntax, from well-formed questions to short 'telegraphic' keyword
sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals
from KGs and large corpora to directly rank KG entities, rather than commit to
one semantic interpretation of the query. AQQUCN models the ideal
interpretation as an unobservable or latent variable. Interpretations and
candidate entity responses are scored as pairs, by combining signals from
multiple convolutional networks that operate collectively on the query, KG and
corpus. On four public query workloads, amounting to over 8,000 queries with
diverse query syntax, we see 5–16% absolute improvement in mean average
precision (MAP), compared to the entity ranking performance of recent systems.
Our system is also competitive at entity set retrieval, almost doubling F1
scores for challenging short queries.
Comment: Accepted to Information Retrieval Journal