Learning Semantic Correspondences in Technical Documentation
We consider the problem of translating high-level textual descriptions to
formal representations in technical documentation as part of an effort to model
the meaning of such documentation. We focus specifically on the problem of
learning translational correspondences between text descriptions and grounded
representations in the target documentation, such as formal representation of
functions or code templates. Our approach exploits the parallel nature of such
documentation, or the tight coupling between high-level text and the low-level
representations we aim to learn. Data is collected by mining technical
documents for such parallel text-representation pairs, which we use to train a
simple semantic parsing model. We report new baseline results on sixteen novel
datasets, including the standard library documentation for nine popular
programming languages across seven natural languages, and a small collection of
Unix utility manuals. Comment: accepted to ACL-201
Polyglot Semantic Parsing in APIs
Traditional approaches to semantic parsing (SP) work by training individual
models for each available parallel dataset of text-meaning pairs. In this
paper, we explore the idea of polyglot semantic translation, or learning
semantic parsing models that are trained on multiple datasets and natural
languages. In particular, we focus on translating text to code signature
representations using the software component datasets of Richardson and Kuhn
(2017a,b). The advantage of such models is that they can be used for parsing a
wide variety of input natural languages and output programming languages, or
mixed input languages, using a single unified model. To facilitate modeling of
this type, we develop a novel graph-based decoding framework that achieves
state-of-the-art performance on the above datasets, and apply this method to
two other benchmark SP tasks. Comment: accepted for NAACL-2018 (camera ready version
Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration
Cross-language information retrieval (CLIR), where queries and documents are
in different languages, has of late become one of the major topics within the
information retrieval community. This paper proposes a Japanese/English CLIR
system in which we combine query translation and retrieval modules. We
currently target the retrieval of technical documents, and therefore the
performance of our system is highly dependent on the quality of the translation
of technical terms. However, technical term translation is still
problematic in that technical terms are often compound words, and thus new
terms are progressively created by combining existing base words. In addition,
Japanese often represents loanwords using its special phonograms.
Consequently, existing dictionaries find it difficult to achieve sufficient
coverage. To counter the first problem, we produce a Japanese/English
dictionary for base words, and translate compound words on a word-by-word
basis. We also use a probabilistic method to resolve translation ambiguity. For
the second problem, we use a transliteration method, which maps words
unlisted in the base word dictionary to their phonetic equivalents in the
target language. We evaluate our system using a test collection for CLIR, and
show that both the compound word translation and transliteration methods
improve system performance.
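The word-by-word compound translation with probabilistic disambiguation described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the dictionary entries, the pair scores, and the names `BASE_DICT`, `PAIR_SCORE`, `score`, and `translate_compound` are all hypothetical, and the compound is assumed to be already segmented into base words.

```python
from itertools import product

# Toy base-word dictionary (hypothetical entries, not the authors' dictionary).
BASE_DICT = {
    "情報": ["information", "intelligence"],
    "検索": ["retrieval", "search"],
}

# Toy collocation scores standing in for a probabilistic model
# (made-up numbers, for illustration only).
PAIR_SCORE = {
    ("information", "retrieval"): 0.6,
    ("information", "search"): 0.3,
    ("intelligence", "retrieval"): 0.05,
    ("intelligence", "search"): 0.05,
}


def score(words):
    """Product of adjacent-pair scores; unseen pairs get a small floor value."""
    total = 1.0
    for left, right in zip(words, words[1:]):
        total *= PAIR_SCORE.get((left, right), 1e-6)
    return total


def translate_compound(base_words):
    """Translate each base word, then keep the highest-scoring combination."""
    candidates = [BASE_DICT[w] for w in base_words]
    return max(product(*candidates), key=score)


best = translate_compound(["情報", "検索"])
print(" ".join(best))
```

Enumerating every combination is only practical for short compounds; a real system would need a proper segmentation step and a trained translation model, neither of which is shown here.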
First Attempt towards a Standard Glossary of Ontology Engineering Terminology
In this paper we present the consensus-reaching process followed
within the NeOn consortium for the identification and definition of the
activities involved in the ontology network development process. This work
was conceived due to the lack of standardization in the Ontology Engineering
terminology, which clearly contrasts with the Software Engineering field that
boasts the IEEE Standard Glossary of Software Engineering Terminology.
The paper also includes the NeOn Glossary of Activities, which is the result
of the consensus-reaching process explained here. Our future aim is to
standardize the NeOn Glossary of Activities.
Semantic information systems engineering: A query-based approach for semi-automatic annotation of web services
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. There has been an increasing interest in Semantic Web services (SWS) as a proposed solution to facilitate automatic discovery, composition and deployment of existing syntactic Web services. Successful implementation and wider adoption of SWS by research and industry, however, depend heavily on the existence of effective and easy-to-use methods for service semantic description. Unfortunately, Web service semantic annotation is currently performed manually. Manual annotation is a difficult, error-prone and time-consuming task, and few approaches exist that aim to semi-automate it. Existing approaches are difficult to use since they require ontology building. Moreover, these approaches employ ineffective matching methods and suffer from the Low Percentage Problem, which occurs when only a small number of service elements, in comparison to the total number of elements, are annotated in a given service.
This research addresses the Web services annotation problem by developing a semi-automatic annotation approach that allows SWS developers to effectively and easily annotate their syntactic services. The proposed approach does not require application ontologies to model service semantics. Instead, a standard query template is used: this template is filled with data and semantics extracted from WSDL files in order to produce query instances. The input of the annotation approach is the WSDL file of a candidate service and a set of ontologies. The output is an annotated WSDL file. The proposed approach is composed of five phases: (1) concept extraction; (2) concept filtering and query filling; (3) query execution; (4) results assessment; and (5) SAWSDL annotation. The query execution engine makes use of name-based and structural matching techniques. The name-based matching is carried out by CN-Match, a novel matching method and tool developed and evaluated in this research.
The proposed annotation approach is evaluated using a set of existing Web services and ontologies. Precision (P), Recall (R), F-Measure (F) and Percentage of annotated elements are used as evaluation metrics. The evaluation reveals that the proposed approach is effective since, relative to manual results, accurate and almost complete annotation results are obtained. In addition, a high percentage of annotated elements is achieved using the proposed approach because it makes use of effective ontology extension mechanisms.
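The evaluation metrics named in this abstract can be computed along the following lines. This is a minimal sketch using the standard definitions of precision, recall, and balanced F-measure against a manual reference annotation; the function names, the element and concept names, and the exact reading of "percentage of annotated elements" (annotated elements divided by all service elements) are assumptions for illustration, not taken from the thesis.

```python
def precision_recall_f(predicted, reference):
    """Standard P, R and balanced F over sets of (element, concept) pairs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    p = true_positives / len(predicted) if predicted else 0.0
    r = true_positives / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def percentage_annotated(predicted, all_elements):
    """Share of service elements that received at least one annotation (assumed definition)."""
    annotated = {element for element, _concept in predicted}
    return 100.0 * len(annotated & set(all_elements)) / len(all_elements)


# Hypothetical service elements and annotations, for illustration only.
all_elements = ["getPrice", "itemName", "qty", "currency"]
manual = {
    ("getPrice", "Price"), ("itemName", "Name"),
    ("qty", "Quantity"), ("currency", "Currency"),
}
automatic = {("getPrice", "Price"), ("itemName", "Name"), ("qty", "Amount")}

p, r, f = precision_recall_f(automatic, manual)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
print(f"annotated: {percentage_annotated(automatic, all_elements):.1f}%")
```

Under this reading, an element counts as annotated even when its annotation is wrong, which is why the percentage is reported alongside precision and recall rather than instead of them.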
Global integration of public sector information
This paper deals with technological methods for consolidating asset lists of available public sector information (PSI) for re-use. To this end, we review the state of the art in delivering access to PSI throughout the world and prioritize the necessary engagements for joining available PSI catalogues. We propose an architectural framework grounded on Semantic Web technologies to deliver a global platform for federated searching. A speculative survey of available PSI portals is presented, and the initial implementation, results, and analysis of the proposed architecture are covered in detail.
Improving Schema Mapping by Exploiting Domain Knowledge
This dissertation addresses the problem of semi-automatically creating schema mappings. The need for developing schema mappings is a pervasive problem in many integration scenarios. Although the problem is well-known and a large body of work exists in the area, the development of schema mappings is today largely performed manually in industrial integration scenarios. In this thesis, an approach for the semi-automatic creation of high-quality schema mappings is developed.