1,466 research outputs found

    A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

    Full text link
    The Semantic Web is an extension of the current web in which information is given well-defined meaning. The perspective of Semantic Web is to promote the quality and intelligence of the current web by changing its contents into machine understandable form. Therefore, semantic level information is one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. There are many obstacles against the Semantic Annotation, such as multilinguality, scalability, and issues which are related to diversity and inconsistency in content of different web pages. Due to the wide range of domains and the dynamic environments that the Semantic Annotation systems must be performed on, the problem of automating annotation process is one of the significant challenges in this domain. To overcome this problem, different machine learning approaches such as supervised learning, unsupervised learning and more recent ones like, semi-supervised learning and active learning have been utilized. In this paper we present an inclusive layered classification of Semantic Annotation challenges and discuss the most important issues in this field. Also, we review and analyze machine learning applications for solving semantic annotation problems. For this goal, the article tries to closely study and categorize related researches for better understanding and to reach a framework that can map machine learning techniques into the Semantic Annotation challenges and requirements

    Git4Voc: Git-based Versioning for Collaborative Vocabulary Development

    Full text link
    Collaborative vocabulary development in the context of data integration is the process of finding consensus between the experts of the different systems and domains. The complexity of this process is increased with the number of involved people, the variety of the systems to be integrated and the dynamics of their domain. In this paper we advocate that the realization of a powerful version control system is the heart of the problem. Driven by this idea and the success of Git in the context of software development, we investigate the applicability of Git for collaborative vocabulary development. Even though vocabulary development and software development have much more similarities than differences there are still important differences. These need to be considered within the development of a successful versioning and collaboration system for vocabulary development. Therefore, this paper starts by presenting the challenges we were faced with during the creation of vocabularies collaboratively and discusses its distinction to software development. Based on these insights we propose Git4Voc which comprises guidelines how Git can be adopted to vocabulary development. Finally, we demonstrate how Git hooks can be implemented to go beyond the plain functionality of Git by realizing vocabulary-specific features like syntactic validation and semantic diffs

    Enriching Existing Test Collections with OXPath

    Full text link
    Extending TREC-style test collections by incorporating external resources is a time consuming and challenging task. Making use of freely available web data requires technical skills to work with APIs or to create a web scraping program specifically tailored to the task at hand. We present a light-weight alternative that employs the web data extraction language OXPath to harvest data to be added to an existing test collection from web resources. We demonstrate this by creating an extended version of GIRT4 called GIRT4-XT with additional metadata fields harvested via OXPath from the social sciences portal Sowiport. This allows the re-use of this collection for other evaluation purposes like bibliometrics-enhanced retrieval. The demonstrated method can be applied to a variety of similar scenarios and is not limited to extending existing collections but can also be used to create completely new ones with little effort.Comment: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 8th International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September 11-14, 201

    Modalities to Implement the Multilinguality in Web DYNPRO ABAP

    Get PDF
    The integrated platform SAP Netweaver is a platform that offers support in realizing Web bussiness applications that use the Model View Controller (MVC) concept. The Multilinguality being a property of this platform. The purpose of this article is to highlight the modality to internationalize a Web Dynpro ABAP project The techniques used for the internationalization of a Web Dynpro ABAP application are: the OTR [Online Text Repository] translations, the implementation of the assistance class and the technique of information internationalization in a database. The case study has been performed on the trial “SAP Netweaver 7.0 Application Server ABAP” that offered the possibility to log-in in English and German languages.Computer programes, software

    Multilingual adaptive search for digital libraries

    Get PDF
    This paper describes a framework for Adaptive Multilingual Information Retrieval (AMIR) which allows multilingual resource discovery and delivery using on-the-ïŹ‚y machine translation of documents and queries. Result documents are presented to the user in a contextualised manner. Challenges and affordances of both Adaptive and Multilingual IR, with a particular focus on Digital Libraries, are detailed. The framework components are motivated by a series of results from experiments on query logs and documents from The European Library. We conclude that factoring adaptivity and multilinguality aspects into the search process can enhance the user’s experience with online Digital Libraries

    Identifying Necessary Elements for BERT’s Multilinguality

    Get PDF
    It has been shown that multilingual BERT (mBERT) yields high quality multilingual rep- resentations and enables effective zero-shot transfer. This is suprising given that mBERT does not use any kind of crosslingual sig- nal during training. While recent literature has studied this effect, the exact reason for mBERT’s multilinguality is still unknown. We aim to identify architectural properties of BERT as well as linguistic properties of lan- guages that are necessary for BERT to become multilingual. To allow for fast experimenta- tion we propose an efficient setup with small BERT models and synthetic as well as natu- ral data. Overall, we identify six elements that are potentially necessary for BERT to be mul- tilingual. Architectural factors that contribute to multilinguality are underparameterization, shared special tokens (e.g., “[CLS]”), shared position embeddings and replacing masked to- kens with random tokens. Factors related to training data that are beneficial for multilin- guality are similar word order and comparabil- ity of corpora
    • 

    corecore