
    Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications

    The Data Web has undergone a tremendous growth period. It currently consists of more than 3,300 publicly available knowledge bases describing millions of resources from various domains, such as life sciences, government, or geography, with over 89 billion facts. Likewise, the Document Web has grown to a state where approximately 4.55 billion websites exist, 300 million photos are uploaded to Facebook, and 3.5 billion Google searches are performed on average every day. However, there is a gap between the Document Web and the Data Web: knowledge bases available on the Data Web are most commonly extracted from structured or semi-structured sources, while the majority of information available on the Web is contained in unstructured sources such as news articles, blog posts, photos, forum discussions, etc. As a result, data on the Data Web not only misses a significant fragment of the available information but also suffers from a lack of actuality, since typical extraction methods are time-consuming and can only be carried out periodically. Furthermore, provenance information is rarely taken into consideration and therefore gets lost in the transformation process. In addition, users are accustomed to entering keyword queries to satisfy their information needs. With the availability of machine-readable knowledge bases, lay users could be empowered to issue more specific questions and get more precise answers. In this thesis, we address the problem of Relation Extraction, one of the key challenges in closing the gap between the Document Web and the Data Web, in four ways. First, we present a distant supervision approach that finds multilingual natural language representations of formal relations already contained in the Data Web. We use these natural language representations to find sentences on the Document Web that contain unseen instances of these relations between two entities.
Second, we address the problem of data actuality by presenting a real-time RDF extraction framework for data streams and use this framework to extract RDF from RSS news feeds. Third, we present a novel fact validation algorithm, based on natural language representations, that is able not only to verify or falsify a given triple, but also to find trustworthy sources for it on the Web and to estimate a time scope in which the triple holds true. The features this algorithm uses to determine whether a website is trustworthy serve as provenance information and thereby help to create metadata for facts in the Data Web. Finally, we present a question answering system that uses the natural language representations to map natural language questions to formal SPARQL queries, allowing lay users to draw on the large amounts of data available on the Data Web to satisfy their information needs.
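The distant supervision step described above can be sketched as follows. This is a minimal illustrative sketch, not the thesis implementation: the toy triples, the example sentences, and the simple between-entities pattern heuristic are all assumptions, while the actual approach operates on Web-scale multilingual corpora with far more robust entity matching.

```python
# Minimal sketch of distant supervision for relation pattern extraction:
# given known (subject, predicate, object) triples from a knowledge base,
# find sentences that mention both entities and record the connecting
# surface string as a natural-language representation of the predicate.
from collections import defaultdict

def extract_patterns(triples, sentences):
    """Collect surface patterns appearing between entity pairs, per predicate."""
    patterns = defaultdict(set)
    for subj, pred, obj in triples:
        for sent in sentences:
            if subj in sent and obj in sent:
                start = sent.index(subj) + len(subj)
                end = sent.index(obj)
                if start < end:  # only handle subject-before-object order here
                    patterns[pred].add(sent[start:end].strip())
    return patterns

# Illustrative data (hypothetical triple and sentences):
triples = [("Leipzig", "dbo:country", "Germany")]
sentences = ["Leipzig is a city in Germany.",
             "Leipzig lies in the east of Germany."]
print(extract_patterns(triples, sentences)["dbo:country"])
# → {"is a city in", "lies in the east of"}
```

The harvested patterns can then be matched against new sentences to propose previously unseen instances of the same relation.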

    Design of a Controlled Language for Critical Infrastructures Protection

    We describe a project for the construction of a controlled language for critical infrastructures protection (CIP). The project originates from the need to coordinate and categorize communications on CIP at the European level. These communications can take the physical form of official documents, reports on incidents, informal communications, and plain e-mail. We explore the application of traditional library science tools for the construction of controlled languages in order to achieve this goal. Our starting point is an analogous effort from the 1960s in the field of nuclear science, known as the Euratom Thesaurus. (JRC.G.6 - Security technology assessment)

    Multilingual and intercultural communication in and beyond the UK asylum process: a linguistic ethnographic case study of legal advice-giving across cultural and linguistic borders

    This thesis investigates how asylum applicants and refugees in the UK, and legal professionals, communicate multilingually and interculturally within legal advice meetings concerning the processes of applying for asylum and for refugee family reunion. The thesis addresses the important question of how English-speaking immigration legal advisors negotiate understanding with clients from a range of linguistic and cultural backgrounds in order to deliver crucial legal advice and support. Adopting a critical social constructionist perspective on language, culture, and communication, the thesis explores how a diverse range of linguistic, languacultural and discursive resources are employed to communicate within legal advice-giving. The thesis offers an in-depth analysis of legal-lay communication in the co-operative professional mediation setting of legal advice, contrasting with, and complementing, the existing literature on multilingual and intercultural communication in institutional gatekeeping contexts. The research takes a linguistic ethnographic case study approach, applying methodological perspectives on researching multilingually and theoretical perspectives from institutional ethnography. It combines ethnographic fieldwork within an advice service offering asylum and refugee legal advice with linguistic analysis of observations and audio recordings of advice meeting interactions. The linguistic analysis combines the micro-analytic tools of interactional sociolinguistics with a communicative activity type analysis of the discursive structuring of legal advice interactions, and a transcontextual analysis of the range of texts entering into the interaction. The thesis demonstrates how refugee and asylum legal advice interactions are contextually framed by legal institutional intertextual hierarchies, which constrain, but also provide resources for, the purposeful communication taking place.
It also demonstrates how a flexibly applied communicative activity type structure functions as a discursive tool to support intercultural communication. The thesis contributes to the fields of intercultural communication studies and professional and legal communication studies, and responds to broader issues of language and social justice, and the linguistic accessibility of institutions.

    Development of Domain Specific Cluster : An Integrated Framework for College Libraries under the University of Burdwan

    This paper discusses the development of six domain-specific software clusters for the college libraries under the University of Burdwan. The library is the heart of an educational institution, so relevant, comprehensive open source software was selected on the basis of global parameters and recommendations such as those of the IFLA Working Group, the Integrated Library System Discovery Interface (ILS-DI), Requests for Proposals (RFP), Requests for Comments (RFC), Service-Oriented Architecture (SOA), and the Open Library Environment (OLE) project. The clusters cover the areas of integrated library systems, digital media archiving, content management systems, learning content management systems, federated search, and college communication and interaction. A single-window interface across the six clusters was also developed so that college librarians and users can access the resources they need through open source software and open standards. The six clusters were selected to make digital and library resources easy to manage in the college libraries affiliated to the University of Burdwan. This integrated framework can manage housekeeping operations and information retrieval functions such as acquisition, cataloguing, circulation, member generation, authority control, report generation, and the online public access catalogue, for users as well as library professionals.

    Meta-Learning Neural Machine Translation Curricula

    Curriculum learning hypothesizes that presenting training samples to machine learners in a meaningful order during training helps improve model quality and convergence rate. In this dissertation, we explore this framework in the context of Neural Machine Translation (NMT). NMT systems are typically trained on large amounts of heterogeneous data and have the potential to benefit greatly from curriculum learning in terms of both speed and quality. We concern ourselves with three primary questions in our investigation: (i) how do we design a task- and/or dataset-specific curriculum for NMT training? (ii) can we leverage human intuition about learning in this design, or can we learn the curriculum itself? (iii) how do we featurize training samples (e.g., easy versus hard) so that they can be effectively slotted into a curriculum? We begin by empirically exploring various hand-designed curricula and their effect on the translation performance and training speed of NMT systems. We show that these curricula, most of which are based on human intuition, can improve NMT training speed but are highly sensitive to hyperparameter settings. Next, instead of using a hand-designed curriculum, we meta-learn a curriculum for the task of learning from noisy translation samples using reinforcement learning. We demonstrate that this learned curriculum significantly outperforms a random-curriculum baseline and matches the strongest hand-designed curriculum. We then extend this approach to the task of multilingual NMT, with an emphasis on accumulating knowledge and learning from multiple training runs. Again, we show that this technique can match the strongest baseline obtained via an expensive fine-grained grid search over the (learned) hyperparameters.
We conclude with an extension that requires no prior knowledge of sample relevance to the task and uses sample features instead, learning both the relevance of each training sample and the appropriate curriculum jointly. We show that this technique outperforms state-of-the-art results on a noisy filtering task.
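The hand-designed, easy-to-hard curricula discussed above can be sketched minimally as follows. This is an illustrative sketch under stated assumptions, not the dissertation's method: the difficulty proxy (token count), the phase schedule, and the toy corpus are all assumptions made for demonstration.

```python
# Minimal sketch of a hand-designed curriculum: score each sample with a
# difficulty proxy, then let the training pool grow from easy to hard
# across a fixed number of phases.

def difficulty(sample):
    # Assumed proxy: longer sentences are treated as "harder".
    return len(sample.split())

def curriculum_phases(samples, num_phases=3):
    """Yield a growing training pool per phase, easiest samples first."""
    ordered = sorted(samples, key=difficulty)
    for phase in range(1, num_phases + 1):
        cutoff = round(len(ordered) * phase / num_phases)
        yield ordered[:cutoff]

# Illustrative "corpus" of whitespace-tokenized sentences:
corpus = ["a b", "a b c d e f", "a b c", "a b c d"]
phases = list(curriculum_phases(corpus, num_phases=2))
print(phases[0])  # easiest half of the corpus
print(phases[1])  # full corpus, ordered easy to hard
```

A learned curriculum, by contrast, would replace the fixed phase schedule with a policy (e.g., trained via reinforcement learning) that chooses which pool to sample from based on training feedback.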