36 research outputs found

    Semantic Enrichment of a Multilingual Archive with Linked Open Data

    This paper introduces MERCKX, a Multilingual Entity/Resource Combiner & Knowledge eXtractor. A case study involving the semantic enrichment of a multilingual archive is presented with the aim of assessing the relevance of natural language processing techniques such as named-entity recognition and entity linking for cultural heritage material. In order to improve the indexing of historical collections, we map entities to the Linked Open Data cloud using a language-independent method. Our evaluation shows that MERCKX outperforms similar tools on the task of place disambiguation and linking, achieving over 80% precision despite lower recall scores. These results are encouraging for small and medium-size cultural institutions since they demonstrate that semantic enrichment can be achieved with limited resources. Peer reviewed.

    Textual Curation

    This article explores textual curation as a conceptualization of authorship and composition within large information structures, one that is heavily based on the canon of arrangement. This work is often undertaken through distributed collaboration, thus complicating traditional conceptions of authorial attribution and agency. Central curatorial processes include critical recomposition of prior texts along with the development of small and often invisible textual elements such as architecture, metadata, and strategic links. I offer a grounded definition of textual curation that draws from traditional curatorial fields such as Museum Studies and Library Science as well as Writing Studies’ own subfield of Technical Communication, which focuses heavily on recomposed, collaboratively produced texts. Selected Wikipedia articles serve as case studies for examining live curatorial work in open, collaborative environments.

    Linking named entities to Wikipedia

    Natural language is fraught with problems of ambiguity, including name reference. A name in text can refer to multiple entities just as an entity can be known by different names. This thesis examines how a mention in text can be linked to an external knowledge base (KB), in our case, Wikipedia. The named entity linking (NEL) task requires systems to identify the KB entry, or Wikipedia article, that a mention refers to; or, if the KB does not contain the correct entry, return NIL. Entity linking systems can be complex, and we present a framework for analysing their different components. We use this framework to analyse three seminal systems, evaluated on a common dataset, and we show the importance of precise search for linking. The Text Analysis Conference (TAC) is a major venue for NEL research. We report on our submissions to the entity linking shared task in 2010, 2011 and 2012. The information required to disambiguate entities is often found in the text, close to the mention. We explore apposition, a common way for authors to provide information about entities. We model syntactic and semantic restrictions with a joint model that achieves state-of-the-art apposition extraction performance. We generalise from apposition to examine local descriptions specified close to the mention. We add local description to our state-of-the-art linker by using patterns to extract the descriptions and matching against this restricted context. Not only does this make for a more precise match, we are also able to model failure to match. Local descriptions help disambiguate entities, further improving our state-of-the-art linker. The work in this thesis seeks to link textual entity mentions to knowledge bases. Linking is important for any task where external world knowledge is used, and resolving ambiguity is fundamental to advancing research into these problems.

    Natural Language Processing and Language Technologies for the Basque Language

    The presence of a language in the digital domain is crucial for its survival, as online communication and digital language resources have become the standard in the last decades and will gain more importance in the coming years. In order to develop the advanced systems that are considered basic to efficient digital communication (e.g. machine translation systems, text-to-speech and speech-to-text converters and digital assistants), it is necessary to digitalise linguistic resources and create tools. In the case of Basque, scholars have studied the creation of digital linguistic resources and the tools that allow the development of those systems for the last forty years. In this paper, we present an overview of the natural language processing and language technology resources developed for Basque, their impact on the process of making Basque a “digital language” and the applications and challenges in multilingual communication. More precisely, we present the well-known products for Basque, the basic tools and the resources that are behind the products we use every day. Likewise, we hope that this survey will serve as a guide for other minority languages that are making their way to digitalisation. Received: 05 April 2022 Accepted: 20 May 202

    Towards Population of Knowledge Bases from Conversational Sources

    With an increasing amount of data created daily, it is challenging for users to organize and discover information from massive collections of digital content (e.g., text and speech). The population of knowledge bases requires linking information from unstructured sources (e.g., news articles and web pages) to structured external knowledge bases (e.g., Wikipedia), which has the potential to advance information archiving and access, and to support knowledge discovery and reasoning. Because of the complexity of this task, knowledge base population is composed of multiple sub-tasks, including the entity linking task, defined as linking the mention of entities (e.g., persons, organizations, and locations) found in documents to their referents in external knowledge bases, and the event task, defined as extracting related information for events that should be entered in the knowledge base. Most prior work on tasks related to knowledge base population has focused on dissemination-oriented sources written in the third person (e.g., news articles) that benefit from two characteristics: the content is written in formal language and is to some degree self-contextualized, and the entities mentioned (e.g., persons) are likely to be widely known to the public so that rich information can be found from existing general knowledge bases (e.g., Wikipedia and DBpedia). The work proposed in this thesis focuses on tasks related to knowledge base population for conversational sources written in the first person (e.g., emails and phone recordings), which offers new challenges. One challenge is that most conversations (e.g., 68% of the person names and 53% of the organization names in Enron emails) refer to entities that are known to the conversational participants but not widely known. Thus, existing entity linking techniques relying on general knowledge bases are not appropriate.
Another challenge is that some of the shared context between participants in first-person conversations may be implicit and thus challenging to model, increasing the difficulty, even for human annotators, of identifying the true referents. This thesis focuses on several tasks relating to the population of knowledge bases for conversational content: the population of collection-specific knowledge bases for organization entities and meetings from email collections; the entity linking task that resolves the mention of three types of entities (person, organization, and location) found in both conversational text (emails) and speech (phone recordings) sources to multiple knowledge bases, including a general knowledge base built from Wikipedia and collection-specific knowledge bases; the meeting linking task that links meeting-related email messages to the referenced meeting entries in the collection-specific meeting knowledge base; and speaker identification techniques to improve the entity linking task for phone recordings without known speakers. Following the model-based evaluation paradigm, three collections (namely, Enron emails, Avocado emails, and Enron phone recordings) are used as the representations of conversational sources, new test collections are created for each task, and experiments are conducted for each task to evaluate the efficacy of the proposed methods and to provide a comparison to existing state-of-the-art systems. This work has implications in the research fields of e-discovery, scientific collaboration, speaker identification, speech retrieval, and privacy protection

    Assessing the motivators and barriers of interorganizational GIS data sharing for address data in South Africa

    Address data within geographic information systems (GIS) is used as reference data to link personal and administrative information, thus making it possible to locate and deliver goods and services to eligible persons. Preferably, every country must develop and maintain a single national address database (NAD) to eliminate data redundancy and provide a common point of reference across the board. In South Africa, the challenge is that there are separate address databases, which are developed and maintained by various public and private organizations – with little or no cooperation on data sharing. Currently, the establishment of a Committee for Spatial Information (CSI) which is tasked with the implementation of the South African Spatial Data Infrastructure (SASDI) and the publication of the South African Address Standard (SANS 1883) offer organizations an opportunity to collaborate towards the creation of a single address dataset. This research posits that the implementation of a successful data sharing initiative depends on the understanding of motivators and barriers of organizations participating in it. The research applied the case study method – with a semi-structured questionnaire – to assess the issues that motivate or obstruct GIS data sharing among three address organizations in South Africa. The results identified significant motivators that underlie the data sharing activities, e.g. reduced cost of data collection, improved data quality; and equally identified significant barriers that make organizations reluctant to enter into a data sharing initiative, e.g. data copyright and ownership, high staff-turnover, and lack of financial and technical resources. 
Although the case studies focused on address data in South Africa, the research findings can equally apply to other spatial datasets and are relevant for the successful implementation of the South African Spatial Data Infrastructure (SASDI). Dissertation (MIT)--University of Pretoria, 2012. Computer Science. Unrestricted.

    Making maps that matter?: the role of geospatial information in addressing rural landscape change

    Rural communities with bountiful natural amenities are attracting unprecedented in-migration. When unmanaged, the ensuing development threatens the ecological and cultural assets that are driving growth and valued by many residents. Despite the availability of geospatial analysis and visualization tools that seem well-suited to aiding community deliberations about land use planning and common pool resources, these tools have rarely been shown to effectively help communities understand and address threats to their landscape. Through a multi-year, mixed-method participatory research process with community partners in Macon County, North Carolina, I have studied the potential of geospatial information to enjoy increased local relevance, become more accessible to local discussions, and better engage local stakeholders. I co-developed an iterative research process that draws on critical GIS and participatory research traditions, using ethnographic interviews to guide geospatial analysis and mapping. I produced maps and landscape visualizations that successfully contributed to efforts to engage local residents in discussions about their changing community. I also studied how maps contribute to local planning efforts and their effect on attitudes towards planning. I found that maps designed to be relevant to local planning discussions can support more deliberative discussion and successful public engagement, aid in the recognition and articulation of shared community goals that challenge dominant pro-growth narratives, and enhance local capacity for planning and resource management. Further, the maps produced in community-driven processes both reflect and shape the shifting discursive strategies through which land use planning or conservation advocates navigate amenity migration landscapes. However, simply supplying visual information about growth and development trends in an experimental mail survey did not affect attitudes towards planning measures. 
This research addresses critical but often unasked questions about the relationship between research and on-the-ground outcomes. It should be of interest to landscape change researchers who want their findings to inform land use decision making, critical GIS scholars who are interested in applications, participatory researchers interested in GIS and iterative research designs, and local leaders who want to better engage residents in thinking about changing landscapes and growth management

    Georisks in the Mediterranean and their mitigation

    An international scientific conference organised by the Seismic Monitoring and Research Unit, Department of Geoscience, Faculty of Science, Department of Civil and Structural Engineering and Department of Construction and Property Management, Faculty of the Built Environment, University of Malta. Part of the SIMIT project: Integrated civil protection system for the Italo-Maltese cross-border area. Italia-Malta Programme – Cohesion Policy 2007-2013. This conference is one of the activities organised within the SIMIT strategic project (Integrated Cross-Border Italo-Maltese System of Civil Protection), Italia-Malta Operational Programme 2007 – 2013. SIMIT aims to establish a system of collaboration in Civil Protection procedures and data management between Sicilian and Maltese partners, so as to guarantee the safety and protection of the citizens and infrastructure of the cross-border area. It is led by the Department of Civil Protection of the Sicilian region, and has as other partners the Department of Civil Protection of Malta and the Universities of Palermo, Catania and Malta. SIMIT was launched in March 2013, and will come to a close in October 2015. Ever since the initial formulation of the project, it has been recognised that a state of national preparedness and correct strategies in the face of natural hazards cannot be truly effective without a sound scientific knowledge of the hazards and related risks. The University of Malta, together with colleagues from other Universities in the project, has been contributing mostly to the gathering and application of scientific knowledge, both in earthquake hazard as well as in building vulnerability. The issue of seismic hazard in the cross-border region has been identified as deserving foremost importance. South-East Sicily in particular has suffered on more than one occasion the effects of large devastating earthquakes.
Malta, although fortunately more removed from the sources of such large earthquakes, has not been completely spared of their damaging effects. The drastic increase in the building density over recent decades has raised the level of awareness and concern of citizens and authorities about our vulnerability. These considerations have spurred scientists from the cross-border region to work together towards a deeper understanding of the underlying causes and nature of seismic and associated hazards, such as landslide and tsunami. The SIMIT project has provided us with the means of improving earthquake surveillance and analysis in the Sicily Channel and further afield in the Mediterranean, as well as with facilities to study the behaviour of our rocks and buildings during earthquake shaking. The role of the civil engineering community in this endeavour cannot be overstated, and this is reflected in the incorporation, from the beginning, of the civil engineering component in the SIMIT project. Constructing safer buildings is now accepted to be the major option towards human loss mitigation during strong earthquakes, and this project has provided us with a welcome opportunity for interaction between the two disciplines. Finally the role of the Civil Protection authorities must occupy a central position, as we recognize the importance of their prevention, coordination and intervention efforts, aided by the input of the scientific community. This conference brings together a diversity of geoscientists and engineers whose collaboration is the only way forward to tackling issues and strategies for risk mitigation. Moreover we welcome the contribution of participants from farther afield than the Central Mediterranean, so that their varied experience may enhance our efforts. 
We are proud to host the conference in the historic city of Valletta, in the heart of the Mediterranean, which also serves as a constant reminder of the responsibility of all regions to protect and conserve our collective heritage. Peer-reviewed.