
    Word Representation with Transferable Semantics

    University of Technology Sydney, Faculty of Engineering and Information Technology. This thesis is about semantic representation, a core research problem in text-based machine learning fields such as natural language processing and information retrieval. Its goal is to improve representation learning methods by utilising transferable semantics extracted from source domains. Specifically, the thesis addresses four research questions: 1) how to reliably transfer semantics from a structured knowledge base to an unstructured representation space; 2) how to reliably transfer semantics from multiple source domains to a low-resource target domain; 3) how to achieve reliable and low-cost cross-lingual transfer of semantics; and 4) how to adapt semantic representations for specific applications. To answer these questions, the thesis proposes a set of effective representation methods that explore and model knowledge from 1) knowledge bases; 2) multiple pre-trained embeddings; 3) high-resource languages; and 4) task-related semantics. Comprehensive experiments and case studies evaluate and demonstrate the superior performance of the proposed methods compared with baseline methods. In conclusion, this thesis proposes a set of effective methods for improving semantic representation by exploring and modelling knowledge beyond raw text, with an emphasis on encoding task-specific features for real-world applications.
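    The first research question, transferring semantics from a structured knowledge base into an unstructured embedding space, can be pictured with a minimal retrofitting-style sketch (in the spirit of Faruqui et al.'s retrofitting). The toy vocabulary, vectors, and synonym graph below are assumptions for illustration, not the thesis's actual data or method.

```python
# Illustrative sketch only: injecting knowledge-base relations into a
# pre-trained embedding space by pulling each vector toward its KB neighbours
# while keeping it close to the original embedding (retrofitting-style).
import numpy as np

# Toy pre-trained vectors (in practice these would come from word2vec/GloVe).
vectors = {
    "car":  np.array([0.9, 0.1, 0.0]),
    "auto": np.array([0.2, 0.8, 0.1]),
    "bank": np.array([0.1, 0.2, 0.9]),
}
# Toy knowledge-base edges (e.g., synonymy links).
graph = {"car": ["auto"], "auto": ["car"], "bank": []}

def retrofit(vectors, graph, alpha=1.0, beta=1.0, iters=10):
    """Blend each word's original vector with the average of its KB neighbours."""
    original = {w: v.copy() for w, v in vectors.items()}
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iters):
        for word, neighbours in graph.items():
            if not neighbours:
                continue  # words without KB links keep their original vector
            neighbour_sum = sum(new[n] for n in neighbours)
            new[word] = (alpha * original[word] + beta * neighbour_sum) / (
                alpha + beta * len(neighbours)
            )
    return new

retrofitted = retrofit(vectors, graph)
print(retrofitted["car"])  # now noticeably closer to "auto" than before
```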

    A Systematic Literature Review on Image Information Needs and Behaviors

    Purpose: With ready access to search engines and social media platforms, the way people find image information has evolved and diversified over the past two decades. The purpose of this paper is to provide an overview of the literature on image information needs and behaviors.
    Design/methodology/approach: Following an eight-step procedure for conducting systematic literature reviews, the paper presents an analysis of peer-reviewed work on image information needs and behaviors, covering publications from 1997 to 2019.
    Findings: Application of the inclusion criteria led to 69 peer-reviewed works. These works were synthesized according to the following categories: research methods, users targeted, image types, identified needs, search behaviors, and search obstacles. The reviewed studies show that people seek and use images for multiple reasons, including entertainment, illustration, aesthetic appreciation, knowledge construction, engagement, inspiration, and social interactions. They also report that common strategies for image searches include keyword searches with short queries, browsing, specialization, and reformulation. Observed trends suggest that query analysis, survey questionnaires, and undergraduate participant pools are commonly deployed to research image information needs and behaviors.
    Originality: After more than two decades of image information needs research, a holistic systematic review of the literature was long overdue. The way users find image information has evolved and diversified due to technological developments in image retrieval. By synthesizing this burgeoning field into specific foci, this systematic literature review provides a foundation for future empirical investigation. With this foundation set, the paper pinpoints key research gaps to investigate, particularly the influence of user expertise, a need for more diverse population samples, a dearth of qualitative data, new search features, and information and visual literacies instruction.

    Extended Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents

    This paper presents an overview of the second edition of HIPE (Identifying Historical People, Places and other Entities), a shared task on named entity recognition and linking in multilingual historical documents. Following the success of the first CLEF-HIPE-2020 evaluation lab, HIPE-2022 confronts systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation tag sets. This shared task is part of the ongoing efforts of the natural language processing and digital humanities communities to adapt and develop appropriate technologies to efficiently retrieve and explore information from historical texts. On such material, however, named entity processing techniques face the challenges of domain heterogeneity, input noisiness, dynamics of language, and lack of resources. In this context, the main objective of HIPE-2022, run as an evaluation lab of the CLEF 2022 conference, is to gain new insights into the transferability of named entity processing approaches across languages, time periods, document types, and annotation tag sets. Tasks, corpora, and results of participating teams are presented. Compared to the condensed overview [1], this paper contains more refined statistics on the datasets, a breakdown of the results per entity type, and a discussion of the ‘challenges’ proposed in the shared task.
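    As a rough picture of the task that HIPE-2022 systems tackle, the sketch below runs an off-the-shelf multilingual NER pipeline over a short, artificially noisy sentence imitating OCR errors. The model name is an assumed example from the Hugging Face hub, not one of the participating systems, which were typically domain-adapted.

```python
# Illustrative sketch only: generic NER over noisy, OCR-like historical text.
from transformers import pipeline

ner = pipeline(
    "ner",
    model="Davlan/bert-base-multilingual-cased-ner-hrl",  # assumed example model
    aggregation_strategy="simple",  # merge word pieces into entity spans
)

text = "M. Victor Hugo arrlved in Genève on the 14th of Aprll."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 2))
```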

    Translation Alignment Applied to Historical Languages: methods, evaluation, applications, and visualization

    Translation alignment is an essential task in Digital Humanities and Natural Language Processing; it aims to link words or phrases in a source text with their equivalents in a translation. In addition to its importance in teaching and learning historical languages, translation alignment builds bridges between ancient and modern languages through which various linguistic annotations can be transferred. This thesis focuses on word-level translation alignment applied to historical languages in general and Ancient Greek and Latin in particular. As the title indicates, the thesis addresses four interdisciplinary aspects of translation alignment. The starting point was developing Ugarit, an interactive annotation tool for performing manual alignment, with the aim of gathering training data for an automatic alignment model. This effort resulted in more than 190k accurate translation pairs that I later used for supervised training. Ugarit has been used by many researchers and scholars, as well as in classrooms at several institutions for teaching and learning ancient languages, resulting in a large, diverse, crowd-sourced aligned parallel corpus that allowed us to conduct experiments and qualitative analyses to detect recurring patterns in annotators' alignment practice and in the generated translation pairs. Further, I employed recent advances in NLP and language modeling to develop an automatic alignment model for low-resource historical languages, experimenting with various training objectives and proposing a training strategy for historical languages that combines supervised and unsupervised training with mono- and multilingual texts. I then integrated this alignment model into other development workflows to project cross-lingual annotations and induce bilingual dictionaries from parallel corpora. Evaluation is essential to assess the quality of any model. To follow best practice, I reviewed the current evaluation procedure, identified its limitations, and proposed two new evaluation metrics. Moreover, I introduced a visual analytics framework to explore and inspect alignment gold-standard datasets and support quantitative and qualitative evaluation of translation alignment models. In addition, I designed and implemented visual analytics tools and reading environments for parallel texts and proposed various visualization approaches to support different alignment-related tasks, employing the latest advances in information visualization. Overall, this thesis presents a comprehensive study that includes manual and automatic alignment techniques, evaluation methods, and visual analytics tools that aim to advance the field of translation alignment for historical languages.
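    One simple way to picture word-level alignment is to score every source/target word pair by the cosine similarity of their (multilingual) embeddings and keep mutual best matches, similar in spirit to embedding-based aligners such as SimAlign. The toy vectors below are assumptions for illustration, not the thesis's trained model.

```python
# Illustrative sketch only: embedding-based word alignment via mutual argmax.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def align(src_vecs, tgt_vecs):
    """Return (i, j) pairs where source word i and target word j are mutual best matches."""
    sim = np.array([[cosine(s, t) for t in tgt_vecs] for s in src_vecs])
    pairs = []
    for i in range(sim.shape[0]):
        j = int(sim[i].argmax())
        if int(sim[:, j].argmax()) == i:  # mutual argmax keeps high-precision links
            pairs.append((i, j))
    return pairs

# Toy bilingual word vectors (in practice, contextual vectors from a multilingual encoder).
src = [np.array([1.0, 0.1]), np.array([0.0, 1.0])]  # e.g. two Greek words
tgt = [np.array([0.9, 0.2]), np.array([0.1, 0.9])]  # e.g. two English words
print(align(src, tgt))  # -> [(0, 0), (1, 1)]
```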

    Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit

    The primary focus of this thesis is to make Sanskrit manuscripts more accessible to end users through natural language technologies. The morphological richness, compounding, relatively free word order, and low-resource nature of Sanskrit pose significant challenges for developing deep learning solutions. We identify four fundamental tasks that are crucial for developing robust NLP technology for Sanskrit: word segmentation, dependency parsing, compound type identification, and poetry analysis. The first task, Sanskrit Word Segmentation (SWS), is a fundamental text processing task for any downstream application. However, it is challenging due to the sandhi phenomenon, which modifies characters at word boundaries. Similarly, existing dependency parsing approaches struggle with morphologically rich, low-resource languages like Sanskrit. Compound type identification is also challenging for Sanskrit due to the context-sensitive semantic relation between components. All these challenges result in sub-optimal performance in NLP applications such as question answering and machine translation. Finally, Sanskrit poetry has not been extensively studied in computational linguistics. While addressing these challenges, this thesis makes several contributions: (1) it proposes linguistically informed neural architectures for these tasks; (2) we showcase the interpretability and multilingual extension of the proposed systems; (3) our proposed systems report state-of-the-art performance; and (4) finally, we present SanskritShala, a neural toolkit in the form of a web-based application that provides real-time analysis of input for various NLP tasks. Overall, this thesis contributes to making Sanskrit manuscripts more accessible by developing robust NLP technology and releasing various resources, datasets, and a web-based toolkit. Comment: Ph.D. dissertation.

    Ontology Localization

    Our main goal in this thesis is to propose a solution for building a multilingual ontology through the automatic localization of an ontology. The notion of localization comes from the field of software development, where it refers to the adaptation of a software product to a non-native environment. In ontology engineering, ontology localization can be considered a subtype of software localization in which the product is a shared model of a particular domain, for example an ontology, to be used by a certain application. Specifically, our work introduces a new approach to the problem of multilingualism, describing the methods, techniques, and tools for the localization of ontological resources and how multilingualism can be represented in ontologies. The goal of this work is not to advocate a single approach to ontology localization, but rather to show the variety of methods and techniques that can be adapted from other areas of knowledge to reduce the cost and effort of enriching an ontology with multilingual information. We are convinced that there is no single method for ontology localization; nevertheless, we concentrate on automatic solutions for localizing these resources. The approach presented in this thesis provides comprehensive coverage of the localization activity for ontology practitioners. In particular, this work offers a formal explanation of our general localization process, defining its inputs, outputs, and the main steps identified. In addition, the approach considers several dimensions along which an ontology can be localized. These dimensions allow us to establish a classification of translation techniques based on methods drawn from the machine translation discipline. To facilitate the analysis of these translation techniques, we introduce an evaluation framework covering their main aspects. Finally, we offer an intuitive view of the entire ontology localization life cycle and outline our approach to defining a system architecture that supports this activity. The proposed model comprises the system components, the visible properties of those components, and the relations between them, and provides a basis from which ontology localization systems can be developed. The main contributions of this work are summarized as follows:
    - A characterization and definition of the problems of ontology localization, based on problems encountered in related areas. The proposed characterization considers three distinct localization problems: translation, information management, and the representation of multilingual information.
    - A prescriptive methodology to support the ontology localization activity, based on the localization methodologies used in software engineering and knowledge engineering, kept as general as possible so that it can cover a wide range of scenarios.
    - A classification of ontology localization techniques, which can serve to compare (analytically) different ontology localization systems as well as to design new ones, taking advantage of state-of-the-art solutions.
    - An integrated method for building ontology localization systems in a distributed, collaborative environment, taking into account the most appropriate methods and techniques depending on: i) the domain of the ontology to be localized, and ii) the amount of linguistic information required for the final ontology.
    - A modular component to support the storage of the multilingual information associated with each ontology term. Our approach follows the current trend in integrating multilingual information into ontologies, which suggests that ontological knowledge and (multilingual) linguistic information should be kept separate and independent (see the sketch after this list).
    - A model based on collaborative workflows for representing the process normally followed in different organizations to coordinate the localization activity across different natural languages.
    - An integrated infrastructure, implemented within the NeOn Toolkit as a set of plug-ins and extensions, that supports the collaborative ontology localization process.
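    The principle of keeping ontological knowledge separate from its (multilingual) linguistic information can be pictured with a minimal sketch. The class names, example IRI, and labels below are assumptions for illustration, roughly in the spirit of lemon/LexInfo-style lexicon models, not the NeOn Toolkit components described in the thesis.

```python
# Illustrative sketch only: an ontology layer (language-independent concepts)
# kept separate from a multilingual lexicon layer that stores labels per language.
from dataclasses import dataclass, field

@dataclass
class Concept:
    iri: str  # language-independent ontology term

@dataclass(frozen=True)
class LexicalEntry:
    concept_iri: str  # link back into the ontology layer
    language: str     # language tag, e.g. "es"
    label: str

@dataclass
class MultilingualLexicon:
    entries: list = field(default_factory=list)

    def add(self, concept_iri, language, label):
        self.entries.append(LexicalEntry(concept_iri, language, label))

    def labels(self, concept_iri, language):
        return [e.label for e in self.entries
                if e.concept_iri == concept_iri and e.language == language]

# The ontology layer stays untouched while localizations accumulate in the lexicon.
river = Concept(iri="http://example.org/onto#River")
lexicon = MultilingualLexicon()
lexicon.add(river.iri, "en", "river")
lexicon.add(river.iri, "es", "río")
print(lexicon.labels(river.iri, "es"))  # -> ['río']
```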

    Ukraine – Canada: Contemporary Scholarly Studies (Україна – Канада: сучасні наукові студії)

    The materials of this international collective monograph present the latest Ukrainian-Canadian socio-political, historical, philological, cultural, educational, and pedagogical research in the field of modern Canadian Studies. The monograph includes investigations by scholars from Ukraine and Canada (from Edmonton, Lutsk, Kyiv, Lviv, and Sumy). It is the first publication of its kind to appear in Ukraine. It is intended for scholars, postgraduate and doctoral students, undergraduates, and lecturers of faculties of international relations, foreign philology, history, political science, philology and journalism, and education and social work, for Canadian Studies centres in Ukraine and Ukrainian Studies centres in Canada, as well as for anyone interested in research on Ukrainian-Canadian relations.

    Cultural Heritage on line

    The 2nd International Conference "Cultural Heritage online – Empowering users: an active role for user communities" was held in Florence on 15-16 December 2009. It was organised by the Fondazione Rinascimento Digitale, the Italian Ministry for Cultural Heritage and Activities, and the Library of Congress, through its National Digital Information Infrastructure and Preservation Program (NDIIPP) partners. The conference topics related to digital libraries, digital preservation, and changing paradigms, focusing on user needs and expectations and analysing how to involve users and the cultural heritage community in creating and sharing digital resources. The sessions also investigated new organisational issues and roles, as well as cultural and economic limits, from an international perspective.

    The Transformative Potential of Attorney Bilingualism

    In contemporary U.S. law practice, attorney bilingualism is increasingly valued, primarily because it allows lawyers to work more efficiently and to pursue a broader range of professional opportunities. This purely functionalist conceptualization of attorney bilingualism, however, ignores the surprising ways in which multilingualism can enhance a lawyer's professional work and can strengthen and reshape relationships among actors in the U.S. legal milieu. Drawing upon research from psychology, linguistics, and other disciplines, this Article advances a theory of the transformative potential of attorney bilingualism. Looking first to the development of lawyers themselves, the Article posits that attorneys who operate bilingually may, over time, enjoy cognitive advantages such as enhanced creative thinking and problem-solving abilities, a more analytical orientation to language, and greater communicative sensitivity. Moreover, the existence of lawyers who are fully immersed in the bilingual practice of law will transform and invigorate interactions between attorneys and limited English proficient (LEP) clients and, more broadly, among attorneys, the parties to a proceeding, and legal decision makers. Although many U.S. lawyers possess non-English language ability, few are equipped with the complement of knowledge, skills, and values needed to utilize that language ability effectively in a professional setting. Therefore, the Article also calls upon the legal profession to adopt a more rigorous approach to bilingual training and instruction and outlines a set of competencies that underlie effective bilingual lawyering. These competencies relate broadly to cross-cultural interactions, knowledge of foreign legal systems, specialized and versatile language ability, and verbal and nonverbal communication skills.