18 research outputs found

    The Future of Information Sciences : INFuture2009 : Digital Resources and Knowledge Sharing


    Similarity Models in Distributional Semantics using Task Specific Information

    In distributional semantics, unsupervised learning has been widely used for a large number of tasks, whereas supervised learning has received less coverage. In this dissertation, we investigate the supervised learning approach for semantic relatedness tasks in distributional semantics. The investigation mainly considers semantic similarity and semantic classification tasks. Existing and newly constructed datasets are used as input for the experiments. The new datasets are constructed from thesauri such as Eurovoc, a multilingual thesaurus maintained by the Publications Office of the European Union. The meaning of the words in the datasets is represented using a distributional semantic approach, which collects co-occurrence information from large texts and represents the words as high-dimensional vectors. English words are represented using the ukWaC corpus, while German words are represented using the DeWaC corpus. After each word has been represented by a high-dimensional vector, different supervised machine learning methods are applied to the selected tasks. The outputs of the supervised methods are evaluated by comparing task performance and accuracy with the results of state-of-the-art unsupervised machine learning methods. In addition, multi-relational matrix factorization is introduced as a supervised learning method in distributional semantics. The dissertation shows that multi-relational matrix factorization is a good alternative for integrating different sources of information about words in distributional semantics. Some new applications are also introduced. One analyzes the text of a German company's website and presents information about the company as a concept cloud visualization. The others are automatic recognition and disambiguation of Library of Congress Subject Headings and automatic identification of synonym relations in the Dutch Parliament thesaurus.
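The co-occurrence representation described in the abstract above can be illustrated with a minimal sketch. This is not the dissertation's code: the toy corpus, the window size, and the function names `cooccurrence_vectors` and `cosine` are assumptions made for illustration, and real work would use a large corpus such as those named in the abstract.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]
vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
```

In this toy corpus "cat" and "dog" share more context words than "cat" and "milk", so `sim_cat_dog` comes out higher than `sim_cat_milk`. A supervised method of the kind the abstract discusses would take such vectors as input features.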

    Cultural Heritage Storytelling, Engagement and Management in the Era of Big Data and the Semantic Web

    This Special Issue was launched to shed further light on important cultural heritage (CH) areas, inviting researchers to submit original, multidisciplinary research related to heritage crowdsourcing, documentation, management, authoring, storytelling, and dissemination. Audience engagement is considered very important at both ends of the CH production–consumption chain (i.e., the push and pull ends). At the same time, sustainability factors are placed at the center of the envisioned analysis. A total of eleven (11) contributions were published in this Special Issue, addressing various aspects of contemporary heritage strategies in today's ubiquitous society. The published papers relate to, but are not limited to, the following multidisciplinary topics: digital storytelling for cultural heritage; audience engagement in cultural heritage; sustainability impact indicators of cultural heritage; cultural heritage digitization, organization, and management; collaborative cultural heritage archiving, dissemination, and management; cultural heritage communication and education for sustainable development; semantic services of cultural heritage; big data of cultural heritage; smart systems for historical cities – smart cities; and smart systems for cultural heritage sustainability.

    From social tagging to polyrepresentation: a study of expert annotating behavior of moving images

    Doctoral thesis with International Mention (Mención Internacional en el título de doctor). This thesis investigates "nichesourcing" (De Boer, Hildebrand, et al., 2012), an emergent form of cultural heritage crowdsourcing in which niches of experts are involved in annotation tasks. The initiative is studied in relation to moving image annotation in the context of audiovisual heritage, more specifically within the sector of film archives. The work presents a case study of film and media scholars that investigates the types of annotations and attribute descriptions they could contribute, as well as the information needs and the seeking and searching behaviors of this group, in order to determine what role the different types of annotations would play in supporting their expert tasks. The thesis comprises three independent but interconnected studies that use a mixed methodology and an interpretive approach. It draws on concepts from the information behavior discipline and uses the "Integrated Information Seeking and Retrieval Framework" (IS&R) (Ingwersen and Järvelin, 2005) as guidance. The findings show that moving image experts could contribute several types of annotations to a nichesourcing initiative, of which time-based tags are only one possibility. The findings also identify several research foci in film and media studies and indicate that in-depth indexing at the content level is needed only to support one specific research focus, to support research in other domains, or to engage broader audiences. The main implications for information infrastructure are the requirement for more varied annotation support, the need for greater interoperability among existing metadata standards and frameworks, and the need for guidelines on implementing crowdsourcing and nichesourcing in the audiovisual heritage sector. This research contributes to studies of social tagging applied to moving images; to the discipline of information behavior, by proposing new concepts related to the area of use behavior; and to the concept of "polyrepresentation" (Ingwersen, 1992, 1996) applied to the humanities domain.
    Doctoral programme: Programa Oficial de Doctorado en Documentación: Archivos y Bibliotecas en el Entorno Digital. Committee chair: Peter Emil Rerup Ingwersen. Secretary: Antonio Hernández Pérez. Member: Nils Phar

    Machine Learning Algorithm for the Scansion of Old Saxon Poetry

    Several scholars have designed tools for the automatic scansion of poetry in many languages, but none of these tools deals with Old Saxon or Old English. This project is a first attempt to create a tool for these languages. We implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and used the resulting corpus as the labeled dataset to train the model. In evaluation, the model reached 97% accuracy and a weighted average of 99% for precision, recall, and F1 score. In addition, we tested the model on some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input verses.

    Adoption of AI-based Information Systems from an Organizational and User Perspective

    Artificial intelligence (AI) is fundamentally changing our society and economy. Companies are investing a great deal of money and time into building corresponding competences and developing prototypes with the aim of integrating AI into their products and services, as well as enriching and improving their internal business processes. This inevitably brings corporate and private users into contact with a new technology that functions fundamentally differently than traditional software. The possibility of using machine learning to generate precise models based on large amounts of data capable of recognizing patterns within that data holds great economic and social potential—for example, in task augmentation and automation, medical diagnostics, and the development of pharmaceutical drugs. At the same time, companies and users are facing new challenges that accompany the introduction of this technology. Businesses are struggling to manage and generate value from big data, and employees fear increasing automation. To better prepare society for the growing market penetration of AI-based information systems into everyday life, a deeper understanding of this technology in terms of organizational and individual use is needed. Motivated by the many new challenges and questions for theory and practice that arise from AI-based information systems, this dissertation addresses various research questions with regard to the use of such information systems from both user and organizational perspectives. A total of five studies were conducted and published: two from the perspective of organizations and three among users. The results of these studies contribute to the current state of research and provide a basis for future studies. In addition, the gained insights enable recommendations to be derived for companies wishing to integrate AI into their products, services, or business processes. 
The first research article (Research Paper A) investigated which factors and prerequisites influence the success of the introduction and adoption of AI. Using the technology–organization–environment framework, various factors in the categories of technology, organization, and environment were identified and validated through the analysis of expert interviews with managers experienced in the field of AI. The results show that factors related to data (especially availability and quality) and the management of AI projects (especially project management and use cases) have been added to the framework, but regulatory factors have also emerged, such as the uncertainty caused by the General Data Protection Regulation. The focus of Research Paper B is companies’ motivation to host data science competitions on online platforms and which factors influence their success. Extant research has shown that employees with new skills are needed to carry out AI projects and that many companies have problems recruiting such employees. Therefore, data science competitions could support the implementation of AI projects via crowdsourcing. The results of the study (expert interviews among data scientists) show that these competitions offer many advantages, such as exchanges and discussions with experienced data scientists and the use of state-of-the-art approaches. However, only a small part of the effort related to AI projects can be represented within the framework of such competitions. The studies in the other three research papers (Research Papers C, D, and E) examine AI-based information systems from a user perspective, with two studies examining user behavior and one focusing on the design of an AI-based IT artifact. Research Paper C analyses perceptions of AI-based advisory systems in terms of the advantages associated with their use. 
The results of the empirical study show that the greatest perceived benefit is the convenience such systems provide, as they are easy to access at any time and can immediately satisfy informational needs. Furthermore, this study examined the effectiveness of 11 different measures to increase trust in AI-based advisory systems. This showed a clear ranking of measures, with effectiveness decreasing from non-binding testing to providing additional information regarding how the system works to adding anthropomorphic features. The goal of Research Paper D was to investigate actual user behavior when interacting with AI-based advisory systems. Based on the theoretical foundations of task–technology fit and judge–advisor systems, an online experiment was conducted. The results show that, above all, perceived expertise and the ability to make efficient decisions through AI-based advisory systems influence whether users assess these systems as suitable for supporting certain tasks. In addition, the study provides initial indications that users might be more willing to follow the advice of AI-based systems than that of human advisors. Finally, Research Paper E designs and implements an IT artifact that uses machine learning techniques to support structured literature reviews. Following the approach of design science research, an artifact was iteratively developed that can automatically download research articles from various databases and analyze and group them according to their content using the word2vec algorithm, the latent Dirichlet allocation model, and agglomerative hierarchical cluster analysis. An evaluation of the artifact on a dataset of 308 publications shows that it can be a helpful tool to support literature reviews but that much manual effort is still required, especially with regard to the identification of common concepts in extant literature
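The last stage of the grouping pipeline named in the abstract above, agglomerative hierarchical cluster analysis, can be sketched as follows. This is an illustrative assumption, not the artifact from Research Paper E: the word2vec and latent Dirichlet allocation steps are omitted, the 2-D "document vectors" are invented, and average linkage is chosen here only because the abstract does not say which linkage was used.

```python
from math import dist


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical document vectors (invented): three near the origin, two near (5, 5).
doc_vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
groups = agglomerative(doc_vectors, n_clusters=2)
```

With these invented vectors the three documents near the origin end up in one group and the two near (5, 5) in the other. In a setting like the one the abstract describes, the vectors would instead come from the embedding and topic-model steps.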

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall, "Cavallerizza Reale". The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Term-driven E-Commerce

    This work addresses the textual dimension of e-commerce. Its basic hypothesis is that information and transactions in electronic commerce are bound to text. Wherever products and services are offered, sought, perceived, and evaluated, natural-language expressions are used. It follows, on the one hand, that capturing the variance of textual descriptions in e-commerce is important; on the other hand, the extensive textual resources that arise from e-commerce interactions can be drawn on to gain a better understanding of natural language.

    Pretrained Transformers for Text Ranking: BERT and Beyond

    The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures and dense retrieval techniques that perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading
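The multi-stage reranking architecture mentioned in the abstract above can be sketched in a few lines. Everything here is a simplifying assumption: the corpus is invented, `first_stage` uses plain term overlap as a stand-in for a keyword retriever, and `toy_relevance` is a placeholder where a transformer such as BERT would score each query-document pair in a real system.

```python
def tokenize(text):
    """Lowercase whitespace tokenization (deliberately naive)."""
    return text.lower().split()


def first_stage(query, corpus, k):
    """Cheap candidate generation: rank documents by number of shared query terms."""
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, doc in corpus.items():
        overlap = len(query_terms & set(tokenize(doc)))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def rerank(query, candidates, corpus, score_fn):
    """Second stage: reorder the candidates with a more expensive scoring function."""
    return sorted(candidates, key=lambda d: score_fn(query, corpus[d]), reverse=True)


def toy_relevance(query, doc):
    """Placeholder for a transformer relevance model: share of doc tokens that are query terms."""
    query_terms = set(tokenize(query))
    tokens = tokenize(doc)
    return sum(token in query_terms for token in tokens) / len(tokens)


# Invented corpus for illustration.
corpus = {
    "a": "text ranking with a very long discussion of unrelated cooking recipes",
    "b": "transformers for text ranking",
    "c": "gardening tips for spring",
}
candidates = first_stage("text ranking", corpus, k=2)
ranked = rerank("text ranking", candidates, corpus, toy_relevance)
```

The first stage keeps documents "a" and "b" (both contain both query terms), and the second stage moves the more focused document "b" ahead of "a". The split reflects the effectiveness and efficiency tradeoff the abstract describes: the expensive scorer only sees the `k` candidates, not the whole corpus.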