
    Towards a Soft Evaluation and Refinement of Tagging in Digital Humanities

    In this paper we assess the soundness of tagging in digital repositories within the field of Digital Humanities by studying the (semantic) conceptual structure behind the folksonomy. Association rules drawn from this conceptual structure (the Stem and Luxenburger bases) make it possible to complete the tagging faithfully, from a semantic point of view, or to suggest such a completion.
    Ministerio de Economía y Competitividad TIN2013-41086-P; Junta de Andalucía TIC-606
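    The completion step described above can be sketched with a toy implementation. The data and the single-antecedent rule miner below are illustrative assumptions only, not the paper's actual Stem/Luxenburger basis computation over a formal concept lattice:

    ```python
    # Toy folksonomy: resource -> set of tags (hypothetical data).
    tagged = {
        "doc1": {"manuscript", "medieval", "latin"},
        "doc2": {"manuscript", "medieval"},
        "doc3": {"manuscript", "latin"},
        "doc4": {"poetry", "medieval"},
    }

    def association_rules(tagged, min_conf=1.0):
        """Mine single-antecedent rules tag_a -> tag_b with confidence >= min_conf."""
        tags = set().union(*tagged.values())
        rules = []
        for a in tags:
            with_a = [t for t in tagged.values() if a in t]
            if not with_a:
                continue
            for b in tags - {a}:
                conf = sum(1 for t in with_a if b in t) / len(with_a)
                if conf >= min_conf:
                    rules.append((a, b, conf))
        return rules

    def complete(tags, rules):
        """Close a tag set under the exact (confidence-1) rules, suggesting missing tags."""
        suggested = set(tags)
        changed = True
        while changed:
            changed = False
            for a, b, _conf in rules:
                if a in suggested and b not in suggested:
                    suggested.add(b)
                    changed = True
        return suggested
    ```

    With the toy data, every resource tagged "latin" is also tagged "manuscript", so `complete({"latin"}, association_rules(tagged))` suggests adding "manuscript".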

    Weaving creativity into the Semantic Web: a language-processing approach

    This paper describes a novel language-processing approach to the analysis of creativity and the development of a machine-readable ontology of creativity. The ontology provides a conceptualisation of creativity in terms of a set of fourteen key components or building blocks and has application to research into the nature of creativity in general and to the evaluation of creative practice in particular. We further argue that the provision of a machine-readable conceptualisation of creativity provides a small but important step towards addressing the problem of automated evaluation, ‘the Achilles’ heel of AI research on creativity’ (Boden 1999).

    HILT: High-Level Thesaurus Project. Phase IV and Embedding Project Extension: Final Report

    Ensuring that Higher Education (HE) and Further Education (FE) users of the JISC IE can find appropriate learning, research and information resources by subject search and browse is a major challenge facing the JISC domain (and, indeed, other domains beyond JISC), because most national and institutional service providers, usually for very good local reasons, use different subject schemes to describe their resources. Encouraging the use of standard terminologies in some services (institutional repositories, for example) is a related challenge. Under the auspices of the HILT project, JISC has been investigating mechanisms to assist the community with this problem through a JISC Shared Infrastructure Service that would help optimise the value obtained from expenditure on content and services by facilitating subject-search-based resource sharing to benefit users in the learning and research communities. The project has been through a number of phases, with work from earlier phases reported both in published work elsewhere and in project reports (see the project website: http://hilt.cdlr.strath.ac.uk/). HILT Phase IV had two elements: the core project, whose focus was 'to research, investigate and develop pilot solutions for problems pertaining to cross-searching multi-subject scheme information environments, as well as providing a variety of other terminological searching aids', and a short extension to encompass the pilot embedding of routines to interact with HILT M2M services in the user interfaces of various information services serving the JISC community. Both elements contributed to the developments summarised in this report.

    DARIAH and the Benelux


    UCL (University College London) Libraries Masterplan: Masterplanning Report

    BDP were appointed to undertake a Masterplan for the UCL Main Library and the UCL Science Library and to identify how these buildings could be re-ordered to significantly improve the quality of the library environment and to facilitate the delivery of library services. An initial brief was agreed with UCL's Estates Management Committee and a Masterplan Steering Group was established, including academic representatives, library staff and design consultants. To inform the development of this brief, UCL Library Services undertook a number of consultation exercises with users of the Library (students, academic staff and external users), together with Library staff. A number of visits to exemplar library buildings in the UK and continental Europe were also undertaken to inform the development of options for the buildings. Following the development and review of initial options for both the Main Library and Science Library, it was agreed that a further, hypothetical New Build Central Library option should be reviewed, to accommodate a relocated and consolidated library service encompassing 7 of the 16 existing libraries currently distributed across the UCL estate.

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges.
    Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Digitization of Cyrillic Manuscripts for a Historical Dictionary of the Serbian Language Using Handwritten Text Recognition Technology

    The paper explores the possibilities of using information technologies based on the principles of machine learning and artificial intelligence in the process of digitizing Cyrillic manuscripts for the purposes of creating a historical dictionary of the Serbian language. The empirical research is based on the use of the Transkribus software platform to create a model for automatic text recognition of the manuscripts of Gavril Stefanović Venclović, the most significant and prolific Serbian cultural enthusiast of the 18th century, whose extensive manuscript legacy in the Serbian vernacular represents the most significant primary source for the historical dictionary of the Serbian language of this period. Based on the results of the research conducted, it can be concluded that the process of digitizing Cyrillic manuscripts for the purposes of creating a historical dictionary of the Serbian language can be significantly accelerated using Transkribus by creating specific and generic models for automatic text recognition. The advantage of automatic text recognition over traditional methods lies particularly in the possibility of continuously improving the performance of specific and generic models as the transcription process progresses and the amount of digitized text available for training a new version of the model grows. DOI: 10.31168/2305-6754.2023.1.08
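    The iterative model improvement described above is typically tracked with the character error rate (CER), the metric commonly reported for handwritten text recognition models such as those trained in Transkribus. A minimal sketch of its computation, as edit distance divided by reference length:

    ```python
    def cer(reference: str, hypothesis: str) -> float:
        """Character error rate: Levenshtein distance between the gold transcription
        and the model output, divided by the length of the gold transcription."""
        m, n = len(reference), len(hypothesis)
        dist = list(range(n + 1))  # one rolling row of the edit-distance table
        for i in range(1, m + 1):
            prev, dist[0] = dist[0], i
            for j in range(1, n + 1):
                cur = dist[j]
                dist[j] = min(dist[j] + 1,        # deletion
                              dist[j - 1] + 1,    # insertion
                              prev + (reference[i - 1] != hypothesis[j - 1]))  # substitution
                prev = cur
        return dist[n] / m if m else 0.0
    ```

    A falling CER on a held-out page set is then the signal that retraining on newly transcribed material has actually improved the model.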

    Track on Knowledge Society Related Projects. Proceedings of the TEEM’13

    The TEEM (Technological Ecosystems for Enhancing Multiculturality) Conference was born within the new PhD Programme on Education in the Knowledge Society at the University of Salamanca, Spain. The conference addresses both Social Sciences studies and new technological advances within a synergic and symbiotic approach. In line with this perspective, an open-ended set of research lines, always with a collaborative orientation, is established, including Education Assessment and Orientation, Human-Computer Interaction, eLearning, Computers in Education, Communication Media and Education, Medicine and Education, Robotics in Education, Engineering and Education, and Information Society and Education.

    The Taming of the Shrew - non-standard text processing in the Digital Humanities

    Natural language processing (NLP) has focused on the automatic processing of newspaper texts for many years. With the growing importance of text analysis in areas such as spoken language understanding, social media processing and the interpretation of text material from the humanities, techniques and methodologies have to be reviewed and redefined, since so-called non-standard texts pose challenges at the lexical and syntactic levels, especially for machine-learning-based approaches. Automatic processing tools developed on the basis of newspaper texts show decreased performance on texts with divergent characteristics. Digital Humanities (DH), a field that has risen to prominence in recent decades, holds a variety of examples of this kind of text. For instance, the computational analysis of the relationships of Shakespeare's dramatic characters requires the adjustment of processing tools to 16th-century English texts in dramatic form. Likewise, the investigation of narrative perspective in Goethe's ballads calls for methods that can handle German verse from the 18th century. In this dissertation, we put forward a methodology for NLP in a DH environment. We investigate how an interdisciplinary context, in combination with the specific goals of individual projects, influences the general NLP approach. We suggest thoughtful collaboration and increased attention to the easy applicability of resulting tools as a solution for differences in the store of knowledge between project partners. Projects in DH are not only constituted by the automatic processing of texts but are usually framed by the investigation of a research question from the humanities. As a consequence, time limitations complicate the successful implementation of analysis techniques, especially since the diversity of texts impairs the transferability and reusability of tools beyond a specific project. We address this with modular and thus easily adjustable project workflows and system architectures.
    Several instances serve as examples for our methodology on different levels. We discuss modular architectures that balance time-saving solutions and problem-specific implementations, using the example of automatic post-correction of the output text from an optical character recognition system. We address the problem of data diversity and low-resource situations by investigating different approaches to non-standard text processing. We examine two main techniques: text normalization and tool adjustment. Text normalization aims at transforming non-standard text in order to assimilate it to the standard, whereas tool adjustment works in the contrary direction of enabling tools to successfully handle a specific kind of text. We focus on the task of part-of-speech tagging to illustrate various approaches to the processing of historical texts as an instance of non-standard texts. We discuss how the level of deviation from a standard form influences the performance of different methods. Our approaches shed light on the importance of data quality and quantity and emphasize the indispensability of annotations for effective machine learning. In addition, we highlight the advantages of problem-driven approaches in which the purpose of a tool is clearly formulated through the research question. Another significant finding to emerge from this work is a summary of the experience and knowledge gained through collaborative projects between computer scientists and humanists. We reflect on various aspects of the elaboration and formalization of research questions in the DH and assess the limitations and possibilities of the computational modeling of humanistic research questions.
    An emphasis is placed on the interplay between expert knowledge of a subject of investigation and the implementation of tools for that purpose, and on the resulting advantages, such as the targeted improvement of digital methods through purposeful manual correction and error analysis. We point out obstacles and opportunities and give prospects and directions for future development in this realm of interdisciplinary research.
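    The text-normalization technique contrasted with tool adjustment above can be illustrated with a minimal sketch. The lexicon and the th-to-t fallback rule below are hypothetical examples of Early New High German spelling variation, not resources from the dissertation:

    ```python
    # Hypothetical lexicon of historical spellings -> modern standard forms.
    NORM_LEXICON = {"vnd": "und", "seyn": "sein", "theil": "teil"}

    def normalize_token(token: str) -> str:
        """Normalize one historical token: lexicon lookup first, then a simple rule."""
        low = token.lower()
        if low in NORM_LEXICON:
            return NORM_LEXICON[low]
        # Fallback rule: historical 'th' often corresponds to modern 't' (e.g. thun -> tun).
        if low.startswith("th"):
            return "t" + low[2:]
        return token  # no normalization known; pass the token through unchanged

    def normalize(tokens):
        """Normalize a token sequence before feeding it to a standard POS tagger."""
        return [normalize_token(t) for t in tokens]
    ```

    The normalized output can then be handed to any off-the-shelf tagger trained on modern text; tool adjustment, by contrast, would leave the text untouched and instead retrain or adapt the tagger on annotated historical data.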