62 research outputs found
A Quadruple-Based Text Analysis System for History and Philosophy of Science
abstract: Computational tools in the digital humanities often either work on the macro-scale, enabling researchers to analyze huge amounts of data, or on the micro-scale, supporting scholars in the interpretation and analysis of individual documents. The proposed research system that was developed in the context of this dissertation ("Quadriga System") works to bridge these two extremes by offering tools to support close reading and interpretation of texts, while at the same time providing a means for collaboration and data collection that could lead to analyses based on big datasets. In the field of history of science, researchers usually use unstructured data such as texts or images. To computationally analyze such data, it first has to be transformed into a machine-understandable format. The Quadriga System is based on the idea of representing texts as graphs of contextualized triples (or quadruples). Those graphs (or networks) can then be mathematically analyzed and visualized. This dissertation describes two projects that use the Quadriga System for the analysis and exploration of texts and the creation of social networks. Furthermore, a model for digital humanities education is proposed that brings together students from the humanities and computer science in order to develop user-oriented, innovative tools, methods, and infrastructures. Dissertation/Thesis. Doctoral Dissertation, Biology 201
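The idea of contextualized triples can be illustrated with a minimal sketch. It is not the Quadriga System's actual data model: the `Quadruple` type, its field names, and the placeholder statements are all assumptions made for illustration. Each statement is a subject-predicate-object triple plus a fourth element recording the text it was drawn from, and the resulting graph can then be analyzed, here with a simple degree count.

```python
from collections import defaultdict
from typing import NamedTuple


class Quadruple(NamedTuple):
    """A triple plus its context (hypothetical structure, for illustration)."""
    subject: str
    predicate: str
    obj: str
    context: str  # assumed: identifier of the source text


def build_graph(quads):
    """Group outgoing edges by subject, keeping the context on every edge."""
    graph = defaultdict(list)
    for q in quads:
        graph[q.subject].append((q.predicate, q.obj, q.context))
    return dict(graph)


def degree(quads):
    """Count how many statements each node takes part in."""
    counts = defaultdict(int)
    for q in quads:
        counts[q.subject] += 1
        counts[q.obj] += 1
    return dict(counts)


# Invented placeholder statements, not data from the dissertation.
quads = [
    Quadruple("Person A", "corresponded_with", "Person B", "text-1"),
    Quadruple("Person A", "corresponded_with", "Person C", "text-2"),
    Quadruple("Person B", "cited", "Person C", "text-3"),
]

graph = build_graph(quads)
degrees = degree(quads)
```

Keeping the context on each edge is what distinguishes a quadruple from a plain triple: two texts can assert the same relation, and the graph still records which text said what.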
Formal Linguistic Models and Knowledge Processing. A Structuralist Approach to Rule-Based Ontology Learning and Population
2013 - 2014. The main aim of this research is to propose a structuralist approach for knowledge processing by means of ontology learning and population, achieved starting from unstructured and structured texts. The method suggested includes distributional semantic approaches and NL formalization theories, in order to develop a framework, which relies upon deep linguistic analysis... [edited by author] XIII n.s
Connecting works of art within the semantic web of symbolic meanings
My doctoral research is about the modelling of symbolism in the cultural heritage domain, and on connecting artworks based on their symbolism through knowledge extraction and representation techniques. In particular, I participated in the design of two ontologies: one models the relationships between a symbol, its symbolic meaning, and the cultural context in which the symbol symbolizes the symbolic meaning; the second models artistic interpretations of a cultural heritage object from an iconographic and iconological (thus also symbolic) perspective. I also converted several sources of unstructured data, a dictionary of symbols and an encyclopaedia of symbolism, and semi-structured data, DBpedia and WordNet, to create HyperReal, the first knowledge graph dedicated to conventional cultural symbolism. By making use of HyperReal's content, I showed how linked open data about cultural symbolism could be utilized to initiate a series of quantitative studies that analyse (i) similarities between cultural contexts based on their symbologies, (ii) broad symbolic associations, (iii) specific case studies of symbolism such as the relationship between symbols, their colours, and their symbolic meanings. Moreover, I developed a system that can infer symbolic, cultural context-dependent interpretations from artworks according to what they depict, envisioning potential use cases for museum curation. I have then re-engineered the iconographic and iconological statements of Wikidata, a widely used general-domain knowledge base, creating ICONdata: an iconographic and iconological knowledge graph. ICONdata was then enriched with automatic symbolic interpretations. Subsequently, I demonstrated the significance of enhancing artwork information through alignment with linked open data related to symbolism, resulting in the discovery of novel connections between artworks. Finally, I contributed to the creation of a software application. 
This application leverages established connections, allowing users to investigate the symbolic expression of a concept across different cultural contexts through the generation of a three-dimensional exhibition of artefacts symbolising the chosen concept
Novel Event Detection and Classification for Historical Texts
Event processing is an active area of research in the Natural Language Processing community but resources and automatic systems developed so far have mainly addressed contemporary texts. However, the recognition and elaboration of events is a crucial step when dealing with historical texts particularly in the current era of massive digitization of historical sources: research in this domain can lead to the development of methodologies and tools that can assist historians in enhancing their work, while having an impact also on the field of Natural Language Processing. Our work aims at shedding light on the complex concept of events when dealing with historical texts. More specifically, we introduce new annotation guidelines for event mentions and types, categorised into 22 classes. Then, we annotate a historical corpus accordingly, and compare two approaches for automatic event detection and classification following this novel scheme. We believe that this work can foster research in a field of inquiry so far underestimated in the area of Temporal Information Processing. To this end, we release new annotation guidelines, a corpus and new models for automatic annotation
Semantic Indexing via Knowledge Organization Systems: Applying the CIDOC-CRM to Archaeological Grey Literature
The volume of archaeological reports being produced since the introduction of PG16 has
significantly increased, as a result of the increased volume of archaeological investigations
conducted by academic and commercial archaeology. It is highly desirable to be able to
search effectively within and across such reports in order to find information that promotes
quality research. A potential dissemination of information via semantic technologies offers
the opportunity to improve archaeological practice, not only by enabling access to
information but also by changing how information is structured and the way research is
conducted.
This thesis presents a method for automatic semantic indexing of archaeological grey-literature
reports using rule-based Information Extraction techniques in combination with
domain-specific ontological and terminological resources. This semantic annotation of
contextual abstractions from archaeological grey-literature is driven by Natural Language
Processing (NLP) techniques which are used to identify "rich" meaningful pieces of text,
thus overcoming barriers in document indexing and retrieval imposed by the use of natural
language. The semantic annotation system (OPTIMA) performs the NLP tasks of Named
Entity Recognition, Relation Extraction, Negation Detection and Word Sense
Disambiguation using hand-crafted rules and terminological resources for associating
contextual abstractions with classes of the ISO Standard (ISO 21127:2006) CIDOC
Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension,
CRM-EH, together with concepts from English Heritage thesauri and glossaries.
The results demonstrate that the techniques can deliver semantic annotations of
archaeological grey literature documents with respect to the domain conceptual models.
Such semantic annotations have proven capable of supporting semantic query, document
study and cross-searching via web based applications. The research outcomes have
provided semantic annotations for the Semantic Technologies for Archaeological
Resources (STAR) project, which explored the potential of semantic technologies in the
integration of archaeological digital resources. The thesis represents the first discussion on
the employment of CIDOC CRM and CRM-EH in semantic annotation of grey-literature
documents using rule-based Information Extraction techniques driven by a supplementary
exploitation of domain-specific ontological and terminological resources. It is anticipated
that the methods can be generalised in the future to the broader field of Digital Humanities
Semantic Domains in Akkadian Text
The article examines the possibilities offered by language technology for analyzing semantic fields in Akkadian. The data for our research group come from an existing electronic corpus, the Open Richly Annotated Cuneiform Corpus (ORACC). In addition to more traditional Assyriological methods, the article explores two language-technological methods: pointwise mutual information (PMI) and Word2vec. Peer reviewed
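As a rough illustration of the first method, PMI scores a word pair by comparing how often the two words co-occur with how often they would co-occur if they were independent: PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ). The sketch below is not the article's pipeline. It assumes co-occurrence is counted once per text unit, and the function name `pmi_scores` and the placeholder tokens are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(units):
    """Return PMI (log base 2) for every word pair that co-occurs in a unit.

    `units` is a list of token lists; a pair co-occurs when both words
    appear in the same unit (an assumption made for this sketch).
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in units:
        unique = sorted(set(tokens))  # count each word once per unit
        word_counts.update(unique)
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                pair_counts[(a, b)] += 1

    n = len(units)
    scores = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = word_counts[a] / n
        p_b = word_counts[b] / n
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Placeholder tokens, not Akkadian data.
units = [["w1", "w2"], ["w1", "w2"], ["w3", "w4"], ["w1", "w3"]]
scores = pmi_scores(units)
```

A positive score means the pair co-occurs more often than chance would predict, which is the signal used to group words into candidate semantic fields; a negative score means the pair co-occurs less often than expected.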
CyberResearch on the Ancient Near East and Eastern Mediterranean
CyberResearch on the Ancient Near East and Neighboring Regions provides case studies on archaeology, objects, cuneiform texts, and online publishing, digital archiving, and preservation.
Eleven chapters present a rich array of material, spanning the fifth through the first millennium BCE, from Anatolia, the Levant, Mesopotamia, and Iran. Customized cyber- and general glossaries support readers who lack either a technical background or familiarity with the ancient cultures. Edited by Vanessa Bigot Juloux, Amy Rebecca Gansell, and Alessandro Di Ludovico, this volume is dedicated to broadening the understanding and accessibility of digital humanities tools, methodologies, and results to Ancient Near Eastern Studies. Ultimately, this book provides a model for introducing cyber-studies to the mainstream of humanities research
Pattern-based design applied to cultural heritage knowledge graphs
Ontology Design Patterns (ODPs) have become an established and recognised
practice for guaranteeing good quality ontology engineering. There are several
ODP repositories where ODPs are shared as well as ontology design methodologies
recommending their reuse. Performing rigorous testing is recommended as well
for supporting ontology maintenance and validating the resulting resource
against its motivating requirements. Nevertheless, it is less than
straightforward to find guidelines on how to apply such methodologies for
developing domain-specific knowledge graphs. ArCo is the knowledge graph of
Italian Cultural Heritage and has been developed by using eXtreme Design (XD),
an ODP- and test-driven methodology. During its development, XD has been
adapted to the needs of the CH domain, e.g. gathering requirements from an open,
diverse community of consumers; a new ODP has been defined and many have been
specialised to address specific CH requirements. This paper presents ArCo and
describes how to apply XD to the development and validation of a CH knowledge
graph, also detailing the (intellectual) process implemented for matching the
encountered modelling problems to ODPs. Relevant contributions also include a
novel web tool for supporting unit-testing of knowledge graphs, a rigorous
evaluation of ArCo, and a discussion of methodological lessons learned during
ArCo development
When linguistics meets web technologies. Recent advances in modelling linguistic linked data
This article provides an up-to-date and comprehensive survey of models (including vocabularies, taxonomies and ontologies) used for representing linguistic linked data (LLD). It focuses on the latest developments in the area and both builds upon and complements previous works covering similar territory. The article begins with an overview of recent trends which have had an impact on linked data models and vocabularies, such as the growing influence of the FAIR guidelines, the funding of several major projects in which LLD is a key component, and the increasing importance of the relationship of the digital humanities with LLD. Next, we give an overview of some of the best-known vocabularies and models in LLD. After this we look at some of the latest developments in community standards and initiatives such as OntoLex-Lemon, as well as recent work which has been carried out in corpora and annotation and LLD, including a discussion of the LLD metadata vocabularies META-SHARE and lime and of language identifiers. In the following part of the paper we look at work which has been realised in a number of recent projects and which has a significant impact on LLD vocabularies and models