22,491 research outputs found

    Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

    The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research efforts that seek to automatically process facsimiles and extract information from them are multiplying, with document layout analysis as an essential first step. Although the identification and categorization of segments of interest in document images have seen significant progress in recent years thanks to deep learning techniques, many challenges remain, including the use of finer-grained segmentation typologies and the handling of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider visual features only, ignoring the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among other things, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.

    Sentiment Analysis for Words and Fiction Characters From The Perspective of Computational (Neuro-)Poetics

    Two computational studies provide different sentiment analyses for text segments (e.g., ‘fearful’ passages) and figures (e.g., ‘Voldemort’) from the Harry Potter books (Rowling, 1997-2007) based on a simple, novel tool called SentiArt. The tool uses vector space models together with theory-guided, empirically validated label lists to compute the valence of each word in a text by locating its position in a 2D emotion potential space spanned by the > 2 million words of the vector space model. After testing the tool’s accuracy with empirical data from a neurocognitive study, it was applied to compute emotional figure profiles and personality figure profiles (inspired by the so-called ‘big five’ personality theory) for main characters from the book series. The results of comparative analyses using different machine-learning classifiers (e.g., AdaBoost, Neural Net) show that SentiArt performs very well in predicting the emotion potential of text passages. It also produces plausible predictions regarding the emotional and personality profile of fiction characters which are correctly identified on the basis of eight character features, and it achieves a good cross-validation accuracy in classifying 100 figures into ‘good’ vs. ‘bad’ ones. The results are discussed with regard to potential applications of SentiArt in digital literary, applied reading and neurocognitive poetics studies such as the quantification of the hybrid hero potential of figures.
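
The idea of scoring a word's valence from a vector space model and label lists can be illustrated with a minimal sketch. This is not SentiArt's actual procedure, which the abstract does not spell out. It assumes one plausible reading: valence is the mean cosine similarity to positive label words minus the mean cosine similarity to negative label words. The vectors, label lists, and function names below are hypothetical toy values chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def valence(word, vectors, positive_labels, negative_labels):
    """Assumed scoring rule: mean similarity to positive labels
    minus mean similarity to negative labels."""
    w = vectors[word]
    pos = sum(cosine(w, vectors[p]) for p in positive_labels) / len(positive_labels)
    neg = sum(cosine(w, vectors[n]) for n in negative_labels) / len(negative_labels)
    return pos - neg

def passage_valence(words, vectors, positive_labels, negative_labels):
    """Average valence over the words of a passage that have a vector."""
    scores = [valence(w, vectors, positive_labels, negative_labels)
              for w in words if w in vectors]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical 2-dimensional toy vectors; a real model would have
# hundreds of dimensions and millions of words.
TOY_VECTORS = {
    "joy": [1.0, 0.1],
    "love": [0.9, 0.2],
    "fear": [0.1, 1.0],
    "hate": [0.2, 0.9],
    "friend": [0.8, 0.3],
    "monster": [0.2, 0.8],
}
POSITIVE = ["joy", "love"]   # hypothetical label list
NEGATIVE = ["fear", "hate"]  # hypothetical label list

if __name__ == "__main__":
    for word in ("friend", "monster"):
        print(word, round(valence(word, TOY_VECTORS, POSITIVE, NEGATIVE), 3))
```

With these toy vectors, "friend" scores above zero and "monster" below zero, which is the qualitative behaviour such a scoring rule is meant to produce.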

    A Bibliography on the Application of GIS in Archaeology and Cultural Heritage

    The application of Geographical Information Systems (GIS) to archaeological projects of different scales, chronological contexts and cultural milieux has by now accrued a long history and bibliography. Hopefully the phases of experimentation and almost blind testing are over, even if GIS applications are still sometimes labeled as “new technologies”.

    The use of digital tools for spatial analysis in population geography

    Digital tools, and in particular GIS, have enormously increased the possibilities for analysis in historical geography. In this article, we shall explain how these tools can be used to study the evolution of population density over a significant period. The territorial units used will be municipalities, as they allow detailed territorial analysis. However, research projects that take municipalities as their points of reference tend to be complex because their territorial boundaries have often undergone numerous changes over the course of modern history. The same has occurred, to a greater or lesser degree, in all of the countries in Europe (Bennett, 1989). The countries that have had the most stable municipal boundaries over the past 150 years include France, Italy, and Spain, though the modifications to their boundaries have also been notable. However, like all relevant challenges, these changes also offer us new opportunities, if we are able to cope with them. In this particular case, the challenge will be to achieve the territorial homogenization of the historical municipal series. In other words, when the municipal limits have changed, it will be necessary to adapt the data from the old municipal territories to the new ones. This exercise will have a number of applications. In this article, we present just one of these: the possibility of detecting areas and periods in which, over the course of history, there has been population growth, decline, or stagnation. This will serve as a relevant indicator, or proxy, for organizing research in other fields. For example, in the case of economic history, it is clear that variations in the density of population provide clues for interpreting the territorial distribution of economic activity. We also understand that it will be possible to apply our research about Spain to other countries and that this will make it possible to evaluate the interest and results that we can expect from the homogenized work. 
    We think that, despite its interest, this type of study has, until now, been very rare on account of the methodological difficulties involved. However, new digital tools in the field of historical GIS, such as spatial aggregation and Moran's I techniques, have helped to provide solutions to meet this challenge. Partial funding was provided by the Spanish Ministry of Education (CSO2015-65733-P), the EU (Jean Monnet 562390-EPP-1-2015-1-ES-EPPJMO), and ICREA-Academia.
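
The two techniques named in this abstract can be sketched in a few lines. The abstract does not say how the authors reallocate data between old and new municipal territories, so the sketch assumes simple area weighting: each old unit's population is split across new units in proportion to the share of its area that falls in each. The second function computes global Moran's I using its standard definition. All unit names, shares, and values are hypothetical.

```python
def homogenize(old_population, overlap_share):
    """Reallocate population from old units to new units.

    overlap_share[old][new] is the fraction of the old unit's area that
    lies inside the new unit (assumed area weighting; fractions for each
    old unit should sum to 1).
    """
    new_population = {}
    for old, pop in old_population.items():
        for new, share in overlap_share[old].items():
            new_population[new] = new_population.get(new, 0.0) + pop * share
    return new_population

def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij:
    I = (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean and S0 is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Hypothetical example: old municipality B was split between X and Y.
OLD_POP = {"A": 100.0, "B": 50.0}
SHARES = {"A": {"X": 1.0}, "B": {"X": 0.4, "Y": 0.6}}

# Hypothetical chain of four neighbouring units (1 = adjacent).
CHAIN = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    print(homogenize(OLD_POP, SHARES))
    print(morans_i([1.0, 2.0, 3.0, 4.0], CHAIN))  # smooth gradient
    print(morans_i([1.0, 4.0, 1.0, 4.0], CHAIN))  # alternating pattern
```

A smooth gradient of densities along the chain gives a positive Moran's I (similar values cluster together), while an alternating pattern gives a negative one. Area weighting assumes population is spread evenly within each old unit, which is a strong simplification for real municipalities.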

    The Gutenberg English Poetry Corpus: Exemplary Quantitative Narrative Analyses

    This paper describes a corpus of about 3,000 English literary texts with about 250 million words extracted from the Gutenberg project that span a range of genres from both fiction and non-fiction written by more than 130 authors (e.g., Darwin, Dickens, Shakespeare). Quantitative narrative analysis (QNA) is used to explore a cleaned subcorpus, the Gutenberg English Poetry Corpus (GEPC), which comprises over 100 poetic texts with around two million words from about 50 authors (e.g., Keats, Joyce, Wordsworth). Some exemplary QNA studies show author similarities based on latent semantic analysis, significant topics for each author or various text-analytic metrics for George Eliot’s poem “How Lisa Loved the King” and James Joyce’s “Chamber Music,” concerning, e.g., lexical diversity or sentiment analysis. The GEPC is particularly suited for research in Digital Humanities, Computational Stylistics, or Neurocognitive Poetics, e.g., as a training and test corpus for stimulus development and control in empirical studies.

    Archaeological practices, knowledge work and digitalisation

    Defining what constitutes archaeological practices is a prerequisite for understanding where and how archaeological and archaeologically relevant information and knowledge are made, what counts as archaeological information, and where the limits are situated. The aim of this position paper, developed as a part of the COST action Archaeological practices and knowledge work in the digital environment (www.arkwork.eu), is to highlight the need for at least a relative consensus on the extents of archaeological practices in order to be able to understand and develop archaeological practices and knowledge work in the contemporary digital context. The text discusses approaches to studying archaeological practices and knowledge work, including Nicolini’s notions of zooming in and zooming out, and proposes that a distinction between archaeological and archaeology-related practices could provide a way to negotiate the ‘archaeologicality’ of diverse practices.