
    Readers and Reading in the First World War

    This essay consists of three individually authored and interlinked sections. In ‘A Digital Humanities Approach’, Francesca Benatti looks at datasets and databases (including the UK Reading Experience Database) and shows how a systematic, macro-analytical use of digital humanities tools and resources might yield answers to some key questions about reading in the First World War. In ‘Reading behind the Wire in the First World War’, Edmund G. C. King scrutinizes the reading practices and preferences of Allied prisoners of war in Mainz, showing that reading circumscribed by the contingencies of a prison camp created a unique literary community, whose legacy can be traced through its members’ literary output after the war. In ‘Book-hunger in Salonika’, Shafquat Towheed examines the record of a single reader on a specific and fairly static front line, and argues that in the case of the Salonika campaign, reading communities emerged in close proximity to existing centres of print culture. The focus of the essay moves from the general to the particular, from the scoping of large datasets to the analysis of identified readers within a specific geographical and temporal space. The authors engage with the wider issues and problems of recovering, interpreting, visualizing, narrating, and representing readers in the First World War.

    An Automatic Partitioning of Gutenberg.org Texts

    Over the last ten years, the automatic partitioning of texts has attracted growing interest from the research community. Automatically identifying the parts of a text can provide faster and easier access for textual analysis. We introduce here an exploratory work on multi-part book identification. In this early attempt, we focus on Gutenberg.org, one of the projects that has received the largest public support in recent years. The purpose of this article is to present a preliminary system that automatically classifies parts of texts into 35 semantic categories. An accuracy of more than 93% was achieved on the test set. We plan to extend this effort to other repositories in the future.
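The part-classification task the abstract describes can be illustrated with a deliberately simple sketch. The paper's 35 semantic categories, model, and features are not given here, so the labels and rules below are hypothetical stand-ins for the statistical classifier it reports:

```python
import re

# Hypothetical subset of part categories; the paper's 35 semantic
# categories are not enumerated in the abstract.
PART_PATTERNS = {
    "preface": re.compile(r"^\s*preface\b", re.IGNORECASE),
    "chapter": re.compile(r"^\s*chapter\s+[ivxlcdm\d]+", re.IGNORECASE),
    "index": re.compile(r"^\s*index\b", re.IGNORECASE),
}

def classify_part(heading: str) -> str:
    """Rule-based stand-in: map a section heading to a part category,
    falling back to 'body' when no rule fires."""
    for label, pattern in PART_PATTERNS.items():
        if pattern.match(heading):
            return label
    return "body"
```

A system like the one described would replace these hand-written rules with a trained classifier evaluated on a held-out test set, which is where the reported 93% accuracy comes from.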

    A scalable framework for cross-lingual authorship identification

    This is an accepted manuscript of an article published by Elsevier in Information Sciences on 10/07/2018, available online: https://doi.org/10.1016/j.ins.2018.07.009. The accepted version of the publication may differ from the final published version. © 2018 Elsevier Inc. Cross-lingual authorship identification aims at finding the author of an anonymous document written in one language by using labeled documents written in other languages. The main challenge of cross-lingual authorship identification is that the stylistic markers (features) used in one language may not be applicable to other languages in the corpus. Existing methods overcome this challenge by using external resources such as machine translation and part-of-speech tagging. However, such solutions are not applicable to languages with poor external resources (known as low-resource languages). They also fail to scale as the number of candidate authors and/or the number of languages in the corpus increases. In this investigation, we analyze different types of stylometric features and identify 10 high-performance language-independent features for cross-lingual stylometric analysis tasks. Based on these stylometric features, we propose a cross-lingual authorship identification solution that can accurately handle a large number of authors. Specifically, we partition the documents into fragments, where each fragment is further decomposed into fixed-size chunks. Using a multilingual corpus of 400 authors with 825 documents written in 6 different languages, we show that our method can achieve an accuracy level of 96.66%. Our solution also outperforms the best existing solution that does not rely on external resources.
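The fragment-and-chunk decomposition described above can be sketched as follows. The chunk size and the particular features shown are assumptions for illustration, not the paper's exact 10-feature set:

```python
def chunk_document(text: str, chunk_size: int = 500) -> list:
    """Decompose a document into fixed-size character chunks
    (chunk_size is an assumed value; the paper's sizes are not stated here)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def stylometric_features(chunk: str) -> dict:
    """A few language-independent features in the spirit of the paper's
    high-performance feature set (illustrative stand-ins only)."""
    words = chunk.split() or [""]
    n = len(chunk) or 1
    return {
        "avg_word_len": sum(map(len, words)) / len(words),
        "punct_ratio": sum(c in ".,;:!?'\"-" for c in chunk) / n,
        "digit_ratio": sum(c.isdigit() for c in chunk) / n,
        "whitespace_ratio": sum(c.isspace() for c in chunk) / n,
    }
```

Because features like these are computed directly from characters and whitespace, each chunk's feature vector can be fed to any standard classifier over the candidate authors without language-specific resources such as POS taggers or machine translation.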

    Visual Text Analysis in Digital Humanities

    In 2005, Franco Moretti introduced Distant Reading to analyse entire literary text collections, a rather revolutionary idea compared with traditional Close Reading, which focuses on the thorough interpretation of an individual work. Both reading techniques are the primary modes supported by Visual Text Analysis. We present an overview of the research conducted since 2005 on supporting text analysis tasks with close and distant reading visualizations in the digital humanities. To this end, we classify the surveyed papers according to a taxonomy of text analysis tasks, categorize the close and distant reading techniques applied to support the investigation of these tasks, and illustrate approaches that combine both reading techniques to provide a multi-faceted view of the textual data. In addition, we look at the text sources used and at the typical data transformation steps required for the proposed visualizations. Finally, we summarize collaboration experiences from developing visualizations for close and distant reading, and give an outlook on future challenges in this research area.

    AXMEDIS 2008

    The AXMEDIS International Conference series aims to explore all subjects and topics related to cross-media and digital-media content production, processing, management, standards, representation, sharing, protection, and rights management; to address the latest developments and future trends of these technologies; and to examine their applications, impacts, and exploitation. The AXMEDIS events offer venues for exchanging concepts, requirements, prototypes, research ideas, and findings that can contribute to academic research and also benefit business and industrial communities. In the Internet and digital era, cross-media production and distribution represent key developments and innovations, fostered by emergent technologies to ensure better value for money while optimising productivity and market coverage.

    Fungal isolates from the archive of the University of Coimbra: ionizing radiation response and genotypic fingerprinting

    Master's thesis. Biology (Cell Biology and Biotechnology). Universidade de Lisboa, Faculdade de Ciências, 2011. The biodeterioration of works of art has become a central concern in conservation and restoration. Biodeterioration is the degradation of materials caused directly by living organisms or by the products of their vital activities. Written heritage is among the materials most affected, owing to the diversity of organic materials from which it is made. The evolution of writing supports throughout history (e.g. papyrus, parchment, rag-fibre paper, and wood-pulp paper) introduced different biopolymers, such as cellulose, hemicellulose, collagen, and elastin, into works that are among the most diverse and precious for the knowledge and culture of our societies. Since these polymers are described as nutritional substrates for some forms of life, such as bacteria, fungi, and insects, the colonization/contamination and consequent degradation of these documents occurs readily unless adequate measures are taken. Fungi are problematic biocontaminants: their morphophysiology allows them to penetrate different substrates effectively (hyphae) and to use complex polymers as a nutritional source (e.g. by producing hydrolytic enzymes that degrade complex molecules and absorbing the products). Today, conservation and restoration sciences must go beyond post-degradation revitalization and act to prevent contamination (pre-degradation), making the detection and identification of contaminants, as well as of possible sources and vectors of contamination, imperative. To this end, several preventive measures have been tested in archives and libraries.
However, most have "contraindications", such as toxicity to workers and users of the spaces (e.g. the use of pesticides or controlled atmospheres) or degradation of the materials (e.g. low- and high-temperature treatments). Given its characteristics (high penetration capacity and lethality to biocontaminants) and the balance of advantages versus disadvantages of its use, gamma radiation appears to be a viable alternative treatment. Its use as a decontaminating agent for archive and library materials has been tested and described since the early 1960s, and low doses (<10 kGy) are reported to be sufficient to decontaminate the materials without damaging their characteristics. Most studies have been carried out on documents and books whose main constituent is paper. Parchment was invented in the second century BC in the Greek city of Pergamon. Its animal origin (sheep, cow, or goat skin) makes it a complex material whose structural characteristics vary widely between and within documents, owing to the manufacturing process, the different animal origins, and the different environmental conditions to which it is exposed over time. Because of its durability, it was the main writing support during the Middle Ages, widely used by copyist monks and in religious documents. It is thus an important ancestor of paper that allows us to better understand the past of different societies. Despite its cultural and scientific importance, as well as its beauty, parchment has not always been treated with due respect, and it is increasingly necessary to understand its intrinsic structure in order to develop better conservation and protection measures.
Engaging with this problem, the Archive of the University of Coimbra (AUC), responsible for preserving the cultural heritage of the University and the District of Coimbra, has been actively involved in several scientific projects seeking solutions to the biodeterioration and conservation problems identified in its holdings. This thesis is part of one such project (Mycoarchive), a partnership between the Instituto Tecnológico e Nuclear (ITN), the Faculdade de Ciências e Tecnologia da Universidade de Coimbra, and the AUC itself. The objectives of the work developed in this dissertation were: i) to characterize the microbial community of the AUC indoor air, identifying possible factors of cross-contamination with the documents held there; ii) to characterize the natural contamination of AUC parchments; iii) to study the gamma radiation inactivation patterns of the parchment microbial population in order to propose a minimum dose (Dmin) for the future use of this technology as a decontamination treatment; iv) to determine potential gamma-radiation-induced changes in the physical properties of parchment in order to estimate a maximum dose (Dmax) to which the material can be exposed; and v) to analyse changes in the genetic profiles of fungal isolates from AUC air and parchment caused by exposure to gamma radiation, using PCR fingerprinting techniques. Accordingly, the first article in this thesis describes the work carried out to characterize the microbial community of the AUC indoor air. The results point to low microbial contamination (<200 CFU/m^3), with the most frequent morphological types being gram-positive, catalase-positive cocci and filamentous fungi.
To identify correlations between the AUC indoor air microbial community and the contamination of parchments held in the same space, a methodology for determining the microbial load of parchment was validated. The results point to low microbial contamination (<10^2 CFU/cm^2), consisting essentially of the same morphological types previously found in the AUC indoor air. However, the identification of the fungal species isolated from these two environments did not conclusively establish the role of air as a contamination vector. The application of gamma radiation as a decontamination treatment for this material was also evaluated. The results showed that the microbial population of this type of document does not follow exponential inactivation kinetics; nevertheless, an inactivation efficiency greater than 99% was obtained from 5 kGy onwards. A dose of 5 kGy is therefore proposed as the minimum gamma radiation dose (Dmin) for parchment decontamination treatment. In a sterilization or decontamination treatment, one must consider not only the inactivation of contaminants but also the effects of the sterilizing/decontaminating agent on the material. With ancient art, the problem is more severe, since the works in question are unique and irreproducible. The third article of this thesis therefore investigates the effects of gamma radiation on the texture and colour of parchment, in order to determine a maximum dose for decontamination treatments of this material. The results indicate that, up to 30 kGy, gamma radiation appears to have no effect on the structural characteristics of parchment. Gamma radiation is a known mutagenic agent, and the main effects described in survivors of irradiation processes are deletions in the genetic material. The fourth article in this dissertation addresses the genetic alterations caused by gamma radiation.
Genetic profiles were obtained for four fungal isolates before and after exposure to various doses of gamma radiation. Two of the isolates belong to the AUC indoor air microbial population (Penicillium griseofulvum and Neosartorya fumigata) and the others to the parchment samples (Cladosporium sp. and Epicoccum nigrum). PCR fingerprinting proved suitable for detecting genetic alterations induced in fungi by low doses of gamma radiation (2 kGy), and it was also possible to relate the resistance described for each fungal genus to the alterations observed. However, the results show no pattern relating the detected genetic alterations to increasing radiation dose. In conclusion, the results support the applicability of gamma radiation doses between 5 and 10 kGy as a decontamination treatment for archive materials. Archive and library materials are composed of complex organic materials that can be metabolized by some living organisms, making biodeterioration a pressing concern in conservation and restoration. The Archive of the University of Coimbra (AUC) is working to find solutions to the biodeterioration problems it has detected. The objectives of this thesis were: i) to characterize the AUC indoor air microbial population in order to identify its role in document contamination; ii) to characterize the natural contamination of AUC parchments; iii) to study the gamma radiation inactivation patterns of the parchment bioburden in order to propose a minimum dose (Dmin); iv) to evaluate the effects of gamma radiation on the physical characteristics of parchment in order to estimate a maximum dose (Dmax); and v) to estimate the genetic profile alterations induced by gamma radiation in AUC air and parchment fungal isolates, using PCR fingerprinting.
The AUC indoor air microbiota was characterized, and results point to low microbial contamination (<200 CFU/m^3), the most frequent morphological types being gram-positive, catalase-positive cocci and filamentous fungi. A parchment bioburden assessment methodology was evaluated, and results indicated a low bioburden (<10^2 CFU/cm^2), consisting mainly of the same morphological types found in the AUC indoor air. However, molecular identification of air and parchment fungal species could not conclusively establish air as a contamination vector. Gamma radiation was evaluated as a parchment decontamination treatment through microbial inactivation studies. Results indicated that the parchment microbial population did not follow exponential inactivation kinetics; however, a 99% inactivation efficiency was achieved at 5 kGy. This dose is proposed as Dmin for parchment decontamination treatment, and doses up to 30 kGy seem to have no effect on parchment texture and colour. PCR fingerprinting techniques were used to detect gamma radiation effects on the genetic profiles of AUC fungal isolates. No pattern was identified relating the observed genetic alterations to gamma radiation dose, although differences in gamma radiation resistance were detected between isolates. Results indicate a 5-10 kGy dose range as effective for the decontamination of archive materials.
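The inactivation figures reported above can be related by simple arithmetic: a 99% inactivation efficiency corresponds to a 2-log10 reduction of the surviving population. A minimal sketch, with illustrative bioburden values rather than measurements from the thesis:

```python
from math import log10

def inactivation_efficiency(n0: float, n: float) -> float:
    """Fraction of the initial population inactivated by a treatment
    (n0 = count before treatment, n = surviving count)."""
    return 1.0 - n / n0

def log_reduction(n0: float, n: float) -> float:
    """Decimal (log10) reduction between initial and surviving counts."""
    return log10(n0 / n)

# Illustrative values only: 100 CFU/cm^2 before irradiation, 1 after.
# A 99% efficiency, as reported at 5 kGy, equals a 2-log10 reduction.
before, after = 100.0, 1.0
```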

    Rational Creatures: Using Vector Space Models to Examine Independence in the Novels of Jane Austen, Maria Edgeworth, and Sydney Owenson (1800–1820)

    Recent trends in digital humanities have led to a proliferation of studies that apply ‘distant’ reading to textual data, and there is an uneasy relationship between the increased use of computational methods and their application to literary studies. Much of the current literature has focused on the exploration of large corpora. However, the ability to work at this scale is often beyond the financial or technical means, or outside the interests, of researchers. As these large-scale studies often ignore smaller corpora, few have sought to define a clear theoretical framework within which to study small-scale text collections. In addition, while some research has been carried out on the application of term-document vector space models (topic models and frequency-based analysis) to nineteenth-century novels, no study exists that applies word-context models (word embeddings and semantic networks) to the novels of Austen, Edgeworth, and Owenson. This study therefore seeks to evaluate the use of vector space models when applied to these novels. The research first defines a theoretical framework, ‘enhanced reading’, which combines close and distant reading. Using a corpus of twenty-eight nineteenth-century novels as its central focus, the study also demonstrates the practical application of this theoretical approach, with the additional aim of providing insight into the authors’ representation of independence at a time of great political and social upheaval in Ireland and the UK. Term-document models were found to be generally more useful for gaining an overview of the corpora. However, word-context models proved able to identify specific textual elements, some of which were not readily identified through close reading, and were therefore useful for exploring texts at both corpus and individual-text level.
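The distinction drawn above between term-document and word-context models can be made concrete with a minimal word-context sketch: count co-occurrences within a window and compare words by cosine similarity. This is a bare-bones stand-in for the word embeddings and semantic networks the study actually employs:

```python
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(tokens, window=2):
    """Build word-context count vectors from a token list: each word is
    represented by the counts of words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Words that occur in similar contexts end up with similar vectors, which is what allows word-context models to surface textual associations that close reading may not readily reveal.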

    Non-display uses of copyright works: Google Books and beyond

    Copyright @ 2011 The Authors. With the advent of mass digitisation projects, such as Google Book Search, a peculiar shift has occurred in the way copyright works are dealt with. Contrary to what has so far been the case, works are turned into machine-readable data to be automatically processed for various purposes without the expression of the works being displayed to the public. In the Google Book Settlement Agreement, this new kind of use is referred to as “non-display uses” of digital works. The legitimacy of these uses has not yet been tested by the courts and does not fit comfortably within current copyright doctrine, plainly because the works are not used as works but as something else, namely as data. Since non-display uses may prove to be a very lucrative market in the near future, with the potential to affect the way people use copyright works, we examine non-display uses through the prism of copyright principles to determine the boundaries of their legitimacy. Through this examination, we provide a categorisation of the activities carried out under the heading of “non-display uses”, examine their lawfulness under current copyright doctrine, and approach the phenomenon from the perspective of data protection law as it could apply, by analogy, to the use of copyright works as processable data.
