18 research outputs found

Reflections on Infrastructures for Mining Nineteenth-Century Newspaper Data

Author: Hauswedell T
Nyhan J
Tiedau U
Publication venue: Gale Digital Humanities Day
Publication date: 01/01/2020

In this study we compare and contrast our experiences (as historians and as digital humanities and information studies researchers) of seeking to mine large-scale historical datasets via university-based, high-performance computing infrastructures versus our experiences of using external, cloud-hosted platforms and tools to mine the same data. In particular, we reflect on our recent experiences in two large transnational digital humanities projects: Asymmetrical Encounters: E-Humanity Approaches to Reference Cultures in Europe, 1815–1992, which was funded by a Humanities in the European Research Area grant (2013–2016), and Oceanic Exchanges: Tracing Global Information Networks in Historical Newspaper Repositories 1840–1914, which was funded through the Transatlantic Partnership for Social Sciences and Humanities 2016 Digging into Data Challenge (2017–2019). As part of the research for both of these projects we sought to mine the OCR text of nineteenth-century historical newspapers that had been mounted on UCL’s High Performance Computing infrastructures from Gale’s TDM drives. We compare and contrast our experiences of this with our subsequent experiences of performing comparable tasks via Gale Digital Scholar Lab. We contextualise our experiences and observations within wider discourses and recommendations about infrastructural support for humanities-led analyses of large datasets and discuss the advantages and drawbacks of both approaches. We situate our discussions in the aforementioned infrastructural scenarios with reflections on the human experiences of undertaking this research, which represents a step change for many of those who work in the (digital) humanities. Finally, we conclude by discussing the public and private sector research investments that are needed to support further developments and to facilitate access to and critical interrogation of large-scale digital archive

UCL Discovery

Of global reach yet of situated contexts: an examination of the implicit and explicit selection criteria that shape digital archives of historical newspapers

Author: Beals M
Bell E
Hauswedell T
Nyhan J
Terras M
Publication date: 01/06/2020

A large literature addresses the processes, circumstances and motivations that have given rise to archives. These questions are increasingly being asked of digital archives, too. Here, we examine the complex interplay of institutional, intellectual, economic, technical, practical and social factors that have shaped decisions about the inclusion and exclusion of digitised newspapers in and from online archives. We do so by undertaking and analysing a series of semi-structured interviews conducted with public and private providers of major newspaper digitisation programmes. Our findings contribute to emerging understandings of factors that are rarely foregrounded or highlighted, yet fundamentally shape the depth and scope of digital cultural heritage archives and thus the questions that can be asked of them, now and in the future. Moreover, we draw attention to providers’ emphasis on meeting the needs of their end-users and how this is shaping the form and function of digital archives. The end user is not often emphasised in the wider literature on archival studies and we thus draw attention to the potential merit of this vector in future studies of digital archives

UCL Discovery

defoe: A Spark-based Toolbox for Analysing Digital Historical Textual Data

Author: Ahnert R.
Ardanuy M.C.
Beavan D.
Colavizza G.
Filgueira R.
Hauswedell T.
Hetherington J.
Hobson T.
Jackson M.
Krause A.
Nyhan J.
Roubickova A.
Terras M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

International Migration, Integration and Social Cohesion online publications

Species-level functional profiling of metagenomes and metatranscriptomes.

Author: A Sczyrba
A Shafquat
AE Duran-Pinedo
AK Sharma
B Buchfink
B Langmead
BE Suzek
BK Swan
C Burke
C Luo
Curtis Huttenhower
D Medini
DH Huson
DT Truong
E Pasolli
EA Franzosa
Eric A. Franzosa
George Weingart
GG Silva
Gholamali Rahnavard
H Hauswedell
J Kim
J Lloyd-Price
J Ravel
J. Gregory Caporaso
JA Fuhrman
K Huang
Karen Schwarzberg Lipson
Lauren J. McIver
LR Thompson
Luke R. Thompson
M Hamady
M Kanehisa
M Scholz
Melanie Schirmer
MY Galperin
N Segata
Nicola Segata
OU Mason
P Petrenko
PJ Turnbaugh
R Caspi
RC Edgar
RD Finn
Rob Knight
S Abubucker
S Nayfach
S Sunagawa
T Bose
UniProt Consortium.
W Huang
Y Ye
Y Zhao
Publication venue: eScholarship, University of California
Publication date: 01/11/2018

Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types

Crossref

eScholarship - University of California

The sequences of 150,119 genomes in the UK Biobank

Author: Asgeirsdottir Margret
Beyter Doruk
Brunak Søren
Eggertsson Hannes P
Eiriksson Ogmundur
Erikstrup Christian
Geirsson Arni J
Gudbjartsson Daniel F
Gudjonsson Sigurjon A
Gylfason Arnaldur
Halldorsson Bjarni V
Halldorsson Gisli H
Hardarson Marteinn T
Hauswedell Hannes
Helgason Agnar
Holley Guillaume
Holm Hilma
Jensson Brynjar O
Jonsdottir Ingileif
Jonsson Frosti
Jonsson Hakon
Jonsson Helgi
Jonsson Palmi
Kristinsson Kari
Kristmundsdottir Snaedis
Magnusdottir Droplaug N
Magnusson Olafur T
Masson Gisli
Melsted Pall
Moore Kristjan H S
Nielsen Kaspar René
Norland Kristjan
Oddsson Asmundur
Olafsson Isleifur
Olason Pall I
Ostrowski Sisse Rye
Palsson Gunnar
Pedersen Ole Birger
Rafnar Thorunn
Saemundsdottir Jona
Sigurdsson Brynjar
Sigurdsson Gunnar T
Sigurpalsdottir Brynja D
Snorradottir Steinunn
Sobech Emilia
Stefansson Hreinn
Stefansson Kari
Stefansson Olafur A
Styrkarsdottir Unnur
Sulem Patrick
Sveinbjornsson Gardar
Sverrisson Sverrir T
Thorleifsson Gudmar
Thorsteinsdottir Unnur
Tragante Vinicius
Ulfarsson Magnus O
Zink Florian
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022

Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data(1,2). Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank(3). This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation

Copenhagen University Research Information System

PubMed Central

VBN

A new MRI rating scale for progressive supranuclear palsy and multiple system atrophy: validity and reliability

Author: Agid Y.
Al-Sarraj S.
Allain H.
Andrews C.
Arguillère S.
Arnold P. D.
Asmus F.
Asselain B.
Autret A.
Azam S.
Azulay J. P.
Bailbé M.
Bak T.
Bartels C.
Bathgate D.
Bauer M.
Behrmann C.
Ben-Shlomo Y.
Benecke R.
Bensimon G.
Bensimon Gilbert
Benz S.
Besson G.
Blain C.
Bloch F.
Bogdahn U.
Boncoeur-Martel M. P.
Bonnet A. M.
Bonneville J. F.
Borg M.
Bradey N.
Brandel J. P.
Brandt T.
Braune S.
Broussolle E.
Brown R.
Brückmann H.
Brüning R.
Burn D.
Bötefür I.
Bötzel K.
Camu W.
Carpenter A.
Cesaro P.
Chadwick D.
Chanalet S.
Chaudhuri K. R.
Chavda S.
Clarke C.
Corcsia P.
Cottier P.
Counsell C.
Couratier C.
Couratier P.
de Broucker T.
Deasey N.
Deasy N.
Deasy Neil
Debilly B.
Dedise N.
Defebvre L.
Defer G.
Defevbre Luc
Delmaire C.
Delmaire Christine
Delmaire L.
Dengler R.
Derost P.
Destée A.
Dib M.
Dichgans J.
Dietemann J. L.
Dixon T.
Dormont D.
Dormont Didier
Dougherty A.
Dressler D.
Dubas F.
Dubois B.
Duchesne Simon
Durif F.
Duyckaerts C.
Dönges M.
Eberhardt O.
Ecker D.
Einhaeupl K.
English P.
Evans A.
Fenelon G.
Fermanian J.
Forbes R.
Fork M.
Foucart C.
Fressinaud C.
Gabrillargues J.
Gagel-Schweibold G.
Galitzky M.
Gallas S.
Garrigues G.
Gasser T.
Geyer C.
Gholkar A.
Gibson M.
Gil R.
Graf M.
Grand S.
Grau G.
Gröschel K.
Hauptmann B.
Hauser T. K.
Hauser Till K.
Hauswedell A.
Hauw J. J.
Heaney D.
Heinze H.
Hermann T.
Hermier M.
Hermine C.
Herting B.
Hodges J.
Houeto J. L.
Huet H.
Isaacs J.
Jarosz J.
Jarosz Josef
Jung A.
Kapels H. H.
Kauffmann G.
Khoris J.
Klempp K.
Klucken J.
Kohl Z.
Kolbe H.
Kornhuber M.
Kosinski Christoph Michael
Kraehenbuhl J.
Kraft E.
Kraft Eduard
Kramer B.
Kretzschmar H. A.
Kronenbürger Martin
Kémeny S.
Lacomblez L.
Lalam T.
Landwehrmeyer B.
Landwehrmeyer G. B.
Lange M.
Ledoze F.
Lees A.
Leigh P. N.
Leigh P. Nigel
Lipp A.
Ludolph A.
Ludolph A. C.
Ludolph Albert C.
Lücking C. H.
Maass S.
Magerkurth C.
Mallaret C.
Maltête D.
Manelfe P.
Marsault C.
Mason H.
Massey L.
McCrone P.
Mccrone P.
Mckinstry C. S.
Meininger V.
Memin A.
Mesnage V.
Mollion H.
Moore P.
Moumy H.
Mucha D.
Mueller T.
Murphy C.
Murray A.
Mylius V.
Müller T.
Neudecker S.
Newman P.
Nicholls D.
Niess A.
NNIPPS Study Group
Noth J.
Ouslimani A.
Pageot N.
Paillasseur P.
Pall H.
Paviour D.
Payan C.
Payan Christine A. M.
Payan-Cassin H.
Perret J. E.
Peschel T.
Petit F.
Pham H. P.
Portet F.
Poître B.
Prunier C.
Przuntek H.
Quinn N.
Ranoux D.
Rascol O.
Reichmann H.
Reissberg S.
Revesz T.
Reynolds C.
Roland Y.
Rolland Y.
Rolland Yan
Rotte M.
Rumbach L.
Russo N.
Sagnes S.
Sangla S.
Scaravilli T.
Schlangen C.
Schlueter A.
Schmid G.
Schrader C.
Schuierer G.
Schulz Jörg B.
Schumacher M.
Seifert U.
Seilhean D.
Siepmann M.
Siggelkow S.
Skalej M.
Smallman C.
Spreer J.
Stange V.
Stanton B.
Steinmetz G.
Stevens J.
Steventon D.
Stewen J.
Storch A.
Summers B.
Sussmuth S. D.
Tanguy J. Y.
Thalamas C.
Thun C.
Tilignac C.
Tornyi F. T.
Tourbah A.
Tranchant C.
Trikouli E.
Vandermarcq P.
Vassault G. M.
Velden J.
Venisse S.
Verin M.
Verny M.
Viader F.
Viallet F.
Viaud B.
Vidailhet M.
Vidry E.
Viehöver A.
Villringer A.
von Kummer R.
Vérin Marc
Wallesch C. W.
Warlow C.
Warren N.
Wassilowsky D.
Welter M. L.
Wesemann T.
Williams V.
Winkler C.
Winkler J.
Winner B.
Witjas T.
Wolters A.
Worbe J.
Youssri T.
Zegowitz G.
Zermansky A.
Zierz S.
Ziyeh S.
Publication venue: BMJ Group
Publication date: 01/01/2011

AIM To evaluate a standardised MRI acquisition protocol and a new image rating scale for disease severity in patients with progressive supranuclear palsy (PSP) and multiple system atrophy (MSA) in a large multicentre study. METHODS The MRI protocol consisted of two-dimensional sagittal and axial T1, axial PD, and axial and coronal T2 weighted acquisitions. The 32 item ordinal scale evaluated abnormalities within the basal ganglia and posterior fossa, blind to diagnosis. Among 760 patients in the study population (PSP = 362, MSA = 398), 627 had per protocol images (PSP = 297, MSA = 330). Intra-rater (n = 60) and inter-rater (n = 555) reliability were assessed through Cohen's κ statistic, and scale structure through principal component analysis (PCA) (n = 441). Internal consistency and reliability were checked. Discriminant and predictive validity of extracted factors and total scores were tested for disease severity as per clinical diagnosis. RESULTS Intra-rater and inter-rater reliability were acceptable for 25 (78%) of the items scored (κ ≥ 0.41). PCA revealed four meaningful clusters of covarying parameters (factor (F) F1: brainstem and cerebellum; F2: midbrain; F3: putamen; F4: other basal ganglia) with good to excellent internal consistency (Cronbach α 0.75-0.93) and moderate to excellent reliability (intraclass coefficient: F1: 0.92; F2: 0.79; F3: 0.71; F4: 0.49). The total score significantly discriminated for disease severity or diagnosis; factorial scores differentially discriminated for disease severity according to diagnosis (PSP: F1-F2; MSA: F2-F3). The total score was significantly related to survival in PSP (p<0.0007) or MSA (p<0.0005), indicating good predictive validity. CONCLUSIONS The scale is suitable for use in the context of multicentre studies and can reliably and consistently measure MRI abnormalities in PSP and MSA.
Clinical Trial Registration Number The study protocol was filed in the open clinical trial registry (http://www.clinicaltrials.gov) with ID No NCT00211224

Crossref

University of Regensburg Publication Server

HAL-Inserm

PubMed Central

UCL Discovery

Publikationsserver der RWTH Aachen University

Okina

Sussex Research Online

HAL-Rennes 1

defoe: A Spark-Based Toolbox for Analysing Digital Historical Textual Data.

Author: Ahnert R
Ardanuy MC
Beavan D
Colavizza G
Filgueira R
Hauswedell T
Hetherington J
Hobson T
Jackson M
Krause A
Nyhan J
Roubícková A
Terras M
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

This work presents defoe, a new scalable and portable digital eScience toolbox that enables historical research. It allows for running text mining queries across large datasets, such as historical newspapers and books, in parallel via Apache Spark. It handles queries against collections that comprise several XML schemas and physical representations. The proposed tool has been successfully evaluated using five different large-scale historical text datasets and two HPC environments, as well as on desktops. Results show that defoe allows researchers to query multiple datasets in parallel from a single command-line interface and in a consistent way, without any HPC environment-specific requirements.

Heriot Watt Pure

Crossref

UCL Discovery

Edinburgh Research Explorer

Queen Mary Research Online

International Migration, Integration and Social Cohesion online publications

UvA-DARE

University of St. Andrews - Pure

Irritationen des Friedens. Die nordirischen Kirchen auf der Suche nach ihrer Rolle als Friedensstifter

Author: B. Moltmann
C. Hauswedell
Church Advisory Committee
D. Bacon
D. Stevens
J. McMaster
M. O’Doherty
P. McGarry
R. English
S. Bruce
T. Gromes
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Crossref

EPR-Dictionaries: A Practical and Fast Data Structure for Constant Time Searches in Unidirectional and Bidirectional FM Indices

Author: A Döring
B Langmead
D Belazzougui
E Siragusa
F Meyer
H Hauswedell
H Li
M Santiago
P Ferragina
S Gog
T Lam
T Schnattinger
Y Ye
Publication venue: Springer Science and Business Media LLC
Publication date: 17/11/2016

The unidirectional FM index was introduced by Ferragina and Manzini in 2000 and allows searching for a pattern in the index in one direction. The bidirectional FM index (2FM) was introduced by Lam et al. in 2009. It allows searching for a pattern by extending an infix of the pattern arbitrarily to the left or right. If σ is the size of the alphabet, then the method of Lam et al. can conduct one step in time O(σ) while needing space O(σ⋅n), using constant time rank queries on bit vectors. Schnattinger and colleagues improved this time to O(log σ) while using O(log σ⋅n) bits of space for both the FM and 2FM index. This is achieved by the use of binary wavelet trees. In this paper we introduce a new, practical method for conducting an exact search in a uni- and bidirectional FM index in O(1) time per step while using O(log σ⋅n)+o(log σ⋅σ⋅n) bits of space. This is done by replacing the binary wavelet tree with a new data structure, the Enhanced Prefixsum Rank dictionary (EPR-dictionary). We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition, we compared our implementation with other freely available implementations of bidirectional indices and show that we are approximately 2.2 to 4.2 times faster. This will have a large impact on many bioinformatics applications that rely on practical implementations of (2)FM indices, e.g. for read mapping. To our knowledge this is the first implementation of a constant time method for a search step in 2FM indices

arXiv.org e-Print Archive

Crossref

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

Politikwissenschaftliche Friedensforschung — ein Überblick

Author: A Bilek
A Rapoport
Augustinus
BW Kubbig
C Hauswedell
CJ Friedrich
D Senghaas
E Jahn
E Senghaas-Knobloch
E-O Czempiel
G Bächler
G Grünewald
H Grotius
H-O Mühleisen
J Galtung
J Schwerdtfeger
K Kaiser
K Raumer von
KJ Gantzel
M Jopp
M Mead
P Lock
P Schneider
S Collmer
T Batschneider
T Ebert
U Drobnig
U Schmiederer
UC Wasmuht
W-D Narr
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2004

Crossref