    Chapter 2. The Baix Llobregat (BALL) Demographic Database, between Historical Demography and Computer Vision (nineteenth–twentieth centuries)

    The main aims of this book are to compare source materials, databases and research results, and to create new opportunities for collaboration in the field of social and population history in the East and the West. All the contributions are based on nominative source material, mainly censuses and vital records, which have been preserved, scanned and transcribed into databases for use in cross-sectional and longitudinal research. The chapters in the first part of this book mostly focus on the construction of nominative databases in Germany, Spain and Romania. The chapters in the second and third parts are case studies, based on the Russian, Austrian, Estonian, Hungarian and Norwegian databases, on the relationship between marriage and fertility; mortality and fertility; marriage behavior and religion; urban mortality; migration, etc.

    A Tale of Two Transcriptions: Machine-Assisted Transcription of Historical Sources

    This article is part of the "Norwegian Historical Population Register" project financed by the Norwegian Research Council (grant # 225950) and the Advanced Grant Project "Five Centuries of Marriages" (2011-2016) funded by the European Research Council (# ERC 2010-AdG_20100407). This article explains how two projects implement semi-automated transcription routines: for census sheets in Norway and for marriage protocols from Barcelona. The Spanish system was created to transcribe the marriage license books from 1451 to 1905 for the Barcelona area, one of the world's longest series of preserved vital records. Thus, in the project "Five Centuries of Marriages" (5CofM) at the Autonomous University of Barcelona's Center for Demographic Studies, the Barcelona Historical Marriage Database has been built. More than 600,000 records were transcribed by 150 transcribers working online. The Norwegian material is cross-sectional, as it is the 1891 census, recorded on one sheet per person. This format, together with the underlining of keywords for several variables, made it more feasible to semi-automate data entry than when many persons are listed on the same page. While Optical Character Recognition (OCR) for printed text is scientifically mature, computer vision research is now focused on more difficult problems such as handwriting recognition. In the marriage project, document analysis methods have been proposed to automatically recognize the marriage licenses. Fully automatic recognition is still a challenge, but some promising results have been obtained. In Spain, Norway and elsewhere the source material is available as scanned images on the Internet, opening up the possibility of further international cooperation on automating the transcription of historical source materials. As in projects to digitize printed materials, the optimal solution for handwritten sources is also likely to be a combination of manual transcription and machine-assisted recognition.

    Phoneme-based Video Indexing Using Phonetic Disparity Search

    This dissertation presents and evaluates an approach to the video indexing problem by investigating a categorization method that transcribes audio content through Automatic Speech Recognition (ASR) combined with Dynamic Contextualization (DC), Phonetic Disparity Search (PDS) and Metaphone indexation. The suggested approach applies genome pattern matching algorithms with computational summarization to build a database infrastructure that provides an indexed summary of the original audio content. PDS complements the contextual phoneme indexing approach by optimizing topic seek performance and accuracy in large video content structures. A prototype was built to translate news broadcast video into text and phonemes automatically by using ASR utterance conversions. Each extracted phonetic utterance was then categorized, converted to Metaphones, and stored in a repository with contextual topical information attached and indexed for later search analysis. Following the original design strategy, a custom parallel interface was built to measure the capabilities of dissimilar phonetic queries and to provide an interface for result analysis. The proposed solution provides evidence of superior topic matching when compared to traditional word and phoneme search methods. Experimental results demonstrate that PDS can be 3.7% better than the same phoneme query, while Metaphone search proved to be 154.6% better than the same phoneme seek and 68.1% better than the equivalent word search.

    DARIAH and the Benelux

    Building a health and environment geographical information system: an evaluation, looking at childhood cancer in Northern England

    PhD Thesis. The aim of this research was to evaluate a relatively young technology, Geographical Information Systems (GIS), in a specific applications environment. The application adopted was that of searching for environmental causes of childhood cancer, in particular Acute Lymphoblastic Leukaemia (ALL), in Northern England. It is also relevant in terms of the WHO's intention to develop a Health and Environment GIS, and the research therefore aims to satisfy their recommendations for pilot studies. The subject matter of this thesis thus covers two very high-profile topics, which it is believed will mutually benefit from the research carried out. Firstly, very little is known about the aetiology of ALL, and thus any new methodology that helps to analyse sensitive issues of causation is welcomed not only by those in the medical field but also by the public. The application was made possible by the provision of detailed cancer data for Northern England and a weak but interesting hypothesis that environmental factors may be a mechanism of causation. Key questions asked include: Where are incidences of ALL located? Why are they there? Is there a cluster? What could be the cause? Secondly, a Geographical Information System, in this case the proprietary software package ARC/INFO, was considered an excellent medium for tackling this spatial epidemiological problem, especially given its capability to store large volumes of diverse data and its inherent flexibility in dealing with spatial information pertaining to health and environmental factors. More importantly, the application itself offered a means of evaluating the implementation of a GIS: establishing the advantages and pitfalls which accompany all stages of 'The GIS Process', and providing an invaluable documentation of the experiences acquired as an initiator, developer and implementor of this new technology.
In addition, this research offers fresh ideas and techniques for improving those areas of the technology which appear to be lacking in these early phases of its development. The problems of spatial analysis in GIS and the provision of useful tools such as 'pattern spotters', 'relationship seekers' and 'error handlers' are discussed as alternative techniques. To ensure an exciting future for GIS technology in application environments, the latter and other key areas of research which should be pursued are highlighted in this thesis.

    Windows of opportunity for status attainment in Southern Europe: family impact and industrialization on the individual career in Catalonia (nineteenth and twentieth centuries)

    Modernization theory questioned the role of the family in both individual social status attainment and labor careers during industrialization, and the nuclearization of families was argued to be one of the causes. However, little has been said on this topic regarding societies in which stem or joint families were important, as in the case of Southern Europe. This article studies the effects of industrialization on family influence over individuals' social destinations and labor career progression for cohorts born between 1860 and 1909 in Catalonia, in an area of early industrialization and fertility decline, using the Sant Feliu de Llobregat Longitudinal Demographic Database. The results show that family influence on occupational attainment decreased during industrialization in Catalonia, although it did not vanish entirely. Moreover, this loss of familial influence was concomitant with the fertility decline, entailing an interdependent relationship between the effects of industrialization and the shrinking number of offspring. In contrast to societies with a prevalence of nuclear families, Catalonia faced changes in family influence and fertility decline without losing the strong presence of stem families. The youngest cohorts, who faced the consolidation of industrialization, attained higher levels of occupational status, while the oldest cohorts, in the initial stages of industrialization, achieved less career progression and faced social immobility, which is explained by the proletarianization effect.
Nevertheless, this general enhancement over time did not break the social stratification caused by social background, which demonstrates that inequality in accessing opportunities is linked to the capacity to generate progress or demotion within societies.
The now-classic Modernization theory presupposes that the family loses influence over the social status and occupational trajectories of its descendants, partly as a consequence of industrialization. In this regard, some authors have pointed to the nuclearization of families as one of its causes. Nevertheless, little has been said on the subject for societies where the stem family prevailed, as in southern Europe. This article studies the effects of industrialization on the loss of family influence over the social and/or occupational destinations of descendants in Catalonia for the cohorts born between 1860 and 1909. Specifically, it analyzes the case of Sant Feliu de Llobregat, an area of early industrialization and early fertility decline, using the Sant Feliu de Llobregat Longitudinal Demographic Database. The results show that family influence on the social and/or occupational status of children decreased during industrialization in Catalonia, although it did not vanish completely. This loss was concomitant with the fertility decline, although the stem family continued to prevail. The youngest cohorts (1890-99 and 1900-09), who entered the labor market during the consolidation of industrialization, attained higher levels of occupational status than the oldest cohorts (1860-69 and 1870-79), who had entered it in the initial stages of industrialization. The latter showed less career progression and experienced greater social immobility, which is explained by a process of proletarianization. Birth order also proved crucial for individuals' social progression. Firstborns in the oldest cohorts attained a better socioeconomic status than the rest of their siblings, whereas second-born siblings showed a better status than firstborns in the youngest cohorts. These elements are evidence of the decline of the Catalan single-heir system. Nevertheless, the general improvement in social progression over time did not break down social stratification, which shows that inequality in access to opportunities is linked to the capacity to generate progress in individual occupational trajectories, although belonging to a well-off family helped to consolidate those trajectories.

    Adaptive Semantic Annotation of Entity and Concept Mentions in Text

    Recent years have seen growing interest in knowledge repositories that are useful across applications, in contrast to the creation of ad hoc or application-specific databases. These knowledge repositories serve as a central provider of unambiguous identifiers and semantic relationships between entities. As such, these shared entity descriptions serve as a common vocabulary to exchange and organize information in different formats and for different purposes. Therefore, there has been remarkable interest in systems that are able to automatically tag textual documents with identifiers from shared knowledge repositories so that the content in those documents is described in a vocabulary that is unambiguously understood across applications. Tagging textual documents according to these knowledge bases is a challenging task. It involves recognizing the entities and concepts that have been mentioned in a particular passage and attempting to resolve any ambiguity of language in order to choose one of many possible meanings for a phrase. There has been substantial work on recognizing and disambiguating entities for specialized applications, or constrained to limited entity types and particular types of text. In the context of shared knowledge bases, since each application has potentially very different needs, systems must have unprecedented breadth and flexibility to ensure their usefulness across applications. Documents may exhibit different language and discourse characteristics, discuss very diverse topics, or require a focus on parts of the knowledge repository that are inherently harder to disambiguate. In practice, for developers looking for a system to support their use case, it is often unclear whether an existing solution is applicable, leading those developers to trial-and-error and ad hoc usage of multiple systems in an attempt to achieve their objective.
In this dissertation, I propose a conceptual model that unifies related techniques in this space under a common multi-dimensional framework that enables the elucidation of the strengths and limitations of each technique, supporting developers in their search for a suitable tool for their needs. Moreover, the model serves as the basis for the development of flexible systems that are able to support document tagging for different use cases. I describe such an implementation, DBpedia Spotlight, along with extensions that we made to the knowledge base DBpedia to support this implementation. I report evaluations of this tool on several well-known data sets, and demonstrate applications to diverse use cases for further validation.

    Reciprocal Teaching: An Exploration of its Effectiveness in Improving the Vocabulary and Reading Comprehension of Key Stage Two Pupils with and without English as an Additional Language

    Background: The English National Curriculum identifies the acquisition of vocabulary as key to learning (DfE, 2015). Rich contexts provided by text produce robust vocabulary learning (National Reading Panel, 2000). Considering this, as well as evidence that teaching metacognition and reading comprehension are low-cost, high-impact approaches (Higgins, Katsipataki, Kokotsaki, Coleman, Major, & Coe, 2014), a Reciprocal Teaching intervention (Palincsar & Brown, 1984) was selected for a group of children with known vocabulary and reading comprehension difficulties. A systematic literature search indicated that little research has focused on the effectiveness of Reciprocal Teaching for vocabulary development. The current study aimed to address this gap and to explore the impact of Reciprocal Teaching on the vocabulary development and reading comprehension of monolingual pupils and children with English as an Additional Language (EAL) in the context of the English education system. Method: A purposive sample of 22 participants (aged 8-11) from two mainstream primary schools was selected by teachers according to vocabulary and reading comprehension needs. Nine pupils were monolingual and 13 spoke English as an additional language. All took part in a Reciprocal Teaching intervention, based on approaches devised by Palincsar and Brown. A convergent mixed methods design was employed, whereby quantitative data were collected pre- and post-intervention to measure vocabulary and reading comprehension. Qualitative measures were conducted post-intervention to gain participants’ perspectives. Results: Educationally significant gains were observed in vocabulary for participants who received the greatest number of Reciprocal Teaching sessions and for monolingual children overall. No improvement was observed for reading comprehension. Thematic analysis produced themes related to child engagement and Reciprocal Teaching implementation.
Implications: This study contributes to the developing evidence base regarding the effectiveness of Reciprocal Teaching in England. Implications for Educational Psychologists in facilitating the implementation of interventions in schools are discussed.