
    Berkeley Prosopography Services: Building Research Communities and Restoring Ancient Communities through Digital Tools

    Berkeley Prosopography Services (BPS) is an innovative open-source digital tool and service that automatically extracts prosopographic data from TEI-encoded texts and generates visualizations of the dynamic social networks contained in the text corpora. Filters allow researchers to vary search parameters to consider alternative or hypothetical scenarios, such as the impact of individuals and conditions on social and economic relationships. BPS provides users with individual workspaces for research, assessment, and probabilistic modelling, while corpus administrators maintain data integrity. During the grant period, BPS, the first independent tool and service to be incorporated into the international Cuneiform Digital Library consortium, will undergo beta-testing on additional text corpora to confirm the reliability and generalizability of its tools for widespread use in the broad community of prosopographers.
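    The core pipeline the abstract describes can be illustrated with a minimal sketch (not the actual BPS implementation): pull `persName` elements out of TEI-encoded documents and count person co-occurrences, which form the edges of the social network to be visualized. The sample document and names are invented placeholders.

    ```python
    # Minimal sketch, assuming TEI P5 input: extract person references and
    # build a co-occurrence network, one edge per pair of persons named in
    # the same document. Not the actual BPS implementation.
    import xml.etree.ElementTree as ET
    from itertools import combinations
    from collections import Counter

    TEI_NS = "{http://www.tei-c.org/ns/1.0}"

    def persons_in_document(tei_xml: str) -> list[str]:
        """Return the persons named in one TEI document (<persName> elements)."""
        root = ET.fromstring(tei_xml)
        return ["".join(el.itertext()).strip() for el in root.iter(f"{TEI_NS}persName")]

    def cooccurrence_edges(documents: list[str]) -> Counter:
        """Count how often each pair of persons is named in the same document."""
        edges: Counter = Counter()
        for doc in documents:
            for a, b in combinations(sorted(set(persons_in_document(doc))), 2):
                edges[(a, b)] += 1
        return edges

    # Invented sample document for illustration.
    doc = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>'
        "<persName>Itti-Marduk-balatu</persName> sold a field to "
        "<persName>Nabu-ahhe-iddin</persName>."
        "</p></body></text></TEI>"
    )
    print(cooccurrence_edges([doc]))
    ```

    The edge counts can then be handed to any graph library for the network visualizations and filtering the abstract mentions.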

    Mining complex trees for hidden fruit: a graph-based computational solution to detect latent criminal networks: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Technology at Massey University, Albany, New Zealand.

    The detection of crime is a complex and difficult endeavour. Public and private organisations – focusing on law enforcement, intelligence, and compliance – commonly apply the rational isolated actor approach premised on observability and materiality. This is manifested largely as entity-level risk management sourcing ‘leads’ from reactive covert human intelligence sources and/or proactive sources by applying simple rules-based models. Focusing on discrete observable and material actors ignores that criminal activity exists within a complex system deriving its fundamental structural fabric from the complex interactions between actors - with those most unobservable likely to be both criminally proficient and influential. The graph-based computational solution developed to detect latent criminal networks is a response to the inadequacy of the rational isolated actor approach, which ignores the connectedness and complexity of criminality. The core computational solution, written in the R language, consists of novel entity resolution, link discovery, and knowledge discovery technology. Entity resolution enables the fusion of multiple datasets with high accuracy (mean F-measure of 0.986 versus competitors' 0.872), generating a graph-based expressive view of the problem. Link discovery comprises link prediction and link inference, enabling the high-performance detection (accuracy of ~0.8 versus ~0.45 for relevant published models) of unobserved relationships such as identity fraud. Knowledge discovery takes the fused graph and applies the “GraphExtract” algorithm to create a set of subgraphs representing latent functional criminal groups, and a mesoscopic graph representing how this set of criminal groups is interconnected. Latent knowledge is generated from a range of metrics including the “Super-broker” metric and attitude prediction.
The computational solution has been evaluated on a range of datasets that mimic an applied setting, demonstrating a scalable (tested on ~18 million node graphs) and performant (~33 hours runtime on a non-distributed platform) solution that successfully detects relevant latent functional criminal groups in around 90% of cases sampled and enables the contextual understanding of the broader criminal system through the mesoscopic graph and associated metadata. The augmented data assets generated provide a multi-perspective systems view of criminal activity that enables advanced, informed decision making across the microscopic, mesoscopic, and macroscopic spectrum.
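    The F-measure quoted for the entity-resolution step is a standard pairwise score, and can be sketched as follows. The matcher shown (normalised exact name match) is a toy stand-in for illustration, not the thesis's R implementation; the sample records and gold pairs are invented.

    ```python
    # Minimal sketch: score an entity-resolution matcher with the pairwise
    # F-measure. The naive matcher is a placeholder, not the thesis's method.
    def f_measure(predicted: set, gold: set) -> float:
        """Harmonic mean of precision and recall over predicted record pairs."""
        tp = len(predicted & gold)  # true-positive pairs
        if tp == 0:
            return 0.0
        precision = tp / len(predicted)
        recall = tp / len(gold)
        return 2 * precision * recall / (precision + recall)

    def naive_matcher(records_a: list[str], records_b: list[str]) -> set:
        """Toy matcher: link records whose normalised names are identical."""
        norm = lambda s: " ".join(s.lower().split())
        return {(i, j)
                for i, a in enumerate(records_a)
                for j, b in enumerate(records_b)
                if norm(a) == norm(b)}

    # Invented sample records and gold-standard links.
    a = ["John  Smith", "Jane Doe", "A. Turing"]
    b = ["john smith", "J. Doe", "Alan Turing"]
    gold = {(0, 0), (1, 1), (2, 2)}
    pred = naive_matcher(a, b)
    print(pred, round(f_measure(pred, gold), 3))
    ```

    The toy matcher finds only the exact-match pair, illustrating why the thesis needs more sophisticated resolution to reach an F-measure of 0.986.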

    An ontology of ethnicity based upon personal names: with implications for neighbourhood profiling

    Understanding of the nature and detailed composition of ethnic groups remains key to a vast swathe of social and human science research. Yet ethnic origin is not easy to define, much less measure, and ascribing ethnic origins is one of the most contested and unstable research concepts of the last decade - not only in the social sciences, but also in human biology and medicine. As a result, much research remains hamstrung by the quality and availability of ethnicity classifications, constraining the meaningful subdivision of populations. This PhD thesis develops an alternative ontology of ethnicity, using personal names to ascribe population ethnicity at very fine geographical levels, using a very detailed typology of ethnic groups optimised for the UK population. The outcome is an improved methodology for classifying population registers, as well as small areas, into cultural, ethnic and linguistic (CEL) groups. This in turn makes possible the creation of much more detailed, frequently updatable representations of the ethnic kaleidoscope of UK cities, and can be further applied to other countries. The thesis includes a review of the literature on ethnicity measurement and name analysis, and their applications in ethnic inequalities and geographical research. It presents the development of the new name-to-ethnicity classification methodology using both a heuristic and an automated, integrated approach. It is based on the UK Electoral Register as well as several health registers in London. Furthermore, a validation of the proposed name-based classification using different datasets is offered, as well as examples of applications in profiling neighbourhoods by ethnicity, in particular the measurement of residential segregation in London. The main study area is London, UK.
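    At its simplest, a name-to-ethnicity classification looks up each register entry's surname in a reference table and returns that name's CEL group. The table, group labels, and fallback below are invented placeholders; the thesis derives its classification from large UK registers, not a hand-written dictionary.

    ```python
    # Minimal sketch, assuming a pre-built surname reference table: assign
    # each register entry the CEL group of its surname. The table and group
    # labels are invented placeholders, not the thesis's classification.
    SURNAME_CEL = {  # hypothetical reference table
        "nowak": "Polish",
        "patel": "Indian Gujarati",
        "murphy": "Irish",
    }

    def classify(full_name: str, default: str = "Unclassified") -> str:
        """Return the CEL group for a name's surname, or a fallback label."""
        surname = full_name.split()[-1].lower()
        return SURNAME_CEL.get(surname, default)

    print(classify("Anna Nowak"))    # → "Polish"
    print(classify("John Example"))  # → "Unclassified"
    ```

    A real system would also score forenames, handle multi-part surnames, and weight ambiguous names probabilistically, which is where the thesis's heuristic and automated approaches come in.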

    Handling metadata in the scope of coreference detection in data collections


    Aspects of Record Linkage

    This thesis is an exploration of the subject of historical record linkage. The general goal of historical record linkage is to discover relations between historical entities in a database, for any specific definition of relation, entity and database. Although this task originates from historical research, multiple disciplines are involved. Increasing volumes of data necessitate the use of automated or semi-automated linkage procedures, which is in the domain of computer science. Linkage methodologies depend heavily on the nature of the data itself, often requiring analysis based on onomastics (i.e., the study of person names) or general linguistics. To understand the dynamics of natural language, one could be tempted to look at the source of language, i.e., humans, either on the individual cognitive level or as group behaviour. This further increases the multidisciplinarity of the subject by including cognitive psychology. Every discipline addresses a subset of problem aspects, all of which can contribute either to practical solutions for linkage problems or to further insights into the subject matter.
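    One ingredient common to the automated linkage procedures the abstract mentions is a name-similarity score used to decide whether two records refer to the same person. The sketch below uses Python's standard-library `difflib.SequenceMatcher`; the normalisation steps, threshold, and sample names are illustrative choices, not the thesis's method.

    ```python
    # Minimal sketch of a record-linkage decision: normalise two recorded
    # names, score their similarity, and link them above a chosen threshold.
    # Threshold and normalisation are illustrative, not the thesis's method.
    from difflib import SequenceMatcher

    def name_similarity(a: str, b: str) -> float:
        """Similarity in [0, 1] between two names after light normalisation."""
        norm = lambda s: " ".join(s.lower().replace(".", "").split())
        return SequenceMatcher(None, norm(a), norm(b)).ratio()

    def same_person(a: str, b: str, threshold: float = 0.85) -> bool:
        """Link two records if their names are similar enough."""
        return name_similarity(a, b) >= threshold

    print(same_person("Jan Jansen", "Jan  jansen"))      # True
    print(same_person("Jan Jansen", "Pieter de Vries"))  # False
    ```

    In practice such a score is only one feature among many (dates, places, occupations), and the threshold would be tuned against a gold-standard sample rather than fixed by hand.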

    Flavor text generation for role-playing video games


    Crossing Experiences in Digital Epigraphy: From Practice to Discipline

    Although a considerable number of projects digitizing inscriptions are under development or have been recently completed, Digital Epigraphy is not yet considered a discipline in its own right, and there are still no regular occasions to meet and discuss. By collecting contributions on nineteen projects – highly diverse in geographic and chronological context, in script and language, and in typology of digital output – this volume intends to point out the methodological issues specific to the application of information technologies to epigraphy. The first part of the volume is focused on data modelling and encoding, which are conditioned by the specific features of different scripts and languages, and deeply influence the possibility of performing searches on texts and the approach to the lexicographic study of such under-resourced languages. The second part of the volume is dedicated to initiatives aimed at fostering the aggregation, dissemination and reuse of epigraphic materials, and to discussing issues of interoperability. The common theme of the volume is the relationship between compliance with the theoretic tools and methodologies developed by each different tradition of studies on the one hand, and, on the other, the necessity of adopting a common framework in order to produce commensurable and shareable results. The final question is whether the computational approach is changing the way epigraphy is studied, to the extent of renovating the discipline on the basis of new, unexplored questions.

    Digital History and Hermeneutics

    For doing history in the digital age, we need to investigate the “digital kitchen” as the place where the “raw” is transformed into the “cooked”. The novel field of digital hermeneutics provides a critical and reflexive frame for digital humanities research through the acquisition of digital literacy and skills. The Doctoral Training Unit "Digital History and Hermeneutics" is applying this new digital practice by reflecting on digital tools and methods.