302 research outputs found

    ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

    Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate representations of the same external entities in data and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating three components of ER: (a) classifiers for duplicate/non-duplicate record pairs built using machine learning (ML) techniques; (b) MDs for supporting both the blocking phase of ML and the merge itself; and (c) the use of the declarative language LogiQL (an extended form of Datalog supported by the LogicBlox platform) for data processing and for the specification and enforcement of MDs. Comment: To appear in Proc. SUM, 201

    GenoMetric Query Language: A novel approach to large-scale genomic data management

    Motivation: Improvement of sequencing technologies and data processing pipelines is rapidly providing sequencing data, with associated high-level features, of many individual genomes in multiple biological and clinical conditions. These data allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art ‘big data’ computing strategies, with abstraction levels beyond available tool capabilities. Results: We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such it is key to genomic ‘big data’ analysis. GMQL leverages a simple data model that provides both abstractions of genomic region data and associated experimental, biological and clinical metadata, and interoperability between many data formats. Based on the Hadoop framework and the Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets. Availability and implementation: The GMQL toolkit is freely available for non-commercial use at http://www.bioinformatics.deib.polimi.it/GMQL/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

    Lessons from Love-Locks: The archaeology of a contemporary assemblage

    This document is the Accepted Manuscript version. The final, definitive version of this paper has been published in Journal of Material Culture, November 2017, published by SAGE Publishing, All rights reserved. Loss of context is a challenge, if not the bane, of the ritual archaeologist’s craft. Those who research ritual frequently encounter difficulties in the interpretation of its often tantalisingly incomplete material record. Careful analysis of material remains may afford us glimpses into past ritual activity, but our often vast chronological separation from the ritual practitioners themselves prevents us from seeing the whole picture. The archaeologist engaging with structured deposits, for instance, is often forced to study ritual assemblages post-accumulation. Many nuances of their formation, therefore, may be lost in interpretation. This paper considers what insights an archaeologist could gain into the place, people, pace, and purpose of deposition by recording an accumulation of structured deposits during its formation, rather than after. To answer this, the paper will focus on a contemporary depositional practice: the love-lock. This custom involves the inscribing of names/initials onto a padlock, its attachment to a bridge or other public structure, and the deposition of the corresponding key into the water below; a ritual often enacted by a couple as a statement of their romantic commitment. Drawing on empirical data from a three-year diachronic site-specific investigation into a love-lock bridge in Manchester, UK, the author demonstrates the value of contemporary archaeology in engaging with the often enigmatic material culture of ritual accumulations. Peer reviewed

    X-ray harmonic comb from relativistic electron spikes

    X-ray devices are far superior to optical ones for providing nanometre spatial and attosecond temporal resolutions. Such resolution is indispensable in biology, medicine, physics, material sciences, and their applications. A bright ultrafast coherent X-ray source is highly desirable, for example, for the diffractive imaging of individual large molecules, viruses, or cells. Here we demonstrate experimentally a new compact X-ray source involving high-order harmonics produced by a relativistic-irradiance femtosecond laser in a gas target. In our first implementation using a 9 Terawatt laser, coherent soft X-rays are emitted with a comb-like spectrum reaching the 'water window' range. The generation mechanism is robust, being based on phenomena inherent in relativistic laser plasmas: self-focusing, nonlinear wave generation accompanied by electron density singularities, and collective radiation by a compact electric charge. The formation of singularities (electron density spikes) is described by the elegant mathematical catastrophe theory, which explains sudden changes in various complex systems, from physics to social sciences. The new X-ray source has advantageous scalings, as the maximum harmonic order is proportional to the cube of the laser amplitude enhanced by relativistic self-focusing in plasma. This allows straightforward extension of the coherent X-ray generation to the keV and tens of keV spectral regions. The implemented X-ray source is remarkably easily accessible: the requirements for the laser can be met in a university-scale laboratory, the gas jet is a replenishable debris-free target, and the harmonics emanate directly from the gas jet without additional devices. Our results open the way to a compact coherent ultrashort brilliant X-ray source with single shot and high-repetition rate capabilities, suitable for numerous applications and diagnostics in many research fields.

    News media use, talk networks, and anti-elitism across geographic location: evidence from Wisconsin

    A certain social-political geography recurs across European and North American societies: as post-industrialization and mechanization of agriculture have disrupted economies, rural and nonmetropolitan areas are aging and declining in population, leading to widening political and cultural gaps between metropolitan and rural communities. Yet political communication research tends to focus on national or cross-national levels, often emphasizing networked digital media and an implicitly global information order. We contend that geographic place still provides a powerful grounding for individuals’ lifeworld experiences, identities, and orientations to political communications and politics. Focusing on the U.S. state of Wisconsin, and presenting data gathered in 2018, this study demonstrates significant, though often small, differences between geographic locations in terms of their patterns of media consumption, political talk, and anti-elite attitudes. Importantly, television news continues to play a major role in citizens’ repertoires across locations, suggesting we must continue to pay attention to this broadcast medium. Residents of more metropolitan communities consume significantly more national and international news from prestige sources such as the New York Times, and their talk networks are more cleanly sorted by partisanship. Running against common stereotypes of news media use, residents of small towns and rural areas consume no more conservative media than other citizens, even without controlling for partisanship. Our theoretical model and empirical results call for further attention to the intersections of place and politics in understanding news consumption behaviors and the meanings citizens draw from media content.

    The Penny’s Dropped: Renegotiating the contemporary coin deposit

    This is the Accepted Manuscript of the following article: Ceri Houlbrook, “The penny’s dropped: Renegotiating the contemporary coin deposit”, Journal of Material Culture, Vol. 20(2): 173-189, March 2015. The final published version is available at: http://journals.sagepub.com/doi/pdf/10.1177/1359183515577120#articleCitationDownloadContainer © 2015, © SAGE Publications. This article examines the status of coins as contemporary deposits in the British Isles. With a focus on both historical and contemporary sites, from the Neolithic long barrow of Wayland’s Smithy, Oxfordshire, to the plethora of wishing-wells and coin-trees distributed across the British Isles, it demonstrates the popularity of coins as ritual deposits. The author considers how they are perceived and treated by site custodians, and concludes with a case study of an archaeological excavation, the 2013 Ardmaddy Wishing-Tree Project, which recovered a large number of contemporary coin deposits. This article does not aim to locate itself within the debates of site custodianship and accessibility, nor does it propose to address the broader dilemmas of a site’s ritual continuity or resurgence. Instead, its aim is to encourage archaeologists to consider the contemporary deposit as an integral part of the ritual narrative of a site, rather than as disposable ‘ritual litter’. Peer reviewed. Final Accepted Version

    Two novel human cytomegalovirus NK cell evasion functions target MICA for lysosomal degradation

    NKG2D plays a major role in controlling immune responses through the regulation of natural killer (NK) cells, αβ and γδ T-cell function. This activating receptor recognizes eight distinct ligands (the MHC Class I polypeptide-related sequences (MIC) A and B, and UL16-binding proteins (ULBP)1–6) induced by cellular stress to promote recognition of cells perturbed by malignant transformation or microbial infection. Studies into human cytomegalovirus (HCMV) have aided both the identification and characterization of NKG2D ligands (NKG2DLs). HCMV immediate early (IE) genes upregulate NKG2DLs, and we now describe the differential activation of ULBP2 and MICA/B by IE1 and IE2 respectively. Despite activation by IE functions, HCMV effectively suppressed cell surface expression of NKG2DLs through both the early and late phases of infection. The immune evasion functions UL16, UL142, and microRNA(miR)-UL112 are known to target NKG2DLs. While infection with a UL16 deletion mutant caused the expected increase in MICB and ULBP2 cell surface expression, deletion of UL142 did not have a similar impact on its target, MICA. We therefore performed a systematic screen of the viral genome in search of additional functions that targeted MICA. US18 and US20 were identified as novel NK cell evasion functions capable of acting independently to promote lysosomal degradation of MICA. The most dramatic effect on MICA expression was achieved when US18 and US20 acted in concert. US18 and US20 are the first members of the US12 gene family to have been assigned a function. The US12 family has 10 members encoded sequentially through US12–US21, a genetic arrangement that is suggestive of an ‘accordion’ expansion of an ancestral gene in response to a selective pressure. This expansion must have been an ancient event, as the whole family is conserved across simian cytomegaloviruses from old world monkeys. The evolutionary benefit bestowed by the combinatorial effect of US18 and US20 on MICA may have contributed to sustaining the US12 gene family.

    AGUIA: autonomous graphical user interface assembly for clinical trials semantic data services

    Background: AGUIA is a front-end web application originally developed to manage clinical, demographic and biomolecular patient data collected during clinical trials at MD Anderson Cancer Center. The diversity of methods involved in patient screening and sample processing generates a variety of data types that require a resource-oriented architecture to capture the associations between the heterogeneous data elements. AGUIA uses a semantic web formalism, resource description framework (RDF), and a bottom-up design of knowledge bases that employ the S3DB tool as the starting point for the client's interface assembly. Methods: The data web service, S3DB, meets the necessary requirements of generating the RDF and of explicitly distinguishing the description of the domain from its instantiation, while allowing for continuous editing of both. Furthermore, it uses an HTTP-REST protocol, has a SPARQL endpoint, and has open source availability in the public domain, which facilitates the development and dissemination of this application. However, S3DB alone does not address the issue of representing content in a form that makes sense for domain experts. Results: We identified an autonomous set of descriptors, the GBox, that provides user and domain specifications for the graphical user interface. This was achieved by identifying a formalism that makes use of an RDF schema to enable the automatic assembly of graphical user interfaces in a meaningful manner while using only resources native to the client web browser (JavaScript interpreter, document object model). We defined a generalized RDF model such that changes in the graphic descriptors are automatically and immediately (locally) reflected into the configuration of the client's interface application. Conclusions: The design patterns identified for the GBox benefit from and reflect the specific requirements of interacting with data generated by clinical trials, and they contain clues for a general purpose solution to the challenge of having interfaces automatically assembled for multiple and volatile views of a domain. By coding AGUIA in JavaScript, for which all browsers include a native interpreter, a solution was found that assembles interfaces that are meaningful to the particular user, and which are also ubiquitous and lightweight, allowing the computational load to be carried by the client's machine.

    Experiences in the development of a data management system for genomics

    GMQL is a high-level query language for genomics, which operates on datasets described through GDM, a unifying data model for processed data formats. They are ingredients for the integration of processed genomic datasets, i.e. of signals produced by the genome after sequencing and long data extraction pipelines. While most of the processing load of today’s genomic platforms is due to data extraction pipelines, we anticipate that attention will soon shift towards processed datasets, as such data are being collected by large consortia and are becoming increasingly available. In our view, biology and personalized medicine will increasingly rely on data extraction and analysis methods for inferring new knowledge from existing heterogeneous repositories of processed datasets, typically augmented with the results of experimental data targeting individuals or small populations. While today’s big data are raw reads of the sequencing machines, tomorrow’s big data will also include billions or trillions of genomic regions, each featuring specific values depending on the processing conditions. Accordingly, GMQL is a high-level, declarative language inspired by big data management, and its execution engines include classic cloud-based systems, from Pig to Flink to SciDB to Spark. In this paper, we discuss how the GMQL execution environment has been developed, by going through a major version change that marked a complete system redesign; we also discuss our experiences in comparatively evaluating the four platforms.

    Treating implicit trauma: a quasi-experimental study comparing the EMDR Therapy Standard Protocol with a ‘Blind 2 Therapist’ version within a trauma capacity building project in Northern Iraq

    Psychological trauma is a silent epidemic which presents as a global public health issue, often in the form of post-traumatic stress disorder (PTSD). Eye Movement Desensitisation and Reprocessing (EMDR) Therapy is an empirically supported treatment intervention for PTSD and has been used as part of trauma-capacity building, particularly in low- and middle-income countries (LMIC). For some survivors, their trauma experiences cannot be spoken of: they may be alluded to or suggested, though not directly expressed. There are several reasons why these implicit trauma experiences are ‘unspoken’, for example, when the trauma involves a deep-rooted sense of shame or guilt, a distorted sense of over-responsibility, or when to speak of the trauma engenders fear of retribution, reprisal and consequence. This paper will explore the effectiveness of using two protocol variations of EMDR Therapy, the standard protocol versus a ‘Blind 2 Therapist’ (B2T) version, as part of a quasi-experimental study which took place in Northern Iraq. The study comprised two projects and tested several hypotheses regarding the safety, effectiveness, efficiency and relevance of the ‘Blind 2 Therapist’ protocol within EMDR Therapy. Results indicated support for the B2T protocol intervention with various trauma populations, including Yezidi survivors of Islamic State of Iraq and the Levant (ISIL), also known as Daesh.