197 research outputs found

    MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities

    Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of entity descriptions published in the Web of Data. To address them, we propose the MinoanER framework that simultaneously fulfills full automation, support of highly heterogeneous entities, and massive parallelization of the ER process. MinoanER leverages a token-based similarity of entities to define a new metric that derives the similarity of neighboring entities from the most important relations, as they are indicated only by statistics. A composite blocking method is employed to capture different sources of matching evidence from the content, neighbors, or names of entities. The search space of candidate pairs for comparison is compactly abstracted by a novel disjunctive blocking graph and processed by a non-iterative, massively parallel matching algorithm that consists of four generic, schema-agnostic matching rules that are quite robust with respect to their internal configuration. We demonstrate that the effectiveness of MinoanER is comparable to existing ER tools over real KBs exhibiting low Variety, but it outperforms them significantly when matching KBs with high Variety. Comment: Presented at EDBT 2019
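
    The idea of schema-agnostic, token-based blocking mentioned in this abstract can be illustrated with a minimal sketch. It is not the MinoanER implementation: the function names, the toy entity descriptions and the inverse-frequency weighting are all assumptions made for illustration. Attribute names are ignored, every value is split into tokens, and only cross-KB pairs that share at least one token become candidates.

```python
import math
import re
from collections import defaultdict


def tokens(description):
    """Return the set of lowercase word tokens in all attribute values.

    Attribute names are ignored, which is what makes this schema-agnostic.
    """
    found = set()
    for value in description.values():
        found.update(re.findall(r"\w+", str(value).lower()))
    return found


def token_blocks(kb):
    """Map each token to the set of entity ids whose values contain it."""
    blocks = defaultdict(set)
    for entity_id, description in kb.items():
        for token in tokens(description):
            blocks[token].add(entity_id)
    return blocks


def candidate_pairs(kb1, kb2):
    """Score every cross-KB pair that shares at least one token block."""
    blocks1, blocks2 = token_blocks(kb1), token_blocks(kb2)
    scores = defaultdict(float)
    for token in blocks1.keys() & blocks2.keys():
        ids1, ids2 = blocks1[token], blocks2[token]
        # Assumed weighting for illustration: rarer tokens contribute more.
        weight = 1.0 / math.log2(len(ids1) * len(ids2) + 1)
        for a in ids1:
            for b in ids2:
                scores[(a, b)] += weight
    return dict(scores)


# Hypothetical toy KBs with different schemas (made-up data).
kb1 = {
    "a1": {"name": "Knossos Palace", "place": "Crete"},
    "a2": {"name": "Phaistos Disc", "place": "Crete"},
}
kb2 = {
    "b1": {"label": "Palace of Knossos", "located": "Heraklion, Crete"},
    "b2": {"title": "Disc of Phaistos"},
}

scores = candidate_pairs(kb1, kb2)
best_for_a1 = max(kb2, key=lambda b: scores.get(("a1", b), 0.0))
print(best_for_a1, scores)
```

    Pairs that share no token, such as ("a1", "b2") here, never enter the score table, which is how blocking shrinks the search space before any matching rule is applied.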

    Flow cytometry analysis of the microbiota associated with the midguts of vector mosquitoes

    Background: Scientific interest in understanding the function and structure of the microbiota associated with the midgut of mosquito disease vectors is increasing. The advancement of such knowledge has encountered challenges and limitations associated with conventional culture-based and PCR techniques. Methods: Flow cytometry (FCM) combined with various cell marking dyes has been successfully applied in the field of ecological microbiology to circumvent the above shortcomings. Here, we describe an FCM technique coupled with the live/dead differential staining dyes SYBR Green I (SGI) and Propidium Iodide (PI) to quantify and study other essential characteristics of the mosquito gut microbiota. Results: A clear discrimination between cells and debris, as well as between live and dead cells, was achieved when the midgut homogenate was subjected to staining with a 5 × 10³ dilution of the SGI and a 30 μM concentration of the PI. Reproducibly, FCM event collections produced discrete populations including non-fluorescent cells, SYBR positive cells, PI fluorescing cells and cells that fluoresce both in SYBR and PI, these cell populations representing, respectively, background noise, live bacterial cells, dead cells and inactive cells with partial permeability to PI. The FCM produced a strong linear relationship between cell counts and their corresponding dilution factors (R² = 0.987), and the technique has better precision than qRT-PCR. The FCM count of the microbiota reached a peak load at 18 h post-feeding and started declining at 24 h. The present FCM technique was also successfully applied to quantify bacterial cells in fixed midgut samples that were homogenized in 4 % PFA. Conclusion: The FCM technique described here offers enormous potential and possibilities of integration with advanced molecular biochemical techniques for the study of the microbiota community in disease vector mosquitoes.

    An expression map for Anopheles gambiae

    Background: Quantitative transcriptome data for the malaria-transmitting mosquito Anopheles gambiae covers a broad range of biological and experimental conditions, including development, blood feeding and infection. Web-based summaries of differential expression for individual genes with respect to these conditions are a useful tool for the biologist, but they lack the context that a visualisation of all genes with respect to all conditions would give. For most organisms, including A. gambiae, such a systems-level view of gene expression is not yet available. Results: We have clustered microarray-based gene-averaged expression values, available from VectorBase, for 10194 genes over 93 experimental conditions using a self-organizing map. Map regions corresponding to known biological events, such as egg production, are revealed. Many individual gene clusters (nodes) on the map are highly enriched in biological and molecular functions, such as protein synthesis, protein degradation and DNA replication. Gene families, such as odorant binding proteins, can be classified into distinct functional groups based on their expression and evolutionary history. Immunity-related genes are non-randomly distributed in several distinct regions on the map, and are generally distant from genes with house-keeping roles. Each immunity-rich region appears to represent a distinct biological context for pathogen recognition and clearance (e.g. the humoral and gut epithelial responses). Several immunity gene families, such as peptidoglycan recognition proteins (PGRPs) and defensins, appear to be specialised for these distinct roles, while three genes with physically interacting protein products (LRIM1/APL1C/TEP1) are found in close proximity. Conclusions: The map provides the first genome-scale, multi-experiment overview of gene expression in A. gambiae and should also be useful at the gene-level for investigating potential interactions. A web interface is available through the VectorBase website (http://www.vectorbase.org/). It is regularly updated as new experimental data becomes available.

    End-to-End Entity Resolution for Big Data: A Survey

    One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and remains a challenging problem. While previous works have studied specific aspects of ER (and mostly in traditional settings), in this survey, we provide for the first time an end-to-end view of modern ER workflows, and of the novel aspects of entity indexing and matching methods in order to cope with more than one of the Big Data characteristics simultaneously. We present the basic concepts, processing steps and execution strategies that have been proposed by different communities, i.e., database, semantic Web and machine learning, in order to cope with the loose structuredness, extreme diversity, high speed and large scale of entity descriptions used by real-world applications. Finally, we provide a synthetic discussion of the existing approaches, and conclude with a detailed presentation of open research directions

    The effect of silencing immunity related genes on longevity in a naturally occurring Anopheles arabiensis mosquito population from southwest Ethiopia

    Background: Vector control remains the most important tool to prevent malaria transmission. However, it is now severely constrained by the appearance of physiological and behavioral insecticide resistance. Therefore, the development of new vector control tools is warranted. Such tools could include immunization of blood hosts of vector mosquitoes with mosquito proteins involved in midgut homeostasis (anti-mosquito vaccines) or genetic engineering of mosquitoes that can drive population-wide knockout of genes producing such proteins to reduce mosquito lifespan and malaria transmission probability. Methods: To achieve this, candidate genes related to midgut homeostasis regulation need to be assessed for their effect on mosquito survival. Here, different such candidate genes were silenced through dsRNA injection in the naturally occurring Anopheles arabiensis mosquitoes and the effect on mosquito survival was evaluated. Results: Significantly higher mortality rates were observed in the mosquitoes silenced for FN3D1 (AARA003032), FN3D3 (AARA007751) and GPRGr9 (AARA003963) genes as compared to the control group injected with dsRNA against a non-related bacterial gene (LacZ). This observed difference in mortality rate between the candidate genes and the control disappeared when gene-silenced mosquitoes were treated with antibiotic mixtures, suggesting that gut microbiota play a key role in the observed reduction of mosquito survival. Conclusions: We demonstrated that interference with the expression of the FN3D1, FN3D3 or GPRGr9 genes causes a significant reduction of the longevity of An. arabiensis mosquito in the wild

    Testing non-autonomous antimalarial gene drive effectors using self-eliminating drivers in the African mosquito vector Anopheles gambiae

    Gene drives for mosquito population modification are novel tools for malaria control. Strategies to safely test antimalarial effectors in the field are required. Here, we modified the Anopheles gambiae zpg locus to host a CRISPR/Cas9 integral gene drive allele (zpgD) and characterized its behaviour and resistance profile. We found that zpgD dominantly sterilizes females but can induce efficient drive at other loci when it itself encounters resistance. We combined zpgD with multiple previously characterized non-autonomous payload drives and found that, as zpgD self-eliminates, it leads to conversion of mosquito cage populations at these loci. Our results demonstrate how self-eliminating drivers could allow safe testing of non-autonomous effector-traits by local population modification. They also suggest that after engendering resistance, gene drives intended for population suppression could nevertheless serve to propagate subsequently released non-autonomous payload genes, allowing modification of vector populations initially targeted for suppression

    Simplifying Entity Resolution on Web Data with Schema-agnostic, Non-iterative Matching

    Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of descriptions published in the Web of Data. To address them, we propose the MinoanER framework that fulfills full automation and support of highly heterogeneous entities. MinoanER leverages a token-based similarity of entities to define a new metric that derives the similarity of neighboring entities from the most important relations, indicated only by statistics. For high efficiency, similarities are computed from a set of schema-agnostic blocks and processed in a non-iterative way that involves four threshold-free heuristics. We demonstrate that the effectiveness of MinoanER is comparable to existing ER tools over real KBs exhibiting low heterogeneity in terms of entity types and content. Yet, MinoanER outperforms state-of-the-art ER tools when matching highly heterogeneous KBs.