    Structural term extraction for expansion of template-based genomic queries

    This paper describes our experiments run to address the ad hoc task of the TREC 2005 Genomics track. The task topics were expressed with 5 different structures called Generic Topic Templates (GTTs). We hypothesized the presence of GTT-specific structural terms in the free-text fields of documents relevant to a topic instantiated from that same GTT. Our experiments aimed at extracting and selecting candidate structural terms for each GTT. Selected terms were used to expand initial queries and the quality of the term selection was measured by the impact of the expansion on initial search results. The evaluation used the task training topics and the associated relevance information. This paper describes the two term extraction methods used in the experiments and the resulting two runs sent to NIST for evaluation.

    Endonuclease-containing Penelope retrotransposons in the bdelloid rotifer Adineta vaga exhibit unusual structural features and play a role in expansion of host gene families

    © The Author(s), 2013. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Mobile DNA 4 (2013): 19, doi:10.1186/1759-8753-4-19. Penelope-like elements (PLEs) are an enigmatic group of retroelements sharing a common ancestor with telomerase reverse transcriptases. In our previous studies, we identified endonuclease-deficient PLEs that are associated with telomeres in bdelloid rotifers, small freshwater invertebrates best known for their long-term asexuality and high foreign DNA content. Completion of the high-quality draft genome sequence of the bdelloid rotifer Adineta vaga provides us with the opportunity to examine its genomic transposable element (TE) content, as well as TE impact on genome function and evolution. We performed an exhaustive search of the A. vaga genome assembly, aimed at identification of canonical PLEs combining both the reverse transcriptase (RT) and the GIY-YIG endonuclease (EN) domains. We find that the RT/EN-containing Penelope families co-exist in the A. vaga genome with the EN-deficient RT-containing Athena retroelements. Canonical PLEs are present at very low copy numbers, often as a single copy, and there is no evidence that they might preferentially co-mobilize EN-deficient PLEs. We also find that Penelope elements can participate in expansion of A. vaga multigene families via trans-action of their enzymatic machinery, as evidenced by identification of intron-containing host genes framed by the Penelope terminal repeats and characteristic target-site duplications generated upon insertion. In addition, we find that Penelope open reading frames (ORFs) in several families have incorporated long stretches of coding sequence several hundred amino acids (aa) in length that are highly enriched in asparagine residues, a phenomenon not observed in other retrotransposons. Our results show that, despite their low abundance and low transcriptional activity in the A. vaga genome, endonuclease-containing Penelope elements can participate in expansion of host multigene families. We conclude that the terminal repeats represent the cis-acting sequences required for mobilization of the intervening region in trans by the Penelope-encoded enzymatic activities. We also hypothesize that the unusual capture of long N-rich segments by the Penelope ORF occurs as a consequence of peculiarities of its replication mechanism. These findings emphasize the unconventional nature of Penelope retrotransposons, which, in contrast to all other retrotransposon types, are capable of dispersing intron-containing genes, thereby questioning the validity of traditional estimates of gene retrocopies in PLE-containing eukaryotic genomes. This research was supported by grants MCB-0821956 and MCB-1121334 from the U.S. National Science Foundation to I.A

    The ATP-Binding Cassette Proteins of the Deep-Branching Protozoan Parasite Trichomonas vaginalis

    The ATP binding cassette (ABC) proteins are a family of membrane transporters and regulatory proteins responsible for diverse and critical cellular processes in all organisms. To date, there has been no attempt to investigate this class of proteins in the infectious parasite Trichomonas vaginalis. We have utilized a combination of bioinformatics, gene sequence analysis, gene expression and confocal microscopy to investigate the ABC proteins of T. vaginalis. We demonstrate that, uniquely among eukaryotes, T. vaginalis possesses no intact full-length ABC transporters and has undergone a dramatic expansion of some ABC protein sub-families. Furthermore, we provide preliminary evidence that T. vaginalis is able to read through in-frame stop codons to express ABC transporter components from gene pairs in a head-to-tail orientation. Finally, with confocal microscopy we demonstrate the expression and endoplasmic reticulum localization of a number of T. vaginalis ABC transporters.

    Identification of the Schistosoma mansoni TNF-Alpha Receptor Gene and the Effect of Human TNF-Alpha on the Parasite Gene Expression Profile

    Schistosoma mansoni is the major causative agent of schistosomiasis in the Americas. This parasite takes advantage of host signaling molecules such as cytokines and hormones to complete its development inside the host. Tumor necrosis factor-alpha (TNF-α) is one of the most important host cytokines involved in the inflammatory response. When cercariae, the infective stage, penetrate human skin, the release of TNF-α begins. In this work the authors describe the complete sequence of a possible TNF-α receptor in S. mansoni and find that, among all life cycle stages, the receptor is most highly expressed in cercariae. Aiming to mimic the situation at the site of skin penetration, cercariae were mechanically transformed in vitro into schistosomula and exposed to human TNF-α. Exposure of early-developing schistosomula to the human cytokine caused a large-scale change in the expression of parasite genes. Exposure of adult worms to human TNF-α caused gene expression changes as well, and the set of altered genes in the adult parasite was different from that of schistosomula. This work increases the number of known signaling pathways of the parasite, and opens new perspectives into understanding the molecular components of the TNF-α response as well as into possibly interfering with parasite–host interaction.

    A cooperative framework for molecular biology database integration using image object selection

    The theme and concept of 'Molecular Biology Database Integration' and the problems associated with it initiated the idea for this Ph.D. research. The available technologies make it possible to analyse the data independently and discretely, but they fail to integrate the data resources into more meaningful information. This, along with the integration issues, created the scope for this Ph.D. research. The research has reviewed the 'database interoperability' problems and has suggested a framework for integrating molecular biology databases. The framework proposes a cooperative environment in which molecular biology databases share information on the basis of a common purpose. The research has also reviewed other implementation and interoperability issues for laboratory-based, dedicated and target-specific databases. The research has addressed the following issues: diversity of molecular biology database schemas, schema constructs and schema implementation; multi-database query using image object keying; database integration technologies using context graphs; and automated navigation among these databases. This thesis has introduced a new approach for database implementation. It has introduced an interoperable component database concept to initiate multidatabase queries on gene mutation data. A number of data models have been proposed for gene mutation data, which form the basis for integrating the target-specific component database with the federated information system. The proposed data models are: data models for genetic trait analysis, classification of gene mutation data, pathological lesion data and laboratory data. The main feature of this component database is its non-overlapping attributes, and it will follow the non-redundant integration approach explained in the thesis. 
This will be achieved by storing only attributes that have no union or intersection with attributes existing in public-domain molecular biology databases. In contrast to the data warehousing technique, this feature is unique and novel. The component database will be integrated with other biological data sources for sharing information in a cooperative environment. This involves developing new tools. The thesis explains the role of these new tools, which are: metadata extractor, mapping linker, query generator and result interpreter. These tools are used for transparent integration without creating any global schema of the participating databases. The thesis has also established the concept of image object keying for multidatabase queries and has proposed a relevant algorithm for matching protein spots in gel electrophoresis images. An object spot in a gel electrophoresis image initiates the query when it is selected by the user. The query matches the selected spot with similar spots in other resource databases. This image object keying method is an alternative to the conventional multidatabase query, which requires writing complex SQL scripts. The method also resolves the semantic conflicts that exist among molecular biology databases. The research has proposed a new framework based on the context of the web data for interactions with different biological data resources. A formal description of the resource context is given in the thesis. Implementing the context in the Resource Description Framework (RDF) can increase interoperability by providing the description of the resources and the navigation plan for accessing the web-based databases. A higher-level construct (has, provide and access) is developed to implement the context in RDF for web interactions. The interactions within the resources are achieved by utilising an integration domain to extract the required information with a single instance and without writing any query scripts. 
The integration domain allows navigation and execution of the query plan within the resource databases. An extractor module collects elements from different target websites and unifies them as a whole object in a single page. The proposed framework is tested by finding specific information, e.g., information on Alzheimer's disease, from public-domain biology resources such as Protein Data Bank, Genome Data Bank, Online Mendelian Inheritance in Man and a local database. Finally, the thesis puts forward further propositions and plans for future work.

    Entity-Oriented Search

    This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. 
A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms.

    Abundance, distribution and functional characterisation of gut-associated Type II toxin-antitoxin systems

    Prokaryotic toxin-antitoxin (TA) systems (also known as addiction modules) are ubiquitous genetic modules first discovered due to their role in stabilising vertical transmission of plasmids. Generally they are two-gene systems encoding a stable toxin (Tx) and an unstable antitoxin (ATx). Loss of the TA module leads to rapid ATx degradation and depletion, leaving the Tx free to interact with cellular targets and inhibit growth. For plasmid-encoded TA systems, this leads to the death of plasmid-free daughter cells, ensuring plasmid maintenance in a population, and gives rise to the term "addiction module". More recently, the expansion in microbial genome data has highlighted the prevalence and diversity of TA systems, and demonstrated that they are common features of many bacterial chromosomes. In addition, metagenomic surveys have pointed to the enrichment of some TA families in particular microbial ecosystems; a prime example comes from surveys of the human gut microbiome and the RelBE TA family. Collectively, these observations indicate a wider role for TA modules in bacterial function, with numerous roles for TA systems now hypothesised. These include: i) stabilisation of TA-associated chromosomal DNA during vertical transmission; ii) formation of "persister" cells resistant to environmental stresses; and iii) population-level resistance to bacteriophage attack. Additionally, some Tx components have shown activity in eukaryotic cells, raising the potential for a role in prokaryote-eukaryote interaction. Here we undertook a systematic study of Type II TA systems, to provide a comprehensive assessment of their distribution and relative abundance, to confirm activity of prevalent TA systems, and to understand putative roles these may play in gut-associated bacteria and the gut microbiome. 
A comparative genomic and metagenomic analysis of 3919 bacterial chromosomes, 4580 plasmids, 711 bacteriophage genomes, and 781 metagenomes encompassing 16 distinct habitats was conducted using all known Type II TA systems present in the Toxin Antitoxin Database (~10,100 TA genes, ~1:1 Tx:ATx). Of the 817 Type II TA system homologues found in human gut datasets, 686 were observed to have significantly higher relative abundance in the human gut microbiome than in other microbial ecosystems. In parallel to these in silico findings, PCR and qPCR surveys of microbiomes from 65 stool samples obtained from healthy volunteers, as well as those with polyps or colorectal cancer, were undertaken. These demonstrated a higher presence of ATx than of Tx or the complete module; however, no differences in Tx copy number between health groups were seen. To confirm the activity of the most abundant TA system homologues identified in sequence surveys, ORFs were amplified from gut metagenomic DNA, and individual Tx or ATx cloned under the control of inducible promoters. Induction of Tx expression under normal growth conditions resulted in bacterial growth inhibition, while live/dead staining showed entry into a viable but non-cultivatable state, commensurate with TA function. Experiments simulating environmental stresses encountered during colonisation of the GI tract (starvation, low pH, bile) indicated that expression of these TA systems could increase cell survival when carbon or nitrogen availability was limited (starvation). Since antibiotics are also commonly encountered by gut-associated bacteria (both as residents of the GI tract and during colonisation of other body sites), a role for gut-associated TA systems in facilitating survival during antibiotic exposure was also explored. This revealed an increased number of cells surviving two hours post-treatment with β-lactams when Tx genes were expressed, in keeping with an impact on cell growth. 
To test the hypothesis that TA systems may stabilize associated regions of DNA, the composition of gene neighbourhoods surrounding TA systems was also explored. ORFs surrounding TA system homologues identified in metagenomic and genomic datasets were identified using the Metagene annotator, and ORF functions predicted based on searches of the Clusters of Orthologous Groups (COG) database. This revealed significant increases in ORFs with functions related to replication/recombination/repair and those with unknown functions. It also identified a decrease in the proportion of ORFs encoding functions such as carbohydrate and lipid transport and metabolism in regions surrounding TA systems, suggesting involvement with stabilization of mobile elements. Finally, we explored the potential for gut-associated TA systems to modulate phage-microbe and host-microbe interactions. In the case of phage-host interactions, TA systems have previously been shown to function as mediators of phage resistance at the population level, by directing cells towards a dormant state which prevents phage replication and permits a sub-set of cells to survive phage attack. Our findings indicated the potential for gut-associated TA systems to provide some degree of protection during particular host-phage interactions, but specific modules did not provide universal protection against phage. In the case of host-microbe interaction, some Type II TA system Tx components have been shown to be functional in cultured eukaryotic cells, promoting apoptosis when introduced and expressed in these cell types. However, no studies to date have examined the potential for bacterially expressed TA systems to influence eukaryotic cell health in co-culture models. 
To investigate this, we assessed the impact of bacterial TA system expression on the health of the intestinal epithelial cell line Caco-2 in co-culture systems, focusing specifically on cell apoptosis and necrosis in the presence of Escherichia coli expressing p22-RelBE.