    Reproducibility, bioinformatic analysis and power of the SAGE method to evaluate changes in transcriptome

    The serial analysis of gene expression (SAGE) method is used to study global gene expression in cells or tissues under various experimental conditions. However, its reproducibility has not yet been definitively assessed. In this study, we evaluated the reproducibility of the SAGE method and identified the factors that affect it. The coefficient of determination (R²) for the reproducibility of SAGE is 0.96. However, several factors can affect the reproducibility of SAGE, such as the replication of concatemers and ditags, the number of sequenced tags, and double PCR amplification of ditags. Corrections for these factors must therefore be made to ensure the reproducibility and accuracy of SAGE results. A bioinformatic analysis of SAGE data that eliminates these artifacts is also presented. Finally, the study shows that increasing the number of sequenced tags improves the power of the method to detect transcripts and their regulation by experimental conditions.
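
    The abstract does not spell out the correction pipeline. A minimal sketch, assuming classic 10-bp SAGE tags, ditags read so that the second tag is the reverse complement of the ditag's last 10 bases, and that identical ditags are PCR duplicates to be counted only once; R² is taken here as the squared Pearson correlation of tag counts between two replicate libraries. The names `count_tags` and `r_squared` are hypothetical, not the authors' code.

```python
from collections import Counter

TAG_LEN = 10  # assumption: classic SAGE tags of 10 bp
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_tags(ditags: list[str]) -> Counter:
    """Count tags after collapsing identical ditags (assumed PCR duplicates)."""
    counts: Counter = Counter()
    for ditag in set(ditags):  # each distinct ditag contributes only once
        if len(ditag) < 2 * TAG_LEN:
            continue  # too short to contain two tags
        counts[ditag[:TAG_LEN]] += 1
        # the second tag lies on the opposite strand
        counts[reverse_complement(ditag[-TAG_LEN:])] += 1
    return counts


def r_squared(a: Counter, b: Counter) -> float:
    """Squared Pearson correlation of tag counts over the union of observed tags."""
    tags = sorted(set(a) | set(b))
    x = [a[t] for t in tags]  # Counter returns 0 for tags absent from a library
    y = [b[t] for t in tags]
    n = len(tags)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)  # undefined if either library has constant counts
```

    Computing R² with and without the deduplication step is one rough way to see how much duplicated ditags change the agreement between replicates.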

    DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data

    Triple stores are the backbone of increasingly many Data Web applications. The performance of these stores is therefore mission-critical for individual projects as well as for data integration on the Data Web in general. Consequently, it is of central importance during the implementation of any of these applications to have a clear picture of the weaknesses and strengths of current triple store implementations. In this paper, we propose a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational and triple stores and, thus, settled on measuring performance against a relational database that had been converted to RDF using SQL-like queries. In contrast to those approaches, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data not resembling a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful for comparing existing triple stores, and we provide results for the popular triple store implementations Virtuoso, Sesame, Jena-TDB, and BigOWLIM. A subsequent comparison of our results with other benchmark results indicates that the performance of triple stores is far less homogeneous than suggested by previous benchmarks.
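
    The abstract names SPARQL feature analysis without listing the features. A minimal sketch, assuming a hypothetical feature list and simple keyword matching (which ignores that a keyword could also appear inside a string literal or IRI), that tags each logged query with the features it uses and groups queries with identical feature sets. Grouping by exact signature is only a crude stand-in for the clustering step the authors describe.

```python
import re
from collections import defaultdict

# Hypothetical feature list; the benchmark's actual feature set is not given here.
FEATURES = {
    "FILTER": r"\bFILTER\b",
    "OPTIONAL": r"\bOPTIONAL\b",
    "UNION": r"\bUNION\b",
    "DISTINCT": r"\bDISTINCT\b",
    "ORDER BY": r"\bORDER\s+BY\b",
    "LIMIT": r"\bLIMIT\b",
    "REGEX": r"\bREGEX\s*\(",
    "LANG": r"\bLANG\s*\(",
}
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in FEATURES.items()}


def features(query: str) -> frozenset[str]:
    """Return the set of SPARQL features (by keyword match) used in a query."""
    return frozenset(name for name, rx in COMPILED.items() if rx.search(query))


def group_by_features(queries: list[str]) -> dict[frozenset[str], list[str]]:
    """Group queries that share exactly the same feature set."""
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for q in queries:
        groups[features(q)].append(q)
    return dict(groups)
```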

    Germline copy number variation in the YTHDC2 gene: does it have a role in finding a novel potential molecular target involved in pancreatic adenocarcinoma susceptibility?

    Objective: The vast majority of pancreatic cancers occur sporadically. Frequent variations in germline gene copy number can significantly influence the expression levels of genes that predispose to pancreatic adenocarcinoma. We prospectively investigated whether patients with sporadic pancreatic adenocarcinoma share specific gene copy number variations (CNVs) in their germline DNA. Patients and methods: DNA samples from the peripheral leukocytes of 72 patients diagnosed with sporadic pancreatic adenocarcinoma and of 60 controls were analyzed using the Affymetrix 500K array set. A multiplex ligation-dependent probe amplification (MLPA) assay was performed using a set of self-designed MLPA probes specific for seven target sequences. Results: We identified a CNV-containing DNA region associated with pancreatic cancer risk. This region shows a deletion of one allele in 36 of the 72 analyzed patients but in none of the controls. The region is of particular interest because it contains the YTHDC2 gene, which encodes a putative DNA/RNA helicase; such proteins are frequently involved in cancer susceptibility. Interestingly, 82.6% of Sicilian patients showed germline loss of one allele. Conclusions: Our results suggest that the YTHDC2 gene could be a potential candidate for pancreatic cancer susceptibility and a useful marker for early detection, as well as for the development of possible new therapeutic strategies.

    WENDI: A tool for finding non-obvious relationships between compounds and biological properties, genes, diseases and scholarly publications

    Background: In recent years, there has been a huge increase in the amount of publicly available and proprietary information pertinent to drug discovery. However, there is a distinct lack of data mining tools to harness this information, particularly for knowledge discovery across multiple information sources. At Indiana University we have an ongoing project with Eli Lilly to develop web-service-based tools for integrative mining of chemical and biological information. In this paper, we report on the first of these tools, called WENDI (Web Engine for Non-obvious Drug Information), which attempts to find non-obvious relationships between a query compound and scholarly publications, biological properties, genes and diseases using multiple information sources. Results: We have created an aggregate web service that takes a query compound as input, calls multiple web services for computation and database search, and returns an XML file that aggregates this information. We have also developed a client application that provides an easy-to-use interface to this web service. Both the service and the client are publicly available. Conclusions: Initial testing indicates that this tool is useful for identifying non-obvious potential biological applications of compounds, and for identifying corroborating and conflicting information from multiple sources. We encourage feedback on the tool to help us refine it further. We are now developing further tools based on this model.
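
    The service's internals are not described beyond this. A minimal sketch of the aggregation pattern, assuming each backend service is a plain Python callable; the names `aggregate` and `Service` and the `results`/`source`/`item` XML layout are hypothetical, not WENDI's actual interface. The backends are called concurrently and their records are merged into one XML document, with a failed backend reported rather than aborting the whole query.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical backend: takes a query compound (e.g. a SMILES string) and
# returns a list of records, each a dict of string attributes.
Service = Callable[[str], list[dict[str, str]]]


def aggregate(compound: str, services: dict[str, Service]) -> str:
    """Call every backend concurrently and merge the results into one XML string."""
    root = ET.Element("results", query=compound)
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, compound) for name, fn in services.items()}
    # leaving the with-block waits for all submitted calls to finish
    for name, future in futures.items():
        section = ET.SubElement(root, "source", name=name)
        try:
            records = future.result()
        except Exception as exc:  # report a failed backend instead of aborting
            section.set("error", str(exc))
            continue
        for record in records:
            ET.SubElement(section, "item", record)
    return ET.tostring(root, encoding="unicode")
```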

    Semantic Web integration of Cheminformatics resources with the SADI framework

    Background: The diversity and the largely independent nature of chemical research efforts over the past half century are, most likely, the major contributors to the current poor state of chemical computational resource and database interoperability. While open software for chemical format interconversion and database entry cross-linking has partially addressed database interoperability, computational resource integration is hindered by the great diversity of software interfaces, languages, access methods, and platforms, among others. This has, in turn, translated into limited reproducibility of computational experiments and the need for application-specific computational workflow construction and semi-automated enactment by human experts, especially where emerging interdisciplinary fields, such as systems chemistry, are pursued. Fortunately, the advent of the Semantic Web, and the very recent introduction of RESTful Semantic Web Services (SWS), may present an opportunity to integrate all of the existing computational and database resources in chemistry into a machine-understandable, unified system that draws on the entirety of the Semantic Web. Results: We have created a prototype set of Semantic Automated Discovery and Integration (SADI) framework SWS that expose the QSAR descriptor functionality of the Chemistry Development Kit. Since each of these services has formal ontology-defined input and output classes, and each service consumes and produces RDF graphs, clients can automatically reason about the services and the available reference information necessary to complete a given overall computational task specified through a simple SPARQL query. We demonstrate this capability by carrying out a QSAR analysis, backed by a simple formal ontology, to determine whether a given molecule is drug-like. Further, we discuss parameter-based control over the execution of SADI SWS. Finally, we demonstrate the value of enveloping computational resources as SADI services through service reuse and the ease of integrating computational functionality into formal ontologies. Conclusions: The work we present here may trigger a major paradigm shift in the distribution of computational resources in chemistry. We conclude that envelopment of chemical computational resources as SADI SWS facilitates interdisciplinary research by enabling the definition of computational problems in terms of ontologies and formal logical statements instead of cumbersome, application-specific tasks and workflows.
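
    The abstract states that each service has ontology-defined input and output classes and consumes and produces RDF graphs. A minimal sketch of that contract, using plain Python tuples in place of a real RDF library and a hypothetical `ex:` vocabulary; the drug-likeness rule (Lipinski's rule of five, allowing at most one violation) is an assumption standing in for the authors' ontology, whose definition is not given here.

```python
from dataclasses import dataclass
from typing import Callable

Triple = tuple[str, str, object]  # (subject, predicate, object)
RDF_TYPE = "rdf:type"


@dataclass
class Service:
    """Hypothetical SADI-style service: declared input/output classes plus a function."""
    name: str
    input_class: str
    output_class: str
    run: Callable[[str, dict[str, object]], list[Triple]]


def invoke(service: Service, graph: list[Triple]) -> list[Triple]:
    """Run the service on every instance of its input class and return new triples."""
    inputs = sorted({s for s, p, o in graph if p == RDF_TYPE and o == service.input_class})
    output: list[Triple] = []
    for subject in inputs:
        props = {p: o for s, p, o in graph if s == subject}
        output.append((subject, RDF_TYPE, service.output_class))  # annotate the same node
        output.extend(service.run(subject, props))
    return output


def rule_of_five(subject: str, props: dict[str, object]) -> list[Triple]:
    """Assumed drug-likeness rule: at most one Lipinski violation."""
    violations = sum([
        props["ex:molecularWeight"] > 500,
        props["ex:logP"] > 5,
        props["ex:hBondDonors"] > 5,
        props["ex:hBondAcceptors"] > 10,
    ])
    label = "ex:DrugLike" if violations <= 1 else "ex:NotDrugLike"
    return [(subject, RDF_TYPE, label)]


drug_likeness = Service("drugLikeness", "ex:DescribedMolecule", "ex:ClassifiedMolecule", rule_of_five)
```

    Because the input and output classes are declared as data, a client can decide which service to call by matching classes rather than by knowing each service's interface in advance.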

    The Logic of Fashion Cycles

    Many cultural traits exhibit volatile dynamics, commonly dubbed fashions or fads. Here we show that realistic fashion-like dynamics emerge spontaneously if individuals can copy others' preferences for cultural traits as well as the traits themselves. We demonstrate these dynamics in simple mathematical models of the diffusion, and subsequent abandonment, of a single cultural trait that individuals may or may not prefer. We then simulate the coevolution between many cultural traits and the associated preferences, reproducing power-law frequency distributions of cultural traits (most traits are adopted by few individuals for a short time, and very few by many for a long time), as well as correlations between the rate of increase and the rate of decrease of traits (traits that increase rapidly in popularity are also abandoned quickly, and vice versa). We also establish that alternative theories, namely that fashions result from individuals signaling their social status or from individuals randomly copying each other, do not satisfactorily reproduce these empirical observations.
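
    The abstract does not give the model equations. The sketch below is a hypothetical toy, not the authors' model: agents hold a preference for a single trait and may display it; on each encounter an agent may copy a random other's preference, adopts the trait if it prefers it and sees it displayed, and drops the trait as soon as it no longer prefers it. The spontaneous preference-loss rate `p_lose` and all parameter values are assumptions added so that the trait can spread and later be abandoned.

```python
import random


def simulate(n: int = 1000, steps: int = 300, initial_pref: float = 0.5,
             p_copy_pref: float = 0.1, p_lose: float = 0.02, seed: int = 1) -> list[float]:
    """Toy model (hypothetical): fraction of agents displaying the trait after each sweep."""
    rng = random.Random(seed)
    prefers = [rng.random() < initial_pref for _ in range(n)]
    has = [False] * n
    for i in range(5):  # a few innovators who like and display the trait
        prefers[i] = has[i] = True
    history = []
    for _ in range(steps):
        for i in range(n):
            j = rng.randrange(n)  # random individual encountered
            if rng.random() < p_copy_pref:
                prefers[i] = prefers[j]  # copy the other's preference
            if prefers[i] and rng.random() < p_lose:
                prefers[i] = False  # assumed spontaneous loss of interest
            if prefers[i] and has[j]:
                has[i] = True  # adopt a preferred trait seen in others
            elif not prefers[i]:
                has[i] = False  # abandon a trait no longer preferred
        history.append(sum(has) / n)
    return history
```

    Plotting the output of `simulate()` over time is one way to inspect the single-trait dynamics; the toy says nothing about the power-law and rate-correlation results, which come from the authors' many-trait simulations.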

    Word add-in for ontology recognition: semantic enrichment of scientific literature

    Background: In the current era of scientific research, efficient communication of information is paramount. As such, the nature of scholarly and scientific communication is changing; cyberinfrastructure is now absolutely necessary, and new media are making information and knowledge more interactive and immediate. One approach to making knowledge more accessible is to add machine-readable semantic data to scholarly articles. Results: The Word add-in presented here assists authors in this effort by automatically recognizing and highlighting words or phrases that are likely to be information-rich, allowing authors to associate semantic data with those words or phrases, and embedding that data in the document as XML. The add-in and source code are publicly available at http://www.codeplex.com/UCSDBioLit. Conclusions: The Word add-in for ontology term recognition makes it possible for an author to add semantic data to a document as it is being written, and it encodes these data using XML tags that are effectively a standard in life sciences literature. Allowing authors to mark up their own work will help increase the amount and quality of machine-readable literature metadata.
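
    The add-in's recognition method is not described here. A minimal sketch of dictionary-based term tagging, assuming a hypothetical lexicon that maps lowercase phrases to ontology term IDs and a hypothetical `<term id="...">` element: the longest matching phrase wins, and the surrounding text is XML-escaped so the result stays well-formed.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def annotate(text: str, lexicon: dict[str, str]) -> str:
    """Wrap lexicon phrases found in text in <term id="..."> elements.

    lexicon maps lowercase phrases to (hypothetical) ontology term IDs.
    """
    if not lexicon:
        return escape(text)
    # try longer phrases first so "cell cycle arrest" beats "cell cycle"
    phrases = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b", re.IGNORECASE
    )
    parts, pos = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[pos:match.start()]))
        term_id = lexicon[match.group(1).lower()]
        parts.append(f"<term id={quoteattr(term_id)}>{escape(match.group(1))}</term>")
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)
```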