76 research outputs found

    In search of the right literature search engine(s)

    *Background*
Collecting scientific publications related to a specific topic is crucial for different phases of research, health care and ‘effective text mining’. Available bio-literature search engines vary in their ability to scan different sections of articles for user-provided search terms and/or phrases. Since a thorough scientific analysis of all major bibliographic tools has not been done, their selection has often remained subjective. We considered most of the existing bio-literature search engines (http://www.shodhaka.com/startbioinfo/LitSearch.html) and performed an extensive analysis of 18 literature search engines over a period of about 3 years. Eight different topics were chosen and about 50 searches were performed using the selected search engines. The relevance of retrieved citations was carefully assessed after every search to estimate the citation retrieval efficiency. Several other features of the search tools were also compared using a semi-quantitative method.
*Results*
The study provides the first tangible comparative account of relative retrieval efficiency, input and output features, resource coverage and a few other utilities of the bio-literature search tools. The results show that using a single search tool can lead to the loss of up to 75% of relevant citations in some cases. Hence, use of multiple search tools is recommended. However, it is not practical to use all, or too many, search engines. The detailed observations made in the study can assist researchers and health professionals in making a more objective selection among the search engines. A corollary study revealed the relative advantages and disadvantages of the full-text scanning tools.
*Conclusion*
While many studies have attempted to compare literature search engines, important questions have remained unanswered to date. The following are some of those questions, along with the answers provided by the current study:
a)	Which tools should be used to get the maximum number of relevant citations with a reasonable effort? ANSWER: _Using PubMed, Scopus, Google Scholar and HighWire Press individually, and then compiling the hits into a union list, is the best option. Citation-Compiler (http://www.shodhaka.com/compiler) can help to compile the results from each of the recommended tools._
b)	What is the approximate percentage of relevant citations expected to be lost if only one search engine is used? ANSWER: _About 39% of the total relevant citations were lost in searches across 4 topics; 49% of hits were lost while using PubMed or HighWire Press, while losses of 37% and 20% were noticed while using Google Scholar and Scopus, respectively._
c)	Which full-text search engines can be recommended in general? ANSWER: _HighWire Press and Google Scholar._
d)	Among the most used search engines, which one can be recommended for best precision? ANSWER: _EBIMed._
e)	Among the most used search engines, which one can be recommended for best recall? ANSWER: _Depending on the type of query used, best recall could be obtained by HighWire Press or Scopus._
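
The union-list idea in answer (a) and the loss percentages in answer (b) can be illustrated with a minimal Python sketch. The citation IDs below are hypothetical placeholders, not data from the study, and the functions are illustrative rather than a description of how Citation-Compiler works.

```python
# Hypothetical relevant-citation IDs retrieved by each engine for one topic.
# These values are invented for illustration only.
relevant_hits = {
    "PubMed": {"c1", "c2", "c3"},
    "Scopus": {"c2", "c3", "c4", "c5"},
    "Google Scholar": {"c1", "c4", "c6"},
    "HighWire Press": {"c3", "c6"},
}


def union_list(hits_by_engine):
    """Merge the per-engine hit sets into one de-duplicated union list."""
    merged = set()
    for hits in hits_by_engine.values():
        merged |= hits
    return merged


def percent_lost(hits_by_engine):
    """Percentage of the union that each engine, used alone, would miss."""
    total = union_list(hits_by_engine)
    return {
        engine: 100.0 * len(total - hits) / len(total)
        for engine, hits in hits_by_engine.items()
    }


all_relevant = union_list(relevant_hits)
loss = percent_lost(relevant_hits)
for engine, pct in loss.items():
    print(f"{engine}: {pct:.1f}% of relevant citations missed")
```

Here the union of all engines stands in for the "total relevant citations", which is an assumption: citations missed by every engine cannot be counted this way.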

    Whole-Exome Sequencing Reveals High Mutational Concordance between Primary and Matched Recurrent Triple-Negative Breast Cancers

    PURPOSE: Triple-negative breast cancer (TNBC) is a molecularly complex and heterogeneous breast cancer subtype with distinct biological features and clinical behavior. Although TNBC is associated with an increased risk of metastasis and recurrence, the molecular mechanisms underlying TNBC metastasis remain unclear. We performed whole-exome sequencing (WES) analysis of primary TNBC and paired recurrent tumors to investigate the genetic profile of TNBC.
    METHODS: Genomic DNA extracted from 35 formalin-fixed paraffin-embedded tissue samples from 26 TNBC patients was subjected to WES. Of these, 15 were primary tumors that did not have recurrence, and 11 were primary tumors that had recurrence (nine paired primary and recurrent tumors). Tumors were analyzed for single-nucleotide variants and insertions/deletions.
    RESULTS: The tumor mutational burden (TMB) was 7.6 variants/megabase in primary tumors that recurred (n = 9); 8.2 variants/megabase in corresponding recurrent tumors (n = 9); and 7.3 variants/megabase in primary tumors that did not recur (n = 15). MUC3A was the most frequently mutated gene in all groups. Mutations in MAP3K1 and MUC16 were more common in our dataset. No alterations in PI3KCA were detected in our dataset.
    CONCLUSIONS: We found similar mutational profiles between primary and paired recurrent tumors, suggesting that genomic features may be retained during local recurrence.
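
As a rough illustration of the variants/megabase unit used for TMB above, the following Python sketch divides a variant count by the sequenced region size in megabases. The abstract does not state the variant counts or the size of the covered exome, so both input values here are hypothetical.

```python
def tumor_mutational_burden(variant_count, covered_bases):
    """Return variants per megabase of covered sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return variant_count / (covered_bases / 1_000_000)


# Hypothetical example: 266 variants over an assumed 35 Mb covered region.
tmb = tumor_mutational_burden(266, 35_000_000)
print(f"TMB: {tmb:.1f} variants/megabase")
```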

    A novel tissue-specific meta-analysis approach for gene expression predictions, initiated with a mammalian gene expression testis database

    *Background*
In recent years, there has been a rise in gene expression profiling reports. Unfortunately, it has not been possible to make maximum use of the available gene expression data. Many databases and programs can be used to derive the possible expression patterns of mammalian genes, based on existing data. However, these available resources have limitations. For example, it is not possible to obtain a list of genes that are expressed in certain conditions. To overcome such limitations, we have adopted a new strategy to predict gene expression patterns using available information, for one tissue at a time.
*Results*
The first step of this approach involved manual collection of as much data as possible from large-scale (genome-wide) gene expression studies pertaining to mammalian testis. These data have been compiled into a Mammalian Gene Expression Testis-database (MGEx-Tdb). This process resulted in a richer collection of gene expression data compared to other databases/resources, for multiple testicular conditions. The gene-lists collected this way were in turn exploited to derive a 'consensus' expression status for each gene, across studies. The expression information obtained from the newly developed database mostly agreed with results from multiple small-scale studies on selected genes. A comparative analysis showed that MGEx-Tdb can retrieve gene expression information more efficiently than other commonly used databases. It can provide a clear expression status (transcribed or dormant) for most genes in the testis tissue, under several specific physiological/experimental conditions and/or cell-types.
*Conclusions*
Manual compilation of gene expression data, which can be a painstaking process, followed by a consensus expression status determination for specific locations and conditions, can be a reliable way of making use of the existing data to predict gene expression patterns.
MGEx-Tdb provides expression information for 14 different combinations of specific locations and conditions in humans (25,158 genes), 79 in mice (22,919 genes) and 23 in rats (14,108 genes). It is also the first system that can predict expression of genes with a 'reliability-score', which is calculated based on the extent of agreements and contradictions across gene-sets/studies. This new platform is publicly available at the following web address: http://resource.ibab.ac.in/MGEx-Tdb/
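
The idea of a consensus expression status with a score based on agreements and contradictions can be sketched in Python as follows. The abstract does not give the actual MGEx-Tdb formula, so the majority vote and the `(agreements - contradictions) / total` score below are assumptions chosen only to make the concept concrete.

```python
from collections import Counter


def consensus_status(reports):
    """Derive a consensus call from per-study reports.

    reports: list of 'transcribed' or 'dormant' calls, one per study.
    Returns (status, score). The scoring rule is a hypothetical stand-in,
    not the published MGEx-Tdb reliability-score.
    """
    counts = Counter(reports)
    transcribed, dormant = counts["transcribed"], counts["dormant"]
    if transcribed == dormant:
        # No majority (including the case of no reports at all).
        return "undetermined", 0.0
    status = "transcribed" if transcribed > dormant else "dormant"
    agree = max(transcribed, dormant)
    contradict = min(transcribed, dormant)
    score = (agree - contradict) / (agree + contradict)
    return status, score


# Hypothetical gene reported by four studies.
status, score = consensus_status(
    ["transcribed", "transcribed", "transcribed", "dormant"]
)
print(status, score)
```

Under this assumed rule, unanimous studies give a score of 1.0 and an even split gives 0.0, so more contradictions always lower the score.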

    Characterization of glycine-N-acyltransferase like 1 (GLYATL1) in prostate cancer

    *Background*
Recent microarray and sequencing studies of prostate cancer showed multiple molecular alterations during cancer progression. It is critical to evaluate these molecular changes to identify new biomarkers and targets. In this study, we analyzed glycine-N-acyltransferase like 1 (GLYATL1) expression in various stages of prostate cancer and evaluated the regulation of GLYATL1 by androgen.
*Method*
We performed in silico analysis of cancer gene expression profiling and transcriptome sequencing to evaluate GLYATL1 expression in prostate cancer. Furthermore, we performed immunohistochemistry with a specific GLYATL1 antibody on a high-density prostate cancer tissue microarray containing primary and metastatic prostate cancer. We also tested the regulation of GLYATL1 expression by androgen and the ETS transcription factor ETV1. In addition, we performed RNA-sequencing of GLYATL1-modulated prostate cancer cells to evaluate gene expression and changes in molecular pathways.
*Results*
Our in silico analysis of cancer gene expression profiling and transcriptome sequencing revealed an overexpression of GLYATL1 in primary prostate cancer. Confirming these findings by immunohistochemistry, we show that GLYATL1 is overexpressed in primary prostate cancer compared with metastatic prostate cancer and benign prostatic tissue. Low-grade cancers had higher GLYATL1 expression compared to high-grade prostate tumors. Our studies showed that GLYATL1 is upregulated upon androgen treatment in LNCaP prostate cancer cells, which harbor an ETV1 gene rearrangement. Furthermore, ETV1 knockdown in LNCaP cells led to downregulation of GLYATL1, suggesting potential regulation of GLYATL1 by the ETS transcription factor ETV1.
Transcriptome sequencing of GLYATL1-knockdown LNCaP prostate cancer cells showed regulation of multiple metabolic pathways.
*Conclusions*
In summary, our study characterizes the expression of GLYATL1 in prostate cancer and explores its regulation, showing a role for androgen and the ETS transcription factor ETV1. Future studies are needed to decipher the biological significance of these findings.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/151252/1/pros23887.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/151252/2/pros23887_am.pd

    MGEx-Udb: A Mammalian Uterus Database for Expression-Based Cataloguing of Genes across Conditions, Including Endometriosis and Cervical Cancer

    Gene expression profiling of uterus tissue has been performed in various contexts, but a significant amount of the data remains underutilized as it is not covered by the existing general resources. The database can be queried with gene names/IDs, sub-tissue locations, as well as various conditions such as cervical cancer, endometrial cycles and disorders, and experimental treatments. Accordingly, the output would be a) transcribed and dormant genes listed for the queried condition/location, or b) the expression profile of the gene of interest in various uterine conditions. The results also include the reliability score for the expression status of each gene. MGEx-Udb also provides information related to Gene Ontology annotations, protein-protein interactions, transcripts, promoters, and expression status by other sequencing techniques, and facilitates various other types of analysis of the individual genes or co-expressed gene clusters. In brief, MGEx-Udb enables easy cataloguing of co-expressed genes and also facilitates bio-marker discovery for various uterine conditions.

    GREAM: A Web Server to Short-List Potentially Important Genomic Repeat Elements Based on Over-/Under-Representation in Specific Chromosomal Locations, Such as the Gene Neighborhoods, within or across 17 Mammalian Species.

    Genome-wide repeat sequences, such as LINEs, SINEs and LTRs, make up a considerable part of mammalian nuclear genomes. These repeat elements seem to be important for multiple functions, including the regulation of transcription initiation, alternative splicing and DNA methylation. However, it is not possible to study all repeats and, hence, it would help to short-list them before exploring their potential functional significance via experimental studies and/or detailed in silico analyses. We developed the 'Genomic Repeat Element Analyzer for Mammals' (GREAM) for analysis, screening and selection of potentially important mammalian genomic repeats. This web-server offers many novel utilities. For example, this is the only tool that can reveal a categorized list of specific types of transposons, retro-transposons and other genome-wide repetitive elements that are statistically over-/under-represented in regions around a set of genes, such as those expressed differentially in a disease condition. The output displays the position and frequency of identified elements within the specified regions. In addition, GREAM offers two other types of analyses of genomic repeat sequences: a) enrichment within chromosomal region(s) of interest, and b) comparative distribution across the neighborhood of orthologous genes. GREAM successfully short-listed a repeat element (MER20) known to contain functional motifs. In other case studies, we could use GREAM to short-list repetitive elements in the azoospermia factor a (AZFa) region of the human Y chromosome and those around the genes associated with rat liver injury.
GREAM could also identify five over-represented repeats around some of the human and mouse transcription factor coding genes that had conserved expression patterns across the two species. GREAM has been developed to provide an impetus to research on the role of repetitive sequences in mammalian genomes by offering easy selection of more interesting repeats in various contexts/regions. GREAM is freely available at http://resource.ibab.ac.in/GREAM/
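
To make "statistically over-/under-represented" concrete, here is a minimal Python sketch using a hypergeometric test. The abstract does not say which statistic GREAM uses, so the choice of test, the function names and the example counts are all assumptions made for illustration.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k hits) when drawing n of N regions, K of which carry the repeat."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def representation_p_values(k, N, K, n):
    """Return (p_over, p_under) for observing k repeat-carrying regions.

    N: all regions considered (e.g. all gene neighborhoods), hypothetical
    K: regions among N that contain the repeat element
    n: regions around the gene set of interest
    k: regions among n that contain the repeat element
    """
    lo, hi = max(0, n - (N - K)), min(K, n)  # feasible range for k
    if not lo <= k <= hi:
        raise ValueError("k is outside the feasible range")
    p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, hi + 1))
    p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(lo, k + 1))
    return p_over, p_under


# Hypothetical counts: 10 regions, 5 carry the repeat; all 5 selected regions carry it.
p_over, p_under = representation_p_values(k=5, N=10, K=5, n=5)
print(f"over-representation p = {p_over:.4f}, under-representation p = {p_under:.4f}")
```

A small `p_over` would flag the repeat as over-represented around the gene set, and a small `p_under` as under-represented. A real analysis over many repeat types would also need multiple-testing correction, which this sketch omits.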

    GREAM-Files

    The file set includes genome-wide statistics of the repeat element distribution, the HomoloGene database and Perl scripts for analysis.