
    EasyCluster: a fast and efficient gene-oriented clustering tool for large-scale transcriptome data

    Background: ESTs and full-length cDNAs represent an invaluable source of evidence for inferring reliable gene structures and discovering potential alternative splicing events. In newly sequenced genomes, these tasks may not be practicable owing to the lack of appropriate training sets. However, when expression data are available, they can be used to build EST clusters related to specific genomic transcribed loci. Common strategies recently employed to this end are based on sequence similarity between transcripts and can lead, in specific conditions, to inconsistent and erroneous clustering. In order to improve the cluster building and facilitate all downstream annotation analyses, we developed a simple genome-based methodology to generate gene-oriented clusters of ESTs when a genomic sequence and a pool of related expressed sequences are provided. Our procedure has been implemented in the software EasyCluster and takes into account the spliced nature of ESTs after an ad hoc genomic mapping. Methods: EasyCluster uses the well-known GMAP program to perform a very quick EST-to-genome mapping in addition to the detection of reliable splice sites. Given a genomic sequence and a pool of ESTs/FL-cDNAs, EasyCluster starts by building genomic and EST local databases and running GMAP. Subsequently, it parses the results, creating an initial collection of pseudo-clusters by grouping ESTs according to the overlap of their genomic coordinates on the same strand. In the final step, EasyCluster refines the clustering by again running GMAP on each pseudo-cluster and grouping together ESTs that share at least one splice site. Results: The higher accuracy of EasyCluster with respect to other clustering tools has been verified by means of a manually curated benchmark of human EST clusters. Additional datasets including the Unigene cluster Hs.122986 and ESTs related to the human HOXA gene family have also been used to demonstrate the better clustering capability of EasyCluster over current genome-based web service tools such as ASmodeler and BIPASS. EasyCluster has also been used to provide a first compilation of gene-oriented clusters in the Ricinus communis oilseed plant, for which no Unigene clusters are yet available, as well as an evaluation of alternative splicing in this plant species.
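
The two clustering steps described in the EasyCluster abstract (grouping by same-strand coordinate overlap, then refining by shared splice sites) can be illustrated with a minimal sketch. This is not EasyCluster's code: the `Alignment` record, the function names, and the treatment of unspliced ESTs as singletons are assumptions made for illustration, and the GMAP mapping step is replaced by hand-written exon coordinates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical EST-to-genome alignment record (not GMAP's output format)."""
    est_id: str
    strand: str   # "+" or "-"
    exons: tuple  # sorted (start, end) pairs, inclusive genomic coordinates

    @property
    def start(self):
        return self.exons[0][0]

    @property
    def end(self):
        return self.exons[-1][1]

    @property
    def splice_sites(self):
        # Intron boundaries: every exon end except the last,
        # and every exon start except the first.
        ends = {("end", e) for _, e in self.exons[:-1]}
        starts = {("start", s) for s, _ in self.exons[1:]}
        return ends | starts


def pseudo_clusters(alignments):
    """Group alignments whose genomic spans overlap on the same strand."""
    by_strand = defaultdict(list)
    for aln in alignments:
        by_strand[aln.strand].append(aln)
    clusters = []
    for strand_alns in by_strand.values():
        strand_alns.sort(key=lambda a: a.start)
        current, current_end = [], None
        for aln in strand_alns:
            if current and aln.start <= current_end:
                current.append(aln)
                current_end = max(current_end, aln.end)
            else:
                if current:
                    clusters.append(current)
                current, current_end = [aln], aln.end
        if current:
            clusters.append(current)
    return clusters


def refine(cluster):
    """Split a pseudo-cluster into groups of ESTs linked by shared splice sites."""
    parent = {a.est_id: a.est_id for a in cluster}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    first_seen = {}
    for aln in cluster:
        for site in aln.splice_sites:
            if site in first_seen:
                parent[find(aln.est_id)] = find(first_seen[site])
            else:
                first_seen[site] = aln.est_id
    groups = defaultdict(set)
    for aln in cluster:
        groups[find(aln.est_id)].add(aln.est_id)
    # Assumption: ESTs with no shared splice site remain singleton groups.
    return sorted(groups.values(), key=min)


alignments = [
    Alignment("est1", "+", ((100, 200), (300, 400))),
    Alignment("est2", "+", ((150, 200), (300, 450))),
    Alignment("est3", "+", ((380, 420), (500, 600))),
    Alignment("est4", "-", ((100, 200), (300, 400))),
]
clusters = pseudo_clusters(alignments)
refined = [sorted(g) for g in refine(clusters[0])]
```

In this toy input, est1, est2 and est3 overlap on the plus strand and form one pseudo-cluster, but only est1 and est2 share intron boundaries, so the refinement step separates est3. The union-find structure is one convenient way to express "shares at least one splice site" transitively; the abstract does not say how the original tool implements this.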

    Spectroscopic and biochemical correlations during the course of human lens aging

    BACKGROUND: With age, the human lens accumulates a variety of substances that absorb and fluoresce, which explains the color of yellow, brunescent and nigrescent cataract in terms of aging. The aim of this study was to assess lens fluorophores with properties comparable to those of advanced glycation end products (AGEs) in relation to age in human lenses. These fluorescent compounds are believed to be involved in the development of cataract. METHODS: Spectroscopic (UV-Vis-NIR) and fluorescence photography (CCD-based digital image analysis) studies were carried out in randomly selected intact human lenses (2–85 years). AGE-like fluorophores were also measured in water-soluble and insoluble (alkali-soluble) fractions of human lenses (20–80 years). RESULTS: Our experimental findings suggest a progressive shift in the absorbance characteristics of the intact lens in the range 210–470 nm. A relative increase in absorptivity at 511–520 nm with age was also observed. In addition, the ratio of absorptivity at 511–520 nm to the maximum absorbance recorded at the blue-end cut-off (210–470 nm) was also found to increase with age. The fluorescence intensity of the intact lens under both UV-B (λex = 312 nm) and UV-A (λex = 365 nm) excitation was positively correlated with age (r² = 0.91 and 0.94, respectively; 95% confidence interval) up to 50 years of age. In addition, concomitant changes in AGE-like fluorophores were observed with age in the processed lens samples (soluble and insoluble fractions). A significant increase in the concentration of AGE-like fluorophores, in both intact and processed lenses, was observed during the period of 40–50 years. CONCLUSION: Based on the present investigation, it was concluded that significant changes do occur in the AGE-like fluorophores of human lenses during the period of 40–50 years.

    Quail Genomics: a knowledgebase for Northern bobwhite

    Background: The Quail Genomics knowledgebase (http://www.quailgenomics.info) has been initiated to share and develop functional genomic data for Northern bobwhite (Colinus virginianus). This web-based platform has been designed to allow researchers to perform analysis and curate genomic information for this non-model species that has little supporting information in GenBank. Description: A multi-tissue, normalized cDNA library generated for Northern bobwhite was sequenced using 454 Life Sciences next-generation sequencing. The Quail Genomics knowledgebase represents the 478,142 raw ESTs generated from the sequencing effort in addition to assembled nucleotide and protein sequences including 21,980 unigenes annotated with meta-data. A normalized MySQL relational database was established to provide comprehensive search parameters where meta-data can be retrieved using functional and structural annotation such as gene name, pathways and protein domain. Additionally, BLAST hit cutoff levels and microarray expression data are available for batch searches. A Gene Ontology (GO) browser from AmiGO is locally hosted, providing 8,825 unigenes that are putative orthologs to chicken genes. In an effort to address the overabundance of Northern bobwhite unigenes (71,384) caused by non-overlapping contigs and singletons, we have built a pipeline that generates scaffolds/supercontigs by aligning partial sequence fragments against the indexed protein database of chicken to build longer sequences that can be visualized in a web browser. Conclusion: Our effort provides a central repository for storage and a platform for functional interrogation of the Northern bobwhite sequences, providing comprehensive GO annotations, meta-data and a scaffold-building pipeline. The Quail Genomics knowledgebase will be integrated with Japanese quail (Coturnix coturnix) data in future builds and incorporate a broader platform for these avian species.

    Bioinformatics research in the Asia Pacific: a 2007 update

    We provide a 2007 update on bioinformatics research in the Asia-Pacific from the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation, set up in 1998. Since 2002, APBioNet has organized the International Conference on Bioinformatics (InCoB), bringing together scientists working in the field of bioinformatics in the region. This year, the InCoB2007 Conference was organized as the 6th annual conference of the Asia-Pacific Bioinformatics Network, on Aug. 27–30, 2007 in Hong Kong, following a series of successful events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea) and New Delhi (India). Besides the scientific meeting in Hong Kong, the satellite events organized are a pre-conference training workshop in Hanoi, Vietnam and a post-conference workshop in Nansha, China. This Introduction provides a brief overview of the peer-reviewed manuscripts accepted for publication in this Supplement. We have organized the papers into thematic areas, highlighting the growing contribution of research excellence from this region to global bioinformatics endeavours.

    The Transcriptome Analysis of Strongyloides stercoralis L3i Larvae Reveals Targets for Intervention in a Neglected Disease

    Background: Strongyloidiasis is one of the most neglected diseases distributed worldwide with endemic areas in developed countries, where chronic infections are life threatening. Despite its impact, very little is known about the molecular biology of the parasite involved and its interplay with its hosts. Next-generation sequencing technologies now provide unique opportunities to rapidly address these questions. Principal Findings: Here we present the first transcriptome of the third larval stage of S. stercoralis using 454 sequencing coupled with semi-automated bioinformatic analyses. 253,266 raw sequence reads were assembled into 11,250 contiguous sequences, most of which were novel. 8037 putative proteins were characterized based on homology, gene ontology and/or biochemical pathways. Comparison of the transcriptome of S. stercoralis with those of other nematodes, including S. ratti, revealed similarities in transcription of molecules inferred to have key roles in parasite-host interactions. Enzymatic proteins, like kinases and proteases, were abundant. 1213 putative excretory/secretory proteins were compiled using a new pipeline which included non-classical secretory proteins. Potential drug targets were also identified. Conclusions: Overall, the present dataset should provide a solid foundation for future fundamental genomic, proteomic and metabolomic explorations of S. stercoralis, as well as a basis for applied outcomes, such as the development of novel methods of intervention against this neglected parasite.

    Validation of an NSP-based (negative selection pattern) gene family identification strategy

    Background: Gene family identification from ESTs can be a valuable resource for analysis of genome evolution but presents unique challenges in organisms for which the entire genome is not yet sequenced. We have developed a novel gene family identification method based on negative selection patterns (NSP) between family members to screen EST-generated contigs. This strategy was tested on five known gene families in Arabidopsis to see if individual paralogs could be identified with accuracy from EST data alone when compared to the actual gene sequences in this fully sequenced genome. Results: The NSP method uniquely identified family members in all the gene families tested. Two members of the FtsH gene family, three members each of the PAL, RF1, and ribosomal L6 gene families, and four members of the CAD gene family were correctly identified. Additionally, all ESTs from the representative contigs, when checked against MapViewer data, successfully identified the predicted gene locus. Conclusion: We demonstrate the effectiveness of the NSP strategy in identifying specific gene family members in Arabidopsis using only EST data, and we describe how this strategy can be used to identify many gene families in agronomically important crop species where they are as yet undiscovered.

    Methods to study splicing from high-throughput RNA Sequencing data

    The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections added

    Major prospects for exploring canine vector borne diseases and novel intervention methods using 'omic technologies

    Canine vector-borne diseases (CVBDs) are of major socioeconomic importance worldwide. Although many studies have provided insights into CVBDs, there has been limited exploration of fundamental molecular aspects of most pathogens, their vectors, pathogen-host relationships, and disease and drug resistance using advanced 'omic technologies. The aim of the present article is to take a prospective view of the impact that next-generation 'omic technologies could have, with an emphasis on describing the principles of transcriptomic/genomic sequencing as well as bioinformatic technologies and their implications in both fundamental and applied areas of CVBD research. Tackling key biological questions employing these technologies will provide a 'systems biology' context and could lead to radically new intervention and management strategies against CVBDs.

    Alu Exonization Events Reveal Features Required for Precise Recognition of Exons by the Splicing Machinery

    Despite decades of research, the question of how the mRNA splicing machinery precisely identifies short exonic islands within the vast intronic oceans remains to a large extent obscure. In this study, we analyzed Alu exonization events, aiming to understand the requirements for correct selection of exons. Comparison of exonizing Alus to their non-exonizing counterparts is informative because Alus in these two groups have retained high sequence similarity but are perceived differently by the splicing machinery. We identified and characterized numerous features used by the splicing machinery to discriminate between Alu exons and their non-exonizing counterparts. Of these, the most novel is secondary structure: Alu exons in general and their 5′ splice sites (5′ss) in particular are characterized by decreased stability of local secondary structures with respect to their non-exonizing counterparts. We detected numerous further differences between Alu exons and their non-exonizing counterparts, among others in terms of exon–intron architecture and strength of splicing signals, enhancers, and silencers. Support vector machine analysis revealed that these features allow a high level of discrimination (AUC = 0.91) between exonizing and non-exonizing Alus. Moreover, the computationally derived probabilities of exonization significantly correlated with the biological inclusion level of the Alu exons, and the model could also be extended to general datasets of constitutive and alternative exons. This indicates that the features detected and explored in this study provide the basis not only for precise exon selection but also for the fine-tuned regulation thereof, manifested in cases of alternative splicing