93 research outputs found

    Open Biomedical Ontology-based Medline exploration

    Abstract. Background: Effective Medline database exploration is critical for the understanding of high throughput experimental results and the development of novel hypotheses about the mechanisms underlying the targeted biological processes. While existing solutions enhance Medline exploration through different approaches such as document clustering, network presentations of underlying conceptual relationships and the mapping of search results to MeSH and Gene Ontology trees, we believe the use of multiple ontologies from the Open Biomedical Ontology can greatly help researchers to explore literature from different perspectives as well as to quickly locate the most relevant Medline records for further investigation. Results: We developed an ontology-based interactive Medline exploration solution called PubOnto to enable the interactive exploration and filtering of search results through the use of multiple ontologies from the OBO Foundry. The PubOnto program is a rich internet application based on the FLEX platform. It contains a number of interactive tools, visualization capabilities, an open service architecture, and a customizable user interface. It is freely accessible at: http://brainarray.mbni.med.umich.edu/brainarray/prototype/pubonto
    http://deepblue.lib.umich.edu/bitstream/2027.42/112693/1/12859_2009_Article_3295.pd

    NGSQC: cross-platform quality analysis pipeline for deep sequencing data

    Abstract. Background: While the accuracy and precision of deep sequencing data are significantly better than those obtained by the earlier generation of hybridization-based high throughput technologies, the digital nature of deep sequencing output often leads to unwarranted confidence in their reliability. Results: The NGSQC (Next Generation Sequencing Quality Control) pipeline provides a set of novel quality control measures for quickly detecting a wide variety of quality issues in deep sequencing data derived from two dimensional surfaces, regardless of the assay technology used. It also enables researchers to determine whether sequencing data related to their most interesting biological discoveries are caused by sequencing quality issues. Conclusions: Next generation sequencing platforms have their own share of quality issues and there can be significant lab-to-lab, batch-to-batch and even within chip/slide variations. NGSQC can help to ensure that biological conclusions, in particular those based on relatively rare sequence alterations, are not caused by low quality sequencing.
    http://deepblue.lib.umich.edu/bitstream/2027.42/112794/1/12864_2010_Article_3466.pd

    Expansion of a novel endogenous retrovirus throughout the pericentromeres of modern humans

    Abstract. Background: Approximately 8% of the human genome consists of sequences of retroviral origin, a result of ancestral infections of the germ line over millions of years of evolution. The most recent of these infections is attributed to members of the human endogenous retrovirus type-K (HERV-K) (HML-2) family. We recently reported that a previously undetected, large group of HERV-K (HML-2) proviruses, which are descendants of the ancestral K111 infection, are spread throughout human centromeres. Results: Studying the genomes of certain cell lines and the DNA of healthy individuals that seemingly lack K111, we discover new HERV-K (HML-2) members hidden in pericentromeres of several human chromosomes. All are related through a common ancestor, termed K222, which is a virus that infected the germ line approximately 25 million years ago. K222 exists as a single copy in the genomes of baboons and high order primates, but not New World monkeys, suggesting that progenitor K222 infected the primate germ line after the split between New and Old World monkeys. K222 exists in modern humans at multiple loci spread across the pericentromeres of nine chromosomes, indicating it was amplified during the evolution of modern humans. Conclusions: Copying of K222 may have occurred through recombination of the pericentromeres of different chromosomes during human evolution. Evidence of recombination between K111 and K222 suggests that these retroviral sequences have been templates for frequent cross-over events during the process of centromere recombination in humans.
    http://deepblue.lib.umich.edu/bitstream/2027.42/111301/1/13059_2015_Article_641.pd

    Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data

    Genome-wide expression profiling is a powerful tool for implicating novel gene ensembles in cellular mechanisms of health and disease. The most popular platform for genome-wide expression profiling is the Affymetrix GeneChip. However, its selection of probes relied on earlier genome and transcriptome annotation, which is significantly different from current knowledge. The resultant informatics problems have a profound impact on analysis and interpretation of the data. Here, we address these critical issues and offer a solution. We identified several classes of problems at the individual probe level in the existing annotation, under the assumption that current genome and transcriptome databases are more accurate than those used for GeneChip design. We then reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information. Comparing analysis results between the original and the redefined probe sets reveals ∼30–50% discrepancy in the genes previously identified as differentially expressed, regardless of analysis method. Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions.

    BIOMEDICAL INFORMATICS

    High throughput sequencing is without a doubt one of the most influential technological advances in biomedical research. Analyzing the large volume of sequence data generated by high throughput sequencing is a major challenge. Current solutions may miss up to 30% of the unique matches and are not suitable for longer high throughput sequences. In response, we developed AQUESA (Aligning Q-grams Using Enhanced Suffix Arrays), a sequence alignment algorithm designed for aligning high throughput sequencing results under the assumptions of variable length sequences, total recall and multiple single base mismatches, single base insertions and single base deletions. We unite the “sort and search” approach of enhanced suffix arrays with q-grams and extend perfect seed matches to a specified mismatch tolerance. Between the suffix array construction and alignment, valid q-grams in the reference and query are indexed so that q-grams in the reference overlap and q-grams in the query do not. These inclusion vectors permit the re-use of suffix arrays and can themselves be re-used for future alignments. Our method retains the advantages of suffix arrays, like favorable memory requirements and time complexities (i.e., linear or near-linear), at the expense of disk space for indices. We are currently comparing our implementation in terms of recall, time and machine resources t
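    The seed-and-extend idea in this abstract can be illustrated with a toy sketch. This is not the AQUESA implementation: it assumes a plain (not enhanced) suffix array built naively, handles substitutions only (no insertions or deletions), and omits the inclusion vectors. The function names `build_suffix_array`, `find_seed` and `align` are hypothetical.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_seed(text, sa, seed):
    # Binary search for the block of suffixes whose first len(seed) characters equal seed.
    q = len(seed)
    lo, hi = 0, len(sa)
    while lo < hi:  # leftmost suffix with prefix >= seed
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + q] < seed:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # leftmost suffix with prefix > seed
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + q] <= seed:
            lo = mid + 1
        else:
            hi = mid
    return [sa[k] for k in range(first, lo)]


def align(reference, query, q=4, max_mismatches=1):
    # Seeds are non-overlapping q-grams of the query; every reference position
    # is searchable through the suffix array (overlapping reference q-grams).
    sa = build_suffix_array(reference)
    hits = set()
    for offset in range(0, len(query) - q + 1, q):
        seed = query[offset:offset + q]
        for pos in find_seed(reference, sa, seed):
            start = pos - offset
            if start < 0 or start + len(query) > len(reference):
                continue  # the query would hang off the reference
            window = reference[start:start + len(query)]
            mismatches = sum(a != b for a, b in zip(window, query))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


if __name__ == "__main__":
    print(align("GATTACAGATTTCAGGCTA", "CAGATTAC", q=4, max_mismatches=1))
```

    In this sketch, a query that contains at least k+1 non-overlapping q-grams and at most k substitutions must have one q-gram that matches exactly (pigeonhole principle), which is why extending exact seeds can recover every substitution-only alignment within the tolerance.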

    Complete genome sequencing and evolutionary analysis of HCV subtype 6xg from IDUs in Yunnan, China.

    Background: HCV genotype 6 (HCV-6) typically circulates in Southeast Asia and exhibits the highest genetic diversity among the eight HCV genotypes. In our previous work, a group of HCV-6 sequences was not clearly classified. Here, we further characterized this HCV-6 variant and analyzed the evolutionary history of the enlarged HCV-6 family. Methods: Eight HCV-seropositive blood samples were collected from intravenous drug users (IDUs) in 2014 in Yunnan Province, China. The full-length HCV genome sequences were amplified by reverse transcription PCR, followed by DNA sequencing and phylogenetic analysis. Bayesian evolutionary analysis was performed with the complete coding region sequences of subtypes 6a-6xh. Results: The eight genomes had the same coding region of 9051 nucleotides. The complete coding region sequences of the eight HCV isolates formed a phylogenetic group distinct from the previously assigned HCV-6 subtypes (6a-6xf), but clustered with 6xg reference sequences that were found in Kachin State, Myanmar, and were recently assigned and released. The p-distances of the eight isolates to subtypes 6a-6xf and 6xh ranged from 0.143 to 0.283. Based on the HCV-6 complete coding region sequences, we constructed a timescaled phylogenetic tree to reveal the HCV-6 evolutionary history, in which there were four HCV-6 phylogenetic subsets, whose median tMRCAs were 294.8, 388.5, 348.5 and 197.0 years ago, respectively. Subtype 6xg clustered into Subset I and shared its most recent common ancestor with subtype 6n, which dated back to 101.2 (95% HPD: 78.7, 125.8) years ago. The genetic evolutionary analysis further confirmed that subtype 6xg originated in Myanmar and was transmitted to Dehong through cross-border IDUs. Conclusion: The HCV-6 variant characterized in this study belonged to the newly assigned subtype 6xg. Our finding further confirmed the assignment of 6xg. The HCV-6 family was highly diverse and had a complicated evolutionary history in Southeast Asia. It is necessary to further characterize HCV-6 genetics in this region.
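    For readers unfamiliar with the p-distance reported in this abstract, it is the proportion of compared sites at which two aligned sequences differ. A minimal sketch follows, assuming pre-aligned sequences of equal length and pairwise deletion of gap positions; the function name `p_distance` and the gap handling are illustrative assumptions, not the study's actual pipeline.

```python
def p_distance(seq_a, seq_b, gap="-"):
    # Proportion of differing sites between two aligned sequences.
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: sites where either sequence has a gap are skipped.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(a != b for a, b in pairs)
    return differences / len(pairs)


if __name__ == "__main__":
    # Two of ten sites differ, so the p-distance is 0.2.
    print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))
```

    On this scale, the reported range of 0.143 to 0.283 corresponds to roughly 14% to 28% of compared sites differing between the new isolates and the other subtypes.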