Search CORE

131 research outputs found

ESG: Extended Similarity Group method for automated protein function prediction

Author: Changsoon Park
Daisuke Kihara
Meghana Chitale
Troy Hawkins
Publication date: 15/08/2008

We present here the Extended Similarity Group (ESG) method, which annotates query sequences with Gene Ontology (GO) terms by assigning a probability to each annotation, computed from iterative PSI-BLAST searches. Conventional sequence-homology-based function annotation methods, such as BLAST, retrieve function information from top hits with a significant score (E-value). In contrast, the PFP method, which we presented previously, goes one step further in utilizing a PSI-BLAST result by considering very weak hits, even those with an E-value of up to 100, and by incorporating the functional association between GO terms (the FAM matrix) computed using term co-occurrence frequencies in the UniProt database. PFP has been very successful, as evidenced by its top rank in the function prediction category of the CASP7 competition. Our new approach, the ESG method, further improves the accuracy of PFP by essentially employing PFP in an iterative fashion. An advantage of ESG is that it is built on a rigorous statistical framework: unlike the PFP method, which assigns a weighted score to each GO term, ESG assigns a probability based on weights computed using the E-value of each hit sequence on the path between the original query sequence and the current hit sequence.

Crossref

Nature Precedings

The Greenhouse Stakes of Globalization

Author: Sébastien Dente
Troy Hawkins
Publication venue: IntechOpen
Publication date: 14/03/2012

IntechOpen

Crossref

Functional enrichment analyses and construction of functional similarity networks with high confidence function prediction by PFP

Author: Chitale Meghana
Hawkins Troy
Kihara Daisuke
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: A new paradigm of biological investigation takes advantage of technologies that produce large high-throughput datasets, including genome sequences, interactions of proteins, and gene expression. The ability of biologists to analyze and interpret such data relies on functional annotation of the included proteins, but even in highly characterized organisms many proteins lack the functional evidence necessary to infer their biological relevance. Results: Here we have applied high-confidence function predictions from our automated prediction system, PFP, to three genome sequences: Escherichia coli, Saccharomyces cerevisiae, and Plasmodium falciparum (malaria). The number of annotated genes is increased by PFP to over 90% for all of the genomes. Using the large coverage of the function annotation, we introduced functional similarity networks, which represent the functional space of the proteomes. Four different functional similarity networks are constructed for each proteome: one each by considering similarity in a single Gene Ontology (GO) category, i.e. Biological Process, Cellular Component, and Molecular Function, and another by considering overall similarity with the funSim score. The functional similarity networks are shown to have higher modularity than the protein-protein interaction network. Moreover, the funSim score network is distinct from the single GO-score networks in showing a higher clustering degree exponent value, and thus has a higher tendency to be hierarchical. In addition, examining function assignments to the protein-protein interaction network and local regions of genomes has identified numerous cases where subnetworks or local regions have functionally coherent proteins. These results will help in interpreting interactions of proteins and gene orders in a genome. Several examples of both analyses are highlighted. Conclusion: The analyses demonstrate that applying high-confidence predictions from PFP can have a significant impact on researchers' ability to interpret the immense biological data that are being generated today. The newly introduced functional similarity networks of the three organisms show different network properties as compared with the protein-protein interaction networks.

Crossref

IUPUIScholarWorks

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Purdue E-Pubs

Bioinformatics resources for cancer research with an emphasis on gene function and structure prediction tools

Author: Hawkins Troy
Kihara Daisuke
Yang Yifeng David
Publication venue: Libertas Academica
Publication date: 01/01/2006

The immensely popular fields of cancer research and bioinformatics overlap in many different areas, e.g. large data repositories that allow users to analyze data from many experiments (data handling, databases), pattern mining, microarray data analysis, and interpretation of proteomics data. There are many newly available resources in these areas that may be unfamiliar to most cancer researchers wanting to incorporate bioinformatics tools and analyses into their work, and also to bioinformaticians looking for real data to develop and test algorithms. This review reveals the interdependence of cancer research and bioinformatics, and highlights the most appropriate and useful resources available to cancer researchers. These include not only public databases, but also general and specific bioinformatics tools which can be useful to the cancer researcher. The primary foci are function and structure prediction tools of protein genes. The result is a useful reference for cancer researchers and bioinformaticians studying cancer alike.

Directory of Open Access Journals

PubMed Central

Development and Evaluation of Quality Metrics for Bioinformatics Analysis of Viral Insertion Site Data Generated Using High Throughput Sequencing

Author: Chen Yu-Hsiang
Cornetta Kenneth
Dinauer Mary
Gao Hongyu
Hawkins Troy
Jasti Aparna
Mockaitis Keithanne
Publication venue: MDPI AG
Publication date: 01/01/2014

Integration of viral vectors into a host genome is associated with insertional mutagenesis, and subjects in clinical gene therapy trials must be monitored for this adverse event. Several PCR-based methods, such as ligase-mediated (LM) PCR, linear-amplification-mediated (LAM) PCR and non-restrictive (nr) LAM PCR, were developed to identify sites of vector integration. Coupling the power of next-generation sequencing technologies with various PCR approaches will provide comprehensive, genome-wide profiling of insertion sites and increase throughput. In this bioinformatics study, we aimed to develop and apply quality metrics to viral insertion data obtained using next-generation sequencing. We developed five simple metrics for assessing next-generation sequencing data from different PCR products and showed how the metrics can be used to objectively compare runs performed with the same methodology as well as data generated using different PCR techniques. The results will help researchers troubleshoot complex methodologies, understand the quality of sequencing data, and provide a starting point for developing standardization of vector insertion site data analysis.

Multidisciplinary Digital Publishing Institute

CiteSeerX

Crossref

IUPUIScholarWorks

Directory of Open Access Journals

The Assessment of Fecal Volatile Organic Compounds in Healthy Infants: Electronic Nose Device Predicts Patient Demographics and Microbial Enterotype

Author: Baxter Nielson T.
Hawkins Troy B.
Hosfield Brian D.
Markel Troy A.
Pecoraro Anthony R.
Publication venue: Elsevier BV
Publication date: 01/10/2020

Background: The assessment of fecal volatile organic compounds (VOCs) has emerged as a noninvasive biomarker in many different pathologies. Before assessing whether VOCs can be used to diagnose intestinal diseases, including necrotizing enterocolitis (NEC), it is necessary to measure the impact of variable infant demographic factors on VOC signals. Materials and methods: Stool samples were collected from term infants at four hospitals in a large metropolitan area. Samples were heated, and fecal VOCs were assessed by the Cyranose 320 Electronic Nose. Twenty-eight sensors were combined into an overall smellprint and were also assessed individually. 16S rRNA gene sequencing was used to categorize infant microbiomes. Smellprints were correlated to feeding type (formula versus breastmilk), sex, hospital of birth, and microbial enterotype. Overall smellprints were assessed by PERMANOVA with Euclidean distances, and individual sensors from each smellprint were assessed by Mann-Whitney U-tests. P < 0.05 was considered significant. Results: Overall smellprints were significantly different according to diet. Individual sensors were significantly different according to sex and hospital of birth, but overall smellprints were not significantly different. Using a decision tree model, two individual sensors could reliably predict microbial enterotype. Conclusions: Assessment of fecal VOCs with an electronic nose is impacted by several demographic characteristics of infants and can be used to predict microbiome composition. Further studies are needed to design appropriate algorithms that are able to predict NEC based on fecal VOC profiles.

IUPUIScholarWorks

PubMed Central

ESG: Extended Similarity Group method for automated protein function prediction

Author: Changsoon Park
Daisuke Kihara
Meghana Chitale
Troy Hawkins
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Crossref

Wicked but worth it: student perspectives on socio-hydrology

Author: Bach
Barabási
Blair
Cai
Carey
Churchman
Conklin
Duflo
Elshafei
Garcia
Gober
Grames
Hawkins
Kumar
Lane
Liu
Liu
Marston
Milly
Montanari
Montanari
Ostrom
Ostrom
Ostrom
Pande
Pataki
Poteete
Rajaram
Rittel
Savenije
Schlüter
Sivakumar
Sivapalan
Thompson
Troy
Viglione
Vogel
Publication venue: Wiley
Publication date: 01/01/2016

harvest, Policy Analysis, Multi Actor System

Crossref

TU Delft Repository

Spiral - Imperial College Digital Repository

Mutation in erythroid specific transcription factor KLF1 causes Hereditary Spherocytosis in the Nan hemolytic anemia mouse model

Author: Cornetta Kenneth
Fegley Barbara
Gibson Margaret I.
Hawkins Troy
Heruth Daniel P.
Lewing Karen B.
Logsdon Derek P.
Major Stephanie L.
Neville Kathleen A.
Nsumu Ndona N.
Peterson Kenneth R.
Sokolovsky Inna V.
White Robert A.
Woods Gerald M.
Publication venue: Elsevier BV
Publication date: 01/11/2010

KLF1 regulates definitive erythropoiesis of red blood cells by facilitating transcription through high-affinity binding to CACCC elements within its erythroid-specific target genes, including those encoding erythrocyte membrane skeleton (EMS) proteins. Deficiencies of EMS proteins in humans lead to the hemolytic anemia Hereditary Spherocytosis (HS), which includes a subpopulation with no known genetic defect. Here we report that a mutation, E339D, in the second zinc finger domain of KLF1 is responsible for HS in the mouse model Nan. The causative nature of this mutation was verified with an allelic test cross between Nan/+ and heterozygous Klf1(+/-) knockout mice. Homology modeling predicted that Nan KLF1 binds CACCC elements more tightly, suggesting that Nan KLF1 is a competitive inhibitor of wild-type KLF1. This is the first association of a KLF1 mutation with a disease state in adult mammals, and it raises the possibility that KLF1 is another causative gene for HS in humans.

IUPUIScholarWorks

The genetic consequences of dog breed formation: Accumulation of deleterious genetic variation and fixation of mutations associated with myxomatous mitral valve disease in cavalier King Charles spaniels

Author: Axelsson Erik
Bhoumik Priyasma
Conn Laura Bas
Del Rio-Espinola Alberto
Engdahl Karolina
Epe Christian
Grenet Olivier
Gruet Philippe
Hagman Ragnvi
Hanson Jeanette
Hawkins Troy
Hedhammar Åke
Häggström Jens
Kryvokhyzha Dmytro
Lindblad-Toh Kerstin
Ljungvall Ingrid
Mane Shrinivas
Moggs Jonathan
Muren Eva
Ohlsson Åsa
Olsen Lisbeth Hoier
Pettersson Mats
Taillon Bruce
Tawari Nilesh
Publication date: 01/01/2021

Selective breeding for desirable traits in strictly controlled populations has generated an extraordinary diversity in canine morphology and behaviour, but has also led to loss of genetic variation and random entrapment of disease alleles. As a consequence, specific diseases are now prevalent in certain breeds, but whether recent breeding practice has led to an overall increase in genetic load remains unclear. Here we generate whole genome sequencing (WGS) data from 20 dogs per breed from eight breeds and document a ~10% rise in the number of derived alleles per genome at evolutionarily conserved sites in the heavily bottlenecked cavalier King Charles spaniel breed (cKCs) relative to most breeds studied here. Our finding represents the first clear indication of a relative increase in levels of deleterious genetic variation in a specific breed, arguing that recent breeding practices probably were associated with an accumulation of genetic load in dogs. We then use the WGS data to identify candidate risk alleles for the most common cause for veterinary care in cKCs: the heart disease myxomatous mitral valve disease (MMVD). We verify a potential link to MMVD for candidate variants near the heart-specific NEBL gene in a dachshund population and show that two of the NEBL candidate variants have regulatory potential in heart-derived cell lines and are associated with reduced NEBL isoform nebulette expression in papillary muscle (but not in mitral valve, nor in left ventricular wall). Alleles linked to reduced nebulette expression may hence predispose cKCs and other breeds to MMVD via loss of papillary muscle integrity.

Epsilon Open Archive

Repository for Publications and Research Data

Publikationer från Uppsala Universitet

PubMed Central

Copenhagen University Research Information System

The Novartis Repository

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Linnéuniversitetets forskningsdatabas