    Polytomy identification in microbial phylogenetic reconstruction

    Background: A phylogenetic tree, showing ancestral relations among organisms, is commonly represented as a rooted tree with sets of bifurcating branches (dichotomies) for simplicity, although polytomies (multifurcating branches) may reflect evolutionary relationships more accurately. To represent the true evolutionary relationships, it is important to systematically identify the polytomies in a bifurcating tree and generate a taxonomy-compatible multifurcating tree. For this purpose we propose a novel approach, "PolyPhy", which classifies the bifurcating branches of a phylogenetic tree into dichotomies and polytomies by considering distances among genomes and tree topological properties.
    Results: PolyPhy employs a machine learning technique, a Bayesian logistic regression (BLR) classifier, to identify bifurcating subtrees that are likely polytomies in the trees produced by ComPhy. Besides genome-scale distances between all pairs of species, PolyPhy also takes into account properties of tree topology that differ between dichotomies and polytomies, such as long-branch retraction and short-branch contraction, and quantifies these properties as comparable rates among different sub-branches. We extract three tree topological features, 'LR' (Leaf rate), 'IntraR' (Intra-subset branch rate) and 'InterR' (Inter-subset branch rate), all of which are calculated from bifurcating tree branch sets for classification. We achieved an F-measure (a balanced measure of precision and recall) of 81%, with an area under the ROC curve (AUC) of about 0.9.
    Conclusions: PolyPhy is a fast and robust method to identify polytomies from phylogenetic trees based on genome-wide inference of evolutionary relationships among genomes. The software package and test data can be downloaded from http://digbio.missouri.edu/ComPhy/phyloTreeBiNonBi-1.0.zip.
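The F-measure quoted above is the harmonic mean of precision and recall. A minimal sketch of that calculation follows; the function name and the counts are hypothetical and chosen only for illustration, they are not taken from the PolyPhy evaluation.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Return the F-measure: the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted polytomies that are correct
    recall = tp / (tp + fn)     # fraction of true polytomies that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts, for illustration only.
print(round(f_measure(tp=81, fp=19, fn=19), 2))
```

Because it is a harmonic mean, the score drops sharply when either precision or recall is low, which is why it is described as a balanced measure.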

    De novo Mutations From Whole Exome Sequencing in Neurodevelopmental and Psychiatric Disorders: From Discovery to Application.

    Neurodevelopmental and psychiatric disorders are a highly disabling and heterogeneous group of developmental and mental disorders, resulting from complex interactions of genetic and environmental risk factors. The nature of multifactorial traits and the presence of comorbidity and polygenicity in these disorders present challenges in both disease risk identification and clinical diagnoses. The genetic component has been firmly established, but the identification of all the causative variants remains elusive. The development of next-generation sequencing, especially whole exome sequencing (WES), has greatly enriched our knowledge of the precise genetic alterations of human diseases, including brain-related disorders. In particular, the extensive usage of WES in research studies has uncovered the important contribution of de novo mutations (DNMs) to these disorders. Trio and quad familial WES are a particularly useful approach to discover DNMs. Here, we review the major WES studies in neurodevelopmental and psychiatric disorders and summarize how genes hit by discovered DNMs are shared among different disorders. Next, we discuss different integrative approaches utilized to interrogate DNMs and to identify biological pathways that may disrupt brain development and shed light on our understanding of the genetic architecture underlying these disorders. Lastly, we discuss the current state of the transition from WES research to its routine clinical application. This review will assist researchers and clinicians in the interpretation of variants obtained from WES studies, and highlights the need to develop consensus analytical protocols and validated lists of genes appropriate for clinical laboratory analysis, in order to reach the growing demands

    SeqRate: sequence-based protein folding type classification and rates prediction

    Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding the protein folding process and guiding protein design. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as input, and most do not distinguish the kinetic nature (two-state or multi-state folding) of the proteins. Here we developed a method, SeqRate, that uses support vector machines to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate from sequence length, amino acid composition, contact order, contact number, and secondary structure information, all predicted from the protein sequence alone. We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec^-1) on the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification, and its accuracy of folding rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs. Both the web server and the software for predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html
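The two evaluation metrics reported above can be computed as in the following sketch. It assumes rates are given in sec^-1 and are converted to base-10 logarithms before comparison; the function names and the sample values are hypothetical and are not data from the SeqRate benchmark.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def mean_abs_diff(xs, ys):
    """Mean absolute difference between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def evaluate_rates(predicted, experimental):
    """Compare folding rates on the base-10 logarithmic scale."""
    log_pred = [math.log10(k) for k in predicted]
    log_exp = [math.log10(k) for k in experimental]
    return pearson(log_pred, log_exp), mean_abs_diff(log_pred, log_exp)


# Hypothetical folding rates, for illustration only.
r, mad = evaluate_rates([12.0, 150.0, 900.0], [10.0, 100.0, 1000.0])
print(f"Pearson r = {r:.2f}, mean |log10 difference| = {mad:.2f}")
```

Working on the logarithmic scale keeps a few very fast folders from dominating the error, since folding rates span several orders of magnitude.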

    ComPhy: Prokaryotic Composite Distance Phylogenies Inferred from Whole-Genome Gene Sets

    doi:10.1186/1471-2105-10-S1-S5
    With the increasing availability of whole genome sequences, it is becoming increasingly important to use complete genome sequences for inferring species phylogenies. We developed a new tool, ComPhy ('Composite Distance Phylogeny'), which produces a prokaryotic phylogeny from a composite distance matrix calculated by comparing complete gene sets between genome pairs. The composite distance between two genomes is defined by three components: Gene Dispersion Distance (GDD), Genome Breakpoint Distance (GBD) and Gene Content Distance (GCD). GDD quantifies the dispersion of orthologous genes along the genomic coordinates from one genome to another; GBD measures the shared breakpoints between two genomes; GCD measures the level of shared orthologs between two genomes. The phylogenetic tree is constructed from the composite distance matrix using a neighbor joining method. We tested our method on 9 datasets from 398 completely sequenced prokaryotic genomes. We achieved above 90% agreement in quartet topologies between the tree created by our method and the tree from Bergey's taxonomy. In comparison to several other phylogenetic analysis methods, our method showed consistently better performance. ComPhy is a fast and robust tool for genome-wide inference of evolutionary relationships among genomes. This work was supported in part by NSF/ITR-IIS-0407204.
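The tree-building step mentioned above, neighbor joining, repeatedly picks the pair of taxa that minimizes the standard Q criterion, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The sketch below shows only that pair-selection step on a small made-up distance matrix; it is a generic illustration of neighbor joining, not the ComPhy code, and the function name and matrix are hypothetical.

```python
def nj_pick_pair(dist):
    """Return the index pair (i, j) that minimizes the neighbor-joining Q criterion.

    dist is a symmetric matrix (list of lists) with zeros on the diagonal.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair = None
    best_q = float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_q = q
                best_pair = (i, j)
    return best_pair


# Made-up distances between five taxa, for illustration only.
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(distances))
```

Note that the chosen pair is not simply the closest pair: the Q criterion also rewards pairs that are both far from everything else, which is what lets neighbor joining tolerate unequal evolutionary rates.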

    2-(4-tert-Butylphenyl)-5-{3,4-dibutoxy-5-[5-(4-tert-butylphenyl)-1,3,4-oxadiazol-2-yl]-2-thienyl}-1,3,4-oxadiazole

    In the title compound, C36H44N4O4S, the dihedral angles between the central thiophene ring and the pendent oxadiazole rings are 12.7 (2) and 13.7 (2)°, and the dihedral angles between the oxadiazole rings and their adjacent benzene rings are 6.1 (2) and 17.5 (2)°. An intramolecular C-H⋯O interaction may help to establish the conformation.

    Photonic Localization of Interface Modes at the Boundary between Metal and Fibonacci Quasi-Periodic Structure

    We investigate the interface modes in a heterostructure consisting of a semi-infinite metallic layer and a semi-infinite Fibonacci quasi-periodic structure. Various properties of the interface modes, such as their spatial localization, self-similarity, and multifractal properties, are studied. The interface modes decay exponentially in different ways, and the modes in the lower stable gap possess the highest spatial localization. A localization index is introduced to characterize the localization properties of the interface modes. We find that the localization index of the interface modes in the upper stable gap converges to two slightly different constants according to the parity of the Fibonacci generation. In addition, a localization-delocalization transition is found in the interface modes of the transient gap. Comment: 20 pages, 5 figures
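For readers unfamiliar with the term, a Fibonacci generation refers to one step in building the quasi-periodic layer sequence. The sketch below uses the common substitution rule A -> AB, B -> A, where A and B stand for two layer types; the starting generation and the indexing are assumptions, since the abstract does not state the convention used, and the function name is hypothetical.

```python
def fibonacci_layers(generation: int) -> str:
    """Return the layer sequence after applying A -> AB, B -> A `generation` times.

    Generation 0 is assumed to be the single layer "A".
    """
    word = "A"
    for _ in range(generation):
        word = "".join("AB" if layer == "A" else "A" for layer in word)
    return word


for g in range(5):
    sequence = fibonacci_layers(g)
    print(g, len(sequence), sequence)
```

The sequence lengths follow the Fibonacci numbers, and the sequences never become periodic, which is the sense in which the structure is quasi-periodic.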

    Full-length isoform transcriptome of the developing human brain provides further insights into autism.

    Alternative splicing plays an important role in brain development, but its global contribution to human neurodevelopmental diseases (NDDs) requires further investigation. Here we examine the relationships between splicing isoform expression in the brain and de novo loss-of-function mutations from individuals with NDDs. We analyze the full-length isoform transcriptome of the developing human brain and observe differentially expressed isoforms and isoform co-expression modules undetectable by gene-level analyses. These isoforms are enriched in loss-of-function mutations and microexons, are co-expressed with a unique set of partners, and have higher prenatal expression. We experimentally test the effect of splice-site mutations and demonstrate exon skipping in five NDD risk genes, including SCN2A, DYRK1A, and BTRC. Our results suggest that the splice site mutation in BTRC reduces translational efficiency, likely affecting Wnt signaling through impaired degradation of β-catenin. We propose that functional effects of mutations should be investigated at the isoform- rather than gene-level resolution

    Evaluating China's fossil-fuel CO2 emissions from a comprehensive dataset of nine inventories

    China's fossil-fuel CO2 (FFCO2) emissions accounted for approximately 28% of the global total FFCO2 in 2016. An accurate estimate of China's FFCO2 emissions is a prerequisite for global and regional carbon budget analyses and the monitoring of carbon emission reduction efforts. However, significant uncertainties and discrepancies exist in estimations of China's FFCO2 emissions due to a lack of detailed traceable emission factors (EFs) and multiple statistical data sources. Here, we evaluated China's FFCO2 emissions from nine published global and regional emission datasets. These datasets show that the total emissions increased from 3.4 (3.0–3.7) Gt CO2 yr^-1 in 2000 to 9.8 (9.2–10.4) Gt CO2 yr^-1 in 2016. The variations in these estimates were largely due to the different EFs (0.491–0.746 t C per t of coal) and activity data. The large-scale patterns of gridded emissions showed reasonable agreement, with high emissions concentrated in major city clusters, and the standard deviation mostly ranged from 10% to 40% at the provincial level. However, patterns beyond the provincial scale varied significantly, with the top 5% of the grid level accounting for 50%–90% of total emissions in these datasets. Our findings highlight the significance of using locally measured EFs for Chinese coal. To reduce uncertainty, we recommend using physical CO2 measurements for dataset validation, sharing key input data (e.g., point sources), and performing finer-resolution validations at various levels.
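To see why the emission factor range matters, the sketch below converts a coal mass to CO2 using emissions = activity x EF x (44/12), where 44/12 is the approximate mass ratio of CO2 to carbon. The function name and the coal quantity are hypothetical, and the inventories evaluated in the paper may apply further terms (such as oxidation rates) that are not modeled here.

```python
# Approximate mass of CO2 per unit mass of carbon (molar masses 44 and 12 g/mol).
CO2_PER_C = 44.0 / 12.0


def ffco2_from_coal(coal_tonnes: float, ef_t_c_per_t_coal: float) -> float:
    """Return CO2 emissions in tonnes for a coal mass and a carbon emission factor."""
    return coal_tonnes * ef_t_c_per_t_coal * CO2_PER_C


# Hypothetical activity data: one million tonnes of coal.
coal = 1_000_000.0
low = ffco2_from_coal(coal, 0.491)   # lower EF quoted in the abstract
high = ffco2_from_coal(coal, 0.746)  # upper EF quoted in the abstract
print(f"{low / 1e6:.2f} to {high / 1e6:.2f} Mt CO2")
```

With identical activity data, the two ends of the quoted EF range differ by roughly half again in estimated emissions, which is consistent with the abstract's emphasis on locally measured EFs.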