117 research outputs found
GeoPhy: Differentiable Phylogenetic Inference via Geometric Gradients of Tree Topologies
Phylogenetic inference, grounded in molecular evolution models, is essential
for understanding the evolutionary relationships in biological data. Accounting
for the uncertainty of phylogenetic tree variables, which include tree
topologies and evolutionary distances on branches, is crucial for accurately
inferring species relationships from molecular data and tasks requiring
variable marginalization. Variational Bayesian methods are key to developing
scalable, practical models; however, it remains challenging to conduct
phylogenetic inference without restricting the combinatorially vast number of
possible tree topologies. In this work, we introduce a novel, fully
differentiable formulation of phylogenetic inference that leverages a unique
representation of topological distributions in continuous geometric spaces.
Through practical considerations on design spaces and control variates for
gradient estimations, our approach, GeoPhy, enables variational inference
without limiting the topological candidates. In experiments using real
benchmark datasets, GeoPhy significantly outperformed other approximate
Bayesian methods that considered whole topologies. Comment: 23 pages, 5 figures
Predictions of RNA secondary structure by combining homologous sequence information
Motivation: Secondary structure prediction of RNA sequences is an important problem. Progress has been made in this area, but the accuracy of prediction from a single RNA sequence is still limited. In many cases, however, homologous RNA sequences are available along with the target RNA sequence whose secondary structure is to be predicted.
Improving the accuracy of predicting secondary structure for aligned RNA sequences
Considerable attention has been focused on predicting the secondary structure of aligned RNA sequences, since it is useful not only for improving the limited accuracy of conventional secondary structure prediction but also for finding non-coding RNAs in genomic sequences. Although many algorithms exist for predicting the secondary structure of aligned RNA sequences, further improvement in accuracy is still awaited. In this article, toward improving the accuracy, a theoretical classification of state-of-the-art algorithms for predicting the secondary structure of aligned RNA sequences is presented. The classification is based on the viewpoint of maximum expected accuracy (MEA), which has been successfully applied to various problems in bioinformatics. The classification reveals several disadvantages of the current algorithms, and we propose an improvement of a previously introduced algorithm (CentroidAlifold). Finally, computational experiments strongly support the theoretical classification and indicate that the improved CentroidAlifold substantially outperforms other algorithms.
Prediction of RNA secondary structure by maximizing pseudo-expected accuracy
Background: Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure prediction; accordingly, new types of estimators have been proposed, including the maximum expected accuracy (MEA) estimator. MEA-based estimators are designed to maximize the expected accuracy of the base pairs and have achieved the highest level of accuracy. These methods, however, do not give a single best prediction of the structure, but employ parameters to control the trade-off between sensitivity and positive predictive value (PPV). It is unclear what parameter value should be used, and even a well-trained default parameter value does not, in general, give the best result under popular accuracy measures for each RNA sequence. Results: Instead of the expected values of the popular accuracy measures for RNA secondary structure prediction, which are difficult to calculate, the pseudo-expected accuracy, which can easily be computed from base-pairing probabilities, is introduced. It is shown that the pseudo-expected accuracy is a good approximation in terms of sensitivity, PPV, MCC, or F-score. The pseudo-expected accuracy can be approximately maximized for each RNA sequence by stochastic sampling. It is also shown that secondary structures that are well balanced between sensitivity and PPV can be predicted with a small computational overhead by combining the pseudo-expected accuracy of MCC or F-score with the γ-centroid estimator. Conclusions: This study gives not only a method for predicting a secondary structure that balances sensitivity and PPV, but also a general method for approximately maximizing the (pseudo-)expected accuracy with respect to various evaluation measures, including MCC and F-score.
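The abstract above does not give the formula for the pseudo-expected accuracy, so the following is only a minimal sketch under a stated assumption: that the expected numbers of true/false positives and negatives, computed from base-pairing probabilities, are substituted into the usual F-score and MCC formulas. The function names, the dictionary representation of the probabilities, and the toy numbers are all hypothetical illustrations, not the authors' implementation.

```python
import math

def pseudo_expected_counts(pair_probs, predicted, n):
    """Expected TP, FP, FN, TN for a predicted set of base pairs.

    pair_probs: dict mapping (i, j) with i < j to a base-pairing probability
                (hypothetical input format; unlisted pairs have probability 0).
    predicted:  set of (i, j) pairs in the candidate structure.
    n:          sequence length.
    """
    total_pairs = n * (n - 1) // 2              # all possible (i, j), i < j
    sum_p = sum(pair_probs.values())            # expected number of true pairs
    tp = sum(pair_probs.get(pair, 0.0) for pair in predicted)
    fp = len(predicted) - tp                    # predicted pairs that are wrong
    fn = sum_p - tp                             # true pairs that were missed
    tn = total_pairs - tp - fp - fn
    return tp, fp, fn, tn

def pseudo_f_score(tp, fp, fn, tn):
    # F-score formula with expected counts substituted (assumption).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def pseudo_mcc(tp, fp, fn, tn):
    # MCC formula with expected counts substituted (assumption).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

def best_candidate(pair_probs, candidates, n, score=pseudo_f_score):
    """Pick the candidate structure with the highest pseudo-expected score."""
    return max(
        candidates,
        key=lambda s: score(*pseudo_expected_counts(pair_probs, s, n)),
    )

# Toy example with made-up probabilities for a length-6 sequence.
probs = {(0, 5): 0.9, (1, 4): 0.8, (0, 4): 0.1}
candidates = [{(0, 5)}, {(0, 5), (1, 4)}]
best = best_candidate(probs, candidates, n=6)
print(sorted(best))
```

In this sketch the candidate structures are simply supplied by the caller; in the setting the abstract describes they would come from stochastic sampling or from the γ-centroid estimator at several parameter values.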
Recombinant human FGF-2 for the treatment of early-stage osteonecrosis of the femoral head: TRION, a single-arm, multicenter, Phase II trial
Aim: This study aimed to evaluate the 2-year outcomes from a clinical trial of recombinant human FGF-2 (rhFGF-2) for osteonecrosis of the femoral head (ONFH). Patients & methods: Sixty-four patients with nontraumatic, precollapse and large ONFHs received percutaneous administration of 800 μg rhFGF-2 contained in gelatin hydrogel. With radiological collapse as the end point, we analyzed the joint preservation period of the historical control. Changes in two validated clinical scores, bone regeneration and safety were evaluated. Results: Radiological joint preservation time was significantly longer in the rhFGF-2 group than in the control group. The ONFHs tended to improve to smaller ONFHs. The postoperative clinical scores significantly improved. Thirteen serious adverse events showed recovery. Conclusion: rhFGF-2 treatment increases joint preservation time with clinical efficacy, radiological bone regeneration and safety.
Parameters for accurate genome alignment
Background: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed. Results: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases. Conclusions: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).
- …