Accelerated Profile HMM Searches
Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches
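As a rough illustration of the ungapped-segment scoring that MSV builds on, the sketch below finds the best single ungapped local alignment between a sequence and a match-only profile. It is a deliberate simplification: it scores one segment rather than an optimal sum of multiple segments, it is not striped or vectorized, and it is not HMMER3 code. The profile layout, the `MISMATCH` penalty, and the function name are assumptions made for the example.

```python
# Simplified single-segment ungapped scoring (NOT the HMMER3 MSV implementation).
# Assumption: a profile is a list of columns, each a dict residue -> score.

MISMATCH = -1.0  # hypothetical score for residues a column does not list


def best_ungapped_segment(profile, seq):
    """Return the highest score of any ungapped local alignment segment."""
    best = 0.0
    # prev[k] = score of the best segment ending at profile column k
    # and at the previous sequence residue (index 0 is a dummy start).
    prev = [0.0] * (len(profile) + 1)
    for residue in seq:
        cur = [0.0] * (len(profile) + 1)
        for k, column in enumerate(profile, start=1):
            score = column.get(residue, MISMATCH)
            # Extend the diagonal if it helps, otherwise start a new segment.
            cur[k] = max(0.0, prev[k - 1]) + score
            best = max(best, cur[k])
        prev = cur
    return best


# Toy profile that prefers the motif A-C-D (hypothetical scores).
toy_profile = [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]
motif_score = best_ungapped_segment(toy_profile, "GGACDGG")
```

Because no gaps are allowed, each cell depends only on its diagonal predecessor, which is the property that makes this kind of recursion a natural fit for the vector-parallel evaluation the abstract mentions.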
The Effect of Rural-to-Urban Migration on Obesity and Diabetes in India: A Cross-Sectional Study
Shah Ebrahim and colleagues examine the distribution of obesity, diabetes, and other cardiovascular risk factors among urban migrant factory workers in India, together with their rural siblings. The investigators identify patterns of change of cardiovascular risk factors associated with urban migration
Reduced Quantitative Ultrasound Bone Mineral Density in HIV-Infected Patients on Antiretroviral Therapy in Senegal
Background: Bone status in HIV-infected patients on antiretroviral treatment (ART) is poorly documented in resource-limited settings. We compared bone mineral density between HIV-infected patients and control subjects from Dakar, Senegal. Methods: A total of 207 (134 women and 73 men) HIV-infected patients from an observational cohort in Dakar (ANRS 1215) and 207 age- and sex-matched controls from the general population were enrolled. Bone mineral density was assessed by quantitative ultrasound (QUS) at the calcaneus, an alternative to the reference method (i.e. dual-energy X-ray absorptiometry), which is often not available in resource-limited countries. Results: Mean age was 47.0 (± 8.5) years. Patients had received ART for a median duration of 8.8 years; 45% received a protease inhibitor and 27% tenofovir; 84% had undetectable viral load. Patients had lower body mass index (BMI) than controls (23 versus 26 kg/m², P<0.001). In unadjusted analysis, QUS bone mineral density was lower in HIV-infected patients than in controls (difference: -0.36 standard deviation, 95% confidence interval (CI): -0.59; -0.12, P = 0.003). Adjusting for BMI, physical activity, smoking and calcium intake attenuated the difference (-0.27, CI: -0.53; -0.002, P = 0.05). Differences in BMI between patients and controls explained a third of the difference in QUS bone mineral density. Among patients, BMI was independently associated with QUS bone mineral density (P<0.001). An association between undetectable viral load and QUS bone density was also suggested (beta = 0.48, CI: 0.02; 0.93; P = 0.04). No association between protease inhibitor or tenofovir use and QUS bone mineral density was found. Conclusion: Senegalese HIV-infected patients had reduced QUS bone mineral density in comparison with control subjects, in part related to their lower BMI. Further investigation is needed to clarify the clinical significance of these observations
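The reported intervals and P values can be cross-checked with a short calculation. The sketch below assumes a normal approximation, under which a 95% CI spans about 1.96 standard errors on each side of the estimate. The abstract does not state which test the authors used, so this is only a consistency check; the function name is hypothetical.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Back out the standard error, z statistic and two-sided P value
    from a point estimate and its 95% CI (normal approximation assumed)."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Unadjusted difference in QUS bone mineral density from the abstract.
se_unadj, z_unadj, p_unadj = p_from_ci(-0.36, -0.59, -0.12)

# Adjusted difference from the abstract.
se_adj, z_adj, p_adj = p_from_ci(-0.27, -0.53, -0.002)
```

Under this approximation the unadjusted P value comes out close to 0.003 and the adjusted one a little under 0.05, in line with the reported figures.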
Ecological Thresholds in the Savanna Landscape: Developing a Protocol for Monitoring the Change in Composition and Utilisation of Large Trees
BACKGROUND: Acquiring greater understanding of the factors causing changes in vegetation structure, particularly those with the potential to cause regime shifts, is important in adaptively managed conservation areas. Large trees (≥5 m in height) play an important ecosystem role, and are associated with a stable ecological state in the African savanna. There is concern that large tree densities are declining in a number of protected areas, including the Kruger National Park, South Africa. In this paper the results of a field study designed to monitor change in a savanna system are presented and discussed. METHODOLOGY/PRINCIPAL FINDINGS: We developed the first phase of a monitoring protocol to measure the change in tree species composition, density and size distribution, whilst also identifying factors driving change. A central issue is the discrete spatial distribution of large trees in the landscape, making point sampling approaches relatively ineffective. Accordingly, fourteen 10 m wide transects were aligned perpendicular to large rivers (3.0-6.6 km in length) and eight transects were located at fixed-point photographic locations (1.0-1.6 km in length). Using accumulation curves, we established that the majority of tree species were sampled within 3 km. Furthermore, the key ecological drivers (e.g. fire, herbivory, drought and disease) which influence large tree use and impact were also recorded within 3 km. CONCLUSIONS/SIGNIFICANCE: The technique presented provides an effective method for monitoring changes in large tree abundance, size distribution and use by the main ecological drivers across the savanna landscape. However, the monitoring of rare tree species would require individual marking approaches due to their low densities and specific habitat requirements. Repeat sampling intervals would vary depending on the factor of concern and proposed management mitigation.
Once a monitoring protocol has been identified and evaluated, the next stage is to integrate that protocol into a decision-making system, which highlights potential leading indicators of change. Frequent monitoring would be required to establish the rate and direction of change. This approach may be useful in generating monitoring protocols for other dynamic systems
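The accumulation-curve step mentioned in the abstract can be sketched as follows: walk along a transect and count how many distinct species have been seen by each distance mark. This is a minimal illustration, not the authors' procedure. The record format, the step size, and the species labels are all hypothetical placeholders.

```python
import math


def accumulation_curve(observations, step_km=1.0):
    """Cumulative number of distinct species seen by each distance mark.

    observations: iterable of (distance_km, species_label) records
    (a hypothetical format chosen for this example).
    """
    records = sorted(observations)
    if not records:
        return []
    n_steps = max(1, math.ceil(records[-1][0] / step_km))
    seen = set()
    curve = []
    i = 0
    for step in range(1, n_steps + 1):
        limit = step * step_km
        # Add every record at or before the current distance mark.
        while i < len(records) and records[i][0] <= limit:
            seen.add(records[i][1])
            i += 1
        curve.append((limit, len(seen)))
    return curve


# Placeholder transect records, not data from the study.
toy_transect = [
    (0.2, "sp_a"), (0.4, "sp_b"), (1.1, "sp_a"),
    (1.8, "sp_c"), (2.9, "sp_d"), (4.5, "sp_d"),
]
toy_curve = accumulation_curve(toy_transect)
```

A curve that flattens, as the toy one does after the 3 km mark, suggests that extending the transect adds few new species, which is the kind of evidence the abstract cites for its 3 km figure.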
Federated learning enables big data for rare cancer boundary detection.
Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability is concerning. This is currently addressed by sharing multi-site data, but such centralization is challenging/infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML, by only sharing numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma by releasing our consensus model, and 3) demonstrate the FL effectiveness at such scale and task-complexity as a paradigm shift for multi-site collaborations, alleviating the need for data-sharing
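To make "only sharing numerical model updates" concrete, here is a minimal sketch of one common aggregation rule, sample-weighted federated averaging. The abstract does not say which aggregation rule the study used, so treating it as weighted averaging is an assumption. The function name and the toy numbers are hypothetical.

```python
def federated_average(site_updates):
    """Combine per-site model weights into one consensus vector.

    site_updates: list of (weights, n_samples) pairs, where weights is a
    list of floats. Only these numbers leave a site, never the raw data.
    Assumption: sites are weighted by their number of training samples.
    """
    total_samples = sum(n for _, n in site_updates)
    dim = len(site_updates[0][0])
    consensus = [0.0] * dim
    for weights, n_samples in site_updates:
        for j, w in enumerate(weights):
            consensus[j] += w * n_samples / total_samples
    return consensus


# Two hypothetical sites with different amounts of local data.
toy_updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
toy_consensus = federated_average(toy_updates)
```

In a full training loop the consensus weights would be sent back to every site for another round of local training, and the cycle would repeat until the model stops improving.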
Author Correction: Federated learning enables big data for rare cancer boundary detection.
10.1038/s41467-023-36188-7. Nature Communications 14
Using naïve Bayesian classification as a meta-predictor to improve start codon prediction accuracy in prokaryotic organisms
Modern gene location prediction techniques are able to achieve near-perfect accuracy for prokaryotic organisms, but this reported accuracy is generally only for the stop codon locations. Accurate prediction of the start codon locations is more difficult to attain, and different approaches often produce conflicting predictions for the same gene. In this paper, we describe a new approach to resolve these conflicts and improve start codon prediction accuracy. Our approach uses a set of gene location prediction results from other popular prediction approaches to find consistently predicted gene locations. It then uses these consistent genes as a training set for a naïve Bayesian classifier to improve accuracy in the ambiguous genes, those in which there are some inconsistencies in the predicted start codon location among the original predictions. The methods detailed here apply to prokaryotic organisms, using E. coli and the EcoGene Verified Set database as a case study
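The train-on-consensus idea can be sketched with a small categorical naïve Bayes classifier: candidate starts from consistently predicted genes serve as labeled training examples, and the fitted model then ranks the competing candidates of an ambiguous gene. The abstract does not list the features used, so the two features here (start codon and presence of an upstream ribosome-binding-site-like motif), the training examples, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """examples: list of (features_tuple, is_true_start) pairs."""
    class_counts = Counter(label for _, label in examples)
    feat_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values = defaultdict(set)           # feature index -> observed values
    for feats, label in examples:
        for i, v in enumerate(feats):
            feat_counts[(label, i)][v] += 1
            values[i].add(v)
    return class_counts, feat_counts, values


def log_score(model, feats, label):
    """Log of prior times Laplace-smoothed likelihoods for one class."""
    class_counts, feat_counts, values = model
    lp = math.log(class_counts[label] / sum(class_counts.values()))
    for i, v in enumerate(feats):
        num = feat_counts[(label, i)][v] + 1
        den = class_counts[label] + len(values[i])
        lp += math.log(num / den)
    return lp


def pick_start(model, candidates):
    """candidates: dict position -> features. Return the position whose
    log-odds of being the true start is highest."""
    return max(
        candidates,
        key=lambda pos: log_score(model, candidates[pos], True)
        - log_score(model, candidates[pos], False),
    )


# Hypothetical training set from consistently predicted genes:
# features are (start codon, has upstream RBS-like motif).
toy_training = [
    (("ATG", True), True), (("ATG", True), True),
    (("GTG", True), True), (("ATG", False), True),
    (("GTG", False), False), (("TTG", False), False),
    (("ATG", False), False), (("TTG", True), False),
]
toy_model = train_nb(toy_training)

# An ambiguous gene with two conflicting predicted start positions.
toy_choice = pick_start(toy_model, {120: ("ATG", True), 150: ("GTG", False)})
```

Ranking by log-odds rather than by the positive-class score alone keeps the comparison fair when candidates differ in how common their feature values are overall.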