    SkinGPT: A Dermatology Diagnostic System with Vision Large Language Model

    Skin and subcutaneous diseases are among the major causes of the nonfatal disease burden worldwide, affecting a significant proportion of the population. However, there are three major challenges in the field of dermatology diagnosis. First, there is a shortage of dermatologists available to diagnose patients. Second, accurately diagnosing dermatological pictures can be challenging. Third, providing user-friendly diagnostic reports can be difficult. Recent advancements in the field of large language models (LLMs) have shown potential for clinical applications. However, current LLMs have difficulty processing images, and there are potential privacy concerns associated with using ChatGPT's API for uploading data. In this paper, we propose SkinGPT, which is the first dermatology diagnostic system that utilizes an advanced vision-based large language model. SkinGPT incorporates a version of MiniGPT-4 fine-tuned on a vast collection of in-house skin disease images accompanied by doctors' notes. With SkinGPT, users can upload their own skin photos for diagnosis, and the system can autonomously determine the characteristics and categories of skin conditions, perform analysis, and provide treatment recommendations. The ability to deploy it locally and protect user privacy makes SkinGPT an attractive option for patients seeking an accurate and reliable diagnosis of their skin conditions.

    An Interpretable Computer-Aided Diagnosis Method for Periodontitis From Panoramic Radiographs

    Periodontitis is a prevalent and irreversible chronic inflammatory disease in both developed and developing countries, affecting about 20–50% of the global population. A tool for automatically diagnosing periodontitis is in high demand to screen at-risk people, and early detection could prevent the onset of tooth loss, especially in local communities and health care settings with limited dental professionals. In the medical field, doctors need to understand and trust the decisions made by computational models, so developing interpretable models is crucial for disease diagnosis. Based on these considerations, we propose an interpretable method called Deetal-Perio to predict the severity degree of periodontitis in dental panoramic radiographs. In our method, alveolar bone loss (ABL), the clinical hallmark for periodontitis diagnosis, is interpreted as the key feature. To calculate ABL, we also propose a method for teeth numbering and segmentation. First, Deetal-Perio segments and indexes each individual tooth via Mask R-CNN combined with a novel calibration method. Next, Deetal-Perio segments the contour of the alveolar bone and calculates a ratio for each individual tooth to represent ABL. Finally, Deetal-Perio predicts the severity degree of periodontitis given the ratios of all the teeth. The Macro F1-score and accuracy of the periodontitis prediction task in our method reach 0.894 and 0.896, respectively, on the Suzhou data set, and 0.820 and 0.824, respectively, on the Zhongshan data set. The entire architecture not only outperforms state-of-the-art methods and shows robustness on two data sets in both the periodontitis prediction and the teeth numbering and segmentation tasks, but is also interpretable, so that doctors can understand why Deetal-Perio works so well.

    Optimization of binding affinities in chemical space with generative pre-trained transformer and deep reinforcement learning [version 2; peer review: 2 approved, 3 approved with reservations]

    Background: The key challenge in drug discovery is to discover novel compounds with desirable properties. Among the properties, binding affinity to a target is one of the prerequisites and usually evaluated by molecular docking or quantitative structure activity relationship (QSAR) models. Methods: In this study, we developed SGPT-RL, which uses a generative pre-trained transformer (GPT) as the policy network of the reinforcement learning (RL) agent to optimize the binding affinity to a target. SGPT-RL was evaluated on the Moses distribution learning benchmark and two goal-directed generation tasks, with Dopamine Receptor D2 (DRD2) and Angiotensin-Converting Enzyme 2 (ACE2) as the targets. Both QSAR model and molecular docking were implemented as the optimization goals in the tasks. The popular Reinvent method was used as the baseline for comparison. Results: The results on the Moses benchmark showed that SGPT-RL learned good property distributions and generated molecules with high validity and novelty. On the two goal-directed generation tasks, both SGPT-RL and Reinvent were able to generate valid molecules with improved target scores. The SGPT-RL method achieved better results than Reinvent on the ACE2 task, where molecular docking was used as the optimization goal. Further analysis shows that SGPT-RL learned conserved scaffold patterns during exploration. Conclusions: The superior performance of SGPT-RL in the ACE2 task indicates that it can be applied to the virtual screening process where molecular docking is widely used as the criteria. Besides, the scaffold patterns learned by SGPT-RL during the exploration process can assist chemists to better design and discover novel lead candidates.

    Repetitive DNA sequence detection and its role in the human genome

    Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarized the definition, arrangement, and structural characteristics of repeats. Besides, we introduced diverse biological functions of repeats and reviewed existing methods for automatic repeat detection, classification, and masking. Finally, we analyzed the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.

    AB-Gen: Antibody Library Design with Generative Pre-trained Transformer and Deep Reinforcement Learning

    Antibody leads must fulfill multiple desirable properties to be clinical candidates. Primarily due to the low throughput in the experimental procedure, the need for such multi-property optimization causes the bottleneck in preclinical antibody discovery and development, because addressing one issue usually causes another. We developed a reinforcement learning (RL) method, named AB-Gen, for antibody library design using a generative pre-trained transformer (GPT) as the policy network of the RL agent. We showed that this model can learn the antibody space of heavy chain complementarity determining region 3 (CDRH3) and generate sequences with similar property distributions. Besides, when using human epidermal growth factor receptor-2 (HER2) as the target, the agent model of AB-Gen was able to generate novel CDRH3 sequences that fulfill multi-property constraints. In total, 509 generated sequences passed all property filters, and three highly conserved residues were identified. The importance of these residues was further demonstrated by molecular dynamics simulations, confirming that the agent model was capable of grasping important information in this complex optimization task. Overall, the AB-Gen method is able to design novel antibody sequences with a higher success rate than the traditional propose-then-filter approach. It has the potential to be used in practical antibody design, thus empowering the antibody discovery and development process. The source code of AB-Gen is freely available at Zenodo (https://doi.org/10.5281/zenodo.7657016) and BioCode (https://ngdc.cncb.ac.cn/biocode/tools/BT007341).

    Optimization of binding affinities in chemical space with generative pretrained transformer and deep reinforcement learning

    Background: The key challenge in drug discovery is to discover novel compounds with desirable properties. Among the properties, binding affinity to a target is one of the prerequisites and usually evaluated by molecular docking or quantitative structure activity relationship (QSAR) models. Methods: In this study, we developed Simplified molecular input line entry system Generative Pretrained Transformer with Reinforcement Learning (SGPT-RL), which uses a transformer decoder as the policy network of the reinforcement learning agent to optimize the binding affinity to a target. SGPT-RL was evaluated on the Moses distribution learning benchmark and two goal-directed generation tasks, with Dopamine Receptor D2 (DRD2) and Angiotensin-Converting Enzyme 2 (ACE2) as the targets. Both QSAR model and molecular docking were implemented as the optimization goals in the tasks. The popular Reinvent method was used as the baseline for comparison. Results: The results on Moses benchmark showed that SGPT-RL learned good property distributions and generated molecules with high validity and novelty. On the two goal-directed generation tasks, both SGPT-RL and Reinvent were able to generate valid molecules with improved target scores. The SGPT-RL method achieved better results than Reinvent on the ACE2 task, where molecular docking was used as the optimization goal. Further analysis shows that SGPT-RL learned conserved scaffold patterns during exploration. Conclusions: The superior performance of SGPT-RL in the ACE2 task indicates that it can be applied to the virtual screening process where molecular docking is widely used as the criteria. Besides, the scaffold patterns learned by SGPT-RL during the exploration process can assist chemists to better design and discover novel lead candidates

    DeeReCT-APA: Prediction of Alternative Polyadenylation Site Usage Through Deep Learning

    Alternative polyadenylation (APA) is a crucial step in post-transcriptional regulation. Previous bioinformatic studies have mainly focused on the recognition of polyadenylation sites (PASs) in a given genomic sequence, which is a binary classification problem. Recently, computational methods for predicting the usage level of alternative PASs in the same gene have been proposed. However, all of them cast the problem as a non-quantitative pairwise comparison task and do not take the competition among multiple PASs into account. To address this, here we propose a deep learning architecture, Deep Regulatory Code and Tools for Alternative Polyadenylation (DeeReCT-APA), to quantitatively predict the usage of all alternative PASs of a given gene. To accommodate different genes with potentially different numbers of PASs, DeeReCT-APA treats the problem as a regression task with a variable-length target. Based on a convolutional neural network-long short-term memory (CNN-LSTM) architecture, DeeReCT-APA extracts sequence features with CNN layers, uses bidirectional LSTM to explicitly model the interactions among competing PASs, and outputs percentage scores representing the usage levels of all PASs of a gene. In addition to the fact that only our method can quantitatively predict the usage of all the PASs within a gene, we show that our method consistently outperforms other existing methods on three different tasks for which they are trained: pairwise comparison task, highest usage prediction task, and ranking task. Finally, we demonstrate that our method can be used to predict the effect of genetic variations on APA patterns and sheds light on future mechanistic understanding in APA regulation. Our code and data are available at https://github.com/lzx325/DeeReCT-APA-repo

    Global fertility in 204 countries and territories, 1950–2021, with forecasts to 2100: a comprehensive demographic analysis for the Global Burden of Disease Study 2021

    Background: Accurate assessments of current and future fertility, including overall trends and changing population age structures across countries and regions, are essential to help plan for the profound social, economic, environmental, and geopolitical challenges that these changes will bring. Estimates and projections of fertility are necessary to inform policies involving resource and health-care needs, labour supply, education, gender equality, and family planning and support. The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2021 produced up-to-date and comprehensive demographic assessments of key fertility indicators at global, regional, and national levels from 1950 to 2021 and forecast fertility metrics to 2100 based on a reference scenario and key policy-dependent alternative scenarios. Methods: To estimate fertility indicators from 1950 to 2021, mixed-effects regression models and spatiotemporal Gaussian process regression were used to synthesise data from 8709 country-years of vital and sample registrations, 1455 surveys and censuses, and 150 other sources, and to generate age-specific fertility rates (ASFRs) for 5-year age groups from age 10 years to 54 years. ASFRs were summed across age groups to produce estimates of total fertility rate (TFR). Livebirths were calculated by multiplying ASFR and age-specific female population, then summing across ages 10–54 years. To forecast future fertility up to 2100, our Institute for Health Metrics and Evaluation (IHME) forecasting model was based on projections of completed cohort fertility at age 50 years (CCF50; the average number of children born over time to females from a specified birth cohort), which yields more stable and accurate measures of fertility than directly modelling TFR.
CCF50 was modelled using an ensemble approach in which three sub-models (with two, three, and four covariates variously consisting of female educational attainment, contraceptive met need, population density in habitable areas, and under-5 mortality) were given equal weights, and analyses were conducted utilising the MR-BRT (meta-regression—Bayesian, regularised, trimmed) tool. To capture time-series trends in CCF50 not explained by these covariates, we used a first-order autoregressive model on the residual term. CCF50 as a proportion of each 5-year ASFR was predicted using a linear mixed-effects model with fixed-effects covariates (female educational attainment and contraceptive met need) and random intercepts for geographical regions. Projected TFRs were then computed for each calendar year as the sum of single-year ASFRs across age groups. The reference forecast is our estimate of the most likely fertility future given the model, past fertility, forecasts of covariates, and historical relationships between covariates and fertility. We additionally produced forecasts for multiple alternative scenarios in each location: the UN Sustainable Development Goal (SDG) for education is achieved by 2030; the contraceptive met need SDG is achieved by 2030; pro-natal policies are enacted to create supportive environments for those who give birth; and the previous three scenarios combined. Uncertainty from past data inputs and model estimation was propagated throughout analyses by taking 1000 draws for past and present fertility estimates and 500 draws for future forecasts from the estimated distribution for each metric, with 95% uncertainty intervals (UIs) given as the 2·5 and 97·5 percentiles of the draws. To evaluate the forecasting performance of our model and others, we computed skill values—a metric assessing gain in forecasting accuracy—by comparing predicted versus observed ASFRs from the past 15 years (2007–21). 
A positive skill metric indicates that the model being evaluated performs better than the baseline model (here, a simplified model holding 2007 values constant in the future), and a negative metric indicates that the evaluated model performs worse than baseline. Findings: During the period from 1950 to 2021, global TFR more than halved, from 4·84 (95% UI 4·63–5·06) to 2·23 (2·09–2·38). Global annual livebirths peaked in 2016 at 142 million (95% UI 137–147), declining to 129 million (121–138) in 2021. Fertility rates declined in all countries and territories since 1950, with TFR remaining above 2·1 (canonically considered replacement-level fertility) in 94 (46·1%) countries and territories in 2021. This included 44 of 46 countries in sub-Saharan Africa, which was the super-region with the largest share of livebirths in 2021 (29·2% [28·7–29·6]). 47 countries and territories in which lowest estimated fertility between 1950 and 2021 was below replacement experienced one or more subsequent years with higher fertility; only three of these locations rebounded above replacement levels. Future fertility rates were projected to continue to decline worldwide, reaching a global TFR of 1·83 (1·59–2·08) in 2050 and 1·59 (1·25–1·96) in 2100 under the reference scenario. The number of countries and territories with fertility rates remaining above replacement was forecast to be 49 (24·0%) in 2050 and only six (2·9%) in 2100, with three of these six countries included in the 2021 World Bank-defined low-income group, all located in the GBD super-region of sub-Saharan Africa. The proportion of livebirths occurring in sub-Saharan Africa was forecast to increase to more than half of the world's livebirths by 2100, reaching 41·3% (39·6–43·1) in 2050 and 54·3% (47·1–59·5) in 2100.
The share of livebirths was projected to decline between 2021 and 2100 in most of the six other super-regions, decreasing, for example, in south Asia from 24·8% (23·7–25·8) in 2021 to 16·7% (14·3–19·1) in 2050 and 7·1% (4·4–10·1) in 2100, but was forecast to increase modestly in the north Africa and Middle East and high-income super-regions. Forecast estimates for the alternative combined scenario suggest that meeting SDG targets for education and contraceptive met need, as well as implementing pro-natal policies, would result in global TFRs of 1·65 (1·40–1·92) in 2050 and 1·62 (1·35–1·95) in 2100. The forecasting skill metric values for the IHME model were positive across all age groups, indicating that the model is better than the constant prediction. Interpretation: Fertility is declining globally, with rates in more than half of all countries and territories in 2021 below replacement level. Trends since 2000 show considerable heterogeneity in the steepness of declines, and only a small number of countries experienced even a slight fertility rebound after their lowest observed rate, with none reaching replacement level. Additionally, the distribution of livebirths across the globe is shifting, with a greater proportion occurring in the lowest-income countries. Future fertility rates will continue to decline worldwide and will remain low even under successful implementation of pro-natal policies. These changes will have far-reaching economic and societal consequences due to ageing populations and declining workforces in higher-income countries, combined with an increasing share of livebirths among the already poorest regions of the world. Funding: Bill & Melinda Gates Foundation.
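
The arithmetic described in the Methods of the GBD fertility abstract can be illustrated with a small Python sketch. It is not the IHME pipeline; it only shows, under stated assumptions, how a TFR, a livebirth count, and a 95% uncertainty interval could be derived from age-specific inputs. The function names, the example rates, and the simulated draws are hypothetical. The sketch assumes ASFRs are expressed as births per woman per year, so each 5-year age group is weighted by its width when summing to a TFR, and it uses linear interpolation between ordered draws for the percentiles.

```python
import random

AGE_GROUP_WIDTH = 5  # years per age group (10-14, 15-19, ..., 50-54); assumption


def total_fertility_rate(asfrs, width=AGE_GROUP_WIDTH):
    """Sum age-specific fertility rates (births per woman per year) into a TFR.

    Each rate is weighted by the width of its age group in years.
    """
    return width * sum(asfrs)


def livebirths(asfrs, female_population):
    """Multiply each ASFR by the female population of that age group and sum."""
    if len(asfrs) != len(female_population):
        raise ValueError("asfrs and female_population must have equal length")
    return sum(rate * pop for rate, pop in zip(asfrs, female_population))


def uncertainty_interval(draws, lower=2.5, upper=97.5):
    """Return the (lower, upper) percentiles of a collection of draws."""
    ordered = sorted(draws)
    if not ordered:
        raise ValueError("draws must not be empty")

    def percentile(q):
        # Linear interpolation between the two nearest ordered draws.
        position = (len(ordered) - 1) * q / 100
        low = int(position)
        high = min(low + 1, len(ordered) - 1)
        fraction = position - low
        return ordered[low] + (ordered[high] - ordered[low]) * fraction

    return percentile(lower), percentile(upper)


if __name__ == "__main__":
    # Hypothetical rates for the nine 5-year groups from age 10 to 54.
    example_asfrs = [0.001, 0.04, 0.1, 0.12, 0.09, 0.05, 0.02, 0.005, 0.001]
    example_population = [1000] * 9  # hypothetical women per age group

    print("TFR:", total_fertility_rate(example_asfrs))
    print("Livebirths:", livebirths(example_asfrs, example_population))

    # 1000 simulated draws stand in for the posterior draws in the abstract.
    random.seed(0)
    simulated_draws = [random.gauss(2.23, 0.07) for _ in range(1000)]
    print("95% UI:", uncertainty_interval(simulated_draws))
```

Taking percentiles of the draws, rather than assuming a symmetric interval around the mean, lets the interval reflect any skew in the estimated distribution.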