99 research outputs found
Domain-Agnostic Molecular Generation with Self-feedback
The generation of molecules with desired properties has gained tremendous
popularity, revolutionizing the way scientists design molecular structures and
providing valuable support for chemical and drug design. However, despite the
potential of language models in molecule generation, they face numerous
challenges such as the generation of syntactically or chemically flawed
molecules, narrow domain focus, and limitations in creating diverse and
directionally feasible molecules due to a dearth of annotated data or external
molecular databases. To this end, we introduce MolGen, a pre-trained molecular
language model tailored specifically for molecule generation. MolGen acquires
intrinsic structural and grammatical insights by reconstructing over 100
million molecular SELFIES, while facilitating knowledge transfer between
different domains through domain-agnostic molecular prefix tuning. Moreover, we
present a self-feedback paradigm that inspires the pre-trained model to align
with the ultimate goal of producing molecules with desirable properties.
Extensive experiments on well-known benchmarks confirm MolGen's optimization
capabilities, encompassing penalized logP, QED, and molecular docking
properties. Further analysis shows that MolGen can accurately capture molecule
distributions, implicitly learn their structural characteristics, and
efficiently explore chemical space. The pre-trained model, codes, and datasets
are publicly available for future research at https://github.com/zjunlp/MolGen.
Comment: Work in progress. Add results of binding affinit
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Large Language Models (LLMs), with their remarkable task-handling
capabilities and innovative outputs, have catalyzed significant advancements
across a spectrum of fields. However, their proficiency within specialized
domains such as biomolecular studies remains limited. To address this
challenge, we introduce Mol-Instructions, a meticulously curated, comprehensive
instruction dataset expressly designed for the biomolecular realm.
Mol-Instructions is composed of three pivotal components: molecule-oriented
instructions, protein-oriented instructions, and biomolecular text
instructions, each curated to enhance the understanding and prediction
capabilities of LLMs concerning biomolecular features and behaviors. Through
extensive instruction tuning experiments on a representative LLM, we
underscore the potency of Mol-Instructions to enhance the adaptability and
cognitive acuity of large models within the complex sphere of biomolecular
studies, thereby promoting advancements in the biomolecular research community.
Mol-Instructions is made publicly accessible for future research endeavors and
will be subjected to continual updates for enhanced applicability.
Comment: Project homepage: https://github.com/zjunlp/Mol-Instructions. Add
quantitative evaluation
Dissociate lattice oxygen redox reactions from capacity and voltage drops of battery electrodes.
The oxygen redox (OR) activity is conventionally considered detrimental to the stability and kinetics of batteries. However, OR reactions are often confused with irreversible oxygen oxidation. Here, based on high-efficiency mapping of resonant inelastic x-ray scattering of both the transition metal and oxygen, we distinguish the lattice OR in Na0.6[Li0.2Mn0.8]O2 and compare it with Na2/3[Mg1/3Mn2/3]O2. Both systems display strong lattice OR activities but with distinct electrochemical stability. The comparison shows that the substantial capacity drop in Na0.6[Li0.2Mn0.8]O2 stems from non-lattice oxygen oxidations, and its voltage decay from an increasing Mn redox contribution upon cycling, in contrast to Na2/3[Mg1/3Mn2/3]O2. We conclude that lattice OR is not the ringleader of the stability issue. Instead, irreversible oxygen oxidation and the changing cationic reactions lead to the capacity and voltage fade. We argue that lattice OR and other oxygen activities should and could be studied and treated separately to achieve viable OR-based electrodes.
BMP10 preserves cardiac function through its dual activation of SMAD-mediated and STAT3-mediated pathways
Bone morphogenetic protein 10 (BMP10) is a cardiac peptide growth factor belonging to the transforming growth factor β superfamily that critically controls cardiovascular development, growth, and maturation. It has been shown that BMP10 elicits its intracellular signaling through a receptor complex of activin receptor-like kinase 1 with bone morphogenetic protein receptor type II or activin receptor type 2A. Previously, we generated and characterized a transgenic mouse line expressing BMP10 from the α-myosin heavy chain gene promoter and found that these mice have normal cardiac hypertrophic responses to both physiological and pathological stimuli. In this study, we report that these transgenic mice exhibit significantly reduced levels of cardiomyocyte apoptosis and cardiac fibrosis in response to a prolonged administration of the β-adrenoreceptor agonist isoproterenol. We further confirmed this cardioprotective function with a newly generated conditional Bmp10 transgenic mouse line, in which Bmp10 was activated in adult hearts by tamoxifen. Moreover, the intraperitoneal administration of recombinant human BMP10 was found to effectively protect hearts from injury, suggesting potential therapeutic utility of using BMP10 to prevent heart failure. Gene profiling and biochemical analyses indicated that BMP10 activates the SMAD-mediated canonical pathway and, unexpectedly, also the signal transducer and activator of transcription 3 (STAT3)-mediated signaling pathway, both in vivo and in vitro. Additional findings further supported the notion that BMP10's cardioprotective function is likely due to its dual activation of SMAD- and STAT3-regulated signaling pathways, promoting cardiomyocyte survival and suppressing cardiac fibrosis.
Genotype Profile of Global EYS-Associated Inherited Retinal Dystrophy and Clinical Findings in a Large Chinese Cohort
Purpose: The aim of this study was to probe the global profile of the EYS-associated genotype-phenotype trait in the worldwide reported IRD cases and to build a model for predicting disease progression as a reference for clinical consultation. Methods: This retrospective study of 420 well-documented IRD cases with mutations in the EYS gene included 39 patients from a genotype-phenotype study of inherited retinal dystrophy (IRD) conducted at the Beijing Institute of Ophthalmology and 381 cases retrieved from global reports. All patients underwent ophthalmic evaluation. Mutations were revealed using next-generation sequencing, followed by Sanger DNA sequencing and real-time quantitative PCR analysis. Multiple regression models and statistical analysis were used to assess the genotype and phenotype characteristics and traits in this large cohort. Results: A total of 420 well-defined patients with 841 identified mutations in the EYS gene were successfully obtained. The most common pathogenic variant was a frameshift c.4957dupA (p.S1653Kfs*2) in exon 26, with an allele frequency of 12.7% (107/841), followed by c.8805C>A (p.Y2935X) in exon 43, with an allele frequency of 5.9% (50/841). Two new hot spots were identified in the Chinese cohort, c.1750G>T (p.E584X) and c.7492G>C (p.A2498P). Several EYS mutation types were identified, with CNV being relatively common. The mean age of onset was 20.54 ± 11.33 (4–46) years. Clinical examinations revealed a typical progression of RPE atrophy from the peripheral area to the macula. Conclusion: This large global cohort of 420 IRD cases, with 262 distinct variants, identified genotype-phenotype correlations and mutation spectra with hotspots in the EYS gene.
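The allele frequencies quoted in the abstract above follow directly from the reported counts. The short Python check below is only an illustrative sketch: the counts (107, 50, and the denominator 841) come from the abstract, while the function and variable names are assumptions introduced here.

```python
# Allele counts as reported in the abstract (out of 841 identified mutations).
TOTAL_ALLELES = 841
variant_counts = {
    "c.4957dupA (p.S1653Kfs*2)": 107,
    "c.8805C>A (p.Y2935X)": 50,
}


def allele_frequency_percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Hypothetical helper output: variant name -> frequency in percent.
frequencies = {
    variant: allele_frequency_percent(count, TOTAL_ALLELES)
    for variant, count in variant_counts.items()
}

for variant, pct in frequencies.items():
    print(f"{variant}: {pct}%")
```

Both rounded values agree with the 12.7% and 5.9% stated in the abstract.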
Reproducible copy number variation patterns among single circulating tumor cells of lung cancer patients
Circulating tumor cells (CTCs) enter peripheral blood from primary tumors and seed metastases. The genome sequencing of CTCs could offer noninvasive prognosis or even diagnosis, but has been hampered by low single-cell genome coverage of scarce CTCs. Here, we report the use of the recently developed multiple annealing and looping-based amplification cycles for whole-genome amplification of single CTCs from lung cancer patients. We observed characteristic cancer-associated single-nucleotide variations and insertions/deletions in exomes of CTCs. These mutations provided information needed for individualized therapy, such as drug resistance and phenotypic transition, but were heterogeneous from cell to cell. In contrast, every CTC from an individual patient, regardless of the cancer subtypes, exhibited reproducible copy number variation (CNV) patterns, similar to those of the metastatic tumor of the same patient. Interestingly, different patients with the same lung cancer adenocarcinoma (ADC) shared similar CNV patterns in their CTCs. Even more interestingly, patients of small-cell lung cancer have CNV patterns distinctly different from those of ADC patients. Our finding suggests that CNVs at certain genomic loci are selected for the metastasis of cancer. The reproducibility of cancer-specific CNVs offers potential for CTC-based cancer diagnostics.
Chemistry and Chemical Biolog
Design and implementation of the START (STem cells for ARDS Treatment) trial, a phase 1/2 trial of human mesenchymal stem/stromal cells for the treatment of moderate-severe acute respiratory distress syndrome
Background: Despite advances in supportive care, moderate-severe acute respiratory distress syndrome (ARDS) is associated with high mortality rates, and novel therapies to treat this condition are needed. Compelling pre-clinical data from mouse, rat, sheep and ex vivo perfused human lung models support the use of human mesenchymal stem (stromal) cells (MSCs) as a novel intravenous therapy for the early treatment of ARDS. Methods: This article describes the study design and challenges encountered during the implementation and phase 1 component of the START (STem cells for ARDS Treatment) trial, a phase 1/2 trial of bone marrow-derived human MSCs for moderate-severe ARDS. A trial enrolling 69 subjects is planned (9 subjects in phase 1, 60 subjects in phase 2 treated with MSCs or placebo in a 2:1 ratio). Results: This report describes study design features that are unique to a phase 1 trial in critically ill subjects and the specific challenges of implementation of a cell-based therapy trial in the ICU. Conclusions: Experience gained during the design and implementation of the START study will be useful to investigators planning future phase 1 clinical trials based in the ICU, as well as trials of cell-based therapy for other acute illnesses. Trial registration: NCT01775774 and NCT02097641.
QKI is a critical pre-mRNA alternative splicing regulator of cardiac myofibrillogenesis and contractile function
The RNA-binding protein QKI belongs to the hnRNP K-homology domain protein family, is a well-known regulator of pre-mRNA alternative splicing, and is associated with several neurodevelopmental disorders. Qki is highly expressed in developing and adult hearts. By employing the human embryonic stem cell (hESC) to cardiomyocyte differentiation system and generating QKI-deficient hESCs (hESCs-QKIdel) using CRISPR/Cas9 gene editing technology, we analyze the physiological role of QKI in cardiomyocyte differentiation, maturation, and contractile function. hESCs-QKIdel largely maintain normal pluripotency and normal differentiation potential for the generation of early cardiogenic progenitors, but they fail to transition into functional cardiomyocytes. In this work, by using a series of transcriptomic, cell and biochemical analyses, and the Qki-deficient mouse model, we demonstrate that QKI is indispensable to cardiac sarcomerogenesis and cardiac function through its regulation of alternative splicing in genes involved in Z-disc formation and contractile physiology, suggesting that QKI is associated with the pathogenesis of certain forms of cardiomyopathies.
Cost-effectiveness of a national population-based screening program for type 2 diabetes: the Brazil experience
Background: The cost-effectiveness of screening for type 2 diabetes mellitus (DM2) in developing countries remains unknown. The Brazilian government conducted a nationwide population screening program for type 2 diabetes mellitus (BNDSP) in which 22 million capillary glucose tests were performed in individuals aged 40 years and older. The objective of this study was to evaluate the lifetime cost-effectiveness of a national population-based screening program for DM2 conducted in Brazil. Methods: We used a Markov-based cost-effectiveness model to simulate the long-term costs and benefits of screening for DM2, compared to no screening program. The analysis was conducted from a public health care system perspective. Sensitivity analyses were conducted to examine the robustness of results to key model parameters. Results: The Brazilian national diabetes screening program will yield a large health benefit and higher costs. Compared with no screening, screen detection of undiagnosed diabetes resulted in US$ 22,695/QALY. When benefits from early glycemic control on cardiovascular outcomes were considered, the cost per QALY gained would reduce significantly. Conclusions: In the base case analysis, not considering the intangible benefit of transferring diabetes management to primary care nor the benefit of using statin to treat eligible diabetic patients, the CE ratios did not indicate cost-effectiveness under the thresholds proposed by the World Health Organization. However, significant uncertainty was demonstrated in sensitivity analysis. Our results indicate that policy-makers should carefully balance the benefit and cost of the program while considering using a population-based approach to screen for diabetes.
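To make the idea of a Markov-based cost-effectiveness comparison concrete, the Python sketch below runs a toy three-state cohort model under two strategies and computes an incremental cost per QALY. It is not the study's model: the states, transition probabilities, costs, utilities, time horizon, and discount rate are all hypothetical values chosen only for illustration, and the function names are assumptions introduced here.

```python
def run_markov(transition, state_costs, state_utilities, start, cycles, discount_rate):
    """Run a cohort Markov model.

    Returns (discounted cost, discounted QALYs, final cohort distribution),
    all expressed per person entering the model.
    """
    cohort = list(start)
    n = len(cohort)
    total_cost = 0.0
    total_qalys = 0.0
    for cycle in range(cycles):
        weight = 1.0 / (1.0 + discount_rate) ** cycle
        total_cost += weight * sum(p * c for p, c in zip(cohort, state_costs))
        total_qalys += weight * sum(p * u for p, u in zip(cohort, state_utilities))
        # Move the cohort one cycle forward through the transition matrix.
        cohort = [sum(cohort[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return total_cost, total_qalys, cohort


# Hypothetical states: 0 = undiagnosed diabetes, 1 = diagnosed and treated, 2 = dead.
# Every number below is made up for illustration and is NOT taken from the study.
NO_SCREENING = [[0.90, 0.07, 0.03], [0.00, 0.98, 0.02], [0.00, 0.00, 1.00]]
SCREENING = [[0.60, 0.37, 0.03], [0.00, 0.98, 0.02], [0.00, 0.00, 1.00]]
ANNUAL_COSTS = [100.0, 600.0, 0.0]   # hypothetical cost per person-year in each state
UTILITIES = [0.70, 0.80, 0.0]        # hypothetical QALY weight per year in each state
START = [1.0, 0.0, 0.0]              # everyone starts undiagnosed
SCREENING_COST = 20.0                # hypothetical one-off cost of the screening test

cost_none, qalys_none, final_none = run_markov(NO_SCREENING, ANNUAL_COSTS, UTILITIES, START, 30, 0.03)
cost_scr, qalys_scr, final_scr = run_markov(SCREENING, ANNUAL_COSTS, UTILITIES, START, 30, 0.03)
cost_scr += SCREENING_COST

delta_cost = cost_scr - cost_none
delta_qalys = qalys_scr - qalys_none
icer = delta_cost / delta_qalys  # incremental cost per QALY gained

print(f"Incremental cost: {delta_cost:.2f}")
print(f"Incremental QALYs: {delta_qalys:.3f}")
print(f"ICER: {icer:.0f} per QALY gained")
```

The resulting ratio is the quantity that would be compared against a willingness-to-pay threshold. A sensitivity analysis, as mentioned in the abstract, would amount to rerunning the model while varying these inputs.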