12 research outputs found

    Leveraging Language Representation for Material Recommendation, Ranking, and Exploration

    Data-driven approaches for material discovery and design have been accelerated by emerging efforts in machine learning. While there is enormous progress towards learning the structure to property relationship of materials, methods that allow for general representations of crystals to effectively explore the vast material search space and identify high-performance candidates remain limited. In this work, we introduce a material discovery framework that uses natural language embeddings derived from material science-specific language models as representations of compositional and structural features. The discovery framework consists of a joint scheme that, given a query material, first recalls candidates based on representational similarity, and ranks the candidates based on target properties through multi-task learning. The contextual knowledge encoded in language representations is found to convey information about material properties and structures, enabling both similarity analysis for recall, and multi-task learning to share information for related properties. By applying the discovery framework to thermoelectric materials, we demonstrate diversified recommendations of prototype structures and identify under-studied high-performance material spaces, including halide perovskite, delafossite-like, and spinel-like structures. By leveraging material language representations, our framework provides a generalized means for effective material recommendation, which is task-agnostic and can be applied to various material systems

    Occupational exposure to formaldehyde, hematotoxicity and leukemia-specific chromosome changes in cultured myeloid progenitor cells - Response

    There are concerns about the health effects of formaldehyde exposure, including carcinogenicity, in light of elevated indoor air levels in new homes and occupational exposures experienced by workers in health care, embalming, manufacturing and other industries. Epidemiological studies suggest that formaldehyde exposure is associated with an increased risk of leukemia. However, the biological plausibility of these findings has been questioned because limited information is available on formaldehyde’s ability to disrupt hematopoietic function. Our objective was to determine if formaldehyde exposure disrupts hematopoietic function and produces leukemia-related chromosome changes in exposed humans. We examined the ability of formaldehyde to disrupt hematopoiesis in a study of 94 workers in China (43 exposed to formaldehyde and 51 frequency-matched controls) by measuring complete blood counts and peripheral stem/progenitor cell colony formation. Further, myeloid progenitor cells, the target for leukemogenesis, were cultured from the workers to quantify the level of leukemia-specific chromosome changes, including monosomy 7 and trisomy 8, in metaphase spreads of these cells. Among exposed workers, peripheral blood cell counts were significantly lowered in a manner consistent with toxic effects on the bone marrow and leukemia-specific chromosome changes were significantly elevated in myeloid blood progenitor cells. These findings suggest that formaldehyde exposure can have an adverse impact on the hematopoietic system and that leukemia induction by formaldehyde is biologically plausible, which heightens concerns about its leukemogenic potential from occupational and environmental exposures

    Enhancing the Throughput of FT Mass Spectrometry Imaging Using Compressed Sensing and Subspace Modeling

    Mass spectrometry imaging (MSI) allows for untargeted mapping of the chemical compositions of tissues with attomole detection limits. MSI using Fourier transform-based mass spectrometers, such as FT-ion cyclotron resonance (FT-ICR), grants the ability to examine the chemical space with unmatched mass resolution and mass accuracy. However, direct imaging of large tissue samples on FT-ICR is restrictively slow. In this work, we present an approach that combines the subspace modeling of ICR temporal signals with compressed sensing to accelerate high-resolution FT-ICR MSI. A joint subspace and sparsity constrained reconstruction enables the creation of high-resolution imaging data from the sparsely sampled and short-time acquired transients. Simulation studies and experimental implementation of the proposed acquisition in investigation of brain tissues demonstrate a factor of 10 enhancement in throughput of FT-ICR MSI, without the need for instrumental or hardware modifications

    Toward generalizable prediction of antibody thermostability using machine learning on sequence and structure features

    Over the last three decades, the appeal of monoclonal antibodies (mAbs) as therapeutics has been steadily increasing, as evidenced by the FDA’s recent landmark approval of the 100th mAb. Unlike mAbs that bind to single targets, multispecific biologics (msAbs) have garnered particular interest owing to the advantage of engaging distinct targets. One important modular component of msAbs is the single-chain variable fragment (scFv). Despite the exquisite specificity and affinity of these scFv modules, their relatively poor thermostability often hampers their development as potential therapeutic drugs. In recent years, engineering antibody sequences to enhance their stability by mutations has gained considerable momentum. As experimental methods for antibody engineering are time-intensive, laborious and expensive, computational methods serve as a fast and inexpensive alternative to conventional routes. In this work, we show two machine learning approaches – one using pre-trained language models (PTLM) that capture the functional effects of sequence variation, and the other a supervised convolutional neural network (CNN) trained with Rosetta energetic features – to better classify thermostable scFv variants from sequence. Both of these models are trained on temperature-specific data (TS50 measurements) derived from multiple libraries of scFv sequences. On out-of-distribution sequences (that is, sequences that are blind to the algorithm), we show that a sufficiently simple CNN model performs better than general pre-trained language models trained on diverse protein sequences (average Spearman correlation coefficient, ρ, of 0.4 as opposed to 0.15). On the other hand, an antibody-specific language model performs comparatively better than the CNN model on the same task (ρ of 0.52). Further, we demonstrate that for an independent mAb with available thermal melting temperatures for 20 experimentally characterized thermostable mutations, these models trained on TS50 data could identify 18 residue positions and 5 identical amino-acid mutations, showing remarkable generalizability. Our results suggest that such models can be broadly applicable for improving the biological characteristics of antibodies. Further, transferring such models to alternative physicochemical properties of scFvs can have potential applications in optimizing large-scale production and delivery of mAbs or bsAbs.

    Comparison of hematological alterations and markers of B-cell activation in workers exposed to benzene, formaldehyde and trichloroethylene

    Benzene, formaldehyde (FA) and trichloroethylene (TCE) are ubiquitous chemicals in workplaces and the general environment. Benzene is an established myeloid leukemogen and probable lymphomagen. FA is classified as a myeloid leukemogen but has not been associated with non-Hodgkin lymphoma (NHL), whereas TCE has been associated with NHL but not myeloid leukemia. Epidemiologic associations between FA and myeloid leukemia, and between benzene, TCE and NHL are, however, still debated. Previously, we showed that these chemicals are associated with hematotoxicity in cross-sectional studies of factory workers in China, which included extensive personal monitoring and biological sample collection. Here, we compare and contrast patterns of hematotoxicity, monosomy 7 in myeloid progenitor cells (MPCs), and B-cell activation biomarkers across these studies to further evaluate possible mechanisms of action and consistency of effects with observed hematologic cancer risks. Workers exposed to benzene or FA, but not TCE, showed declines in cell types derived from MPCs, including granulocytes and platelets. Alterations in lymphoid cell types, including B cells and CD4+ T cells, and B-cell activation markers were apparent in workers exposed to benzene or TCE. Given that alterations in myeloid and lymphoid cell types are associated with hematological malignancies, our data provide biologic insight into the epidemiological evidence linking benzene and FA exposure with myeloid leukemia risk, and TCE and benzene exposure with NHL risk

    CEPC Conceptual Design Report: Volume 2 - Physics & Detector

    The Circular Electron Positron Collider (CEPC) is a large international scientific facility proposed by the Chinese particle physics community to explore the Higgs boson and provide critical tests of the underlying fundamental physics principles of the Standard Model that might reveal new physics. The CEPC, to be hosted in China in a circular underground tunnel of approximately 100 km in circumference, is designed to operate as a Higgs factory producing electron-positron collisions with a center-of-mass energy of 240 GeV. The collider will also operate at around 91.2 GeV, as a Z factory, and at the WW production threshold (around 160 GeV). The CEPC will produce close to one trillion Z bosons, 100 million W bosons and over one million Higgs bosons. The vast amount of bottom quarks, charm quarks and tau-leptons produced in the decays of the Z bosons also makes the CEPC an effective B-factory and tau-charm factory. The CEPC will have two interaction points where two large detectors will be located. This document is the second volume of the CEPC Conceptual Design Report (CDR). It presents the physics case for the CEPC, describes conceptual designs of possible detectors and their technological options, highlights the expected detector and physics performance, and discusses future plans for detector R&D and physics investigations. The final CEPC detectors will be proposed and built by international collaborations but they are likely to be composed of the detector technologies included in the conceptual designs described in this document. A separate volume, Volume I, recently released, describes the design of the CEPC accelerator complex, its associated civil engineering, and strategic alternative scenarios