12 research outputs found

    Characterisation of the roles of Poz1 and Stn1 at Schizosaccharomyces pombe telomeres

    Telomeres protect the ends of chromosomes from the activity of DNA repair machinery and provide a solution to the end-replication problem. In humans, the core protein complex located at telomeres is known as shelterin and consists of six protein subunits. Although the telomeric complex varies between species, the fission yeast complex has notable similarities to that of humans. Separately from shelterin, the CST complex (Cdc13/Stn1/Ten1) is conserved in budding yeast, plants and mammals and is thought to negatively regulate telomerase, in addition to being required for telomere protection. However, unlike Stn1 and Ten1, Cdc13 has not yet been identified in fission yeast. Poz1 is a bridging molecule equivalent to TIN2 in human shelterin; it links the Taz1-Rap1 and the Pot1-Tpz1-Ccq1 sub-complexes, which are bound to double- and single-stranded DNA at telomeres, respectively. Poz1 is required for the regulation of telomerase activity, and it has been hypothesised that it does so by playing a structural role in the switching of telomeres from an open to a closed state. In this study, a reverse-2-hybrid approach was used to generate Poz1 alleles specifically unable to interact with either Rap1 or Tpz1. These alleles were subjected to phenotypic and biochemical analysis, which indicated that neither individual interaction is sufficient to maintain telomere homeostasis. Because the mutants' telomere lengths were similar to those of a Poz1 deletion, it is proposed that negative regulation cannot occur without the ability to form a closed complex. Given that Cdc13 is currently the only missing CST component in fission yeast, a second study was initiated to identify a homologue by yeast-2-hybrid screening of a cDNA library, using Stn1 and Ten1 as baits. However, this approach did not yield any positive candidates. In an alternative approach, Stn1 temperature-sensitive (ts) alleles were generated and characterised. These were used to screen a genomic library for suppressors of the Stn1 ts phenotype. Several candidates were identified that require further examination, while the ts allele analysis indicated that telomeres are lost in their entirety at non-permissive temperatures and that survivors of this process arose through chromosome circularisation, similar to Pot1 mutants.

    Crystal structure prediction using neural network potential and age-fitness Pareto genetic algorithm

    Crystal structure prediction (CSP) remains a longstanding challenge. We introduce ParetoCSP, a novel CSP algorithm that combines a multi-objective genetic algorithm (MOGA) with a neural network inter-atomic potential (IAP) model to find energetically optimal crystal structures given chemical compositions. We enhance the NSGA-III algorithm by incorporating the genotypic age as an independent optimization criterion and employ the M3GNet universal IAP to guide the GA search. Compared to GN-OA, a state-of-the-art neural-potential-based CSP algorithm, ParetoCSP demonstrated significantly better predictive capabilities, outperforming it by a factor of 2.562 across 55 diverse benchmark structures, as evaluated by seven performance metrics. Trajectory analysis of the structures traversed by all algorithms shows that ParetoCSP generated more valid structures than the other algorithms, which helped guide the GA to search more effectively for the optimal structure.
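The idea of treating genotypic age as an independent criterion alongside energy can be illustrated with a plain Pareto-dominance check. This is only a minimal sketch, not the ParetoCSP or NSGA-III implementation: it assumes each candidate is reduced to a hypothetical `(energy, age)` pair, that both values are minimised, and that the names `dominates` and `pareto_front` are illustrative.

```python
from typing import List, Tuple

# Hypothetical representation: (predicted energy, genotypic age).
# Assumption: lower energy is better and a younger (lower) age is preferred.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [p for p in population
            if not any(dominates(q, p) for q in population)]


population = [(-5.0, 10), (-4.0, 2), (-3.0, 5), (-6.0, 12)]
front = pareto_front(population)
```

In this toy population only `(-3.0, 5)` is removed, because `(-4.0, 2)` is both lower in energy and younger; young but not yet optimal candidates such as `(-4.0, 2)` survive, which is the diversity-preserving effect an age objective is meant to provide.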

    MD-HIT: Machine learning for materials property prediction with dataset redundancy control

    Materials datasets often contain many redundant (highly similar) materials, a consequence of the incremental, tinkering style of materials design throughout the history of materials research. For example, the Materials Project database has many cubic perovskite materials similar to SrTiO3. This sample redundancy causes random splitting to fail as an evaluation strategy for machine learning (ML) models: the models tend to achieve overestimated predictive performance, which is misleading for the materials science community. This issue is well known in bioinformatics for protein function prediction, in which a redundancy reduction procedure (CD-HIT) is routinely applied to ensure that no pair of samples has a sequence similarity greater than a given threshold. This paper surveys the overestimated ML performance reported in the literature for both composition-based and structure-based material property prediction. We then propose a material dataset redundancy reduction algorithm called MD-HIT and evaluate it with several composition- and structure-based distance thresholds for reducing dataset sample redundancy. We show that with this control, the predicted performance tends to better reflect the models' true prediction capability. Our MD-HIT code can be freely accessed at https://github.com/usccolumbia/MD-HIT. Comment: 12 pages
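Threshold-based redundancy reduction of the kind described above can be sketched as a greedy filter. The code below is an illustrative assumption, not the MD-HIT code from the linked repository: the function name `reduce_redundancy`, the element-fraction dictionaries, and the Euclidean composition distance are all hypothetical choices made for the example.

```python
import math
from typing import Callable, Dict, List

Composition = Dict[str, float]  # hypothetical: element symbol -> atomic fraction


def composition_distance(a: Composition, b: Composition) -> float:
    """Euclidean distance between two element-fraction vectors (illustrative metric)."""
    elements = set(a) | set(b)
    return math.sqrt(sum((a.get(e, 0.0) - b.get(e, 0.0)) ** 2 for e in elements))


def reduce_redundancy(samples: List[Composition],
                      distance: Callable[[Composition, Composition], float],
                      threshold: float) -> List[Composition]:
    """Greedily keep a sample only if it is at least `threshold` away from every kept sample."""
    kept: List[Composition] = []
    for sample in samples:
        if all(distance(sample, k) >= threshold for k in kept):
            kept.append(sample)
    return kept


samples = [
    {"Sr": 0.20, "Ti": 0.20, "O": 0.60},   # SrTiO3-like composition
    {"Sr": 0.19, "Ti": 0.21, "O": 0.60},   # near-duplicate of the first
    {"Na": 0.50, "Cl": 0.50},              # clearly different material
]
kept = reduce_redundancy(samples, composition_distance, threshold=0.1)
```

Here the near-duplicate second sample is dropped and the other two are kept. The result of a greedy pass depends on sample order, so a real pipeline would need to fix or justify the ordering.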

    Investigating model explanation of bug report assignment recommenders

    Software projects receive a lot of bug reports, and each bug report needs to be triaged. An objective of the bug report triaging process is to find an appropriate developer who can fix the reported bug. As this process can be time-consuming and requires a lot of effort, researchers have implemented recommender systems using a variety of algorithms to automate it. Although using these recommender systems has a number of benefits, there are still many obstacles to overcome. A key obstacle is that commonly used algorithms are black boxes, making it difficult for practitioners to comprehend how the models make decisions. Lack of explainability results in a lack of trust and transparency in the recommendations. This work investigates approaches that lead to visually explainable bug report assignment recommender systems. First, we developed and compared six different recommender systems using three distinct machine learning algorithms, Random Forest (RF), MLP Classifier and Bidirectional Neural Networks (BNN), and two different feature extraction techniques, TF-IDF and Word2Vec. Second, we examine the use of WordNet to improve recommender accuracy. Third, we explore the explanation of a bug report assignment recommender using the feature-based local model LIME. Finally, we assess the use of a positive-negative horizontal bar chart, a feature table, and a word cloud to explain the recommender systems visually. Our analysis indicates that the optimum approach for developing a bug report assignment recommender system uses TF-IDF with RF and visually explains the recommendation with a word cloud and LIME as a local model.

    Accurate Prediction of Voltage of Battery Electrode Materials Using Attention Based Graph Neural Networks

    Performing first-principles calculations to discover electrodes’ properties in the large chemical space is a challenging task. While machine learning (ML) has been applied to effectively accelerate those discoveries, most of the applied methods ignore the materials’ spatial information and use only pre-defined features based on chemical compositions. We propose two attention-based graph convolutional neural network techniques to learn the average voltage of electrodes. Our proposed method, which combines both atomic composition and atomic coordinates in 3D space, improves the accuracy of voltage prediction by 17% when compared to composition-based ML models. The first model directly learns the chemical reaction of electrodes and metal ions to predict their average voltage, whereas the second model combines the electrodes’ ML-predicted formation energies (Eform) to compute their average voltage. Our models demonstrate improved accuracy in transferability from our subset of learned metal ions to other metal ions.

    Material transformers: deep learning language models for generative materials design

    Transformer language models (LMs) pre-trained on large unlabeled corpora have produced state-of-the-art results in natural language processing, organic molecule design, and protein sequence generation. However, no such models have been applied to learn the composition patterns for the generative design of material compositions. Here we train a series of seven modern transformer models (GPT, GPT-2, GPT-Neo, GPT-J, BLMM, BART, and RoBERTa) for materials design using the expanded formulas of the ICSD, OQMD, and Materials Project databases. Six different datasets, with or without non-charge-neutral or electronegativity-balanced (EB) samples, are used to benchmark the generative design performance and uncover the biases of modern transformer models for the generative design of materials compositions. Our experiments show that the materials transformers based on causal LMs can generate chemically valid material compositions, of which as many as 97.61% are charge neutral and 91.22% are electronegativity balanced, which is more than six times higher enrichment compared to the baseline pseudo-random sampling algorithm. Our LMs also demonstrate high generation novelty, and their potential in new materials discovery is demonstrated by their capability to recover the held-out materials. We also find that the properties of the generated compositions can be tailored by training the models with selected training sets such as high-bandgap samples. Our experiments also show that different models each have their own preference in terms of the properties of the generated samples, and that their running time complexity varies considerably. We have applied our materials transformers to discover a set of new materials, as validated using density functional theory calculations. All our trained materials transformer models and code can be accessed freely at http://www.github.com/usccolumbia/MTransformer
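The charge-neutrality criterion used above to judge generated compositions can be illustrated with a brute-force check. This is a simplified sketch and not the validation code from the linked repository: the `OX_STATES` table is a small illustrative subset of common oxidation states, and the check assumes every atom of an element takes the same oxidation state.

```python
from itertools import product
from typing import Dict

# Illustrative subset of common oxidation states (assumption, not a full table).
OX_STATES = {
    "Na": [1],
    "Sr": [2],
    "Ti": [2, 3, 4],
    "O": [-2],
    "Cl": [-1],
}


def is_charge_neutral(composition: Dict[str, int]) -> bool:
    """True if some assignment of one oxidation state per element sums to zero charge."""
    elements = list(composition)
    for states in product(*(OX_STATES[e] for e in elements)):
        total = sum(state * composition[e] for state, e in zip(states, elements))
        if total == 0:
            return True
    return False


srtio3_ok = is_charge_neutral({"Sr": 1, "Ti": 1, "O": 3})   # Sr(+2) + Ti(+4) + 3 O(-2)
nacl2_ok = is_charge_neutral({"Na": 1, "Cl": 2})            # Na(+1) + 2 Cl(-1) = -1
```

The fraction of generated formulas that pass such a check is one way to arrive at a charge-neutrality percentage like the one reported above, although the authors' exact procedure is not specified in this abstract.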