12 research outputs found
Characterisation of the roles of Poz1 and Stn1 at Schizosaccharomyces pombe telomeres
Telomeres protect the ends of chromosomes from the activity of DNA repair machinery and provide a solution to the end-replication problem. In humans, the core protein complex located at telomeres is known as shelterin and consists of six protein subunits. Although variation is seen in the telomeric complex between species, in fission yeast the complex has notable similarities to that of humans. Separately to shelterin, the CST complex (Cdc13/Stn1/Ten1) is conserved in budding yeast, plants and mammals and is thought to negatively regulate telomerase, in addition to being required for telomere protection. However, unlike Stn1 and Ten1, Cdc13 has not yet been identified in fission yeast.
Poz1 is a bridging molecule equivalent to TIN2 in human shelterin, which links the Taz1-Rap1 and the Pot1-Tpz1-Ccq1 sub-complexes, bound respectively to double- and single-stranded DNA at telomeres. Poz1 is required for the regulation of telomerase activity, and it has been hypothesised that it might do so by playing a structural role in the switching of telomeres from an open to a closed state. In this study, a reverse-2-hybrid approach was used to generate Poz1 alleles specifically unable to interact with either Rap1 or Tpz1. These alleles were subjected to phenotypic and biochemical analysis, which indicated that neither individual interaction is sufficient to maintain telomere homeostasis. Because telomere lengths in these mutants were similar to those of a Poz1 deletion, it is proposed that negative regulation cannot occur without the ability to form a closed complex.
Given that Cdc13 is currently the only missing CST component in fission yeast, a second study was initiated aiming to identify a homologue by yeast-2-hybrid screening of a cDNA library, using Stn1 and Ten1 as baits. However, this approach did not yield any positive candidates. In an alternative approach, Stn1 temperature-sensitive (ts) alleles were generated and characterised. These were used to screen a genomic library for suppressors of the Stn1 ts phenotype. Several candidates were identified that require further examination, while the ts allele analysis indicated that telomeres are lost in their entirety at non-permissive temperatures and that survivors of this process arose by chromosome circularisation, similar to Pot1 mutants.
Crystal structure prediction using neural network potential and age-fitness Pareto genetic algorithm
While crystal structure prediction (CSP) remains a longstanding challenge, we
introduce ParetoCSP, a novel algorithm for CSP, which combines a
multi-objective genetic algorithm (MOGA) with a neural network inter-atomic
potential (IAP) model to find energetically optimal crystal structures given
chemical compositions. We enhance the NSGA-III algorithm by incorporating the
genotypic age as an independent optimization criterion and employ the M3GNet
universal IAP to guide the GA search. Compared to GN-OA, a state-of-the-art
neural potential based CSP algorithm, ParetoCSP demonstrated significantly
better predictive capabilities, outperforming by a factor of across
diverse benchmark structures, as evaluated by seven performance metrics.
Trajectory analysis of the traversed structures of all algorithms shows that
ParetoCSP generated more valid structures than other algorithms, which helped
guide the GA to search more effectively for the optimal structure.
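
The idea of treating genotypic age as an extra optimization criterion can be illustrated with a plain non-dominated filter over (energy, age) pairs. This is only a minimal sketch: it is not NSGA-III and not the ParetoCSP implementation, the energies are made-up numbers rather than M3GNet predictions, and the names `Candidate`, `dominates`, and `pareto_front` are hypothetical.

```python
from typing import List, Tuple

# (label, energy, age): both energy and age are minimized in this sketch.
# The values below are illustrative assumptions, not results from the paper.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in population if not any(dominates(o, c) for o in population)]


population = [
    ("A", -5.0, 10),  # low energy, old
    ("B", -4.0, 2),   # slightly higher energy, young
    ("C", -3.5, 8),   # dominated by B
    ("D", -5.0, 12),  # dominated by A (same energy, older)
    ("E", -2.0, 1),   # high energy, but the youngest
]
front = pareto_front(population)
print([label for label, _, _ in front])
```

Keeping young, not-yet-optimized candidates such as "E" on the front is what lets an age-aware search preserve diversity instead of converging early on the current best energy.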
MD-HIT: Machine learning for materials property prediction with dataset redundancy control
Materials datasets typically contain many redundant (highly similar) materials
because of the tinkering style of material design practiced over the history of
materials research. For example, the Materials Project database has many
perovskite cubic structure materials similar to SrTiO3. This sample redundancy
causes random-split evaluation of machine learning (ML) models to fail: the
models tend to achieve over-estimated predictive performance, which is
misleading for the materials science community.
This issue is well known in the field of bioinformatics for protein function
prediction, in which a redundancy reduction procedure (CD-Hit) is always
applied to reduce the sample redundancy by ensuring no pair of samples has a
sequence similarity greater than a given threshold. This paper surveys the
overestimated ML performance in the literature for both composition based and
structure based material property prediction. We then propose a material
dataset redundancy reduction algorithm called MD-HIT and evaluate it with
several composition- and structure-based distance thresholds for reducing
dataset sample redundancy. We show that with this control, the prediction
performance tends to better reflect the models' true prediction capability. Our
MD-HIT code can be freely accessed at https://github.com/usccolumbia/MD-HIT
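
The threshold-based redundancy control described above can be sketched as a greedy filter in the spirit of CD-Hit: a sample is kept only if it is at least a threshold distance away from every sample already kept. This is a minimal sketch under stated assumptions, not the MD-HIT code: the L1 distance on element fractions, the input ordering, the threshold value, and the names `l1_distance` and `greedy_reduce` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Composition = Dict[str, float]  # element symbol -> atomic fraction


def l1_distance(a: Composition, b: Composition) -> float:
    """L1 distance between two compositions (an assumed, illustrative metric)."""
    elements = set(a) | set(b)
    return sum(abs(a.get(e, 0.0) - b.get(e, 0.0)) for e in elements)


def greedy_reduce(
    samples: List[Tuple[str, Composition]], threshold: float
) -> List[Tuple[str, Composition]]:
    """Keep a sample only if it is >= threshold away from every kept sample."""
    kept: List[Tuple[str, Composition]] = []
    for name, comp in samples:
        if all(l1_distance(comp, kept_comp) >= threshold for _, kept_comp in kept):
            kept.append((name, comp))
    return kept


samples = [
    ("SrTiO3", {"Sr": 0.2, "Ti": 0.2, "O": 0.6}),
    ("BaTiO3", {"Ba": 0.2, "Ti": 0.2, "O": 0.6}),
    ("Sr0.9Ba0.1TiO3", {"Sr": 0.18, "Ba": 0.02, "Ti": 0.2, "O": 0.6}),
    ("NaCl", {"Na": 0.5, "Cl": 0.5}),
]
kept = greedy_reduce(samples, threshold=0.1)
print([name for name, _ in kept])
```

With this threshold the lightly doped variant is dropped as redundant with SrTiO3, while BaTiO3 and NaCl are kept. The result of a greedy pass depends on the order in which samples are visited, so a real pipeline would need to fix that order deliberately.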
Investigating model explanation of bug report assignment recommenders
Software projects receive a lot of bug reports, and each bug report needs to be triaged.
An objective of the bug report triaging process is to find an appropriate developer who
can fix the reported bug. As this process can be time-consuming and requires a lot of
effort, researchers have implemented recommender systems using a variety of algorithms
to automate this process. Although using these recommender systems has a number of
benefits, there are still many obstacles to overcome. A key obstacle is that commonly
used algorithms are black-box, making it difficult for practitioners to comprehend how the
models make decisions. Lack of explainability results in a lack of trust and transparency in
the recommendations.
This work investigates approaches that lead to visually explainable bug report assignment
recommender systems. First, we developed and compared six different recommender
systems using three distinct machine learning algorithms: Random Forest (RF), MLP Classifier
and Bidirectional Neural Networks (BNN) and two different feature extraction techniques:
TF-IDF and Word2Vec. Second, we examine the use of WordNet to improve recommender
accuracy. Third, we explore the explanation of a bug report assignment recommender
using the feature-based local model LIME. Finally, we assess the use of a positive-negative
horizontal bar chart, feature table, and word cloud to explain the recommender
systems visually.
Our analysis indicates that the optimum approach for developing a bug report
assignment recommender system uses TF-IDF with RF and visually explains the recommendation
with a word cloud and LIME as a local model.
Accurate Prediction of Voltage of Battery Electrode Materials Using Attention Based Graph Neural Networks
Performing first-principles calculations to discover electrodes' properties in the large chemical space is a challenging task. While machine learning (ML) has been applied to effectively accelerate those discoveries, most of the applied methods ignore the materials' spatial information and use only pre-defined features based on chemical compositions. We propose two attention-based graph convolutional neural network techniques to learn the average voltage of electrodes. Our proposed method, which combines both atomic composition and atomic coordinates in 3D space, improves the accuracy in voltage prediction by 17% when compared to composition-based ML models. The first model directly learns the chemical reaction of electrodes and metal ions to predict their average voltage, whereas the second model uses the electrodes' ML-predicted formation energy (Eform) to compute their average voltage. Our models demonstrate improved accuracy in transferability from our subset of learned metal ions to other metal ions.
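
For context, the step from energies to an average voltage is commonly written with the standard intercalation relation below. Whether the authors use exactly this form is an assumption; the symbols are introduced here for illustration: H is the host electrode, A the working metal ion, x_1 and x_2 the ion contents before and after insertion, z the ion's charge number, and e the elementary charge.

```latex
V_{\mathrm{avg}} = -\,\frac{E(\mathrm{A}_{x_2}\mathrm{H}) - E(\mathrm{A}_{x_1}\mathrm{H}) - (x_2 - x_1)\,E(\mathrm{A})}{(x_2 - x_1)\, z\, e}
```

When formation energies are used in place of total energies, the elemental reference terms cancel in the numerator, which is why a predicted Eform is enough to evaluate the expression.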
Material transformers: deep learning language models for generative materials design
Transformer language models (LMs) pre-trained on large unlabeled corpora have produced state-of-the-art results in natural language processing, organic molecule design, and protein sequence generation. However, no such models have been applied to learn the composition patterns for the generative design of material compositions. Here we train a series of seven modern transformer models (GPT, GPT-2, GPT-Neo, GPT-J, BLMM, BART, and RoBERTa) for materials design using the expanded formulas of the ICSD, OQMD, and Materials Project databases. Six different datasets with or without non-charge-neutral or EB samples are used to benchmark the generative design performance and uncover the biases of modern transformer models for the generative design of materials compositions. Our experiments show that the materials transformers based on causal LMs can generate chemically valid material compositions, of which up to 97.61% are charge neutral and 91.22% are electronegativity balanced (EB), a more than sixfold enrichment compared to the baseline pseudo-random sampling algorithm. Our LMs also demonstrate high generation novelty, and their potential for new materials discovery is demonstrated by their capability to recover left-out materials. We also find that the properties of the generated compositions can be tailored by training the models with selected training sets such as high-bandgap samples. Our experiments also show that different models each have their own preference in terms of the properties of the generated samples and that their running time complexity varies considerably. We have applied our materials transformers to discover a set of new materials, as validated using density functional theory calculations. All our trained materials transformer models and code can be accessed freely at http://www.github.com/usccolumbia/MTransformer
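
The charge-neutrality criterion used to judge generated compositions can be illustrated with a brute-force check over candidate oxidation states. This is a minimal sketch, not the authors' validation code: the `OXIDATION_STATES` table is a small, incomplete, hand-picked assumption, the function name `is_charge_neutral` is hypothetical, and each element is assumed to take a single oxidation state throughout the formula.

```python
from itertools import product
from typing import Dict

# Illustrative, deliberately incomplete table of common oxidation states (an assumption).
OXIDATION_STATES = {
    "Sr": [2],
    "Ti": [2, 3, 4],
    "Fe": [2, 3],
    "Na": [1],
    "O": [-2],
    "Cl": [-1],
}


def is_charge_neutral(composition: Dict[str, int]) -> bool:
    """Return True if some assignment of one oxidation state per element sums to zero."""
    elements = list(composition)
    choices = [OXIDATION_STATES[element] for element in elements]
    for states in product(*choices):
        total = sum(state * composition[element] for state, element in zip(states, elements))
        if total == 0:
            return True
    return False


print(is_charge_neutral({"Sr": 1, "Ti": 1, "O": 3}))  # Sr(+2) + Ti(+4) + 3 O(-2)
print(is_charge_neutral({"Na": 1, "Cl": 2}))          # no assignment balances
print(is_charge_neutral({"Fe": 2, "O": 3}))           # 2 Fe(+3) + 3 O(-2)
```

Because of the single-state assumption, mixed-valence compounds such as Fe3O4 would be rejected by this sketch; a fuller check would allow different atoms of the same element to take different states.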