46 research outputs found
Enhancing protein interaction prediction using deep learning and protein language models
Proteins are large macromolecules that play critical roles in many cellular activities in living organisms. These include catalyzing metabolic reactions, mediating signal transduction, DNA replication, responding to stimuli, and transporting molecules, to name a few. Proteins perform their functions by interacting with other proteins and molecules. As a result, determining the nature of such interactions is critically important in many areas of biology and medicine. The primary structure of a protein refers to its specific sequence of amino acids, while the tertiary structure refers to its unique 3D shape, and the quaternary structure refers to the interaction of multiple protein subunits to form a larger, more complex structure. While the number of experimentally determined tertiary and quaternary structures is limited, databases of protein sequences continue to grow at an unprecedented rate, providing a wealth of information for training and improving sequence-based models.
Recent developments in sequence-based models using machine learning and deep learning have shown significant progress toward solving protein-related problems. Specifically, attention-based transformer models, a recent breakthrough in Natural Language Processing (NLP), have shown that large models trained on unlabeled data are able to learn powerful representations of protein sequences and can lead to significant improvements in understanding protein folding, function, and interactions, as well as in drug discovery and protein engineering.
The research in this thesis has pursued two objectives using sequence-based modeling. The first is to use deep learning techniques based on NLP to address an important problem in cellular immune system studies, namely, predicting Major Histocompatibility Complex (MHC)-peptide binding. The second is to improve the performance of the ClusPro docking server, a well-known protein-protein docking tool, in three ways: (i) integrating ClusPro with AlphaFold2, a well-known and accurate protein structure predictor, for enhanced protein model docking, (ii) predicting distance maps to improve docking accuracy, and (iii) using regression techniques to rank protein clusters for better results.
Methods of Liver Stem Cell Therapy in Rodents as Models of Human Liver Regeneration in Hepatic Failure
Cell therapy is a promising intervention for treating liver diseases and liver failure. Different animal models of human liver cell therapy have been developed in recent years. Rats and mice are the most commonly used liver failure models. In fact, rodent models of hepatic failure have shown significant improvement in liver function after cell infusion. With the advent of stem-cell technologies, it is now possible to re-programme adult somatic cells such as skin or hair-follicle cells from individual patients to stem-like cells and differentiate them into liver cells. Such regenerative stem cells are highly promising in the personalization of cell therapy. The present review article will summarize current approaches to liver stem cell therapy with rodent models. In addition, we discuss common cell tracking techniques and how tracking data help to direct liver cell therapy research in animal models of hepatic failure.
Impact of Initial Population Density of the Dubas Bug, Ommatissus lybicus (Hemiptera: Tropiduchidae), on Oviposition Behaviour, Chlorophyll, Biomass and Nutritional Response of Date Palm (Phoenix dactylifera)
The Dubas bug (Ommatissus lybicus) is an economically significant pest of date palms. In this study, the effect of the population density of O. lybicus on chlorophyll, measured by the soil plant analysis development (SPAD) chlorophyll meter, palm biomass, and the nutritional composition of date palms was investigated. A further objective was to determine significant relationships between the population density of O. lybicus, the number of honeydew droplets, and oviposited eggs. Reductions of up to 8–11% and 29–34% in chlorophyll content and plant biomass, respectively, were caused by infestations exceeding 300 nymphs per palm seedling. Increasing the population density of O. lybicus to 600 insects per palm decreased oviposition by females, suggesting intraspecific competition for resources. There was a significant relationship between honeydew droplets produced by the pest population and chlorophyll content in the rachis, suggesting that treatment can be triggered at 3–6 nymphs/leaflet. Eggs were laid preferentially on the rachis. Ca, Mg, K, and P were the main nutrients affected by the activity of the pest. Mg content was associated with reduced chlorophyll content under increasing pest density, suggesting that supplemental nutrition can potentially be used to sustain chlorophyll and increase palm tolerance to pest infestation.
Tris(4,4′-bi-1,3-thiazole-κ2 N,N′)iron(II) tetrabromidoferrate(III) bromide
In the [Fe(4,4′-bit)3]2+ (4,4′-bit is 4,4′-bi-1,3-thiazole) cation of the title compound, [Fe(C6H4N2S2)3][FeBr4]Br, the FeII atom (3 symmetry) is six-coordinated in a distorted octahedral geometry by six N atoms from three 4,4′-bit ligands. In the [FeBr4]− anion, the FeIII atom (3 symmetry) is four-coordinated in a distorted tetrahedral geometry. In the crystal, intermolecular C—H⋯Br hydrogen bonds and Br⋯π interactions [Br⋯centroid distances = 3.562 (3) and 3.765 (2) Å] link the cations and anions, stabilizing the structure
Association between heavy metals and colon cancer: an ecological study based on geographical information systems in North-Eastern Iran
Background: Colorectal cancer has increased in Middle Eastern countries, and exposure to environmental pollutants such as heavy metals has been implicated. However, data linking them to this disease are generally lacking. This study aimed to explore the spatial pattern of the age-standardized incidence rate (ASR) of colon cancer and its potential association with the level of heavy metals in rice produced in north-eastern Iran.
Methods: Cancer data were drawn from the Iranian population-based cancer registry of Golestan Province, north-eastern Iran. Samples from 69 rice milling factories were analysed for the concentration levels of cadmium, nickel, cobalt, copper, selenium, lead and zinc. The inverse distance weighting (IDW) algorithm was used to interpolate the concentration of these heavy metals over the surface of the study area. Exploratory regression analysis was conducted to build ordinary least squares (OLS) models including every possible combination of the candidate explanatory variables and to choose the most useful ones to show the association between heavy metals and the ASR of colon cancer.
Results: The highest concentrations of heavy metals were found in the central part of the province, and in particular counties with higher amounts of cobalt were shown to be associated with a higher ASR of colon cancer in men. In contrast, selenium concentrations were higher in areas with a lower ASR of colon cancer in men. A significant regression equation for men with colon cancer was found (F(4,137) = 38.304, P < .000) with an adjusted R² of 0.77. The predicted ASR of colon cancer in men was −58.36, with coefficients for cobalt = 120.33; cadmium = 80.60; selenium = −6.07; nickel = −3.09; and zinc = −0.41. The association of copper and lead with colon cancer in men was not significant. We did not find a significant outcome for colon cancer in women.
Conclusion: Increased amounts of heavy metals in consumed rice may impact colon cancer incidence, both positively and negatively. While there were indications of an association between high cobalt concentrations and an increased risk for colon cancer, we found that high selenium concentrations might instead decrease the risk. Further investigations are needed to clarify if there are ecological or other reasons for these discrepancies. Regular monitoring of the amount of heavy metals in consumed rice is recommended.
This study was supported by Golestan University of Medical Sciences (grant number 90–10–1-30209).
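The inverse distance weighting step named in the Methods of the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the power parameter of 2, the function name `idw_interpolate`, and the site coordinates and cobalt values are all assumptions chosen for illustration.

```python
import numpy as np

def idw_interpolate(sample_xy, sample_values, query_xy, power=2.0):
    """Estimate values at query points by inverse distance weighting.

    power=2.0 is an assumed default; the abstract does not state the exponent.
    """
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_values = np.asarray(sample_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Pairwise Euclidean distances, shape (n_query, n_samples)
    diff = query_xy[:, None, :] - sample_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    estimates = np.empty(len(query_xy))
    for i, d in enumerate(dist):
        exact = d == 0.0
        if exact.any():
            # Query point coincides with a sample: use the measured value
            estimates[i] = sample_values[exact].mean()
        else:
            # Closer samples receive larger weights
            w = 1.0 / d ** power
            estimates[i] = (w * sample_values).sum() / w.sum()
    return estimates

# Hypothetical sampling sites (x, y) and cobalt concentrations
sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
cobalt = [0.2, 0.6, 0.4]
grid = [(0.0, 0.0), (5.0, 5.0)]
surface = idw_interpolate(sites, cobalt, grid)
```

In this toy layout the second query point is equidistant from all three sites, so its estimate is simply the mean of the three measurements.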
Spatial-time analysis of cardiovascular emergency medical requests: enlightening policy and practice
Background: Response time to cardiovascular emergency medical requests is an important indicator in reducing cardiovascular disease (CVD)-related mortality. This study aimed to visualize the spatial-time distribution of response time, scene time, and call-to-hospital time of these emergency requests. We also identified patterns of clusters of CVD-related calls. Methods: This cross-sectional study was conducted in Mashhad, north-eastern Iran, between August 2017 and December 2019. The response time to every CVD-related emergency medical request call was computed using spatial and classical statistical analyses. The Anselin Local Moran's I statistic was used to identify potential clusters in the patterns of CVD-related calls, response time, call-to-hospital arrival time, and scene-to-hospital arrival time at the small-area (neighborhood) level in Mashhad, Iran. Results: There were 84,239 CVD-related emergency request calls, 61.64% of which resulted in the transport of patients to clinical centers by EMS, while 2.62% of callers (a total of 2218 persons) died before EMS arrival. The number of CVD-related emergency calls increased by almost 7% between 2017 and 2018, and by 19% between 2017 and 2019. The peak time for calls was between 9 p.m. and 1 a.m., and the lowest number of calls was recorded between 3 a.m. and 9 a.m. Saturday was the busiest day of the week in terms of call volume. There were statistically significant clusters in the pattern of CVD-related calls in the south-eastern region of Mashhad. Further, we found a large spatial variation in scene-to-hospital arrival time and call-to-hospital arrival time in the area under study. Conclusion: The use of geographical information systems and spatial analyses in modelling and quantifying EMS response time provides a new vein of knowledge for decision makers in emergency services management. Spatial as well as temporal clustering of EMS calls was present in the study area. The reasons for clustering of unfavorable time indices for EMS response require further exploration. This approach enables policymakers to design tailored interventions to improve response time and reduce CVD-related mortality.
This study was financially sponsored by Mashhad University of Medical Sciences (Project grant: 980861).
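The Local Moran's I statistic mentioned in the abstract above can be sketched as follows. This is an illustration under stated assumptions rather than the study's workflow. It uses the common form I_i = (z_i / m2) Σ_j w_ij z_j with row-standardised weights and m2 = Σ z_i² / n. The significance testing that identifies clusters is omitted, and the neighbourhood adjacency and response times are hypothetical.

```python
import numpy as np

def local_morans_i(values, weights):
    """Local Moran's I for each area (no significance test).

    Assumes the form I_i = (z_i / m2) * sum_j w_ij * z_j with
    row-standardised weights and m2 = sum(z**2) / n.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Row-standardise so each area's neighbour weights sum to 1
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

    z = x - x.mean()               # deviations from the overall mean
    m2 = (z ** 2).sum() / len(x)   # second moment of the deviations
    spatial_lag = w @ z            # weighted mean deviation of neighbours
    return z / m2 * spatial_lag

# Hypothetical: four neighbourhoods in a row, each adjacent to the next
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
response_minutes = [8.0, 8.0, 12.0, 12.0]  # hypothetical response times
local_i = local_morans_i(response_minutes, adjacency)
```

Positive values mark areas whose neighbours deviate from the mean in the same direction (candidate high-high or low-low clusters). Values near zero, as for the two middle neighbourhoods here, indicate no local association.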
Improved prediction of MHC-peptide binding using protein language models
Major histocompatibility complex Class I (MHC-I) molecules bind to peptides derived from intracellular antigens and present them on the surface of cells, allowing the immune system (T cells) to detect them. Elucidating the process of this presentation is essential for regulation and potential manipulation of the cellular immune system. Predicting whether a given peptide binds to an MHC molecule is an important step in the above process and has motivated the introduction of many computational approaches to address this problem. NetMHCpan, a pan-specific model for predicting binding of peptides to any MHC molecule, is one of the most widely used methods; it addresses this binary classification problem using shallow neural networks. The recent successes of Deep Learning (DL) methods, especially Natural Language Processing (NLP)-based pretrained models, in various applications, including protein structure determination, motivated us to explore their use in this problem. Specifically, we consider the application of deep learning models pretrained on large datasets of protein sequences to predict MHC Class I-peptide binding. Using the standard performance metrics in this area, and the same training and test sets, we show that our models outperform NetMHCpan4.1, currently considered the state of the art.
Prediction of protein assemblies, the next frontier: The CASP14-CAPRI experiment
We present the results for CAPRI Round 50, the fourth joint CASP-CAPRI protein assembly prediction challenge. The Round comprised a total of twelve targets, including six dimers, three trimers, and three higher-order oligomers. Four of these were easy targets, for which good structural templates were available either for the full assembly, or for the main interfaces (of the higher-order oligomers). Eight were difficult targets for which only distantly related templates were found for the individual subunits. Twenty-five CAPRI groups, including eight automatic servers, submitted ~1250 models per target. Twenty groups, including six servers, participated in the CAPRI scoring challenge and submitted ~190 models per target. The accuracy of the predicted models was evaluated using the classical CAPRI criteria. The prediction performance was measured by a weighted scoring scheme that takes into account the number of models of acceptable quality or higher submitted by each group as part of their five top-ranking models. Compared to the previous CASP-CAPRI challenge, top performing groups submitted such models for a larger fraction (70–75%) of the targets in this Round, but fewer of these models were of high accuracy. Scorer groups achieved stronger performance, with more groups submitting correct models for 70–80% of the targets or achieving high accuracy predictions. Servers performed less well in general, except for the MDOCKPP and LZERD servers, which performed on par with human groups. In addition to these results, major advances in methodology are discussed, providing an informative overview of where the prediction of protein assemblies currently stands.
Cancer Research UK, Grant/Award Number: FC001003; Changzhou Science and Technology Bureau, Grant/Award Number: CE20200503; Department of Energy and Climate Change, Grant/Award Numbers: DE-AR001213, DE-SC0020400, DE-SC0021303; H2020 European Institute of Innovation and Technology, Grant/Award Numbers: 675728, 777536, 823830; Institut national de recherche en informatique et en automatique (INRIA), Grant/Award Number: Cordi-S; Lietuvos Mokslo Taryba, Grant/Award Numbers: S-MIP-17-60, S-MIP-21-35; Medical Research Council, Grant/Award Number: FC001003; Japan Society for the Promotion of Science KAKENHI, Grant/Award Number: JP19J00950; Ministerio de Ciencia e Innovación, Grant/Award Number: PID2019-110167RB-I00; Narodowe Centrum Nauki, Grant/Award Numbers: UMO-2017/25/B/ST4/01026, UMO-2017/26/M/ST4/00044, UMO-2017/27/B/ST4/00926; National Institute of General Medical Sciences, Grant/Award Numbers: R21GM127952, R35GM118078, RM1135136, T32GM132024; National Institutes of Health, Grant/Award Numbers: R01GM074255, R01GM078221, R01GM093123, R01GM109980, R01GM133840, R01GN123055, R01HL142301, R35GM124952, R35GM136409; National Natural Science Foundation of China, Grant/Award Number: 81603152; National Science Foundation, Grant/Award Numbers: AF1645512, CCF1943008, CMMI1825941, DBI1759277, DBI1759934, DBI1917263, DBI20036350, IIS1763246, MCB1925643; NWO, Grant/Award Number: TOP-PUNT 718.015.001; Wellcome Trust, Grant/Award Number: FC00100
Global burden of 288 causes of death and life expectancy decomposition in 204 countries and territories and 811 subnational locations, 1990–2021: a systematic analysis for the Global Burden of Disease Study 2021
BACKGROUND Regular, detailed reporting on population health by underlying cause of death is fundamental for public health decision making. Cause-specific estimates of mortality and the subsequent effects on life expectancy worldwide are valuable metrics to gauge progress in reducing mortality rates. These estimates are particularly important following large-scale mortality spikes, such as the COVID-19 pandemic. When systematically analysed, mortality rates and life expectancy allow comparisons of the consequences of causes of death globally and over time, providing a nuanced understanding of the effect of these causes on global populations. METHODS The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2021 cause-of-death analysis estimated mortality and years of life lost (YLLs) from 288 causes of death by age-sex-location-year in 204 countries and territories and 811 subnational locations for each year from 1990 until 2021. The analysis used 56 604 data sources, including data from vital registration and verbal autopsy as well as surveys, censuses, surveillance systems, and cancer registries, among others. As with previous GBD rounds, cause-specific death rates for most causes were estimated using the Cause of Death Ensemble model (a modelling tool developed for GBD to assess the out-of-sample predictive validity of different statistical models and covariate permutations and combine those results to produce cause-specific mortality estimates), with alternative strategies adapted to model causes with insufficient data, substantial changes in reporting over the study period, or unusual epidemiology. YLLs were computed as the product of the number of deaths for each cause-age-sex-location-year and the standard life expectancy at each age. As part of the modelling process, uncertainty intervals (UIs) were generated using the 2·5th and 97·5th percentiles from a 1000-draw distribution for each metric.
We decomposed life expectancy by cause of death, location, and year to show cause-specific effects on life expectancy from 1990 to 2021. We also used the coefficient of variation and the fraction of population affected by 90% of deaths to highlight concentrations of mortality. Findings are reported in counts and age-standardised rates. Methodological improvements for cause-of-death estimates in GBD 2021 include the expansion of the under-5-years age group to include four new age groups, enhanced methods to account for stochastic variation of sparse data, and the inclusion of COVID-19 and other pandemic-related mortality (which includes excess mortality associated with the pandemic, excluding COVID-19, lower respiratory infections, measles, malaria, and pertussis). For this analysis, 199 new country-years of vital registration cause-of-death data, 5 country-years of surveillance data, 21 country-years of verbal autopsy data, and 94 country-years of other data types were added to those used in previous GBD rounds. FINDINGS The leading causes of age-standardised deaths globally were the same in 2019 as they were in 1990; in descending order, these were ischaemic heart disease, stroke, chronic obstructive pulmonary disease, and lower respiratory infections. In 2021, however, COVID-19 replaced stroke as the second-leading age-standardised cause of death, with 94·0 deaths (95% UI 89·2-100·0) per 100 000 population. The COVID-19 pandemic shifted the rankings of the leading five causes, lowering stroke to the third-leading and chronic obstructive pulmonary disease to the fourth-leading position. In 2021, the highest age-standardised death rates from COVID-19 occurred in sub-Saharan Africa (271·0 deaths [250·1-290·7] per 100 000 population) and Latin America and the Caribbean (195·4 deaths [182·1-211·4] per 100 000 population).
The lowest age-standardised death rates from COVID-19 were in the high-income super-region (48·1 deaths [47·4-48·8] per 100 000 population) and southeast Asia, east Asia, and Oceania (23·2 deaths [16·3-37·2] per 100 000 population). Globally, life expectancy steadily improved between 1990 and 2019 for 18 of the 22 investigated causes. Decomposition of global and regional life expectancy showed the positive effect that reductions in deaths from enteric infections, lower respiratory infections, stroke, and neonatal deaths, among others have contributed to improved survival over the study period. However, a net reduction of 1·6 years occurred in global life expectancy between 2019 and 2021, primarily due to increased death rates from COVID-19 and other pandemic-related mortality. Life expectancy was highly variable between super-regions over the study period, with southeast Asia, east Asia, and Oceania gaining 8·3 years (6·7-9·9) overall, while having the smallest reduction in life expectancy due to COVID-19 (0·4 years). The largest reduction in life expectancy due to COVID-19 occurred in Latin America and the Caribbean (3·6 years). Additionally, 53 of the 288 causes of death were highly concentrated in locations with less than 50% of the global population as of 2021, and these causes of death became progressively more concentrated since 1990, when only 44 causes showed this pattern. The concentration phenomenon is discussed heuristically with respect to enteric and lower respiratory infections, malaria, HIV/AIDS, neonatal disorders, tuberculosis, and measles. INTERPRETATION Long-standing gains in life expectancy and reductions in many of the leading causes of death have been disrupted by the COVID-19 pandemic, the adverse effects of which were spread unevenly among populations. Despite the pandemic, there has been continued progress in combatting several notable causes of death, leading to improved global life expectancy over the study period. 
Each of the seven GBD super-regions showed an overall improvement between 1990 and 2021, obscuring the negative effect in the years of the pandemic. Additionally, our findings regarding regional variation in causes of death driving increases in life expectancy hold clear policy utility. Analyses of shifting mortality trends reveal that several causes, once widespread globally, are now increasingly concentrated geographically. These changes in mortality concentration, alongside further investigation of changing risks, interventions, and relevant policy, present an important opportunity to deepen our understanding of mortality-reduction strategies. Examining patterns in mortality concentration might reveal areas where successful public health interventions have been implemented. Translating these successes to locations where certain causes of death remain entrenched can inform policies that work to improve life expectancy for people everywhere. FUNDING Bill & Melinda Gates Foundation
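Two computations described in the METHODS of the abstract above are the YLL product (deaths multiplied by standard life expectancy) and the uncertainty interval taken from the 2·5th and 97·5th percentiles of 1000 draws. A minimal Python sketch of both follows. It is not the GBD pipeline: the function names, the normal distribution used to generate draws, and every number are hypothetical.

```python
import numpy as np

def years_of_life_lost(deaths, standard_life_expectancy):
    """YLLs = deaths x standard life expectancy at the age of death."""
    return np.asarray(deaths, dtype=float) * np.asarray(
        standard_life_expectancy, dtype=float
    )

def uncertainty_interval(draws):
    """95% UI from the 2.5th and 97.5th percentiles of the draws."""
    lower, upper = np.percentile(draws, [2.5, 97.5])
    return lower, upper

# Hypothetical: 1000 draws of deaths for one cause-age-sex-location-year.
# The normal distribution is an assumption made only to produce draws.
rng = np.random.default_rng(0)
death_draws = rng.normal(loc=1000.0, scale=50.0, size=1000)

# Hypothetical standard life expectancy (years) at this age
yll_draws = years_of_life_lost(death_draws, 30.0)
lower, upper = uncertainty_interval(yll_draws)
```

Carrying the 1000 draws through each calculation and taking percentiles only at the end lets uncertainty in the death counts propagate into the YLL interval.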
An investigation of the use of the sustainable drainage for groundwater recharge: a case study
There is a considerable amount of evidence to suggest that the use of groundwater for water supplies in urban areas and developing cities is increasing as surface water supplies become polluted or fully exploited. Other factors include the increase in water demand occurring in response to population growth, with increasing use per capita. However, groundwater resources are only sustainable if the aquifers that provide them can be recharged, and in many localities natural recharge is thought to be adversely affected by urbanisation, which covers large areas with impermeable surfaces such as roads and buildings that divert much-needed water into surface water courses or artificial drainage.
The aim of this research is therefore to investigate the use of sustainable drainage for groundwater recharge, using a case study of the area around Leighton Buzzard, in Bedfordshire, England. The detailed objectives of the research on which this thesis is based include conducting comprehensive reviews of the geology, aquifers, groundwater pollution and statutory policies that relate to the study area. Within these generic studies, particular emphasis has been given to soil properties, infiltration design structure and the impact of urbanisation on groundwater recharge. Sustainability of groundwater resources has been considered a primary objective for the authorities, and groundwater sustainability and protection go in parallel with conservation. Recharge of groundwater resources through the sustainable urban drainage system (SuDS) therefore achieves the sustainability objective for the environment, making SuDS the most suitable and effective approach to recharge groundwater resources, to minimise environmental risk and to deliver future environmental benefits.