94 research outputs found

Insights into the role of Val45 and Gln182 of Escherichia coli MutY in DNA substrate binding and specificity

Author: Chang Po-Wen
Lu A-Lien
Madabushi Amrita
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Code-Mixed Probes Show How Pre-Trained Models Generalise On Code-Switched Text

Author: Lee Mark
Leon Frances A. Laureano De
Madabushi Harish Tayyar
Publication venue: arXiv
Publication date: 07/03/2024

Code-switching is a prevalent linguistic phenomenon in which multilingual individuals seamlessly alternate between languages. Despite its widespread use online and recent research trends in this area, research in code-switching presents unique challenges, primarily stemming from the scarcity of labelled data and available resources. In this study we investigate how pre-trained Language Models handle code-switched text in three dimensions: a) the ability of PLMs to detect code-switched text, b) variations in the structural information that PLMs utilise to capture code-switched text, and c) the consistency of semantic information representation in code-switched text. To conduct a systematic and controlled evaluation of the language models in question, we create a novel dataset of well-formed naturalistic code-switched text along with parallel translations into the source languages. Our findings reveal that pre-trained language models are effective in generalising to code-switched text, shedding light on the abilities of these models to generalise representations to CS corpora. We release all our code and data including the novel corpus at https://github.com/francesita/code-mixed-probes

University of Birmingham Research Portal

Code-Mixed Probes Show How Pre-Trained Models Generalise On Code-Switched Text

Author: De Leon Frances A. Laureano
Lee Mark
Madabushi Harish Tayyar
Publication date: 07/05/2024

arXiv.org e-Print Archive

SemEval-2022 Task 2: multilingual idiomaticity detection and sentence embedding

Author: Garcia M.
Gow-Smith E.
Idiart M.
Madabushi H.T.
Scarton C.
Villavicencio A.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/07/2022

This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially idiomatic expressions in context. Each subtask includes different settings regarding the amount of training data. Besides the task description, this paper introduces the datasets in English, Portuguese, and Galician and their annotation procedure, the evaluation metrics, and a summary of the participant systems and their results. The task had close to 100 registered participants organised into twenty-five teams, making over 650 and 150 submissions in the practice and evaluation phases respectively.

White Rose Research Online

SIRT6 protein deacetylase interacts with MYH DNA glycosylase, APE1 endonuclease, and Rad9-Rad1-Hus1 checkpoint clamp

Author: Gao Y
Guan X
Hwang BJ
Jin J
Lan L
Lu AL
Madabushi A
Nakajima S
Shi G
Yan A
Zalzman M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/06/2015

Background: SIRT6, a member of the NAD+-dependent histone/protein deacetylase family, regulates genomic stability, metabolism, and lifespan. MYH glycosylase and APE1 are two base excision repair (BER) enzymes involved in mutation avoidance from oxidative DNA damage. Rad9-Rad1-Hus1 (9-1-1) checkpoint clamp promotes cell cycle checkpoint signaling and DNA repair. BER is coordinated with the checkpoint machinery and requires chromatin remodeling for efficient repair. SIRT6 is involved in DNA double-strand break repair and has been implicated in BER. Here we investigate the direct physical and functional interactions between SIRT6 and BER enzymes. Results: We show that SIRT6 interacts with and stimulates MYH glycosylase and APE1. In addition, SIRT6 interacts with the 9-1-1 checkpoint clamp. These interactions are enhanced following oxidative stress. The interdomain connector of MYH is important for interactions with SIRT6, APE1, and 9-1-1. Mutagenesis studies indicate that SIRT6, APE1, and Hus1 bind overlapping but different sequence motifs on MYH. However, there is no competition of APE1, Hus1, or SIRT6 binding to MYH. Rather, one MYH partner enhances the association of the other two partners to MYH. Moreover, APE1 and Hus1 act together to stabilize the MYH/SIRT6 complex. Within human cells, MYH and SIRT6 are efficiently recruited to confined oxidative DNA damage sites within transcriptionally active chromatin, but not within repressive chromatin. In addition, Myh foci induced by oxidative stress and Sirt6 depletion are frequently localized on mouse telomeres. Conclusions: Although SIRT6, APE1, and 9-1-1 bind to the interdomain connector of MYH, they do not compete for MYH association. Our findings indicate that SIRT6 forms a complex with MYH, APE1, and 9-1-1 to maintain genomic and telomeric integrity in mammalian cells

Crossref

Springer - Publisher Connector

PubMed Central

D-Scholarship@Pitt

PVS: a web server for protein sequence variability analysis tuned to facilitate conserved epitope discovery

Author: Altschul
Baczkowski
Bui
C. M. Diez-Rivero
Calhoun
Dahiyat
del Sol Mesa
E. L. Reinherz
Edgar
Hannenhalli
Kuhlman
Lichtarge
M. Garcia-Boronat
Madabushi
Mendis
Mihalek
P. A. Reche
Padlan
Phillips
Poirot
Pupko
Reche
Reche
Reche
Reche
Rose
Stanfield
Stern
Stewart
Thibert
Weber
Zolla-Pazner
Publication venue: Oxford University Press
Publication date: 01/01/2008

We have developed PVS (Protein Variability Server), a web-based tool that uses several variability metrics to compute the absolute site variability in multiple protein-sequence alignments (MSAs). The variability is then assigned to a user-selected reference sequence consisting of either the first sequence in the alignment or a consensus sequence. Subsequently, PVS performs tasks that are relevant for structure-function studies, such as plotting and visualizing the variability in a relevant 3D-structure. Neatly, PVS also implements some other tasks that are thought to facilitate the design of epitope discovery-driven vaccines against pathogens where sequence variability largely contributes to immune evasion. Thus, PVS can return the conserved fragments in the MSA—as defined by a user-provided variability threshold—and locate them in a relevant 3D-structure. Furthermore, PVS can return a variability-masked sequence, which can be directly submitted to the RANKPEP server for the prediction of conserved T-cell epitopes. PVS is freely available at: http://imed.med.ucm.es/PVS/
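The masking step described in this abstract can be illustrated with a minimal sketch. It is not the PVS implementation: the abstract only says "several variability metrics", so the use of Shannon entropy here is an assumption, as are the function names `column_entropy` and `mask_variable_sites`, the `.` mask character, the toy alignment, and the threshold value. Gaps are not handled.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column (assumed metric)."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_variable_sites(alignment, threshold, mask_char="."):
    """Return the first sequence with sites above the threshold masked.

    The first sequence in the alignment is used as the reference, which is
    one of the two reference choices the abstract mentions.
    """
    reference = alignment[0]
    masked = []
    for i, residue in enumerate(reference):
        column = [seq[i] for seq in alignment]
        keep = column_entropy(column) <= threshold
        masked.append(residue if keep else mask_char)
    return "".join(masked)


# Hypothetical four-sequence alignment; columns 2 and 3 vary.
toy_alignment = ["MKTA", "MKSA", "MRTA", "MKTA"]
masked = mask_variable_sites(toy_alignment, threshold=0.5)
print(masked)
```

Runs of unmasked residues in the result correspond to the conserved fragments that the abstract says are returned for a user-provided variability threshold.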

Docta Complutense

Crossref

Harvard University - DASH

PubMed Central

The Türkiye earthquake sequence of February 2023: A longitudinal study report by EEFIT

Author: Acikgoz S
Adamidis O
Aktas Y
Andonov A
Asinari M
Bashein M
Bektas N
Boulton SJ
Byun J-E
Cabuk E
Dede S
Donmez K
Efeoglu T
Freddi F
Free M
Giardina G
Gokce T
Gonnuru P
Gozenoglu O
Gutierrez-Urzua F
Johnson C
Kalkan A
Macchiarulo V
Madabushi G
Malcioglu FS
Markov HP
Milillo P
Nathan J
Novelli V
O'Kane A
Opabola E
Ozden AT
Parammal Vatteri A
Rossetto T
So E
Tavakkoli A
Tetik T
Triantafyllou I
Verrucci E
Voelker B
Publication venue: EEFIT
Publication date: 06/02/2024

On 6 February 2023 at 4:17 am local time, a large area in southeastern Türkiye and northern Syria was hit by an Mw 7.8 earthquake, which was followed by an Mw 7.5 earthquake at 1:24 pm local time, causing the loss of more than 50,000 lives, some 100,000 injuries and significant damage to buildings and infrastructure, estimated to be in the range of 84.1 billion USD for Türkiye alone. As the largest earthquake in Türkiye since the deadly 1939 Erzincan earthquake, and with much larger losses, the sequence immediately attracted the attention of the global post-disaster reconnaissance and engineering communities. This included the Earthquake Engineering Field Investigation Team (EEFIT), who, within one week of the event, gathered a team of 30 people from academia and industry in the UK (19), Türkiye (5), New Zealand (1), Hungary (1), Bulgaria (1), Greece (1) and USA (1), with two support members from the UK and the Netherlands, to study the events and their impacts, and also to develop suggestions to reduce the existing vulnerabilities in the future. The team was organised into six working groups: (1) strong ground motions and seismotectonics, (2) geotechnics, (3) structures, (4) infrastructure, (5) remote sensing and (6) relief response and recovery.

UCL Discovery

How accurate and statistically robust are catalytic site predictions based on closeness centrality?

Author: A Armon
A del Sol
A Gutteridge
AG Murzin
AH Elcock
AR Panchenko
B Thibert
CA Innis
D La
D La
Dennis R Livesay
DR Livesay
DR Livesay
DR Livesay
Eric Chea
F Pazos
F Pazos
G Cheng
GJ Bartlett
GM Alter
H Yao
JD Watson
KC Usher
KV Brinda
LC Kurz
LH Greene
M Vendruscolo
MA del Sol
MJ Ondrechen
MT Neves-Petersen
NV Dokholyan
O Lichtarge
OS Soyer
P Aloy
PJ Bickel
PP Wangikar
R Landgraf
RJ Russell
S Jones
S Madabushi
W Kabsch
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: We examine the accuracy of enzyme catalytic residue predictions from a network representation of protein structure. In this model, amino acid α-carbons specify vertices within a graph and edges connect vertices that are proximal in structure. Closeness centrality, which has shown promise in previous investigations, is used to identify important positions within the network. Closeness centrality, a global measure of network centrality, is calculated as the reciprocal of the average distance between vertex i and all other vertices. Results: We benchmark the approach against 283 structurally unique proteins within the Catalytic Site Atlas. Our results, which are in line with previous investigations of smaller datasets, indicate closeness centrality predictions are statistically significant. However, unlike previous approaches, we specifically focus on residues with the very best scores. Over the top five closeness centrality scores, we observe an average true to false positive rate ratio of 6.8 to 1. As demonstrated previously, adding a solvent accessibility filter significantly improves predictive power; the average ratio is increased to 15.3 to 1. We also demonstrate (for the first time) that filtering the predictions by residue identity improves the results even more than accessibility filtering. Here, we simply eliminate residues with physicochemical properties unlikely to be compatible with catalytic requirements from consideration. Residue identity filtering improves the average true to false positive rate ratio to 26.3 to 1. Combining the two filters together has little effect on the results. Calculated p-values for the three prediction schemes range from 2.7E-9 to less than 8.8E-134. Finally, the sensitivity of the predictions to structure choice and slight perturbations is examined. Conclusion: Our results resolutely confirm that closeness centrality is a viable prediction scheme whose predictions are statistically significant. Simple filtering schemes substantially improve the method's predictive power. Moreover, no clear effect on performance is observed when comparing ligated and unligated structures. Similarly, the CC prediction results are robust to slight structural perturbations from molecular dynamics simulation.
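The closeness centrality definition in this abstract (the reciprocal of the average distance from a vertex to all other vertices) can be sketched as follows. This is only an illustration, not the authors' code: the function name `closeness_centrality` and the toy graph are hypothetical, distance is taken to be the shortest-path hop count (an assumption, since the abstract does not say how distance is measured), and the graph is assumed to be connected. Building the real graph from α-carbon proximity would need a distance cutoff that the abstract does not give.

```python
from collections import deque


def closeness_centrality(adjacency):
    """Closeness for every vertex of an undirected, connected graph.

    adjacency maps each vertex to a list of its neighbours.
    Closeness = (number of other vertices) / (sum of hop distances to them),
    i.e. the reciprocal of the average shortest-path distance.
    """
    scores = {}
    for source in adjacency:
        # Breadth-first search gives hop distances from the source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        others = len(dist) - 1
        scores[source] = others / total if total else 0.0
    return scores


# Hypothetical five-vertex path graph: a - b - c - d - e
toy_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
scores = closeness_centrality(toy_graph)
top_vertex = max(scores, key=scores.get)
print(top_vertex, round(scores[top_vertex], 3))
```

Ranking residues by this score and keeping the top few mirrors the "top five closeness centrality scores" evaluation mentioned in the abstract. The accessibility and residue-identity filters are not shown.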

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Joint Evolutionary Trees: A Large-Scale Method To Predict Protein Interfaces Based on Sequence Sampling

Author: A Armon
A Prlic
Alessandra Carbone
BW Matthews
CA Innis
CA Innis
CJ Tsai
CT Porter
DR Caffrey
E Kanamori
ELL Sonnhammer
G Cheng
GH Gonnet
H Chen
I Mihalek
JA Studier
JR Bradford
Ladislas A. Trojan
Michael Levitt
O Lichtarge
O Lichtarge
P Chakrabarti
Richard Lavery
RP Bahadur
S Henikoff
S Jones
S Madabushi
S Miller
SF Altschul
SJ Hubbard
Sophie Sacquin-Mora
SS Negi
Stefan Engelen
T Pupko
W Humphrey
WSJ Valdar
Y Ofran
Y Ofran
ZJ Hu
Publication venue: Public Library of Science
Publication date: 01/01/2009

The Joint Evolutionary Trees (JET) method detects protein interfaces, the core residues involved in the folding process, and residues susceptible to site-directed mutagenesis and relevant to molecular recognition. The approach, based on the Evolutionary Trace (ET) method, introduces a novel way to treat evolutionary information. Families of homologous sequences are analyzed through a Gibbs-like sampling of distance trees to reduce effects of erroneous multiple alignment and impacts of weakly homologous sequences on distance tree construction. The sampling method makes sequence analysis more sensitive to functional and structural importance of individual residues by avoiding effects of the overrepresentation of highly homologous sequences and improves computational efficiency. A carefully designed clustering method is parametrized on the target structure to detect and extend patches on protein surfaces into predicted interaction sites. Clustering takes into account residues' physical-chemical properties as well as conservation. Large-scale application of JET requires the system to be adjustable for different datasets and to guarantee predictions even if the signal is low. Flexibility was achieved by a careful treatment of the number of retrieved sequences, the amino acid distance between sequences, and the selective thresholds for cluster identification. An iterative version of JET (iJET) that guarantees finding the most likely interface residues is proposed as the appropriate tool for large-scale predictions. Tests are carried out on the Huang database of 62 heterodimer, homodimer, and transient complexes and on 265 interfaces belonging to signal transduction proteins, enzymes, inhibitors, antibodies, antigens, and others. A specific set of proteins chosen for their special functional and structural properties illustrates JET behavior on a large variety of interactions covering proteins, ligands, DNA, and RNA. JET is compared at a large scale to ET and to Consurf, Rate4Site, siteFiNDER|3D, and SCORECONS on specific structures. A significant improvement in performance and computational efficiency is shown.

Crossref

HAL-Inserm

Directory of Open Access Journals

PubMed Central

Recruitment of rare 3-grams at functional sites: Is this a mechanism for increasing enzyme specificity?

Background: A wealth of unannotated and functionally unknown protein sequences has accumulated in recent years with rapid progress in sequence genomics, giving rise to ever increasing demands for developing methods to efficiently assess functional sites. Sequence and structure conservation have traditionally been the major criteria adopted in various algorithms to identify functional sites. Here, we focus on the distributions of the 20^3 (8,000) different types of 3-grams (or triplets of sequentially contiguous amino acids) in the entire space of sequences accumulated to date in the UniProt database, and focus in particular on the rare 3-grams distinguished by their high entropy-based information content. Results: Comparison of the UniProt distributions with those observed near/at the active sites on a non-redundant dataset of 59 enzyme/ligand complexes shows that the active sites preferentially recruit 3-grams distinguished by their low frequency in UniProt. Three cases, Src kinase, hemoglobin, and tyrosyl-tRNA synthetase, are discussed in detail to illustrate the biological significance of the results. Conclusion: The results suggest that recruitment of rare 3-grams may be an efficient mechanism for increasing specificity at functional sites. Rareness/scarcity emerges as a feature that may assist in identifying key sites for protein function, providing information complementary to that derived from sequence alignments. In addition, it provides us (for the first time) with a means of identifying potentially functional sites from sequence information alone, when sequence conservation properties are not available.
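The idea of scoring 3-grams by rarity can be sketched in a few lines. This is a rough illustration rather than the paper's procedure: the abstract does not define its "entropy-based information content", so the self-information score -log2(p) used here is an assumption, and the function name `trigram_information` and the toy sequences are hypothetical stand-ins for UniProt-scale counts.

```python
import math
from collections import Counter

# 20 standard amino acids give 20**3 possible 3-grams.
POSSIBLE_TRIGRAMS = 20 ** 3


def trigram_information(sequences):
    """Map each observed 3-gram to -log2(relative frequency).

    Rare 3-grams receive high scores; common ones receive low scores.
    """
    counts = Counter()
    for seq in sequences:
        # Slide a window of width 3 over each sequence.
        for i in range(len(seq) - 2):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    return {gram: -math.log2(c / total) for gram, c in counts.items()}


# Toy corpus: "AAA" occurs three times, "AAC" once.
info = trigram_information(["AAAA", "AAAC"])
rarest = max(info, key=info.get)
print(rarest, info[rarest])
```

With real data, the scores of 3-grams near an active site could then be compared against the background distribution, which is the kind of comparison the abstract reports.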

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central