26 research outputs found

MDAT- Aligning multiple domain arrangements

Author: Bitard-Feildel T. (Tristan)
Bornberg-Bauer E. (Erich)
Kemena C. (Carsten)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/01/2015

Background: Proteins are composed of domains, protein segments that fold independently from the rest of the protein and have a specific function. During evolution the arrangement of domains can change: domains are gained, lost or their order is rearranged. To facilitate the analysis of these changes we propose the use of multiple domain alignments. Results: We developed an alignment program, called MDAT, which aligns multiple domain arrangements. MDAT extends earlier programs which perform pairwise alignments of domain arrangements. MDAT uses a domain similarity matrix to score domain pairs and aligns the domain arrangements using a consistency supported progressive alignment method. Conclusion: MDAT will be useful for analysing changes in domain arrangements within and between protein families and will thus provide valuable insights into the evolution of proteins and their domains. MDAT is coded in C++, and the source code is freely available for download at http://www.bornberglab.org/pages/mda
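The pairwise step that MDAT builds on can be illustrated with a minimal sketch. It assumes a plain Needleman-Wunsch global alignment over domain identifiers, with a hypothetical `score` function standing in for the domain similarity matrix and an arbitrary gap penalty. It does not reproduce MDAT's consistency-supported progressive method or its C++ implementation.

```python
def align_arrangements(a, b, score, gap=-1.0):
    """Globally align two domain arrangements (lists of domain identifiers).

    `score(x, y)` is a hypothetical stand-in for a domain similarity matrix;
    `gap` is an assumed linear gap penalty.
    """
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Boundary cells: aligning a prefix against nothing costs one gap per domain.
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # pair two domains
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], dp[n][m]


def identity_score(x, y):
    # Toy scoring scheme (an assumption): reward identical domains only.
    return 2.0 if x == y else -1.0
```

With the toy scoring above, aligning `["A", "B", "C"]` against `["A", "C"]` places a gap opposite the domain `B`, which is how a domain gain or loss would show up in such an alignment.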

Springer - Publisher Connector

PubMed Central

Münstersches Informations und Archivsystem für Multimediale Inhalte

Domain similarity based orthology detection

Author: Bitard-Feildel T. (Tristan)
Bornberg-Bauer E. (Erich)
Greenwood J.M. (Jenny)
Kemena C. (Carsten)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/05/2015

Background: Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to assert whether two proteins are orthologous or not. Accordingly, when the number of sequences for comparison increases, the number of comparisons to compute grows in a quadratic order. A current challenge of bioinformatic research, especially when taking into account the increasing number of sequenced organisms available, is to make this ever-growing number of comparisons computationally feasible in a reasonable amount of time. We propose to speed up the detection of orthologous proteins by using strings of domains to characterize the proteins. Results: We present two new protein similarity measures, a cosine and a maximal weight matching score based on domain content similarity, and new software, named porthoDom. The qualities of the cosine and the maximal weight matching similarity measures are compared against curated datasets. The measures show that domain content similarities are able to correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, the wrapper developed for proteinortho. porthoDom makes use of domain content similarity measures to group proteins together before searching for orthologs. By using domains instead of amino acid sequences, the reduction of the search space decreases the computational complexity of an all-against-all sequence comparison. Conclusion: We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, allows a drastic simplification of search space. porthoDom has the advantage of speeding up orthology detection while maintaining a degree of accuracy similar to proteinortho. The implementation of porthoDom is released using python and C++ languages and is available under the GNU GPL licence 3 at http://www.bornberglab.org/pages/porthoda.
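A cosine measure on domain content can be sketched as follows. This is a minimal illustration, assuming each protein is reduced to a bag of domain identifiers with raw occurrence counts as weights. The function name and the weighting are assumptions, not porthoDom's actual implementation.

```python
from collections import Counter
from math import sqrt


def domain_cosine(domains_a, domains_b):
    """Cosine similarity between two proteins given as lists of domain identifiers.

    Each protein becomes a count vector over domain identifiers (an assumed
    weighting); the order of the domains is ignored.
    """
    counts_a, counts_b = Counter(domains_a), Counter(domains_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[d] * counts_b[d] for d in shared)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a protein without annotated domains matches nothing
    return dot / (norm_a * norm_b)
```

Proteins with identical domain content score 1 and proteins sharing no domain score 0, so a threshold on this value can pre-group proteins before the costlier sequence comparison is run within each group.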

Springer - Publisher Connector

PubMed Central

Münstersches Informations und Archivsystem für Multimediale Inhalte

Domain similarity based orthology detection

Author: AD Moore
AK Björklund
AR Kersting
AsK Björklund
Carsten Kemena
CD Bingle
CH Papadimitriou
E Bornberg-Bauer
Erich Bornberg-Bauer
F Jacob
F Pedregosa
J Huerta-Cepas
J Ruan
J Söding
J Weiner
Jenny M Greenwood
JH Fong
JM Joseph
K Lin
K Sjölander
K Trachana
L Li
LY Geer
M Lechner
M Levitt
M Punta
MA Messih
MJ de Hoon
N Song
N Terrapon
P Rice
S Powell
SF Altschul
SK Kummerfeld
SR Eddy
Tristan Bitard-Feildel
Z Galil
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Hemimetabolous genomes reveal molecular basis of termite eusociality

Around 150 million years ago, eusocial termites evolved from within the cockroaches, 50 million years before eusocial Hymenoptera, such as bees and ants, appeared. Here, we report the 2-Gb genome of the German cockroach, Blattella germanica, and the 1.3-Gb genome of the drywood termite Cryptotermes secundus. We show evolutionary signatures of termite eusociality by comparing the genomes and transcriptomes of three termites and the cockroach against the background of 16 other eusocial and non-eusocial insects. Dramatic adaptive changes in genes underlying the production and perception of pheromones confirm the importance of chemical communication in the termites. These are accompanied by major changes in gene regulation and the molecular evolution of caste determination. Many of these results parallel molecular mechanisms of eusocial evolution in Hymenoptera. However, the specific solutions are remarkably different, thus revealing a striking case of convergence in one of the major evolutionary transitions in biological complexity

ZENODO

Copenhagen University Research Information System

IST Austria: PubRep (Institute of Science and Technology)

IST PubRep

Electronic Archiving System

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

MDAT- Aligning multiple domain arrangements

Author: AD Moore
AR Kersting
B Paten
C Notredame
Carsten Kemena
CO Buckee
D Ekman
DG Higgins
E Bornberg-Bauer
Erich Bornberg-Bauer
F Sievers
H Fang
JA Marsh
JD Thompson
JS Papadopoulos
K Forslund
K Katoh
L Leclère
LA Ait
LY Geer
M Levitt
M Punta
MOSRM Dayhoff
N Terrapon
O Gotoh
RA de Maagd
RD Finn
S Henikoff
SR Eddy
Söding J
T Kawashima
Tristan Bitard-Feildel
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Critical assessment of protein intrinsic disorder prediction

Author: Aykac-Fas Burcu
Bassot Claudio
Benítez Guillermo Ignacio
Bevilacqua Martina
Bitard-Feildel Tristan
Caid Predictors
Callebaut Isabelle
Chasapi Anastasia
Chemes Lucia Beatriz
Cheng Jianlin
Cozzetto Domenico
Davey Norman
Davidović Radoslav
Disprot Curators
Dosztányi Zsuzsanna
Dunker A. Keith
Elofsson Arne
Erdős Gábor
Galzitskaya Oxana Valerianovna
Gao Jianzhao
González-Foutel Nicolás S.
Govindarajan Sudha
Gsponer Jörg
Guharoy Mainak
Hajdu-Soltész Borbála
Hanson Jack
Hatos András
Hoque Md Tamjidul
Horvath Tamas
Hu Gang
Iglesias Valentin
Iqbal Sumaiya
Jones David T.
Kajava Andrey V.
Kovacs Orsolya Panna
Kurgan Lukasz
Lamb John
Lambrughi Matteo
Lazar Tamas
Leclercq Jeremy Y.
Leonardi Emanuela
Litfin Thomas
Lobanov Michail Yu
Macedo-Ribeiro Sandra
Macossay-Castillo Mauricio
Maiani Emiliano
Malhis Nawar
Manso Jose Antonio
Marino-Buslje Cristina
Martínez-Pérez Elizabeth
Meng Fanchi
Minervini Giovanni
Mirabello Claudio
Mičetić Ivan
Monzon Alexander Miguel
Murvai Nikoletta
Mészáros Bálint
Necci Marco
Orlando Gabriele
Ouzounis Christos
Pajkos Mátyás
Paladin Lisanna
Paliwal Kuldip
Palopoli Nicolás
Pancsa Rita
Papaleo Elena
Parisi Gustavo
Peng Zhenling
Pereira Pedro José Barbosa
Piovesan Damiano
Promponas Vasilis J.
Pujols Jordi
Quaglia Federica
Raimondi Daniele
Salvatore Marco
Schad Eva
Sharma Alok
Sharma Ronesh
Sormanni Pietro
Szabo Beata
Szaniszló Tamás
Tamana Stella
Tantos Agnes
Tompa Peter
Tosatto Silvio C. E.
Veljkovic Nevena
Vendruscolo Michele
Ventura Salvador
Vranken Wim
Wallner Björn
Walsh Ian
Wang Chen
Wang Kui
Wang Sheng
Wu Tianqi
Wu Zhonghua
Xu Jinbo
Yan Jing
Zhou Yaoqi
Álvarez Lucía
Publication venue: Nature Methods
Publication date: 01/01/2021

Abstract: Intrinsically disordered proteins, defying the traditional protein structure–function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy is high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in prediction of intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 following filtering out of bona fide structured regions. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times among methods can vary by up to four orders of magnitude
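The Fmax values quoted above are maxima of the F1 score over decision thresholds. The sketch below shows that generic definition for per-residue scores and binary labels. It assumes one threshold sweep over a pooled list of residues and is not the CAID evaluation code, whose exact protocol is not given here.

```python
def f_max(scores, labels, thresholds=None):
    """Maximum F1 over thresholds for scores against binary labels (1 = disordered).

    A residue is predicted positive when its score is >= the threshold.
    By default every distinct score is tried as a threshold.
    """
    if thresholds is None:
        thresholds = sorted(set(scores))
    best = 0.0
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # F1 is zero or undefined without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Because the threshold is chosen after the fact, Fmax describes the best achievable trade-off of a predictor, not its performance at a fixed cutoff.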

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

HAL-IRD

Diposit Digital de Documents de la UAB

Apollo (Cambridge)

Navigating the amino acid sequence space between functional proteins using a deep learning framework

Author: Bitard-Feildel Tristan
Publication venue: 'PeerJ'
Publication date: 01/09/2021

Motivation: Shedding light on the relationships between protein sequences and functions is a challenging task with many implications in protein evolution, disease understanding, and protein design. The mapping of protein sequence space to specific functions is, however, hard to comprehend due to its complexity. Generative models help to decipher complex systems thanks to their ability to learn and recreate data specificity. Applied to proteins, they can capture the sequence patterns associated with functions and point out important relationships between sequence positions. By learning these dependencies between sequences and functions, they can ultimately be used to generate new sequences and navigate through uncharted areas of molecular evolution. Results: This study presents an Adversarial Auto-Encoder (AAE) approach, an unsupervised generative model, to generate new protein sequences. AAEs are tested on three protein families known for their multiple functions: the sulfatase, the HUP and the TPP families. Clustering results on the encoded sequences from the latent space computed by AAEs display a high level of homogeneity regarding the protein sequence functions. The study also reports and analyzes for the first time two sampling strategies, based on latent space interpolation and latent space arithmetic, to generate intermediate protein sequences sharing sequential properties of original sequences linked to known functional properties issued from different families and functions. Sequences generated by interpolation between latent space data points demonstrate the ability of the AAE to generalize and produce meaningful biological sequences from an evolutionarily uncharted area of the biological sequence space.
Finally, 3D structure models computed by comparative modelling using generated sequences and templates of different sub-families point to the ability of the latent space arithmetic to successfully transfer protein sequence properties linked to function between different sub-families. All in all, this study confirms the ability of deep learning frameworks to model biological complexity and bring new tools to explore amino acid sequence and functional spaces
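Latent space interpolation can be sketched in a few lines. The example assumes linear interpolation between two encoded sequences, which is one common choice; the abstract does not state which scheme the study uses. The encoder and decoder of the AAE are not shown, and each returned point would be passed to a decoder to obtain a sequence.

```python
def interpolate_latent(z_start, z_end, steps):
    """Return `steps` evenly spaced latent vectors from z_start to z_end (inclusive).

    Linear interpolation is an assumption made for illustration.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both end points")
    points = []
    for k in range(steps):
        t = k / (steps - 1)  # t runs from 0.0 to 1.0
        points.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return points
```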

HAL-Inserm

Directory of Open Access Journals

Création d'une bibliothèque de coeurs structuraux pour le protein threading. Utilisation des familles structurales. [Building a library of structural cores for protein threading using structural families.]

Author: Bitard Feildel Tristan
Publication venue: HAL CCSD
Publication date: 01/01/2009

Internship supervised by François Coste. We are interested in the automatic construction of a library of structural cores for the protein threading software FROSTO. Starting from the structural-family representation derived from the SCOP classification, we built tools to identify similar regions within a hierarchical level of the classification. These regions of remote similarity are identified using software that exploits either the sequence or the structure information of the members of the structural families. Once these regions were identified, we developed block-splitting methods to redefine the structural cores. The methods that identify similar regions from sequence improve the recognition of members of the same structural family or of the same structural superfamily

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Computational Identification of Novel Genes: Current and Future Perspectives

Author: Ludovic Mallet
Steffen Klasberg
Tristan Bitard-Feildel
Publication venue: 'SAGE Publications'
Publication date: 01/01/2016

While it has long been thought that all genomic novelties are derived from existing material, many genes lacking homology to known genes were found in recent genome projects. Some of these novel genes were proposed to have evolved de novo, i.e., out of noncoding sequences, whereas some have been shown to follow a duplication and divergence process. Their discovery called for an extension of the historical hypotheses about gene origination. Besides the theoretical breakthrough, increasing evidence accumulated that novel genes play important roles in evolutionary processes, including adaptation and speciation events. Different techniques are available to identify genes and classify them as novel. Their classification as novel is usually based on their similarity to known genes, or lack thereof, detected by comparative genomics or against databases. Computational approaches are further prime methods that can be based on existing models or leverage biological evidence from experiments. Identification of novel genes remains, however, a challenging task. With constant software and technology updates, no gold standard, and no available benchmark, evaluation and characterization of genomic novelty is a vibrant field. In this review, the classical and state-of-the-art tools for gene prediction are introduced. The current methods for novel gene detection are presented; the methodological strategies and their limits are discussed along with prospective approaches for further studies

Directory of Open Access Journals

PubMed Central

Münstersches Informations und Archivsystem für Multimediale Inhalte

Apprentissage auto-supervisé appliqué à la détection d'erreurs de labélisation [Self-supervised learning applied to the detection of labeling errors]

Author: Bitard-Feildel Tristan
Chérubin Amélie
Lamboley Corentin
Publication venue: HAL CCSD
Publication date: 16/11/2022

Confidence in data is critical to learn safe and reliable AI models. In this paper, we explore the ability of an embedding space learned through contrastive training to capture outliers and labeling errors. Ideally, an embedding space learned with a contrastive approach should enforce proximity of similar data points in the embedding space. This property of the embedding should facilitate the application of anomaly detection methods to detect erroneous data points. The study focuses on the CIFAR-10 dataset. Once the embedding was learned, we evaluated the ability of the anomaly detection methods to correctly identify erroneous data points using controlled noise introduced through label flipping. We tested several anomaly detection methods with different hyperparameters. The separation of the data per class is easily observed in the embedding space and class outliers can be identified, highlighting the presence of data points outside the domain distribution of each class. The embedding space is also resilient to noisy data, and the tested anomaly detection methods can capture data points not corresponding to the classes used to learn the embedding. This preliminary work shows the potential of embedding spaces learned with unsupervised methods to capture outliers and preprocess data
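Controlled label noise of the kind described above can be generated with a short routine. The sketch assumes uniform flipping: a fixed fraction of samples is picked at random and each picked label is replaced by a different class chosen uniformly. The function name, the seed handling and the uniform scheme are assumptions, since the abstract does not detail the protocol.

```python
import random


def flip_labels(labels, num_classes, fraction, seed=0):
    """Return (noisy_labels, flipped_indices) with `fraction` of the labels changed.

    Every flipped label is guaranteed to differ from the original one, so the
    returned indices are the exact ground truth for evaluating a detector.
    """
    rng = random.Random(seed)  # local generator keeps the noise reproducible
    n_flip = round(fraction * len(labels))
    flipped = sorted(rng.sample(range(len(labels)), n_flip))
    noisy = list(labels)
    for i in flipped:
        other_classes = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(other_classes)
    return noisy, flipped
```

For CIFAR-10, `num_classes` would be 10; the returned indices can then be compared with the points an anomaly detector flags in the embedding space.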

Hal-Diderot