3,422 research outputs found

An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs

Author: A Sandelin
A Sharov
A Tomovic
Adrian J Shepherd
Armando Blanco
C Lawrence
D Denning
E Baker
E Szmidt
E Wingender
F Garcia
F Lam
F Lopez
F Offner
F Zare-Mirakabad
Fernando Garcia-Alcalde
G Chamilos
G Diop
G Hertz
J Hanley
J Hughes
J Sainz
J Van Helden
J Zhao
K Atanassov
K Won
L Liang
L Zadeh
M Bulyk
M Das
M Eisen
N Dror
N Kim
P Benos
P Bochud
P Schling
R Gordan
S De
T Bailey
T Fawcett
T Hehlgans
T Tamura
V Khatibi
W Hung
W Wasserman
X Chen
Y Haudry
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Background: Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). The identification of TFBSs is a crucial problem in computational biology and includes the subtask of predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. However, this remains a challenging task because sequences similar to those of known TFBSs can occur by chance with a relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for tackling problems that embody a high degree of uncertainty. Results: We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results when dealing with both synthetic and real data, outperforming two existing methods in both sensitivity and specificity in all the experiments we performed. Conclusions: The results show that SCintuit improves on the prediction quality of the existing approaches for TFs without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. In this study the reliability of IFS theory for motif discovery tasks is proven.
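For orientation, the kind of sequence-to-motif scoring that the abstract takes as its starting point can be sketched as a plain position-independent log-odds scan. This is not SCintuit and does not model dependencies between positions; the count matrix, the uniform background, the pseudocount, and the function names are all illustrative assumptions rather than anything taken from the paper.

```python
import math

# Hypothetical 4-position motif given as base counts per position
# (illustrative values, not data from the paper).
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 2, "C": 3, "G": 2, "T": 3},
]
BACKGROUND = 0.25   # assumed uniform background frequency for each base
PSEUDOCOUNT = 1.0   # assumed pseudocount, avoids log(0) for unseen bases


def log_odds_matrix(counts):
    """Convert per-position counts into log2 odds against the background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * PSEUDOCOUNT
        matrix.append({
            base: math.log2(((column[base] + PSEUDOCOUNT) / total) / BACKGROUND)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix):
    """Score every window of the sequence; returns (offset, score) pairs."""
    width = len(matrix)
    return [
        (i, sum(matrix[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]


def best_hit(sequence, matrix):
    """Return the (offset, score) pair with the highest score."""
    return max(scan(sequence, matrix), key=lambda hit: hit[1])
```

In this baseline every position contributes independently, so a weakly conserved position (such as the last column above) adds to the score on the same footing as a strongly conserved one. That is the behaviour the abstract says SCintuit is designed to correct.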

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery

Repositorio Institucional Universidad de Granada

Birkbeck Institutional Research Online

FISim: A new similarity measure between transcription factor binding sites based on the fuzzy integral

Author: A Sandelin
Armando Blanco
BJ Wilson
BP Gomez
Carlos Cano
DE Schones
Fernando Garcia
Francisco J Lopez
G Pavesi
HD Das MK
HJ Zimmerman
IG Choi
J Keller
J Torchia
JA Hanley
JD Hughes
KA Becker
L Kaufman
L Zadeh
M Dutertre
M Sugeno
M Tompa
P D'haeseleer
R Osada
S Gupta
S Mahony
S Pietrokovski
S Roepcke
SJ Van Laere
T Sørlie
T Wang
TL Bailey
UJ Pape
V Matys
XS Liu
Y Huang
Y Pan
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Regulatory motifs describe sets of related transcription factor binding sites (TFBSs) and can be represented as position frequency matrices (PFMs). De novo identification of TFBSs is a crucial problem in computational biology which includes the issue of comparing putative motifs with one another and with motifs that are already known. The relative importance of each nucleotide within a given position in the PFMs should be considered in order to compute PFM similarities. Furthermore, biological data are inherently noisy and imprecise. Fuzzy set theory is particularly suitable for modeling imprecise data, whereas fuzzy integrals are highly appropriate for representing the interaction among different information sources. Results: We propose FISim, a new similarity measure between PFMs, based on the fuzzy integral of the distance of the nucleotides with respect to the information content of the positions. Unlike existing methods, FISim is designed to consider the higher contribution of better conserved positions to the binding affinity. FISim provides excellent results when dealing with sets of randomly generated motifs, and outperforms the remaining methods when handling real datasets of related motifs. Furthermore, we propose a new cluster methodology based on kernel theory together with FISim to obtain groups of related motifs potentially bound by the same TFs, providing more robust results than existing approaches. Conclusion: FISim corrects a design flaw of the most popular methods, whose measures favour similarity of low information content positions. We use our measure to successfully identify motifs that describe binding sites for the same TF and to solve real-life problems. In this study the reliability of fuzzy technology for motif comparison tasks is proven. This work has been carried out as part of projects P08-TIC-4299 of J. A., Sevilla and TIN2006-13177 of DGICT, Madrid.
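The "information content of the positions" mentioned in the abstract is commonly computed per PFM column as 2 bits minus the Shannon entropy of the column. The sketch below shows only that standard quantity, without any small-sample correction. It is not the FISim measure itself, and the function name and example columns are assumptions made for illustration.

```python
import math


def information_content(column):
    """Information content (in bits) of one DNA PFM column.

    `column` maps each base to its count. The result is 2 + sum(p * log2(p)),
    which ranges from 0 (uniform column) to 2 (fully conserved column).
    """
    total = sum(column.values())
    ic = 2.0
    for count in column.values():
        if count > 0:  # a zero count contributes nothing to the entropy
            p = count / total
            ic += p * math.log2(p)
    return ic


# Hypothetical columns: one fully conserved, one uninformative.
conserved = {"A": 10, "C": 0, "G": 0, "T": 0}
uniform = {"A": 5, "C": 5, "G": 5, "T": 5}
```

A fully conserved column such as `conserved` carries the maximum of 2 bits, while `uniform` carries none. A similarity measure that weights positions by this value therefore gives well-conserved positions more influence, which is the property the abstract attributes to FISim.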

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Repositorio Institucional Universidad de Granada

Defining the Plasticity of Transcription Factor Binding Sites by Deconstructing DNA Consensus Sequences: The PhoP-Binding Sites among Gamma/Enterobacteria

Author: A Aguirre
A Hochschild
A Kato
A Manson McGuire
A Martinez-Antonio
AG Blanco
AH Ko
AL Halpern
AM Moses
AP Gasch
B Anand
B Everitt
C Mouslim
D Greene
D Knuth
D Shin
DF Browning
E Alm
E Bauer
E Benitez-Bellon
EA Groisman
Eduardo A. Groisman
F Depardieu
F Herrera
GD Stormo
GJ Klir
GK Smyth
GZ Hertz
H Li
H O'Geen
H Ochman
H Salgado
Henry Huang
HR Berenji
I Holmes
I Zwir
Igor Zwir
J Gertz
JA Hering
JC Bezdek
JC Perez
JD Hughes
JT Wade
K Deb
K Hollands
L McCue
L Ni
M Sugeno
M Thomas-Chollier
M Tompa
MB Eisen
MD Snavely
N Rajewsky
O Cordon
Oscar Harari
P Hong
P Monsieurs
QX Liu
R Janky
R Kohavi
R Krishnapuram
R Nadon
S Lejona
S Mahony
S Minagawa
S Roy
S Tavazoie
SL Pond
Sun-Yang Park
T-P Hong
TL Bailey
TM Mitchell
Wyeth W. Wasserman
Y Barash
Y Benjamini
Y Setty
Publication venue: Public Library of Science
Publication date: 01/01/2010

Transcriptional regulators recognize specific DNA sequences. Because these sequences are embedded in the background of genomic DNA, it is hard to identify the key cis-regulatory elements that determine disparate patterns of gene expression. The detection of the intra- and inter-species differences among these sequences is crucial for understanding the molecular basis of both differential gene expression and evolution. Here, we address this problem by investigating the target promoters controlled by the DNA-binding PhoP protein, which governs virulence and Mg2+ homeostasis in several bacterial species. PhoP is particularly interesting; it is highly conserved in different gamma/enterobacteria, regulating not only ancestral genes but also governing the expression of dozens of horizontally acquired genes that differ from species to species. Our approach consists of decomposing the DNA binding site sequences for a given regulator into families of motifs (termed submotifs) using a machine learning method inspired by the "Divide & Conquer" strategy. Partitioning a motif into sub-patterns produced computational advantages for classification, resulting in the discovery of new members of a regulon and alleviating the problem of distinguishing functional sites in chromatin immunoprecipitation and DNA microarray genome-wide analysis. Moreover, we found that certain partitions were useful in revealing biological properties of binding site sequences, including modular gains and losses of PhoP binding sites through evolutionary turnover events, as well as conservation in distant species. The high conservation of PhoP submotifs within gamma/enterobacteria, as well as of the regulatory protein that recognizes them, suggests that the major cause of divergence between related species is not the binding sites, as was previously suggested for other regulators. Instead, the divergence may be attributed to the fast evolution of orthologous target genes and/or the promoter architectures resulting from the interaction of those binding sites with the RNA polymerase.

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

Repositorio Institucional Universidad de Granada

Digital Commons@Becker

Probabilistic $K$-mean with local alignment for clustering and motif discovery in functional data

Author: Chiaromonte Francesca
Cremona Marzia A.
Publication date: 07/07/2020

We develop a new method to locally cluster curves and discover functional motifs, i.e., typical "shapes" that may recur several times along and across the curves, capturing important local characteristics. In order to identify these shared curve portions, our method leverages ideas from functional data analysis (joint clustering and alignment of curves), bioinformatics (local alignment through the extension of high similarity seeds) and fuzzy clustering (curves belonging to more than one cluster, if they contain more than one typical "shape"). It can employ various dissimilarity measures and incorporate derivatives in the discovery process, thus exploiting complex facets of shapes. We demonstrate the performance of our method with an extensive simulation study, and show how it generalizes other clustering methods for functional data. Finally, we provide real data applications to Berkeley growth data, Italian Covid-19 death curves and "Omics" data related to mutagenesis. Comment: 22 pages, 6 figures. This work has been presented at various conference

arXiv.org e-Print Archive

Discovery and Extraction of Protein Sequence Motif Information that Transcends Protein Family Boundaries

Author: Chen Bernard
Publication venue: ScholarWorks @ Georgia State University
Publication date: 17/07/2009

Protein sequence motifs are gathering more and more attention in the field of sequence analysis. The recurring patterns have the potential to determine the conformation, function and activities of the proteins. In our work, we obtained protein sequence motifs which are universally conserved across protein family boundaries. Therefore, unlike most popular motif discovery algorithms, our input dataset is extremely large. As a result, an efficient technique is essential. We use two granular computing models, Fuzzy Improved K-means (FIK) and Fuzzy Greedy K-means (FGK), in order to efficiently generate protein motif information. After that, we develop an efficient Super Granular SVM Feature Elimination model to further extract the motif information. During the motif search process, setting up a fixed window size in advance may simplify the computational complexity and increase the efficiency. However, due to the fixed size, our model may deliver a number of similar motifs simply shifted by some bases or including mismatches. We develop a new strategy named Positional Association Super-Rule to confront the problem of motifs generated from a fixed window size. It combines the super-rule analysis with a novel Positional Association Rule algorithm. We use the super-rule concept to construct a Super-Rule-Tree (SRT) by a modified HHK clustering, which requires no parameter setup to identify the similarities and dissimilarities between the motifs. The positional association rule is created and applied to search for similar motifs that are shifted by some residues. By analyzing the motif results generated by our approaches, we find that these motifs are significant not only in terms of sequence, but also in secondary structure similarity and biochemical properties.

ScholarWorks @ Georgia State University

Reverse Engineering Gene Regulatory Networks by Integrating Multi-Source Biological Data

Author: Habtom W. Ressom
Jean-Pierre A. Kocher
Yuji Zhang
Publication venue: IntechOpen
Publication date: 07/03/2012

IntechOpen

On the use of algorithms to discover motifs in DNA sequences

Author: Martínez Ballesteros María del Mar
Martínez Álvarez Francisco
Riquelme Santos José Cristóbal
Rubio Escudero Cristina
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

Many approaches are currently devoted to finding DNA motifs in nucleotide sequences. However, this task remains challenging for specialists because gene regulatory mechanisms are difficult to understand in depth, especially when analyzing binding sites in DNA. These sites, or specific nucleotide sequences, are known to be responsible for transcription processes. Thus, this work aims at providing an updated overview of strategies developed to discover meaningful motifs in DNA-related sequences and, in particular, their attempts to find relevant binding sites. Among existing approaches, this work focuses on dictionary, ensemble, and artificial intelligence-based algorithms since they represent the classical and the leading ones, respectively. Ministerio de Ciencia y Tecnología TIN2007-68084-C-00; Junta de Andalucia P07-TIC-02611

idUS. Depósito de Investigación Universidad de Sevilla