42 research outputs found
Impact of residue accessible surface area on the prediction of protein secondary structures
<p>Abstract</p> <p>Background</p> <p>Accurate prediction of protein secondary structure remains one of the challenging problems in bioinformatics. It has previously been suggested that amino acid relative solvent accessibility (RSA) might be an effective factor for increasing the accuracy of protein secondary structure prediction. Previous studies have either used a single constant threshold to classify residues into discrete classes (buried vs. exposed), or used the real-valued predicted RSAs in their prediction method.</p> <p>Results</p> <p>We studied the effect of applying different RSA threshold types (namely, fixed thresholds vs. residue-dependent thresholds) on a variety of secondary structure prediction methods. Using DSSP-assigned RSA values, we found that the improvement in prediction accuracy depends strictly on the selected threshold(s). Furthermore, we showed that choosing a single threshold for all amino acids is not the best possible choice. We therefore used residue-dependent thresholds, and most residues showed improvement in prediction. Next, we considered predicted RSA values, since in the real-world problem the protein sequence is the only available information. We first predicted the RSA classes with the RVP-net program and then used these data in our method. Using this approach, improvement in prediction was also obtained.</p> <p>Conclusion</p> <p>The success of applying RSA information to different secondary structure prediction methods suggests that prediction accuracy can be improved independently of the prediction approach. Thus, solvent accessibility can be considered a rich source of information for improving these methods.</p>
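The difference between a fixed and a residue-dependent RSA threshold can be illustrated with a minimal sketch. The threshold values, the function names and the `B`/`E` labels below are assumptions made for illustration; they are not taken from the study.

```python
# Illustrative only: all threshold values here are hypothetical, not the study's.
FIXED_THRESHOLD = 0.25
RESIDUE_THRESHOLDS = {"A": 0.20, "G": 0.30, "K": 0.45, "L": 0.10}


def rsa_class(residue, rsa, thresholds=None, default=FIXED_THRESHOLD):
    """Return 'E' (exposed) or 'B' (buried) for a single residue.

    With thresholds=None the same cutoff is used for every amino acid;
    otherwise the cutoff is looked up per residue type.
    """
    cutoff = default if thresholds is None else thresholds.get(residue, default)
    return "E" if rsa >= cutoff else "B"


def classify(sequence, rsa_values, thresholds=None):
    """Map a sequence and its per-residue RSA values to a string of B/E classes."""
    if len(sequence) != len(rsa_values):
        raise ValueError("sequence and rsa_values must have the same length")
    return "".join(
        rsa_class(res, rsa, thresholds) for res, rsa in zip(sequence, rsa_values)
    )


seq = "AKLG"
rsa = [0.22, 0.40, 0.15, 0.28]
fixed_classes = classify(seq, rsa)                          # one cutoff for all
dependent_classes = classify(seq, rsa, RESIDUE_THRESHOLDS)  # per-residue cutoffs
```

With these made-up numbers every residue changes class when the per-residue cutoffs are used, which is the kind of threshold sensitivity the abstract describes.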
Impact of RNA structure on the prediction of donor and acceptor splice sites
BACKGROUND: Gene identification in genomic DNA sequences by computational methods has become an important task in bioinformatics, and computational gene prediction tools are now essential components of every genome sequencing project. Prediction of splice sites is a key step of all gene structure prediction algorithms. RESULTS: We investigated the role of mRNA secondary structures and their information content for five vertebrate and plant splice site datasets. We selected 900-nucleotide sequences centered at each (real or decoy) donor and acceptor site, and predicted their corresponding RNA structures with the Vienna software. Then, based on whether each nucleotide is in a stem or not, the conventional four-letter nucleotide alphabet was translated into an eight-letter alphabet. Zero-, first- and second-order Markov models were selected as the signal detection methods. Applying the eight-letter alphabet instead of the four-letter alphabet considerably increases the accuracy of both donor and acceptor site predictions in the case of higher-order Markov models. CONCLUSION: Our results imply that RNA structure contains important information and that future gene prediction programs can take advantage of it.
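The alphabet translation and a first-order Markov model can be sketched as follows. This is a minimal illustration under stated assumptions: the structure is given in dot-bracket notation, stem positions are written in lowercase and all other positions in uppercase (an encoding chosen here, not necessarily the one used in the study), and the function names and pseudocount are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGUacgu"  # uppercase: not in a stem, lowercase: in a stem (assumed encoding)


def to_eight_letter(seq, dot_bracket):
    """Combine a nucleotide sequence with its dot-bracket structure."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    return "".join(
        base.lower() if symbol in "()" else base.upper()
        for base, symbol in zip(seq, dot_bracket)
    )


def train_first_order(sequences, alphabet=ALPHABET, pseudocount=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in sequences:
        for prev, cur in zip(s, s[1:]):
            counts[prev][cur] += 1.0
    model = {}
    for prev in alphabet:
        total = sum(counts[prev].values()) + pseudocount * len(alphabet)
        model[prev] = {
            cur: (counts[prev][cur] + pseudocount) / total for cur in alphabet
        }
    return model


def log_likelihood(model, s):
    """Log-probability of the transitions in s under the model."""
    return sum(math.log(model[prev][cur]) for prev, cur in zip(s, s[1:]))


encoded = to_eight_letter("GGGAAACCC", "(((...)))")
model = train_first_order([encoded])
score = log_likelihood(model, encoded)
```

A site could then be scored by the difference between `log_likelihood` under a model trained on real sites and one trained on decoys; that scoring rule is a common choice and is assumed here, not quoted from the abstract.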
FFCA: a feasibility-based method for flux coupling analysis of metabolic networks
<p>Abstract</p> <p>Background</p> <p>Flux coupling analysis (FCA) is a useful method for finding dependencies between fluxes of a metabolic network at steady-state. FCA classifies reactions into subsets (called coupled reaction sets) in which activity of one reaction implies activity of another reaction. Several approaches for FCA have been proposed in the literature.</p> <p>Results</p> <p>We introduce a new FCA algorithm, FFCA (Feasibility-based Flux Coupling Analysis), which is based on checking the feasibility of a system of linear inequalities. We show on a set of benchmarks that for genome-scale networks FFCA is faster than other existing FCA methods.</p> <p>Conclusions</p> <p>We present FFCA as a new method for flux coupling analysis and prove it to be faster than existing approaches. A corresponding software tool is freely available for non-commercial use at <url>http://www.bioinformatics.org/ffca/</url>.</p>
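For orientation, the coupling relations that FCA computes can be written down using the standard definitions from the flux coupling literature. The symbols $S$ (stoichiometric matrix), $\mathit{Irr}$ (set of irreversible reactions) and $C$ (steady-state flux cone) are notation introduced here; the abstract itself does not spell out the systems that FFCA solves.

```latex
\begin{align*}
C &= \{\, v \in \mathbb{R}^n : S v = 0,\ v_k \ge 0 \ \text{for all } k \in \mathit{Irr} \,\} \\
i \to j &\iff \bigl( v_i \neq 0 \Rightarrow v_j \neq 0 \bigr) \ \text{for all } v \in C
  && \text{(directionally coupled)} \\
i \leftrightarrow j &\iff i \to j \ \text{and}\ j \to i
  && \text{(partially coupled)} \\
i \Leftrightarrow j &\iff \exists\, \lambda \neq 0 :\ v_j = \lambda v_i \ \text{for all } v \in C
  && \text{(fully coupled)}
\end{align*}
```

Under these definitions, the relation $i \to j$ fails exactly when the linear system $Sv = 0$, $v_k \ge 0$ for $k \in \mathit{Irr}$, $v_i \neq 0$, $v_j = 0$ has a solution. For an irreversible reaction $i$ the condition $v_i \neq 0$ can be normalized to $v_i = 1$, because $C$ is a cone. This is one way a feasibility test can decide coupling, and it should be read as a sketch, not as the exact FFCA formulation.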
A Systems-Based Approach for Cyanide Overproduction by Bacillus megaterium for Gold Bioleaching Enhancement
With the constant accumulation of electronic waste, extracting the precious metals it contains is becoming a major challenge for sustainable development. Bacillus megaterium is currently one of the microbes used for the production of cyanide, which is the main leaching agent for gold recovery. The present study aimed to propose a strategy for metabolic engineering of B. megaterium to overproduce cyanide, and thus improve the bioleaching process. For this, we employed constraint-based modeling, running in silico simulations on iJA1121, the genome-scale metabolic model of B. megaterium DSM319. Flux balance analysis (FBA) was initially used to identify amino acids to be added to the culture medium. Considering cyanide as the desired product, we used growth-coupled methods, constrained minimal cut sets (cMCSs) and OptKnock, to identify gene inactivation targets. To identify gene overexpression targets, flux scanning based on enforced objective flux (FSEOF) was performed. Further analysis was carried out on the identified targets to determine compounds with beneficial regulatory effects. We proposed a chemically defined medium for accelerating cyanide production, based on microplate assays that evaluated which components had the greatest positive effect. Accordingly, the cultivation of B. megaterium DSM319 in a chemically defined medium with 5.56 mM glucose as the carbon source, supplemented with 413 μM cysteine, led to the production of considerably increased amounts of cyanide. Bioleaching experiments were successfully performed in this medium to recover gold and copper from telecommunication printed circuit boards. The results of inductively coupled plasma (ICP) analysis confirmed that gold recovery peaked at around 55% after 4 days, whereas copper recovery continued to increase for several more days, peaking at around 85%. To further validate the bioleaching results, FESEM, XRD, FTIR, and EDAX mapping analyses were performed.
We concluded that the proposed strategy represents a viable route for improving the performance of bioleaching processes.
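As background for the constraint-based simulations mentioned above, flux balance analysis is usually stated as the following linear program. The symbols are generic notation introduced here ($S$ stoichiometric matrix, $v$ flux vector, $l$ and $u$ lower and upper flux bounds, $c$ objective weights); the specific objective and bounds used with iJA1121 are not given in the abstract.

```latex
\begin{align*}
\max_{v \in \mathbb{R}^n} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& l \le v \le u .
\end{align*}
```

Supplementing the medium with a nutrient corresponds to relaxing the bound on the matching uptake reaction, which is presumably how candidate amino acids can be compared in silico.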
A tale of two symmetrical tails: Structural and functional characteristics of palindromes in proteins
<p>Abstract</p> <p>Background</p> <p>It has been previously shown that palindromic sequences are frequently observed in proteins. However, our knowledge about their evolutionary origin and their possible importance is incomplete.</p> <p>Results</p> <p>In this work, we tried to revisit this relatively neglected phenomenon. Several questions are addressed in this work. (1) It is known that there is a large chance of finding a palindrome in low complexity sequences (i.e. sequences with extreme amino acid usage bias). What is the role of sequence complexity in the evolution of palindromic sequences in proteins? (2) Do palindromes coincide with conserved protein sequences? If yes, what are the functions of these conserved segments? (3) In case of conserved palindromes, is it always the case that the whole conserved pattern is also symmetrical? (4) Do palindromic protein sequences form regular secondary structures? (5) Does sequence similarity of the two "sides" of a palindrome imply structural similarity? For the first question, we showed that the complexity of palindromic peptides is significantly lower than randomly generated palindromes. Therefore, one can say that palindromes occur frequently in low complexity protein segments, without necessarily having a defined function or forming a special structure. Nevertheless, this does not rule out the possibility of finding palindromes which play some roles in protein structure and function. In fact, we found several palindromes that overlap with conserved protein Blocks of different functions. However, in many cases we failed to find any symmetry in the conserved regions of corresponding Blocks. Furthermore, to answer the last two questions, the structural characteristics of palindromes were studied. It is shown that palindromes may have a great propensity to form α-helical structures. 
Finally, we demonstrated that the two sides of a palindrome generally do not show significant structural similarities.</p> <p>Conclusion</p> <p>We suggest that the puzzling abundance of palindromic sequences in proteins is mainly due to their frequent co-occurrence with low-complexity protein regions, rather than to a global role in protein function. In addition, palindromic sequences show a relatively high tendency to form helices, which might play an important role in the evolution of proteins that contain palindromes. Moreover, reverse similarity in peptides does not necessarily imply significant structural similarity. This observation argues against the importance of palindromes for forming symmetrical structures. Although palindromes frequently overlap with conserved Blocks, we suggest that they do so only by coincidence, rather than being involved with a certain structural fold or protein domain.</p>
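The two quantities discussed in this abstract, palindromic segments and sequence complexity, can be illustrated with a short sketch. The minimum length of 5 and the use of Shannon entropy as the complexity measure are assumptions for illustration; the abstract does not state which length cutoff or complexity measure the study used.

```python
import math
from collections import Counter


def find_palindromes(seq, min_len=5):
    """Return (start, end, peptide) for each maximal palindrome, end exclusive.

    Every position and every gap between positions is tried as a center,
    and the match is extended outward for as long as the residues mirror.
    """
    hits = []
    n = len(seq)
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        while left >= 0 and right < n and seq[left] == seq[right]:
            left -= 1
            right += 1
        start, end = left + 1, right
        if end - start >= min_len:
            hits.append((start, end, seq[start:end]))
    return hits


def shannon_entropy(peptide):
    """Shannon entropy (bits) of the residue composition; lower means less complex."""
    counts = Counter(peptide)
    n = len(peptide)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


hits = find_palindromes("MKAVLVAKT")
```

Comparing `shannon_entropy` of observed palindromic peptides against that of randomly generated palindromes of the same length would mirror the comparison described in the Results.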
How accessibility influences citation counts: The case of citations to the full text articles available from ResearchGate
It is generally believed that the number of citations to an article can be positively correlated with its free online availability. In the present study, we investigated the possible impact of academic social networks on the number of citations. We chose the social web service "ResearchGate" as a case. This website acts both as a social network to connect researchers and, at the same time, as an open access repository to publish post-print versions of accepted manuscripts and final versions of open access articles. We collected data on 1823 articles published by authors from four different universities. By analyzing these data, we showed that although different levels of full text availability are observed for the four universities, there is always a significant positive correlation between full text availability and citation count. Moreover, we showed that both the post-print version and the publisher's version (i.e., the final published version) of archived manuscripts receive more citations than non-OA articles, and that the difference in citation counts between post-print manuscripts and publisher's version articles is nonsignificant.
Constraintbasierte Analyse von Unterstrukturen metabolischer Netzwerke (Constraint-based analysis of substructures of metabolic networks)
Constraint-based methods (CBMs) are promising tools for the analysis of
metabolic networks, as they do not require detailed knowledge of the
biochemical reactions. Some of these methods only need information about the
stoichiometric coefficients of the reactions and their reversibility types,
i.e., constraints for steady-state conditions. Nevertheless, CBMs have their
own limitations. For example, these methods may be sensitive to missing
information in the models. Additionally, they may be slow for the analysis of
genome-scale metabolic models. As a result, some studies prefer to consider
substructures of networks, instead of complete models. Some other studies have
focused on better implementations of the CBMs. In Chapter 2, the sensitivity
of flux coupling analysis (FCA) to missing reactions is studied. Genome-scale
metabolic reconstructions are comprehensive, yet incomplete, models of real-
world metabolic networks. While FCA has proved an appropriate method for
analyzing metabolic relationships and for detecting functionally related
reactions in such models, little is known about the impact of missing
reactions on the accuracy of FCA. Note that having missing reactions is
equivalent to deleting reactions, or to deleting columns from the
stoichiometric matrix. Based on an alternative characterization of flux
coupling relations using elementary flux modes, we study the changes that flux
coupling relations may undergo due to missing reactions. In particular, we
show that two uncoupled reactions in a metabolic network may be detected as
directionally, partially or fully coupled in an incomplete version of the same
network. Even a single missing reaction can cause significant changes in flux
coupling relations. In the case of two consecutive E. coli genome-scale networks,
many fully-coupled reaction pairs in the incomplete network become
directionally coupled or even uncoupled in the more complete reconstruction.
In this context, we found that gene expression correlation values were
significantly higher for the pairs that remained fully coupled than for the
uncoupled or directionally coupled pairs. Our study clearly suggests that FCA
results are indeed sensitive to missing reactions. Since the currently
available genome-scale metabolic models are incomplete, we advise using FCA
results with care. In Chapter 3, a different but related problem is
considered. Due to the large size of genome-scale metabolic networks, some
studies suggest analyzing subsystems instead of the original genome-scale
models. Note that analyzing a subsystem is equivalent to deleting some
rows from the stoichiometric matrix, or equivalently, to treating some internal
metabolites as external. We show mathematically that analyzing a
subsystem instead of the original model can change the flux coupling
relations in specific ways. In particular, a pair of (fully, partially or
directionally) coupled reactions may be detected as uncoupled in the chosen
subsystem. Interestingly, this behavior is the opposite of the flux coupling
changes that may happen due to the existence of missing reactions, or
equivalently, deletion of reactions. We also show that analysis of organelle
subsystems has relatively little influence on the results of FCA, and
therefore, many of these subsystems may be studied independent of the rest of
the network. In Chapter 4, we introduce a rapid FCA method, which is
appropriate for genome-scale networks. Previously, several approaches for FCA
have been proposed in the literature, namely flux coupling finder algorithm,
FCA based on minimal metabolic behaviors, and FCA based on elementary flux
patterns. To the best of our knowledge, none of these methods is available as
free software. Here, we introduce a new FCA algorithm, FFCA
(Feasibility-based Flux Coupling Analysis). This method is based on checking
the feasibility of a system of linear inequalities. We show on a set of
benchmarks that for genome-scale networks FFCA is faster than other existing
FCA methods. Using FFCA, flux coupling analysis of genome-scale networks of S.
cerevisiae and E. coli can be performed in a few hours on a normal PC. A
corresponding software tool is freely available for non-commercial use. In
Chapter 5, we introduce a new concept which can be useful in the analysis of
fluxes in network substructures. Analysis of elementary modes (EMs) has proven
to be a powerful CBM in the study of metabolic networks. However, enumeration
of EMs is a hard computational task. Additionally, due to their large numbers,
one cannot simply use them as an input for subsequent analyses. One
possibility is to restrict the analysis to a subset of interesting reactions,
rather than the whole network. However, analysis of an isolated subnetwork can
result in finding incorrect EMs, i.e. the ones which are not part of any
steady-state flux distribution in the original network. The ideal set of
vectors to describe the usage of reactions in a subnetwork would be the set of
all EMs projected onto the subset of interesting reactions. Recently, the
concept of "elementary flux patterns" (EFPs) has been proposed. Each EFP is
a subset of the support (i.e. non-zero elements) of at least one EM. In the
present work, we introduce the concept of ProCEMs (Projected Cone Elementary
Modes). The ProCEM set can be computed by projecting the flux cone onto the
lower-dimensional subspace and enumerating the extreme rays of the projected
cone. In contrast to EFPs, ProCEMs are not merely a set of reactions, but from
the mathematical point of view they are projected EMs. We additionally prove
that the set of EFPs is included in the set of ProCEM supports. Finally,
ProCEMs and EFPs are compared in the study of substructures in biological
networks.
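The effect of a missing reaction on flux coupling (Chapter 2) can be illustrated on a toy example that works directly with elementary mode supports. This is a minimal sketch under stated assumptions: all reactions are taken to be irreversible, so that reaction i implies reaction j exactly when every elementary mode using i also uses j; the reaction names and the three modes are invented; and supports alone cannot separate partial from full coupling, so the two are reported together.

```python
def coupling(ems, i, j):
    """Classify the relation between reactions i and j from EM supports.

    ems is a list of sets of reaction names, one set per elementary mode.
    Assumes every reaction is irreversible.
    """
    with_i = [em for em in ems if i in em]
    with_j = [em for em in ems if j in em]
    if not with_i or not with_j:
        return "blocked"
    i_implies_j = all(j in em for em in with_i)
    j_implies_i = all(i in em for em in with_j)
    if i_implies_j and j_implies_i:
        return "partially or fully coupled"
    if i_implies_j:
        return f"{i} -> {j}"
    if j_implies_i:
        return f"{j} -> {i}"
    return "uncoupled"


def delete_reaction(ems, reaction):
    """EMs of the network without `reaction`: the original EMs that do not use it."""
    return [em for em in ems if reaction not in em]


# Hypothetical toy network with three elementary modes.
full_network = [frozenset({"A", "B"}), frozenset({"A", "X"}), frozenset({"B", "X"})]
incomplete_network = delete_reaction(full_network, "X")

before = coupling(full_network, "A", "B")
after = coupling(incomplete_network, "A", "B")
```

In this toy case the pair A, B is uncoupled in the full network but appears coupled once reaction X is missing, which matches the direction of change described for incomplete reconstructions.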
Addition of contact number information can improve protein secondary structure prediction by neural networks
Prediction of protein secondary structures is one of the oldest problems in bioinformatics. Although several different methods have been proposed to tackle this problem, none of them is perfect. Recently, it has been proposed that adding other structural information, such as the accessible surface area of residues or prior information about protein structural class, can significantly improve the prediction of secondary structures. In this work, we propose that contact number information can be considered another useful source of information for improving secondary structure prediction. Since the contact number, i.e. the number of other amino acid residues in the structural neighbourhood of a certain residue, depends on the secondary structure of the residue, we conjectured that contact number data can improve secondary structure prediction. We used two closely related neural networks to predict secondary structures. The only difference between the networks was that one of them was also provided with residue contact numbers as an additional input. The results suggest that adding contact number information yields a small but significant improvement in the prediction of secondary structures in proteins, and that residue contact numbers can be used as a rich source of information for improving protein secondary structure prediction.
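The contact number defined above can be computed from atomic coordinates with a few lines of code. This is a minimal sketch, assuming that each residue is represented by one point (for example its C-alpha atom) and that "structural neighbourhood" means a sphere of fixed radius; the 10 Å cutoff and the function name are illustrative assumptions, not values from the study.

```python
import math


def contact_numbers(coords, cutoff=10.0):
    """For each residue, count the other residues within `cutoff` angstroms.

    coords is a list of (x, y, z) tuples, one representative point per residue.
    """
    numbers = []
    for i, a in enumerate(coords):
        count = sum(
            1
            for j, b in enumerate(coords)
            if j != i and math.dist(a, b) <= cutoff
        )
        numbers.append(count)
    return numbers


# Made-up coordinates: three residues close together and one far away.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_contacts = contact_numbers(toy_coords)
```

Values like these could be appended to the per-residue input vector of the second network, which is the only difference between the two networks described in the abstract.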