42 research outputs found

    Impact of residue accessible surface area on the prediction of protein secondary structures

    Background: The accurate prediction of protein secondary structure remains one of the challenging problems in bioinformatics. It has been previously suggested that amino acid relative solvent accessibility (RSA) might be an effective factor for increasing the accuracy of protein secondary structure prediction. Previous studies have either used a single constant threshold to classify residues into discrete classes (buried vs. exposed), or used the real-valued predicted RSAs in their prediction method. Results: We studied the effect of applying different RSA threshold types (namely, fixed thresholds vs. residue-dependent thresholds) on a variety of secondary structure prediction methods. Using DSSP-assigned RSA values, we found that the improvement in prediction accuracy depends strictly on the selected threshold(s). Furthermore, we showed that choosing a single threshold for all amino acids is not the best possible choice. We therefore used residue-dependent thresholds, and most residues showed an improvement in prediction. Next, we considered predicted RSA values, since in the real-world problem the protein sequence is the only available information. We first predicted the RSA classes with the RVP-net program and then used these data in our method. This approach also improved prediction. Conclusion: The success of applying RSA information to different secondary structure prediction methods suggests that prediction accuracy can be improved independently of the prediction approach. Thus, solvent accessibility can be considered a rich source of information for improving these methods.
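The difference between a fixed threshold and residue-dependent thresholds can be sketched as follows. This is a minimal illustration, not the authors' method: the threshold values, the `E`/`B` labels, and all function names are hypothetical and chosen only for the example.

```python
# Hypothetical thresholds for illustration only; they are NOT taken from the paper.
FIXED_THRESHOLD = 0.25
RESIDUE_THRESHOLDS = {"A": 0.20, "K": 0.40, "L": 0.15}


def classify_fixed(rsa, threshold=FIXED_THRESHOLD):
    """Label a residue 'E' (exposed) or 'B' (buried) using one cutoff for all residues."""
    return "E" if rsa > threshold else "B"


def classify_residue_dependent(residue, rsa, thresholds=RESIDUE_THRESHOLDS,
                               default=FIXED_THRESHOLD):
    """Label a residue using a cutoff that depends on the amino acid type."""
    return "E" if rsa > thresholds.get(residue, default) else "B"


def label_sequence(sequence, rsa_values, residue_dependent=False):
    """Return a string of 'E'/'B' labels, one per residue."""
    if len(sequence) != len(rsa_values):
        raise ValueError("sequence and rsa_values must have the same length")
    if residue_dependent:
        return "".join(classify_residue_dependent(aa, rsa)
                       for aa, rsa in zip(sequence, rsa_values))
    return "".join(classify_fixed(rsa) for rsa in rsa_values)


if __name__ == "__main__":
    seq, rsa = "AKL", [0.22, 0.30, 0.18]
    print(label_sequence(seq, rsa))                          # fixed threshold
    print(label_sequence(seq, rsa, residue_dependent=True))  # per-residue thresholds
```

The same RSA values can yield different discrete classes under the two schemes, which is why the choice of threshold type can affect a downstream predictor.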

    Impact of RNA structure on the prediction of donor and acceptor splice sites

    BACKGROUND: Gene identification in genomic DNA sequences by computational methods has become an important task in bioinformatics, and computational gene prediction tools are now essential components of every genome sequencing project. Prediction of splice sites is a key step in all gene structure prediction algorithms. RESULTS: We investigated the role of mRNA secondary structures and their information content for five vertebrate and plant splice site datasets. We selected 900-nucleotide sequences centered at each (real or decoy) donor and acceptor site, and predicted their corresponding RNA structures with the Vienna software. Then, based on whether each nucleotide is in a stem or not, the conventional four-letter nucleotide alphabet was translated into an eight-letter alphabet. Zero-, first- and second-order Markov models were selected as the signal detection methods. We show that, compared to the four-letter alphabet, the eight-letter alphabet considerably increases the accuracy of both donor and acceptor site predictions for higher-order Markov models. CONCLUSION: Our results imply that RNA structure contains important information that future gene prediction programs can take advantage of.
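The alphabet translation described above can be sketched as follows, assuming the predicted structure is available in dot-bracket notation (the format produced by the Vienna RNA tools). Encoding stem nucleotides as uppercase and non-stem nucleotides as lowercase is an assumption made for this example, as are the function names; the abstract does not specify how the eight letters are represented. The counting function only collects the context-to-symbol counts from which a k-th order Markov model could be estimated.

```python
from collections import Counter


def to_eight_letter(sequence, dot_bracket):
    """Translate a 4-letter sequence into an 8-letter one using its structure.

    Uppercase = nucleotide paired in a stem, lowercase = unpaired.
    (This case-based encoding is an illustrative assumption.)
    """
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    out = []
    for nucleotide, symbol in zip(sequence.upper(), dot_bracket):
        out.append(nucleotide if symbol in "()" else nucleotide.lower())
    return "".join(out)


def markov_counts(sequences, order=1):
    """Count (context, next_symbol) pairs for a Markov model of the given order."""
    counts = Counter()
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[(seq[i - order:i], seq[i])] += 1
    return counts


if __name__ == "__main__":
    encoded = to_eight_letter("GCAUGC", "((..))")
    print(encoded)
    print(markov_counts([encoded], order=1))
```

Because the eight-letter alphabet doubles the number of symbols, the number of parameters of a k-th order model grows from 4^(k+1) to 8^(k+1), so more training data is needed for the same order.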

    FFCA: a feasibility-based method for flux coupling analysis of metabolic networks

    Background: Flux coupling analysis (FCA) is a useful method for finding dependencies between fluxes of a metabolic network at steady-state. FCA classifies reactions into subsets (called coupled reaction sets) in which activity of one reaction implies activity of another reaction. Several approaches for FCA have been proposed in the literature. Results: We introduce a new FCA algorithm, FFCA (Feasibility-based Flux Coupling Analysis), which is based on checking the feasibility of a system of linear inequalities. We show on a set of benchmarks that for genome-scale networks FFCA is faster than other existing FCA methods. Conclusions: We present FFCA as a new method for flux coupling analysis and show it to be faster than existing approaches. A corresponding software tool is freely available for non-commercial use at http://www.bioinformatics.org/ffca/.
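For orientation, the coupling relations that FCA computes can be written down using the definitions commonly used in the FCA literature. The symbols S (stoichiometric matrix), Irr (set of irreversible reactions) and C (flux cone) are notation introduced here, not taken from the abstract. The last line is only one way to phrase a single coupling test as a feasibility question, under the assumption that reaction i is irreversible; it is not claimed to be the exact system used by FFCA.

```latex
% Steady-state flux cone
C = \{\, v \in \mathbb{R}^n : S v = 0,\ v_k \ge 0 \ \text{for all } k \in \mathit{Irr} \,\}

% Directional coupling: activity of reaction i implies activity of reaction j
i \to j \iff \bigl( v_i \neq 0 \Rightarrow v_j \neq 0 \bigr) \ \text{for all } v \in C

% Partial coupling: directional coupling in both directions
i \leftrightarrow j \iff (i \to j) \ \text{and} \ (j \to i)

% Full coupling: the two fluxes are proportional with a fixed ratio
i \Leftrightarrow j \iff \exists\, \lambda \neq 0 : \ v_i = \lambda\, v_j \ \text{for all } v \in C

% Feasibility formulation (assuming i \in \mathit{Irr}):
i \to j \iff \{\, v : S v = 0,\ v_k \ge 0 \ (k \in \mathit{Irr}),\ v_j = 0,\ v_i = 1 \,\} = \emptyset
```

Since C is a cone, any flux vector with v_i > 0 can be rescaled so that v_i = 1, which is why fixing v_i = 1 loses no generality in the last line.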

    A Systems-Based Approach for Cyanide Overproduction by Bacillus megaterium for Gold Bioleaching Enhancement

    With the constant accumulation of electronic waste, extracting the precious metals contained therein is becoming a major challenge for sustainable development. Bacillus megaterium is currently one of the microbes used for the production of cyanide, which is the main leaching agent for gold recovery. The present study aimed to propose a strategy for metabolic engineering of B. megaterium to overproduce cyanide, and thus improve the bioleaching process. For this, we employed constraint-based modeling, running in silico simulations on iJA1121, the genome-scale metabolic model of B. megaterium DSM319. Flux balance analysis (FBA) was initially used to identify amino acids to be added to the culture medium. Considering cyanide as the desired product, we used growth-coupled methods, constrained minimal cut sets (cMCSs) and OptKnock to identify gene inactivation targets. To identify gene overexpression targets, flux scanning based on enforced objective flux (FSEOF) was performed. Further analysis was carried out on the identified targets to determine compounds with beneficial regulatory effects. We proposed a chemically defined medium for accelerating cyanide production, using microplate assays to identify the components with the greatest improving effects. Accordingly, the cultivation of B. megaterium DSM319 in a chemically defined medium with 5.56 mM glucose as the carbon source, supplemented with 413 μM cysteine, led to the production of considerably increased amounts of cyanide. Bioleaching experiments were successfully performed in this medium to recover gold and copper from telecommunication printed circuit boards. The results of inductively coupled plasma (ICP) analysis confirmed that gold recovery peaked at around 55% after 4 days, whereas copper recovery continued to increase for several more days, peaking at around 85%. To further validate the bioleaching results, FESEM, XRD, FTIR, and EDAX mapping analyses were performed. We concluded that the proposed strategy represents a viable route for improving the performance of bioleaching processes.
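For readers unfamiliar with flux balance analysis, the optimization problem it solves has the following standard form. The symbols are generic notation, not taken from the abstract: S is the stoichiometric matrix, v the flux vector, l and u the lower and upper flux bounds, and c the objective weights (for example, selecting the growth or cyanide-producing reaction).

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^n} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
                        & l \le v \le u .
\end{aligned}
```

In this setting, adding a nutrient such as an amino acid to the medium corresponds to relaxing the bound on its uptake reaction, and a gene inactivation corresponds to fixing the bounds of the affected reactions to zero.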

    A tale of two symmetrical tails: Structural and functional characteristics of palindromes in proteins

    Background: It has been previously shown that palindromic sequences are frequently observed in proteins. However, our knowledge about their evolutionary origin and their possible importance is incomplete. Results: In this work, we revisit this relatively neglected phenomenon and address several questions. (1) It is known that there is a large chance of finding a palindrome in low-complexity sequences (i.e. sequences with extreme amino acid usage bias). What is the role of sequence complexity in the evolution of palindromic sequences in proteins? (2) Do palindromes coincide with conserved protein sequences? If yes, what are the functions of these conserved segments? (3) In case of conserved palindromes, is it always the case that the whole conserved pattern is also symmetrical? (4) Do palindromic protein sequences form regular secondary structures? (5) Does sequence similarity of the two "sides" of a palindrome imply structural similarity? For the first question, we showed that the complexity of palindromic peptides is significantly lower than that of randomly generated palindromes. Therefore, one can say that palindromes occur frequently in low-complexity protein segments, without necessarily having a defined function or forming a special structure. Nevertheless, this does not rule out the possibility of finding palindromes which play some role in protein structure and function. In fact, we found several palindromes that overlap with conserved protein Blocks of different functions. However, in many cases we failed to find any symmetry in the conserved regions of the corresponding Blocks. Furthermore, to answer the last two questions, the structural characteristics of palindromes were studied. It is shown that palindromes may have a great propensity to form α-helical structures. Finally, we demonstrated that the two sides of a palindrome generally do not show significant structural similarities. Conclusion: We suggest that the puzzling abundance of palindromic sequences in proteins is mainly due to their frequent concurrence with low-complexity protein regions, rather than a global role in protein function. In addition, palindromic sequences show a relatively high tendency to form helices, which might play an important role in the evolution of proteins that contain palindromes. Moreover, reverse similarity in peptides does not necessarily imply significant structural similarity. This observation rules out the importance of palindromes for forming symmetrical structures. Although palindromes frequently overlap with conserved Blocks, we suggest that palindromes overlap with Blocks only by coincidence, rather than being involved with a certain structural fold or protein domain.
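Detecting palindromic segments and estimating their complexity can be sketched as follows. This is an illustration under stated assumptions: the minimum length of 5, the use of Shannon entropy as the complexity measure, and the function names are all choices made for the example; the abstract does not say which complexity measure or length cutoff the authors used.

```python
import math
from collections import Counter


def find_palindromes(sequence, min_length=5):
    """Return (start, substring) for each maximal palindrome of at least min_length.

    Uses expansion around every possible center (odd and even lengths).
    """
    hits = []
    n = len(sequence)
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        while left >= 0 and right < n and sequence[left] == sequence[right]:
            left -= 1
            right += 1
        start, end = left + 1, right  # half-open interval [start, end)
        if end - start >= min_length:
            hits.append((start, sequence[start:end]))
    return hits


def shannon_entropy(peptide):
    """Shannon entropy (bits per residue) of the amino acid composition."""
    counts = Counter(peptide)
    total = len(peptide)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    for start, pal in find_palindromes("MKAYAKL"):
        print(start, pal, round(shannon_entropy(pal), 3))
```

A low entropy value indicates a biased composition, which is the sense in which a palindrome such as a run of a single residue is low-complexity.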

    How accessibility influences citation counts: The case of citations to the full text articles available from ResearchGate

    It is generally believed that the number of citations to an article is positively correlated with its free online availability. In the present study, we investigated the possible impact of academic social networks on the number of citations. We chose the social web service “ResearchGate” as a case. This website acts both as a social network that connects researchers and as an open access repository for publishing post-print versions of accepted manuscripts and final versions of open access articles. We collected data on 1823 articles published by authors from four different universities. By analyzing these data, we showed that although different levels of full text availability are observed for the four universities, there is always a significant positive correlation between full text availability and citation count. Moreover, we showed that both the post-print versions and the publisher’s versions (i.e., final published versions) of the archived manuscripts receive more citations than non-OA articles, and that the difference in citation counts between post-print manuscripts and publisher’s version articles is not significant.

    Constraintbasierte Analyse von Unterstrukturen metabolischer Netzwerke (Constraint-based analysis of substructures of metabolic networks)

    Constraint-based methods (CBMs) are promising tools for the analysis of metabolic networks, as they do not require detailed knowledge of the biochemical reactions. Some of these methods only need information about the stoichiometric coefficients of the reactions and their reversibility types, i.e., constraints for steady-state conditions. Nevertheless, CBMs have their own limitations. For example, these methods may be sensitive to missing information in the models. Additionally, they may be slow for the analysis of genome-scale metabolic models. As a result, some studies prefer to consider substructures of networks, instead of complete models. Some other studies have focused on better implementations of the CBMs. In Chapter 2, the sensitivity of flux coupling analysis (FCA) to missing reactions is studied. Genome-scale metabolic reconstructions are comprehensive, yet incomplete, models of real-world metabolic networks. While FCA has proved an appropriate method for analyzing metabolic relationships and for detecting functionally related reactions in such models, little is known about the impact of missing reactions on the accuracy of FCA. Note that having missing reactions is equivalent to deleting reactions, or to deleting columns from the stoichiometric matrix. Based on an alternative characterization of flux coupling relations using elementary flux modes, we study the changes that flux coupling relations may undergo due to missing reactions. In particular, we show that two uncoupled reactions in a metabolic network may be detected as directionally, partially or fully coupled in an incomplete version of the same network. Even a single missing reaction can cause significant changes in flux coupling relations. In case of two consecutive E. coli genome-scale networks, many fully-coupled reaction pairs in the incomplete network become directionally coupled or even uncoupled in the more complete reconstruction. 
In this context, we found that gene expression correlation values were significantly higher for the pairs that remained fully coupled than for the uncoupled or directionally coupled pairs. Our study clearly suggests that FCA results are indeed sensitive to missing reactions. Since the currently available genome-scale metabolic models are incomplete, we advise using FCA results with care. In Chapter 3, a different, but related problem is considered. Due to the large size of genome-scale metabolic networks, some studies suggest analyzing subsystems instead of the original genome-scale models. Note that analysis of a subsystem is equivalent to deleting some rows from the stoichiometric matrix, or equivalently, assuming some internal metabolites to be external. We show mathematically that analysis of a subsystem instead of the original model can cause the flux coupling relations to undergo certain changes. In particular, a pair of (fully, partially or directionally) coupled reactions may be detected as uncoupled in the chosen subsystem. Interestingly, this behavior is the opposite of the flux coupling changes that may happen due to the existence of missing reactions, or equivalently, deletion of reactions. We also show that analysis of organelle subsystems has relatively little influence on the results of FCA, and therefore, many of these subsystems may be studied independently of the rest of the network. In Chapter 4, we introduce a rapid FCA method, which is appropriate for genome-scale networks. Previously, several approaches for FCA have been proposed in the literature, namely the flux coupling finder algorithm, FCA based on minimal metabolic behaviors, and FCA based on elementary flux patterns. To the best of our knowledge, none of these methods is available as free software. Here, we introduce a new FCA algorithm, FFCA (Feasibility-based Flux Coupling Analysis). This method is based on checking the feasibility of a system of linear inequalities. 
We show on a set of benchmarks that for genome-scale networks FFCA is faster than other existing FCA methods. Using FFCA, flux coupling analysis of genome-scale networks of S. cerevisiae and E. coli can be performed in a few hours on a normal PC. A corresponding software tool is freely available for non-commercial use. In Chapter 5, we introduce a new concept which can be useful in the analysis of fluxes in network substructures. Analysis of elementary modes (EMs) has proven to be a powerful CBM in the study of metabolic networks. However, enumeration of EMs is a hard computational task. Additionally, due to their large numbers, one cannot simply use them as an input for subsequent analyses. One possibility is to restrict the analysis to a subset of interesting reactions, rather than the whole network. However, analysis of an isolated subnetwork can result in finding incorrect EMs, i.e. ones which are not part of any steady-state flux distribution in the original network. The ideal set of vectors to describe the usage of reactions in a subnetwork would be the set of all EMs projected onto the subset of interesting reactions. Recently, the concept of "elementary flux patterns" (EFPs) has been proposed. Each EFP is a subset of the support (i.e. non-zero elements) of at least one EM. In the present work, we introduce the concept of ProCEMs (Projected Cone Elementary Modes). The ProCEM set can be computed by projecting the flux cone onto the lower-dimensional subspace and enumerating the extreme rays of the projected cone. In contrast to EFPs, ProCEMs are not merely a set of reactions, but from the mathematical point of view they are projected EMs. We additionally prove that the set of EFPs is included in the set of ProCEM supports. 
Finally, ProCEMs and EFPs are compared in the study of substructures in biological networks.
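The two matrix operations that the abstract relates to incomplete models and to subsystems can be made concrete with a toy example: a missing reaction corresponds to deleting a column of the stoichiometric matrix, and treating an internal metabolite as external corresponds to deleting a row. The three-reaction network and the function names below are hypothetical and serve only as an illustration.

```python
def delete_reactions(S, reaction_indices):
    """Remove columns (reactions) from a stoichiometric matrix given as a list of rows."""
    drop = set(reaction_indices)
    return [[coef for j, coef in enumerate(row) if j not in drop] for row in S]


def externalize_metabolites(S, metabolite_indices):
    """Remove rows (internal metabolites), i.e. treat those metabolites as external."""
    drop = set(metabolite_indices)
    return [list(row) for i, row in enumerate(S) if i not in drop]


if __name__ == "__main__":
    # Hypothetical toy network: R1: -> A, R2: A -> B, R3: B ->
    # Rows are metabolites (A, B); columns are reactions (R1, R2, R3).
    S = [
        [1, -1, 0],
        [0, 1, -1],
    ]
    print(delete_reactions(S, [2]))         # model with R3 missing
    print(externalize_metabolites(S, [1]))  # subsystem in which B is external
```

In the full toy network all three reactions must carry the same steady-state flux. Once B is treated as external, the balance constraint on B disappears, so R3 is no longer tied to R1 and R2, which illustrates how coupled reactions can appear uncoupled in a subsystem.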

    Addition of contact number information can improve protein secondary structure prediction by neural networks

    Prediction of protein secondary structures is one of the oldest problems in bioinformatics. Although several different methods have been proposed to tackle this problem, none of them is perfect. Recently, it has been proposed that adding other structural information, such as the accessible surface area of residues or prior information about protein structural class, can significantly improve the prediction of secondary structures. In this work, we propose that contact number information can be considered as another useful source of information for improving secondary structure prediction. Since the contact number, i.e. the number of other amino acid residues in the structural neighbourhood of a certain residue, depends on the secondary structure of the residue, we conjectured that contact number data can improve secondary structure prediction. We used two closely related neural networks to predict secondary structures. The only difference between the networks was that one of them was also provided with residue contact numbers as an additional input. The results suggest that adding contact number information yields a small but significant improvement in the prediction of protein secondary structures, and that residue contact numbers can be used as a rich source of information for this task.
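The contact number defined above can be computed from 3D coordinates along the following lines. This is a minimal sketch under stated assumptions: one representative point per residue (for example the C-alpha atom), a spherical neighbourhood, and a 10 Å default cutoff are all choices made for the example and are not taken from the abstract; the function name is hypothetical.

```python
import math


def contact_numbers(coords, cutoff=10.0):
    """Count, for each residue, the other residues within `cutoff` of it.

    coords: list of (x, y, z) tuples, one representative point per residue.
    The cutoff value and the choice of representative atom are assumptions.
    """
    counts = []
    for i, a in enumerate(coords):
        neighbours = sum(
            1 for j, b in enumerate(coords)
            if j != i and math.dist(a, b) <= cutoff
        )
        counts.append(neighbours)
    return counts


if __name__ == "__main__":
    # Hypothetical coordinates of four residues placed on a line.
    coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
    print(contact_numbers(coords, cutoff=5.0))
```

The resulting per-residue values could then be appended to the input features of a predictor, which is the kind of additional input the abstract describes for the second network.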