
    Evolution of Endosperm Starch Synthesis Pathway Genes in the Context of Rice (Oryza sativa) Domestication

    The evolution of metabolic pathways is a fundamental but poorly understood aspect of evolutionary change. The rice endosperm starch biosynthetic pathway is one of the most thoroughly characterized biosynthesis pathways in plants, and starch is a trait that has evolved in response to strong selection during rice domestication and subsequent crop improvement. In this study, I examined six key genes in the rice endosperm starch biosynthesis pathway to investigate the evolution of this pathway before and during rice domestication. Oryza rufipogon is the wild ancestor of cultivated rice (Oryza sativa). Oryza sativa has five variety groups: aus, indica, tropical japonica, temperate japonica and aromatic. I sequenced six genes (shrunken2, Sh2; brittle2, Bt2; waxy, Wx; starch synthase IIa, SsIIa; starch branching enzyme IIb, SbeIIb; and isoamylase1, Iso1) in 70 O. rufipogon accessions, 99 cultivated rice accessions (aus, 10; indica, 34; tropical japonica, 26; temperate japonica, 21; aromatic, 8) and two accessions of two closely related species, O. barthii and O. meridionalis. Published sequence data for Wx in rice are also included in the analysis. Detecting selection is often difficult because of the complex demographic history of a species, and genome-wide sequence data mainly reflect that demographic history. I therefore compared the pattern of nucleotide variation at each starch gene both with published genome-wide sequence data and with a standard neutral model. The results show no evidence of deviation from neutrality at any of the six starch genes in O. rufipogon, and none at four of the starch genes in O. sativa. Evidence of selection is observed at Wx in tropical japonica and temperate japonica, and at Wx and SbeIIb in aromatic rice. Starch quality is one of the most important agronomic traits in rice.
Starch synthase IIa (SsIIa) has been mapped as a gene that contributes to starch quality variation in cultivated rice, O. sativa. Within the gene, three nonsynonymous mutations in the exon 8 region were shown to affect its enzyme activity in Escherichia coli. To identify the mutation in the SsIIa exon 8 region that is responsible for starch quality variation in rice, I sequenced the SsIIa exon 8 region and recorded the alkali spreading score in 57 O. rufipogon accessions and 151 cultivated rice accessions (aus, 8; indica, 51; tropical japonica, 55; temperate japonica, 29; aromatic, 8). The starch alkali spreading score is used to quantify rice endosperm starch quality and has been shown to be significantly associated with SsIIa enzyme activity in rice. Both a general linear model and nested clade analysis were used to test for an association between the three nonsynonymous mutations in SsIIa exon 8 and the alkali spreading score. To avoid the effect of population structure on the association analysis, both analyses were conducted within each rice variety group. Among the previously identified nonsynonymous mutations, my results show strong evidence of association at one (SNP3, see Fig 2 of Chapter 2) and evidence of no association at another. Tests of association for the remaining nonsynonymous mutation are inconclusive with the current samples and will require further investigation. This dissertation reveals the relative role of evolutionary forces in shaping the pattern of variation at six starch genes in O. sativa and its wild ancestor, O. rufipogon. It also reveals an association between a nonsynonymous mutation in SsIIa exon 8 and rice endosperm starch quality.

    Algorithms for automated assignment of solution-state and solid-state protein NMR spectra.

    Protein nuclear magnetic resonance (NMR) spectroscopy is an invaluable analytical technique for studying protein structure, function, and dynamics. Two major types of NMR spectroscopy are used to investigate protein structure: solution-state and solid-state NMR. Solution-state NMR spectroscopy is typically applied to small and medium-sized proteins that are soluble in water. Solid-state NMR spectroscopy is suited to proteins that are insoluble in water. In the vast majority of NMR-based protein studies, the first step after experiment optimization is the assignment of protein resonances, that is, the association of chemical shift values with specific atoms in the protein. Depending on the quality of the spectra, manual resonance assignment often takes an experienced NMR spectroscopist weeks or even months. The resonance assignment processes for solution-state and solid-state protein NMR studies are conceptually similar, but they differ in the NMR experiments used and in the resonances used to group peaks into spin systems. Currently, there is a shortage of robust, effective software tools for solid-state protein resonance assignment, and there is no general software that performs both solution-state and solid-state protein resonance assignment in a reliable, automated fashion. Hence, the motivation of this research is to design and implement algorithms and software tools that automate resonance assignment. As a result, several algorithms and software packages were developed that support important steps in the protein resonance assignment process.
For example, the nmrstarlib software package can access and use data deposited in the NMR-STAR format; the core of this library is a lexical analyzer for NMR-STAR syntax that acts as a generator-based state machine for token processing. The jpredapi software package provides an easy-to-use API for submitting jobs to, and retrieving results from, a secondary structure prediction server. The single peak list and pairwise peak list registration algorithms address the problem of multiple sources of variance within a single peak list and between different peak lists, and are capable of calculating the match tolerance values needed for spin system grouping. The single peak list and pairwise peak list grouping algorithms are based on the well-known DBSCAN clustering algorithm and are designed to group peaks into spin systems within a single peak list as well as between different peak lists.
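The DBSCAN-based grouping mentioned above can be illustrated with a small sketch. This is not the author's implementation: it is a plain textbook DBSCAN in Python, applied to made-up two-dimensional peaks (1H and 15N shifts in ppm). The names `TOLERANCES`, `scaled_distance` and `dbscan`, the tolerance values and `min_pts=2` are all assumptions chosen for illustration.

```python
import math

# Hypothetical per-dimension match tolerances in ppm (1H, 15N); assumed values.
TOLERANCES = (0.02, 0.2)


def scaled_distance(p, q):
    """Euclidean distance after dividing each dimension by its tolerance."""
    return math.sqrt(sum(((a - b) / tol) ** 2
                         for a, b, tol in zip(p, q, TOLERANCES)))


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including itself)."""
    return [j for j, q in enumerate(points)
            if scaled_distance(points[idx], q) <= eps]


def dbscan(points, eps=1.0, min_pts=2):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region_query(points, j, eps)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Made-up peaks: two candidate spin systems plus one isolated peak.
peaks = [
    (8.20, 120.50), (8.21, 120.45), (8.19, 120.55),
    (7.55, 115.20), (7.56, 115.25), (7.54, 115.10),
    (9.10, 128.00),
]
labels = dbscan(peaks)
```

In this toy input the first three peaks and the next three peaks each form one group, and the isolated peak is labelled as noise. Scaling each dimension by its own tolerance lets a single `eps` of 1.0 stand for "within tolerance in every dimension, roughly", which is one simple design choice among several.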

    Application of Singular Spectrum Analysis (SSA), Independent Component Analysis (ICA) and Empirical Mode Decomposition (EMD) for automated solvent suppression and automated baseline and phase correction from multi-dimensional NMR spectra

    The solvent artifact is a common problem in protein structure determination by NMR spectroscopy. Typically, a deuterated solvent is used instead of normal water. However, several experimental methods have been developed to suppress the solvent signal when a protonated solvent has to be used, or when the signals of the remaining protons are still too strong even in a highly deuterated sample. For a protein dissolved in 90% H2O / 10% D2O, the concentration of solvent protons is about five orders of magnitude greater than the concentration of the protons of interest in the solute. The evaluation of multi-dimensional NMR spectra may therefore be incomplete, because certain resonances of interest (e.g. Hα proton resonances) are hidden by the solvent signal and because parts of the solvent signal may be misinterpreted as cross peaks originating from the protein. Experimental solvent suppression procedures are typically unable to recover these significant protein signals, and many post-processing methods have been designed to overcome this problem. In this work, several algorithms for suppressing the water signal have been developed and compared. In particular, it has been shown that Singular Spectrum Analysis (SSA) can be applied advantageously to remove the solvent artifact from NMR spectra of any dimensionality, whether acquired digitally or in analog mode. The investigated time-domain signals (FIDs) are decomposed into water-related and protein-related components by first embedding the data in the space of time-delayed coordinates. Eigenvalue decomposition is applied to the embedded data, and the component with the highest variance (typically the dominant solvent signal) is discarded before the embedding is reverted.
Pre-processing (group delay management and signal normalization) and post-processing (inverse normalization, Fourier transformation, and phase and baseline corrections) of the NMR data are required to obtain better suppression performance. The optimal embedding dimension was determined empirically from a qualitative and quantitative analysis of the components extracted from a back-calculated two-dimensional spectrum of HPr protein from Staphylococcus aureus. Moreover, the investigation of experimental data (a three-dimensional 1H13C HCCH-TOCSY spectrum of Trx protein from Plasmodium falciparum and two-dimensional NOESY and TOCSY spectra of HPr protein from Staphylococcus aureus) showed that the algorithm can recover resonances hidden underneath the water signal. Diseases and the effects of drugs and lifestyle can be detected by NMR spectroscopy of samples containing biofluids (e.g. urine, blood, saliva). The detection of signals of interest in such spectra can likewise be hampered by the solvent. SSA has also been applied successfully to one-dimensional urine, blood and cell spectra. The algorithm for automated solvent suppression has been introduced into the AUREMOL software package (AUREMOL_SSA). It is optionally followed by an automated baseline correction in the frequency domain (AUREMOL_ALS), which can also be used without the former algorithm. The automated recognition of baseline points is performed differently depending on the dimensionality of the data. To investigate the limitations of SSA, it was applied to spectra whose dominant signal is not the solvent (as with WATERGATE solvent suppression, and with back-calculated data that include no experimental water signal) in order to determine the optimal solvent-to-solute ratio.
Independent Component Analysis (ICA) is a valid alternative for water suppression when the solvent signal is not the dominant one in the spectra (when it is smaller than half of the strongest solute resonance). In particular, two components are obtained: the solvent and the solute. ICA needs as input at least as many different spectra (mixtures) as there are components (source signals), thus the definition of a suitable protocol for generating a dataset of one-dimensional ICA-tailored inputs is straightforward. ICA proved able to overcome the limitations of SSA and to recover resonances of interest that cannot be detected with SSA. ICA avoids all the pre- and post-processing steps, since it is applied directly in the frequency domain. On the other hand, in SSA the component to be removed is selected automatically (it is the one with the highest variance), whereas in ICA a visual inspection of the extracted components is still required, because the output is permutable and scale and sign ambiguities may occur. Empirical Mode Decomposition (EMD) proved more suitable for automated phase correction than for solvent suppression. It decomposes the FID into several intrinsic mode functions (IMFs) whose frequency of oscillation decreases from the first to the last, with the last identifying the solvent signal. The automatically identified non-baseline regions in the Fourier transform of the sum of the first IMFs are evaluated separately, and genetic algorithms are applied to determine the zero- and first-order terms for an optimal phase correction. The SSA and ALS algorithms were applied before assigning the two-dimensional NOESY spectrum (with the program KNOWNOE) of the PSCD4 domain of the pleuralin protein, in order to increase the number of distance restraints beyond those already available.
A new routine to derive 3JHNHα couplings from torsion angles (Karplus relation), and vice versa, has been introduced in the AUREMOL software. Using the newly developed tools, a refined three-dimensional structure of the PSCD4 domain could be obtained.
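The SSA steps described in this abstract (embedding, decomposition, discarding the highest-variance component, reverting the embedding) can be sketched as follows. This is a toy illustration, not the AUREMOL_SSA code: it assumes NumPy, uses an SVD of the trajectory matrix in place of an explicit eigenvalue decomposition (the two are equivalent for this purpose), skips all pre- and post-processing, and runs on a synthetic FID whose amplitudes, frequencies and window length are made-up values. The function name `ssa_suppress_dominant` is hypothetical.

```python
import numpy as np


def ssa_suppress_dominant(fid, window):
    """Remove the highest-variance SSA component from a 1D time-domain signal."""
    fid = np.asarray(fid)
    n = fid.size
    k = n - window + 1
    # Embedding: traj[i, j] = fid[i + j], i.e. time-delayed copies of the FID.
    traj = np.array([fid[j:j + window] for j in range(k)]).T
    # The SVD of traj yields the eigen-decomposition of traj @ traj^H.
    u, s, vh = np.linalg.svd(traj, full_matrices=False)
    # Rank-1 matrix of the dominant component (assumed here to be the solvent).
    dominant = s[0] * np.outer(u[:, 0], vh[0])
    residual = traj - dominant
    # Revert the embedding by averaging over anti-diagonals (i + j constant).
    cleaned = np.zeros(n, dtype=residual.dtype)
    counts = np.zeros(n)
    for i in range(window):
        cleaned[i:i + k] += residual[i]
        counts[i:i + k] += 1
    return cleaned / counts


# Synthetic FID: a strong on-resonance "solvent" line plus a weak "solute" line.
t = np.arange(256)
decay = np.exp(-t / 100.0)
solvent = 1000.0 * decay * np.exp(2j * np.pi * 0.00 * t)
solute = 1.0 * decay * np.exp(2j * np.pi * 0.25 * t)

cleaned = ssa_suppress_dominant(solvent + solute, window=64)
rel_error = np.linalg.norm(cleaned - solute) / np.linalg.norm(solute)
```

Here `rel_error` measures how closely the cleaned signal matches the weak solute line once the dominant component is removed. The sketch only works because the solvent clearly dominates the variance, which matches the limitation the abstract notes for spectra where the solvent is not the strongest signal.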

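    Mathematical methods for elucidating the structure, dynamics and biological activity of molecules using NMR spectroscopic and empirical parameters (Mathematische Verfahren zur Aufklärung der Struktur, Dynamik und biologischen Aktivität von Molekülen unter Verwendung von NMR spektroskopischen und empirischen Parametern)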

    In this work, methods from mathematics and computer science are developed and applied to determine structure, dynamics and biological activity from NMR spectroscopic and empirical parameters. Dolastatin 10 and epothilone A are potential anticancer agents because they block cell division by interacting with tubulin. The 3D structures of both agents in solution, and the structure of tubulin-bound epothilone A, are determined from NMR spectroscopic parameters. Dolastatin 10 exists in a conformational equilibrium between the cis and trans conformations at the unusual amino acid DAP. Both conformations of the flexible pentapeptide can be determined, with RMSD = 1.423 Å for the cis conformer and RMSD = 1.488 Å for the trans conformer. Whereas the trans conformer is extended, the cis conformer folds back at DAP. Epothilone A is less flexible because of its macrocycle, and both the tubulin-bound structure (RMSD = 0.537 Å) and the free form (RMSD = 0.497 Å) can be determined with low RMSD values. The structure of the free form, which predominates in solution, is largely identical to the X-ray structure. In the tubulin-bound form, an essential reorientation of the side chain is observed that is decisive for the interaction with tubulin. Dipolar couplings of a protein are suitable for a 3D homology search in the PDB because they describe the relative orientation of secondary structure elements and domains [85]. Early recognition of 3D-homologous protein folds opens the possibility of accelerating protein structure determination. A homology search using dipolar couplings can find proteins, or at least fragments, with similar 3D structure even when primary sequence homology is low.
In addition, a transformation for experimental dipolar couplings is developed that translates the indirect orientation information of a vector relative to an external tensor into the allowed range for the projection angle between two vectors, and thus into intramolecular structural information. These restraints can be used in protein structure determination by molecular dynamics [92]. In contrast to all existing implementations, the convergence of the calculation is hardly affected by dipolar coupling information introduced in this way, yet the calculated structures still satisfy the dipolar couplings. Even without using already known protein or fragment structures, a considerable part of the NOE information can be substituted in this way. The dynamics of the vector connecting the two interacting dipoles influences the measured value of the dipolar coupling. This gives access to information on molecular dynamics on the µs time scale, which has so far been difficult to study. Measuring dipolar couplings for a vector in different orientations allows its motion to be analysed [89]. In particular, a model-free order parameter S² can be derived. Furthermore, a mean orientation of the vector and the axially symmetric and non-axially symmetric parts of the dynamics can likewise be derived and evaluated in a model-free manner. Applying the protocols developed in this way to experimental data [90] makes proteins appear considerably more dynamic than is apparent on the time scale of relaxation experiments. The mean order parameter drops from 0.8 to 0.6. This corresponds to an increase in the opening angle of the motion from about 22° to about 33°. The motions deviate from axial symmetry by up to 40% in some cases and by 15% on average.
Neural networks allow fast (about 5000 chemical shifts per second) and accurate (mean deviation of 1.6 ppm) calculation of the 13C NMR chemical shift [115]. They thereby combine the advantages of previously known database estimates (high accuracy) and increment methods (high speed). The 13C NMR spectrum of an organic compound provides a detailed description of its structure. By comparing the experimental with the calculated 13C NMR spectra, the results of the structure generator COCON can be narrowed down to about 1‰ of the proposed structures, namely those that deviate little from the experimental spectrum [122]. Combination with a substructure analysis further allows probable closed ring systems to be recognised and gives an overview of the structure of the generated constitutional subspace. Genetic algorithms can optimise the structure of organic molecules, starting from their molecular formula, for agreement with the experimental 13C NMR spectrum. For this purpose, the constitution of a molecule is described by a vector of the bond states between all atom-atom pairs. Such vectors are suitable to be treated as the genetic code of constitutions in a genetic algorithm. This method allows automated determination of the constitution of molecules with 10 to 20 non-hydrogen atoms [123]. Symmetric neural networks can project five- and seven-dimensional heterogeneous parameter representations of the 20 proteinogenic amino acids into three-dimensional space while preserving the essential information [134]. The low-dimensional projections make it possible to visualise the relationships among the amino acids. The reduced parameter representations are suitable as input to a neural network that calculates the secondary structure of a protein with an accuracy of 66% in the Q3 value.
Owing to their flexible structure, neural networks are particularly well suited to describing quantitative structure-activity relationships, since these relationships are highly non-linear and complex. A numerical encoding of the more than 200 epothilone derivatives described in the literature makes it possible to build models for calculating the induction of tubulin polymerisation (R = 0.73) and the inhibition of cancer cell growth (R = 0.94) [136]. The trained neural networks can be used in a sensitivity analysis to identify the binding sites of the molecule. Calculating the activity for all molecules of the structural space defined by the parameters yields proposals for epothilone derivatives that could be up to 1,000 times more active than those synthesised so far.
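The order-parameter figures quoted in this abstract (a mean S² of 0.8 versus 0.6, corresponding to opening angles of about 22° and about 33°) are consistent with the common diffusion-in-a-cone model, in which S = ½ cos θ (1 + cos θ) for a cone semi-angle θ. The abstract does not state which model was used, so that choice is an assumption here, and the function name `cone_semi_angle_deg` is hypothetical. A minimal Python sketch of the conversion:

```python
import math


def cone_semi_angle_deg(s2):
    """Cone semi-angle (degrees) for an order parameter S^2, assuming the
    diffusion-in-a-cone model S = 0.5 * cos(theta) * (1 + cos(theta))."""
    if not 0.0 <= s2 <= 1.0:
        raise ValueError("S^2 must lie between 0 and 1")
    s = math.sqrt(s2)
    # Solve x**2 + x - 2*s = 0 for x = cos(theta), taking the positive root.
    cos_theta = (-1.0 + math.sqrt(1.0 + 8.0 * s)) / 2.0
    return math.degrees(math.acos(cos_theta))


angle_relaxation = cone_semi_angle_deg(0.8)  # S^2 on the relaxation time scale
angle_dipolar = cone_semi_angle_deg(0.6)     # S^2 from dipolar couplings
```

Under this assumed model the two values come out close to the 22° and 33° given in the text, which suggests (but does not prove) that a cone-type model underlies the quoted angles.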

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as the Final Publication of the COST Action IC1406 "High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)" project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to give better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. Arguably, a seamless interaction of High Performance Computing with Modelling and Simulation is therefore required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Response of soil viral and microbial functional diversity to long-term agricultural management in Jackson, West Tennessee

    Soil microbial communities are a critical component of ecosystem stability and function. Viruses, as an important biotic controller, have the potential to regulate the abundance and diversity of bacterial communities through infection. Soil is known to harbor abundant and diverse viral assemblages, but their ecological role and influence on microbial processes have not been fully elucidated. Microbes can be influenced by viruses not only through infection but also through biogeochemical feedbacks of the "microbial (bacterium–phage–DOC) loop" or "viral shunt". However, we know relatively little about microbial communities and functions under viral regulation in soil, or about how they respond to agricultural management under climate change. The objectives of this dissertation were (1) to estimate the variability of soil viral and bacterial communities under long-term conventional tillage, cover cropping and inorganic N fertilization management practices, (2) to assess the effect of seasonal change on soil bacterial community diversity and structure, (3) to identify the correlation between viruses and their hosts, and (4) to reveal the response of microbial functional genes (C-degradation and N-cycling genes) to cover cropping and fertilization, and the potential roles of viruses in C degradation. Soil treatments from the West Tennessee Agriculture Research and Education Center in Jackson were used, comprising two nitrogen rates (0 and 67 kg N/ha), three levels of cover crop (no cover, hairy vetch and winter wheat), and two tillage managements (conventional tillage and no tillage). Soil bacterial diversity, functional genes and viral diversity were evaluated by 16S rRNA amplicon sequencing, RAPD-PCR, and bulk soil metagenomic sequencing.
My findings highlight the importance of microbial responses to N fertilization and cover cropping in maintaining long-term C pool stability and N concentrations in agricultural soil, and the impact of viruses on C metabolism through the regulation of microbial metabolism by auxiliary metabolic genes. This study improves our understanding of the ecological roles of soil viruses in influencing soil functions under long-term conservation agricultural management.