Computational Methods For Comparative Non-coding Rna Analysis: From Structural Motif Identification To Genome-wide Functional Classification

Author: Zhong Cuncong
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2013

Recent advances in biological research indicate that many ribonucleic acids (RNAs) are transcribed from the genome to perform a variety of cellular functions, rather than merely acting as information carriers for protein synthesis. These RNAs are usually referred to as non-coding RNAs (ncRNAs). The versatile regulation mechanisms and functionalities of the ncRNAs contribute to the amazing complexity of the biological system. The ncRNAs perform their biological functions by folding into specific structures. The comparative study of ncRNA structures is therefore key to inferring their molecular and cellular functions. We are especially interested in two computational problems for the comparative analysis of ncRNA structures: the alignment of ncRNA structures and their classification. Specifically, we aim to develop algorithms to align and cluster RNA structural motifs (recurrent RNA 3D fragments), as well as RNA secondary structures. A thorough understanding of RNA structural motifs will help us to disassemble large RNA 3D structures into functional modules, which can significantly facilitate the analysis of their detailed molecular functions. Efficient alignment and clustering of RNA secondary structures, in turn, will provide insights into ncRNA expression and interaction on a genomic scale. In this dissertation, we present a suite of computational algorithms and software packages to solve the RNA structural motif alignment and clustering problem, as well as the RNA secondary structure alignment and clustering problem. The contributions of this dissertation are summarized as follows. (1) We developed RNAMotifScan for comparing and searching RNA structural motifs. Recent studies have shown that RNA structural motifs play an essential role in RNA folding and interaction with other molecules. Computational identification and analysis of RNA structural motifs remain challenging tasks.
Existing motif identification methods based on 3D structure may not properly compare motifs with high structural variations. We present a novel RNA structural alignment method for RNA structural motif identification, RNAMotifScan, which takes into consideration the isosteric (both canonical and non-canonical) base-pairs and multi-pairings in RNA structural motifs. The utility and accuracy of RNAMotifScan are demonstrated by searching for Kink-turn, C-loop, Sarcin-ricin, Reverse Kink-turn and E-loop motifs against a 23S rRNA (PDB ID: 1S72), which is well characterized for the occurrences of these motifs. (2) We improved upon RNAMotifScan by incorporating base-stacking information and devising a new branch-and-bound algorithm called RNAMotifScanX. Model-based search for RNA structural motifs has focused on finding instances with similar 3D geometry and base-pairing patterns. Although these methods have successfully identified many of the true motif instances, each of them has its own limitations, and their accuracy and sensitivity can be further improved. We introduce a novel approach to model the RNA structural motifs, which incorporates both base-pairing and base-stacking information. We also develop a new algorithm to search for known motif instances that considers both base-pairing and base-stacking information. Benchmarking of RNAMotifScanX on searching known RNA structural motifs, including kink-turn, C-loop, sarcin-ricin, reverse kink-turn, and E-loop, clearly shows improved performance compared to its predecessor RNAMotifScan and other state-of-the-art RNA structural motif search tools. (3) We develop an RNA structural motif clustering and de novo identification pipeline called RNAMSC. RNA structural motifs are the building blocks of the complex RNA architecture. Identification of non-coding RNA structural motifs is a critical step towards understanding their structures and functionalities.
We present a clustering approach for de novo RNA structural motif identification. We applied our approach to a data set containing 5S, 16S and 23S rRNAs and rediscovered many known motifs, including GNRA tetraloop, kink-turn, C-loop, sarcin-ricin, reverse kink-turn, hook-turn, E-loop and tandem-sheared motifs, with higher accuracy than the current state-of-the-art clustering method. More importantly, several novel structural motif families have been revealed by our clustering analysis. (4) We propose an improved RNA structural clustering pipeline that takes into account the length-dependent distribution of the structural similarity measure. We also devise a more efficient and robust CLique finding CLustering algorithm (CLCL) to replace the traditional hierarchical clustering approach. Benchmarking of the proposed pipeline on Rfam data clearly demonstrates a performance gain of over 10% compared to a traditional hierarchical clustering pipeline. We applied this new computational pipeline to cluster the post-transcriptional control elements in fly 3’-UTRs. The ncRNA elements in the 3’ untranslated regions (3’-UTRs) are known to participate in the post-transcriptional regulation of genes, including their stability, translation efficiency, and subcellular localization. Inferring co-expression patterns of the genes by clustering their 3’-UTR ncRNA elements will provide invaluable knowledge for further studies of their functionalities and interactions under specific physiological processes. (5) We develop an ultra-efficient RNA secondary structure alignment algorithm, ERA, by using a sparse dynamic programming technique. Recent advances in next-generation sequencing technology have revealed a large number of un-annotated RNA transcripts. Comparative study of the RNA structurome is an important approach to assess the biological functionalities of these RNA transcripts.
Due to the large sizes and abundance of the RNA transcripts, an efficient and accurate RNA structure-structure alignment algorithm is urgently needed to facilitate the comparative study. By using the sparse dynamic programming technique, we devised a new alignment algorithm that is as efficient as the tree-based alignment algorithms and as accurate as the general edit-distance alignment algorithms. We implemented the new algorithm in a program called ERA (Efficient RNA Alignment). Benchmark results indicate that ERA can significantly speed up RNA structure-structure alignments compared to other state-of-the-art RNA alignment tools, while maintaining high alignment accuracy. These novel algorithms have led to the discovery of many novel RNA structural motif instances, which have significantly deepened our understanding of RNA molecular functions. The genome-wide clustering of ncRNA elements in fly 3’-UTRs has predicted a cluster of genes that are responsible for the spermatogenesis process. More importantly, these genes are very likely to be co-regulated by their common 3’-UTR elements. We anticipate that these algorithms and the corresponding software tools will significantly promote comparative ncRNA research in the future.
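The idea of replacing hierarchical clustering with clique finding, as in contribution (4), can be illustrated with a small sketch. This is not the CLCL algorithm, whose details the abstract does not give. It is a generic greedy procedure that assumes a precomputed pairwise similarity matrix and a hypothetical similarity threshold, and the function name is invented for illustration.

```python
from itertools import combinations


def greedy_clique_clusters(similarity, threshold):
    """Partition items into clusters that are cliques of the thresholded graph.

    similarity: square list-of-lists of pairwise scores (assumed symmetric).
    threshold:  hypothetical cutoff; pairs scoring >= threshold are linked.
    """
    n = len(similarity)
    # Build the adjacency sets of the thresholded similarity graph.
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if similarity[i][j] >= threshold:
            adj[i].add(j)
            adj[j].add(i)

    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Seed with the unassigned item that has the most unassigned neighbours
        # (ties go to the lowest index because candidates are sorted first).
        seed = max(sorted(unassigned), key=lambda v: len(adj[v] & unassigned))
        clique = [seed]
        candidates = adj[seed] & unassigned
        while candidates:
            # Add the candidate linked to the most other candidates, then keep
            # only candidates that are linked to every current clique member.
            nxt = max(sorted(candidates), key=lambda v: len(adj[v] & candidates))
            clique.append(nxt)
            candidates &= adj[nxt]
        clusters.append(sorted(clique))
        unassigned -= set(clique)
    return clusters


# Toy scores: items 0-2 are mutually similar, items 3-4 are similar.
toy = [
    [1.00, 0.90, 0.80, 0.10, 0.20],
    [0.90, 1.00, 0.85, 0.10, 0.10],
    [0.80, 0.85, 1.00, 0.30, 0.20],
    [0.10, 0.10, 0.30, 1.00, 0.95],
    [0.20, 0.10, 0.20, 0.95, 1.00],
]
print(greedy_clique_clusters(toy, threshold=0.7))
```

Unlike single-linkage hierarchical clustering, every pair inside a cluster produced this way exceeds the threshold, which is the property that makes clique-based grouping attractive for structural similarity scores.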

Context based bioinformatics

Author: Csaba Gergely
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 10/05/2013

The goal of bioinformatics is to develop innovative and practical methods and algorithms for biological questions. In many cases, these questions are driven by new biotechnological techniques, especially by genome- and cell-wide high-throughput experimental studies. In principle there are two approaches: 1. Reduction and abstraction of the question to a clearly defined optimization problem, which can be solved with appropriate and efficient algorithms. 2. Development of context-based methods, incorporating as much contextual knowledge as possible in the algorithms, and derivation of practical solutions for relevant biological questions on the high-throughput data. These methods can often be supported by appropriate software tools and visualizations, allowing for interactive evaluation of the results by experts. Context-based methods are often much more complex and require more involved algorithmic techniques to obtain practically relevant and efficient solutions for real-world problems, because in many cases even the simplified abstractions of the problems are NP-hard. To solve these complex problems, one often needs to employ efficient data structures and heuristic search methods, and to solve clearly defined sub-problems using efficient (polynomial) optimization (such as dynamic programming, greedy, path or tree algorithms). In this thesis, we present new methods and analyses addressing open questions of bioinformatics from different contexts by incorporating the corresponding contextual knowledge. The two main contexts in this thesis are the protein structure similarity context (Part I) and the network-based interpretation of high-throughput data (Part II). In the protein structure similarity context (Part I), we analyze the consistency of gold standard structure classification systems and derive a consistent benchmark set usable for different applications.
We introduce two methods (Vorolign, PPM) for the protein structure similarity recognition problem, based on different features of the structures. Derived from the idea and results of Vorolign, we introduce the concept of contact neighborhood potential, aiming to improve the results of protein fold recognition and threading. For the re-scoring problem of predicted structure models we introduce the method Vorescore, clearly improving the fold-recognition performance, and enabling the evaluation of the contact neighborhood potential for structure prediction methods in general. We introduce a contact-consistent Vorolign variant, ccVorolign, further improving the structure-based fold recognition performance, and enabling direct optimization of the neighborhood potential in the future. Due to the enforcement of contact consistency, the ccVorolign method has much higher computational complexity than the polynomial Vorolign method, which is the cost of computing interpretable and consistent alignments. Finally, we introduce a novel structural alignment method (PPM) enabling the explicit modeling and handling of phenotypic plasticity in protein structures. We employ PPM for the analysis of effects of alternative splicing on protein structures. With the help of PPM we test the hypothesis of whether splice isoforms of the same protein can lead to protein structures with different folds (fold transitions). In Part II of the thesis we present methods generating and using context information for the interpretation of high-throughput experiments. For the generation of context information of molecular regulations we introduce novel text-mining approaches extracting relations automatically from scientific publications.
In addition to the fast NER (named entity recognition) method (syngrep), we also present a novel, fully ontology-based, context-sensitive method (SynTree) allowing for the context-specific disambiguation of ambiguous synonyms and resulting in much better identification performance. This context information is important for the interpretation of high-throughput data, but often missing in current databases. Despite all improvements, the results of automated text-mining methods are error-prone. The RelAnn application presented in this thesis helps to curate the automatically extracted regulations, enabling manual and ontology-based curation and annotation. For the usage of high-throughput data one needs additional methods for data processing, for example methods to map the hundreds of millions of short DNA/RNA fragments (so-called reads) to a reference genome or transcriptome. Such data (RNA-seq reads) are the output of next-generation sequencing methods measured by sequencing machines, which are becoming more and more efficient and affordable. Unlike current state-of-the-art methods, our novel read-mapping method ContextMap resolves the occurring ambiguities at the final step of the mapping process, thereby exploiting knowledge of the complete set of possible ambiguous mappings. This approach allows for higher precision, even if more nucleotide errors are tolerated in the read mappings in the first step. The consistency between measured data and the context information on molecular regulations, stored in databases and extracted by text mining, can be used to identify and score consistent regulations (GGEA). This method substantially extends the commonly used gene-set based methods such as over-representation analysis (ORA) and gene set enrichment analysis (GSEA).
Finally, we introduce the novel method RelExplain, which uses the extracted contextual knowledge and generates network-based and testable hypotheses for the interpretation of high-throughput data.
Bioinformatics is concerned with the development of innovative and practically applicable methods and algorithms for biological questions. These questions often arise from new observation and measurement techniques, in particular new high-throughput methods and genome- and cell-wide studies. In principle there are two approaches: reduction and abstraction of the question to a clearly defined optimization problem, which is then solved with suitable algorithms that are as efficient as possible; and the development of context-based methods that use as much contextual knowledge and as many constraints as possible in the algorithms, in order to obtain practically relevant solutions for relevant biological questions and high-throughput data. These methods can often be supported by suitable software tools and visualizations to enable interactive evaluation of the results by domain experts. Context-based methods are often considerably more complex and require more involved algorithmic techniques in order to provide practically relevant and efficient solutions for real problems whose simplifying abstractions are already NP-hard. Efficient data structures and heuristic search methods are often needed, which rely on efficient (polynomial) optimization methods (e.g. dynamic programming, greedy, path and tree algorithms) for clearly delineated sub-problems and use them appropriately within the overall method. In this thesis, a number of new methods and analyses are presented to address open questions of bioinformatics from different contexts by using the corresponding contextual knowledge.
The two main contexts in this thesis are (Part 1) the similarities of 3D protein structures and (Part 2) the network-based interpretation of high-throughput data. In the protein structure context (Part 1), we analyze the consistency of the currently available gold standards for protein structure classification and derive a versatile, consistent benchmark set. For a more precise determination of the similarity of protein structures, we describe two methods (Vorolign, PPM) that use different structural features. Building on the results obtained for Vorolign, we introduce contact neighborhood potentials with the aim of improving fold recognition (based on the available structures) and threading (for protein structure prediction). For the problem of re-scoring predicted structure models, we describe the Vorescore method, which clearly improves fold recognition and also allows the applicability of potentials in general to be tested. As a further improvement, we introduce a contact-consistent Vorolign variant (ccVorolign), which, because of the new consistency constraint, is considerably more expensive than the polynomial Vorolign method but also delivers interpretable, consistent alignments. The new structural alignment method (PPM) makes it possible to model and account for phenotypic plasticity explicitly. PPM is used to investigate the effects of alternative splicing on protein structure, in particular the hypothesis of whether splice isoforms can adopt different folds (fold transitions). In the second part of the thesis, methods for generating context information and for using it in the interpretation of high-throughput data are presented. New text-mining methods automatically extract molecular regulatory relations and the corresponding context information from scientific publications.
In addition to fast NER (named entity recognition) methods (such as syngrep), a fully ontology-based, context-sensitive method (SynTree) is introduced, which allows ambiguous synonyms to be resolved context-specifically and therefore much more accurately. This context information, which is important for the interpretation of high-throughput data, is often missing from current databases. Despite all improvements, however, automatic methods still produce many errors. With our application RelAnn, regulatory relations extracted from texts can be manually annotated and curated on the basis of ontologies. The use of current high-throughput data requires additional approaches for data processing, for example for mapping hundreds of millions of short DNA/RNA fragments (so-called reads) to a genome or transcriptome. These data (RNA-seq) are produced by next-generation sequencing methods, which can currently be run ever more cheaply on ever more powerful instruments. In the ContextMap method, in contrast to state-of-the-art methods, the ambiguities that arise are resolved only at the end of the mapping process, when the entirety of the mapping information is available. This allows more errors to be tolerated during mapping while still achieving higher accuracy. The consistency between the context information from text mining and databases and the measured data can then be used to find and score consistent regulations (GGEA). This method is a substantial extension of the commonly used set-oriented methods such as over-representation analysis (ORA) and gene set enrichment analysis (GSEA). Finally, we present the RelExplain method, which generates network-based, testable hypotheses from the extracted contextual knowledge for explaining high-throughput data.
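Over-representation analysis (ORA), one of the gene-set baselines the abstract says GGEA extends, is commonly computed as a one-sided hypergeometric test. The sketch below shows that standard test only. It is not the GGEA method, and the function and parameter names are hypothetical.

```python
from math import comb


def ora_p_value(universe_size, set_size, hits_total, hits_in_set):
    """One-sided hypergeometric p-value P(X >= hits_in_set).

    universe_size: number of genes measured in the experiment
    set_size:      number of those genes that belong to the gene set
    hits_total:    number of genes called differentially expressed
    hits_in_set:   number of differentially expressed genes inside the set
    """
    upper = min(set_size, hits_total)
    total = comb(universe_size, hits_total)
    # Sum the probabilities of seeing hits_in_set or more set members
    # among the differentially expressed genes.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_total - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes measured, 4 in the set, 3 called, 2 of them in the set.
print(ora_p_value(10, 4, 3, 2))
```

The test uses only set membership counts. That limitation is what motivates methods that also take the known regulatory relations between genes into account.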

Digitale Hochschulschriften der LMU

The identification and characterisation of the causative gene mutation for keratolytic winter erythema (KWE) in South African families

Author: Ngcungcu Thandiswa
Publication date: 01/01/2017

A thesis submitted to the Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, 2017.
Keratolytic winter erythema (KWE) is a rare autosomal dominant skin disorder characterized by recurrent episodes of palmoplantar erythema and epidermal peeling, with symptoms that worsen in winter. KWE is relatively common in South African (SA) Afrikaners and was mapped to 8p23.1-p22 through a common haplotype in SA families. The aim of this study was to identify and characterize the causal mutation for KWE in SA families. Targeted resequencing of 8p23.1-p22 was performed in three families and seven unrelated controls. Reads were aligned to the reference genome using BWA. GATK and Pindel were used to call small and large structural variants, respectively. A 7.67 kb tandem duplication was identified upstream of the CTSB gene, encompassing an enhancer element that is active in keratinocytes (based on H3K27ac data). The tandem duplication segregated completely with KWE. The tandem duplication overlaps with a 15.93 kb tandem duplication identified in two Norwegian families at a 2.62 kb region encompassing the active enhancer, suggesting that the duplication of the enhancer leads to the KWE phenotype. Existing chromatin structure, CTCF binding and chromatin interaction data from several cell lines, including keratinocytes, were analysed and three potential topological subdomains were identified, all containing the enhancer and CTSB, or CTSB and FDFT1, or both genes and NEIL2. Additionally, we showed that the enhancer’s activity correlated with CTSB expression, but not with FDFT1 and NEIL2 expression, in differentiating keratinocytes and other cell lines. RNA polymerase II ChIA-PET interaction data in cancer cell lines showed that the enhancer interacts with CTSB but not FDFT1 or NEIL2. These data suggest that the enhancer normally regulates CTSB expression.
Relative gene expression and immunohistochemistry from palmar biopsies from South African and Norwegian participants (7 affected and 7 controls) showed a significantly higher expression of CTSB, but not FDFT1 and NEIL2, in affected individuals compared to the controls, and that CTSB was significantly more abundant in the granular layer of affected individuals compared to controls. We conclude that the enhancer duplication causes KWE by upregulating CTSB expression and causing an overabundance of CTSB in the granular layer of the epidermis.

Single-molecule experiments in biological physics: methods and applications

Author: Ritort F
Publication venue: 'IOP Publishing'
Publication date: 15/09/2006

I review single-molecule experiments (SME) in biological physics. Recent technological developments have provided the tools to design and build scientific instruments of high enough sensitivity and precision to manipulate and visualize individual molecules and measure microscopic forces. Using SME it is possible to manipulate molecules one at a time and measure distributions describing molecular properties, characterize the kinetics of biomolecular reactions, and detect molecular intermediates. SME provide additional information about the thermodynamics and kinetics of biomolecular processes, complementing information obtained in traditional bulk assays. In SME it is also possible to measure small energies and detect large Brownian deviations in biomolecular reactions, thereby offering new methods and systems to scrutinize the basic foundations of statistical mechanics. This review is written at a very introductory level, emphasizing the importance of SME to scientists interested in knowing the common playground of ideas and the interdisciplinary topics accessible by these techniques. The review discusses SME from an experimental perspective, first presenting the most common experimental methodologies and later presenting various molecular systems where such techniques have been applied. I briefly discuss experimental techniques such as atomic-force microscopy (AFM), laser optical tweezers (LOT), magnetic tweezers (MT), biomembrane force probe (BFP) and single-molecule fluorescence (SMF). I then present several applications of SME to the study of nucleic acids (DNA, RNA and DNA condensation) and proteins (protein-protein interactions, protein folding and molecular motors). Finally, I discuss applications of SME to the study of the nonequilibrium thermodynamics of small systems and the experimental verification of fluctuation theorems. I conclude with a discussion of open questions and future perspectives.
Comment: Latex, 60 pages, 12 figures, Topical Review for J. Phys. C (Cond. Matt
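A well-known fluctuation theorem of the kind the abstract refers to is the Jarzynski equality, which relates the free-energy difference ΔF to repeated nonequilibrium work measurements W through ⟨exp(-W/kT)⟩ = exp(-ΔF/kT). The sketch below checks it numerically on synthetic data. It assumes a Gaussian work distribution, for which ΔF = ⟨W⟩ - σ²/(2kT) holds exactly. The samples are simulated, not experimental, and the function name and all parameter values are hypothetical.

```python
import math
import random


def jarzynski_free_energy(work_samples, kT=1.0):
    """Estimate dF = -kT * ln(mean(exp(-W / kT))) from work measurements."""
    n = len(work_samples)
    # Shift by the smallest work value so every exponent is <= 0,
    # which avoids overflow in exp().
    w_min = min(work_samples)
    acc = sum(math.exp(-(w - w_min) / kT) for w in work_samples)
    return w_min - kT * math.log(acc / n)


# Synthetic "pulling experiments": Gaussian work values (an assumption).
random.seed(0)
mean_work, sigma, kT = 2.0, 1.0, 1.0
samples = [random.gauss(mean_work, sigma) for _ in range(50_000)]

estimate = jarzynski_free_energy(samples, kT)
exact = mean_work - sigma**2 / (2.0 * kT)  # Gaussian closed form
print(f"estimated dF = {estimate:.3f}, exact dF = {exact:.3f}")
print(f"mean dissipated work = {mean_work - exact:.3f}")
```

The exponential average is dominated by rare low-work trajectories, so the estimate converges slowly when the dissipated work is much larger than kT. This is one reason such tests are done on small systems.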

arXiv.org e-Print Archive

Expression and structural studies of multidomain proteins and complexes

Author: Chamberlain Dean
Publication venue: UCL (University College London)
Publication date: 01/01/1998

It is generally accepted that there is a level of organization in proteins that overlaps the classical definitions of tertiary and quaternary structure, i.e. sequentially consecutive residues in polypeptide chains fold into distinct compact regions called domains. Many multidomain proteins are flexible and are not amenable to X-ray crystallography or are too big for multidimensional nuclear magnetic resonance techniques, while other proteins form oligomeric structures from subunits. Using small-angle X-ray and neutron scattering, coupled with molecular modelling techniques, it is possible to locate the positions of these domains or subunits relative to each other within the full protein structure. This PhD thesis has looked at a variety of native and recombinant oligomeric proteins and domains, and attempts have been made to produce low-resolution structures of their oligomeric or multidomain arrangements. Expression systems used include a Pseudomonas aeruginosa over-expression system and the baculovirus expression system. One multidomain protein was studied, namely factor I of the complement system. Two forms of factor I were studied, a native form purified from human plasma and a recombinant form produced in insect cells. Scattering modelling was used to elucidate a bilobal domain arrangement in factor I, in which the different types of carbohydrate present on the two different forms could be modelled. The quaternary structures of two complexes were determined, namely the homo-oligomeric complexes of the Ps. aeruginosa amidase regulatory protein, AmiC, and the Mycobacterium leprae Holliday junction protein, RuvA. It was determined that in solution AmiC exists as a monomer-trimer equilibrium, and that RuvA adopts an octameric structure, both when free and when complexed with DNA, within which the Holliday junction is buried in the RuvA-DNA complex.
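A monomer-trimer equilibrium of the kind reported for AmiC can be made concrete with a mass-action sketch. For 3M ⇌ T with association constant K = [T]/[M]³, the total subunit concentration is c = [M] + 3K[M]³, which can be solved for the free monomer concentration. The abstract gives no constants, so the values below are hypothetical, as are the function names.

```python
def monomer_concentration(total, k_assoc, iterations=200):
    """Solve [M] + 3*K*[M]**3 = total for the free monomer [M] by bisection.

    total:   total subunit concentration (monomer equivalents, > 0)
    k_assoc: hypothetical association constant K = [T] / [M]**3
    """
    lo, hi = 0.0, total
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # The left-hand side increases with [M], so bisection is safe.
        if mid + 3.0 * k_assoc * mid**3 > total:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def trimer_mass_fraction(total, k_assoc):
    """Fraction of all subunits that sit in trimers at equilibrium."""
    monomer = monomer_concentration(total, k_assoc)
    return 1.0 - monomer / total


# Hypothetical numbers in arbitrary concentration units.
for c in (0.5, 4.0, 32.0):
    print(c, round(trimer_mass_fraction(c, k_assoc=1.0), 3))
```

Because trimer formation is third order in monomer, the trimer fraction rises with total concentration. A concentration dependence of this kind is what distinguishes an equilibrium mixture from a single fixed oligomer.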

PROTEIN FUNCTION, DIVERSITY AND FUNCTIONAL INTERPLAY

Author: Khan Ishita Kamal
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2016

Functional annotation of novel or unknown proteins is one of the central problems in post-genomics bioinformatics research. With the vast expansion of genomic and proteomic data and technologies over the last decade, the development of automated function prediction (AFP) methods for large-scale identification of protein function has become imperative in many respects. In this research, we address two important divergences from the “one protein – one function” concept on which all existing AFP methods are developed