79 research outputs found

    Mapping glycoprotein structure reveals defining events in the evolution of the Flaviviridae

    Viral glycoproteins drive membrane fusion in enveloped viruses and determine host range, tissue tropism and pathogenesis. Despite their importance, glycoproteins within the Flaviviridae are only partly understood: for many species the glycoproteins have not yet been identified, while for others, such as the hepaciviruses, the molecular mechanisms of membrane fusion remain uncharacterised. Here, we combine comprehensive phylogenetic analyses with systematic protein structure prediction to survey glycoproteins across the entire Flaviviridae. We discover class-II fusion systems, homologous to the orthoflavivirus E glycoprotein, in most species, including highly divergent jingmenviruses and large genome flaviviruses. However, the E1E2 glycoproteins of the hepaci-, pegi- and pestiviruses are structurally distinct, may represent a novel class of fusion mechanism, and are strictly associated with infection of vertebrate hosts. By mapping glycoprotein distribution onto the underlying phylogeny we reveal a complex history of evolutionary events that have shaped the diverse virology and ecology of the Flaviviridae.

    Parallelization of dynamic programming recurrences in computational biology

    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years the DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15 and 130 times faster than a modern dual-core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors.
Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
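
For orientation, the kind of dynamic programming recurrence this abstract refers to can be sketched in software. The block below is a minimal, sequential Python sketch of the textbook Nussinov base-pair maximization recurrence. It is not the authors' FPGA design or their polyhedral formulation. The function name `nussinov`, the `min_loop` parameter and the allowed pair set (Watson-Crick plus G-U wobble) are assumptions made for this illustration.

```python
# Assumed pair set for illustration: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=0):
    """Return the maximum number of nested base pairs in seq.

    min_loop is a hypothetical parameter: positions i and k may pair
    only if k - i > min_loop.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    # Fill by increasing span. All cells with the same span depend only on
    # shorter spans, so they are mutually independent; this is the
    # fine-grained parallelism a hardware array could exploit.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: position i stays unpaired
            for k in range(i + 1, j + 1):  # case 2: i pairs with some k
                if k - i > min_loop and (seq[i], seq[k]) in PAIRS:
                    inside = dp[i + 1][k - 1] if k - 1 >= i + 1 else 0
                    outside = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + inside + outside)
            dp[i][j] = best
    return dp[0][n - 1]
```

The triple loop makes the cost cubic in the sequence length, which is why such kernels are natural candidates for acceleration.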

    Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts

    Over the last decades a revolution in novel measurement techniques has permeated the biological sciences, filling the databases with unprecedented amounts of data ranging from genomics, transcriptomics, proteomics and metabolomics to structural and ecological data. To extract insights from this vast quantity of data, computational and statistical methods are now crucial tools in the toolbox of every biological researcher. In this thesis I summarize my contributions in two data-rich fields of the biological sciences: transcription factor binding to DNA and protein structure prediction from protein sequences with shared evolutionary ancestry. In the first part of my thesis I introduce our work towards a web server for analysing transcription factor binding data with Bayesian Markov models. In contrast to classical PWM or di-nucleotide models, Bayesian Markov models can capture complex inter-nucleotide dependencies that can arise from shape readout and alternative binding modes. In addition to giving access to our methods in an easy-to-use, intuitive web interface, we provide our users with novel tools and visualizations to better evaluate the biological relevance of the inferred binding motifs. We hope that our tools will prove useful for investigating weak and complex transcription factor binding motifs which cannot be predicted accurately with existing tools. The second part discusses a statistical attempt to correct for the phylogenetic bias arising in co-evolution methods applied to the contact prediction problem. Co-evolution methods revolutionized the protein structure prediction field more than 10 years ago and, until very recently, have retained their importance as crucial input features to deep neural networks.
As the co-evolution information is extracted from evolutionarily related sequences, we investigated whether the phylogenetic bias in the signal can be corrected in a principled way, using a variation of Felsenstein's tree-pruning algorithm in combination with an independent-pair assumption to derive pairwise amino acid counts that are corrected for the evolutionary history. Unfortunately, the contact prediction derived from our corrected pairwise amino acid counts did not yield a competitive performance.

    Kern-basierte Lernverfahren für das virtuelle Screening (Kernel-based learning methods for virtual screening)

    We investigate the utility of modern kernel-based machine learning methods for ligand-based virtual screening. In particular, we introduce a new graph kernel based on iterative graph similarity and optimal assignments, apply kernel principal component analysis to projection error-based novelty detection, and discover a new selective agonist of the peroxisome proliferator-activated receptor gamma using Gaussian process regression. Virtual screening, the computational ranking of compounds with respect to a predicted property, is a cheminformatics problem relevant to the hit generation phase of drug development. Its ligand-based variant relies on the similarity principle, which states that (structurally) similar compounds tend to have similar properties. We describe the kernel-based machine learning approach to ligand-based virtual screening; in doing so, we stress the role of molecular representations, including the (dis)similarity measures defined on them, investigate effects in high-dimensional chemical descriptor spaces and their consequences for similarity-based approaches, review literature recommendations on retrospective virtual screening, and present an example workflow. Graph kernels are formal similarity measures that are defined directly on graphs, such as the annotated molecular structure graph, and correspond to inner products. We review graph kernels, in particular those based on random walks, subgraphs, and optimal vertex assignments. Combining the latter with an iterative graph similarity scheme, we develop the iterative similarity optimal assignment graph kernel, give an iterative algorithm for its computation, prove convergence of the algorithm and the uniqueness of the solution, and provide an upper bound on the number of iterations necessary to achieve a desired precision. In a retrospective virtual screening study, our kernel consistently improved performance over chemical descriptors as well as other optimal assignment graph kernels.
Chemical data sets often lie on manifolds of lower dimensionality than the embedding chemical descriptor space. Dimensionality reduction methods try to identify these manifolds, effectively providing descriptive models of the data. For spectral methods based on kernel principal component analysis, the projection error is a quantitative measure of how well new samples are described by such models. This can be used to identify compounds structurally dissimilar to the training samples, leading to projection error-based novelty detection for virtual screening using only positive samples. We provide proof of principle by using principal component analysis to learn the concept of fatty acids. The peroxisome proliferator-activated receptor (PPAR) is a nuclear transcription factor that regulates lipid and glucose metabolism, playing a crucial role in the development of type 2 diabetes and dyslipidemia. We establish a Gaussian process regression model for PPAR gamma agonists using a combination of chemical descriptors and the iterative similarity optimal assignment kernel via multiple kernel learning. Screening of a vendor library and subsequent testing of 15 selected compounds in a cell-based transactivation assay resulted in 4 active compounds. One compound, a natural product with a cyclobutane scaffold, is a full selective PPAR gamma agonist (EC50 = 10 ± 0.2 µM, inactive on PPAR alpha and PPAR beta/delta at 10 µM). The study delivered a novel PPAR gamma agonist, de-orphanized a natural bioactive product, and hints at the natural product origins of pharmacophore patterns in synthetic ligands.

    Seventh Biennial Report: June 2003 - March 2005


    A computational framework for transcriptome assembly and annotation in non-model organisms: the case of Venturia inaequalis

    In this dissertation three computational approaches are presented that enable optimization of reference-free transcriptome reconstruction. The first addresses the selection of bona fide reconstructed transcribed fragments (transfrags) from de novo transcriptome assemblies and their annotation with a multiple domain co-occurrence framework. We showed that selected transfrags are functionally relevant and represented over 94% of the information derived from annotation by transference. The second approach relates to quality score based RNA-seq sub-sampling and the description of a novel sequence similarity-derived metric for quality assessment of de novo transcriptome assemblies. A detailed systematic analysis of the side effects induced by quality score based trimming and/or filtering on artefact removal and transcriptome quality is described. Aggressive trimming produced incompletely reconstructed and missing transfrags. This approach was applied in generating an optimal transcriptome assembly for a South African isolate of V. inaequalis. The third approach deals with the computational partitioning of transfrags assembled from RNA-Seq of mixed host and pathogen reads. We used this strategy to correct a publicly available transcriptome assembly for V. inaequalis (Indian isolate). We binned 50% of the latter to apple transfrags and identified putative immunity transcript models. Comparative transcriptomic analysis between fungal transfrags from the Indian and South African isolates reveals effectors or transcripts that may be expressed in planta upon morphogenic differentiation. These studies have successfully identified V. inaequalis-specific transfrags that can facilitate gene discovery. The unique access to an in-house draft genome assembly allowed us to provide a preliminary description of genes that are implicated in pathogenesis. Gene prediction with bona fide transfrags produced 11,692 protein-coding genes.
We identified two hydrophobin-like genes and six accessory genes of the melanin biosynthetic pathway that are implicated in the invasive action of the appressorium. The cazyome reveals an impressive repertoire of carbohydrate-degrading enzymes and carbohydrate-binding modules, among which are six polysaccharide lyases and the largest number of carbohydrate esterases (twenty-eight) known in any fungus sequenced to date.

    Identification and characterization of a novel structural protein of porcine reproductive and respiratory syndrome virus, the replicase nonstructural protein 2

    Porcine reproductive and respiratory syndrome virus (PRRSV) is a rapidly mutating pathogen eliciting respiratory and reproductive disease of high economic consequence. The PRRSV non-structural protein 2 (nsp2) is a large multifunctional protein encoded by the most genetically diverse region of the genome; the selective pressure potentiating mutation within this region is unknown. Here we report the identification of a unique function of nsp2 as a structural component of the PRRSV virion; it is the first PRRSV structural protein identified which is not expressed from a sub-genomic RNA or regulated via the discontinuous transcription pathway. Through the use of a set of custom α-nsp2 antibodies, nsp2 was identified on the surface of the PRRSV virion by immunoelectron microscopy. Further, a class of nsp2 isoforms was defined to be packaged within or upon the PRRSV virion. Nsp2 packaging was found to be conserved across a panel of highly divergent strains of PRRSV, including the genotype 1 and genotype 2 prototype strains as well as contemporary and highly pathogenic isolates. Next, the hydrophobic domain of nsp2 was characterized as a putative multi-pass transmembrane domain predicted to facilitate nsp2 packaging through association with the viral envelope. Within an in vitro cell-free translation system nsp2 was found to strongly associate with canine microsomal membranes. Through high-speed ultracentrifugation, protease protection assay, and immunoprecipitation, nsp2 was defined as an integral membrane protein and additionally identified to display an unexpected N-terminal cytoplasmic / C-terminal luminal topological orientation. Finally, membrane isolation demonstrated that two sub-dominant nsp2 isoforms of approximately 117 and 106 kDa in size, of unknown composition or function, were enriched within membranes. Together, these results define previously unknown attributes of nsp2.
Identification of nsp2 as a structural protein implicates it in previously unpredicted functions related to attachment, entry, or early replication events and further provides a rationale for the high mutation rate and robust adaptive immune response targeting the nsp2 protein. Characterization of the nsp2 transmembrane domain demonstrates its role as an integral membrane protein and additionally raises new questions related to the unexpected topological orientation or enrichment of select isoforms within the membrane fraction.

    LIPIcs, Volume 244, ESA 2022, Complete Volume


    A complex systems approach to education in Switzerland

    The insights gained from the study of complex systems in biological, social, and engineered systems enable us not only to observe and understand, but also to actively design systems which will be capable of successfully coping with complex and dynamically changing situations. The methods and mindset required for this approach have been applied to educational systems with their diverse levels of scale and complexity. Based on the general case made by Yaneer Bar-Yam, this paper applies the complex systems approach to the educational system in Switzerland. It confirms that the complex systems approach is valid. Indeed, many recommendations made for the general case have already been implemented in the Swiss education system. To address existing problems and difficulties, further steps are recommended. This paper contributes to the further establishment of the complex systems approach by shedding light on an area which concerns us all, which is a frequent topic of discussion and dispute among politicians and the public, where billions of dollars have been spent without achieving the desired results, and where it is difficult to directly derive consequences from actions taken. The analysis of the education system's different levels, their complexity and scale will clarify how such a dynamic system should be approached, and how it can be guided towards the desired performance.

    Untangling hotel industry's inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiencies in the estimation process, and so to investigate the main causes of inefficiency. Several suggestions for improving efficiency are made for each hotel studied.