41 research outputs found

Theories, models, simulations: a computational challenge

Author: Rossi G. C.
Publication date: 01/01/2006

In this talk I would like to illustrate, with examples taken from Quantum Field Theory and Biophysics, how an intelligent exploitation of the unprecedented power of today's computers could lead not only to the solution of pivotal problems in the theory of Strong Interactions, but also to the emergence of new lines of interdisciplinary research, while at the same time pushing the limits of modeling to the realm of living systems. Comment: 19 pages, 1 figure, conference paper

arXiv.org e-Print Archive

CiteSeerX

ART

CERN Document Server

ZERO-KNOWLEDGE DE NOVO ALGORITHMS FOR ANALYZING SMALL MOLECULES USING MASS SPECTROMETRY

Author: Kreitzberg Patrick Anthony
Publication venue: University of Montana, Maureen and Mike Mansfield Library
Publication date: 01/01/2019

In the analysis of mass spectra, if a superset of the molecules thought to be in a sample is known a priori, then there are well-established techniques for identifying the molecules, such as database search and spectral libraries. Linear molecules are chains of subunits. For example, a peptide is a linear molecule with an “alphabet” of 20 possible amino acid subunits. A peptide of length six will have 20^6 = 64,000,000 different possible outcomes. Small molecules, such as sugars and metabolites, are not constrained to linear structures and may branch. These molecules are encoded as undirected graphs rather than simply linear chains. An undirected graph with six subunits (each of which has 20 possible outcomes) will have 20^6 · 2^(6 choose 2) = 2,097,152,000,000 possible outcomes. The vast number of complex graphs that small molecules can form can render databases and spectral libraries impossibly large to use, or incomplete, as many metabolites may still be unidentified. In the absence of a usable database or spectral library, an alphabet of subunits may be used to connect peaks in the fragmentation spectra; each connection represents a neutral loss of an alphabet mass. This technique is called “de novo sequencing” and relies on the alphabet being known in advance. Often the alphabet of m/z difference values allowed by de novo analysis is not known or is incomplete. A method is proposed that, given fragmentation mass spectra, identifies an alphabet of m/z differences that can build large connected graphs from many intense peaks in each spectrum from a collection. Once an alphabet is obtained, it is informative to find common substructures among the peaks connected by the alphabet. This is the same as finding the largest isomorphic subgraphs on the de novo graphs from all pairs of fragmentation spectra.
This maximal subgraph isomorphism problem is a generalization of the subgraph isomorphism problem, which asks whether a graph G1 has a subgraph isomorphic to a graph G2. Subgraph isomorphism is NP-complete. A novel method of efficiently finding common substructures among the subspectra induced by the alphabet is proposed. This method is then combined with a novel form of hashing, eschewing evaluation of all pairs of fragmentation spectra. These methods are generalized to Euclidean graphs embedded in Z^n.
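
The counts quoted in the abstract, and the idea of connecting peaks by alphabet masses, can be illustrated with a minimal Python sketch. It is not the thesis's method: the function names, the tolerance `tol`, the peak list, and the two-entry alphabet (the glycine and alanine residue masses) are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import comb


def linear_count(alphabet_size, length):
    """Number of linear chains of the given length over the alphabet."""
    return alphabet_size ** length


def graph_count(alphabet_size, n):
    """Labelings of n subunits times all possible undirected edge sets."""
    return alphabet_size ** n * 2 ** comb(n, 2)


def de_novo_edges(peaks, alphabet, tol=0.01):
    """Connect pairs of peaks whose m/z difference matches an alphabet mass.

    peaks: iterable of m/z values (hypothetical data).
    alphabet: dict mapping a label to a mass difference (hypothetical).
    tol: absolute matching tolerance, an assumed parameter.
    """
    edges = []
    for low, high in combinations(sorted(peaks), 2):
        diff = high - low
        for label, mass in alphabet.items():
            if abs(diff - mass) <= tol:
                edges.append((low, high, label))
    return edges


if __name__ == "__main__":
    print(linear_count(20, 6))
    print(graph_count(20, 6))
    alphabet = {"G": 57.02146, "A": 71.03711}
    print(de_novo_edges([100.0, 157.02146, 228.05857], alphabet))
```

Checking every pair of peaks is quadratic in the number of peaks, which is acceptable for a sketch; the hashing approach mentioned in the abstract is not reproduced here.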

University of Montana

Modelling timing in blood cancers

Author: Talarmain Laure
Publication venue: University of Cambridge
Publication date: 16/12/2020

Dysregulation of biological processes in normal cells can lead to the abnormal growth of tumours. Oncogenesis requires the acquisition of advantageous mutations to expand in a fluctuating environment. Cancer cells gain these genetic and epigenetic alterations at different times in their development, resulting in the formation of heterogeneous cell populations which interact and compete with each other inside tumours. At later stages, by escaping the immune system and acquiring malignant properties, some cancer cells manage to leave the primary tumour and spread to different organs to form metastases. Hence, tumour development in healthy tissues involves several biological changes as it progresses, and the order of these molecular and cellular events may modify prognosis. This thesis addresses the influence of biological event timing on blood cancer progression and clinical outcomes. It first investigates the therapeutic efficacy of p53 restoration in a lymphoma mouse model. While several therapy schedules are tested, all fail due to the emergence of resistance. Computational modelling establishes the cell dynamics in these tumours and how to use them to propose alternative treatment strategies. Data availability leads this work to explore the impact of molecular evolution in myeloid malignancies. Notably, one study has found that Myeloproliferative Neoplasm patients with both JAK2 and TET2 mutations have different disease characteristics depending on mutation order. My analyses identify HOXA9 as a potential prognostic marker and biological switch responsible for patient stratification in these patients and in Acute Myeloid Leukemia. Additionally, a molecular network identifies the hematopoietic regulators involved in the branching evolution of Myeloproliferative Neoplasms.
Further investigations of the Acute Myeloid Leukemia data show the possible involvement of APP, a gene associated with Alzheimer's disease, in early cell fate commitment in hematopoiesis and, when lowly expressed, in poor survival prognosis in undifferentiated leukemia. Finally, this thesis examines the regulatory dynamics behind three clusters of Acute Myeloid Leukemia patients with distinct levels of HOXA9 and APP expression. By building a program inferring molecular motifs from biological observations, genes which may interact with HOXA9 and APP are identified. Microsoft Research and the MRC Cancer Unit

Apollo (Cambridge)

Analytical, Theoretical and Empirical Advances in Genome-Scale Algorithmics

Author: Wang Kai
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2015

Ever-increasing amounts of complex biological data continue to come online daily. Examples include proteomic, transcriptomic, genomic and metabolomic data generated by a plethora of high-throughput methods. Accordingly, fast and effective data processing techniques are more and more in demand. This issue is addressed in this dissertation through an investigation of various algorithmic alternatives and enhancements to routine and traditional procedures in common use. In the analysis of gene co-expression data, for example, differential measures of entropy and variation are studied as augmentations to mere differential expression. These novel metrics are shown to help elucidate disease-related genes in wide assortments of case/control data. In a more theoretical spirit, limits on the worst-case behavior of density-based clustering methods are studied. It is proved, for instance, that the well-known paraclique algorithm, under proper tuning, can be guaranteed never to produce subgraphs with density less than 2/3. Transformational approaches to efficient algorithm design are also considered. Classic graph search problems are mapped to and from well-studied versions of satisfiability and integer linear programming. In so doing, regions of the input space are classified for which such transforms are effective alternatives to direct graph optimizations. In all these efforts, practical implementations are emphasized in order to advance the boundary of effective computation.
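
To make the 2/3 density bound concrete, the following Python sketch computes the edge density of a vertex subset, taken here to be the number of edges present divided by the number of possible vertex pairs. It is only a checker for that quantity, not the paraclique algorithm; the function name and the example graph are assumptions.

```python
def density(vertices, edges):
    """Edge density of the subgraph induced by `vertices`.

    Density is edges present / (n choose 2); a clique has density 1.0.
    Subsets with fewer than two vertices are treated as density 1.0
    by convention (an assumption of this sketch).
    """
    vs = set(vertices)
    n = len(vs)
    if n < 2:
        return 1.0
    # Keep each undirected edge once, ignoring self-loops and outside vertices.
    present = {
        frozenset(e) for e in edges
        if len(set(e)) == 2 and set(e) <= vs
    }
    return len(present) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # A 4-clique with one edge removed: 5 of 6 possible edges.
    near_clique = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
    d = density(range(4), near_clique)
    print(d, d >= 2 / 3)
```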

University of Tennessee, Knoxville: Trace

A forensics software toolkit for DNA steganalysis.

Author: Beck Marc Bjoern
Publication venue: ThinkIR: The University of Louisville's Institutional Repository
Publication date: 01/05/2015

Recent advances in genetic engineering have allowed the insertion of artificial DNA strands into the living cells of organisms. Several methods have been developed to insert information into a DNA sequence for the purpose of data storage, watermarking, or communication of secret messages. The ability to detect, extract, and decode messages from DNA is important for forensic data collection and for data security. We have developed a software toolkit that is able to detect the presence of a hidden message within a DNA sequence, extract that message, and then decode it. The toolkit is able to detect, extract, and decode messages that have been encoded with a variety of different coding schemes. The goal of this project is to enable our software toolkit to determine with which coding scheme a message has been encoded in DNA and then to decode it. The software package is able to decode messages that have been encoded with every variation of most of the coding schemes described in this document. The software toolkit has two different options for decoding that can be selected by the user. The first is a frequency analysis approach that is very commonly used in cryptanalysis. This approach is very fast, but is unable to decode messages shorter than 200 words accurately. The second option is using a Genetic Algorithm (GA) in combination with a Wisdom of Artificial Crowds (WoAC) technique. This approach is very time-consuming, but can decode shorter messages with much higher accuracy.
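
The frequency analysis option can be pictured with a generic textbook sketch in Python: rank the symbols of a ciphertext by how often they occur and pair them with an approximate English letter-frequency ordering. This is not the toolkit's code; the function names and the `ENGLISH_ORDER` string (one commonly quoted approximation) are assumptions.

```python
import string
from collections import Counter

# Approximate English letter ordering, most to least frequent (an assumption).
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"


def frequency_guess(ciphertext):
    """Map each ciphertext letter to a plaintext guess by frequency rank."""
    letters = [c for c in ciphertext.upper() if c in string.ascii_uppercase]
    counts = Counter(letters)
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return dict(zip(ranked, ENGLISH_ORDER))


def apply_guess(ciphertext, mapping):
    """Substitute guessed letters; leave unmapped characters unchanged."""
    return "".join(mapping.get(c, c) for c in ciphertext.upper())


if __name__ == "__main__":
    guess = frequency_guess("XXXYYZ")
    print(guess)
    print(apply_guess("XYZ", guess))
```

A rank-based guess of this kind depends on the sample frequencies being close to the language's true frequencies, which is consistent with the abstract's remark that short messages are decoded inaccurately.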

University of Louisville

Synthesis of Scientific Workflows: Theory and Practice of an Instance-Aware Approach

Author: Kasalica Vedran
Publication venue: The Graduate School of the Humanities, Utrecht University
Publication date: 21/11/2022

The last two decades brought an explosion of computational tools and processes in many scientific domains (e.g., life-, social- and geo-science). Scientific workflows, i.e., computational pipelines, accompanied by workflow management systems, were soon adopted as a de-facto standard among non-computer scientists for orchestrating such computational processes. The goal of this dissertation is to provide a framework that would automate the orchestration of such computational pipelines in practice. We refer to such problems as scientific workflow synthesis problems. This dissertation introduces the temporal logic SLTLx, and presents a novel SLTLx-based synthesis approach that overcomes limitations in handling data object dependencies present in existing synthesis approaches. The new approach uses transducers and temporal goals, which keep track of the data objects in the synthesised workflow. The proposed SLTLx-based synthesis includes a bounded and a dynamic variant, which are shown in Chapter 3 to be NP-complete and PSPACE-complete, respectively. Chapter 4 introduces a transformation algorithm that translates the bounded SLTLx-based synthesis problem into propositional logic. The transformation is implemented as part of the APE (Automated Pipeline Explorer) framework, presented in Chapter 5. It relies on highly efficient SAT solving techniques, using an off-the-shelf SAT solver to synthesise a solution for the given propositional encoding. The framework provides an API (application programming interface), a CLI (command line interface), and a web-based GUI (graphical user interface). The development of APE was accompanied by four concrete application scenarios as case studies for automated workflow composition. The studies were conducted in collaboration with domain experts and presented in Chapter 6. Each of the case studies is used to assess and illustrate specific features of the SLTLx-based synthesis approach. 
(1) A case study on cartographic map generation demonstrates the ability to distinguish data objects as a key feature of the framework. It illustrates the process of annotating a new domain, and presents the iterative workflow synthesis approach, where the user tries to narrow down the desired specification of the problem in a few intuitive steps. (2) A case study on geo-analytical question answering as part of the QuAnGIS project shows the benefits of using data flow dependencies to describe a synthesis problem. (3) A proteomics case study demonstrates the usability of APE as an “off-the-shelf” synthesiser, providing direct integration with existing semantic domain annotations. In addition, a manual evaluation of the synthesised workflows shows promising results even on large real-life domains, such as the EDAM ontology and the complete bio.tools registry. (4) A geo-event question-answering study demonstrates the usability of APE within a larger question-answering system. This dissertation achieves the goals it sets out. It provides a formal framework, accompanied by a lightweight library, which can solve real-life scientific workflow synthesis problems. Finally, the development of the library motivated an upcoming collaborative project in the life sciences domain. The aim of the project is to develop a platform which would automatically compose (using APE) and benchmark workflows in computational proteomics.
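
As a rough picture of what bounded workflow synthesis asks for, the Python sketch below enumerates tool sequences up to a length bound and keeps those whose input and output types chain from a start type to a goal type. It deliberately swaps in brute-force enumeration for the SLTLx and SAT-based encoding described in the abstract, and it does not use the APE API; the tool names, types, and the `synthesise` function are hypothetical.

```python
from itertools import product

# Hypothetical tool annotations: name -> (input type, output type).
TOOLS = {
    "load_csv": ("File", "Table"),
    "filter_rows": ("Table", "Table"),
    "plot_map": ("Table", "Map"),
}


def synthesise(tools, start, goal, max_len):
    """Enumerate type-correct tool sequences of length 1..max_len.

    A sequence is kept when each tool's input type equals the type
    produced so far and the final type equals `goal`.
    """
    found = []
    for length in range(1, max_len + 1):
        for seq in product(sorted(tools), repeat=length):
            current = start
            for name in seq:
                needed, produced = tools[name]
                if needed != current:
                    break  # type mismatch, discard this sequence
                current = produced
            else:
                if current == goal:
                    found.append(seq)
    return found


if __name__ == "__main__":
    for workflow in synthesise(TOOLS, "File", "Map", 3):
        print(" -> ".join(workflow))
```

Enumeration grows exponentially with the length bound, which is one reason an encoding into propositional logic with an off-the-shelf SAT solver, as the abstract describes, is attractive in practice.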

Utrecht University Repository

ABSTRACTS

Publication venue: OxyScholar

Occidental College Scholar

Pertanika Journal of Science & Technology

Author: Universiti Putra Malaysia Press
Publication venue: Universiti Putra Malaysia Press
Publication date: 01/01/2020

Universiti Putra Malaysia Institutional Repository