
    Molecular Distance Maps: An alignment-free computational tool for analyzing and visualizing DNA sequences' interrelationships

    In an attempt to identify and classify species based on genetic evidence, we propose a novel combination of methods to quantify and visualize the interrelationships between thousands of species. We use the Chaos Game Representation (CGR) of DNA sequences to compute genomic signatures, which we then compare by computing pairwise distances. In the last step, the original DNA sequences are embedded in a high-dimensional space using Multi-Dimensional Scaling (MDS) and then projected onto a three-dimensional Euclidean space. We first apply this method to a mitochondrial DNA dataset from NCBI containing over 3,000 species. The analysis shows that the oligomer composition of full mtDNA sequences can be a source of taxonomic information, suggesting that this method could be used for unclassified species and taxonomic controversies. Next, we test the hypothesis that the CGR-based genomic signature is preserved along a species' genome by comparing inter- and intra-genomic signatures of nuclear DNA sequences from six different organisms, one from each kingdom of life. We also compare six different distances and assess their performance using statistical measures. Our results support the existence of a genomic signature for a species' genome at the kingdom level. In addition, we test whether CGR-based genomic signatures originating only from nuclear DNA can distinguish between closely related species, and we find that they cannot. To overcome this limitation, we propose the concept of "composite signatures", which combine information from different types of DNA, and we show that they can effectively distinguish all closely related species under consideration. We also propose the concept of "assembled signatures", which, among other advantages, do not require a long contiguous DNA sequence but can be built from shorter ones of ~100-300 base pairs.
Finally, we design an interactive web tool, MoDMaps3D, for building three-dimensional Molecular Distance Maps. Users can explore an existing map or build their own using NCBI accession numbers as input. MoDMaps3D is platform independent, written in JavaScript, and runs in all major modern browsers.
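The pipeline described above (CGR signatures, pairwise distances, projection to 3D with MDS) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the MoDMaps3D implementation: the corner assignment, the k-mer resolution `k`, the plain Euclidean distance (the abstract compares six distances but does not name them), and classical (Torgerson) MDS are choices made for the sketch, and the names `fcgr`, `pairwise_euclidean`, and `classical_mds` are hypothetical.

```python
import numpy as np

# Assumed corner convention for the CGR unit square (one common choice):
# A=(0,0), C=(0,1), G=(1,1), T=(1,0).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq: str, k: int) -> np.ndarray:
    """Frequency CGR: a 2^k x 2^k matrix of normalized k-mer frequencies.

    The CGR point produced right after a given k-mer is read always lands in
    the same grid cell, so counting k-mers gives the FCGR matrix directly.
    The i-th nucleotide of the k-mer sets bit i of each cell index, so the
    last nucleotide is the most significant bit.
    """
    size = 2 ** k
    grid = np.zeros((size, size))
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(ch not in CORNERS for ch in kmer):
            continue  # skip k-mers containing N or other ambiguity codes
        ix = iy = 0
        for i, ch in enumerate(kmer):
            cx, cy = CORNERS[ch]
            ix |= cx << i
            iy |= cy << i
        grid[iy, ix] += 1
    total = grid.sum()
    return grid / total if total else grid


def pairwise_euclidean(signatures: list) -> np.ndarray:
    """Euclidean distance between every pair of flattened signatures."""
    flat = np.array([s.ravel() for s in signatures])
    diff = flat[:, None, :] - flat[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist: np.ndarray, dims: int = 3) -> np.ndarray:
    """Classical (Torgerson) MDS: coordinates whose distances approximate `dist`."""
    n = dist.shape[0]
    center = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * center @ (dist ** 2) @ center  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dims]  # largest eigenvalues first
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


if __name__ == "__main__":
    seqs = ["ACGTACGTTTGACC", "ACGTACGATTGACC", "GGGCCCGGGCCCAA", "TTTTAAAATTTTAA"]
    sigs = [fcgr(s, k=2) for s in seqs]
    coords = classical_mds(pairwise_euclidean(sigs), dims=3)
    print(coords.round(3))  # one 3-D point per sequence
```

Classical MDS reproduces the input distances exactly only when they are Euclidean; for other distance measures, negative eigenvalues are clipped and the 3D map is an approximation.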

    Graphical Representation of Biological Sequences

    Sequence comparison is one of the most fundamental tasks in bioinformatics. For biological sequence comparison, alignment is the most effective method when the sequences are not very long. However, because the time complexity of alignment is quadratic in the sequence length, aligning long sequences requires a large amount of computational time. Alignment-free sequence comparison methods are therefore needed to compare, for example, whole-genome sequences in practical time. In this chapter, we review the graphical representation of biological sequences, one of the major alignment-free sequence comparison methods. We also discuss the notable effects of weighting in graphical representation, which the author and co-workers were the first to introduce.
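To illustrate why graphical representations avoid the quadratic cost of alignment, the sketch below turns a DNA sequence into a 2D walk in one linear pass and compares sequences through a fixed-size summary of the walk. The step directions, the length-normalized centroid used as a descriptor, and the names `dna_walk`, `walk_descriptor`, and `walk_distance` are assumptions for illustration; this is not the weighted representation discussed in the chapter.

```python
import numpy as np

# Hypothetical step assignment (several conventions exist):
# A = left, G = right, C = up, T = down.
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def dna_walk(seq: str) -> np.ndarray:
    """2-D walk: row i is the position after reading i + 1 nucleotides."""
    steps = np.array([STEPS[ch] for ch in seq.upper()], dtype=float)
    return np.cumsum(steps, axis=0)


def walk_descriptor(seq: str) -> np.ndarray:
    """Length-normalized centroid of the walk: a fixed-size numeric summary."""
    return dna_walk(seq).mean(axis=0) / len(seq)


def walk_distance(a: str, b: str) -> float:
    """Alignment-free dissimilarity: one linear pass over each sequence."""
    return float(np.linalg.norm(walk_descriptor(a) - walk_descriptor(b)))


if __name__ == "__main__":
    print(dna_walk("ACGT"))
    print(walk_distance("ACGTACGT", "ACGTTCGT"))
```

Because each descriptor has a fixed size, comparing two sequences costs O(n + m) rather than the O(nm) of pairwise alignment, at the price of discarding positional detail.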

    Numerical representations of protein sequences for classification

    With the growth of bioinformatics, it has become possible to analyze and compare large sets of not only genomic but also proteomic sequences. This made it necessary to introduce numerical representations of sequences for computer processing. Protein sequence representation has its own specifics and is often more computationally demanding than the representation of genomic sequences. Numerical representations also make it possible to analyze proteomic data as digital signals, which opens up many new ways of processing proteins. This bachelor's thesis presents several different approaches to the numerical representation of proteins. Selected methods are then tested on a set of mitochondrially encoded proteins and compared with the standard taxonomy and with the commonly used symbolic representation.
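As an example of treating a protein as a digital signal, the sketch below maps each residue to a numeric value and computes the power spectrum of the resulting signal. The Kyte-Doolittle hydropathy scale is one well-known mapping chosen here as an assumption; the abstract does not say which representations the thesis uses, and `to_signal` and `power_spectrum` are hypothetical names.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale, used here only as an example mapping.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_signal(protein: str) -> np.ndarray:
    """Map each residue to its hydropathy value, giving a 1-D numeric signal."""
    return np.array([KYTE_DOOLITTLE[aa] for aa in protein.upper()])


def power_spectrum(protein: str) -> np.ndarray:
    """Squared DFT magnitudes of the mean-removed signal (non-negative frequencies only)."""
    signal = to_signal(protein)
    return np.abs(np.fft.rfft(signal - signal.mean())) ** 2


if __name__ == "__main__":
    print(to_signal("MKTAYIAK"))
    print(power_spectrum("MKTAYIAK").round(2))
```

Spectra of proteins with different lengths have different numbers of frequency bins, so comparing a set of proteins would first need resampling or another fixed-length feature.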

    NOVEL ALGORITHMS AND TOOLS FOR LIGAND-BASED DRUG DESIGN

    Computer-aided drug design (CADD) has become an indispensable component of modern drug discovery projects. Predicting the physicochemical and pharmacological properties of candidate compounds effectively increases the probability that drug candidates pass the later phases of clinical trials. Ligand-based virtual screening has advantages over structure-based drug design in terms of its wide applicability and high computational efficiency. Established chemical repositories and reported bioassays form a vast knowledge base from which quantitative structure-activity relationships (QSAR) and structure-property relationships (QSPR) can be derived. In addition, the rapid advance of machine learning techniques offers new solutions for mining huge compound databases. In this thesis, a novel ligand classification algorithm, Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS), was reported for the prediction of diverse categorical pharmacological properties. LiCABEDS was successfully applied to model 5-HT1A ligand functionality, ligand selectivity between cannabinoid receptor subtypes, and blood-brain barrier (BBB) passage. LiCABEDS was implemented with a graphical user interface, data import/export, automated model training/prediction, and project management. Furthermore, a non-linear ligand classifier was proposed that uses a novel Topomer kernel function in a support vector machine. With the emphasis on green high-performance computing, graphics processing units (GPUs) are alternative platforms for computationally expensive tasks; a novel GPU algorithm was designed and implemented to accelerate the calculation of chemical similarities with dense-format molecular fingerprints. Finally, a compound acquisition algorithm was reported for constructing a structurally diverse screening library in order to enhance hit rates in high-throughput screening.
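The technique named in LiCABEDS, adaptive boosting (AdaBoost) of decision stumps, can be sketched as follows. This is a minimal discrete-AdaBoost illustration, not the LiCABEDS implementation: it assumes binary molecular fingerprints as input features, class labels in {-1, +1}, and one-bit stumps, and the names `train_boosted_stumps` and `predict` are hypothetical.

```python
import numpy as np


def train_boosted_stumps(x, y, rounds: int = 10) -> list:
    """Discrete AdaBoost with one-bit decision stumps.

    x: (n_samples, n_bits) array of 0/1 fingerprint bits.
    y: (n_samples,) array of class labels in {-1, +1}.
    Returns a list of (bit_index, polarity, alpha) weak learners.
    """
    signs = 2 * np.asarray(x, dtype=int) - 1  # stump on bit j predicts polarity * signs[:, j]
    y = np.asarray(y)
    weights = np.full(signs.shape[0], 1.0 / signs.shape[0])
    model = []
    for _ in range(rounds):
        best = None
        for polarity in (1, -1):
            wrong = ((polarity * signs) != y[:, None]).astype(float)  # (n, n_bits)
            errors = weights @ wrong                                   # weighted error per bit
            j = int(np.argmin(errors))
            if best is None or errors[j] < best[0]:
                best = (errors[j], j, polarity)
        err, j, polarity = best
        err = float(np.clip(err, 1e-10, 1 - 1e-10))  # keep the log finite
        alpha = 0.5 * np.log((1 - err) / err)
        pred = polarity * signs[:, j]
        weights *= np.exp(-alpha * y * pred)  # up-weight misclassified samples
        weights /= weights.sum()
        model.append((j, polarity, alpha))
    return model


def predict(model: list, x) -> np.ndarray:
    """Sign of the alpha-weighted vote of all stumps."""
    signs = 2 * np.asarray(x, dtype=int) - 1
    score = np.zeros(signs.shape[0])
    for j, polarity, alpha in model:
        score += alpha * polarity * signs[:, j]
    return np.where(score >= 0, 1, -1)
```

Each stump looks at a single fingerprint bit, so the trained model is a weighted list of bits, which keeps predictions cheap and the model easy to inspect.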

    Fast scalable visualization techniques for interactive billion-particle walkthrough

    This research develops a comprehensive framework for interactive walkthroughs of one billion particles in an immersive virtual environment, enabling interrogative visualization of large atomistic simulation data. Combining scientific and engineering approaches, the framework is based on four key techniques: adaptive data compression based on space-filling curves, octree-based visibility and occlusion culling, predictive caching based on machine learning, and scalable data reduction based on parallel and distributed processing. For parallel rendering, the system combines functional, data, and temporal parallelism to improve interactivity. The visualization framework will be applicable not only to materials simulation but also to computational biology, applied mathematics, mechanical engineering, nanotechnology, and other fields.
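Space-filling-curve ordering can be illustrated with a Morton (Z-order) key, which interleaves the bits of quantized x, y, and z coordinates so that particles close in space tend to get close keys. The abstract does not state which curve the framework uses (Hilbert curves are another common choice), so Morton order, the 10-bit-per-axis quantization, the delta encoding, and the helper names are all assumptions of this sketch.

```python
def part1by2(v: int) -> int:
    """Spread the low 10 bits of v so two zero bits separate consecutive bits."""
    v &= 0x3FF
    v = (v ^ (v << 16)) & 0xFF0000FF
    v = (v ^ (v << 8)) & 0x0300F00F
    v = (v ^ (v << 4)) & 0x030C30C3
    v = (v ^ (v << 2)) & 0x09249249
    return v


def morton3d(ix: int, iy: int, iz: int) -> int:
    """Interleave three 10-bit cell indices into one 30-bit Z-order key."""
    return part1by2(ix) | (part1by2(iy) << 1) | (part1by2(iz) << 2)


def morton_keys(points, lo, hi) -> list:
    """Quantize 3-D points inside the box [lo, hi] to a 1024^3 grid and encode them."""
    keys = []
    for p in points:
        cell = [min(1023, max(0, int((c - l) / (h - l) * 1024)))
                for c, l, h in zip(p, lo, hi)]
        keys.append(morton3d(*cell))
    return keys


def sort_and_delta(keys: list) -> tuple:
    """Sort particles by key and delta-encode the keys (small gaps compress well)."""
    order = sorted(range(len(keys)), key=keys.__getitem__)
    sorted_keys = [keys[i] for i in order]
    deltas = sorted_keys[:1] + [b - a for a, b in zip(sorted_keys, sorted_keys[1:])]
    return order, deltas


if __name__ == "__main__":
    pts = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.11, 0.1, 0.1)]
    keys = morton_keys(pts, lo=(0, 0, 0), hi=(1, 1, 1))
    print(sort_and_delta(keys))
```

Because the key interleaves bits from the most significant level down, the top 3 bits of a 30-bit key identify the root octant, the next 3 bits the child octant, and so on; particles that share a key prefix lie in the same octree cell, which is what makes this ordering fit naturally with octree-based culling.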

    Heavy-metal resistance in Marinobacter aquaeolei 617: insights into copper resistance

    Dissertation presented at the Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa for the degree of Master in Biotechnology. Heavy metal resistance in Marinobacter aquaeolei (Ma.aq) 617 under aerobic conditions was studied for three different ions: cadmium, cobalt, and copper. The main aim of this work was to study a putative copper resistance operon, copSRXAB, located in the Marinobacter chromosome, and to biochemically characterize a unique copper-binding protein, CopX (proposed designation), associated with the copper resistance system. Growth under heavy metal ion stress was performed for all three metals, and the Minimum Inhibitory Concentration (MIC) / Maximum Tolerant Concentration (MTC) was determined using two approaches: solid artificial sea water (ASW) plates and liquid ASW medium supplemented with lactate and yeast extract as carbon sources. The MIC/MTC values for cadmium, cobalt, and copper ions were 200 μM, 4-6 mM, and 1.6 mM, respectively. These values classify Ma.aq strain 617 as a cadmium-, cobalt-, and copper-resistant strain. Moreover, during the cobalt resistance studies we observed the production of an unknown protein or compound, proposed to be a cobalamin-containing protein and/or cobalamin itself. Within the scope of copper resistance, a preliminary proteomic analysis of the Ma.aq periplasmic fraction was performed. CopX, identified by MALDI TOF-TOF mass spectrometry, was shown to be differentially expressed under copper stress. This demonstrated that the proposed copper operon, copSRXAB, has a role in Ma.aq copper resistance. CopX was successfully expressed heterologously in Escherichia coli (Es.coli) and purified for the first time using two chromatographic steps (anion exchange and size exclusion), with yields of 5.7 mg and 1.8 mg of purified CopX per L of LB and M9 medium, respectively.
Electrospray ionization (ESI) mass spectrometry and N-terminal analysis revealed that the CopX signal peptide comprises 21 residues and is efficiently processed by the Sec system of Es.coli. Biochemical characterization showed that CopX is a periplasmic, monomeric type 1 copper protein with a molecular weight of 17253.25 ± 0.30 Da (determined by ESI mass spectrometry) that binds approximately one copper ion per polypeptide chain. The apparent molecular weight of CopX determined by size-exclusion chromatography, 20.4 kDa, does not depend on ionic strength. Spectroscopic characterization showed intense S(Cys)-Cu charge-transfer bands at 440 nm and 580 nm, and a band at 720 nm. The extinction coefficient at 580 nm, based on the copper content, is 3.8 mM⁻¹ cm⁻¹. The EPR spectrum of CopX is axial. The ¹⁵N HSQC NMR spectrum confirms that CopX is folded, with 131 of 147 backbone amide resonances identified, showing that it is amenable to NMR solution structure determination. CopX has some unique features, such as an A440/A580 ratio of 0.94 and a high hyperfine coupling constant (170 G). Based on its biochemical properties, CopX is proposed to belong to a new class of type 1 copper proteins, here shown preliminarily, for the first time, to be associated with copper resistance.
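The reported extinction coefficient can be used with the Beer-Lambert law, A = εcl, to estimate the concentration of copper-loaded CopX from its absorbance at 580 nm. The sketch below assumes a 1 cm path length by default and hypothetical absorbance readings; the names `EPSILON_580_PER_MM_CM` and `concentration_mm` are illustrative.

```python
# Extinction coefficient at 580 nm reported above for CopX, in mM^-1 cm^-1.
EPSILON_580_PER_MM_CM = 3.8


def concentration_mm(absorbance_580: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law, A = epsilon * c * l, solved for c (in mM)."""
    return absorbance_580 / (EPSILON_580_PER_MM_CM * path_cm)


if __name__ == "__main__":
    # Hypothetical reading: A580 = 0.19 in a 1 cm cuvette.
    print(f"{concentration_mm(0.19):.3f} mM")
```

For instance, a hypothetical A580 of 0.19 in a 1 cm cuvette corresponds to about 0.05 mM (50 μM). Because the coefficient was referenced to copper content, the estimate reflects copper-bound protein rather than total CopX.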

    Holography

    Holography - Basic Principles and Contemporary Applications is a collection of fifteen chapters describing the basic principles of holography and some recent innovative developments in the field. The book is divided into three sections. The first, Understanding Holography, presents the principles of hologram recording, illustrated with practical examples, together with a comprehensive review of diffraction in volume gratings and holograms. The second section, Contemporary Holographic Applications, covers advanced applications of holography, including sensors, holographic gratings, and white-light viewable holographic stereograms. The third section, Digital Holography, is devoted to digital hologram coding and digital holographic microscopy.

    Artificial Intelligence Research Branch future plans

    This report describes the activities of the Artificial Intelligence Research Branch (FIA) at NASA Ames Research Center (ARC) in 1992, as well as work planned for 1993. These activities range from basic scientific research through engineering development to fielded NASA applications, particularly applications enabled by basic research carried out in FIA. Work is conducted in-house and with collaborative partners in academia and industry. All of our work is organized around research themes with a dual commitment to technical excellence and to applicability to NASA's short-, medium-, and long-term problems. FIA acts as the Agency's lead organization for research aspects of artificial intelligence, working closely with a second research laboratory at the Jet Propulsion Laboratory (JPL) and with AI applications groups at all NASA centers. This report is organized along three major research themes: (1) Planning and Scheduling: deciding on a sequence of actions to achieve a set of complex goals, determining when to execute those actions, and allocating resources to carry them out; (2) Machine Learning: techniques for forming theories about natural and man-made phenomena and for improving the problem-solving performance of computational systems over time; and (3) research on the acquisition, representation, and utilization of knowledge in support of the diagnosis and design of engineered systems and the analysis of actual systems.