131 research outputs found

    Database Streaming Compression on Memory-Limited Machines

    Dynamic Huffman compression algorithms operate on data streams with a bounded symbol list. With these algorithms, the complete list of symbols must be held in main memory or secondary storage. A streaming transaction database in horizontal format can have a very large item list, and the resulting large number of tree nodes taxes both the primary memory of the processing hardware and the processing time needed to maintain the tree dynamically. This research investigated Huffman compression of a streaming transaction database with a very large symbol list, where each item in the transaction database schema’s item list is a symbol to compress. In this research, the constraint of a large symbol list is equivalent to the constraint of a memory-limited machine. A large symbol set results when each item in a large database item list is a symbol to compress in a database stream. In addition, database streams may have a temporal component spanning months or years. Finally, the horizontal format is the format best suited to a streaming transaction database because the transaction IDs are not known beforehand. This research prototypes an algorithm that compresses a transaction database stream. The memory-limited dynamic Huffman algorithm has several advantages. Dynamic Huffman algorithms are single-pass algorithms, and in many instances, such as with streaming databases, a second pass over the data is not possible. Previous dynamic Huffman algorithms are not memory limited: their memory requirement is O(n), where n is the number of distinct item IDs, so memory must grow to fit the n items. The new memory-limited dynamic Huffman algorithm improves on this with an O(k) asymptotic memory requirement, where k is the maximum number of nodes in the Huffman tree, k < n, and k is a user-chosen constant. The new algorithm compresses horizontally encoded transaction databases that do not contain long runs of 0s or 1s.
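
The O(k) memory bound described in this abstract can be illustrated with a minimal sketch. This is not the algorithm from the research: it builds a static Huffman code over a capped symbol table instead of updating the tree adaptively, and here `k` caps the number of tracked symbols, not tree nodes. The names `ESCAPE`, `bounded_counts`, and `huffman_code_lengths` are hypothetical.

```python
import heapq
from collections import Counter

ESCAPE = "<ESC>"  # hypothetical marker standing in for any item that is not tracked


def bounded_counts(stream, k):
    """Count items while tracking at most k distinct symbols, plus ESCAPE."""
    counts = Counter({ESCAPE: 0})
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) - 1 < k:
            counts[item] = 1      # still room in the table
        else:
            counts[ESCAPE] += 1   # table full: charge the item to ESCAPE
    return counts


def huffman_code_lengths(counts):
    """Return {symbol: code length in bits} for a static Huffman code."""
    # The integer in the middle breaks ties so the symbol lists are never compared.
    heap = [(c, i, [s]) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in counts}
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1       # every merge adds one bit to each merged symbol
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


counts = bounded_counts(["a", "b", "a", "c", "a", "d", "b"], k=2)
lengths = huffman_code_lengths(counts)
```

With `k=2`, only `a` and `b` are tracked. The items `c` and `d` are counted under `ESCAPE`, so the table never holds more than `k + 1` entries however many distinct items the stream contains.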

    Leisure Boating Environmental Footprint: A Study of Leisure Marinas in Palermo

    Ports have played a significant role in the touristic development and further economic growth of Italy. It is the country with the highest number of berths among the nations in the Mediterranean Sea; over time, Italy has created ports with a range of functions. Therefore, it is of vital importance to evaluate the potential pollutants generated from these docks and propose ways to eliminate those problems. A survey that asked about the carbon footprint and the quality of the water in the water footprint calculation was created and distributed to the management of the marinas’ operations. After receiving the completed surveys, the data were analyzed and translated using emission factors into tons of CO2 equivalent. The amount of greenhouse gases generated by the investigated marinas was determined by calculating the carbon and water footprints of five representative Palermo marinas, and we aimed to better understand how these port-related operations affect the environment. To pinpoint the pollutant sources within the investigated marinas, an original P-Mapping/Pareto ratio approach was performed as supported by Pareto’s principle. The findings indicated that the primary operations of the marina sector are the main sources of pollution. However, a sizable portion of the emissions were also caused by pollution from supporting operations. Based on the study, the origins of CO2 and pollution in marina operations were clarified. The results obtained enable the authors to make recommendations that all recreational boating activities should be closely supervised in order to reduce CO2 emissions and their input in relation to environmental degradation.

    Establishment of a national network of cetacean monitoring within the marine strategy

    CONISMA, CNR and CIRCE involved Italian research units (RUs) working on cetaceans in a National Network that answers the Marine Strategy Framework Directive (MSFD) requirements by sharing monitoring data. Data obtained during the 2016 monitoring campaigns by 13 RUs are presented here.

    The Presence of the Iron-Sulfur Motif Is Important for the Conformational Stability of the Antiviral Protein, Viperin

    Viperin, an antiviral protein, has been shown to contain a CX3CX2C motif, which is conserved in the radical S-adenosyl-methionine (SAM) enzyme family. A triple mutant which replaces these three cysteines with alanines has been shown to have a severe deficiency in antiviral activity. Since the crystal structure of Viperin is not available, we have used a combination of computational methods including multi-template homology modeling and molecular dynamics simulation to develop a low-resolution predicted structure. The results show that Viperin is an α-β protein containing an iron-sulfur cluster at the center pocket. The calculations suggest that the removal of the iron-sulfur cluster would lead to collapse of the protein tertiary structure. To verify these predictions, we have prepared, expressed and purified four mutant proteins. In three mutants individual cysteine residues were replaced by alanine residues, while in the fourth all the cysteines were replaced by alanines. Conformational analyses using circular dichroism and steady state fluorescence spectroscopy indicate that the mutant proteins are partially unfolded, conformationally unstable and aggregation prone. The lack of conformational stability of the mutant proteins may have direct relevance to the absence of their antiviral activity.

    A Self-Organizing Algorithm for Modeling Protein Loops

    Protein loops, the flexible short segments connecting two stable secondary structural units in proteins, play a critical role in protein structure and function. Constructing chemically sensible conformations of protein loops that seamlessly bridge the gap between the anchor points without introducing any steric collisions remains an open challenge. A variety of algorithms have been developed to tackle the loop closure problem, ranging from inverse kinematics to knowledge-based approaches that utilize pre-existing fragments extracted from known protein structures. However, many of these approaches focus on the generation of conformations that mainly satisfy the fixed end point condition, leaving the steric constraints to be resolved in subsequent post-processing steps. In the present work, we describe a simple solution that simultaneously satisfies not only the end point and steric conditions, but also chirality and planarity constraints. Starting from random initial atomic coordinates, each individual conformation is generated independently by using a simple alternating scheme of pairwise distance adjustments of randomly chosen atoms, followed by fast geometric matching of the conformationally rigid components of the constituent amino acids. The method is conceptually simple, numerically stable and computationally efficient. Importantly, additional constraints, such as those derived from NMR experiments, hydrogen bonds or salt bridges, can be incorporated into the algorithm in a straightforward and inexpensive way, making the method ideal for solving more complex multi-loop problems. The remarkable performance and robustness of the algorithm are demonstrated on a set of protein loops of length 4, 8, and 12 that have been used in previous studies.
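
The pairwise distance adjustment mentioned in this abstract can be sketched as a single update step. This is an illustration under stated assumptions, not the authors' implementation: the function name `adjust_pair`, the `rate` parameter, and the symmetric update rule are hypothetical, and the geometric matching of rigid fragments is omitted entirely.

```python
import math


def adjust_pair(coords, i, j, target, rate=0.5):
    """Move points i and j along their connecting line toward a target distance.

    coords is a list of mutable [x, y, z] lists. With rate=1.0 the pair ends up
    exactly at the target distance; smaller rates take a partial step.
    """
    d = math.dist(coords[i], coords[j])
    if d == 0.0:
        return  # direction undefined for coincident points; skip this pair
    # Each point takes half of the correction, so the midpoint stays fixed.
    scale = rate * (target - d) / d / 2
    for axis in range(3):
        delta = (coords[i][axis] - coords[j][axis]) * scale
        coords[i][axis] += delta
        coords[j][axis] -= delta


coords = [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
adjust_pair(coords, 0, 1, target=2.0, rate=1.0)
```

In a full procedure of the kind the abstract outlines, a step like this would be applied repeatedly to randomly chosen pairs, with targets taken from the relevant distance constraints.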

    Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity

    Unlike the core structural elements of a protein like regular secondary structure, template based modeling (TBM) has difficulty with loop regions due to their variability in sequence and structure as well as the sparse sampling from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated based on samples from these distributions and enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For the canonical loops like the immunoglobulin complementarity-determining regions (mean RMSD <2.0 Å), the DPM-HMM method performs as well as or better than the best templates, demonstrating that our automated method recaptures these canonical loops without inclusion of any IgG specific terms or manual intervention. In cases with poor or few good templates (mean RMSD >7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct comparison of sampling with the Loopy algorithm, our method demonstrates the ability to sample nearer native structures for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, in the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM for 90 loops from 45 TBM targets shows the general applicability of our sampling method to the loop modeling problem. These results demonstrate that our DPM-HMM produces an advantage by consistently sampling near-native loop structures.
The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/

    ViennaRNA Package 2.0

    Background: Secondary structure forms an important intermediate level of description of nucleic acids that encapsulates the dominating part of the folding energy, is often well conserved in evolution, and is routinely used as a basis to explain experimental findings. Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base pairing probabilities, as well as thermodynamic properties. Results: The ViennaRNA Package has been a widely used compilation of RNA secondary structure related computer programs for nearly two decades. Major changes in the structure of the standard energy model, the Turner 2004 parameters, the pervasive use of multi-core CPUs, and an increasing number of algorithmic variants prompted a major technical overhaul of both the underlying RNAlib and the interactive user programs. New features include an expanded repertoire of tools to assess RNA-RNA interactions and restricted ensembles of structures, additional output information such as centroid structures and maximum expected accuracy structures derived from base pairing probabilities, or z-scores for locally stable secondary structures, and support for input in fasta format. Updates were implemented without compromising the computational efficiency of the core algorithms and ensuring compatibility with earlier versions. Conclusions: The ViennaRNA Package 2.0, supporting concurrent computations via OpenMP, can be downloaded from http://www.tbi.univie.ac.at/RNA.