129 research outputs found

    SERVER-SIDE PROCESSING TECHNIQUES FOR OPTIMIZING THE SPEED OF PRESENTING BIG DATA

    Big data is the latest industry keyword describing large volumes of structured and unstructured data that are difficult to process and analyze. Most organizations are looking for the best approach to managing and analyzing large volumes of data, especially for decision making. Large datasets slow the presentation of information because all of the data must be displayed, so specific techniques are needed to keep presentation fast even as the data grows. A website generally sends a request to the server, and if the required data is available, the server returns all of it; every subsequent step then runs on the client side, so the client bears a heavy load in displaying all the data. In this study, server-side processing techniques are applied so that all processing is handled by the server and the data is sent not all at once but in response to periodic requests from the client. The results of this study indicate that the server-side processing technique is more optimal: based on the data-presentation speed comparison, server-side processing performed 98.6% better than client-side processing.
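
    The paper's abstract does not include code, but the idea is essentially paged delivery. The minimal sketch below, assuming a hypothetical Flask endpoint backed by a SQLite table named records, shows how a server can return one page of rows per client request instead of the full dataset; all names, parameters, and the schema are illustrative, not taken from the study.

        # Minimal sketch of server-side (paged) processing.
        # Assumptions (not from the paper): a Flask app and a SQLite table "records".
        import sqlite3
        from flask import Flask, jsonify, request

        app = Flask(__name__)
        DB_PATH = "data.db"  # hypothetical database file

        @app.route("/records")
        def records():
            # The client asks for one page at a time instead of the full dataset.
            start = int(request.args.get("start", 0))     # offset of the first row
            length = int(request.args.get("length", 50))  # page size
            con = sqlite3.connect(DB_PATH)
            total = con.execute("SELECT COUNT(*) FROM records").fetchone()[0]
            rows = con.execute(
                "SELECT id, name, value FROM records LIMIT ? OFFSET ?",
                (length, start),
            ).fetchall()
            con.close()
            # Return the page plus the total count so the client can render paging controls.
            return jsonify({"recordsTotal": total, "data": rows})

        if __name__ == "__main__":
            app.run()

    The start/length parameters and the recordsTotal field mirror the conventions of common server-side table plugins; the exact request protocol used in the study is not specified in the abstract.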

    An integrated SDN architecture for application driven networking

    The target of our effort is the definition of a dynamic network architecture meeting the requirements of applications competing for reliable, high-performance network resources. These applications have different requirements regarding reliability, bandwidth, latency, predictability, quality, reliable lead time and allocatability. At a designated instance in time, a virtual network has to be defined automatically for a limited period of time, based on an existing physical network infrastructure, which implements the requirements of an application. We suggest an integrated Software Defined Network (SDN) architecture providing highly customizable functionalities required for efficient data transfer. It consists of a service interface towards the application and an open network interface towards the physical infrastructure. Control and forwarding planes are separated for better scalability. This type of architecture allows the reservation of network resources to be negotiated among multiple applications with different requirement profiles within multi-domain environments.
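
    As an illustration of the application-facing service interface described above, the hedged sketch below shows one possible shape for a reservation request and a toy admission check; the ReservationRequest type, its field names and the check itself are hypothetical and not taken from the paper.

        # Illustrative sketch only: a possible shape for an application's request
        # to reserve network resources for a limited time, plus a toy admission check.
        from dataclasses import dataclass

        @dataclass
        class ReservationRequest:
            app_id: str
            bandwidth_mbps: float   # required bandwidth
            max_latency_ms: float   # latency bound
            start: str              # ISO 8601 start of the reservation window
            duration_s: int         # lifetime of the virtual network

        def fits(link_capacity_mbps: float, reserved_mbps: float,
                 req: ReservationRequest) -> bool:
            """Toy admission check: accept if the residual capacity covers the request."""
            return link_capacity_mbps - reserved_mbps >= req.bandwidth_mbps

        req = ReservationRequest("dataTransferA", 800.0, 20.0, "2024-06-01T12:00:00Z", 3600)
        print(fits(10_000.0, 9_500.0, req))  # False: only 500 Mb/s left on this link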

    Proceedings of the 3rd Open Source Geospatial Research & Education Symposium OGRS 2014

    The third Open Source Geospatial Research & Education Symposium (OGRS) was held in Helsinki, Finland, on 10 to 13 June 2014. The symposium was hosted and organized by the Department of Civil and Environmental Engineering, Aalto University School of Engineering, in partnership with the OGRS Community, on the Espoo campus of Aalto University. These proceedings contain the 20 papers presented at the symposium. OGRS is a meeting dedicated to exchanging ideas in and results from the development and use of open source geospatial software in both research and education. The symposium offers several opportunities for discussing, learning, and presenting results, principles, methods and practices while supporting a primary theme: how to carry out research and educate academic students using, contributing to, and launching open source geospatial initiatives. Participating in open source initiatives can potentially boost innovation as a value-creating process requiring joint collaboration between academia, foundations, associations, developer communities and industry. Additionally, open source software can improve the efficiency and impact of university education by introducing open and freely usable tools and research results to students, and by encouraging them to get involved in projects. This may eventually lead to new community projects and businesses. The symposium contributes to the validation of the open source model in research and education in geoinformatics.

    Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts

    Over the last decades, a revolution in novel measurement techniques has permeated the biological sciences, filling the databases with unprecedented amounts of data ranging from genomics, transcriptomics, proteomics and metabolomics to structural and ecological data. To extract insights from this vast quantity of data, computational and statistical methods are nowadays crucial tools in the toolbox of every biological researcher. In this thesis I summarize my contributions in two data-rich fields in the biological sciences: transcription factor binding to DNA and protein structure prediction from protein sequences with shared evolutionary ancestry. In the first part of my thesis I introduce our work towards a web server for analysing transcription factor binding data with Bayesian Markov models. In contrast to classical PWM or di-nucleotide models, Bayesian Markov models can capture complex inter-nucleotide dependencies that can arise from shape readout and alternative binding modes. In addition to giving access to our methods through an easy-to-use, intuitive web interface, we provide our users with novel tools and visualizations to better evaluate the biological relevance of the inferred binding motifs. We hope that our tools will prove useful for investigating weak and complex transcription factor binding motifs that cannot be predicted accurately with existing tools. The second part discusses a statistical attempt to correct out the phylogenetic bias arising in co-evolution methods applied to the contact prediction problem. Co-evolution methods revolutionized the protein structure prediction field more than ten years ago and, until very recently, retained their importance as crucial input features to deep neural networks. As the co-evolution information is extracted from evolutionarily related sequences, we investigated whether the phylogenetic bias in the signal can be corrected out in a principled way using a variation of Felsenstein's tree-pruning algorithm, applied in combination with an independent-pair assumption, to derive pairwise amino acid counts that are corrected for the evolutionary history. Unfortunately, contact prediction derived from our corrected pairwise amino acid counts did not yield competitive performance.
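
    To make the contrast between a PWM and a higher-order model concrete, the toy sketch below scores one binding site under a position weight matrix and under a first-order Markov model that conditions each base on its predecessor. It illustrates the general idea only; it is not the BaMM web-server code, and the motif length and all probabilities are randomly generated placeholders.

        # Toy illustration: PWM scoring (independent positions) versus a first-order
        # Markov model (each base conditioned on the previous base).
        import numpy as np

        BASES = "ACGT"
        idx = {b: i for i, b in enumerate(BASES)}

        L = 4  # hypothetical motif length
        rng = np.random.default_rng(0)

        # PWM: per-position base probabilities, shape (L, 4)
        pwm = rng.dirichlet(np.ones(4), size=L)
        # First-order model: P(base at j | base at j-1), shape (L, 4, 4); position 0 uses pwm[0]
        markov = rng.dirichlet(np.ones(4), size=(L, 4))

        def log_score_pwm(site: str) -> float:
            return sum(np.log(pwm[j, idx[b]]) for j, b in enumerate(site))

        def log_score_markov(site: str) -> float:
            score = np.log(pwm[0, idx[site[0]]])
            for j in range(1, len(site)):
                score += np.log(markov[j, idx[site[j - 1]], idx[site[j]]])
            return score

        print(log_score_pwm("ACGT"), log_score_markov("ACGT"))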

    Fast retrieval of weather analogues in a multi-petabyte meteorological archive

    The European Centre for Medium-Range Weather Forecasts (ECMWF) manages the largest archive of meteorological data in the world. At the time of writing, it holds around 300 petabytes and grows at a rate of 1 petabyte per week. This archive is now mature and contains valuable datasets such as several reanalyses, providing a consistent view of the weather over several decades. Weather analogue is the term used by meteorologists to refer to similar weather situations. Looking for analogues in an archive using a brute-force approach requires data to be retrieved from tape and then compared to a user-provided weather pattern using a chosen similarity measure; such an operation would be very long and costly. In this work, a wavelet-based fingerprinting scheme is proposed to index all weather patterns from the archive over a selected geographical domain. The system answers search queries by computing the fingerprint of the query pattern and looking for close matches in the index. Searches are fast enough to be perceived as instantaneous. A web-based application is provided, allowing users to express their queries interactively in a friendly and straightforward manner by sketching weather patterns directly in their web browser. Matching results are then presented as a series of weather maps, labelled with the date and time at which they occur. The system has been deployed as part of the Copernicus Climate Data Store and allows the retrieval of weather analogues from ERA5, a 40-year hourly reanalysis dataset. Some preliminary results of this work were presented at the International Conference on Computational Science 2018 (Raoult et al. (2018)).
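
    The sketch below illustrates the general fingerprint-and-index idea with a coarse, Haar-like block-average fingerprint and a brute-force nearest-neighbour lookup over synthetic fields. It is not the operational ECMWF implementation; the grid size, block size and distance measure are assumptions made only for this example.

        # Illustrative sketch: reduce each gridded field to a coarse fingerprint,
        # keep all fingerprints in a small in-memory index, and find the closest analogue.
        import numpy as np

        def fingerprint(field: np.ndarray, block: int = 8) -> np.ndarray:
            """Coarse, Haar-like fingerprint: mean of each block x block tile."""
            h, w = field.shape
            f = field[: h - h % block, : w - w % block]
            return (f.reshape(f.shape[0] // block, block, f.shape[1] // block, block)
                     .mean(axis=(1, 3))
                     .ravel())

        rng = np.random.default_rng(1)
        archive = rng.normal(size=(1000, 64, 64))             # hypothetical archived fields
        index = np.stack([fingerprint(f) for f in archive])   # computed once, small enough to keep in memory

        query = archive[123] + 0.05 * rng.normal(size=(64, 64))  # a sketched query pattern
        q = fingerprint(query)
        best = int(np.argmin(np.linalg.norm(index - q, axis=1)))
        print("closest analogue:", best)  # expected: 123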

    Ancestral sequence reconstruction as an accessible tool for the engineering of biocatalyst stability

    Synthetic biology is the engineering of life to imbue non-natural functionality. As such, synthetic biology has considerable commercial potential, where synthetic metabolic pathways are utilised to convert low-value substrates into high-value products. High-temperature biocatalysis offers several system-level benefits to synthetic biology, including increased dilution of substrate, increased reaction rates and decreased contamination risk. However, the current gamut of tools available for the engineering of thermostable proteins is either expensive, unreliable, or poorly understood, meaning their adoption into synthetic biology workflows is treacherous. This thesis focuses on the development of an accessible tool for the engineering of protein thermostability, based on the evolutionary biology tool ancestral sequence reconstruction (ASR). ASR allows researchers to walk back in time along the branches of a phylogeny and predict the most likely representation of a protein family's ancestral state. It also has simple input requirements, and its output proteins are often observed to be thermostable, making ASR tractable for protein engineering. Chapter 2 explores the applicability of multiple ASR methods to the engineering of a carboxylic acid reductase (CAR) biocatalyst. Despite the family emerging only 500 million years ago, ancestors presented considerable improvements in thermostability over their modern counterparts. We proceed to thoroughly characterise the ancestral enzymes for their inclusion into the CAR biocatalytic toolbox. Chapter 3 explores why ASR-derived proteins may be thermostable despite a mesophilic history. An in silico toolbox is built for tracking models of protein stability over simulated evolutionary time at the sequence, protein and population levels. We provide considerable evidence that the sequence alignments of simulated protein families that evolved at marginal stability are saturated with stabilising residues; ASR therefore derives sequences from a dataset biased toward stabilisation. Importantly, while ASR is accessible, it still entails a steep learning curve because it requires phylogenetic expertise. In Chapter 4, we utilise the evolutionary model produced in Chapter 3 to develop a highly simplified and accessible ASR protocol. This protocol was then applied to engineer CAR enzymes that displayed dramatic increases in thermostability compared to both modern CARs and the thermostable AncCARs presented in Chapter 2.
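
    As a minimal illustration of inferring ancestral states from extant sequences, the toy sketch below applies Fitch parsimony to a single alignment column on a hypothetical four-leaf tree. Real ASR, including the methods used in this thesis, relies on probabilistic models over a full phylogeny, so this is only a conceptual sketch of the "infer internal nodes from the leaves" idea.

        # Toy sketch: ancestral state inference for one alignment column via Fitch parsimony.
        # The tree and residues are hypothetical.
        def fitch(node):
            """node is either a leaf state set like {'A'} or a (left, right) tuple."""
            if isinstance(node, set):
                return node
            left, right = fitch(node[0]), fitch(node[1])
            # Intersection if the children agree, otherwise the union of possibilities.
            return (left & right) or (left | right)

        # ((seq1, seq2), (seq3, seq4)) with the residue observed at one column
        tree = (({"L"}, {"L"}), ({"I"}, {"L"}))
        print(fitch(tree))  # {'L'} -> most parsimonious ancestral residue at the root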

    Understanding virus and microbial evolution in wildlife through meta-transcriptomics

    Wildlife harbors a substantial and largely undocumented diversity of RNA viruses and microbial life forms. RNA viruses and microbes are also arguably the most diverse and dynamic entities on Earth. Despite their evident importance, there are major limitations in our knowledge of the diversity, ecology, and evolution of RNA viruses and microbial communities. These gaps stem from a variety of factors, including biased sampling and the difficulty of accurately identifying highly divergent sequences through sequence similarity-based analyses alone. The implementation of meta-transcriptomic sequencing has greatly contributed to narrowing this gap. In particular, the rapid increase in the number of newly described RNA viruses over the last decade provides a glimpse of the remarkable diversity within the RNA virosphere. The central goal of this thesis was to determine the diversity of RNA viruses associated with wildlife, particularly in an Australian context. To this end, I exploited cutting-edge meta-transcriptomic and bioinformatic approaches to reveal the RNA virus diversity within diverse animal taxa, tissues, and environments, with a special focus on the highly divergent "dark matter" of the virome that has largely been refractory to sequence analysis. Similarly, I used these approaches to detect targeted common microbes circulating in vertebrate and invertebrate fauna. Another important goal was to assess the diversity of RNA viruses and microbes as a cornerstone of a new eco-evolutionary framework. In doing so, this thesis encompasses multiple disciplines including virus discovery, viral host-range distributions, microbial–virus and host–parasite interactions, phylogenetic analysis, and pathogen surveillance. In sum, the research presented in this thesis expands the known RNA virosphere as well as the detection and surveillance of targeted microbes in wildlife, providing new insights into the diversity, evolution, and ecology of these agents in nature.

    Strategies for improving the standardization and robustness of toxicogenomics data analyses

    Toxicology is the scientific pursuit of identifying and classifying the toxic effects of substances, as well as exploring and understanding the adverse effects of toxic exposure. Modern toxicological efforts have been driven by the industrial production of engineered substances, supported by advanced interdisciplinary scientific collaborations. These engineered substances must be carefully tested to ensure public safety. This task is now more challenging than ever with the employment of new classes of chemical compounds, such as engineered nanomaterials. Toxicological paradigms have been redefined over the decades to be more agile, versatile, and sensitive. On the other hand, the design of toxicological studies has become more complex, and the interpretation of the results more challenging. Toxicogenomics offers a wealth of data for estimating gene regulation by inspecting the alterations of many biomolecules (such as DNA, RNA, proteins, and metabolites). The responses of functional genes can be used to infer the toxic effects on the biological system that result in acute or chronic adverse effects. However, the dense data from toxicogenomics studies are difficult to analyze, and the results are difficult to interpret. Because of these drawbacks, toxicogenomic evidence is still not completely integrated into the regulatory framework. Nanomaterial properties such as particle size, shape, and structure add complexity and unique challenges to nanotoxicology. This thesis presents efforts toward the standardization of toxicogenomics data by showcasing the potential of omics in nanotoxicology and providing easy-to-use tools for the analysis and interpretation of omics data. This work explores two main themes: i) omics experimentation in nanotoxicology and the investigation of nanomaterial effects through analysis of omics data, and ii) the development of analysis pipelines as easy-to-use tools that bring advanced analytical methods to general users. In this work, I explored a potential solution that can ensure effective interpretability and reproducibility of omics data and related experimentation, such that an independent researcher can interpret it thoroughly. DNA microarray technology is a well-established research tool to estimate the dynamics of biological molecules with high throughput. The analysis of data from these assays presents many challenges, as the study designs are quite complex. I explored the challenges of omics data processing and provided bioinformatics solutions to standardize this process. The responses of individual molecules to a given exposure are only partially informative, and more sophisticated models, disentangling the complex networks of dynamic molecular interactions, need to be explored. An analytical solution is presented in this thesis to tackle the challenge of producing robust interpretations of molecular dynamics in biological systems. It allows exploring the substructures in molecular networks that underlie mechanisms of molecular adaptation to exposure. I also present a multi-omics approach to defining the mechanism of action for human cell lines exposed to nanomaterials. All the methodologies developed in this project for omics data processing and network analysis are implemented as software solutions designed to be easily accessible to users with no expertise in bioinformatics. Our strategies were also developed in an effort to standardize omics data processing and analysis and to promote the use of omics-based evidence in chemical risk assessment.
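
    As an illustration of the kind of network substructure analysis described above, the sketch below builds a simple co-expression graph from a toy expression matrix and extracts modules with a standard community-detection routine. It is not the thesis software; the data, correlation threshold and module-detection method are placeholder choices.

        # Illustrative sketch: build a co-expression network from a toy expression
        # matrix and extract substructures (modules) with greedy modularity clustering.
        import numpy as np
        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities

        rng = np.random.default_rng(2)
        expr = rng.normal(size=(20, 30))   # 20 hypothetical genes x 30 samples
        corr = np.corrcoef(expr)           # gene-gene correlation matrix

        G = nx.Graph()
        genes = [f"g{i}" for i in range(expr.shape[0])]
        G.add_nodes_from(genes)
        for i in range(len(genes)):
            for j in range(i + 1, len(genes)):
                if abs(corr[i, j]) > 0.3:  # arbitrary threshold for drawing an edge
                    G.add_edge(genes[i], genes[j], weight=abs(corr[i, j]))

        G.remove_nodes_from(list(nx.isolates(G)))   # drop genes with no strong partner
        modules = greedy_modularity_communities(G)  # candidate network substructures
        print([sorted(m) for m in modules])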