129 research outputs found

    SERVER-SIDE PROCESSING TECHNIQUES FOR OPTIMIZING THE SPEED OF PRESENTING BIG DATA

    Big data is the latest industry keyword describing large volumes of structured and unstructured data that are difficult to process and analyze. Most organizations are looking for the best approach to managing and analyzing large volumes of data, especially for decision making. Large datasets slow the presentation of information because all of the data must be displayed, so specific techniques are needed to keep presentation fast even as the data grows. A website generally sends a request to the server, and if the required data is available, the server returns all of it; every subsequent step then runs on the client side, so the client bears a heavy load in displaying all the data. In this study, server-side processing techniques are applied so that all processing is handled by the server and the data is sent not all at once but in response to periodic requests from the client. The results of this study indicate that the server-side processing technique is more optimal: based on the data-presentation speed comparison, server-side processing performed 98.6% better than client-side processing.
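
    The paper's abstract does not include code, but the idea is essentially paged delivery. The minimal sketch below, assuming a hypothetical Flask endpoint backed by a SQLite table named records, shows how a server can return one page of rows per client request instead of the full dataset; all names, parameters, and the schema are illustrative, not taken from the study.

        # Minimal sketch of server-side (paged) processing.
        # Assumptions (not from the paper): a Flask app and a SQLite table "records".
        import sqlite3
        from flask import Flask, jsonify, request

        app = Flask(__name__)
        DB_PATH = "data.db"  # hypothetical database file

        @app.route("/records")
        def records():
            # The client asks for one page at a time instead of the full dataset.
            start = int(request.args.get("start", 0))     # offset of the first row
            length = int(request.args.get("length", 50))  # page size
            con = sqlite3.connect(DB_PATH)
            total = con.execute("SELECT COUNT(*) FROM records").fetchone()[0]
            rows = con.execute(
                "SELECT id, name, value FROM records LIMIT ? OFFSET ?",
                (length, start),
            ).fetchall()
            con.close()
            # Return the page plus the total count so the client can render paging controls.
            return jsonify({"recordsTotal": total, "data": rows})

        if __name__ == "__main__":
            app.run()

    The start/length parameters and the recordsTotal field mirror the conventions of common server-side table plugins; the exact request protocol used in the study is not specified in the abstract.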

    An integrated SDN architecture for application driven networking

    The target of our effort is the definition of a dynamic network architecture meeting the requirements of applications competing for reliable, high-performance network resources. These applications have different requirements regarding reliability, bandwidth, latency, predictability, quality, reliable lead time and allocatability. At a designated instance in time, a virtual network has to be defined automatically for a limited period of time, based on an existing physical network infrastructure, which implements the requirements of an application. We suggest an integrated Software Defined Network (SDN) architecture providing highly customizable functionalities required for efficient data transfer. It consists of a service interface towards the application and an open network interface towards the physical infrastructure. Control and forwarding planes are separated for better scalability. This type of architecture allows the reservation of network resources to be negotiated among multiple applications with different requirement profiles within multi-domain environments.
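
    As an illustration of the application-facing service interface described above, the hedged sketch below shows one possible shape for a reservation request and a toy admission check; the ReservationRequest type, its field names and the check itself are hypothetical and not taken from the paper.

        # Illustrative sketch only: a possible shape for an application's request
        # to reserve network resources for a limited time, plus a toy admission check.
        from dataclasses import dataclass

        @dataclass
        class ReservationRequest:
            app_id: str
            bandwidth_mbps: float   # required bandwidth
            max_latency_ms: float   # latency bound
            start: str              # ISO 8601 start of the reservation window
            duration_s: int         # lifetime of the virtual network

        def fits(link_capacity_mbps: float, reserved_mbps: float,
                 req: ReservationRequest) -> bool:
            """Toy admission check: accept if the residual capacity covers the request."""
            return link_capacity_mbps - reserved_mbps >= req.bandwidth_mbps

        req = ReservationRequest("dataTransferA", 800.0, 20.0, "2024-06-01T12:00:00Z", 3600)
        print(fits(10_000.0, 9_500.0, req))  # False: only 500 Mb/s left on this link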

    Proceedings of the 3rd Open Source Geospatial Research & Education Symposium OGRS 2014

    The third Open Source Geospatial Research & Education Symposium (OGRS) was held in Helsinki, Finland, on 10 to 13 June 2014. The symposium was hosted and organized by the Department of Civil and Environmental Engineering, Aalto University School of Engineering, in partnership with the OGRS Community, on the Espoo campus of Aalto University. These proceedings contain the 20 papers presented at the symposium. OGRS is a meeting dedicated to exchanging ideas in and results from the development and use of open source geospatial software in both research and education. The symposium offers several opportunities for discussing, learning, and presenting results, principles, methods and practices while supporting a primary theme: how to carry out research and educate academic students using, contributing to, and launching open source geospatial initiatives. Participating in open source initiatives can potentially boost innovation as a value-creating process requiring joint collaboration between academia, foundations, associations, developer communities and industry. Additionally, open source software can improve the efficiency and impact of university education by introducing open and freely usable tools and research results to students, and by encouraging them to get involved in projects. This may eventually lead to new community projects and businesses. The symposium contributes to the validation of the open source model in research and education in geoinformatics.

    Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts

    Over the last decades, a revolution in novel measurement techniques has permeated the biological sciences, filling the databases with unprecedented amounts of data ranging from genomics, transcriptomics, proteomics and metabolomics to structural and ecological data. To extract insights from this vast quantity of data, computational and statistical methods are nowadays crucial tools in the toolbox of every biological researcher. In this thesis I summarize my contributions in two data-rich fields in the biological sciences: transcription factor binding to DNA and protein structure prediction from protein sequences with shared evolutionary ancestry. In the first part of my thesis I introduce our work towards a web server for analysing transcription factor binding data with Bayesian Markov models. In contrast to classical PWM or di-nucleotide models, Bayesian Markov models can capture complex inter-nucleotide dependencies that can arise from shape readout and alternative binding modes. In addition to giving access to our methods through an easy-to-use, intuitive web interface, we provide our users with novel tools and visualizations to better evaluate the biological relevance of the inferred binding motifs. We hope that our tools will prove useful for investigating weak and complex transcription factor binding motifs that cannot be predicted accurately with existing tools. The second part discusses a statistical attempt to correct out the phylogenetic bias arising in co-evolution methods applied to the contact prediction problem. Co-evolution methods revolutionized the protein structure prediction field more than ten years ago and, until very recently, retained their importance as crucial input features to deep neural networks. As the co-evolution information is extracted from evolutionarily related sequences, we investigated whether the phylogenetic bias in the signal can be corrected out in a principled way using a variation of Felsenstein's tree-pruning algorithm, applied in combination with an independent-pair assumption, to derive pairwise amino acid counts that are corrected for the evolutionary history. Unfortunately, contact prediction derived from our corrected pairwise amino acid counts did not yield competitive performance.
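
    To make the contrast between a PWM and a higher-order model concrete, the toy sketch below scores one binding site under a position weight matrix and under a first-order Markov model that conditions each base on its predecessor. It illustrates the general idea only; it is not the BaMM web-server code, and the motif length and all probabilities are randomly generated placeholders.

        # Toy illustration: PWM scoring (independent positions) versus a first-order
        # Markov model (each base conditioned on the previous base).
        import numpy as np

        BASES = "ACGT"
        idx = {b: i for i, b in enumerate(BASES)}

        L = 4  # hypothetical motif length
        rng = np.random.default_rng(0)

        # PWM: per-position base probabilities, shape (L, 4)
        pwm = rng.dirichlet(np.ones(4), size=L)
        # First-order model: P(base at j | base at j-1), shape (L, 4, 4); position 0 uses pwm[0]
        markov = rng.dirichlet(np.ones(4), size=(L, 4))

        def log_score_pwm(site: str) -> float:
            return sum(np.log(pwm[j, idx[b]]) for j, b in enumerate(site))

        def log_score_markov(site: str) -> float:
            score = np.log(pwm[0, idx[site[0]]])
            for j in range(1, len(site)):
                score += np.log(markov[j, idx[site[j - 1]], idx[site[j]]])
            return score

        print(log_score_pwm("ACGT"), log_score_markov("ACGT"))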

    Fast retrieval of weather analogues in a multi-petabyte meteorological archive

    The European Centre for Medium-Range Weather Forecasts (ECMWF) manages the largest archive of meteorological data in the world. At the time of writing, it holds around 300 petabytes and grows at a rate of 1 petabyte per week. This archive is now mature and contains valuable datasets such as several reanalyses, providing a consistent view of the weather over several decades. Weather analogue is the term used by meteorologists to refer to similar weather situations. Looking for analogues in an archive using a brute-force approach requires data to be retrieved from tape and then compared to a user-provided weather pattern using a chosen similarity measure; such an operation would be very long and costly. In this work, a wavelet-based fingerprinting scheme is proposed to index all weather patterns from the archive over a selected geographical domain. The system answers search queries by computing the fingerprint of the query pattern and looking for close matches in the index. Searches are fast enough to be perceived as instantaneous. A web-based application is provided, allowing users to express their queries interactively in a friendly and straightforward manner by sketching weather patterns directly in their web browser. Matching results are then presented as a series of weather maps, labelled with the date and time at which they occur. The system has been deployed as part of the Copernicus Climate Data Store and allows the retrieval of weather analogues from ERA5, a 40-year hourly reanalysis dataset. Some preliminary results of this work were presented at the International Conference on Computational Science 2018 (Raoult et al. (2018)).
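
    The sketch below illustrates the general fingerprint-and-index idea with a coarse, Haar-like block-average fingerprint and a brute-force nearest-neighbour lookup over synthetic fields. It is not the operational ECMWF implementation; the grid size, block size and distance measure are assumptions made only for this example.

        # Illustrative sketch: reduce each gridded field to a coarse fingerprint,
        # keep all fingerprints in a small in-memory index, and find the closest analogue.
        import numpy as np

        def fingerprint(field: np.ndarray, block: int = 8) -> np.ndarray:
            """Coarse, Haar-like fingerprint: mean of each block x block tile."""
            h, w = field.shape
            f = field[: h - h % block, : w - w % block]
            return (f.reshape(f.shape[0] // block, block, f.shape[1] // block, block)
                     .mean(axis=(1, 3))
                     .ravel())

        rng = np.random.default_rng(1)
        archive = rng.normal(size=(1000, 64, 64))             # hypothetical archived fields
        index = np.stack([fingerprint(f) for f in archive])   # computed once, small enough to keep in memory

        query = archive[123] + 0.05 * rng.normal(size=(64, 64))  # a sketched query pattern
        q = fingerprint(query)
        best = int(np.argmin(np.linalg.norm(index - q, axis=1)))
        print("closest analogue:", best)  # expected: 123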

    Ancestral sequence reconstruction as an accessible tool for the engineering of biocatalyst stability

    Synthetic biology is the engineering of life to imbue non-natural functionality. As such, synthetic biology has considerable commercial potential, where synthetic metabolic pathways are utilised to convert low-value substrates into high-value products. High-temperature biocatalysis offers several system-level benefits to synthetic biology, including increased dilution of substrate, increased reaction rates and decreased contamination risk. However, the current gamut of tools available for the engineering of thermostable proteins is either expensive, unreliable, or poorly understood, meaning their adoption into synthetic biology workflows is treacherous. This thesis focuses on the development of an accessible tool for the engineering of protein thermostability, based on the evolutionary biology tool ancestral sequence reconstruction (ASR). ASR allows researchers to walk back in time along the branches of a phylogeny and predict the most likely representation of a protein family's ancestral state. It also has simple input requirements, and its output proteins are often observed to be thermostable, making ASR tractable for protein engineering. Chapter 2 explores the applicability of multiple ASR methods to the engineering of a carboxylic acid reductase (CAR) biocatalyst. Despite the family emerging only 500 million years ago, ancestors presented considerable improvements in thermostability over their modern counterparts. We proceed to thoroughly characterise the ancestral enzymes for their inclusion into the CAR biocatalytic toolbox. Chapter 3 explores why ASR-derived proteins may be thermostable despite a mesophilic history. An in silico toolbox is built for tracking models of protein stability over simulated evolutionary time at the sequence, protein and population levels. We provide considerable evidence that the sequence alignments of simulated protein families that evolved at marginal stability are saturated with stabilising residues; ASR therefore derives sequences from a dataset biased toward stabilisation. Importantly, while ASR is accessible, it still entails a steep learning curve because it requires phylogenetic expertise. In Chapter 4, we utilise the evolutionary model produced in Chapter 3 to develop a highly simplified and accessible ASR protocol. This protocol was then applied to engineer CAR enzymes that displayed dramatic increases in thermostability compared to both modern CARs and the thermostable AncCARs presented in Chapter 2.
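
    As a minimal illustration of inferring ancestral states from extant sequences, the toy sketch below applies Fitch parsimony to a single alignment column on a hypothetical four-leaf tree. Real ASR, including the methods used in this thesis, relies on probabilistic models over a full phylogeny, so this is only a conceptual sketch of the "infer internal nodes from the leaves" idea.

        # Toy sketch: ancestral state inference for one alignment column via Fitch parsimony.
        # The tree and residues are hypothetical.
        def fitch(node):
            """node is either a leaf state set like {'A'} or a (left, right) tuple."""
            if isinstance(node, set):
                return node
            left, right = fitch(node[0]), fitch(node[1])
            # Intersection if the children agree, otherwise the union of possibilities.
            return (left & right) or (left | right)

        # ((seq1, seq2), (seq3, seq4)) with the residue observed at one column
        tree = (({"L"}, {"L"}), ({"I"}, {"L"}))
        print(fitch(tree))  # {'L'} -> most parsimonious ancestral residue at the root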

    Understanding virus and microbial evolution in wildlife through meta-transcriptomics

    Wildlife harbors a substantial and largely undocumented diversity of RNA viruses and microbial life forms. RNA viruses and microbes are also arguably the most diverse and dynamic entities on Earth. Despite their evident importance, there are major limitations in our knowledge of the diversity, ecology, and evolution of RNA viruses and microbial communities. These gaps stem from a variety of factors, including biased sampling and the difficulty of accurately identifying highly divergent sequences through sequence similarity-based analyses alone. The implementation of meta-transcriptomic sequencing has greatly contributed to narrowing this gap. In particular, the rapid increase in the number of newly described RNA viruses over the last decade provides a glimpse of the remarkable diversity within the RNA virosphere. The central goal of this thesis was to determine the diversity of RNA viruses associated with wildlife, particularly in an Australian context. To this end, I exploited cutting-edge meta-transcriptomic and bioinformatic approaches to reveal the RNA virus diversity within diverse animal taxa, tissues, and environments, with a special focus on the highly divergent "dark matter" of the virome that has largely been refractory to sequence analysis. Similarly, I used these approaches to detect targeted common microbes circulating in vertebrate and invertebrate fauna. Another important goal was to assess the diversity of RNA viruses and microbes as a cornerstone of a new eco-evolutionary framework. In doing so, this thesis encompasses multiple disciplines including virus discovery, viral host-range distributions, microbial–virus and host–parasite interactions, phylogenetic analysis, and pathogen surveillance. In sum, the research presented in this thesis expands the known RNA virosphere as well as the detection and surveillance of targeted microbes in wildlife, providing new insights into the diversity, evolution, and ecology of these agents in nature.

    Strategies for improving the standardization and robustness of toxicogenomics data analyses

    Toxicology is the scientific pursuit of identifying and classifying the toxic effects of substances, as well as exploring and understanding the adverse effects of toxic exposure. Modern toxicological efforts have been driven by the industrial production of engineered substances, supported by advanced interdisciplinary scientific collaborations. These engineered substances must be carefully tested to ensure public safety. This task is now more challenging than ever with the employment of new classes of chemical compounds, such as engineered nanomaterials. Toxicological paradigms have been redefined over the decades to be more agile, versatile, and sensitive. On the other hand, the design of toxicological studies has become more complex, and the interpretation of the results more challenging. Toxicogenomics offers a wealth of data for estimating gene regulation by inspecting the alterations of many biomolecules (such as DNA, RNA, proteins, and metabolites). The responses of functional genes can be used to infer the toxic effects on the biological system that result in acute or chronic adverse effects. However, the dense data from toxicogenomics studies are difficult to analyze, and the results are difficult to interpret. Because of these drawbacks, toxicogenomic evidence is still not completely integrated into the regulatory framework. Nanomaterial properties such as particle size, shape, and structure add complexity and unique challenges to nanotoxicology. This thesis presents efforts toward the standardization of toxicogenomics data by showcasing the potential of omics in nanotoxicology and providing easy-to-use tools for the analysis and interpretation of omics data. This work explores two main themes: i) omics experimentation in nanotoxicology and the investigation of nanomaterial effects through analysis of omics data, and ii) the development of analysis pipelines as easy-to-use tools that bring advanced analytical methods to general users. In this work, I explored a potential solution that can ensure effective interpretability and reproducibility of omics data and related experimentation, such that an independent researcher can interpret it thoroughly. DNA microarray technology is a well-established research tool to estimate the dynamics of biological molecules with high throughput. The analysis of data from these assays presents many challenges, as the study designs are quite complex. I explored the challenges of omics data processing and provided bioinformatics solutions to standardize this process. The responses of individual molecules to a given exposure are only partially informative, and more sophisticated models, disentangling the complex networks of dynamic molecular interactions, need to be explored. An analytical solution is presented in this thesis to tackle the challenge of producing robust interpretations of molecular dynamics in biological systems. It allows exploring the substructures in molecular networks that underlie mechanisms of molecular adaptation to exposure. I also present a multi-omics approach to defining the mechanism of action for human cell lines exposed to nanomaterials. All the methodologies developed in this project for omics data processing and network analysis are implemented as software solutions designed to be easily accessible to users with no expertise in bioinformatics. Our strategies were also developed in an effort to standardize omics data processing and analysis and to promote the use of omics-based evidence in chemical risk assessment.
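
    As an illustration of the kind of network substructure analysis described above, the sketch below builds a simple co-expression graph from a toy expression matrix and extracts modules with a standard community-detection routine. It is not the thesis software; the data, correlation threshold and module-detection method are placeholder choices.

        # Illustrative sketch: build a co-expression network from a toy expression
        # matrix and extract substructures (modules) with greedy modularity clustering.
        import numpy as np
        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities

        rng = np.random.default_rng(2)
        expr = rng.normal(size=(20, 30))   # 20 hypothetical genes x 30 samples
        corr = np.corrcoef(expr)           # gene-gene correlation matrix

        G = nx.Graph()
        genes = [f"g{i}" for i in range(expr.shape[0])]
        G.add_nodes_from(genes)
        for i in range(len(genes)):
            for j in range(i + 1, len(genes)):
                if abs(corr[i, j]) > 0.3:  # arbitrary threshold for drawing an edge
                    G.add_edge(genes[i], genes[j], weight=abs(corr[i, j]))

        G.remove_nodes_from(list(nx.isolates(G)))   # drop genes with no strong partner
        modules = greedy_modularity_communities(G)  # candidate network substructures
        print([sorted(m) for m in modules])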