    Information Theory, Graph Theory and Bayesian Statistics based improved and robust methods in Genome Assembly

    Bioinformatics skills required for genome sequencing often represent a significant hurdle for many researchers working in computational biology. This dissertation highlights the significance of genome assembly as a research area, focuses on the need for accuracy, details the characteristics of the raw data, examines some key metrics, highlights some tools and outlines the whole pipeline for next-generation sequencing. Currently, a major effort is being put towards the assembly of the genomes of all living organisms. Given the importance of comparative genome assembly, this dissertation explores the principle of Minimum Description Length (MDL) and its two variants, the Two-Part MDL and Sophisticated MDL, in identifying the optimal reference sequence for genome assembly. Thereafter, a Modular Approach to Reference Assisted Genome Assembly Pipeline, referred to as MARAGAP, is developed. MARAGAP uses the MDL principle to determine the optimal reference sequence for the assembly. The optimal reference sequence is used as a template to infer inversions, insertions, deletions and Single Nucleotide Polymorphisms (SNPs) in the target genome. MARAGAP uses an algorithmic approach to detect and correct inversions and deletions, a De Bruijn graph based approach to infer insertions, an affine-match affine-gap local alignment tool to estimate the locations of insertions, and a Bayesian estimation framework for detecting SNPs (called BECA). BECA effectively capitalizes on the 'alignment-layout-consensus' paradigm and Quality (Q-) values for detecting and correcting SNPs by evaluating a number of probabilistic measures. However, this entire process is conducted only once. BECA's framework is therefore extended by using Gibbs Sampling for further iterations of BECA. After each assembly the reference sequence is updated and the probabilistic score for each base call is renewed.
The revised reference sequence and probabilities are then used to identify the alignments and consensus sequence, thereby yielding an algorithm referred to as Gibbs-BECA. Gibbs-BECA further improves performance, both by rectifying more SNPs and by remaining robust even in the presence of a poor reference sequence. Lastly, another major effort in this dissertation was the development of two cohesive software platforms that combine many different genome assembly pipelines in two distinct environments, referred to as Baari and Genobuntu, respectively. Baari and Genobuntu support pre-assembly tools, genome assemblers and post-assembly tools. Additionally, a library of tools developed by the authors for Next Generation Sequencing (NGS) data, together with commonly used biological software, is provided in these platforms. Baari and Genobuntu are free, easily distributable, and facilitate building laboratories and software workstations for personal use as well as for college/university laboratories. Baari is a customized Ubuntu OS packed with the aforementioned tools, whereas Genobuntu is a software package containing the same tools for users who already have Ubuntu OS pre-installed on their systems.
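The Q-value-driven Bayesian consensus step at the heart of BECA can be illustrated with a minimal sketch (the function names and the simple mismatch model below are illustrative assumptions, not BECA's actual implementation): each aligned base contributes a likelihood derived from its Phred quality score, and the consensus call is the base with the highest posterior probability.

```python
import math

def phred_to_error(q):
    # Phred Q-value -> probability that the base call is wrong
    return 10 ** (-q / 10.0)

def posterior_consensus(column, prior=None):
    """Naive Bayesian consensus for one alignment column.

    column: list of (base, q_value) pairs from the aligned reads.
    Returns (best_base, its posterior probability).
    A deliberate simplification of the 'alignment-layout-consensus' scoring.
    """
    bases = "ACGT"
    prior = prior or {b: 0.25 for b in bases}
    log_post = {}
    for b in bases:
        lp = math.log(prior[b])
        for obs, q in column:
            e = phred_to_error(q)
            # Probability of observing `obs` if the true base is `b`;
            # errors are assumed uniform over the three other bases.
            lp += math.log(1 - e) if obs == b else math.log(e / 3)
        log_post[b] = lp
    # Normalise in log space to avoid underflow
    m = max(log_post.values())
    total = sum(math.exp(v - m) for v in log_post.values())
    best = max(log_post, key=log_post.get)
    return best, math.exp(log_post[best] - m) / total

# Three high-quality reads support A; one low-quality read supports C.
column = [("A", 30), ("A", 20), ("C", 10), ("A", 25)]
base, p = posterior_consensus(column)
```

In Gibbs-BECA these posteriors would be renewed on every iteration as the reference sequence is updated.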

    Optimal reference sequence selection for genome assembly using minimum description length principle

    Reference assisted assembly uses a reference sequence, as a model, to assist in the assembly of a novel genome. The standard method for identifying the best reference sequence counts the number of reads that align to each candidate reference sequence and chooses the one with the highest count. This article explores the use of the minimum description length (MDL) principle and its two variants, the two-part MDL and sophisticated MDL, in identifying the optimal reference sequence for genome assembly. The article compares the proposed MDL based scheme with the standard method and concludes that counting the number of reads of the novel genome present in the reference sequence is not a sufficient condition. The proposed MDL scheme subsumes the standard method of counting the number of reads that align to the reference sequence, and additionally examines the model itself, i.e. the reference sequence, when identifying the optimal reference. The proposed MDL based scheme not only provides a sufficient criterion for identifying the optimal reference sequence for genome assembly but also improves the reference sequence so that it becomes more suitable for the assembly of the novel genome.
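The trade-off described above can be sketched with a toy two-part MDL score (the encoding scheme below is a deliberately simplified assumption, not the paper's actual code-length model): the model cost grows with the reference length, so a longer reference that aligns no additional reads scores worse, even though a pure read-counting criterion would rank the two references equally.

```python
import math

def two_part_mdl_score(reference, reads):
    """Toy two-part MDL score for reference selection.

    L(model): 2 bits per base of the reference.
    L(data | model): an aligned read costs log2(len(reference)) bits
    for its position; an unaligned read costs 2 bits per base (raw).
    Lower total description length is better.
    """
    model_bits = 2 * len(reference)
    data_bits = 0.0
    for read in reads:
        if read in reference:  # exact substring match as a toy alignment test
            data_bits += math.log2(len(reference))
        else:
            data_bits += 2 * len(read)
    return model_bits + data_bits

# Both references align the same two reads, so read counting ties them;
# MDL penalizes the needlessly long model.
ref_a = "ACGTACGTACGTACGT"
ref_b = "ACGTACGTACGTACGT" * 4
reads = ["ACGTAC", "GTACGT", "TTTTTT"]
score_a = two_part_mdl_score(ref_a, reads)
score_b = two_part_mdl_score(ref_b, reads)
```

Here `score_a` beats `score_b`, illustrating why the model term makes MDL a stronger criterion than read counting alone.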

    The A, C, G, and T of Genome Assembly

    In its two decades of history, genome assembly has produced significant research in both biotechnology and computational biology. This contribution delineates sequencing platforms and their characteristics, examines key steps involved in filtering and processing raw data, explains assembly frameworks, and discusses quality statistics for the assessment of the assembled sequence. Furthermore, the paper explores recent Ubuntu-based software environments oriented towards genome assembly as well as some avenues for future research.
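Among the quality statistics commonly used to assess an assembled sequence, N50 is the standard contiguity measure; a minimal sketch:

```python
def n50(contig_lengths):
    """N50: the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the total
    assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2.0
    running = 0
    for n in lengths:
        running += n
        if running >= half:
            return n
    return 0

# Total length 400, half is 200; 150 + 100 = 250 >= 200, so N50 = 100
result = n50([150, 100, 60, 50, 40])
```

A higher N50 indicates a more contiguous assembly, though it says nothing about correctness on its own.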

    Working Capital Management and Performance of SME Sector

    The study investigates the influence of working capital management (WCM) on the performance of small and medium enterprises (SMEs) in Pakistan over a seven-year period, 2006 to 2012, chosen because of the availability of the latest data. The data were taken from several sources, i.e. SMEDA, the Karachi Stock Exchange, tax offices, the companies themselves and Bloomberg Businessweek, and form the foundation of the calculations and their interpretation. The dependent variable of the study is return on assets, used as a proxy for profitability. The independent variables are the number of days accounts receivable, number of days' inventory, cash conversion cycle (CCC) and number of days accounts payable; in addition, firm size, leverage and growth are included as control variables. A panel data technique is used to study the influence of WCM on the profitability of SMEs. Results suggest that the number of days accounts payable has a positive association with profitability, whereas the average collection period, inventory turnover and the CCC have an inverse relation with performance. Size and growth in sales have a positive influence on profitability, while the debt ratio has a negative impact.
    Keywords: Cash Conversion Cycle, Working Capital Management, SMEs
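The cash conversion cycle used as an independent variable above combines the three "days" measures; a minimal sketch with purely illustrative figures (not drawn from the study's data set):

```python
def cash_conversion_cycle(days_inventory, days_receivable, days_payable):
    """CCC = days inventory outstanding + days sales outstanding
           - days payable outstanding.
    A shorter (or negative) CCC ties up working capital for less time,
    which is why the study finds CCC inversely related to performance."""
    return days_inventory + days_receivable - days_payable

# Illustrative example: inventory sits 60 days, customers pay in 45,
# suppliers are paid in 30 -> capital is tied up for 75 days.
ccc = cash_conversion_cycle(days_inventory=60, days_receivable=45, days_payable=30)
```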

    Developing web-based digital twin of assembly lines for industrial cyber-physical systems

    Modern manufacturing relies heavily on digital technologies, and recent changes in the manufacturing environment reflect advancements in information and communication technologies. Web-based Digital Twins (WDTs), in which a Digital Twin runs in a web browser and connects to its Physical Twin to exchange data, promise greater potential for process/product data interaction and will constitute an important part of the future of manufacturing. However, research work on WDTs is still in its early stages. The current paper therefore presents a framework for developing WDTs, taking into account the possibility of utilising them for education, research and industrial applications. A case study adapted from a mini-scale assembly line is used to illustrate the proposed concept.
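The core data-exchange idea can be sketched as follows (a simplified in-process simulation; the class names and sensor fields are hypothetical, and a real WDT would push state to the browser, e.g. over a WebSocket, rather than call methods directly):

```python
class PhysicalTwin:
    """Simulated assembly-line station emitting sensor readings."""
    def __init__(self):
        self.cycle = 0

    def read_sensors(self):
        self.cycle += 1
        # A real station would report live conveyor/actuator telemetry
        return {"cycle": self.cycle, "conveyor_speed": 0.25, "status": "running"}

class DigitalTwin:
    """Mirrors the physical twin's last known state; in a web-based twin
    this state would be serialised and streamed to the browser view."""
    def __init__(self):
        self.state = {}
        self.history = []

    def sync(self, reading):
        self.state = dict(reading)
        self.history.append(reading)

# Simulate three exchange cycles between the two twins
plc = PhysicalTwin()
twin = DigitalTwin()
for _ in range(3):
    twin.sync(plc.read_sensors())
```

Keeping a history alongside the current state is what enables the process analysis and educational replay uses mentioned in the framework.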

    Insect Pest Complex of Wheat Crop

    Wheat, Triticum aestivum L., is a cereal crop grown across a broad range of climatic conditions for its edible grains and is a staple food for about two billion people worldwide. Additionally, it is a rich source of carbohydrates (55–60%) and vegetable proteins, and contributes 50–60% of the daily dietary requirement in Pakistan. Globally, wheat is grown on over 90% of the total cultivated area and faces devastating biotic and abiotic factors. The estimated economic losses in wheat quantity and quality are about 4 thousand per tonne per year, including physical crop losses and handling losses. Economic losses of about 80–90 million USD are recorded in Pakistan due to inadequate production and handling. The wheat agro-ecosystems of the world are colonized by many herbivorous insects, which are abundant and cause significant losses. The insects' feeding styles make them disperse from one habitat to another, imposing significant crop losses. Areas of maximum wheat production are confronted either with insects that chew the vegetative and reproductive parts or with stem and root feeders. This chapter provides the pests' taxonomic ranks, distribution across the globe, and the biology and damage of the chewing and sucking insect pests of wheat. It is very important to study the biology of a pest in relation to the crop cycle in order to forecast which insect stage is economically important, when the proper time to manage the pest is, and what type of control is necessary. The chapter also provides management strategies well suited to the pest stage and environment.

    PERCEPTRON: an open-source GPU-accelerated proteoform identification pipeline for top-down proteomics

    PERCEPTRON is a next-generation, freely available, web-based proteoform identification and characterization platform for top-down proteomics (TDP). The PERCEPTRON search pipeline brings together algorithms for (i) intact protein mass tuning, (ii) de novo sequence tag-based filtering, (iii) characterization of terminal as well as post-translational modifications, (iv) identification of truncated proteoforms, (v) in silico spectral comparison, and (vi) weight-based candidate protein scoring. High-throughput performance is achieved by executing optimized code via multiple parallel threads on graphics processing units (GPUs) using the NVIDIA Compute Unified Device Architecture (CUDA) framework. An intuitive graphical web interface allows search parameters to be set up and results to be visualized. The accuracy and performance of the tool have been validated on several TDP datasets and against available TDP software. Specifically, results obtained from searching two published TDP datasets demonstrate that PERCEPTRON outperforms all other tools by up to 135% in terms of reported proteins and 10-fold in terms of runtime. In conclusion, the proposed tool significantly enhances the state of the art in TDP search software and is publicly available at https://perceptron.lums.edu.pk. Users can also create in-house deployments of the tool by building the code available in the GitHub repository (http://github.com/BIRL/Perceptron).
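As one example of the pipeline stages, de novo sequence-tag filtering can be sketched as a toy CPU version (the names and data below are hypothetical; PERCEPTRON's actual filter is derived from mass spectra and runs on GPUs): candidate proteins that contain too few of the sequence tags inferred from a spectrum are discarded before the expensive scoring stages.

```python
def filter_by_tags(candidates, tags, min_hits=1):
    """Keep candidate protein sequences containing at least `min_hits`
    of the de novo sequence tags inferred from the spectrum.
    Toy, single-threaded version of a tag-based database filter."""
    kept = []
    for name, seq in candidates.items():
        hits = sum(1 for t in tags if t in seq)
        if hits >= min_hits:
            kept.append(name)
    return kept

# Hypothetical candidate database and tags
candidates = {
    "P1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "P2": "MSERADLVVVGAGPGGYVAAIKAAQLGLKT",
}
tags = ["KQRQ", "SHFS"]
kept = filter_by_tags(candidates, tags, min_hits=2)
```

Shrinking the candidate set early is what makes the later, heavier GPU stages tractable at high throughput.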

    Risk Portfolio Optimization Using the Markowitz MVO Model in Relation to Human Limitations in Predicting the Future from the Perspective of the Al-Qur'an

    Risk portfolio management in modern finance has become increasingly technical, requiring the use of sophisticated mathematical tools in both research and practice. Since companies cannot insure themselves completely against risk, owing to the human inability to predict the future precisely, as written in Al-Qur'an surah Luqman verse 34, they have to manage it to yield an optimal portfolio. The objective here is to minimize the variance among all portfolios that attain at least a certain expected return, or alternatively to maximize the expected return among such portfolios. Furthermore, this study focuses on optimizing the risk portfolio with the so-called Markowitz MVO (Mean-Variance Optimization) model. The theoretical frameworks for the analysis are the arithmetic mean, geometric mean, variance, covariance, linear programming, and quadratic programming. Finding a minimum-variance portfolio produces a convex quadratic program: minimize the objective function x^T Q x subject to the constraints mu^T x >= r and A x = b. The outcome of this research is the solution of the optimal risk portfolio for some investments, obtained using MATLAB R2007b software together with its graphical analysis.
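In the two-asset case the fully invested minimum-variance portfolio has a closed form, which illustrates the quadratic objective above without a QP solver (a hand-worked sketch with illustrative numbers; the study solves the general case with MATLAB's quadratic programming):

```python
def min_variance_weights(var1, var2, cov12):
    """Closed-form minimum-variance weights for a fully invested
    two-asset portfolio (w1 + w2 = 1, short selling allowed).
    Obtained by minimising w1^2*var1 + w2^2*var2 + 2*w1*w2*cov12
    and setting the derivative with respect to w1 to zero."""
    w1 = (var2 - cov12) / (var1 + var2 - 2 * cov12)
    return w1, 1 - w1

def portfolio_variance(w1, w2, var1, var2, cov12):
    # The quadratic objective x^T Q x written out for two assets
    return w1**2 * var1 + w2**2 * var2 + 2 * w1 * w2 * cov12

# Illustrative inputs: asset variances 0.04 and 0.09, covariance 0.01
w1, w2 = min_variance_weights(var1=0.04, var2=0.09, cov12=0.01)
min_var = portfolio_variance(w1, w2, 0.04, 0.09, 0.01)
```

Diversification shows up directly: the minimum portfolio variance is below the variance of either asset held alone.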