20,613 research outputs found

    Statistical framework for video decoding complexity modeling and prediction

    Video decoding complexity modeling and prediction is an increasingly important issue for efficient resource utilization in a variety of applications, including task scheduling, receiver-driven complexity shaping, and adaptive dynamic voltage scaling. In this paper we present a novel view of this problem based on a statistical framework. We explore the statistical structure (clustering) of the execution time required by each video decoder module (entropy decoding, motion compensation, etc.) in conjunction with complexity features that are easily extractable at encoding time (representing the properties of each module's input source data). For this purpose, we employ Gaussian mixture models (GMMs) and an expectation-maximization algorithm to estimate the joint probability density function (PDF) of execution time and features. A training set of typical video sequences is used in an offline estimation process. The obtained GMM representation is used in conjunction with the complexity features of new video sequences to predict the execution time required for the decoding of these sequences. Several prediction approaches are discussed and compared. The potential mismatch between the training set and new video content is addressed by adaptive online joint-PDF re-estimation. An experimental comparison is performed to evaluate the different approaches and compare the proposed prediction scheme with related resource prediction schemes from the literature. The usefulness of the proposed complexity-prediction approaches is demonstrated in an application of rate-distortion-complexity optimized decoding.
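
As a rough illustration of how a fitted joint GMM can yield an execution-time prediction, the block below writes out the standard conditional-mean estimator. The abstract only says that several prediction approaches are compared, so this is one textbook option, not necessarily the one the paper adopts. The symbols (feature vector x, execution time t, mixture weights, and block-partitioned means and covariances) are notation assumed here for illustration.

```latex
% Joint GMM over features x and execution time t (notation assumed):
p(x, t) = \sum_{k=1}^{K} \pi_k \,
  \mathcal{N}\!\left(
    \begin{bmatrix} x \\ t \end{bmatrix};
    \begin{bmatrix} \mu_k^{x} \\ \mu_k^{t} \end{bmatrix},
    \begin{bmatrix} \Sigma_k^{xx} & \Sigma_k^{xt} \\
                    \Sigma_k^{tx} & \Sigma_k^{tt} \end{bmatrix}
  \right)

% Responsibility of component k given only the observed features:
w_k(x) = \frac{\pi_k \, \mathcal{N}(x;\, \mu_k^{x}, \Sigma_k^{xx})}
              {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x;\, \mu_j^{x}, \Sigma_j^{xx})}

% Conditional-mean prediction of the execution time:
\hat{t}(x) = \mathbb{E}[t \mid x]
  = \sum_{k=1}^{K} w_k(x)
    \left[ \mu_k^{t} + \Sigma_k^{tx} \left(\Sigma_k^{xx}\right)^{-1}
           \left(x - \mu_k^{x}\right) \right]
```

Each component contributes a linear regression of t on x, and the responsibilities blend these regressions according to which cluster the new features most plausibly belong to.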

    Evaluation of machine learning approaches for prediction of protein coding genes in prokaryotic DNA sequences

    According to the National Human Genome Research Institute, the amount of genomic data generated each year is constantly increasing. This rapid growth in genomic data has led to a surge in demand for efficient analysis and handling of that data. Gene prediction involves identifying the areas of a DNA sequence that code for proteins, also called protein coding genes. This task falls within the scope of bioinformatics, and there has been surprisingly little development in this field of study over the past years. Although there are capable state-of-the-art gene prediction tools, there is still room for improvement in terms of efficiency and accuracy. Advances made within the field of gene prediction can, among other things, aid the medical and pharmaceutical industry, as well as impact environmental and anthropological research. Machine learning techniques such as Random Forest classifiers and Artificial Neural Networks (ANN) have proved successful at the task of gene prediction. In this thesis one deep learning model and two other machine learning models were tested. The first model implemented was the established Random Forest classifier. For ensemble methods such as the Random Forest classifier, feature engineering is critical to success. The exploration of different feature selection and extraction techniques underpinned its relevance. It also showed that feature importance varies greatly among genomes, and revealed possibilities that can be further explored in future work. The second model tested was the ensemble method Extreme Gradient Boosting (XGBoost), which served as a good competitor to the Random Forest classifier. Finally, a Recurrent Neural Network (RNN) was implemented. RNNs are known to handle sequential data well, so an RNN seemed like a good candidate for gene prediction.
The 15 prokaryotic genomes used to train the models were extracted from the NCBI genome database. Each model was tasked with classifying sub-sequences of the genomes, called open reading frames (ORFs), as either protein coding ORFs or non-coding ORFs. One challenge when preparing these datasets was that the number of protein coding ORFs was very small compared to the number of non-coding ORFs. Another problem encountered in the dataset was that protein coding ORFs are in general longer than non-coding ORFs, which can bias the models to simply classify long ORFs as protein coding and short ORFs as non-coding. For these reasons, two datasets were created for each genome, each addressing one of these imbalances. The models were trained, tuned and tested on both datasets for all genomes, and for a combination of genomes. The models were evaluated with regard to accuracy, precision and recall. The results show that all three methods have potential and attained broadly similar performance scores. Although both time and data were limited during model development, the models still yielded promising results. Considering that several parameters have not yet been tuned in all models, many possibilities for further research remain. The fact that a relatively simple RNN architecture performed so well, with no requirement for feature engineering, shows great promise for further applications in gene prediction, and possibly in other fields of bioinformatics.
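
To make the notion of an ORF concrete, here is a minimal Python sketch that enumerates candidate ORFs in all six reading frames. It assumes a simple definition (an ATG start codon followed by the first in-frame stop codon) and a hypothetical `min_len` cutoff; the abstract does not state how the thesis extracted ORFs, so the function names and parameters are illustrative only.

```python
# Minimal ORF enumeration sketch (assumed definition: ATG ... first in-frame stop).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=6):
    """Return (strand, frame, start, end, orf) tuples for both strands.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    being scanned (the reverse complement for strand "-").
    min_len is a hypothetical cutoff in nucleotides, stop codon included.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step through the strand one codon at a time in this frame.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open an ORF at the first start codon
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon
                    if end - start >= min_len:
                        orfs.append((strand, frame, start, end, s[start:end]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATAGCC"):
        print(orf)
```

Candidates produced this way would then need labels (coding or non-coding) and some handling of the class and length imbalances the abstract describes before any classifier is trained.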

    Conserved noncoding sequences highlight shared components of regulatory networks in dicotyledonous plants

    Conserved noncoding sequences (CNSs) in DNA are reliable pointers to regulatory elements controlling gene expression. Using a comparative genomics approach with four dicotyledonous plant species (Arabidopsis thaliana, papaya [Carica papaya], poplar [Populus trichocarpa], and grape [Vitis vinifera]), we detected hundreds of CNSs upstream of Arabidopsis genes. Distinct positioning, length, and enrichment for transcription factor binding sites suggest these CNSs play a functional role in transcriptional regulation. The enrichment of transcription factors within the set of genes associated with CNSs is consistent with the hypothesis that together they form part of a conserved transcriptional network whose function is to regulate other transcription factors and control development. We identified a set of promoters where regulatory mechanisms are likely to be shared between the model organism Arabidopsis and other dicots, providing areas of focus for further research.

    Nucleotide Complementarity Features in the Design of Effective Artificial miRNAs

    The importance of miRNA in gene regulation has been well established; however, the precise mechanism of its target recognition process is still not completely understood. Among the known factors, nucleotide complementarity, accessibility of the target sites, the concentration of the RNA species, and site cooperativity were deemed important. Using these known rules, we previously designed artificial miRNAs that inhibit cancer cell growth by repressing the expression of multiple genes. Such guide sequences were delivered into the cells in the form of shRNAs. Because HIV is an RNA virus, in my first project we designed and tested guide RNAs that inhibit its replication by directly targeting the viral genome and the cellular factors that the virus requires. Using an updated version of the design program, mirBooking, we were able to predict the concentration effect of RNA species more accurately. The designed guide sequences provided cells with effective resistance against viral infection, equal to or better than that of sequences targeting the viral genome directly via near-perfect complementarity. However, the repression levels of the viral and cellular factors could not be precisely predicted. To gain further insight into the rules of miRNA target recognition, the rules of base pairing beyond the seed were investigated further in my second project. By designing guide sequences that partially match the target and analysing the repression pattern, we established a unifying model of miRNA target recognition via the Ago2 protein. It shows that once the seed is base-paired with the target RNA, the formation of an RNA duplex is interrupted at the central portion of the guide strand but resumes further downstream of the central portion, following a distinct order. The implementation of the discovered rules in a computer program, MicroAlign, enhanced the design of efficient artificial miRNAs. In this study, we not only confirmed the contribution of non-seed nucleotides to the efficiency of miRNAs, but also quantitatively defined how they work. The currently popular view that miRNAs can effectively target all genes equally with only seed matches may require careful re-examination.

    Non-coding yet non-trivial: a review on the computational genomics of lincRNAs


    Consciousness CLEARS the Mind

    A full understanding of consciousness requires that we identify the brain processes from which conscious experiences emerge. What are these processes, and what is their utility in supporting successful adaptive behaviors? Adaptive Resonance Theory (ART) predicted a functional link between processes of Consciousness, Learning, Expectation, Attention, Resonance, and Synchrony (CLEARS), including the prediction that "all conscious states are resonant states." This connection clarifies how brain dynamics enable a behaving individual to autonomously adapt in real time to a rapidly changing world. The present article reviews theoretical considerations that predicted these functional links, how they work, and some of the rapidly growing body of behavioral and brain data that have provided support for these predictions. The article also summarizes ART models that predict functional roles for identified cells in laminar thalamocortical circuits, including the six-layered neocortical circuits and their interactions with specific primary and higher-order specific thalamic nuclei and nonspecific nuclei. These predictions include explanations of how slow perceptual learning can occur more frequently in superficial cortical layers. ART traces these properties to the existence of intracortical feedback loops, and to reset mechanisms whereby thalamocortical mismatches use circuits such as the one from specific thalamic nuclei to nonspecific thalamic nuclei and then to layer 4 of neocortical areas via layers 1-to-5-to-6-to-4. Supported by the National Science Foundation (SBE-0354378) and the Office of Naval Research (N00014-01-1-0624).

    T-ALL and thymocytes : a message of noncoding RNAs

    In the last decade, the role of noncoding RNAs in disease was clearly established, starting with microRNAs and later expanding to long noncoding RNAs. This was also the case for T cell acute lymphoblastic leukemia (T-ALL), a malignant blood disorder arising from oncogenic events during normal T cell development in the thymus. By studying the transcriptomic profile of protein-coding genes, several oncogenic events leading to T-ALL could be identified. In recent years, it became apparent that several of these oncogenes function via microRNAs and long noncoding RNAs. In this review, we give a detailed overview of the studies that describe the noncoding RNAome in T-ALL oncogenesis and normal T cell development.