177 research outputs found

    FEDRO: a software tool for the automatic discovery of candidate ORFs in plants with c→u RNA editing

    BACKGROUND: RNA editing is an important mechanism for gene expression in plant organelles. It alters the direct transfer of genetic information from DNA to proteins by introducing differences between RNAs and the corresponding coding DNA sequences. Software tools that successfully find genes in other organisms are not always able to perform this task correctly in plant organellar genomes. Moreover, the available software tools for predicting RNA editing events use algorithms that do not account for events that may generate a novel start codon. RESULTS: We present Fedro, a Java software tool implementing a novel strategy to generate candidate Open Reading Frames (ORFs) resulting from Cytidine to Uridine (c→u) editing substitutions that occur in the mitochondrial genome (mtDNA) of a given input plant. The goal is to predict putative proteins of plant mitochondria that have not yet been annotated. To validate the generated ORFs, a screening is performed by checking for sequence similarity or presence in active transcripts of the same or similar organisms. We illustrate the functionalities of our framework on a model organism. CONCLUSIONS: The proposed tool may also be used on other organisms and genomes. Fedro is publicly available at http://math.unipa.it/rombo/FEDRO
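
    To make the start-codon point concrete, here is a minimal Python sketch of the general idea. It is not Fedro's actual algorithm, which the abstract does not specify. It assumes that an ACG codon can serve as a start codon once its C is edited to U (giving AUG), and it only treats genomic TAA, TAG and TGA as stops. The function name `candidate_orfs`, the `min_codons` threshold and the forward-strand-only scan are illustrative assumptions.

```python
# Hypothetical sketch: candidate ORFs on the forward strand of a DNA string,
# allowing ACG as a start codon that would become AUG after C->U editing.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG": False, "ACG": True}  # codon -> requires C->U editing


def candidate_orfs(dna, min_codons=2):
    """Return sorted (start, end, needs_editing) tuples.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts the codons from the start codon up to, but not
    including, the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        open_starts = []  # starts seen since the last stop in this frame
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if codon in START_CODONS:
                open_starts.append((i, START_CODONS[codon]))
            elif codon in STOP_CODONS:
                for start, needs_editing in open_starts:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((start, i + 3, needs_editing))
                open_starts = []
    return sorted(orfs)
```

    A conventional ORF finder would miss the candidate in `candidate_orfs("ACGAAATAA")`, because the sequence contains no genomic ATG. Any candidate found this way would still need the validation step the abstract describes.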

    A Survey on Explainable Anomaly Detection

    In the past two decades, most research on anomaly detection has focused on improving the accuracy of the detection, while largely ignoring the explainability of the corresponding methods and thus leaving the explanation of outcomes to practitioners. As anomaly detection algorithms are increasingly used in safety-critical domains, providing explanations for the high-stakes decisions made in those domains has become an ethical and regulatory requirement. Therefore, this work provides a comprehensive and structured survey on state-of-the-art explainable anomaly detection techniques. We propose a taxonomy based on the main aspects that characterize each explainable anomaly detection technique, aiming to help practitioners and researchers find the explainable anomaly detection method that best suits their needs.
    Comment: Paper accepted by the ACM Transactions on Knowledge Discovery from Data (TKDD) for publication (preprint version)

    Why is this an anomaly? Explaining anomalies using sequential explanations

    In most applications, anomaly detection operates in an unsupervised mode by looking for outliers in the hope that they are anomalies. Unfortunately, most anomaly detectors do not come with explanations of which features make a detected outlier point anomalous. Human analysts must therefore manually browse through each detected outlier point's feature space to find the subset of features that helps them determine whether the point is genuinely anomalous. This paper introduces sequential explanation (SE) methods that sequentially explain to the analyst which features make the detected outlier anomalous. We present two families of methods for computing SEs, outlier-based and sample-based, that work alongside any anomaly detector. The outlier-based SE methods use an anomaly detector's outlier scoring measure, guided by a search algorithm, to compute the SEs. The sample-based SE methods employ sampling to turn the problem into a classical feature selection problem. In our experiments, we compare the performance of the different outlier-based and sample-based SEs. Our results show that both the outlier-based and sample-based methods compute SEs that perform well and outperform sequential feature explanations.
    http://www.elsevier.com/locate/patcog, 2021, Computer Science
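
    The outlier-based idea can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not name the search algorithm, so greedy forward selection is assumed here, and the detector's scoring measure is replaced by a toy sum of squared z-scores. The names `zscore_outlier_score` and `sequential_explanation` are hypothetical.

```python
from statistics import mean, pstdev


def zscore_outlier_score(point, data, features):
    """Toy scorer (an assumption): sum of squared z-scores over `features`."""
    total = 0.0
    for f in features:
        column = [row[f] for row in data]
        sd = pstdev(column) or 1.0  # constant column: avoid dividing by zero
        total += ((point[f] - mean(column)) / sd) ** 2
    return total


def sequential_explanation(point, data, score=zscore_outlier_score, k=None):
    """Greedy forward selection over feature indices.

    At each step, add the feature whose inclusion gives the highest
    outlier score for `point` on the chosen subset. The returned order
    is the sequence in which features would be shown to the analyst.
    """
    remaining = list(range(len(point)))
    k = len(remaining) if k is None else k
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda f: score(point, data, chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

    Any callable with the same `(point, data, features)` signature can be passed as `score`, which mirrors the claim that the approach works alongside any anomaly detector that exposes an outlier score.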

    Fluorescent Labeling, Co-Tracking, and Quantification of RNA In Cellulo.

    RNA plays a fundamental, pervasive role in cellular physiology, through the maintenance and controlled readout of all genetic information, a functional landscape we are only beginning to understand. In particular, the cellular mechanisms for the spatiotemporal control of the plethora of RNAs are still poorly understood. Intracellular single-molecule fluorescence microscopy provides a powerful emerging tool for probing the pertinent biophysical and biochemical parameters that govern cellular RNA functions, including those of protein-encoding mRNAs. Yet progress has been hampered by the scarcity of high-yield, efficient methods to fluorescently label RNA molecules without drastically increasing their molecular weight through artificial appendages that may alter their behavior. Herein, we employ a series of in vitro enzymatic techniques to incorporate chemically modified nucleoside triphosphates, efficiently, extensively, and in high yield, into a transcribed messenger RNA body, between its body and tail (BBT), or randomly throughout the poly(A) tail (tail). Of these, the BBT and tail modification strategies proved the most promising for functionally labeling messenger RNA and tracking single particles using our in-house single-molecule assay, intracellular single-molecule high resolution localization and counting (iSHiRLoC). This research also spawned a novel method to anchor an RNA to the actin cytoskeleton for the study of long-term interactions within a cellular context, termed the Gene-Actin Tethered Intracellular Co-tracking Assay (GATICA). Here, biotinylated RNA is tethered to the actin surface, either through complexation with a streptavidin coupled to a biotinylated phalloidin molecule or actin protein.
    Taken together, this body of work presents strategies for labeling and visualizing long RNAs, both freely diffusing and actin-tethered, and their interactome in real time.
    PhD, Chemical Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/135832/1/tcuster_1.pd

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated with increasing velocity through various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real time for enhanced efficiency and quality of service delivery as well as upgraded safety and security in private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with various challenges like complex latent patterns, concept drift, and overfitting that may mislead the model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into mysterious and inexplicable black boxes. Contrary to this trend, end users expect transparency and verifiability before they trust a model and the outcomes it produces. Also, pointing the users to the most anomalous or malicious areas of time series and to causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through the three essential phases of behavior prediction, inference, and interpretation. The first step focuses on devising a time series model that achieves high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that use the related contexts to reclassify the observations and post-prune the unjustified events.
    Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results, based on concepts that a human can understand. The provided insight can pinpoint the anomalous regions of time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains like cybersecurity, and health

    Application of decision trees and multivariate regression trees in design and optimization

    Induction of decision trees and regression trees is a powerful technique not only for performing ordinary classification and regression analysis but also for discovering the often complex knowledge which describes the input-output behavior of a learning system in qualitative forms. In the area of classification (discrimination analysis), a new technique called IDea is presented for performing incremental learning with decision trees. It is demonstrated that IDea's incremental learning can greatly reduce the spatial complexity of a given set of training examples. Furthermore, it is shown that this reduction in complexity can also be used as an effective tool for improving the learning efficiency of other types of inductive learners such as standard backpropagation neural networks. In the area of regression analysis, a new methodology for performing multiobjective optimization has been developed. Specifically, we demonstrate that multiple-objective optimization through induction of multivariate regression trees is a powerful alternative to the conventional vector optimization techniques. Furthermore, in an attempt to investigate the effect of various types of splitting rules on the overall performance of the optimizing system, we present a tree partitioning algorithm which utilizes a number of techniques derived from diverse fields of statistics and fuzzy logic. These include: two multivariate statistical approaches based on dispersion matrices, an information-theoretic measure of covariance complexity which is typically used for obtaining multivariate linear models, two newly-formulated fuzzy splitting rules based on Pearson's parametric and Kendall's nonparametric measures of association, Bellman and Zadeh's fuzzy decision-maximizing approach within an inductive framework, and finally, the multidimensional extension of a widely-used fuzzy entropy measure.
The advantages of this new approach to optimization are highlighted by presenting three examples which respectively deal with design of a three-bar truss, a beam, and an electric discharge machining (EDM) process

    STRUCTURAL STUDIES ON CELL ENTRY OF RESPIRATORY ENTEROVIRUSES

    Enteroviruses (EVs) represent a group of non-enveloped, positive strand RN

    Structural base for the transfer of GPI-anchored proteins into fungal cell walls

    Fungi, such as the unicellular model organism Saccharomyces cerevisiae, possess a thick cell wall composed of polysaccharides and proteins, which is essential for viable and healthy cells. While the synthesis of its components happens along the secretory pathway and at the plasma membrane, the correct processing is established on the exterior side of the plasma membrane. This is realized by a set of enzymes, called glycoside hydrolases (GH), which hydrolyze and rearrange glycosidic bonds. In S. cerevisiae and the human pathogen Candida albicans, it has been shown that members of the GH76 family (Dfg5-subfamily) carry out the incorporation of GPI-anchored proteins into the cell wall, which is essential for these organisms. Although bacterial homologs of that class have already been described, our understanding of the fungal counterparts and their underlying mechanism, with its exceptional potential as a drug target, still lags behind. To fill this gap, the GH76 family was first subjected to phylogenetic analysis, providing insights into its multifunctional character with up to ten different subfamilies. The exact role of Dfg5-proteins could be explained by an in-depth structural and functional analysis of one of its members, CtDfg5, from the thermophilic mold Chaetomium thermophilum. Its crystal structure, determined at atomic resolution, showed that the overall fold and the active site motif are shared between fungal and bacterial homologs; however, the annotated function as an α1,6-mannanase could not be shown in vitro. Instead, it was possible to reassemble the GPI-core glycan structure (Manα1,2-Manα1,6-Manα1,4-GlcN) within the substrate-binding pocket of CtDfg5 by screening crystals with sugar fragments at high molar concentrations. This provided not only a detailed view of the true substrate of Dfg5-proteins, but also the first experimentally derived insights into the three-dimensional architecture of the GPI anchor glycan.
    Together with the complex structure of a putative acceptor molecule, a lipid-to-wall transfer mechanism catalyzed by Dfg5-proteins could be derived. Moreover, the structural insights suggested a possible way of using different GPI-modifications as a coding system to determine the final localization of GPI-anchored proteins. Furthermore, structure-based docking of a commercially available lead library helped to identify a small molecule (FP-1) that binds to the active site of CtDfg5. FP-1 shows specific effects on the viability of the model organism S. cerevisiae at millimolar concentrations, suggesting a high potential for further drug development. Finally, another fungal homolog from a so far uncharacterized subfamily showed that all GH76 family members recognize α1,6 mannobiose as a central element of cognate substrates