    Temporal Information in Data Science: An Integrated Framework and its Applications

    Data science is a well-known buzzword that is in fact composed of two distinct keywords, i.e., data and science. Data itself is of great importance: each analysis task begins from a set of examples. Based on such a consideration, the present work starts with the analysis of a real case scenario, by considering the development of a data warehouse-based decision support system for an Italian contact center company. Then, relying on the information collected in the developed system, a set of machine learning-based analysis tasks has been developed to answer specific business questions, such as employee work anomaly detection and automatic call classification. Although such initial applications rely on already available algorithms, as we shall see, some clever analysis workflows also had to be developed. Afterwards, continuously driven by real data and real-world applications, we turned to the question of how to handle temporal information within classical decision tree models. Our research led to the development of J48SS, a decision tree induction algorithm based on Quinlan's C4.5 learner, which is capable of dealing with temporal (e.g., sequential and time series) as well as atemporal (such as numerical and categorical) data during the same execution cycle. The decision tree has been applied to several real-world analysis tasks, proving its worth. A key characteristic of J48SS is its interpretability, an aspect that we specifically addressed through the study of an evolutionary-based decision tree pruning technique. Next, since a lot of work concerning the management of temporal information has already been done in the automated reasoning and formal verification fields, a natural direction in which to proceed was that of investigating how such solutions may be combined with machine learning, following two main tracks.
First, we show, through the development of an enriched decision tree capable of encoding temporal information by means of interval temporal logic formulas, how a machine learning algorithm can successfully exploit temporal logic to perform data analysis. Then, we focus on the opposite direction, i.e., that of employing machine learning techniques to generate temporal logic formulas, considering a natural language processing scenario. Finally, as a conclusive development, the architecture of a system is proposed, in which formal methods and machine learning techniques are seamlessly combined to perform anomaly detection and predictive maintenance tasks. Such an integration represents an original, thrilling research direction that may open up new ways of dealing with complex, real-world problems.

    "Going back to our roots": second generation biocomputing

    Researchers in the field of biocomputing have, for many years, successfully "harvested and exploited" the natural world for inspiration in developing systems that are robust, adaptable and capable of generating novel and even "creative" solutions to human-defined problems. However, in this position paper we argue that the time has now come for a reassessment of how we exploit biology to generate new computational systems. Previous solutions (the "first generation" of biocomputing techniques), whilst reasonably effective, are crude analogues of actual biological systems. We believe that a new, inherently inter-disciplinary approach is needed for the development of the emerging "second generation" of bio-inspired methods. This new modus operandi will require much closer interaction between the engineering and life sciences communities, as well as a bidirectional flow of concepts, applications and expertise. We support our argument by examining, in this new light, three existing areas of biocomputing (genetic programming, artificial immune systems and evolvable hardware), as well as an emerging area (natural genetic engineering) which may provide useful pointers as to the way forward. Comment: Submitted to the International Journal of Unconventional Computing.

    Model checking the evolution of gene regulatory networks

    The behaviour of gene regulatory networks (GRNs) is typically analysed using simulation-based statistical testing-like methods. In this paper, we demonstrate that we can replace this approach by a formal verification-like method that gives higher assurance and scalability. We focus on Wagner’s weighted GRN model with varying weights, which is used in evolutionary biology. In the model, weight parameters represent the gene interaction strength that may change due to genetic mutations. For a property of interest, we synthesise the constraints over the parameter space that represent the set of GRNs satisfying the property. We experimentally show that our parameter synthesis procedure computes the mutational robustness of GRNs—an important problem of interest in evolutionary biology—more efficiently than the classical simulation method. We specify the property in linear temporal logic. We employ symbolic bounded model checking and SMT solving to compute the space of GRNs that satisfy the property, which amounts to synthesizing a set of linear constraints on the weights.

    Clonality and evolutionary history of rhabdomyosarcoma.

    To infer the subclonality of rhabdomyosarcoma (RMS), predict the temporal order of genetic events in the tumorigenic process, and identify novel drivers, we applied a systematic method that takes into account germline and somatic alterations in 44 tumor-normal RMS pairs using deep whole-genome sequencing. Intriguingly, we find that loss of heterozygosity of 11p15.5 and mutations in RAS pathway genes occur early in the evolutionary history of the PAX-fusion-negative-RMS (PFN-RMS) subtype. We discover several early mutations in non-RAS mutated samples and predict them to be drivers in PFN-RMS, including recurrent mutation of PKN1. In contrast, we find that PAX-fusion-positive (PFP) subtype tumors have undergone whole-genome duplication in the late stage of cancer evolutionary history and have acquired fewer mutations and subclones than PFN-RMS. Moreover, we predict that the PAX3-FOXO1 fusion event occurs earlier than the whole-genome duplication. Our findings provide information critical to the understanding of tumorigenesis of RMS.

    The Ah receptor: adaptive metabolism, ligand diversity, and the xenokine model

    Author Posting. © American Chemical Society, 2020. This is an open access article published under an ACS AuthorChoice License. The definitive version was published in Chemical Research in Toxicology, 33(4), (2020): 860-879, doi:10.1021/acs.chemrestox.9b00476. The Ah receptor (AHR) has been studied for almost five decades. Yet, we still have many important questions about its role in normal physiology and development. Moreover, we still do not fully understand how this protein mediates the adverse effects of a variety of environmental pollutants, such as the polycyclic aromatic hydrocarbons (PAHs), the chlorinated dibenzo-p-dioxins (“dioxins”), and many polyhalogenated biphenyls. To provide a platform for future research, we provide the historical underpinnings of our current state of knowledge about AHR signal transduction, identify a few areas of needed research, and then develop concepts such as adaptive metabolism, ligand structural diversity, and the importance of proligands in receptor activation. We finish with a discussion of the cognate physiological role of the AHR, our perspective on why this receptor is so highly conserved, and how we might think about its cognate ligands in the future. This review is dedicated in memory of the career of Alan Poland, one of the truly great minds in pharmacology and toxicology. This work was supported by the National Institutes of Health Grants R35-ES028377, T32-ES007015, P30-CA014520, P42-ES007381, and U01-ES1026127, The UW SciMed GRS Program, and The Morgridge Foundation. The authors would like to thank Catherine Stanley of UW Media Solutions for her artwork.

    Evolution of the Karyopherin-β Family of Nucleocytoplasmic Transport Factors; Ancient Origins and Continued Specialization

    Macromolecular transport across the nuclear envelope (NE) is achieved through nuclear pore complexes (NPCs) and requires karyopherin-βs (KAP-βs), a family of soluble receptors, for recognition of embedded transport signals within cargo. We recently demonstrated, through proteomic analysis of trypanosomes, that NPC architecture is likely highly conserved across the Eukaryota, which in turn suggests conservation of the transport mechanisms. To determine if KAP-β diversity was similarly established early in eukaryotic evolution or if it was subsequently layered onto a conserved NPC, we chose to identify KAP-β sequences in a diverse range of eukaryotes and to investigate their evolutionary history. Thirty-six predicted proteomes were scanned for candidate KAP-β family members. The resulting sequences were resolved into fifteen KAP-β subfamilies which, due to broad supergroup representation, were most likely represented in the last eukaryotic common ancestor (LECA). Candidate members of each KAP-β subfamily were found in all eukaryotic supergroups, except XPO6, which is absent from Archaeplastida. Phylogenetic reconstruction revealed the likely evolutionary relationships between these different subfamilies. Many species contain more than one representative of each KAP-β subfamily; many duplications are apparently taxon-specific but others result from duplications occurring earlier in eukaryotic history. At least fifteen KAP-β subfamilies were established early in eukaryote evolution and likely before the LECA. In addition, we identified expansions at multiple stages within eukaryote evolution, including a multicellular plant-specific KAP-β, together with frequent secondary losses. Taken with evidence for early establishment of NPC architecture, these data demonstrate that multiple pathways for nucleocytoplasmic transport were established prior to the radiation of modern eukaryotes but that selective pressure continues to sculpt the KAP-β family.

    Synthetic biology—putting engineering into biology

    Synthetic biology is interpreted as the engineering-driven building of increasingly complex biological entities for novel applications. Encouraged by progress in the design of artificial gene networks, de novo DNA synthesis and protein engineering, we review the case for this emerging discipline. Key aspects of an engineering approach are purpose-orientation, deep insight into the underlying scientific principles, a hierarchy of abstraction including suitable interfaces between and within the levels of the hierarchy, standardization and the separation of design and fabrication. Synthetic biology investigates possibilities to implement these requirements into the process of engineering biological systems. This is illustrated on the DNA level by the implementation of engineering-inspired artificial operations such as toggle switching, oscillating or production of spatial patterns. On the protein level, the functionally self-contained domain structure of a number of proteins suggests possibilities for essentially Lego-like recombination which can be exploited for reprogramming DNA binding domain specificities or signaling pathways. Alternatively, computational design emerges to rationally reprogram enzyme function. Finally, the increasing facility of de novo DNA synthesis—synthetic biology’s system fabrication process—supplies the possibility to implement novel designs for ever more complex systems. Some of these elements have merged to realize the first tangible synthetic biology applications in the area of manufacturing of pharmaceutical compounds.