
    Infinite factorization of multiple non-parametric views

    Combined analysis of multiple data sources has increasing application interest, in particular for distinguishing shared and source-specific aspects. We extend this rationale of classical canonical correlation analysis into a flexible, generative and non-parametric clustering setting by introducing a novel non-parametric hierarchical mixture model. The lower level of the model describes each source with a flexible non-parametric mixture, and the top level combines these to describe commonalities of the sources. The lower-level clusters arise from hierarchical Dirichlet processes, inducing an infinite-dimensional contingency table between the views. The commonalities between the sources are modeled by an infinite block model of the contingency table, interpretable as a non-negative factorization of infinite matrices, or as a prior for infinite contingency tables. With Gaussian mixture components plugged in for continuous measurements, the model is applied to two views of genes, mRNA expression and abundance of the produced proteins, to expose groups of genes that are co-regulated in either or both of the views. Cluster analysis of co-expression is a standard, simple way of screening for co-regulation, and the two-view analysis extends the approach to distinguishing between pre- and post-translational regulation.
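    As a concrete (and purely illustrative) picture of the construction, the Python sketch below draws truncated stick-breaking weights for two view-specific Dirichlet process mixtures, forms a toy cluster-by-cluster contingency table, and applies an ordinary finite non-negative matrix factorization as a stand-in for the infinite block model; the truncation level and all names (w_mrna, w_prot, K) are assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming a truncated stick-breaking approximation of the
# two-level construction; NOT the authors' code. The finite NMF at the end
# stands in for the infinite block model over the contingency table.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

def stick_breaking(alpha, K):
    """Truncated stick-breaking weights for a Dirichlet process mixture."""
    betas = rng.beta(1.0, alpha, size=K)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

K = 20                                    # truncation level (illustrative)
w_mrna = stick_breaking(alpha=2.0, K=K)   # cluster weights, view 1 (mRNA)
w_prot = stick_breaking(alpha=2.0, K=K)   # cluster weights, view 2 (protein)

# Toy contingency table: probability of a gene landing in cluster i of view 1
# and cluster j of view 2 (independent views here, purely for brevity).
contingency = np.outer(w_mrna, w_prot)

# Low-rank non-negative factorization of the truncated contingency table,
# playing the role of the block structure that captures shared regulation.
nmf = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
U = nmf.fit_transform(contingency)        # view-1 clusters -> shared blocks
V = nmf.components_                       # shared blocks  -> view-2 clusters
print("reconstruction error:", np.linalg.norm(contingency - U @ V))
```

    In the actual model the table is infinite and the block structure is given a prior rather than fitted by NMF; the sketch only shows how the stick-breaking weights and the factorized contingency table fit together.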

    Knowledge graph embedding for experimental uncertainty estimation

    Purpose: Experiments are the backbone of the development process of data-driven predictive models for scientific applications. The quality of the experiments directly impacts the model performance. Uncertainty inherently affects experimental measurements and is often missing from the available data sets because of its estimation cost. For similar reasons, experiments are few in number compared to other data sources. Discarding experiments because of their missing uncertainty values would preclude the development of predictive models. Data profiling techniques are fundamental for assessing data quality, but some data quality dimensions are challenging to evaluate without knowing the uncertainty. In this context, this paper aims to predict the missing uncertainty of the experiments. Design/methodology/approach: This work presents a methodology to forecast the experiments’ missing uncertainty, given a data set and its ontological description. The approach is based on knowledge graph embeddings and leverages the task of link prediction over a knowledge graph representation of the experiments database. The validity of the methodology is first tested under multiple conditions using synthetic data and then applied to a large data set of experiments in the chemical kinetics domain as a case study. Findings: The analysis of different test case scenarios suggests that knowledge graph embedding can be used to predict the missing uncertainty of the experiments when there is a hidden relationship between the experiment metadata and the uncertainty values. The link prediction task is also resilient to random noise in that relationship. Knowledge graph embedding outperforms the baseline when the uncertainty depends on multiple metadata fields. Originality/value: Using knowledge graph embedding to predict missing experimental uncertainty is a novel alternative to the current, more costly techniques in the literature. This contribution permits better data quality profiling of scientific repositories and improves the development process of data-driven models based on scientific experiments.
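    To make the link-prediction idea concrete, the sketch below trains a tiny TransE-style embedding over an invented graph of experiments and uncertainty levels and then ranks candidate levels for an experiment whose uncertainty is missing. The triples, the relation name hasUncertainty, and all hyper-parameters are hypothetical; the paper's actual embedding model and data set may differ.

```python
# Minimal sketch of link prediction with a TransE-style embedding;
# the graph, relation names, and hyper-parameters are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

entities = ["exp1", "exp2", "exp3", "low_uncertainty", "high_uncertainty"]
relations = ["hasUncertainty"]
triples = [("exp1", "hasUncertainty", "low_uncertainty"),
           ("exp2", "hasUncertainty", "high_uncertainty")]

dim, lr, margin, epochs = 16, 0.01, 1.0, 200
E = {e: rng.normal(scale=0.1, size=dim) for e in entities}    # entity vectors
R = {r: rng.normal(scale=0.1, size=dim) for r in relations}   # relation vectors

def score(h, r, t):
    """Squared-distance plausibility score: smaller means more plausible."""
    return float(np.sum((E[h] + R[r] - E[t]) ** 2))

for _ in range(epochs):
    for h, r, t in triples:
        t_neg = str(rng.choice([e for e in entities if e != t]))  # corrupted tail
        if score(h, r, t) + margin > score(h, r, t_neg):          # margin violated
            pos, neg = E[h] + R[r] - E[t], E[h] + R[r] - E[t_neg]
            E[h] -= lr * (pos - neg)
            R[r] -= lr * (pos - neg)
            E[t] += lr * pos          # pull the true tail closer
            E[t_neg] -= lr * neg      # push the corrupted tail away

# Link prediction: rank candidate uncertainty levels for an unlabelled experiment.
# (In a real graph, exp3 would also be linked to metadata that constrains it.)
candidates = ["low_uncertainty", "high_uncertainty"]
ranked = sorted(candidates, key=lambda t: score("exp3", "hasUncertainty", t))
print("predicted uncertainty level for exp3:", ranked[0])
```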

    Knowledge-based systems and geological survey

    This personal and pragmatic review of the philosophy underpinning methods of geological surveying suggests that important influences of information technology have yet to make their impact. Early approaches took existing systems as metaphors, retaining the separation of maps, map explanations and information archives, organised around map sheets of fixed boundaries, scale and content. But system design should look ahead: a computer-based knowledge system for the same purpose can be built around hierarchies of spatial objects and their relationships, with maps as one means of visualisation, and information types linked as hypermedia and integrated in mark-up languages. The system framework and ontology, derived from the general geoscience model, could support consistent representation of the underlying concepts and maintain reference information on object classes and their behaviour. Models of processes and historical configurations could clarify the reasoning at any level of object detail and introduce new concepts such as complex systems. The up-to-date interpretation might centre on spatial models, constructed with explicit geological reasoning and evaluation of uncertainties. Assuming (at a future time) full computer support, the field survey results could be collected in real time as a multimedia stream, hyperlinked to and interacting with the other parts of the system as appropriate. Throughout, the knowledge is seen as human knowledge, with interactive computer support for recording and storing the information and processing it by such means as interpolating, correlating, browsing, selecting, retrieving, manipulating, calculating, analysing, generalising, filtering, visualising and delivering the results. Responsibilities may have to be reconsidered for various aspects of the system, such as: field surveying; spatial models and interpretation; geological processes, past configurations and reasoning; standard setting, system framework and ontology maintenance; training; storage, preservation, and dissemination of digital records.
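    The object-centred framework sketched in the review can be pictured with a few lines of code. The classes, attribute names, and relationship types below are illustrative assumptions only, meant to show the general idea of spatial objects organised in hierarchies, linked by typed relationships, and rendered to a map as just one possible view.

```python
# Illustrative sketch only: spatial objects in a hierarchy with typed
# relationships and hyperlinked records; names and fields are assumptions.
from dataclasses import dataclass, field

@dataclass
class SpatialObject:
    name: str
    object_class: str                        # e.g. "basin", "formation", "fault"
    geometry: str                            # placeholder for a real geometry type
    parent: "SpatialObject | None" = None    # hierarchy: part-of / member-of
    links: list = field(default_factory=list)      # typed relations to other objects
    records: list = field(default_factory=list)    # hyperlinked notes, photos, logs

    def relate(self, relation: str, other: "SpatialObject") -> None:
        """Record a typed relationship such as 'cuts', 'overlies', 'observed_in'."""
        self.links.append((relation, other))

basin = SpatialObject("Example Basin", "basin", "POLYGON(...)")
formation = SpatialObject("Sandstone A", "formation", "POLYGON(...)", parent=basin)
fault = SpatialObject("Fault F1", "fault", "LINESTRING(...)", parent=basin)
fault.relate("cuts", formation)
fault.records.append("field_note_2021-05-04.txt")   # hypothetical multimedia record

# A map is just one visualisation: select the mappable objects and draw them.
map_layer = [o for o in (basin, formation, fault) if o.object_class != "basin"]
print([o.name for o in map_layer])
```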

    -ilities Tradespace and Affordability Project – Phase 3

    One of the key elements of the SERC’s research strategy is transforming the practice of systems engineering and associated management practices – “SE and Management Transformation (SEMT).” The Grand Challenge goal for SEMT is to transform the DoD community’s current systems engineering and management methods, processes, and tools (MPTs) and practices away from sequential, single stovepipe system, hardware-first, document-driven, point-solution, acquisition-oriented approaches, and toward concurrent, portfolio- and enterprise-oriented, hardware-software-human engineered, model-driven, set-based, full life cycle approaches. This material is based upon work supported, in whole or in part, by the U.S. Department of Defense through the Office of the Assistant Secretary of Defense for Research and Engineering (ASD(R&E)) under Contract H98230-08-D-0171 (Task Order 0031, RT 046).

    Life cycle information models with parameter uncertainty analysis to facilitate the use of life-cycle assessment outcomes in pavement design decision-making

    The objective of this dissertation is to develop Life Cycle Information Models (LCIMs) to promote consistent and credible communication of potential environmental impacts quantified through the Life-Cycle Assessment (LCA) methodology. The introduction of LCIMs will shift the focus of pavement LCA stakeholders to collecting reliable foreground data and adapting to the consistent background data present within LCIMs. LCA methodology requires significant Life Cycle Inventory (LCI) data to model real-world systems and quantify potential environmental impacts. The lack of guidance in ISO standards on consistently compiling LCI data and defining modeling protocols lowers the reliability of LCA outcomes. In addition, LCA outcomes are communicated as point estimates despite the variations associated with input data. These limitations provided two motivations for this dissertation. The first motivation is to develop an information modeling approach to support the formal specification of relationships between pavement LCA flows and processes, while mapping them to a consistent set of background LCI and foreground process parameters. The second motivation is to develop the margins of error within LCA outcomes by propagating different types of uncertainties. An illustration of the discussed methodology is provided for the case of Hot-Mix Asphalt (HMA) mixtures containing varying amounts of Reclaimed Asphalt Pavement (RAP) and Recycled Asphalt Shingles (RAS). LCIMs serve as a building block for a complete LCA, formalizing the underlying model and upstream datasets. This builds trust among pavement LCA stakeholders by promoting the use of consistent underlying relationships between unit product systems, processes, and flows within the pavement LCA system boundary and mapping them to consistent, transparent public background datasets. Pavement LCA stakeholders are empowered to develop context-specific LCA outcomes using LCIMs and can reliably incorporate these outcomes within decision-making by highlighting the margins of error associated with the results. The methodology discussed in this dissertation is timely given emerging legislation such as California’s Buy Clean Act (2017), which requires highway construction contractors to produce LCA-based Environmental Product Declarations (EPDs), at the point of installation, for a list of all eligible construction materials.
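    The uncertainty-propagation side of the proposal can be illustrated with a short Monte Carlo sketch: sample the uncertain inventory parameters, push each sample through a simple linear impact model, and report an interval rather than a point estimate. Every number below (quantities, emission factors, uncertainty ranges) is invented for illustration and is not taken from the dissertation.

```python
# Minimal sketch of Monte Carlo uncertainty propagation for an LCA outcome;
# all parameter values and emission factors are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
n_runs = 10_000

# Toy foreground parameters for one tonne of an HMA-style mixture:
# name -> (mean amount, relative standard deviation); units are illustrative.
params = {
    "virgin_binder_kg": (47.0, 0.05),
    "aggregate_kg":     (903.0, 0.02),
    "rap_kg":           (50.0, 0.10),
    "plant_energy_MJ":  (275.0, 0.08),
}
# Toy background emission factors, kg CO2e per unit of each parameter.
factors = {
    "virgin_binder_kg": 0.45,
    "aggregate_kg":     0.004,
    "rap_kg":           0.01,
    "plant_energy_MJ":  0.07,
}

# Sample each uncertain parameter and propagate it through the linear model.
impact = np.zeros(n_runs)
for name, (mean, rel_sd) in params.items():
    draws = rng.normal(mean, rel_sd * mean, size=n_runs)
    impact += draws * factors[name]

lo, hi = np.percentile(impact, [2.5, 97.5])
print(f"GWP: {impact.mean():.1f} kg CO2e/tonne "
      f"(95% interval {lo:.1f} to {hi:.1f})")
```

    Reporting the interval alongside the mean is exactly the kind of margin of error that the dissertation argues should accompany LCA outcomes used in design decisions.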

    Cost engineering for manufacturing: current and future research

    The article aims to identify the scientific challenges and point out future research directions in Cost Engineering. The research areas covered in this article include Design Cost; Manufacturing Cost; Operating Cost; Life Cycle Cost; Risk and Uncertainty Management; and Affordability Engineering. Information collected at the Academic Forum on Cost Engineering, held at Cranfield University in 2008, and findings from a further literature review are presented. The forum set the scope of the cost engineering research; a brainstorming session was held at the forum, and the literature was further reviewed to understand current and future practices in cost engineering. The main benefits of the article include coverage of current research on cost engineering from different perspectives and of future research areas in Cost Engineering.