Neuro-Symbolic Deductive Reasoning for Cross-Knowledge Graph Entailment
A significant recent development in neural-symbolic learning is deep neural networks that can reason over symbolic knowledge graphs (KGs). A task of particular interest is KG entailment: inferring the set of all facts that are a logical consequence of the current and potential facts of a KG. Initial neural-symbolic systems that can deduce the entailment of a KG have been presented, but they are limited: current systems learn fact relations and entailment patterns specific to a particular KG, hence do not truly generalize, and must be retrained for each KG they are tasked with entailing. In this paper, we propose a neural-symbolic system that addresses this limitation. It is designed as a differentiable, end-to-end deep memory network that learns over abstract, generic symbols to discover entailment patterns common to any reasoning task. A key component of the system is a simple but highly effective normalization process for continuous representation learning of KG entities within memory networks. Our results show how the model, trained over a set of KGs, can effectively entail facts from KGs excluded from training, even when the vocabulary or domain of the test KGs is completely different from that of the training KGs.
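The abstract does not detail the normalization step, so the following is only a rough sketch, in plain NumPy, of one plausible reading: L2-normalizing KG entity embeddings before the soft-attention readout of an end-to-end memory network. The dimensions and the choice of unit-norm scaling are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: L2-normalizing KG entity embeddings before a
# memory-network attention step. The exact normalization used in the
# paper is not specified in the abstract; this is only an assumption.
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so attention depends on direction
    (the abstract symbol) rather than embedding magnitude."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / (norm + eps)

rng = np.random.default_rng(0)
memory = l2_normalize(rng.normal(size=(128, 64)))  # 128 stored facts, dim 64
query = l2_normalize(rng.normal(size=(64,)))       # encoded query fact

# Soft attention over memory slots (standard end-to-end memory network step).
scores = memory @ query
weights = np.exp(scores - scores.max())
weights /= weights.sum()
response = weights @ memory                        # weighted memory readout
print(response.shape)                              # (64,)
```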
A Simple Standard for Sharing Ontological Mappings (SSSOM).
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM), which addresses these problems by: (i) introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit; (ii) defining an easy-to-use, simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles; (iii) implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices; and (iv) providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. Database URL: http://w3id.org/sssom/spec
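As a small illustration of the "simple table-based format" mentioned above, the sketch below loads an SSSOM TSV file with pandas and filters it. The file name is hypothetical, and the column names (subject_id, predicate_id, object_id, mapping_justification, confidence) are assumed from the published SSSOM specification rather than taken from this paper's examples.

```python
# Minimal sketch of consuming an SSSOM mapping set with pandas.
# The file name is hypothetical; column names follow the SSSOM TSV
# specification (subject_id, predicate_id, object_id, ...).
import pandas as pd

# SSSOM TSV files may carry their metadata as a '#'-prefixed YAML block
# before the table, so that block can simply be skipped as comments here.
mappings = pd.read_csv("mappings.sssom.tsv", sep="\t", comment="#")

# Keep only exact matches with high confidence, e.g. for a strict merge.
exact = mappings[
    (mappings["predicate_id"] == "skos:exactMatch")
    & (mappings["confidence"] >= 0.9)
]
print(exact[["subject_id", "object_id", "mapping_justification"]].head())
```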
Analysis of Water Balance Changes and Parameterization Reflecting Soil Characteristics in a Hydrological Simulation Program—FORTRAN Model
Efficient water resource management requires accurate analyses of hydrological components and water balance. The Hydrological Simulation Program—FORTRAN (HSPF) model serves this purpose at the watershed scale, but it has limited accuracy in calculating runoff and infiltration because it simulates hydrological processes using a single representative parameter for each land use in the watershed; accuracy requires field-scale analysis of hydrological components. We calculated the lower zone storage nominal parameter (LZSN), which markedly affects runoff in HSPF, from the effective moisture content and depth of each soil layer. Analysis of hydrological components suggested re-calculating the parameters to reflect soil characteristics. We investigated two scenarios through simulations: Scenario 1 used the existing method, and Scenario 2 used parameters that reflected soil properties. Total flows for each sub-catchment were identical, but the proportions of direct and intermediate runoff were larger in Scenario 1, whereas the ratios of baseflow, evapotranspiration, and infiltration were larger in Scenario 2, reflecting soil characteristics. The ratio of baseflow to total flow in Scenario 2 was similar to observed values. R² and Nash–Sutcliffe Efficiency (NSE) at the watershed outlet were well matched in both scenarios (both higher than 0.9), but the proportions of each hydrological component differed. It is therefore important to consider soil characteristics when applying water quantity and quality analyses in an HSPF simulation.
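The abstract states that LZSN was derived from the effective moisture content and depth of each soil layer but does not give the formula. As a purely illustrative assumption, the sketch below sums moisture content times depth over a made-up three-layer profile and converts the result to inches (HSPF's default English units); it is not the authors' parameterization.

```python
# Hypothetical sketch of deriving LZSN from per-layer soil data. The exact
# formulation and the layer values are assumptions for illustration; the
# abstract only states that effective moisture content and layer depth
# were used.
# Each tuple: (effective moisture content [-], layer depth [mm])
soil_layers = [
    (0.12, 300.0),  # topsoil
    (0.10, 500.0),  # subsoil
    (0.08, 700.0),  # substratum
]

# Effective water storage per layer = moisture content x depth, summed
# over the profile, then converted from mm to inches for HSPF input.
lzsn_mm = sum(theta * depth for theta, depth in soil_layers)
lzsn_in = lzsn_mm / 25.4
print(f"LZSN ~ {lzsn_in:.2f} in")
```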
Gene Set Summarization Using Large Language Models.
Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling Large Language Models (LLMs) to use scientific texts directly and avoid reliance on a KB. TALISMAN (Terminological ArtificiaL Intelligence SuMmarization of Annotation and Narratives) uses generative AI to perform gene set function summarization as a complement to standard enrichment analysis. The method can draw on different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct retrieval from the model. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for an input gene set. However, LLM-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, in our experiments these methods were rarely able to recapitulate the most precise and informative term from standard enrichment analysis. We also observe minor differences depending on the prompt input information, with GO term descriptions leading to higher recall but lower precision. However, newer models perform statistically significantly better than the oldest model across all performance metrics, suggesting that future models may lead to further improvements. Overall, the results are nondeterministic, with minor variations in prompt resulting in radically different term lists, true to the stochastic nature of LLMs. Our results show that, at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis; however, they may provide summarization benefits for integrating implicit knowledge across extant but unstandardized sources, for large sets of features, and where the amount of information is difficult for humans to process.
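For contrast with the LLM summaries, the sketch below shows the kind of over-representation test that underlies standard enrichment analysis: a one-sided hypergeometric test for a single GO term. The counts are invented, and the abstract does not prescribe this exact procedure, so it should be read as an assumed baseline rather than TALISMAN's comparator implementation.

```python
# Sketch of a standard term-enrichment test: a one-sided hypergeometric
# test for over-representation of one GO term in an input gene list.
# The counts are made up for illustration.
from scipy.stats import hypergeom

M = 20_000  # genes in the annotated background
n = 150     # background genes annotated to the GO term
N = 40      # genes in the input gene set
k = 12      # input genes annotated to the GO term

# P(X >= k): probability of seeing at least k annotated genes by chance.
p_value = hypergeom.sf(k - 1, M, n, N)
print(f"enrichment p-value = {p_value:.2e}")
```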