2,729 research outputs found

    Semantic modelling of user interests based on cross-folksonomy analysis

    Get PDF
    The continued increase in Web usage, in particular participation in folksonomies, reveals a trend towards a more dynamic and interactive Web where individuals can organise and share resources. Tagging has emerged as the de-facto standard for the organisation of such resources, providing a versatile and reactive knowledge management mechanism that users find easy to use and understand. It is common nowadays for users to have multiple profiles in various folksonomies, thus distributing their tagging activities. In this paper, we present a method for the automatic consolidation of user profiles across two popular social networking sites, and subsequent semantic modelling of their interests utilising Wikipedia as a multi-domain model. We evaluate how much can be learned from such sites, and in which domains the knowledge acquired is focussed. Results show that far richer interest profiles can be generated for users when multiple tag-clouds are combine

    Mining Patents with Large Language Models Demonstrates Congruence of Functional Labels and Chemical Structures

    Full text link
    Predicting chemical function from structure is a major goal of the chemical sciences, from the discovery and repurposing of novel drugs to the creation of new materials. Recently, new machine learning algorithms are opening up the possibility of general predictive models spanning many different chemical functions. Here, we consider the challenge of applying large language models to chemical patents in order to consolidate and leverage the information about chemical functionality captured by these resources. Chemical patents contain vast knowledge on chemical function, but their usefulness as a dataset has historically been neglected due to the impracticality of extracting high-quality functional labels. Using a scalable ChatGPT-assisted patent summarization and word-embedding label cleaning pipeline, we derive a Chemical Function (CheF) dataset, containing 100K molecules and their patent-derived functional labels. The functional labels were validated to be of high quality, allowing us to detect a strong relationship between functional label and chemical structural spaces. Further, we find that the co-occurrence graph of the functional labels contains a robust semantic structure, which allowed us in turn to examine functional relatedness among the compounds. We then trained a model on the CheF dataset, allowing us to assign new functional labels to compounds. Using this model, we were able to retrodict approved Hepatitis C antivirals, uncover an antiviral mechanism undisclosed in the patent, and identify plausible serotonin-related drugs. The CheF dataset and associated model offers a promising new approach to predict chemical functionality.Comment: Under revie

    Vermeidung von Repräsentationsheterogenitäten in realweltlichen Wissensgraphen

    Get PDF
    Knowledge graphs are repositories providing factual knowledge about entities. They are a great source of knowledge to support modern AI applications for Web search, question answering, digital assistants, and online shopping. The advantages of machine learning techniques and the Web's growth have led to colossal knowledge graphs with billions of facts about hundreds of millions of entities collected from a large variety of sources. While integrating independent knowledge sources promises rich information, it inherently leads to heterogeneities in representation due to a large variety of different conceptualizations. Thus, real-world knowledge graphs are threatened in their overall utility. Due to their sheer size, they are hardly manually curatable anymore. Automatic and semi-automatic methods are needed to cope with these vast knowledge repositories. We first address the general topic of representation heterogeneity by surveying the problem throughout various data-intensive fields: databases, ontologies, and knowledge graphs. Different techniques for automatically resolving heterogeneity issues are presented and discussed, while several open problems are identified. Next, we focus on entity heterogeneity. We show that automatic matching techniques may run into quality problems when working in a multi-knowledge graph scenario due to incorrect transitive identity links. We present four techniques that can be used to improve the quality of arbitrary entity matching tools significantly. Concerning relation heterogeneity, we show that synonymous relations in knowledge graphs pose several difficulties in querying. Therefore, we resolve these heterogeneities with knowledge graph embeddings and by Horn rule mining. All methods detect synonymous relations in knowledge graphs with high quality. Furthermore, we present a novel technique for avoiding heterogeneity issues at query time using implicit knowledge storage. We show that large neural language models are a valuable source of knowledge that is queried similarly to knowledge graphs already solving several heterogeneity issues internally.Wissensgraphen sind eine wichtige Datenquelle von Entitätswissen. Sie unterstützen viele moderne KI-Anwendungen. Dazu gehören unter anderem Websuche, die automatische Beantwortung von Fragen, digitale Assistenten und Online-Shopping. Neue Errungenschaften im maschinellen Lernen und das außerordentliche Wachstum des Internets haben zu riesigen Wissensgraphen geführt. Diese umfassen häufig Milliarden von Fakten über Hunderte von Millionen von Entitäten; häufig aus vielen verschiedenen Quellen. Während die Integration unabhängiger Wissensquellen zu einer großen Informationsvielfalt führen kann, führt sie inhärent zu Heterogenitäten in der Wissensrepräsentation. Diese Heterogenität in den Daten gefährdet den praktischen Nutzen der Wissensgraphen. Durch ihre Größe lassen sich die Wissensgraphen allerdings nicht mehr manuell bereinigen. Dafür werden heutzutage häufig automatische und halbautomatische Methoden benötigt. In dieser Arbeit befassen wir uns mit dem Thema Repräsentationsheterogenität. Wir klassifizieren Heterogenität entlang verschiedener Dimensionen und erläutern Heterogenitätsprobleme in Datenbanken, Ontologien und Wissensgraphen. Weiterhin geben wir einen knappen Überblick über verschiedene Techniken zur automatischen Lösung von Heterogenitätsproblemen. Im nächsten Kapitel beschäftigen wir uns mit Entitätsheterogenität. Wir zeigen Probleme auf, die in einem Multi-Wissensgraphen-Szenario aufgrund von fehlerhaften transitiven Links entstehen. Um diese Probleme zu lösen stellen wir vier Techniken vor, mit denen sich die Qualität beliebiger Entity-Alignment-Tools deutlich verbessern lässt. Wir zeigen, dass Relationsheterogenität in Wissensgraphen zu Problemen bei der Anfragenbeantwortung führen kann. Daher entwickeln wir verschiedene Methoden um synonyme Relationen zu finden. Eine der Methoden arbeitet mit hochdimensionalen Wissensgrapheinbettungen, die andere mit einem Rule Mining Ansatz. Beide Methoden können synonyme Relationen in Wissensgraphen mit hoher Qualität erkennen. Darüber hinaus stellen wir eine neuartige Technik zur Vermeidung von Heterogenitätsproblemen vor, bei der wir eine implizite Wissensrepräsentation verwenden. Wir zeigen, dass große neuronale Sprachmodelle eine wertvolle Wissensquelle sind, die ähnlich wie Wissensgraphen angefragt werden können. Im Sprachmodell selbst werden bereits viele der Heterogenitätsprobleme aufgelöst, so dass eine Anfrage heterogener Wissensgraphen möglich wird

    Language Shift and Development: A Case Study of Zhongdian Southern Khams Language Vitality

    Get PDF
    The fields of language endangerment and maintenance address language shift overwhelmingly in the context of a local language being replaced by that of a surrounding oppositional dominant cultural group. There are, though, situations in which a local language is competing with the dominant variety of the wider cultural group to which it belongs. How a situation like this is dealt with by linguists and language planners depends largely on the recognition of participant tongues as their own languages or one as a dialect of another. Reversing language shift for a “dialect” is difficult to garnish institutional and financial support for if its “mother language” is not endangered. This paper is a case study of one such language: the variety of Tibetan local to Zhongdian, Shangri-la County of Diqing Tibetan Autonomous Prefecture in Yunnan. Through an examination of the linguistic vitality of the local Tibetan variety, I discuss factors that have contributed to the language shift of local Tibetans to the regional Chinese variety. Reviewing Zhongdian Tibetan’s localized maintenance reveals the threat posed to it by Literary Tibetan and other prestigious forms

    A unified data repository for rich communication services

    Get PDF
    Rich Communication Services (RCS) is a framework that defines a set of IP-based services for the delivery of multimedia communications to mobile network subscribers. The framework unifies a set of pre-existing communication services under a single name, and permits network operators to re-use investments in existing network infrastructure, especially the IP Multimedia Subsystem (IMS), which is a core part of a mobile network and also acts as a docking station for RCS services. RCS generates and utilises disparate subscriber data sets during execution, however, it lacks a harmonised repository for the management of such data sets, thus making it difficult to obtain a unified view of heterogeneous subscriber data. This thesis proposes the creation of a unified data repository for RCS which is based on the User Data Convergence (UDC) standard. The standard was proposed by the 3rd Generation Partnership Project (3GPP), a major telecommunications standardisation group. UDC provides an approach for consolidating subscriber data into a single logical repository without adversely affecting existing network infrastructure, such as the IMS. Thus, this thesis details the design and development of a prototypical implementation of a unified repository, named Converged Subscriber Data Repository (CSDR). It adopts a polyglot persistence model for the underlying data store and exposes heterogeneous data through the Open Data Protocol (OData), which is a candidate implementation of the Ud interface defined in the UDC architecture. With the introduction of polyglot persistence, multiple data stores can be used within the CSDR and disparate network data sources can access heterogeneous data sets using OData as a standard communications protocol. As the CSDR persistence model becomes more complex due to the inclusion of more storage technologies, polyglot persistence ensures a consistent conceptual view of these data sets through OData. Importantly, the CSDR prototype was integrated into a popular open-source implementation of the core part of an IMS network known as the Open IMS Core. The successful integration of the prototype demonstrates its ability to manage and expose a consolidated view of heterogeneous subscriber data, which are generated and used by different RCS services deployed within IMS

    Concentrated phases of colloids or nanoparticles: Solid pressure and dynamics of concentration processes

    Get PDF
    The behaviour of colloids concentrated at an interface is strongly affected by surface interactions occurring on the nanoscale between colloids and between the colloids and the surface. For instance, processes such as ultrafiltration, nanofiltration or reverse osmosis which are classically used to purify, eliminate and concentrate colloids or nanoparticles strongly depend on these interfacial phenomena. The level of fouling, its kinetics or even the way colloids build up (porosity, hydraulic resistance or accumulation reversibility) are driven by colloidal properties. It is therefore necessary to establish experimental and theoretical connections between colloidal properties at the local (micro) scale and the efficiency of the concentration process; this knowledge being compulsory for the control of numerous processes dealing with nanoparticles. In this chapter, our aim is to show how the concept of solid pressure can be a good vehicle to account for particle-particle interactions in the macroscopic description of separation processes like membrane filtration, sedimentation or drying. It will be shown how solid pressure (or to be more precise its variation with the volume fraction) can be related to the way a suspension resists an increase in concentration. We shall also try to illustrate in which way the change of state of the suspension, can be put in relation to the various forms of accumulation on a surface, both from a theoretical and an experimental point of view. The resistance of the dispersion to over-concentration will be defined by osmotic pressure variation and this resistance will be linked to a Péclet number relative to filtration conditions. The importance of a critical volume fraction inducing a phase transition between fluid (dispersed) and solid (condensed) phases will be underlined in order to describe the transition between accumulation phenomena of polarisation concentration and of dense layer formation. The consequence of this transition on the reversibility of accumulated layers will be discussed. The chapter will then investigate how the variation of osmotic pressure with volume fraction can explain the variety of accumulation phenomena occurring at an interface

    Mapping Phenomena Relevant to Adolescent Emotion Regulation: A Text-Mining Systematic Review

    Get PDF
    Adolescence is a developmentally sensitive period for emotion regulation with potentially lifelong implications for mental health and well-being. Although substantial empirical research has addressed this topic, the literature is fragmented across subdisciplines, and an overarching theoretical framework is lacking. The first step toward constructing a unifying framework is identifying relevant phenomena. This systematic review of 6305 articles used text mining to identify phenomena relevant to adolescents’ emotion regulation. First, a baseline was established of relevant phenomena discussed in theory and recent narrative reviews. Then, article keywords and abstracts were analyzed using text mining, examining term frequency as an indicator of relevance and term co-occurrence as an indicator of association. The results reflected themes commonly featured in theory and narrative reviews, such as socialization and neurocognitive development, but also identified undertheorized themes, such as developmental disorders, physical health, external stressors, structural disadvantage, substance use, identity and moral development, and sexual development. The findings illustrate how text mining systematic reviews, a novel approach, may complement narrative reviews. Future theoretical work might integrate these undertheorized themes into an overarching framework, and empirical research might consider them as promising areas for future research, or as potential confounders in research on adolescents’ emotion regulation

    A abordagem POESIA para a integração de dados e serviços na Web semantica

    Get PDF
    Orientador: Claudia Bauzer MedeirosTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: POESIA (Processes for Open-Ended Systems for lnformation Analysis), a abordagem proposta neste trabalho, visa a construção de processos complexos envolvendo integração e análise de dados de diversas fontes, particularmente em aplicações científicas. A abordagem é centrada em dois tipos de mecanismos da Web semântica: workflows científicos, para especificar e compor serviços Web; e ontologias de domínio, para viabilizar a interoperabilidade e o gerenciamento semânticos dos dados e processos. As principais contribuições desta tese são: (i) um arcabouço teórico para a descrição, localização e composição de dados e serviços na Web, com regras para verificar a consistência semântica de composições desses recursos; (ii) métodos baseados em ontologias de domínio para auxiliar a integração de dados e estimar a proveniência de dados em processos cooperativos na Web; (iii) implementação e validação parcial das propostas, em urna aplicação real no domínio de planejamento agrícola, analisando os benefícios e as limitações de eficiência e escalabilidade da tecnologia atual da Web semântica, face a grandes volumes de dadosAbstract: POESIA (Processes for Open-Ended Systems for Information Analysis), the approach proposed in this work, supports the construction of complex processes that involve the integration and analysis of data from several sources, particularly in scientific applications. This approach is centered in two types of semantic Web mechanisms: scientific workflows, to specify and compose Web services; and domain ontologies, to enable semantic interoperability and management of data and processes. The main contributions of this thesis are: (i) a theoretical framework to describe, discover and compose data and services on the Web, inc1uding mIes to check the semantic consistency of resource compositions; (ii) ontology-based methods to help data integration and estimate data provenance in cooperative processes on the Web; (iii) partial implementation and validation of the proposal, in a real application for the domain of agricultural planning, analyzing the benefits and scalability problems of the current semantic Web technology, when faced with large volumes of dataDoutoradoCiência da ComputaçãoDoutor em Ciência da Computaçã

    SWKM 2008: Social Web and Knowledge Management, Proceedings:CEUR Workshop Proceedings

    Get PDF
    corecore