21 research outputs found

    Analysis of cross platform power data governance

    Get PDF
    With the rapid development of the smart grid, electric power enterprises face a cluster of data governance problems: weak cooperation and information sharing across professional business lines, long data-entry times, inaccurate data, weak real-time performance, cumbersome data extraction, redundant storage, low data quality, and insufficient privacy protection. Comprehensive data management and mining the value of data resources have therefore become important tasks in the development of electric power enterprises. Traditional methods use edge computing for data transmission and task allocation. Building on this, we study a cross-platform power data governance scheme based on edge offloading and deep reinforcement learning. The final experimental results show that the scheme achieves lower delay and lower energy consumption.
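    The abstract does not spell out the offloading model, but the core idea of combining edge offloading with reinforcement learning can be sketched as an agent that decides, per task, whether to execute locally or offload to an edge server so as to minimize a weighted delay-plus-energy cost. The sketch below is a minimal illustration under invented cost functions; it uses tabular Q-learning as a stand-in for the deep reinforcement learning a real scheme would employ, and all state features, weights, and constants are assumptions rather than the paper's design.

        import random

        ACTIONS = [0, 1]  # 0 = execute the task locally, 1 = offload to the edge server

        def task_cost(size_mb, action, delay_weight=0.5):
            # Invented cost model: local execution is slow but transmits nothing;
            # offloading adds transmission cost but computes faster.
            if action == 0:
                delay, energy = 0.8 * size_mb, 0.2 * size_mb
            else:
                delay, energy = 0.2 * size_mb + 0.1, 0.5 * size_mb
            return delay_weight * delay + (1 - delay_weight) * energy

        def bucket(size_mb):
            # Discretize task size so a Q-table suffices for this toy example.
            return min(int(size_mb), 9)

        q = {}
        for episode in range(5000):
            size = random.uniform(0.0, 10.0)
            s = bucket(size)
            # Epsilon-greedy: mostly pick the action with the lowest learned cost.
            if random.random() < 0.1:
                a = random.choice(ACTIONS)
            else:
                a = min(ACTIONS, key=lambda act: q.get((s, act), 0.0))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + 0.1 * (task_cost(size, a) - old)  # one-step cost update

        policy = {s: min(ACTIONS, key=lambda a: q.get((s, a), 0.0)) for s in range(10)}
        print(policy)  # small tasks tend to stay local, larger ones get offloaded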

    Robust Group Linkage

    Full text link
    We study the problem of group linkage: linking records that refer to entities in the same group. Applications for group linkage include finding businesses in the same chain, finding conference attendees from the same affiliation, finding players from the same team, etc. Group linkage faces challenges not present for traditional record linkage. First, although different members in the same group can share some similar global values of an attribute, they represent different entities and so can also have distinct local values for the same or different attributes, requiring a high tolerance for value diversity. Second, groups can be huge (with tens of thousands of records), requiring high scalability even after using good blocking strategies. We present a two-stage algorithm: the first stage identifies cores containing records that are very likely to belong to the same group, while being robust to possible erroneous values; the second stage collects strong evidence from the cores and leverages it for merging more records into the same group, while being tolerant to differences in local values of an attribute. Experimental results show the high effectiveness and efficiency of our algorithm on various real-world data sets.
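    The two-stage algorithm lends itself to a compact sketch: stage one links only record pairs with very high similarity and takes the resulting connected components as cores; stage two treats a representative value from each core as strong evidence and absorbs remaining records that are reasonably close to it. The similarity function and both thresholds below are illustrative placeholders, not the paper's actual scoring model.

        from difflib import SequenceMatcher

        def sim(a, b):
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        def group_linkage(records, core_threshold=0.9, merge_threshold=0.6):
            # Stage 1: build cores from only very confident matches, so a few
            # erroneous values cannot glue unrelated records together.
            parent = list(range(len(records)))
            def find(i):
                while parent[i] != i:
                    parent[i] = parent[parent[i]]
                    i = parent[i]
                return i
            for i in range(len(records)):
                for j in range(i + 1, len(records)):
                    if sim(records[i]["name"], records[j]["name"]) >= core_threshold:
                        parent[find(i)] = find(j)
            cores = {}
            for i in range(len(records)):
                cores.setdefault(find(i), []).append(i)

            # Stage 2: use a representative record from each core as evidence and
            # merge leftover records that are reasonably close to it, tolerating
            # diversity in their other, local attribute values.
            groups = [m for m in cores.values() if len(m) > 1]
            leftovers = [m[0] for m in cores.values() if len(m) == 1]
            for i in leftovers:
                best = max(groups, default=None,
                           key=lambda g: sim(records[i]["name"], records[g[0]]["name"]))
                if best and sim(records[i]["name"], records[best[0]]["name"]) >= merge_threshold:
                    best.append(i)
                else:
                    groups.append([i])
            return groups

        shops = [{"name": "Joe's Pizza #12"}, {"name": "Joe's Pizza #14"},
                 {"name": "Joes Pizza"}, {"name": "Mario's Deli"}]
        print(group_linkage(shops))  # chain records grouped, the deli left alone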

    Spatio-Temporal Linkage over Location-Enhanced Services

    Full text link

    Advanced Methods for Entity Linking in the Life Sciences

    Get PDF
    The amount of available knowledge grows rapidly with the increasing number of data sources. However, the autonomy of these sources and the resulting heterogeneity prevent comprehensive data analysis and applications. Data integration aims to overcome this heterogeneity by unifying different data sources and enriching unstructured data. Enrichment comprises several subtasks, among them the annotation process, which links document phrases to terms of a standardized vocabulary. Annotated documents enable effective retrieval methods, comparability of different documents, and comprehensive data analysis, such as finding adverse drug effects based on patient data. A vocabulary enables such comparability through standardized terms; an ontology can also serve as a vocabulary, being additionally defined by concepts, relationships, and logical constraints.
    The annotation process is applicable in different domains, but generic and specialized domains pose different challenges. This thesis emphasizes these differences and addresses the identified challenges. Whereas the majority of annotation approaches are evaluated on general domains such as Wikipedia, this thesis evaluates the developed approaches on case report forms, the medical documents used to examine clinical trials. Natural language presents challenges of its own, such as expressing similar meanings with different phrases; the proposed annotation method, AnnoMap, accounts for this fuzziness. A further challenge is the reuse of verified annotations: existing annotations represent knowledge that can be reused in later annotation processes, and AnnoMap includes a reuse strategy that utilizes verified annotations to link new documents to appropriate concepts. Because the biomedical domain spans a broad spectrum of areas, different annotation tools exist and perform differently depending on the particular domain. This thesis therefore proposes a combination approach that unifies results from different tools, using existing tool results to build a classification model that labels new annotations as correct or incorrect. The results show that the reuse strategy and the machine-learning-based combination improve annotation quality compared to existing approaches focusing on the biomedical domain.
    A further part of data integration is entity resolution, which builds unified knowledge bases from different data sources. A data source consists of a set of records characterized by attributes, and the goal of entity resolution is to identify records representing the same real-world entity. Many methods link data sources whose records are characterized by attributes, but only a few can handle graph-structured knowledge bases or consider temporal aspects. Temporal aspects are essential for identifying the same entities over different time intervals, since entities change over time under certain conditions. Moreover, records can be related to other records, so that a small graph structure exists around each record; these small graphs can be linked to each other if they represent the same entity. This thesis proposes an entity resolution approach for census data consisting of person records over different time intervals; the approach also considers the graph structure of persons given by family relationships.
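    The abstract does not detail AnnoMap's scoring, but the basic idea of fuzzy annotation can be illustrated as linking a document phrase to the vocabulary concept with the highest string similarity, provided it clears a threshold. The mini-vocabulary, the concept identifiers, the similarity measure, and the threshold below are all illustrative assumptions.

        from difflib import SequenceMatcher

        # Illustrative mini-vocabulary with made-up concept IDs; a real setting
        # would use a standardized biomedical vocabulary or ontology.
        vocabulary = {
            "C001": "nausea",
            "C002": "diarrhea",
            "C003": "headache",
        }

        def annotate(phrase, threshold=0.8):
            """Link a document phrase to the best-matching vocabulary concept."""
            scored = [(cid, term, SequenceMatcher(None, phrase.lower(), term).ratio())
                      for cid, term in vocabulary.items()]
            cid, term, score = max(scored, key=lambda t: t[2])
            return (cid, term, score) if score >= threshold else None

        # Fuzzy matching tolerates spelling variants of the same concept
        # while rejecting phrases that only superficially resemble a term.
        for phrase in ["Nausea", "diarrhoea", "stomach ache"]:
            print(phrase, "->", annotate(phrase))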
    To achieve high-quality results, current methods apply machine learning techniques to classify record pairs as matches or non-matches. The classification relies on a model trained on labeled data, i.e., record pairs labeled as duplicates or not. However, generating training data is time-consuming, so active learning techniques are relevant for reducing the number of training examples. The entity resolution method for temporal graph-structured data shows an improvement over previous collective entity resolution approaches, and the developed active learning approach achieves results comparable to supervised learning methods while outperforming other limited-budget active learning methods. Besides the entity resolution approach, the thesis introduces the concept of evolution operators for communities. These operators express the dynamics of communities and individuals; for instance, we can formulate that two communities merged or split over time, and the operators allow observing the history of individuals. Overall, the presented annotation approaches generate high-quality annotations for medical forms, enabling comprehensive analysis across different data sources as well as accurate queries. The proposed entity resolution approaches improve on existing ones, contributing to the generation of high-quality knowledge graphs and to data analysis tasks.
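    The limited-budget active learning idea can be sketched with pool-based uncertainty sampling: train a pair classifier on a small labeled seed, then repeatedly ask the oracle to label the record pair the model is least certain about. The synthetic pair features, the use of scikit-learn's LogisticRegression, and the budget of 20 queries are all assumptions for illustration, not the thesis's actual method.

        import random
        from sklearn.linear_model import LogisticRegression

        random.seed(0)

        # Synthetic record-pair feature vectors (e.g., name similarity and
        # normalized birth-year distance) with hidden ground-truth labels
        # standing in for a human oracle.
        def make_pair(is_match):
            name_sim = random.uniform(0.7, 1.0) if is_match else random.uniform(0.0, 0.6)
            year_diff = random.uniform(0.0, 0.1) if is_match else random.uniform(0.2, 1.0)
            return [name_sim, year_diff], int(is_match)

        pool = [make_pair(random.random() < 0.3) for _ in range(500)]
        # Tiny labeled seed containing both classes.
        labeled = [make_pair(True) for _ in range(5)] + [make_pair(False) for _ in range(5)]

        model = LogisticRegression()
        for _ in range(20):  # limited labeling budget
            model.fit([f for f, _ in labeled], [l for _, l in labeled])
            probs = model.predict_proba([f for f, _ in pool])[:, 1]
            # Query the pair the model is least certain about (probability near 0.5).
            idx = min(range(len(pool)), key=lambda i: abs(probs[i] - 0.5))
            labeled.append(pool.pop(idx))  # oracle reveals the true label

        acc = model.score([f for f, _ in pool], [l for _, l in pool])
        print(f"accuracy on remaining pool after 30 labels: {acc:.2f}")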

    Temporal and Contextual Dependencies in Relational Data Modeling

    Get PDF
    Although a solid theoretical foundation of relational data modeling has existed for decades, a critical reassessment from the perspective of temporal requirements reveals shortcomings in its integrity constraints. We motivate this work by discussing how existing relational databases fail to ensure the correctness of time-sensitive data. The analysis is particularly relevant today, when the inadequacy of relational databases to cater to all requirements has led to new forms of database systems such as temporal databases, active databases, real-time databases, and NoSQL (non-relational) databases. In relational databases, temporal requirements have been handled either at the application level using scripts or through manual assistance, but no attempts have been made to address them at the design level. These are requirements whose metadata must change as time progresses, which Relational Database Management Systems (RDBMSs) do not support to date. Starting from the shortcomings of data, entity, and referential integrity in relational data modeling, we propose a new form of integrity that works at a finer level of granularity. We also present several important concepts, including temporal dependency, contextual dependency, and cell-level integrity. We then introduce cellular constraints to implement the proposed integrity and dependencies, and show how they can be incorporated into the relational data model to enable RDBMSs to handle temporal requirements in the future. Overall, we provide a formal description of the temporal requirements problem in the relational data model and design a framework for solving it. We support our proposition with examples, experiments, and results.
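    The abstract describes cellular constraints only informally, so the sketch below merely illustrates the underlying idea of cell-level, time-dependent integrity: a validity rule attached to a single cell whose acceptable values change as time progresses. All names and the retirement rule are invented for illustration and are not the paper's formalism.

        from datetime import date

        # A cellular constraint: a rule attached to one cell (row, column) whose
        # validity depends on the current date, illustrating integrity that a
        # conventional design-level CHECK constraint cannot express.
        class CellularConstraint:
            def __init__(self, column, rule):
                self.column = column
                self.rule = rule  # rule(value, today) -> bool

            def check(self, row, today):
                return self.rule(row[self.column], today)

        # Hypothetical example: an employee's "status" cell must read "retired"
        # once the retirement date stored in the same row has passed.
        def retirement_rule(row):
            def rule(value, today):
                return value == "retired" if today >= row["retires_on"] else True
            return rule

        row = {"name": "A. Smith", "status": "active", "retires_on": date(2020, 1, 1)}
        constraint = CellularConstraint("status", retirement_rule(row))
        print(constraint.check(row, date(2019, 6, 1)))  # True: cell still valid
        print(constraint.check(row, date(2021, 6, 1)))  # False: constraint violated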

    Multi-Source Spatial Entity Extraction and Linkage

    Get PDF