3,669 research outputs found

    A knowledge graph embeddings based approach for author name disambiguation using literals

    Get PDF
Scholarly data is growing continuously, containing information about articles from a plethora of venues, including conferences and journals. Many initiatives have been taken to make scholarly data available in the form of Knowledge Graphs (KGs). These efforts to standardize the data and make it accessible have also introduced challenges, such as the exploration of scholarly articles and ambiguous author names. This study targets the problem of Author Name Disambiguation (AND) on scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which utilizes Knowledge Graph Embeddings (KGEs) built from the multimodal literal information contained in these KGs. The framework is based on three components: (1) multimodal KGEs, (2) a blocking procedure, and (3) hierarchical agglomerative clustering. Extensive experiments were conducted on two newly created KGs: (i) a KG containing information from the Scientometrics journal from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known AND benchmark provided by AMiner (AMiner-534K). The results show that the proposed architecture outperforms the baselines by 8–14% in F1 score and performs competitively on a challenging benchmark such as AMiner. The code and the datasets are publicly available through GitHub (https://github.com/sntcristian/and-kge) and Zenodo (https://doi.org/10.5281/zenodo.6309855), respectively.
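
    The abstract outlines a three-stage pipeline. As a rough illustration, the following minimal sketch shows how the second and third stages (blocking and hierarchical agglomerative clustering) might be combined over pre-trained author embeddings; the blocking key, cosine distance, and 0.3 threshold are illustrative assumptions, not the settings used by LAND.

    ```python
    # Minimal sketch of blocking + hierarchical agglomerative clustering over
    # author embeddings (stages 2 and 3 of a LAND-style pipeline). The blocking
    # key and distance threshold below are illustrative assumptions.
    from collections import defaultdict

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    def block_key(name: str) -> str:
        """Group candidate authors by first initial + surname (an assumed key)."""
        parts = name.lower().split()
        return (parts[0][0] + " " + parts[-1]) if parts else ""

    def disambiguate(names: list[str], embeddings: np.ndarray, threshold: float = 0.3):
        """Cluster mentions within each block; returns mention index -> cluster id."""
        blocks = defaultdict(list)
        for idx, name in enumerate(names):
            blocks[block_key(name)].append(idx)

        assignment, next_id = {}, 0
        for members in blocks.values():
            if len(members) == 1:           # singleton blocks need no clustering
                assignment[members[0]] = next_id
                next_id += 1
                continue
            labels = AgglomerativeClustering(
                n_clusters=None,            # let the distance threshold decide
                distance_threshold=threshold,
                metric="cosine",
                linkage="average",
            ).fit_predict(embeddings[members])
            for idx, label in zip(members, labels):
                assignment[idx] = next_id + label
            next_id += labels.max() + 1
        return assignment
    ```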

    Individual Tariffs for Mobile Services: Theoretical Framework and a Computational Case in Mobile Music

    Get PDF
This paper introduces individual tariffs at the service and content-bundle level in mobile communications. It provides a theoretical framework (economic and sociological) as well as a computational game solution method. The user can be an individual or a community, and individual tariffs are decided through interactions between the user and the supplier. A numerical example from mobile music illustrates the concepts. Keywords: risks; mobile communication services; individual tariffs; computational games

    Report of the Stanford Linked Data Workshop

    No full text
The Stanford University Libraries and Academic Information Resources (SULAIR), with the Council on Library and Information Resources (CLIR), conducted a week-long workshop on the prospects for a large-scale, multi-national, multi-institutional prototype of a Linked Data environment for discovery of, and navigation among, the rapidly and chaotically expanding array of academic information resources. As preparation for the workshop, CLIR sponsored a survey by Jerry Persons, Chief Information Architect emeritus of SULAIR, that was published originally for workshop participants as background material and is now publicly available. The original intention of the workshop was to devise a plan for such a prototype. However, such was the diversity of knowledge, experience, and views of the potential of Linked Data approaches that the participants turned to two more fundamental goals: building common understanding and enthusiasm on the one hand, and identifying the opportunities and challenges to be confronted in the preparation and operation of the intended prototype on the other. In pursuit of those objectives, the workshop participants produced:
    1. a value statement addressing the question of why a Linked Data approach is worth prototyping;
    2. a manifesto for Linked Libraries (and Museums and Archives and …);
    3. an outline of the phases in a life cycle of Linked Data approaches;
    4. a prioritized list of known issues in generating, harvesting, and using Linked Data;
    5. a workflow with notes for converting library bibliographic records and other academic metadata to URIs (see the sketch after this list);
    6. examples of potential "killer apps" using Linked Data; and
    7. a list of next steps and potential projects.
    This report includes a summary of the workshop agenda, a chart showing the use of Linked Data in cultural heritage venues, and short biographies and statements from each of the participants.
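
    Item 5 above refers to a workflow for converting bibliographic records to URIs. The following is a minimal sketch of that idea using the rdflib library; the namespace, record fields, and property choices are assumptions for illustration, not the workshop's actual workflow.

    ```python
    # Minimal sketch of minting URIs for bibliographic records and emitting RDF,
    # in the spirit of item 5 above. The namespace, record fields, and property
    # choices are illustrative assumptions, not the workshop's actual workflow.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    EX = Namespace("http://example.org/catalog/")  # assumed institutional namespace

    def record_to_triples(graph: Graph, record: dict) -> URIRef:
        """Mint a URI from the record's local identifier and attach basic metadata."""
        uri = EX[record["local_id"]]
        graph.add((uri, RDF.type, DCTERMS.BibliographicResource))
        graph.add((uri, DCTERMS.title, Literal(record["title"])))
        for creator in record.get("creators", []):
            graph.add((uri, DCTERMS.creator, Literal(creator)))
        return uri

    g = Graph()
    record_to_triples(g, {"local_id": "rec0001",
                          "title": "Report of the Stanford Linked Data Workshop",
                          "creators": ["Stanford University Libraries"]})
    print(g.serialize(format="turtle"))
    ```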

    Intelligent Information Access to Linked Data - Weaving the Cultural Heritage Web

    Get PDF
The subject of the dissertation is an information alignment experiment between two cultural heritage information systems (ALAP): the Perseus Digital Library and Arachne. In modern societies, information integration is gaining importance for many tasks, such as business decision-making or even catastrophe management. It is beyond doubt that information available in digital form can offer users new ways of interaction. Also, in the humanities and cultural heritage communities, more and more information is being published online. But in many situations the way that information has been made publicly available is disruptive to the research process due to its heterogeneity and distribution. Therefore, integrated information will be a key factor in pursuing successful research, and the need for information alignment is widely recognized. ALAP is an attempt to integrate information from Perseus and Arachne, not only at the schema level but also by performing entity resolution. To that end, technical peculiarities and philosophical implications of the concepts of identity and co-reference are discussed. Multiple approaches to information integration and entity resolution are discussed and evaluated. The methodology used to implement ALAP is mainly rooted in the fields of information retrieval and knowledge discovery. First, an exploratory analysis was performed on both information systems to get a first impression of the data. After that, (semi-)structured information from both systems was extracted and normalized. Then, a clustering algorithm was used to reduce the number of entity comparisons needed (illustrated below). Finally, a thorough matching was performed within the different clusters. ALAP helped to identify the challenges and highlighted the opportunities that arise when attempting to align cultural heritage information systems.
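
    To make the comparison-reduction step concrete, the following back-of-the-envelope sketch compares the number of pairwise comparisons with and without clustering; the record and cluster counts are invented for illustration.

    ```python
    # Back-of-the-envelope sketch of why clustering reduces entity comparisons:
    # all-pairs matching is quadratic in the corpus size, while comparing only
    # within clusters is quadratic in the (much smaller) cluster sizes.
    # The record and cluster counts below are invented for illustration.

    def pair_count(n: int) -> int:
        """Number of unordered comparisons among n records: n choose 2."""
        return n * (n - 1) // 2

    records = 100_000
    clusters = [5_000] * 20            # assumed: 20 clusters of 5,000 records each

    naive = pair_count(records)                     # 4,999,950,000 comparisons
    blocked = sum(pair_count(c) for c in clusters)  # 249,950,000 comparisons
    print(f"all-pairs: {naive:,}; within-cluster: {blocked:,} "
          f"({blocked / naive:.1%} of the naive workload)")
    ```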

Cross-Disciplinary Collaborations in Data Quality Research

    Get PDF
Data quality has been the target of research and development for over four decades and, due to its cross-disciplinary nature, has been approached by business analysts, solution architects, database experts, and statisticians, to name a few. As data quality increases in importance and complexity, there is a need to exploit synergies across diverse research communities in order to form holistic solutions that span its organizational, architectural, and computational aspects. As a first step towards bridging gaps between the various research communities, we undertook a comprehensive literature study of data quality research published in the last two decades, considering a broad range of Information Systems (IS) and Computer Science (CS) publication outlets. The main aims of the study were to understand the current landscape of data quality research, create better awareness of the (lack of) synergies between the various research communities, and, subsequently, direct attention towards holistic solutions. In this paper, we present a summary of the findings, outlining the overlaps and distinctions between the two communities from various points of view, including publication outlets, topics and themes of research, highly cited or influential contributors, and the strength and nature of co-authorship networks.

    Learning Algorithm to Automate Fast Author Name Disambiguation

    Get PDF
The worldwide scientific production represents a massive number of records that can be accessed via numerous databases. Because of the presence of ambiguous records, a time-efficient disambiguation process is required as an essential step in extracting correct information and generating publication statistics. However, the disambiguation task is exhaustive and complex due to the large volume of the databases and missing data. Currently, there is no complete automatic method able to produce satisfactory results for the disambiguation process. Previously, an efficient entity disambiguation application was developed: a supervised cascade algorithm that gives promising results on large bibliographic databases. Although the existing work produces high-quality results within a reasonable processing time, it lacks an efficient choice of metrics, and the structure of the classifiers is determined heuristically by the analysis of precision and recall errors. Clearly, an automated approach that makes the application flexible and adjustable would directly enhance its usability. Such an approach would help to understand the importance of each feature classification in the disambiguation process and to select the most efficient ones. In this research, we propose a learning algorithm for automating the disambiguation process of this application; the aim is to employ the most appropriate phonetic algorithm and similarity measures, and to replace the heuristic approach with a desirable automatic one. To achieve these goals, we conduct three major steps. First, we address the problem of evaluating phonetic encoding algorithms that can be used in blocking: six commonly used phonetic encoding algorithms were selected, and specific quantitative evaluation metrics were developed in order to assess their limitations and advantages and to select the best one. Second, we test different string similarity measures and analyze the advantages and disadvantages of each technique; in other words, our second goal is to build an efficient disambiguation method by comparing several edit- and token-based algorithms to improve the blocking method. Finally, using bootstrap aggregating (bagging) and AdaBoost, an algorithm was developed that employs particle swarm and set cover optimization techniques to design a learning framework that enables automatic ordering of the weak classifiers and determination of their thresholds. Performance comparisons were carried out on real data extracted from the Web of Science (WoS) and SCOPUS bibliographic databases. In summary, this work allows us to draw conclusions about the strengths and weaknesses of each phonetic algorithm and similarity measure from the perspective of our application. We show that the NYSIIS phonetic algorithm is the better choice for the blocking step of the disambiguation application. In addition, the Weighting Table-based algorithm outperforms some of the commonly used similarity algorithms in terms of time efficiency while producing satisfactory results. Moreover, we propose a learning method to determine the structure of the disambiguation algorithm automatically.
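
    As a rough illustration of the kind of phonetic blocking the thesis evaluates, the sketch below groups author names by the NYSIIS code of the surname and compares names only within a block using Jaro-Winkler similarity. It assumes the third-party jellyfish library; the name parsing and the 0.85 threshold are illustrative choices, not the thesis's tuned configuration.

    ```python
    # Minimal sketch of phonetic blocking with NYSIIS followed by a string
    # similarity check, assuming the third-party jellyfish library. The name
    # parsing, similarity choice (Jaro-Winkler), and 0.85 threshold are
    # illustrative assumptions, not the thesis's tuned configuration.
    from collections import defaultdict
    from itertools import combinations

    import jellyfish  # pip install jellyfish

    def candidate_pairs(author_names: list[str]):
        """Yield name pairs that share a NYSIIS code and look highly similar."""
        blocks = defaultdict(list)
        for name in author_names:
            surname = name.split()[-1]
            blocks[jellyfish.nysiis(surname)].append(name)
        for names in blocks.values():
            for a, b in combinations(names, 2):   # compare only within a block
                if jellyfish.jaro_winkler_similarity(a, b) >= 0.85:
                    yield a, b

    names = ["Catherine Smith", "Katherine Smith", "C. Smith", "John Doe"]
    for pair in candidate_pairs(names):
        print(pair)
    ```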
