
    Documenting Knowledge Graph Embedding and Link Prediction using Knowledge Graphs

    In recent years, sub-symbolic learning, i.e., Knowledge Graph Embedding (KGE) applied to Knowledge Graphs (KGs), has gained significant attention in various downstream tasks (e.g., Link Prediction (LP)). These techniques learn a latent vector representation of a KG's semantic structure to infer missing links. Nonetheless, KGE models remain black boxes, and the decision-making process behind them is unclear, so the trustworthiness and reliability of their outcomes have been challenged. While many state-of-the-art approaches provide data-driven frameworks to address these issues, they do not always provide a complete understanding, and their interpretations are not machine-readable. For this reason, in this work we extend a hybrid interpretable framework, InterpretME, to KGE models, in particular translation-distance models, including TransE, TransH, TransR, and TransD. Experimental evaluation on various benchmark KGs supports the validity of this approach, which we term Trace KGE. Trace KGE, in particular, contributes to increased interpretability and understanding of otherwise opaque KGE model behavior.
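
    As a minimal illustration of the translation-distance idea behind the models named above (not the paper's Trace KGE implementation), the sketch below scores a candidate triple with the TransE criterion, where a relation acts as a translation between entity vectors; the embeddings are random placeholders rather than trained parameters.

```python
import numpy as np

# Minimal sketch of TransE-style scoring with toy vectors (not trained embeddings).
# TransE treats a relation r as a translation in vector space: h + r ≈ t holds
# approximately for plausible triples, so a smaller distance means a better triple.

rng = np.random.default_rng(0)
dim = 50

# Hypothetical entity and relation embeddings, drawn at random for illustration.
entities = {"Berlin": rng.normal(size=dim), "Germany": rng.normal(size=dim)}
relations = {"capital_of": rng.normal(size=dim)}

def transe_score(head, relation, tail, norm=1):
    """Return the negative L1/L2 distance; higher means the triple is more plausible."""
    h, r, t = entities[head], relations[relation], entities[tail]
    return -np.linalg.norm(h + r - t, ord=norm)

print(transe_score("Berlin", "capital_of", "Germany"))
```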

    Science, the 21st Century (Наука, 21-й век)

    Reason is a very short word, but it is the most perfect and admirable thing, a fragment of the soul of the universe, or, as it is more pious to say for those who study philosophy according to Moses, a very exact copy of the divine image. Philo of Alexandria, On the Change of Names. Original Greek: λογισμὸς δὲ βραχὺ μὲν ὄνομα, τελειότατον δὲ καὶ θειότατον ἔργον, τῆς τοῦ παντὸς ψυχῆς ἀπόσπασμα ἤ, ὅπερ ὁσιώτερον εἰπεῖν τοῖς κατὰ Μωυσῆν φιλοσοφοῦσιν, εἰκόνος θείας ἐκμαγεῖον ἐμφερές. Φίλων ο Αλεξανδρεύς. Περί των μετονομαζομένων και ων ένεκα μετονομάζονται

    Metadata as a Methodological Commons: From Aboutness Description to Cognitive Modeling

    Metadata is data about data, generated mainly to organize and describe resources and to facilitate finding, identifying, selecting, and obtaining information. With the advancement of technologies, the acquisition of metadata has gradually become a critical step in data modeling and functional operation, which has led to the formation of a methodological commons. A series of general operations has been developed to achieve structured description, semantic encoding, and machine-understandable information, including entity definition, relation description, object analysis, attribute extraction, ontology modeling, data cleaning, disambiguation, alignment, mapping, relating, enriching, importing, exporting, service implementation, registry and discovery, monitoring, etc. These operations are not only necessary elements of semantic technologies (including linked data) and knowledge graph technology, but have also developed into common operations and a primary strategy for building independent, knowledge-based information systems. In this paper, this series of metadata-related methods is collectively referred to as the 'metadata methodological commons', whose best practices are reflected in the various standard specifications of the Semantic Web. In the future construction of a multi-modal metaverse based on Web 3.0, it shall play an important role, for example, in building digital twins through knowledge models or in supporting the modeling of the entire virtual world. Manual description and coding obviously cannot adapt to UGC (User Generated Content)- and AIGC (AI Generated Content)-based content production in the metaverse era; the automatic processing of semantic formalization must be considered the surest way to adapt the metadata methodological commons to the future needs of the AI era.
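
    To make the idea of "aboutness description" concrete, here is a small hypothetical sketch using the rdflib library and Dublin Core terms to encode a resource description as machine-readable triples; the namespace, resource identifier, and literal values are invented for illustration.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS

# Toy "aboutness" description of a single resource using Dublin Core terms.
# The namespace, identifier, and values below are invented for illustration.
EX = Namespace("http://example.org/resource/")

g = Graph()
doc = EX["dataset-001"]
g.add((doc, DCTERMS.title, Literal("Survey of Metadata Practices")))
g.add((doc, DCTERMS.creator, Literal("Example Author")))
g.add((doc, DCTERMS.subject, Literal("metadata; knowledge graphs")))
g.add((doc, DCTERMS.language, Literal("en")))

# Serialize the graph as Turtle: a structured, machine-readable description.
print(g.serialize(format="turtle"))
```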

    Development of an Event Management Web Application For Students: A Focus on Back-end

    Managing schedules can be challenging for students, with different calendars on various platforms leading to confusion and missed events. To address this problem, this thesis presents the development of an event management website designed to help students stay organized and motivated. With a focus on the application's back-end, this thesis explores the technology stack used to build the website and the implementation details of each chosen technology. By providing a detailed case study of the website development process, this thesis serves as a helpful resource for future developers looking to build their own web applications.

    Hybrid human-AI driven open personalized education

    Attaining the skills that match labor market demand is getting increasingly complicated, as prerequisite knowledge, skills, and abilities evolve dynamically through an uncontrollable and seemingly unpredictable process. Furthermore, people's interest in gaining knowledge pertaining to their personal lives (e.g., hobbies and life-hacks) has also increased dramatically in recent decades. In this situation, anticipating and addressing learning needs is a fundamental challenge for twenty-first-century education. The need for such technologies has escalated due to the COVID-19 pandemic, during which online education became a key player in all types of training programs. The burgeoning availability of data, not only on the demand side but also on the supply side (in the form of open/free educational resources), coupled with smart technologies, may provide fertile ground for addressing this challenge. Therefore, this thesis aims to contribute to the literature on the utilization of (open and free online) educational resources toward goal-driven personalized informal learning by developing a novel human-AI based system called eDoer. In this thesis, we discuss all the new knowledge that was created in order to complete the system development, which includes 1) prototype development and qualitative user validation, 2) decomposing the preliminary requirements into meaningful components, 3) implementation and validation of each component, and 4) a final requirement analysis followed by combining the implemented components in order to develop and validate the planned system (eDoer). All in all, our proposed system 1) derives the skill requirements for a wide range of occupations (as skills and jobs are typical goals in informal learning) through an analysis of online job vacancy announcements, 2) decomposes skills into learning topics, 3) collects a variety of open/free online educational resources that address those topics, 4) checks the quality of those resources and their topic relevance using our intelligent prediction models, 5) helps learners set their learning goals, 6) recommends personalized learning pathways and learning content based on individual learning goals, and 7) provides assessment services for learners to monitor their progress towards their desired learning objectives. Accordingly, we created a learning dashboard focusing on three Data Science-related jobs and conducted an initial validation of eDoer through a randomized experiment. Controlling for the effects of prior knowledge as assessed by the pretest, the randomized experiment provided tentative support for the hypothesis that learners who engaged with personalized eDoer recommendations attain higher scores on the posttest than those who did not. The hypothesis that learners who received content personalized in terms of format, length, level of detail, and content type would achieve higher scores than those receiving non-personalized content was not supported by a statistically significant result.
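
    The goal-to-topics-to-resources flow described above can be illustrated with a deliberately simplified sketch; the job-to-topic mapping, resource list, quality scores, and ranking rule below are invented placeholders, not eDoer's actual models or data.

```python
# Toy sketch of the goal -> topics -> resources pipeline (placeholder data only).

job_skill_topics = {
    "data analyst": ["statistics", "sql", "data visualization"],
}

resources = [
    {"title": "Intro to SQL", "topics": {"sql"}, "quality": 0.9},
    {"title": "Stats 101", "topics": {"statistics"}, "quality": 0.8},
    {"title": "Plotting basics", "topics": {"data visualization"}, "quality": 0.7},
    {"title": "Cooking course", "topics": {"cooking"}, "quality": 0.95},
]

def recommend(goal, known_topics, resources, top_k=3):
    """Rank resources that cover topics the learner still needs for the goal."""
    needed = set(job_skill_topics[goal]) - set(known_topics)
    scored = [
        (len(r["topics"] & needed) * r["quality"], r["title"])
        for r in resources
        if r["topics"] & needed  # keep only resources relevant to the gap
    ]
    return [title for score, title in sorted(scored, reverse=True)[:top_k]]

print(recommend("data analyst", known_topics=["statistics"], resources=resources))
```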

    The SourceData-NLP dataset: integrating curation into scientific publishing for training large language models

    Introduction: The scientific publishing landscape is expanding rapidly, creating challenges for researchers trying to stay up to date with the evolving literature. Natural Language Processing (NLP) has emerged as a potent approach to automating knowledge extraction from this vast body of publications and preprints. Tasks such as Named-Entity Recognition (NER) and Named-Entity Linking (NEL), in conjunction with context-dependent semantic interpretation, offer promising and complementary approaches to extracting structured information and revealing key concepts. Results: We present the SourceData-NLP dataset, produced through the routine curation of papers during the publication process. A unique feature of this dataset is its emphasis on the annotation of bioentities in figure legends. We annotate eight classes of biomedical entities (small molecules, gene products, subcellular components, cell lines, cell types, tissues, organisms, and diseases), their role in the experimental design, and the nature of the experimental method as an additional class. SourceData-NLP contains more than 620,000 annotated biomedical entities, curated from 18,689 figures in 3,223 papers in molecular and cell biology. We illustrate the dataset's usefulness by assessing BioLinkBERT and PubMedBERT, two transformer-based models fine-tuned on the SourceData-NLP dataset for NER. We also introduce a novel context-dependent semantic task that infers whether an entity is the target of a controlled intervention or the object of measurement. Conclusions: SourceData-NLP's scale highlights the value of integrating curation into publishing. Models trained with SourceData-NLP will furthermore enable the development of tools able to extract causal hypotheses from the literature and assemble them into knowledge graphs.
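
    As a hypothetical sketch of how a model fine-tuned on SourceData-NLP might be applied to a figure legend, the snippet below uses the Hugging Face transformers token-classification pipeline; the model identifier is a placeholder to be replaced with an actual SourceData-NLP checkpoint (e.g., a fine-tuned BioLinkBERT or PubMedBERT).

```python
from transformers import pipeline

# Sketch of tagging bioentities in a figure legend with a token-classification model.
# "your-org/sourcedata-ner" is a placeholder model id: substitute a checkpoint that
# was actually fine-tuned on SourceData-NLP for biomedical NER.
ner = pipeline(
    "token-classification",
    model="your-org/sourcedata-ner",
    aggregation_strategy="simple",  # merge word pieces into whole entity spans
)

legend = "HeLa cells were treated with rapamycin and stained for LC3."
for entity in ner(legend):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```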

    Galaxy training: A powerful framework for teaching!

    There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap for many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open-access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis, utilizing the user-friendly Galaxy framework as its primary data analysis platform.

    Decision support system for blockchain (DLT) platform selection based on ITU recommendations: A systematic literature review approach

    Blockchain technologies, also known as Distributed Ledger Technologies (DLT), are increasingly being explored in many applications, especially in the presence of (potential) dis-/mis-/un-trust among organizations and individuals. Today, there exists a plethora of DLT platforms on the market, which makes it challenging for system designers to decide which platform they should adopt and implement. Although a few DLT comparison frameworks have been proposed in the literature, they often fail to cover all performance and functional aspects, and they rarely build upon standardized criteria and recommendations. Given this state of affairs, the present paper considers a recent and exhaustive set of assessment criteria recommended by the ITU (International Telecommunication Union). Those criteria (about fifty) are nonetheless mostly defined in textual form, which may pose interpretation problems during the implementation process. To avoid this, a systematic literature review of each ITU criterion is conducted with a twofold objective: (i) to understand to what extent a given criterion is considered/evaluated by the literature; (ii) to come up with ‘formal’ metric definitions (i.e., on a mathematical or experimental ground) based, whenever possible, on the current literature. Following this formalization stage, a decision support tool called CREDO-DLT, which stands for “multiCRiteria-basEd ranking Of Distributed Ledger Technology platforms”, is developed using AHP and TOPSIS and made publicly available to help decision-makers select the most suitable DLT platform alternative (i.e., the one that best suits their needs and requirements). A use case scenario in the context of energy communities is proposed to show the practicality of CREDO-DLT. A TOPSIS sketch is given after this list of highlights:
    • Blockchain (DLT) standardization initiatives are reviewed.
    • The extent to which the ITU’s DLT assessment criteria are covered in the literature is studied.
    • Mathematical formalizations of the ITU recommendations are proposed.
    • A decision support tool (CREDO-DLT) is designed for DLT platform selection.
    • An energy community use case is developed to show the practicality of CREDO-DLT.
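
    For readers unfamiliar with TOPSIS, the sketch below shows the standard ranking procedure on an invented two-criterion example (throughput as a benefit criterion, latency as a cost criterion); the weights and platform scores are placeholders and do not reflect the ITU criteria or CREDO-DLT's data.

```python
import numpy as np

# Minimal TOPSIS sketch; criteria, weights, and platform scores are invented.

def topsis(matrix, weights, benefit):
    """matrix: alternatives x criteria; benefit[j] is True if higher is better."""
    m = matrix / np.linalg.norm(matrix, axis=0)      # vector-normalize each criterion
    v = m * weights                                  # apply criterion weights
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.linalg.norm(v - ideal, axis=1)        # distance to ideal solution
    d_neg = np.linalg.norm(v - anti, axis=1)         # distance to anti-ideal solution
    return d_neg / (d_pos + d_neg)                   # closeness: higher ranks better

# Three hypothetical DLT platforms scored on throughput (benefit) and latency (cost).
scores = np.array([[2000.0, 3.0], [350.0, 1.0], [900.0, 2.0]])
weights = np.array([0.6, 0.4])
print(topsis(scores, weights, benefit=np.array([True, False])))
```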

    Machine Learning Algorithm for the Scansion of Old Saxon Poetry

    Several scholars have designed tools to perform the automatic scansion of poetry in many languages, but none of these tools deal with Old Saxon or Old English. This project is a first attempt to create such a tool for these languages. We implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and used the resulting corpus as a labeled dataset to train the model. The evaluation of the algorithm reached an accuracy of 97% and a weighted average of 99% for precision, recall, and F1 score. In addition, we tested the model on some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input verses.
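
    A minimal sketch of a BiLSTM tagger of the kind described above is given below; the vocabulary size, tag set, and dimensions are invented placeholders rather than the settings used for the Heliand-trained model.

```python
import torch
import torch.nn as nn

# Sketch of a BiLSTM sequence tagger for per-token metrical labels (toy settings).

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)  # one metrical tag per token

    def forward(self, token_ids):
        states, _ = self.lstm(self.embed(token_ids))
        return self.out(states)  # (batch, seq_len, num_tags) logits

model = BiLSTMTagger(vocab_size=100, num_tags=4)
dummy_verse = torch.randint(0, 100, (1, 12))     # one verse of 12 token ids
print(model(dummy_verse).argmax(dim=-1))         # predicted tag index per token
```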