
    A Neuro-Symbolic Approach to Structured Event Recognition

    Events are structured entities with multiple components: the event type, the participants with their roles, the outcome, the sub-events, etc. A fully end-to-end approach for event recognition from raw data sequences should therefore also solve a number of simpler tasks, such as recognizing the objects involved in the events and their roles, the outcome of the events, and the sub-events. Ontological knowledge about event structure, specified in logic languages, could be very useful for solving these challenges. However, the majority of successful approaches to event recognition from raw data are purely neural (mainly recurrent neural networks), with limited, if any, support for background knowledge. These approaches typically require large training sets with detailed annotations at the different levels into which recognition can be decomposed (e.g., video annotated with object bounding boxes, object roles, events and sub-events). In this paper, we propose a neuro-symbolic approach for structured event recognition from raw data that uses "shallow" annotation of the high-level events and exploits background knowledge to propagate this supervision to simpler tasks such as object classification. We develop a prototype of the approach and compare it with a purely neural solution based on recurrent neural networks, showing its superior ability to solve both the event recognition task and the simpler object classification task, as well as its ability to generalize to events with unseen outcomes.
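
    As a rough illustration of this idea, the sketch below (PyTorch; the event rule and object types are hypothetical, not the paper's) relaxes a logical event definition with a product t-norm so that event-level labels alone back-propagate into an object classifier:

    # Minimal sketch (hypothetical rule and names): "shallow" event-level
    # supervision trains an object classifier through a logical rule such as
    # deposit(e) <- type(participant0, person) & type(participant1, bag),
    # relaxed with a product t-norm so gradients flow into the classifier.
    import torch
    import torch.nn as nn

    class ObjectClassifier(nn.Module):
        def __init__(self, feat_dim: int, n_types: int):
            super().__init__()
            self.head = nn.Linear(feat_dim, n_types)

        def forward(self, feats):                  # feats: (n_objects, feat_dim)
            return self.head(feats).softmax(-1)    # per-object type distribution

    clf = ObjectClassifier(feat_dim=128, n_types=3)   # 0=person, 1=bag, 2=other
    opt = torch.optim.Adam(clf.parameters(), lr=1e-3)

    feats = torch.randn(16, 2, 128)                   # 16 events, 2 objects each (toy data)
    event_label = torch.randint(0, 2, (16,)).float()  # only high-level supervision

    for _ in range(100):
        p = clf(feats.view(-1, 128)).view(16, 2, 3)
        # Product t-norm relaxation: P(event) ~= P(obj0=person) * P(obj1=bag)
        p_event = p[:, 0, 0] * p[:, 1, 1]
        loss = nn.functional.binary_cross_entropy(p_event, event_label)
        opt.zero_grad(); loss.backward(); opt.step()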

    Identifying soccer players’ playing styles: a systematic review

    Identifying playing styles in football is highly valuable for effective performance analysis. While there is extensive research on team styles, studies on individual player styles are still in their early stages. The aim of this systematic review was therefore to provide a comprehensive overview of the existing literature on player styles and identify research areas requiring further development, offering new directions for future research. Following the PRISMA guidelines for systematic reviews, we conducted a search using a specific strategy across four databases (PubMed, Scopus, Web of Science, and SPORTDiscus). Inclusion and exclusion criteria were applied to the initial search results, ultimately identifying twelve studies suitable for inclusion in this review. Through thematic analysis and qualitative evaluation of these studies, several key findings emerged: (a) a lack of a structured theoretical framework for player styles based on their positions within the team formation, (b) an absence of studies investigating the influence of contextual variables on player styles, (c) methodological deficiencies in the reviewed studies, and (d) a disparity in the objectives of sports science and data science studies. By identifying these gaps in the literature and presenting a structured framework for player styles (based on the compilation of all reported styles from the reviewed studies), this review aims to assist team stakeholders and provide guidance for future research endeavors.

    The Extended Dawid-Skene Model: Fusing Information from Multiple Data Schemas

    While label fusion from multiple noisy annotations is a well-understood concept in data wrangling (tackled, for example, by the Dawid-Skene (DS) model), we consider the extended problem of carrying out learning when the labels themselves are not consistently annotated with the same schema. We show that even if annotators use disparate, albeit related, label-sets, we can still draw inferences for the underlying full label-set. We propose the Inter-Schema AdapteR (ISAR) to translate the fully-specified label-set to the one used by each annotator, enabling learning under such heterogeneous schemas, without the need to re-annotate the data. We apply our method to a mouse behavioural dataset, achieving significant gains (compared with DS) in out-of-sample log-likelihood (-3.40 to -2.39) and F1-score (0.785 to 0.864). Comment: Updated with author-preprint version following publication in P. Cellier and K. Driessens (Eds.): ECML PKDD 2019 Workshops, CCIS 1167, pp. 121-136, 2020.
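
    The core ISAR idea can be sketched as follows (NumPy; the label-sets and numbers are invented for illustration, not from the paper): each annotator observes labels through a schema map that collapses the full label-set, so the likelihood of an observed coarse label marginalises the Dawid-Skene confusion row over the fine labels it covers.

    # Minimal sketch of an inter-schema adapter (hypothetical behaviours).
    import numpy as np

    full_labels = ["groom", "scratch", "rest", "walk"]
    # This annotator's schema merges the two grooming-like behaviours.
    schema = {"maintenance": [0, 1], "rest": [2], "walk": [3]}

    # Adapter matrix M[k, c] = P(observed coarse class c | fine label k).
    M = np.zeros((len(full_labels), len(schema)))
    for c, (_, fine) in enumerate(schema.items()):
        for k in fine:
            M[k, c] = 1.0

    # Dawid-Skene-style annotator confusion over the *fine* labels.
    confusion = np.full((4, 4), 0.05) + np.eye(4) * 0.80   # rows sum to 1.0

    def obs_likelihood(true_label: int, observed_class: int) -> float:
        """P(annotator reports coarse class | true fine label): confuse, then collapse."""
        return float(confusion[true_label] @ M[:, observed_class])

    print(obs_likelihood(0, 0))   # true "groom" reported as "maintenance": 0.90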

    Explainable methods for knowledge graph refinement and exploration via symbolic reasoning

    Knowledge Graphs (KGs) have applications in many domains such as finance, manufacturing, and healthcare. While recent efforts have created large KGs, their content is far from complete and sometimes includes invalid statements. It is therefore crucial to refine the constructed KGs, enhancing their coverage and accuracy via KG completion and KG validation. It is also vital to provide human-comprehensible explanations for such refinements, so that humans can trust the KG's quality. Enabling KG exploration, by search and browsing, is likewise essential for users to understand a KG's value and limitations for downstream applications; however, the large size of KGs makes exploration very challenging. While the type taxonomy of a KG is a useful asset along these lines, it remains insufficient for deep exploration. In this dissertation we tackle the aforementioned challenges of KG refinement and KG exploration by combining logical reasoning over the KG with other techniques, such as KG embedding models and text mining, introducing methods that provide human-understandable output. Concretely, the dissertation makes the following contributions:
    • For KG completion, we present ExRuL, a method that revises Horn rules by adding exception conditions to their bodies. The revised rules are used to infer missing facts and thus close gaps in the KG. Experiments on large KGs show that ExRuL substantially reduces errors in the inferred facts and produces user-friendly explanations.
    • With RuLES, we present a rule-learning method based on probabilistic representations of missing facts. It iteratively extends rules induced from the KG by combining neural KG embeddings with information from text corpora, guided by novel rule-quality metrics. Experiments show that RuLES substantially improves the quality of the learned rules and of their predictions.
    • To support KG validation, we introduce ExFaKT, a framework for constructing explanations for candidate facts. It uses rules to rewrite a candidate fact into a set of statements that are easier to find and to validate or refute, and outputs semantic evidence for the candidate extracted from both text corpora and the KG. Experiments show that these rewritings significantly improve the yield and quality of the discovered explanations, which support both manual KG validation by human curators and automated validation.
    • To facilitate KG exploration, we present ExCut, a method that computes informative entity clusters together with explanations, using KG embeddings and automatically induced rules. A cluster explanation is a combination of relations among the cluster's entities that identifies the cluster. By iteratively interleaving embedding learning and rule learning, ExCut simultaneously improves cluster quality and cluster explainability. Experiments show that ExCut computes high-quality clusters and that the cluster explanations are informative for users.
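
    A toy illustration of exception-aware rule revision in the spirit of ExRuL (plain Python; the KG, rule, and exception are invented for illustration): an exception atom is worth adding when it raises the rule's confidence.

    # Minimal sketch: confidence of a Horn rule before/after adding an exception.
    kg = {  # (subject, relation, object) triples of a toy KG
        ("anna", "livesIn", "berlin"), ("bob", "livesIn", "berlin"),
        ("carl", "livesIn", "berlin"), ("anna", "bornIn", "berlin"),
        ("bob", "bornIn", "berlin"), ("carl", "isImmigrant", "yes"),
    }

    def confidence(exception=None):
        """Confidence of: livesIn(X, berlin) => bornIn(X, berlin), minus exceptions."""
        matches = [s for (s, r, o) in kg if r == "livesIn" and o == "berlin"
                   and not (exception and (s, *exception) in kg)]
        correct = [s for s in matches if (s, "bornIn", "berlin") in kg]
        return len(correct) / len(matches)

    print(confidence())                         # 2/3 without the exception
    print(confidence(("isImmigrant", "yes")))   # 2/2 once the exception is added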

    Towards Quality-of-Service Metrics for Symbolic Knowledge Injection

    The integration of symbolic knowledge and sub-symbolic predictors represents a recent popular trend in AI. Among the set of integration approaches, Symbolic Knowledge Injection (SKI) proposes the exploitation of human-intelligible knowledge to steer sub-symbolic models towards some desired behaviour. The vast majority of works in the field of SKI aim at increasing the predictive performance of the sub-symbolic model at hand and, therefore, measure SKI strength solely based on performance improvements. However, a variety of artefacts exist that affect this measure, mostly linked to the quality of the injected knowledge and the underlying predictor. Moreover, the use of injection techniques introduces the possibility of producing more efficient sub-symbolic models in terms of the computations, energy, and data required. Therefore, novel and reliable Quality-of-Service (QoS) measures for SKI are clearly needed, aiming at robustly identifying the overall quality of an injection mechanism. Accordingly, in this work, we propose and mathematically model the first – to the best of our knowledge – set of QoS metrics for SKI, focusing on measuring injection robustness and efficiency gain.
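
    For illustration only, one plausible shape for an efficiency-gain style metric (this formula is an assumption, not the paper's definition) is the mean relative resource saving of the injected predictor over its purely sub-symbolic baseline:

    # Minimal, illustrative sketch: hypothetical efficiency-gain QoS metric.
    def efficiency_gain(baseline: dict, injected: dict) -> float:
        """Mean relative reduction across resources; positive = injection helps."""
        return sum((baseline[k] - injected[k]) / baseline[k] for k in baseline) / len(baseline)

    baseline = {"train_samples": 50_000, "energy_kwh": 12.0, "latency_ms": 8.0}
    injected = {"train_samples": 20_000, "energy_kwh": 9.0, "latency_ms": 8.5}
    print(f"{efficiency_gain(baseline, injected):+.2f}")   # +0.26 on these toy numbers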

    A Neuro-Symbolic Approach for Real-World Event Recognition from Weak Supervision

    Events are structured entities involving different components (e.g., the participants, their roles, etc.) and their relations. Structured events are typically defined in terms of (a subset of) simpler, atomic events and a set of temporal relations between them. Temporal Event Detection (TED) is the task of detecting structured and atomic events within data streams, most often text or video sequences, and has numerous applications, from video surveillance to sports analytics. Existing deep learning approaches solve the TED task by implicitly learning the temporal correlations among events from data. As a consequence, these approaches often fail to ensure predictions that are consistent with the relationships between structured and atomic events. Neuro-symbolic approaches, on the other hand, have shown their capability to constrain the output of a neural network to be consistent with the background knowledge of the domain. In this paper, we propose a neuro-symbolic approach for TED in a real-world scenario involving sports activities. We show that, by incorporating simple knowledge about the relative order of atomic events and constraints on their duration, the approach substantially outperforms a fully neural solution in terms of recognition accuracy when little or even no supervision is available on the atomic events.
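
    A minimal sketch of how such knowledge can be made differentiable (PyTorch; the events "serve"/"rally" and the exact penalty terms are hypothetical, not the paper's):

    # Soft constraints: "serve precedes rally" and a minimum rally duration.
    import torch

    T = 20                                                  # frames in the window
    p_serve = torch.rand(T, requires_grad=True).sigmoid()   # per-frame P(serve)
    p_rally = torch.rand(T, requires_grad=True).sigmoid()   # per-frame P(rally)

    # P(serve has already occurred by frame t), a soft cumulative noisy-or.
    served_by = 1 - torch.cumprod(1 - p_serve, dim=0)

    # Order violation: rally active while serve has (softly) not yet happened.
    order_loss = (p_rally * (1 - served_by)).mean()

    # Duration constraint: rally should span at least min_len frames when present.
    min_len = 3
    duration_loss = torch.relu(min_len - p_rally.sum()) / min_len

    (order_loss + duration_loss).backward()   # gradients reach the event detectors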

    Revising with a Backward Glance: Regressions and Skips during Reading as Cognitive Signals for Revision Policies in Incremental Processing

    In NLP, incremental processors produce output in instalments, based on incoming prefixes of the linguistic input. Some tokens trigger revisions, causing edits to the output hypothesis, but little is known about why models revise when they revise. A policy that detects the time steps where revisions should happen can improve efficiency. Still, retrieving a suitable signal to train a revision policy is an open problem, since it is not naturally available in datasets. In this work, we investigate the appropriateness of regressions and skips in human reading eye-tracking data as signals to inform revision policies in incremental sequence labelling. Using generalised mixed-effects models, we find that the probability of regressions and skips by humans can potentially serve as useful predictors for revisions in BiLSTM and Transformer models, with consistent results for various languages. Comment: Accepted to CoNLL 2023.
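
    A simplified sketch of this style of analysis (statsmodels on synthetic data; the paper fits generalised *mixed*-effects models with random effects, which this plain logistic GLM omits):

    # Relate a token-level revision indicator to human reading measures.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 500
    df = pd.DataFrame({
        "p_regression": rng.uniform(0, 1, n),   # human regression probability
        "p_skip": rng.uniform(0, 1, n),         # human skip probability
    })
    # Synthetic ground truth: revisions more likely where humans regress more.
    logits = -1.5 + 2.0 * df.p_regression - 1.0 * df.p_skip
    df["revised"] = rng.binomial(1, 1 / (1 + np.exp(-logits)))

    model = smf.logit("revised ~ p_regression + p_skip", data=df).fit(disp=0)
    print(model.params)   # expect a positive coefficient on p_regression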

    ClusterLLM: Large Language Models as a Guide for Text Clustering

    We introduce ClusterLLM, a novel text clustering framework that leverages feedback from an instruction-tuned large language model, such as ChatGPT. Compared with traditional unsupervised methods that build upon "small" embedders, ClusterLLM exhibits two intriguing advantages: (1) it enjoys the emergent capabilities of the LLM even when its embeddings are inaccessible, and (2) it understands the user's preferences on clustering through textual instructions and/or a few annotated examples. First, we prompt ChatGPT for insights on the clustering perspective by constructing hard triplet questions <does A better correspond to B than C>, where A, B and C are similar data points that belong to different clusters according to a small embedder. We empirically show that this strategy is both effective for fine-tuning the small embedder and cost-efficient in querying ChatGPT. Second, we prompt ChatGPT for help with the clustering granularity through carefully designed pairwise questions <do A and B belong to the same category>, and tune the granularity to the level of the cluster hierarchy that is most consistent with the ChatGPT answers. Extensive experiments on 14 datasets show that ClusterLLM consistently improves clustering quality, at an average cost of ~$0.6 per dataset.
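
    A minimal sketch of the triplet machinery (the prompt wording, example texts, and embedder stand-in are guesses for illustration, not the paper's exact prompts or models):

    # Build a triplet question for the LLM, then use its answer as a
    # (anchor, positive, negative) triplet to fine-tune a small embedder.
    import torch
    import torch.nn.functional as F

    def triplet_prompt(a: str, b: str, c: str) -> str:
        return (f"Does A better correspond to B or to C?\n"
                f"A: {a}\nB: {b}\nC: {c}\nAnswer with 'B' or 'C'.")

    print(triplet_prompt("refund not received", "where is my money back",
                         "update shipping address"))

    # Suppose the LLM answered 'B'; pull A towards B and away from C.
    emb = torch.nn.Embedding(3, 64)   # stand-in for a small text embedder
    a, pos, neg = emb(torch.tensor([0, 1, 2]))
    loss = F.triplet_margin_loss(a.unsqueeze(0), pos.unsqueeze(0),
                                 neg.unsqueeze(0), margin=0.5)
    loss.backward()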

    A Siamese Based System for City Verification

    Image geolocalization is receiving increasing attention due to its importance in several applications, such as image retrieval, criminal investigations and fact-checking. Previous works focused on several instances of image geolocalization including place recognition, GPS coordinates estimation and country recognition. In this paper, we tackle an even more challenging problem, which is recognizing the city where an image has been taken. Due to the vast number of cities in the world, we cast the problem as a verification problem, whereby the system has to decide whether a certain image has been taken in a given city or not. In particular, we present a system that, given a query image and a small set of images taken in a target city, decides if the query image has been shot in the target city or not. To allow the system to handle the case of images taken in cities that have not been used during training, we use a Siamese network based on a Vision Transformer as a backbone. The experiments we run prove the validity of the proposed system, which outperforms solutions based on state-of-the-art techniques, even in the challenging case of images shot in different cities of the same country.
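
    A minimal sketch of the verification step (PyTorch; a toy CNN stands in for the paper's Vision Transformer backbone, and the threshold and tensors are arbitrary):

    # Siamese-style city verification: embed query and reference images with a
    # shared backbone, accept if the mean embedding distance is below a threshold.
    import torch
    import torch.nn as nn

    encoder = nn.Sequential(                    # shared-weight Siamese branch
        nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64),
    )

    def verify(query, references, threshold=1.0) -> bool:
        """True if the query is judged to be shot in the reference city."""
        q = nn.functional.normalize(encoder(query), dim=-1)
        r = nn.functional.normalize(encoder(references), dim=-1)
        dist = (q - r).norm(dim=-1).mean()      # mean distance to reference set
        return bool(dist < threshold)

    query = torch.randn(1, 3, 224, 224)         # toy tensors in place of photos
    refs = torch.randn(5, 3, 224, 224)          # small set from the target city
    print(verify(query, refs))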

    Multiple-Aspect Analysis of Semantic Trajectories

    This open access book constitutes the refereed post-conference proceedings of the First International Workshop on Multiple-Aspect Analysis of Semantic Trajectories, MASTER 2019, held in conjunction with the 19th European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2019, in Würzburg, Germany, in September 2019. The 8 full papers presented were carefully reviewed and selected from 12 submissions. They represent an interesting mix of techniques for solving recurrent as well as new problems in the semantic trajectory domain, such as data representation models, data management systems, machine learning approaches for anomaly detection, and common pathways identification.