
    Information Extraction in Illicit Domains

    Extracting useful entities and attribute values from illicit domains such as human trafficking is a challenging problem with the potential for widespread social impact. Such domains employ atypical language models, have 'long tails' and suffer from the problem of concept drift. In this paper, we propose a lightweight, feature-agnostic Information Extraction (IE) paradigm specifically designed for such domains. Our approach uses raw, unlabeled text from an initial corpus, and a few (12-120) seed annotations per domain-specific attribute, to learn robust IE models for unobserved pages and websites. Empirically, we demonstrate that our approach can outperform feature-centric Conditional Random Field baselines by over 18% F-Measure on five annotated real-world human trafficking datasets in both low-supervision and high-supervision settings. We also show that our approach is demonstrably robust to concept drift and can be efficiently bootstrapped even in a serial computing environment.
    Comment: 10 pages, ACM WWW 2017
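    As a rough illustration of the recipe this abstract describes (not the authors' actual system), one can featurize each candidate token by averaging distributional word vectors learned from the raw, unlabeled corpus and then train a light classifier on the few seed annotations. Everything below, from the toy tokens to the choice of logistic regression, is an invented stand-in:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def context_vector(tokens, i, vectors, window=2, dim=50):
    """Average the embeddings of the words surrounding candidate position i."""
    ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    vecs = [vectors[t] for t in ctx if t in vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# `vectors` would come from any embedding method trained on the raw corpus
# (word2vec, fastText, ...); random vectors stand in here.
rng = np.random.default_rng(0)
vocab = ["call", "me", "at", "height", "is", "located", "in"]
vectors = {w: rng.normal(size=50) for w in vocab}

# A handful of seed annotations per attribute:
# (tokenized text, candidate index, 1 if the candidate is a phone number)
seeds = [
    (["call", "me", "at", "5551234"], 3, 1),
    (["height", "is", "165cm"], 2, 0),
    (["located", "in", "midtown"], 2, 0),
]
X = np.stack([context_vector(toks, i, vectors) for toks, i, _ in seeds])
y = np.array([label for _, _, label in seeds])

clf = LogisticRegression().fit(X, y)
```

    Because the features are contextual rather than hand-engineered per site, a classifier like this degrades more gracefully on unseen websites, which is the intuition behind the feature-agnostic framing.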

    Exploiting prior knowledge and latent variable representations for the statistical modeling and probabilistic querying of large knowledge graphs

    Large knowledge graphs increasingly add great value to applications that require machines to recognize and understand queries and their semantics, as in search or question answering systems. These applications include Google search, Bing search and IBM's Watson, but also smart mobile assistants such as Apple's Siri, Google Now and Microsoft's Cortana. Popular knowledge graphs like DBpedia, YAGO and Freebase store a broad range of facts about the world, to a large extent derived from Wikipedia, currently the biggest web encyclopedia. In addition to these freely accessible open knowledge graphs, commercial ones have also evolved, including the well-known Google Knowledge Graph and Microsoft's Satori. Since the incompleteness and uncertain veracity of knowledge graphs are known problems, the statistical modeling of knowledge graphs has gained increasing attention in recent years. Some of the leading approaches are based on latent variable models, which show both excellent predictive performance and scalability. Latent variable models learn embedding representations of domain entities and relations (representation learning). From these embeddings, prior probabilities for every possible fact in the knowledge graph can be generated and exploited for data cleansing, for completion, or as prior knowledge to support triple extraction from unstructured textual data, as successfully demonstrated by Google's Knowledge Vault project. However, large knowledge graphs impose constraints on the complexity of the latent embeddings: for graphs with millions of entities and thousands of relation-types, latent variable models must use low-dimensional embeddings of entities and relation-types to remain tractable.

    The work described in this thesis extends the application of latent variable models to large knowledge graphs in three important directions. First, it shows how integrating ontological constraints on the domain and range of relation-types enables latent variable models to use latent embeddings of reduced complexity when modeling large knowledge graphs. Integrating this prior knowledge into the models leads to substantial gains in both predictive performance and scalability, with improvements of up to 77% in link-prediction tasks. Since manually designed domain and range constraints can be absent or fuzzy, we also propose and study an alternative approach based on a local closed-world assumption, which derives domain and range constraints from the observed data without requiring prior knowledge from the curated schema of the knowledge graph. We show that this approach leads to similarly significant improvements in modeling quality. Further, we demonstrate that these two types of domain and range constraints are of general value to latent variable models by integrating and evaluating them with the current state of the art of latent variable models, represented by RESCAL, translational embedding, and the neural network approach used by the recently proposed Google Knowledge Vault system.

    The second part of the thesis shows that these three approaches all perform well, yet model knowledge graphs in quite different ways; these differences can be exploited in ensemble solutions that improve predictive performance even further. The third part concerns the efficient querying of statistically modeled knowledge graphs. The thesis interprets such graphs as probabilistic databases, where the latent variable models define a probability distribution over triples. From this perspective, link prediction is equivalent to querying ground triples, a standard functionality of latent variable models. For more complex querying that involves, e.g., joins and projections, the theory of probabilistic databases provides evaluation rules. The thesis shows how the intrinsic features of latent variable models can be combined with this theory to realize efficient probabilistic querying of the modeled graphs.
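    As a small illustration of the first contribution (illustrative only; the entity names, toy graph, and random vectors are invented), a translational embedding scores a triple by the distance ||h + r - t||, and domain/range constraints shrink the set of candidate triples that ever needs scoring:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
entities = ["Munich", "Germany", "Siri", "Apple"]
# Type constraints per relation-type: either curated from the schema or
# derived from observed triples under a local closed-world assumption.
relations = {"locatedIn": {"domain": {"Munich"}, "range": {"Germany"}}}

E = {e: rng.normal(size=dim) for e in entities}
R = {r: rng.normal(size=dim) for r in relations}

def score(h, r, t):
    """Translational embedding plausibility: higher means more plausible."""
    return -np.linalg.norm(E[h] + R[r] - E[t])

def candidate_triples(r):
    """Only type-consistent (head, tail) pairs are scored; this pruning is
    what helps low-dimensional embeddings stay accurate on large graphs."""
    c = relations[r]
    return [(h, t) for h in c["domain"] for t in c["range"]]

for h, t in candidate_triples("locatedIn"):
    print(f"{h} locatedIn {t}: {score(h, 'locatedIn', t):.3f}")
```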

    Fusing uncertain knowledge and evidence for maritime situational awareness via Markov Logic Networks

    The concepts of event and anomaly are important building blocks for developing a situational picture of the observed environment. We relate these concepts to the JDL fusion model and demonstrate the power of Markov Logic Networks (MLNs) for encoding uncertain knowledge and computing inferences from observed evidence. MLNs combine the expressive power of first-order logic with the probabilistic uncertainty management of Markov networks. Within this framework, different types of knowledge (e.g. a priori, contextual) with associated uncertainty can be fused for situation assessment by expressing unobservable complex events as logical combinations of simpler pieces of evidence. We also develop a mechanism to evaluate the level of completion of complex events and show how, along with event probability, it can provide additional useful information to the operator. We demonstrate the approach on two maritime scenarios with rules for event and anomaly detection.
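    A hand-rolled toy version of the MLN machinery makes this concrete (a real system would use an engine such as Alchemy or pracmln; the maritime rule and its weights below are invented): weighted logical rules define a log-linear distribution over possible worlds, and a query is answered by summing the weights of worlds consistent with the evidence:

```python
import itertools
import math

atoms = ["ais_off", "near_zone", "anomalous"]

# Weighted rules as (weight, formula over a world). The soft rule says that
# switching off AIS near a protected zone suggests an anomaly; the second
# rule is a weak prior against declaring anomalies.
rules = [
    (2.0, lambda w: not (w["ais_off"] and w["near_zone"]) or w["anomalous"]),
    (0.5, lambda w: not w["anomalous"]),
]

def weight(world):
    return math.exp(sum(wt for wt, f in rules if f(world)))

worlds = [dict(zip(atoms, vals))
          for vals in itertools.product([False, True], repeat=len(atoms))]

# Condition on observed evidence and query P(anomalous | evidence).
evidence = {"ais_off": True, "near_zone": True}
consistent = [w for w in worlds
              if all(w[k] == v for k, v in evidence.items())]
z = sum(weight(w) for w in consistent)
p = sum(weight(w) for w in consistent if w["anomalous"]) / z
print(f"P(anomalous | AIS off near protected zone) = {p:.3f}")
```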

    Advancing functional connectivity research from association to causation

    Cognition and behavior emerge from brain network interactions, such that investigating causal interactions should be central to the study of brain function. Approaches that characterize statistical associations among neural time series, termed functional connectivity (FC) methods, are likely a good starting point for estimating brain network interactions. Yet only a subset of FC methods ('effective connectivity') is explicitly designed to infer causal interactions from statistical associations. Here we incorporate best practices from diverse areas of FC research to illustrate how FC methods can be refined to improve inferences about neural mechanisms, with properties of causal neural interactions as a common ontology to facilitate cumulative progress across FC approaches. We further demonstrate how the most common FC measures (correlation and coherence) reduce the set of likely causal models, facilitating causal inferences despite major limitations. We suggest alternative FC measures that can immediately start improving causal inferences beyond these common measures.
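    A minimal sketch of the two common FC measures the article discusses, computed on simulated series (the lagged coupling is invented), also shows why they constrain rather than identify causal models: both are symmetric, so they cannot distinguish x driving y from the reverse, or from a common driver:

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(0)
fs = 200.0                                     # sampling rate (Hz)
n = 4000
x = rng.normal(size=n)                         # white-noise "region 1"
y = np.roll(x, 5) + 0.5 * rng.normal(size=n)   # region 2 echoes x, 25 ms lag

r = np.corrcoef(x, y)[0, 1]                    # zero-lag correlation: ~0,
                                               # blind to the lagged coupling
f, cxy = coherence(x, y, fs=fs, nperseg=256)   # coherence: high, since it
                                               # tolerates phase (time) shifts
print(f"correlation r = {r:.2f}")
print(f"mean 1-50 Hz coherence = {cxy[(f >= 1) & (f <= 50)].mean():.2f}")
```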

    Philosophy and the practice of Bayesian statistics

    A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just the philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.
    Comment: 36 pages, 5 figures
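    The model checking the authors emphasize is, concretely, a posterior predictive check: fit the model, simulate replicated datasets from the posterior predictive distribution, and compare a test statistic on the replications with its observed value. A minimal sketch with a conjugate normal model and simulated data (the priors, test statistic, and the deliberate misfit below are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_t(df=3, size=100)    # "observed" data: heavier-tailed than
                                      # the normal model we are about to fit

# Model y ~ N(mu, 1) with prior mu ~ N(0, 10^2); conjugate posterior for mu.
sigma2, tau2, n = 1.0, 100.0, len(y)
post_var = 1.0 / (n / sigma2 + 1.0 / tau2)
post_mean = post_var * y.sum() / sigma2

def T(data):
    """A tail-sensitive test statistic."""
    return np.max(np.abs(data))

t_rep = []
for _ in range(2000):                 # posterior predictive replications
    mu = rng.normal(post_mean, np.sqrt(post_var))
    t_rep.append(T(rng.normal(mu, np.sqrt(sigma2), size=n)))

p = np.mean(np.array(t_rep) >= T(y))  # posterior predictive p-value
print(f"posterior predictive p-value for max|y|: {p:.3f}")
```

    An extreme p-value flags that the model cannot reproduce this aspect of the data (here, the heavy tails) and prompts model revision, which is exactly the hypothetico-deductive step the abstract argues falls outside Bayesian confirmation theory.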