MetaLDA: a Topic Model that Efficiently Incorporates Meta information
Besides the text content, documents and their associated words usually come
with rich sets of meta information, such as categories of documents and
semantic/syntactic features of words, like those encoded in word embeddings.
Incorporating such meta information directly into the generative process of
topic models can improve modelling accuracy and topic quality, especially in
the case where the word-occurrence information in the training data is
insufficient. In this paper, we present a topic model, called MetaLDA, which is
able to leverage either document or word meta information, or both of them
jointly. With two data augmentation techniques, we can derive an efficient
Gibbs sampling algorithm, which benefits from the fully local conjugacy of the
model. Moreover, the algorithm is favoured by the sparsity of the meta
information. Extensive experiments on several real world datasets demonstrate
that our model achieves comparable or improved performance in terms of both
perplexity and topic quality, particularly in handling sparse texts. In
addition, compared with other models using meta information, our model runs
significantly faster.
Comment: To appear in ICDM 201
A Framework for Aggregating Private and Public Web Archives
Personal and private Web archives are proliferating due to the increase in
the tools to create them and the realization that Internet Archive and other
public Web archives are unable to capture personalized (e.g., Facebook) and
private (e.g., banking) Web pages. We introduce a framework to mitigate issues
of aggregation in private, personal, and public Web archives without
compromising potentially sensitive information contained in private captures. We
amend Memento syntax and semantics so that enriched TimeMaps can express
additional attributes, including the requirements for
dereferencing private Web archive captures. We provide a method to involve the
user further in the negotiation of archival captures in dimensions beyond time.
We introduce a model for archival querying precedence and short-circuiting, as
needed when aggregating private and personal Web archive captures with those
from public Web archives through Memento. Negotiation of this sort is novel to
Web archiving and allows for the more seamless aggregation of various types of
Web archives to convey a more accurate picture of the past Web.
Comment: Preprint version of the ACM/IEEE Joint Conference on Digital
Libraries (JCDL 2018) full paper, accessible at the DO
Fast filtering and animation of large dynamic networks
Detecting and visualizing the most relevant changes in an evolving
network is an open challenge in several domains. We present a fast algorithm
that filters subsets of the strongest nodes and edges representing an evolving
weighted graph and visualizes it by either creating a movie or streaming it
to an interactive network visualization tool. The algorithm is an approximation
of the exponential sliding time-window that scales linearly with the number of
interactions. We compare the algorithm against rectangular and exponential
sliding time-window methods. Our network filtering algorithm: i) captures
persistent trends in the structure of dynamic weighted networks, ii) smooths
transitions between the snapshots of a dynamic network, and iii) uses limited
memory and processor time. The algorithm is publicly available as open-source
software.
Comment: 6 figures, 2 table
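As a rough illustration of the exponential sliding time-window that the abstract above names as a comparison method (this is not the authors' approximation algorithm), the following Python sketch decays each edge weight by exp(-Δt/τ) between interactions and then keeps only the strongest edges. The function name `strongest_edges`, the parameters `tau` and `top_k`, and the event tuple layout are all assumptions made for this example.

```python
import math
from collections import defaultdict
from heapq import nlargest


def strongest_edges(events, tau, top_k):
    """Return the top_k edges ranked by exponentially decayed weight.

    events: iterable of (time, source, target, weight), sorted by time
            (hypothetical input format).
    tau:    decay time constant, in the same unit as the event times
            (hypothetical parameter).
    """
    weights = defaultdict(float)  # edge -> weight as of last_seen[edge]
    last_seen = {}                # edge -> time of its latest interaction
    now = None
    for t, u, v, w in events:
        edge = (u, v)
        if edge in last_seen:
            # Lazily decay the stored weight up to the current event time.
            weights[edge] *= math.exp(-(t - last_seen[edge]) / tau)
        weights[edge] += w
        last_seen[edge] = t
        now = t
    if now is None:
        return []
    # Decay every edge to the same reference time before ranking.
    current = {
        e: wt * math.exp(-(now - last_seen[e]) / tau)
        for e, wt in weights.items()
    }
    return nlargest(top_k, current.items(), key=lambda item: item[1])
```

Because each edge is decayed only when it is touched, the loop does a constant amount of work per interaction; the final ranking pass is the only step that visits every edge.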
Scalable Query Answering Under Uncertainty to Neuroscientific Ontological Knowledge: The NeuroLang Approach
Researchers in neuroscience have a growing number of datasets available to study the brain, which is made possible by recent technological advances. Given the extent to which the brain has been studied, there is also available ontological knowledge encoding the current state of the art regarding its different areas, activation patterns, keywords associated with studies, etc. Furthermore, there is inherent uncertainty associated with brain scans arising from the mapping between voxels (3D pixels) and actual points in different individual brains. Unfortunately, there is currently no unifying framework for accessing such collections of rich heterogeneous data under uncertainty, making it necessary for researchers to rely on ad hoc tools. In particular, one major weakness of current tools that attempt to address this task is that only very limited propositional query languages have been developed. In this paper we present NeuroLang, a probabilistic language based on first-order logic with existential rules, probabilistic uncertainty, ontology integration under the open world assumption, and built-in mechanisms to guarantee tractable query answering over very large datasets. NeuroLang’s primary objective is to provide a unified framework to seamlessly integrate heterogeneous data, such as ontologies, and map fine-grained cognitive domains to brain regions through a set of formal criteria, promoting shareable and highly reproducible research. After presenting the language and its general query answering architecture, we discuss real-world use cases showing how NeuroLang can be applied to practical scenarios.
Affiliation: Zanitti, Gaston E.. Not specified;
Affiliation: Soto, Yamil Osvaldo Omar. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Iovene, Valentin. Not specified;
Affiliation: Martinez, Maria Vanina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Rodriguez, Ricardo Oscar. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Simari, Gerardo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Wassermann, Demian. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentin
Event Coreference Resolution by Iteratively Unfolding Inter-dependencies among Events
We introduce a novel iterative approach for event coreference resolution that
gradually builds event clusters by exploiting inter-dependencies among event
mentions within the same chain as well as across event chains. Among event
mentions in the same chain, we distinguish within- and cross-document event
coreference links by using two distinct pairwise classifiers, trained
separately to capture differences in feature distributions of within- and
cross-document event clusters. Our event coreference approach alternates
between within-document (WD) and cross-document (CD) clustering and combines
arguments from both event clusters after every merge, continuing until no more
merges can be made. It then
performs further merging between event chains that are both closely related to
a set of other chains of events. Experiments on the ECB+ corpus show that our
model outperforms state-of-the-art methods in the joint task of WD and CD event
coreference resolution.
Comment: EMNLP 201