9,632 research outputs found
Exploratory topic modeling with distributional semantics
As we continue to collect and store textual data in a multitude of domains,
we are regularly confronted with material whose largely unknown thematic
structure we want to uncover. With unsupervised, exploratory analysis, no prior
knowledge about the content is required and highly open-ended tasks can be
supported. In the past few years, probabilistic topic modeling has emerged as a
popular approach to this problem. Nevertheless, the representation of the
latent topics as aggregations of semi-coherent terms limits their
interpretability and level of detail.
This paper presents an alternative approach to topic modeling that maps
topics as a network for exploration, based on distributional semantics using
learned word vectors. From the granular level of terms and their semantic
similarity relations global topic structures emerge as clustered regions and
gradients of concepts. Moreover, the paper discusses the visual interactive
representation of the topic map, which plays an important role in supporting
its exploration.Comment: Conference: The Fourteenth International Symposium on Intelligent
Data Analysis (IDA 2015
Recommended from our members
Proceedings ICPW'07: 2nd International Conference on the Pragmatic Web, 22-23 Oct. 2007, Tilburg: NL
Proceedings ICPW'07: 2nd International Conference on the Pragmatic Web, 22-23 Oct. 2007, Tilburg: N
Collaboration in the Semantic Grid: a Basis for e-Learning
The CoAKTinG project aims to advance the state of the art in collaborative mediated spaces for the Semantic Grid. This paper presents an overview of the hypertext and knowledge based tools which have been deployed to augment existing collaborative environments, and the ontology which is used to exchange structure, promote enhanced process tracking, and aid navigation of resources before, after, and while a collaboration occurs. While the primary focus of the project has been supporting e-Science, this paper also explores the similarities and application of CoAKTinG technologies as part of a human-centred design approach to e-Learning
The influence of conceptual user models on the creation and interpretation of diagrams representing reactive systems
In system design, many diagrams of many different types are used. Diagrams communicate design aspects between members of the development team, and between these experts and the non-expert customers and future users. Mastering the creation of diagrams is often a challenging task, judging by particular errors persistently found in diagrams created by undergraduate computer science students. We assume a possible misalignment between human perception and cognition on the one hand and the diagramsâ structure and syntax on the other. This article presents the results of an investigation of such a misalignment. We focus on the deployment of so-called 'conceptual user models' (mental models, created by users in their mind) at the creation of diagrams. We propose a taxonomy for mental mappings, used for categorization of representations. We describe an experiment where naive and novice subjects created one or several diagrams of a familiar task. We use our taxonomy for analysing these diagrams, both for the represented task structure and the symbols used. The results indeed show a mismatch between mental models and currently used diagram techniques
The CIAO Multi-Dialect Compiler and System: An Experimentation Workbench for Future (C)LP Systems
CIAO is an advanced programming environment supporting Logic and Constraint programming. It offers a simple concurrent kernel on top of which declarative and non-declarative extensions are added via librarles. Librarles are available for supporting the ISOProlog standard, several constraint domains, functional and higher order programming, concurrent and distributed programming, internet programming, and others. The source language allows declaring properties of predicates via assertions, including types and modes. Such properties are checked at compile-time or at run-time. The compiler and system architecture are designed to natively support modular global analysis, with the two objectives of proving properties in assertions and performing program optimizations, including transparently exploiting parallelism in programs. The purpose of this paper is to report on recent progress made in the context of the CIAO system, with special emphasis on the capabilities of the compiler, the techniques used for supporting such capabilities, and the results in the ĂĄreas of program analysis and transformation already obtained with the system
HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings
We present our system for semantic frame induction that showed the best
performance in Subtask B.1 and finished as the runner-up in Subtask A of the
SemEval 2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et
al., 2019). Our approach separates this task into two independent steps: verb
clustering using word and their context embeddings and role labeling by
combining these embeddings with syntactical features. A simple combination of
these steps shows very competitive results and can be extended to process other
datasets and languages.Comment: 5 pages, 3 tables, accepted at SemEval 201
Information Extraction in Illicit Domains
Extracting useful entities and attribute values from illicit domains such as
human trafficking is a challenging problem with the potential for widespread
social impact. Such domains employ atypical language models, have `long tails'
and suffer from the problem of concept drift. In this paper, we propose a
lightweight, feature-agnostic Information Extraction (IE) paradigm specifically
designed for such domains. Our approach uses raw, unlabeled text from an
initial corpus, and a few (12-120) seed annotations per domain-specific
attribute, to learn robust IE models for unobserved pages and websites.
Empirically, we demonstrate that our approach can outperform feature-centric
Conditional Random Field baselines by over 18\% F-Measure on five annotated
sets of real-world human trafficking datasets in both low-supervision and
high-supervision settings. We also show that our approach is demonstrably
robust to concept drift, and can be efficiently bootstrapped even in a serial
computing environment.Comment: 10 pages, ACM WWW 201
- âŠ