10,892 research outputs found
Recommended from our members
Ontological Foundations for Scholarly Debate Mapping Technology
Mapping scholarly debates is an important genre of what can be called Knowledge Domain Analytics (KDA) technology â i.e. technology which combines both quantitative and qualitative methods of analysing specialist knowledge domains. However, current KDA technology research has emerged from diverse traditions and thus lacks a common conceptual foundation. This paper reports on the design of a KDA ontology that aims to provide this foundation. The paper then describes the argumentation extensions to the ontology for supporting scholarly debate mapping as a special form of KDA and demonstrates its expressive capabilities using a case study debate
Scalable k-Means Clustering via Lightweight Coresets
Coresets are compact representations of data sets such that models trained on
a coreset are provably competitive with models trained on the full data set. As
such, they have been successfully used to scale up clustering models to massive
data sets. While existing approaches generally only allow for multiplicative
approximation errors, we propose a novel notion of lightweight coresets that
allows for both multiplicative and additive errors. We provide a single
algorithm to construct lightweight coresets for k-means clustering as well as
soft and hard Bregman clustering. The algorithm is substantially faster than
existing constructions, embarrassingly parallel, and the resulting coresets are
smaller. We further show that the proposed approach naturally generalizes to
statistical k-means clustering and that, compared to existing results, it can
be used to compute smaller summaries for empirical risk minimization. In
extensive experiments, we demonstrate that the proposed algorithm outperforms
existing data summarization strategies in practice.Comment: To appear in the 24th ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining (KDD
Adaptive Representations for Tracking Breaking News on Twitter
Twitter is often the most up-to-date source for finding and tracking breaking
news stories. Therefore, there is considerable interest in developing filters
for tweet streams in order to track and summarize stories. This is a
non-trivial text analytics task as tweets are short, and standard retrieval
methods often fail as stories evolve over time. In this paper we examine the
effectiveness of adaptive mechanisms for tracking and summarizing breaking news
stories. We evaluate the effectiveness of these mechanisms on a number of
recent news events for which manually curated timelines are available.
Assessments based on ROUGE metrics indicate that an adaptive approaches are
best suited for tracking evolving stories on Twitter.Comment: 8 Pag
Graphs, Matrices, and the GraphBLAS: Seven Good Reasons
The analysis of graphs has become increasingly important to a wide range of
applications. Graph analysis presents a number of unique challenges in the
areas of (1) software complexity, (2) data complexity, (3) security, (4)
mathematical complexity, (5) theoretical analysis, (6) serial performance, and
(7) parallel performance. Implementing graph algorithms using matrix-based
approaches provides a number of promising solutions to these challenges. The
GraphBLAS standard (istc- bigdata.org/GraphBlas) is being developed to bring
the potential of matrix based graph algorithms to the broadest possible
audience. The GraphBLAS mathematically defines a core set of matrix-based graph
operations that can be used to implement a wide class of graph algorithms in a
wide range of programming environments. This paper provides an introduction to
the GraphBLAS and describes how the GraphBLAS can be used to address many of
the challenges associated with analysis of graphs.Comment: 10 pages; International Conference on Computational Science workshop
on the Applications of Matrix Computational Methods in the Analysis of Modern
Dat
- âŠ