MapReduce is Good Enough? If All You Have is a Hammer, Throw Away Everything That's Not a Nail!
Hadoop is currently the large-scale data analysis "hammer" of choice, but
there exist classes of algorithms that aren't "nails", in the sense that they
are not particularly amenable to the MapReduce programming model. To address
this, researchers have proposed MapReduce extensions or alternative programming
models in which these algorithms can be elegantly expressed. This essay
espouses a very different position: that MapReduce is "good enough", and that
instead of trying to invent screwdrivers, we should simply get rid of
everything that's not a nail. To be more specific, much discussion in the
literature surrounds the fact that iterative algorithms are a poor fit for
MapReduce: the simple solution is to find alternative non-iterative algorithms
that solve the same problem. This essay captures my personal experiences as an
academic researcher as well as a software engineer in a "real-world" production
analytics environment. From this combined perspective I reflect on the current
state and future of "big data" research.
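As a rough illustration of the programming model the essay discusses, the following is a minimal in-memory Python sketch of a single map/reduce pass; the word-count example and the closing note about iteration are my own illustration under stated assumptions, not code or an example from the essay.

    from collections import defaultdict

    # Minimal in-memory sketch of one MapReduce pass (word count).
    def map_phase(records):
        for line in records:
            for word in line.split():
                yield word, 1

    def reduce_phase(pairs):
        groups = defaultdict(list)
        for key, value in pairs:          # "shuffle": group values by key
            groups[key].append(value)
        return {key: sum(values) for key, values in groups.items()}

    counts = reduce_phase(map_phase(["a rose is a rose", "a nail is a nail"]))
    print(counts)  # {'a': 4, 'rose': 2, 'is': 2, 'nail': 2}

    # An iterative algorithm (e.g., PageRank) has no way to keep state between
    # passes in this model: each iteration becomes a separate job whose output
    # is re-read as the next job's input, which is the poor fit the essay
    # refers to and proposes to avoid via non-iterative alternatives.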
Deep Residual Learning for Small-Footprint Keyword Spotting
We explore the application of deep residual learning and dilated convolutions
to the keyword spotting task, using the recently-released Google Speech
Commands Dataset as our benchmark. Our best residual network (ResNet)
implementation significantly outperforms Google's previous convolutional neural
networks in terms of accuracy. By varying model depth and width, we can achieve
compact models that also outperform previous small-footprint variants. To our
knowledge, we are the first to examine these approaches for keyword spotting,
and our results establish an open-source state-of-the-art reference to support
the development of future speech-based interfaces.
Comment: Published in ICASSP 201
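As a hedged illustration of the kind of building block the abstract describes, here is a PyTorch-style sketch combining a residual (identity-shortcut) block with a dilated convolution; the channel count, kernel size, and dilation rate are assumptions chosen for illustration, not the paper's reported hyperparameters.

    import torch
    import torch.nn as nn

    # Illustrative residual block with one dilated convolution; layer sizes
    # below are assumptions, not the paper's exact configuration.
    class DilatedResidualBlock(nn.Module):
        def __init__(self, channels: int = 45, dilation: int = 2):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=dilation, dilation=dilation, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return self.relu(out + x)   # identity shortcut

    # Example: a batch of 8 feature maps (45 channels, 101 frames x 40 bands),
    # e.g., as produced by an earlier convolution over log-Mel features.
    features = torch.randn(8, 45, 101, 40)
    print(DilatedResidualBlock()(features).shape)  # torch.Size([8, 45, 101, 40])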
Modeling Temporal Evidence from External Collections
Newsworthy events are broadcast through multiple media and prompt crowds to
produce comments on social media. In this paper, we propose to leverage these
behavioral dynamics to estimate the most relevant time periods for an event
(i.e., query). Recent advances have shown how to improve the estimation of the
temporal relevance of such topics. Our approach builds on two major novelties.
First, we mine temporal evidence from hundreds of external sources into
topic-based external collections to improve the robustness of detecting
relevant time periods. Second, we propose a
formal retrieval model that generalizes the use of the temporal dimension
across different aspects of the retrieval process. In particular, we show that
temporal evidence of external collections can be used to (i) infer a topic's
temporal relevance, (ii) select the query expansion terms, and (iii) re-rank
the final results for improved precision. Experiments with TREC Microblog
collections show that the proposed time-aware retrieval model makes an
effective and extensive use of the temporal dimension to improve search results
over the most recent temporal models. Interestingly, we observe a strong
correlation between precision and the temporal distribution of retrieved and
relevant documents.
Comment: To appear in WSDM 201
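To make the re-ranking idea in (iii) concrete, here is a small Python sketch that boosts documents whose timestamps fall in periods dense with externally mined evidence; the time binning, add-one smoothing, and interpolation weight alpha are illustrative assumptions, not the paper's formal retrieval model.

    import math
    from collections import Counter

    def temporal_prior(external_timestamps, num_bins):
        # Estimate P(time bin | topic) from timestamps mined from external
        # collections, with add-one smoothing so every bin has nonzero mass.
        counts = Counter(external_timestamps)
        total = sum(counts.values())
        return {b: (counts.get(b, 0) + 1) / (total + num_bins)
                for b in range(num_bins)}

    def rerank(results, external_timestamps, num_bins=24, alpha=0.7):
        # results: list of (doc_id, retrieval_score, time_bin).
        # Interpolate the original retrieval score with the log temporal prior.
        prior = temporal_prior(external_timestamps, num_bins)
        rescored = [(doc, alpha * score + (1 - alpha) * math.log(prior[bin_]), bin_)
                    for doc, score, bin_ in results]
        return sorted(rescored, key=lambda r: r[1], reverse=True)

    # Toy usage: external evidence concentrated in bins 5-6 promotes d2 and d3
    # above d1 despite d1's higher initial retrieval score.
    external = [5, 5, 6, 5, 6, 6, 5]
    results = [("d1", 2.0, 12), ("d2", 1.8, 5), ("d3", 1.7, 6)]
    print([doc for doc, _, _ in rerank(results, external)])  # ['d2', 'd3', 'd1']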