2,426 research outputs found
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy, and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed
Webly Supervised Learning of Convolutional Networks
We present an approach to utilize large amounts of web data for learning
CNNs. Specifically inspired by curriculum learning, we present a two-step
approach for CNN training. First, we use easy images to train an initial visual
representation. We then use this initial CNN and adapt it to harder, more
realistic images by leveraging the structure of data and categories. We
demonstrate that our two-stage CNN outperforms a fine-tuned CNN trained on
ImageNet on Pascal VOC 2012. We also demonstrate the strength of webly
supervised learning by localizing objects in web images and training a R-CNN
style detector. It achieves the best performance on VOC 2007 where no VOC
training data is used. Finally, we show our approach is quite robust to noise
and performs comparably even when we use image search results from March 2013
(pre-CNN image search era)
Pattern Discovery in DNS Query Traffic
AbstractDNS provides a critical function in directing Internet traffic. Traditional rule-based anomaly or intrusion detection methods are not able to update the rules dynamically. Data mining based approaches can find various patterns in massive dynamic query traffic data. In this paper, a novel periodic trend mining method is proposed, as well as a periodic trend pattern based traffic prediction method. Clustering is adopted to partition numerous domain names into separate groups by the characteristics of their query traffic time series. Experimental results on a real-word DNS log indicate data mining based approaches are promising in the domain of DNS service
Inferring the Origin Locations of Tweets with Quantitative Confidence
Social Internet content plays an increasingly critical role in many domains,
including public health, disaster management, and politics. However, its
utility is limited by missing geographic information; for example, fewer than
1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable,
content-based approach to estimate the location of tweets using a novel yet
simple variant of gaussian mixture models. Further, because real-world
applications depend on quantified uncertainty for such estimates, we propose
novel metrics of accuracy, precision, and calibration, and we evaluate our
approach accordingly. Experiments on 13 million global, comprehensively
multi-lingual tweets show that our approach yields reliable, well-calibrated
results competitive with previous computationally intensive methods. We also
show that a relatively small number of training data are required for good
estimates (roughly 30,000 tweets) and models are quite time-invariant
(effective on tweets many weeks newer than the training set). Finally, we show
that toponyms and languages with small geographic footprint provide the most
useful location signals.Comment: 14 pages, 6 figures. Version 2: Move mathematics to appendix, 2 new
references, various other presentation improvements. Version 3: Various
presentation improvements, accepted at ACM CSCW 201
- …