A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on the data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
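As a rough illustration of what "unsupervised" means here, the sketch below flags values that lie far from the rest of a univariate sample, with no labels involved. It is not taken from the paper: the modified z-score based on the median absolute deviation (MAD) is swapped in as one simple, well-known statistical technique, and the function name `mad_outliers`, the threshold of 3.5, and the sample readings are all assumptions made for the example.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds `threshold`.

    Hypothetical helper for illustration only. The modified z-score is
    0.6745 * |x - median| / MAD, where MAD is the median absolute deviation.
    """
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = statistics.median(abs_dev)
    if mad == 0:
        # More than half of the values equal the median, so the score is
        # undefined. This sketch simply reports no outliers in that case.
        return []
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Assumed sample: six similar readings and one that deviates strongly.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2]
flagged = mad_outliers(readings)
print(flagged)
```

A median-based score is used rather than a mean-based one because a single extreme value inflates the mean and standard deviation and can mask itself. Multivariate or non-numeric data would need a different class of technique.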
Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election
Social media has become an emerging alternative to opinion polls for public
opinion collection, although it still poses many challenges as a passive data
source, such as structurelessness, quantifiability, and representativeness.
Social media data with geotags provide new opportunities to unveil the
geographic locations of users expressing their opinions. This paper aims to
answer two questions: 1) whether quantifiable measurement of public opinion can
be obtained from social media and 2) whether it can produce better or
complementary measures compared to opinion polls. This research proposes a
novel approach to measure the relative opinion of Twitter users towards public
issues in order to accommodate more complex opinion structures and take
advantage of the geography pertaining to the public issues. To ensure that this
new measure is technically feasible, a modeling framework is developed
including building a training dataset by adopting a state-of-the-art approach
and devising a new deep learning method called Opinion-Oriented Word Embedding.
With a case study of the tweets selected for the 2016 U.S. presidential
election, we demonstrate the predictive superiority of our relative opinion
approach and we show how it can aid visual analytics and support opinion
predictions. Although the relative opinion measure proves more robust
than polling, our study also suggests that the former can advantageously
complement the latter in opinion prediction.
Efficient Regularization of Squared Curvature
Curvature has received increased attention as an important alternative to
length-based regularization in computer vision. In contrast to length, it
preserves elongated structures and fine details. Existing approaches are either
inefficient, or have low angular resolution and yield results with strong block
artifacts. We derive a new model for computing squared curvature based on
integral geometry. The model counts responses of straight line triple cliques.
The corresponding energy decomposes into submodular and supermodular pairwise
potentials. We show that this energy can be efficiently minimized even for high
angular resolutions using the trust region framework. Our results confirm that
we obtain accurate and visually pleasing solutions without strong artifacts at
reasonable run times.
Comment: 8 pages, 12 figures, to appear at IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), June 201
Segmentation of ultrasound images of thyroid nodule for assisting fine needle aspiration cytology
The incidence of thyroid nodule is very high and generally increases with
age. Thyroid nodule may presage the emergence of thyroid cancer. The thyroid
nodule can be completely cured if detected early. Fine needle aspiration
cytology is a recognized early diagnosis method of thyroid nodule. There are
still some limitations in the fine needle aspiration cytology, and the
ultrasound diagnosis of thyroid nodule has become the first choice for
auxiliary examination of thyroid nodular disease. If we could combine medical
imaging technology and fine needle aspiration cytology, the diagnostic rate of
thyroid nodule would be improved significantly. The properties of ultrasound
degrade image quality, which makes it difficult for physicians to recognize
edges. Image segmentation based on graph theory has become a
research hotspot at present. Normalized cut (Ncut) is a representative one,
which is suitable for segmenting feature regions of medical images. However,
solving the normalized cut is itself a problem: it requires large memory
capacity and heavy computation of the weight matrix. It often produces
over-segmentation or under-segmentation, which makes the segmentation
inaccurate. The speckle noise in B-mode ultrasound images of thyroid tumors
degrades image quality. In light of this characteristic, we
combine the anisotropic diffusion model with the normalized cut in this paper.
Enhancement with the anisotropic diffusion model removes the noise in
the B-mode ultrasound image while preserving the important edges and local details.
This reduces the amount of computation in constructing the weight matrix of the
improved normalized cut and improves the accuracy of the final segmentation
results. The feasibility of the method is demonstrated by the experimental results.
Comment: 15 pages, 13figure
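The edge-preserving denoising step described in this abstract can be sketched as follows. The abstract does not say which anisotropic diffusion model is used, so the classic Perona-Malik scheme with an exponential conductance function is assumed here. The function name `perona_malik`, the parameters `kappa` and `step`, and the toy image are all hypothetical. The normalized cut stage is not shown.

```python
import math


def perona_malik(img, iterations=10, kappa=15.0, step=0.2):
    """Smooth a 2D grayscale image (list of lists of floats), keeping strong edges.

    Assumed Perona-Malik scheme: each pixel moves toward its 4 neighbours,
    weighted by a conductance exp(-(gradient / kappa)^2) that is close to 1
    for small differences (noise) and close to 0 for large ones (edges).
    `step` should stay at or below 0.25 for stability with 4 neighbours.
    """
    height, width = len(img), len(img[0])
    cur = [row[:] for row in img]
    for _ in range(iterations):
        nxt = [row[:] for row in cur]
        for y in range(height):
            for x in range(width):
                flux = 0.0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        grad = cur[ny][nx] - cur[y][x]
                        flux += math.exp(-(grad / kappa) ** 2) * grad
                nxt[y][x] = cur[y][x] + step * flux
        cur = nxt
    return cur


# Toy image: dark left half, bright right half, one noisy pixel on the left.
noisy = [[0.0, 0.0, 0.0, 100.0, 100.0, 100.0] for _ in range(5)]
noisy[2][1] = 8.0
smoothed = perona_malik(noisy)
print(round(smoothed[2][1], 2), round(smoothed[2][3] - smoothed[2][2], 2))
```

In this toy case the isolated noisy pixel is pulled toward its neighbours, while the large step between the two halves stays almost untouched. This is the property that would let a later graph-based segmentation work on a cleaner image.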
Outlier Detection Techniques For Wireless Sensor Networks: A Survey
In the field of wireless sensor networks, measurements that
significantly deviate from the normal pattern of sensed data are
considered outliers. The potential sources of outliers include
noise and errors, events, and malicious attacks on the network.
Traditional outlier detection techniques are not directly
applicable to wireless sensor networks due to the multivariate
nature of sensor data and specific requirements and limitations of
the wireless sensor networks. This survey provides a comprehensive
overview of existing outlier detection techniques specifically
developed for the wireless sensor networks. Additionally, it
presents a technique-based taxonomy and a decision tree to be used
as a guideline to select a technique suitable for the application
at hand based on characteristics such as data type, outlier type,
outlier degree
- …