2,129 research outputs found

A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

Authors: Havinga P.J.M.; Meratnia N.; Zhang Yang
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007

The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. The task of outlier detection is to identify such outliers in order to improve the analysis of data and to discover interesting and useful knowledge about unusual events across numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique based on the data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.

University of Twente Research Information

Graph Sample and Hold: A Framework for Big-Graph Analytics

Authors: Ahmed Nesreen K.; Duffield Nick; Kompella Ramana; Neville Jennifer
Publication date: 16/03/2014

Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g., web graphs and social networks), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes used to estimate certain graph properties (e.g., triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics, called Graph Sample and Hold (gSH). To begin, the proposed framework samples from massive graphs sequentially in a single pass, one edge at a time, while maintaining a small state. We then show how to produce unbiased estimators for various graph properties from the sample. Given that the graph analysis algorithms will run on a sample instead of the whole population, the runtime complexity of these algorithms is kept under control. Moreover, given that the estimators of graph properties are unbiased, the approximation error is kept under control. Finally, we show the performance of the proposed framework (gSH) on various types of graphs, such as social graphs, among others.
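
The single-pass, small-state sampling described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-probability rule (an arriving edge is kept with probability q if it shares an endpoint with an already sampled edge, and with probability p otherwise) and shows only the simplest estimator, an inverse-probability (Horvitz-Thompson style) estimate of the total edge count. The names sample_and_hold and estimate_edge_count are hypothetical.

```python
import random


def sample_and_hold(edge_stream, p, q, rng=None):
    """One pass over the stream; returns sampled edges with their sampling probability.

    Assumed rule (not stated in the abstract): use q when the arriving edge
    touches a node of a previously sampled edge, otherwise use p.
    """
    rng = rng or random.Random()
    sampled = []        # list of (u, v, probability used when the edge arrived)
    held_nodes = set()  # endpoints of all edges sampled so far (the small state)
    for u, v in edge_stream:
        prob = q if (u in held_nodes or v in held_nodes) else p
        if rng.random() < prob:
            sampled.append((u, v, prob))
            held_nodes.update((u, v))
    return sampled


def estimate_edge_count(sampled):
    """Inverse-probability estimate of the number of edges in the full stream."""
    return sum(1.0 / prob for _, _, prob in sampled)
```

With p = q = 1 every edge is kept and the estimate equals the true edge count; smaller probabilities trade accuracy for a smaller sample.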

arXiv.org e-Print Archive

CiteSeerX

Mapping crime: Understanding Hotspots

Authors: Cameron J; Chainey S; Eck J; Wilson R
Publication venue: National Institute of Justice
Publication date: 01/08/2005

UCL Discovery

Hybrid group anomaly detection for sequence data: application to trajectory data analytics

Authors: Belhadi Asma; Cano Alberto; Djenouri Youcef; Lin Jerry Chun-Wei; Srivastava Gautam
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2021

Many research areas depend on group anomaly detection. The use of group anomaly detection can maintain and provide security and privacy to the data involved. This research attempts to address a deficiency of the existing literature on outlier detection; to that end, a novel hybrid framework for detecting group anomalies in sequence data is proposed in this paper. It proposes two approaches for efficiently solving this problem: i) a hybrid data mining-based algorithm, which consists of three main phases: first, a clustering algorithm is applied to derive the micro-clusters; second, the kNN algorithm is applied to each micro-cluster to compute the candidate group outliers; third, a pattern mining framework is applied to the candidate group outliers as a pruning strategy to generate the groups of outliers; and ii) a GPU-based approach, which benefits from massively parallel GPU computing to reduce the runtime of the hybrid data mining-based algorithm. Extensive experiments were conducted on different sequence databases to show the advantages of our proposed model. Results clearly show the efficiency of the GPU approach when directly compared to a sequential approach, reaching a speedup of 451. In addition, both approaches outperform the baseline methods for group detection.
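
As a rough illustration of the second phase named in this abstract (kNN applied inside one micro-cluster to nominate outlier candidates), the sketch below scores each point by the distance to its k-th nearest neighbour and returns the highest-scoring indices. It is an assumption-laden simplification, not the paper's algorithm: points are treated as fixed-length numeric vectors with Euclidean distance, whereas the paper works on sequence data, and the clustering and pattern-mining phases are omitted. The function names are hypothetical.

```python
import math


def knn_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbour in the cluster."""
    if len(points) <= k:
        raise ValueError("need more than k points in the micro-cluster")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, other) for j, other in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def outlier_candidates(points, k, top_n):
    """Indices of the top_n points with the largest k-th neighbour distance."""
    scores = knn_scores(points, k)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:top_n]
```

For example, in a micro-cluster of four points near the origin plus one point at (10, 10), the distant point receives the largest score and is returned as the only candidate when top_n is 1.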

SINTEF Open

Kristiania Open Archive

NORA - Norwegian Open Research Archives

A Spatiotemporal analysis to identify Naturally Occurring Retirement Communities in Nebraska

Author: Lee Sangho
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/05/2018

This study aims to identify the geographic locations of “naturally occurring retirement communities (NORCs)” and to determine whether there were spatiotemporal patterns of NORCs in Nebraska over the periods from 2000 to 2010 and on to 2015. As the American population continues to age, older people generally prefer to live in their own homes in their later years, instead of moving into assisted living. These preferences have resulted in an increase in elderly populations who are “aging in place”. Nevertheless, there have been few spatiotemporal analyses of the distribution patterns of elderly households in terms of NORCs for the state of Nebraska. In this study, the entire area within the state’s boundaries was subdivided into block groups and the spatial statistics of demographic patterns were analyzed over time. U.S. Census data from 2000, 2010, and 2015 were aggregated by block group, including the total number of households and the proportion of households (owners/renters) in Nebraska. Three analyses were conducted on the data. First, geovisualization with ArcGIS 10.4 was used to visually investigate the distribution of and changes in NORCs from 2000 to 2010 and on to 2015. Second, Global Moran’s I was used to quantify the spatial relationship of NORCs in Nebraska. Third, local spatial statistics (Local Moran’s I and G-statistics) were used to identify clusters between NORCs and other block groups. Over the past 15 years, the proportion of elderly households in Nebraska has steadily increased, and the rate of increase has risen sharply over the most recent five years, as of 2015. As a result, the number of NORCs has also increased, and 47 of the total NORCs (57.3%) were classified as the aging in place type of NORCs. In addition, block groups with similar proportions of households have clustered spatially together or formed hot-spots.
This study contributes to understanding the concept of NORCs as it relates to residents “aging in place” and to policy makers. Local government should take appropriate steps to prepare for the super aging society by rearranging and integrating given resources as much as possible. By taking full advantage of the results of this study, the government should develop community-based policies to support older residents aging in place. Because of the population density and proximity of older residents in NORCs, economies of scale make it possible to rethink how services are organized and delivered, giving the opportunity to make our communities better for retired seniors. Advisor: Yunwoo Na
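
The Global Moran's I statistic mentioned in this abstract has a standard definition, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all spatial weights. The sketch below computes it in plain Python. The weight matrix here is a toy adjacency matrix chosen for illustration; the study's actual weights, data, and ArcGIS settings are not given in the abstract and are not reproduced. The function name global_morans_i is hypothetical.

```python
def global_morans_i(values, weights):
    """Global Moran's I for values x_i and a square spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations over all ordered pairs (i, j).
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Toy example: four areas in a row, each adjacent only to its immediate neighbours.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = global_morans_i([1, 2, 3, 4], line_weights)         # similar neighbours
alternating = global_morans_i([1, -1, 1, -1], line_weights)  # dissimilar neighbours
```

Positive values indicate that neighbouring areas tend to have similar values (clustering), and negative values indicate that neighbours tend to differ.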

DigitalCommons@University of Nebraska

Reflecting Human Knowledge of Place and Route-Choice Behavior Using Big Data

Author: Chen Jiaoli
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2017

Exploring human knowledge of geographical space and related behavior not only helps in understanding human-environment interactions and dynamic geographic processes, but also advances Geographic Information Systems (GIS) toward a human-centric paradigm to make daily life more efficient. Today’s relatively easy acquisition of various big data provides an unprecedented opportunity for geographers to answer research questions that previously could not be adequately addressed. However, new challenges also arise regarding data quality and bias, as well as changes in methodology for dealing with big data, which differ from traditional data types. Representing people’s perception of place and studying drivers’ route-choice behavior are two of the many applications of big data in answering research questions about human knowledge and behavior in the fields of GIS and transportation. Incorporating three papers, this dissertation focuses on these two different applications to achieve the following objectives: 1) examine the degree to which a geographic place’s spatial extent can be estimated from human-generated geotagged photos; 2) address the challenge of geotagged photos’ uneven spatial distribution in place estimation and explore an approach that can better derive a place’s spatial extent; 3) develop a method that can properly estimate the spatial extent of a place that has multiple disjoint regions while considering geotagged photos’ uneven distribution; 4) explore useful spatiotemporal patterns of taxi drivers’ route-choice behavior in a dynamic urban environment.
This dissertation makes three major contributions to the systematic theory of big data applications: 1) it proposes an effective approach to handling the uneven spatial distribution problem of geotagged photos as a type of volunteered geographic data by modeling their representativeness; 2) it develops methods that can properly derive the vague spatial extent of a place with or without disjoint regions; and 3) it explores taxi drivers’ route-choice patterns in different situations that can inform future transportation decisions and policy-making processes.

University of Tennessee, Knoxville: Trace