MulGuisin, a Topological Network Finder and its Performance on Galaxy Clustering
We introduce a new clustering algorithm, MulGuisin (MGS), that can identify
distinct galaxy over-densities using topological information from the galaxy
distribution. This algorithm was first introduced in an LHC experiment as a Jet
Finder software, which looks for particles that clump together in close
proximity. The algorithm preferentially considers particles with high energies
and merges them only when they are closer than a certain distance to create a
jet. MGS shares some similarities with the minimum spanning tree (MST) since it
provides both clustering and network-based topology information. Also, similar
to the density-based spatial clustering of applications with noise (DBSCAN),
MGS uses the ranking or the local density of each particle to construct
clusters. In this paper, we compare the performance of clustering algorithms
using controlled data, realistic simulation data, and SDSS
observation data, and we demonstrate that our new algorithm finds networks most
efficiently and defines galaxy networks in a way that most closely resembles
human vision. Comment: 15 pages, 12 figures
New Density-Based Clustering Technique
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most popular algorithms for cluster analysis. It can discover clusters of arbitrary shape and separate out noise. However, the algorithm cannot choose its parameters according to the distribution of the dataset. It simply uses a global minimum number of points (MinPts) parameter, so the clustering result for a multi-density database is inaccurate. In addition, it takes too much time when used to cluster large databases. We try to solve these problems by integrating a grid-based technique with the use of representative points in our proposed density-based GMDBSCAN-UR clustering algorithm. In this research, we apply an unsupervised machine learning approach based on the DBSCAN algorithm. We propose a grid-based clustering technique to reduce the time complexity. The grid-based technique divides the data space into cells. A number of well-scattered points in each cell of the grid are chosen. These scattered points must capture the shape and extent of the dataset as a whole. Thus, our work adopts a middle ground between the centroid-based and the all-points extremes. Next, we treat all data in the same cell as one object, and all clustering operations are performed on this cell. We perform local clustering in each cell and merge the resulting clusters. We use a local MinPts for every cell in the grid to overcome DBSCAN's problem of undetermined clusters in multi-density datasets. This will improve the time complexity. The next step is assigning the points that were not chosen to the resulting clusters. Finally, we perform post-processing and noise elimination
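The grid step described in this abstract can be illustrated with a rough sketch. It is not the GMDBSCAN-UR implementation: the function names, the `fraction` and `floor` parameters, and the rule that scales MinPts with cell occupancy are all assumptions made for illustration, since the abstract does not say how the local MinPts is derived.

```python
from collections import defaultdict


def grid_cells(points, cell_size):
    """Assign 2-D points to square grid cells of side cell_size."""
    cells = defaultdict(list)
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].append((x, y))
    return cells


def local_min_pts(cells, fraction=0.5, floor=2):
    """Hypothetical rule: MinPts grows with the number of points in a cell.

    Dense cells get a stricter threshold and sparse cells a looser one,
    instead of one global MinPts for the whole dataset.
    """
    return {key: max(floor, int(len(pts) * fraction)) for key, pts in cells.items()}
```

A per-cell threshold of this kind is one simple way to let dense and sparse regions be clustered with different settings; the paper's actual rule may differ.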
encephalitis in Florida
Background: Eastern Equine Encephalitis virus (EEEV) is an alphavirus with high pathogenicity in both humans and horses. Florida continues to have the highest occurrence of human cases in the USA, with four fatalities recorded in 2010. Unlike other states, Florida supports year-round EEEV transmission. This research uses GIS to examine spatial patterns of documented horse cases during 2005–2010 in order to understand the relationships between habitat and transmission intensity of EEEV in Florida. Methods: Cumulative incidence rates of EEE in horses were calculated for each county. Two cluster analyses were performed using density-based spatial clustering of applications with noise (DBSCAN). The first analysis was based on regional clustering while the second focused on local clustering. Ecological associations of EEEV were examined using compositional analysis and Euclidean distance analysis to determine if the proportion or proximity of certain habitats played a role in transmission. Results: The DBSCAN algorithm identified five distinct regional spatial clusters that contained 360 of the 438 horse cases. The local clustering resulted in 18 separate clusters containing 105 of the 438 cases. Both the compositional analysis and Euclidean distance analysis indicated that the top five habitats positively associated with horse cases were rural residential areas, crop and pastureland, upland hardwood forests, vegetated non-forested wetlands, an
A Generalized Density-Based Algorithm for the Spatiotemporal Tracking of Drought Events
Drought events evolve simultaneously in space and time; hence, a proper characterization of an event requires the tracking of its full spatiotemporal evolution. Here we present a generalized algorithm for the tracking of drought events based on a three-dimensional application of the DBSCAN (density-based spatial clustering of applications with noise) clustering approach. The need for a generalized and flexible algorithm is dictated by the absence of a unanimous consensus on the definition of a drought event, which often depends on the target of the study. The proposed methodology introduces a set of six parameters that control both the spatial and the temporal connectivity between cells under drought conditions, also accounting for the local intensity of the drought itself. The capability of the algorithm to adapt to different drought definitions is tested successfully over a study case in Australia in the period 2017–20 using a set of standardized precipitation index (SPI) data derived from the ERA5 precipitation reanalysis. Insights on the possible range of variability of the model parameters, as well as on their effects on the delineation of drought events, are provided for the case of meteorological droughts in order to incentivize further applications of the methodology
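The idea of applying DBSCAN in three dimensions (two spatial, one temporal) can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it does not reproduce the six parameters mentioned in the abstract, it ignores drought intensity, and the names `track_events`, `r_space`, `r_time`, and `min_neighbors` are hypothetical.

```python
from collections import deque


def track_events(cells, r_space=1, r_time=1, min_neighbors=2):
    """Group drought cells (x, y, t) on an integer grid into events.

    Two cells are neighbors if they differ by at most r_space in x and in y
    and by at most r_time in t. A cell is a core cell if it has at least
    min_neighbors neighbors (not counting itself).
    """
    cells = set(cells)

    def neighbors(c):
        x, y, t = c
        return [
            (x + dx, y + dy, t + dt)
            for dx in range(-r_space, r_space + 1)
            for dy in range(-r_space, r_space + 1)
            for dt in range(-r_time, r_time + 1)
            if (dx, dy, dt) != (0, 0, 0) and (x + dx, y + dy, t + dt) in cells
        ]

    labels = {}
    event = 0
    for start in sorted(cells):
        # Only an unlabeled core cell can seed a new event.
        if start in labels or len(neighbors(start)) < min_neighbors:
            continue
        labels[start] = event
        queue = deque([start])
        while queue:
            c = queue.popleft()
            for n in neighbors(c):
                if n not in labels:
                    labels[n] = event
                    # Expand only through core cells; border cells stop here.
                    if len(neighbors(n)) >= min_neighbors:
                        queue.append(n)
        event += 1
    return labels  # cells missing from the dict are treated as noise
```

Keeping separate spatial and temporal radii lets the connectivity in time be tuned independently of the connectivity in space, which is the kind of flexibility the abstract argues for.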
ADCN: An Anisotropic Density-Based Clustering Algorithm for Discovering Spatial Point Patterns with Noise
Density-based clustering algorithms such as DBSCAN have been widely used for spatial knowledge discovery as they offer several key advantages compared to other clustering algorithms. They can discover clusters with arbitrary shapes, are robust to noise, and do not require prior knowledge (or estimation) of the number of clusters. The idea of using a scan circle centered at each point with a search radius Eps to find at least MinPts points as a criterion for deriving local density is easily understandable and sufficient for exploring isotropic spatial point patterns. However, there are many cases that cannot be adequately captured this way, particularly if they involve linear features or shapes with a continuously changing density such as a spiral. In such cases, DBSCAN tends to either create an increasing number of small clusters or add noise points into large clusters. Therefore, in this paper, we propose a novel anisotropic density-based clustering algorithm (ADCN). To motivate our work, we introduce synthetic and real-world cases that cannot be sufficiently handled by DBSCAN (and OPTICS). We then present our clustering algorithm and test it with a wide range of cases. We demonstrate that our algorithm can perform as well as DBSCAN in cases that do not explicitly benefit from an anisotropic perspective and that it outperforms DBSCAN in cases that do. We show that our approach has the same time complexity as DBSCAN and OPTICS, namely O(n log n) when using a spatial index and O(n^2) otherwise. We provide an implementation and test the runtime over multiple cases. Finally, we apply DBSCAN, OPTICS, and ADCN to the task of extracting urban areas of interest (AOI) from geotagged photos in six cities. Visual comparison shows that, compared to DBSCAN and OPTICS, ADCN is inclined to extract AOIs with linear shapes which follow the underlying road networks. ADCN also turns out to connect clusters when their spatial distributions show similar directions
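The scan-circle criterion with Eps and MinPts described in this abstract is the isotropic DBSCAN baseline, which can be sketched in a few lines. This is a minimal illustration of plain DBSCAN, not of ADCN; the names `dbscan`, `eps`, and `min_pts` are chosen here, and the brute-force neighbor search corresponds to the O(n^2) case without a spatial index.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Scan circle of radius eps centered at point i (includes i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels
```

Because the neighborhood is a circle, the same radius applies in every direction; that is the property the anisotropic approach relaxes for linear or spiral patterns.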
Scaling DBSCAN-like algorithms for event detection systems in Twitter
The increasing use of mobile social networks has lately transformed news media. Real-world events are nowadays reported in social networks much faster than in traditional channels. As a result, the autonomous detection of events from networks like Twitter has gained a lot of interest in both research and media groups. DBSCAN-like algorithms constitute a well-known clustering approach to retrospective event detection. However, scaling such algorithms to geographically large regions and temporally long periods presents two major shortcomings. First, detecting real-world events from the vast number of tweets can no longer be performed on a single machine. Second, the tweeting activity varies a lot within these broad space-time regions, limiting the use of global parameters. Against this background, we propose to scale DBSCAN-like event detection techniques by parallelizing and distributing them through a novel density-aware MapReduce scheme. The proposed scheme partitions tweet data according to its spatial and temporal features and tailors local DBSCAN parameters to local tweet densities. We implement the scheme in Apache Spark and evaluate its performance on a dataset composed of geo-located tweets in the Iberian Peninsula during the course of several football matches. The results point to the benefits of our proposal over other state-of-the-art techniques in terms of speed-up and detection accuracy. Peer Reviewed. Postprint (author's final draft)