
    MulGuisin, a Topological Network Finder and its Performance on Galaxy Clustering

    We introduce a new clustering algorithm, MulGuisin (MGS), that can identify distinct galaxy over-densities using topological information from the galaxy distribution. The algorithm was first introduced in an LHC experiment as jet-finder software, which looks for particles that clump together in close proximity. It preferentially considers particles with high energies and merges them into a jet only when they are closer than a certain distance. MGS shares some similarities with the minimum spanning tree (MST), since it provides both clustering and network-based topology information. Also, like density-based spatial clustering of applications with noise (DBSCAN), MGS uses the ranking or the local density of each particle to construct clusters. In this paper, we compare the performance of clustering algorithms on controlled data, on realistic simulation data, and on SDSS observation data, and we demonstrate that our new algorithm finds networks most efficiently and defines galaxy networks in a way that most closely resembles human vision. Comment: 15 pages, 12 figures
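The abstract describes the core idea only loosely: process particles in order of decreasing energy and merge them when they are closer than a set distance. The sketch below is one possible reading of that description, not the published MGS algorithm. The function name `rank_ordered_groups`, the `link_length` parameter, and the nearest-neighbour attachment rule are all assumptions made for illustration.

```python
from math import dist

def rank_ordered_groups(points, weights, link_length):
    """Illustrative rank-ordered merging (an assumption, not the published MGS).

    Points are visited from highest to lowest weight. Each point attaches to
    the nearest already-visited point within link_length, or starts a new
    group if none is close enough. Returns (group, parent) dictionaries keyed
    by point index; parent links form a tree inside each group.
    """
    order = sorted(range(len(points)), key=lambda i: -weights[i])
    group = {}   # point index -> group id
    parent = {}  # point index -> index of the point it attached to (or None)
    n_groups = 0
    for i in order:
        best = None  # (distance, index) of the nearest visited point in range
        for j in group:
            d = dist(points[i], points[j])
            if d <= link_length and (best is None or d < best[0]):
                best = (d, j)
        if best is None:
            group[i] = n_groups
            parent[i] = None
            n_groups += 1
        else:
            group[i] = group[best[1]]
            parent[i] = best[1]
    return group, parent
```

Because every point records the neighbour it attached to, the output carries both a cluster label and a network (tree) structure, which is the property the abstract compares with the minimum spanning tree.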

    New Density-Based Clustering Technique

    Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most popular algorithms for cluster analysis. It can discover clusters of arbitrary shape and separate out noise. However, the algorithm cannot choose its parameters according to the distribution of the dataset: it simply uses a global minimum number of points (MinPts), so the clustering result for a multi-density database is inaccurate. In addition, it takes too much time when used to cluster large databases. We try to solve these problems by integrating a grid-based technique with the use of representative points in our proposed density-based clustering algorithm, GMDBSCAN-UR. In this research, we apply an unsupervised machine learning approach based on the DBSCAN algorithm. We propose a grid-based clustering technique to reduce the time complexity. The grid-based technique divides the data space into cells, and a number of well-scattered points are chosen in each cell of the grid. These scattered points must capture the shape and extent of the dataset as a whole. Our work thus adopts a middle ground between the centroid-based and the all-points extremes. Next, we treat all data in the same cell as one object, and all clustering operations are performed on this cell. We cluster locally within each cell and then merge the resulting clusters. We use a local MinPts for every cell in the grid to overcome the problem of undetermined clusters that arises when DBSCAN is applied to multi-density datasets. This improves the time complexity. The next step is to assign the points that were not chosen to the resulting clusters. Finally, we perform post-processing and noise elimination.
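For reference, the baseline that this abstract sets out to improve can be sketched in a few lines. The code below is a minimal, brute-force version of standard DBSCAN with a single global `eps` and `min_pts`; it is not GMDBSCAN-UR, and it omits the grid partitioning, representative points, and per-cell MinPts the abstract describes. The function names are illustrative.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks unclustered points."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels
```

Every call to `region_query` scans the whole dataset, so this version is quadratic in the number of points, which is the cost that grid-based variants aim to reduce.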

    encephalitis in Florida

    Background: Eastern Equine Encephalitis virus (EEEV) is an alphavirus with high pathogenicity in both humans and horses. Florida continues to have the highest occurrence of human cases in the USA, with four fatalities recorded in 2010. Unlike other states, Florida supports year-round EEEV transmission. This research uses GIS to examine spatial patterns of documented horse cases during 2005–2010 in order to understand the relationships between habitat and transmission intensity of EEEV in Florida. Methods: Cumulative incidence rates of EEE in horses were calculated for each county. Two cluster analyses were performed using density-based spatial clustering of applications with noise (DBSCAN). The first analysis was based on regional clustering while the second focused on local clustering. Ecological associations of EEEV were examined using compositional analysis and Euclidean distance analysis to determine if the proportion or proximity of certain habitats played a role in transmission. Results: The DBSCAN algorithm identified five distinct regional spatial clusters that contained 360 of the 438 horse cases. The local clustering resulted in 18 separate clusters containing 105 of the 438 cases. Both the compositional analysis and Euclidean distance analysis indicated that the top five habitats positively associated with horse cases were rural residential areas, crop and pastureland, upland hardwood forests, vegetated non-forested wetlands, an

    A Generalized Density-Based Algorithm for the Spatiotemporal Tracking of Drought Events

    Drought events evolve simultaneously in space and time; hence, a proper characterization of an event requires the tracking of its full spatiotemporal evolution. Here we present a generalized algorithm for the tracking of drought events based on a three-dimensional application of the DBSCAN (density-based spatial clustering of applications with noise) clustering approach. The need for a generalized and flexible algorithm is dictated by the absence of a unanimous consensus on the definition of a drought event, which often depends on the target of the study. The proposed methodology introduces a set of six parameters that control both the spatial and the temporal connectivity between cells under drought conditions, also accounting for the local intensity of the drought itself. The capability of the algorithm to adapt to different drought definitions is tested successfully over a case study in Australia in the period 2017-20 using a set of standardized precipitation index (SPI) data derived from the ERA5 precipitation reanalysis. Insights on the possible range of variability of the model parameters, as well as on their effects on the delineation of drought events, are provided for the case of meteorological droughts in order to incentivize further applications of the methodology.
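A three-dimensional use of DBSCAN on gridded data can be illustrated with a small sketch. It assumes drought cells are given as integer `(t, y, x)` tuples and uses only three illustrative parameters (`r_space`, `r_time`, `min_cells`); the six parameters of the published method are not specified in the abstract and are not reproduced here.

```python
from collections import deque

def track_events(cells, r_space=1, r_time=1, min_cells=3):
    """Group drought cells (t, y, x) into events with a DBSCAN-style expansion.

    Two cells are neighbours if they differ by at most r_time steps in time
    and at most r_space grid cells in each spatial direction. A cell is a
    core cell if its neighbourhood (itself included) holds at least min_cells
    drought cells. Returns a dict cell -> event id; cells that no core cell
    reaches are treated as noise and left out.
    """
    cell_set = set(cells)

    def neighbors(c):
        t, y, x = c
        return [
            (t + dt, y + dy, x + dx)
            for dt in range(-r_time, r_time + 1)
            for dy in range(-r_space, r_space + 1)
            for dx in range(-r_space, r_space + 1)
            if (t + dt, y + dy, x + dx) in cell_set
        ]

    events = {}
    next_id = 0
    for seed in sorted(cell_set):
        if seed in events or len(neighbors(seed)) < min_cells:
            continue  # already assigned, or not a core cell
        events[seed] = next_id
        queue = deque([seed])  # holds core cells only
        while queue:
            cur = queue.popleft()
            for nb in neighbors(cur):
                if nb in events:
                    continue
                events[nb] = next_id
                if len(neighbors(nb)) >= min_cells:
                    queue.append(nb)
        next_id += 1
    return events
```

Keeping the spatial and temporal radii separate lets the same routine express different notions of connectivity, for example allowing a gap of several time steps while still requiring spatial adjacency.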

    Scaling DBSCAN-like algorithms for event detection systems in Twitter

    The increasing use of mobile social networks has lately transformed news media. Real-world events are nowadays reported in social networks much faster than in traditional channels. As a result, the autonomous detection of events from networks like Twitter has gained a lot of interest in both research and media groups. DBSCAN-like algorithms constitute a well-known clustering approach to retrospective event detection. However, scaling such algorithms to geographically large regions and temporally long periods presents two major shortcomings. First, detecting real-world events from the vast number of tweets can no longer be performed on a single machine. Second, tweeting activity varies a lot within these broad space-time regions, limiting the use of global parameters. Against this background, we propose to scale DBSCAN-like event detection techniques by parallelizing and distributing them through a novel density-aware MapReduce scheme. The proposed scheme partitions tweet data according to its spatial and temporal features and tailors local DBSCAN parameters to local tweet densities. We implement the scheme in Apache Spark and evaluate its performance on a dataset composed of geo-located tweets in the Iberian peninsula during the course of several football matches. The results point to the benefits of our proposal over other state-of-the-art techniques in terms of speed-up and detection accuracy. Peer Reviewed. Postprint (author's final draft)