    On Efficiently Detecting Overlapping Communities over Distributed Dynamic Graphs

    Modern networks are huge and highly dynamic, which challenges the efficiency of community detection algorithms. In this paper, we study the problem of overlapping community detection on distributed and dynamic graphs. Given a distributed, undirected, and unweighted graph, the goal is to detect overlapping communities incrementally as the graph changes. We propose an efficient algorithm, called the randomized Speaker-Listener Label Propagation Algorithm (rSLPA), which builds on the Speaker-Listener Label Propagation Algorithm (SLPA) by relaxing the probability distribution of label propagation. Besides detecting high-quality communities, rSLPA can incrementally update the detected communities after a batch of edge insertions and deletions. To the best of our knowledge, rSLPA is the first algorithm that can incrementally capture the same communities as those obtained by applying the detection algorithm from scratch on the updated graph. Extensive experiments on both synthetic and real-world datasets show that our algorithm achieves high accuracy and efficiency at the same time. Comment: A short version of this paper will be published as an ICDE'2018 poster.
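    For readers unfamiliar with SLPA, the following Python sketch shows the classic speaker and listener rules that rSLPA builds on: each speaker sends a label drawn in proportion to its frequency in the speaker's memory, and each listener stores the most popular label it hears. It is not rSLPA; it has no distributed execution, no relaxed distribution, and no incremental updates. The function name `slpa` and the parameters `iterations`, `r`, and `seed` are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def slpa(adj, iterations=20, r=0.1, seed=0):
    """Classic (non-incremental) SLPA sketch, not the paper's rSLPA.

    adj: dict mapping node -> list of neighbours of an undirected graph.
    r: post-processing threshold; a label is kept for a node only if it makes
       up at least a fraction r of that node's memory.
    Returns a dict mapping label -> set of nodes; communities may overlap.
    """
    rng = random.Random(seed)
    memory = {v: [v] for v in adj}  # every node starts with its own id as a label
    nodes = list(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for listener in nodes:
            if not adj[listener]:
                continue
            # Speaker rule: picking uniformly from the memory list draws a label
            # with probability proportional to its frequency in that memory.
            received = [rng.choice(memory[s]) for s in adj[listener]]
            # Listener rule: store the most frequent received label (random tie-break).
            counts = Counter(received)
            best = max(counts.values())
            winners = [label for label, c in counts.items() if c == best]
            memory[listener].append(rng.choice(winners))
    communities = defaultdict(set)
    for v, labels in memory.items():
        for label, c in Counter(labels).items():
            if c / len(labels) >= r:
                communities[label].add(v)
    return dict(communities)
```

    A node that keeps more than one label after thresholding belongs to several communities, which is how SLPA expresses overlap.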

    Low-latency, query-driven analytics over voluminous multidimensional, spatiotemporal datasets

    2017 Summer. Includes bibliographical references. Ubiquitous data collection from sources such as remote sensing equipment, networked observational devices, location-based services, and sales tracking has led to the accumulation of voluminous datasets; IDC projects that by 2020 we will generate 40 zettabytes of data per year, while Gartner and ABI estimate that 20-35 billion new devices will be connected to the Internet in the same time frame. The storage and processing requirements of these datasets far exceed the capabilities of modern computing hardware, which has led to the development of distributed storage frameworks that can scale out by assimilating more computing resources as necessary. While challenging in its own right, storing and managing voluminous datasets is only the precursor to a broader field of study: extracting knowledge, insights, and relationships from the underlying datasets. Analytic queries, encompassing both query instrumentation and evaluation, are the basic building block of this knowledge discovery process. This dissertation centers on query-driven exploratory and predictive analytics over voluminous, multidimensional datasets. Both types of analysis represent a higher-level abstraction over classical query models: rather than indexing every discrete value for subsequent retrieval, our framework autonomously learns the relationships and interactions between dimensions in the dataset (including time-series and geospatial aspects) and makes this information readily available to users. This functionality includes statistical synopses, correlation analysis, hypothesis testing, probabilistic structures, and predictive models that not only enable the discovery of nuanced relationships between dimensions but also allow future events and trends to be predicted.
This requires specialized data structures and partitioning algorithms, along with adaptive reductions in the search space and management of the inherent trade-off between timeliness and accuracy. The algorithms presented in this dissertation were evaluated empirically on real-world geospatial time-series datasets in a production environment and are broadly applicable to other storage frameworks.
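    As a generic illustration of a statistical synopsis (the abstract does not say which structures the dissertation uses), the Python sketch below keeps a constant-size, mergeable summary of one numeric dimension. It uses Welford's online update and the standard pairwise merge for mean and variance, so summaries built on separate storage nodes could be combined. The class `RunningSynopsis` and its fields are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningSynopsis:
    """Hypothetical mergeable summary of one numeric dimension."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0          # running sum of squared deviations from the mean
    lo: float = math.inf
    hi: float = -math.inf

    def add(self, x: float) -> None:
        # Welford's online update: numerically stable and O(1) memory.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)

    def merge(self, other: "RunningSynopsis") -> "RunningSynopsis":
        # Pairwise combination (Chan et al.), e.g. for per-partition synopses.
        n = self.n + other.n
        if n == 0:
            return RunningSynopsis()
        delta = other.mean - self.mean
        return RunningSynopsis(
            n=n,
            mean=self.mean + delta * other.n / n,
            m2=self.m2 + other.m2 + delta * delta * self.n * other.n / n,
            lo=min(self.lo, other.lo),
            hi=max(self.hi, other.hi),
        )

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0
```

    Because each synopsis has a fixed size, answering a query over many partitions costs one merge per partition rather than a scan of the raw records.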

    Applied deep learning in intelligent transportation systems and embedding exploration

    Deep learning techniques have achieved tremendous success in many real-world applications in recent years and have shown great potential in many areas, including transportation. Although transportation is increasingly indispensable in people’s daily lives, related problems such as traffic congestion and energy waste remain unsolved, and some have become even more critical. This dissertation uses deep learning to address three fundamental problems in the transportation field: (1) passenger demand prediction, (2) transportation mode detection, and (3) traffic light control. It also extends deep learning to an embedding system for visualization and data retrieval. The first part of this dissertation presents a Spatio-TEmporal Fuzzy neural Network (STEF-Net), which accurately predicts passenger demand by incorporating the complex interactions among all known important factors, such as temporal, spatial, and external information. Specifically, a convolutional long short-term memory network is employed to simultaneously capture spatio-temporal feature interactions, and a fuzzy neural network models external factors. A novel feature fusion method with convolution and an attention layer is proposed to preserve the temporal relation and the discriminative spatio-temporal feature interactions. Experiments on a large-scale real-world dataset show that the proposed model outperforms state-of-the-art approaches. The second part is a lightweight, energy-efficient system that detects transportation modes using only smartphone accelerometers. Understanding people’s transportation modes benefits many civilian applications, such as urban transportation planning. The system collects accelerometer data efficiently and uses a convolutional neural network to determine transportation modes.
Different architectures and classification methods are tested with the proposed convolutional neural network to optimize the system design. Performance evaluation shows that the proposed approach detects people’s transportation modes more accurately than existing work. The third component of this dissertation is a deep reinforcement learning model, based on Q-learning, for traffic light control. Inefficient existing traffic light control causes numerous problems, such as long delays and wasted energy. In the proposed model, the complex traffic scenario is quantified as states by collecting data and dividing the whole intersection into grids. The actions are changes to the traffic light timing, and the control problem is modeled as a high-dimensional Markov decision process. The reward is the difference in cumulative waiting time between two cycles. To solve the model, a convolutional neural network maps states to expected cumulative rewards (Q-values) and is further improved with several components: a dueling network, a target network, double Q-learning, and prioritized experience replay. Simulation results in Simulation of Urban MObility (SUMO) demonstrate the efficiency of the proposed model in controlling traffic lights. The last part of this dissertation studies the hierarchical structure in an embedding system. Traditional embedding approaches associate a real-valued embedding vector with each symbol or data point, which yields a storage-inefficient representation and fails to encode the internal semantic structure of the data effectively. A regularized autoencoder framework is proposed to learn compact Hierarchical K-way D-dimensional (HKD) discrete embeddings of data points that capture the semantic structure of the data.
Experimental results on synthetic and real-world datasets show that the proposed HKD embedding can effectively reveal the semantic structure of data via visualization and greatly reduce the search space of nearest-neighbor retrieval while preserving high accuracy.
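    To make the traffic-light reward and the double Q-learning component concrete, here is a minimal Python sketch. It assumes the reward is the previous cycle's cumulative waiting time minus the current cycle's (so shorter waits give a positive reward); that sign convention, the function names `cycle_reward` and `double_q_target`, and the default `gamma` are assumptions, not details from the dissertation. The target follows the standard double Q-learning rule: the online network chooses the next action and the target network scores it.

```python
from typing import Sequence


def cycle_reward(wait_prev_cycle: float, wait_this_cycle: float) -> float:
    # Assumed sign convention: positive when vehicles waited less this cycle.
    return wait_prev_cycle - wait_this_cycle


def double_q_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Double Q-learning regression target for one transition."""
    if done:
        return reward  # no bootstrapping from a terminal state
    # The online network selects the action...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...and the target network evaluates it, reducing overestimation bias.
    return reward + gamma * next_q_target[best_action]
```

    For example, with online values `[1.0, 5.0, 2.0]` and target values `[10.0, 4.0, 8.0]`, action 1 is selected and bootstrapped with 4.0 rather than the target network's maximum of 10.0.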

    Level-Set Based Artery-Vein Separation in Blood Pool Agent CE-MR Angiograms

    Blood pool agents (BPAs) for contrast-enhanced (CE) magnetic-resonance angiography (MRA) allow prolonged imaging times for higher contrast and resolution. Imaging is performed during the steady state, when the contrast agent is distributed throughout the complete vascular system. However, simultaneous venous and arterial enhancement in this steady state hampers interpretation. To improve visualization of the arteries and veins from steady-state BPA data, a semiautomated method for artery-vein separation is presented. In this method, the central arterial axis and central venous axis are used as initializations for two surfaces that evolve simultaneously to capture the arterial and venous parts of the vasculature using the level-set framework. Since arteries and veins can lie close to each other, leakage from the evolving arterial (venous) surface into the venous (arterial) part of the vasculature is inevitable. In these situations, voxels are labeled arterial or venous based on the arrival time of the respective surface. The evolution is steered by external forces related to feature images derived from the image data and by internal forces related to the geometry of the level sets. In this paper, the robustness and accuracy of three external forces (based on image intensity, image gradient, and vessel-enhancement filtering) and combinations of them are investigated and tested on seven patient datasets. To this end, the level-set-based segmentations are compared with reference-standard manual segmentations. The best results are achieved by combining intensity- and gradient-based forces with a smoothness constraint based on the curvature of the surface. Applying this combination to the seven datasets shows that, with minimal user interaction, artery-vein separation for improved arterial and venous visualization in BPA CE-MRA can be achieved.
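    The arrival-time rule can be illustrated with a deliberately simplified stand-in for the level-set evolution: two discrete fronts grow from arterial and venous seeds across a 2-D grid, Dijkstra-style, and each pixel takes the label of whichever front reaches it first. This is not the paper's level-set method; there are no curvature or gradient forces, and the per-pixel `speed` map (assumed to come from a feature image) and the function name `label_by_arrival` are hypothetical.

```python
import heapq
import math


def label_by_arrival(speed, artery_seeds, vein_seeds):
    """Label pixels by the first-arriving front ('A' arterial, 'V' venous).

    speed: 2-D list of non-negative propagation speeds; pixels with speed 0
           are never entered. Seeds are (row, col) tuples.
    Returns (labels, arrival): labels[i][j] is 'A', 'V' or None, and
    arrival[i][j] is the earliest front arrival time (math.inf if unreached).
    """
    rows, cols = len(speed), len(speed[0])
    arrival = [[math.inf] * cols for _ in range(rows)]
    labels = [[None] * cols for _ in range(rows)]
    heap = []
    for seeds, tag in ((artery_seeds, "A"), (vein_seeds, "V")):
        for i, j in seeds:
            arrival[i][j] = 0.0
            labels[i][j] = tag
            heapq.heappush(heap, (0.0, i, j))
    while heap:
        t, i, j = heapq.heappop(heap)
        if t > arrival[i][j]:
            continue  # stale entry: another front already reached this pixel sooner
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and speed[ni][nj] > 0:
                nt = t + 1.0 / speed[ni][nj]  # time to cross the neighbouring pixel
                if nt < arrival[ni][nj]:
                    arrival[ni][nj] = nt
                    labels[ni][nj] = labels[i][j]  # inherit the winning front's label
                    heapq.heappush(heap, (nt, ni, nj))
    return labels, arrival
```

    On a row with speeds `[1, 1, 1, 4, 4, 4]` and seeds at both ends, the faster venous front claims the middle pixel at time 1.5, before the arterial front arrives at time 2, which mirrors how leakage regions are resolved by arrival time.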

    Hyperspectral-Augmented Target Tracking

    The global war on terrorism has significantly changed the nature of military warfare. The United States Air Force is at the forefront of research and development in intelligence, surveillance, and reconnaissance, which provides American forces on the ground and in the air with the capability to seek, monitor, and destroy mobile terrorist targets in hostile territory. One such capability recognizes and persistently tracks multiple moving vehicles in complex, highly ambiguous urban environments. This thesis investigates the feasibility of augmenting a multiple-target tracking (MTT) system with hyperspectral imagery. The research evaluates hyperspectral data classification using the fuzzy c-means and self-organizing map clustering algorithms for remote identification of moving vehicles. Results show a 29.33% performance gain of hyperspectral-augmented tracking over the baseline kinematic-only tracking. A novel methodology integrates the hyperspectral observations into the MTT paradigm. Furthermore, several novel ideas are developed and implemented: spectral gating of hyperspectral observations, a cost function for hyperspectral observation-to-track association, and a self-organizing map filtering method. Relatively little work in the target tracking and hyperspectral image classification literature appears to address these areas. Finally, two hyperspectral sensor modes are evaluated: Pushbroom and Region-of-Interest. Both modes are based on realistic technologies, and investigating their performance is the goal of performance-driven sensing. A performance comparison of the two modes can drive the future design of hyperspectral sensors.
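    Fuzzy c-means, one of the two clustering algorithms named above, alternates between a weighted centre update and a membership update. The Python sketch below is the textbook algorithm applied to generic feature vectors (for example, pixel spectra); the fuzzifier `m=2.0`, the random initialization, the stopping rule, and the function name `fuzzy_c_means` are assumptions rather than the thesis's configuration.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on equal-length feature vectors.

    Returns (centers, u), where u[k][i] is the membership of point k in
    cluster i and each row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership**m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centers.append([sum(w[k] * points[k][d] for k in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        new_u = []
        for k in range(n):
            dists = [math.dist(points[k], ctr) for ctr in centers]
            if 0.0 in dists:  # point sits exactly on a centre: crisp membership
                j = dists.index(0.0)
                new_u.append([1.0 if i == j else 0.0 for i in range(c)])
                continue
            new_u.append([1.0 / sum((dists[i] / dists[j]) ** (2 / (m - 1))
                                    for j in range(c)) for i in range(c)])
        shift = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

    Unlike hard clustering, the soft memberships let a downstream tracker weigh how strongly a vehicle's spectrum supports each class, although how the thesis feeds them into track association is not described in the abstract.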

    Comprehensive Review on Detection and Classification of Power Quality Disturbances in Utility Grid With Renewable Energy Penetration

    Global concern with power quality is increasing due to the penetration of renewable energy (RE) sources to meet energy demands and decarbonization targets. Power quality (PQ) disturbances are more prevalent with RE penetration because of variable outputs and interfacing converters. PQ disturbances must be recognized and mitigated to supply clean power to consumers. This article presents a critical review of techniques used to detect and classify PQ disturbances in the utility grid with renewable energy penetration. The broad aim of this review is to present the various concepts used to extract features for detecting and classifying PQ disturbances, even in noisy environments. More than 220 research publications have been critically reviewed, classified, and listed for quick reference by engineers, scientists, and academics working in the power quality area.

    The path inference filter: model-based low-latency map matching of probe vehicle data

    We consider the problem of reconstructing vehicle trajectories from sparse sequences of GPS points with sampling intervals between 10 seconds and 2 minutes. We introduce a new class of algorithms, collectively called the path inference filter (PIF), that maps GPS data in real time, for a variety of trade-offs and scenarios, and with high throughput. Numerous prior map-matching approaches can be shown to be special cases of the path inference filter presented in this article. We present an efficient procedure for automatically training the filter on new data, with or without ground-truth observations. The framework is evaluated on a large San Francisco taxi dataset and is shown to improve upon the current state of the art. The filter also provides insights into drivers' driving patterns. The path inference filter has been deployed at industrial scale in the Mobile Millennium traffic information system and is used to map data from vehicle fleets in San Francisco, Sacramento, Stockholm, and Porto. Comment: Preprint, 23 pages and 23 figures.
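    Many of the prior map-matching approaches mentioned above are hidden Markov models decoded with the Viterbi algorithm: each GPS point has several candidate road positions, and consecutive candidates are linked by scored paths. The Python sketch below shows only that decoding step, with hand-written log-scores; it is not the path inference filter itself, which learns its scoring from data, and the names `viterbi_map_match`, `obs_logp`, and `trans_logp` are hypothetical.

```python
from typing import List, Sequence


def viterbi_map_match(
    obs_logp: Sequence[Sequence[float]],
    trans_logp: Sequence[Sequence[Sequence[float]]],
) -> List[int]:
    """Most likely sequence of candidate road positions (Viterbi decoding).

    obs_logp[t][i]: log-likelihood of GPS point t given candidate i.
    trans_logp[t][i][j]: log-score of the path from candidate i at point t
                         to candidate j at point t + 1.
    Returns the chosen candidate index for every GPS point.
    """
    score = list(obs_logp[0])
    back = []  # back[t - 1][j] = best predecessor of candidate j at point t
    for t in range(1, len(obs_logp)):
        new_score, ptr = [], []
        for j, obs in enumerate(obs_logp[t]):
            best_i = max(range(len(score)),
                         key=lambda i: score[i] + trans_logp[t - 1][i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + trans_logp[t - 1][best_i][j] + obs)
        score = new_score
        back.append(ptr)
    # Backtrack from the best final candidate.
    state = max(range(len(score)), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

    With observation scores `[[0, -5], [-1, -0.5], [0, -5]]` and transitions that penalize switching candidates by 3, the decoder returns `[0, 0, 0]` even though candidate 1 looks slightly better at the middle point in isolation; this is why map matchers decode whole sequences rather than snapping each GPS point greedily.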