
    Community detection and stochastic block models: recent developments

    The stochastic block model (SBM) is a random graph model with planted clusters. It is widely employed as a canonical model to study clustering and community detection, and generally provides fertile ground to study the statistical and computational tradeoffs that arise in network and data sciences. This note surveys the recent developments that establish the fundamental limits for community detection in the SBM, both with respect to information-theoretic and computational thresholds, and for various recovery requirements such as exact, partial, and weak recovery (a.k.a. detection). The main results discussed are the phase transition for exact recovery at the Chernoff-Hellinger threshold, the phase transition for weak recovery at the Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial recovery, the learning of the SBM parameters, and the gap between information-theoretic and computational thresholds. The note also covers some of the algorithms developed in the quest to achieve these limits, in particular two-round algorithms via graph-splitting, semi-definite programming, linearized belief propagation, and classical and nonbacktracking spectral methods. A few open problems are also discussed.
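    As a concrete illustration of the two thresholds named above, the sketch below samples a symmetric two-community SBM with networkx and evaluates the Kesten-Stigum condition (a - b)^2 > 2(a + b) for weak recovery and the Chernoff-Hellinger-type condition (sqrt(a) - sqrt(b))^2 > 2 for exact recovery. The constants a and b are illustrative choices, not taken from the survey.

```python
# Minimal sketch: sample a symmetric two-community SBM and check the
# weak- and exact-recovery thresholds discussed in the survey.
import math
import networkx as nx

n = 1000                      # number of vertices
a, b = 12.0, 2.0              # within/between connection constants (illustrative)

# Sparse regime: edge probabilities a/n (within) and b/n (between).
p = [[a / n, b / n],
     [b / n, a / n]]
G = nx.stochastic_block_model([n // 2, n // 2], p, seed=42)

# Kesten-Stigum condition for weak recovery (detection):
ks = (a - b) ** 2 > 2 * (a + b)
# Chernoff-Hellinger condition for exact recovery in the
# logarithmic-degree regime (probabilities a*log(n)/n, b*log(n)/n):
ch = (math.sqrt(a) - math.sqrt(b)) ** 2 > 2

print(f"weak recovery possible (KS): {ks}")
print(f"exact recovery possible (CH): {ch}")
```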

    Distributed Detection and Estimation in Wireless Sensor Networks

    Wireless sensor networks (WSNs) are typically formed by a large number of densely deployed, spatially distributed sensors with limited sensing, computing, and communication capabilities that cooperate with each other to achieve a common goal. In this dissertation, we investigate the problem of distributed detection, classification, estimation, and localization in WSNs. In this context, the sensors observe the conditions of their surrounding environment, locally process their noisy observations, and send the processed data to a central entity, known as the fusion center (FC), through parallel communication channels corrupted by fading and additive noise. The FC then combines the received information to make a global inference about the underlying phenomenon, which can be either the detection or classification of a discrete variable or the estimation of a continuous one.

    In the domain of distributed detection and classification, we propose a novel scheme that enables the FC to make a multi-hypothesis classification of an underlying hypothesis using only binary detections from spatially distributed sensors. This goal is achieved by exploiting the relationship between the influence fields characterizing different hypotheses and the accumulated noisy versions of the local binary decisions as received by the FC, where the influence field of a hypothesis is defined as the spatial region in its surroundings in which it can be sensed using some sensing modality.

    In the realm of distributed estimation and localization, we make four main contributions:
    (a) We formulate a general framework that estimates a vector of parameters associated with a deterministic function using spatially distributed noisy samples of the function, for both analog and digital local processing schemes.
    (b) We consider the estimation of a scalar random signal at the FC and derive an optimal power-allocation scheme that assigns the optimal local amplification gains to the sensors performing analog local processing. The objective of this optimized power allocation is to minimize the L2-norm of the vector of local transmission powers, given a maximum estimation distortion at the FC. We also propose a variant of this scheme that uses a limited-feedback strategy to eliminate the requirement of perfect feedback of the instantaneous channel fading coefficients from the FC to the local sensors through infinite-rate, error-free links.
    (c) We propose a linear spatial collaboration scheme in which sensors collaborate by sharing their local noisy observations. We derive the optimal set of coefficients used to form linear combinations of the shared noisy observations at the local sensors, minimizing the total estimation distortion at the FC given a constraint on the maximum average cumulative transmission power in the entire network.
    (d) Using a novel performance measure called the estimation outage, we analyze the effects of the spatial randomness of the sensor locations on the quality and performance of localization algorithms, considering an energy-based source-localization scheme under the assumption that the sensors are positioned according to a uniform clustering process.
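    A minimal numerical sketch of the analog amplify-and-forward setup described in contribution (b), with illustrative parameters (the dissertation's optimized power allocation is not reproduced here): sensors amplify noisy observations of a scalar Gaussian source, the FC receives them over fading channels, and fuses them with an LMMSE estimator.

```python
# Minimal sketch: analog amplify-and-forward estimation of a scalar
# Gaussian source over fading channels, with LMMSE fusion at the FC.
import numpy as np

rng = np.random.default_rng(0)
K = 20                                  # number of sensors
var_theta, var_obs, var_ch = 1.0, 0.5, 0.1

theta = rng.normal(0, np.sqrt(var_theta))       # source realization
h = rng.rayleigh(scale=1.0, size=K)             # channel fading coefficients
g = np.full(K, 1.0)                             # local amplification gains (uniform here)

# Sensor k transmits g[k]*(theta + obs_noise); FC receives it over channel h[k].
x = theta + rng.normal(0, np.sqrt(var_obs), K)
y = h * g * x + rng.normal(0, np.sqrt(var_ch), K)

# LMMSE fusion: y = c*theta + w, with diagonal noise covariance.
c = h * g
w_var = (h * g) ** 2 * var_obs + var_ch
theta_hat = var_theta * c @ np.linalg.solve(
    var_theta * np.outer(c, c) + np.diag(w_var), y)

print(f"theta = {theta:.3f}, estimate = {theta_hat:.3f}")
```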

    A generalized, parametric PR-QMF/wavelet transform design approach for multiresolution signal decomposition

    This dissertation aims to emphasize the interrelations and linkages between the theories of discrete-time filter banks and wavelet transforms. It is shown that the Binomial-QMF banks are identical to the interscale coefficients, or filters, of the compactly supported orthonormal wavelet transform bases proposed by Daubechies. A generalized, parametric, smooth 2-band PR-QMF design approach based on Bernstein polynomial approximation is developed. It is found that the most regular compactly supported orthonormal wavelet filters, the coiflet filters, are special cases of the proposed filter bank design technique. A new objective performance measure called the Non-aliasing Energy Ratio (NER) is developed. Its merits are demonstrated through comparative performance studies of the well-known orthonormal signal decomposition techniques. This dissertation also addresses the optimal 2-band PR-QMF design problem. The variables of practical significance in image processing and coding are included in the optimization problem. The upper performance bounds of 2-band PR-QMFs and their corresponding filter coefficients are derived. It is objectively shown that there are filter bank solutions superior to the standard block transform, the DCT. The theoretical contributions of this dissertation are expected to find applications particularly in visual signal processing and coding.
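    As a small numerical companion (not code from the dissertation), the snippet below verifies the two-band orthonormal PR-QMF conditions, unit energy and double-shift orthogonality, for the 4-tap Daubechies filter, which the abstract identifies with the Binomial-QMF.

```python
# Minimal check: the 4-tap Daubechies lowpass filter satisfies the
# two-band orthonormal PR-QMF conditions.
import numpy as np

s3 = np.sqrt(3.0)
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

# Highpass filter via the standard QMF relation h1[n] = (-1)^n h0[N-1-n].
n = np.arange(len(h0))
h1 = ((-1.0) ** n) * h0[::-1]

# Double-shift orthogonality: sum_n h0[n] h0[n-2k] = delta[k].
print(np.dot(h0, h0))            # -> 1.0  (unit energy, k = 0)
print(np.dot(h0[2:], h0[:-2]))   # -> 0.0  (k = 1)
print(np.dot(h0, h1))            # -> 0.0  (cross-filter orthogonality)
```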

    KBGIS-2: A knowledge-based geographic information system

    The architecture and operation of a recently implemented knowledge-based geographic information system (KBGIS-2), designed to satisfy several general criteria for geographic information systems, are described. The system has four major functions, including query-answering, learning, and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus-based spatial-object language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial-object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial databases, which are represented in quadtree form. The search strategy is designed to reduce the average-case computational cost of search. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to inductively learn definitions of new spatial objects from examples; the new definitions are added to the knowledge base by the system. The system currently performs all its designated tasks successfully, although it is implemented on inadequate hardware. Future reports will detail the performance characteristics of the system, and various extensions are planned to enhance the power of KBGIS-2.
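    Since KBGIS-2 stores its spatial databases in quadtree form, a minimal sketch of a point quadtree (hypothetical code, not the system's implementation) may help illustrate the underlying indexing structure; it supports insertion and rectangular range queries.

```python
# Minimal point quadtree: recursive subdivision of a bounding box, with
# insertion and rectangular range queries.
class QuadTree:
    def __init__(self, x0, y0, x1, y1, capacity=4):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.points = []
        self.children = None            # four sub-quadrants once split

    def insert(self, x, y):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False                # point outside this quadrant
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((x, y))
                return True
            self._split()
        return any(c.insert(x, y) for c in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadTree(x0, y0, mx, my), QuadTree(mx, y0, x1, my),
                         QuadTree(x0, my, mx, y1), QuadTree(mx, my, x1, y1)]
        for p in self.points:           # push stored points down one level
            any(c.insert(*p) for c in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1):
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []                   # query box misses this quadrant
        hits = [p for p in self.points
                if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
        for c in self.children or []:
            hits += c.query(qx0, qy0, qx1, qy1)
        return hits
```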

    Analyzing and Enhancing Routing Protocols for Friend-to-Friend Overlays

    The threat of surveillance by governmental and industrial parties is more pressing than ever. As communication moves into the digital domain, advances in the automatic assessment and interpretation of enormous amounts of data enable the tracking of millions of people, recording and monitoring their private lives with unprecedented accuracy. The knowledge of such an all-encompassing loss of privacy affects the behavior of individuals, inducing various degrees of (self-)censorship and anxiety. Furthermore, the monopoly of a few large-scale organizations on digital communication enables global censorship and manipulation of public opinion. Thus, the current situation undermines freedom of speech to a detrimental degree and threatens the foundations of modern society. Anonymous and censorship-resistant communication systems are hence of utmost importance to circumvent constant surveillance.

    However, existing systems are highly vulnerable to infiltration and sabotage. In particular, Sybil attacks, i.e., powerful parties inserting a large number of fake identities into the system, enable malicious parties to observe and possibly manipulate a large fraction of the communication within the system. Friend-to-friend (F2F) overlays, which restrict direct communication to parties sharing a real-world trust relationship, are a promising countermeasure to Sybil attacks, since the requirement of establishing real-world trust drastically increases the cost of infiltration. Yet, existing F2F overlays suffer from low performance, are vulnerable to denial-of-service attacks, or fail to provide anonymity.

    Our first contribution in this thesis is an in-depth analysis of the concepts underlying the design of state-of-the-art F2F overlays. In the course of this analysis, we considerably extend the existing evaluation methods, providing tools for both our and future research on F2F overlays and distributed systems in general. Based on this novel methodology, we prove that existing approaches are inherently unable to offer acceptable delays without either requiring exhaustive maintenance costs or enabling denial-of-service attacks and de-anonymization.

    Consequently, our second contribution lies in the design and evaluation of a novel concept for F2F overlays based on the insights of this analysis. The analysis revealed that greedy embeddings allow highly efficient communication in arbitrary connectivity-restricted overlays by addressing participants through coordinates and adapting these coordinates to the overlay structure. However, greedy embeddings in their original form reveal the identities of the communicating parties and fail to provide the necessary resilience in the presence of dynamic and possibly malicious users. We therefore present a privacy-preserving communication protocol for greedy embeddings based on anonymous return addresses rather than identifying node coordinates. Furthermore, we enhance the robustness and attack resistance of the communication by using multiple parallel embeddings and alternative algorithms for message delivery. We show that our approach achieves low communication complexity, both for message delivery and for the maintenance of the embedding. By replacing the coordinates with anonymous addresses, we furthermore provably achieve anonymity in the form of plausible deniability against an internal local adversary.
    Complementing these results, our simulation study on real-world data indicates that our approach is highly efficient and effectively mitigates the impact of failures as well as powerful denial-of-service attacks. Our fundamental results open new possibilities for anonymous and censorship-resistant applications.
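    To illustrate the greedy-embedding idea underlying the proposed protocol, the sketch below (illustrative, not the thesis's protocol; node names and the tiny topology are invented) routes greedily over a spanning-tree embedding in which each node's coordinate is its path of child indices from the root.

```python
# Minimal sketch: greedy routing on a spanning-tree embedding, where
# distances are hop counts in the spanning tree.
def common_prefix(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def tree_distance(a, b):
    return len(a) + len(b) - 2 * common_prefix(a, b)

def greedy_route(coords, neighbors, src, dst):
    """Forward to the neighbor strictly closer to dst, hop by hop."""
    path = [src]
    while path[-1] != dst:
        here = path[-1]
        nxt = min(neighbors[here],
                  key=lambda v: tree_distance(coords[v], coords[dst]))
        if tree_distance(coords[nxt], coords[dst]) >= tree_distance(coords[here], coords[dst]):
            return None                 # stuck: no neighbor makes progress
        path.append(nxt)
    return path

# Tiny example: a path A-B-C embedded with B as the root.
coords = {"A": (0,), "B": (), "C": (1,)}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(greedy_route(coords, neighbors, "A", "C"))   # -> ['A', 'B', 'C']
```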

    Probabilistic forecast reconciliation: theory, algorithm, and applications

    Hierarchical time series are common in several applied fields. Forecasts are required to be coherent, that is, to satisfy the constraints given by the hierarchy. The most popular technique to enforce coherence is called reconciliation, which adjusts the base forecasts computed for each time series. However, recent works on probabilistic reconciliation suffer from several limitations. In this thesis, we propose a new approach based on conditioning to reconcile any type of forecast distribution. We then introduce a new algorithm, called Bottom-Up Importance Sampling, to efficiently sample from the reconciled distribution. It can be used with any base forecast distribution, whether discrete, continuous, or available only as samples, and provides a major speedup over existing methods. Experiments on several temporal hierarchies show a clear improvement over the base probabilistic forecasts. We then study the effects of reconciliation on the forecast distribution, both from a theoretical viewpoint and through examples with Bernoulli and Poisson distributions. We also present an application to count time series of extreme events in the Credit Default Swap (CDS) market. Finally, we introduce and study the p-Fourier Discrepancy Functions, a new family of metrics for comparing discrete probability measures.
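    A minimal sketch of the conditioning idea (with invented Poisson parameters; the thesis's Bottom-Up Importance Sampling handles general hierarchies more efficiently): for a hierarchy with one upper series U = B1 + B2, bottom-level samples are weighted by the upper base forecast evaluated at their sum and then resampled.

```python
# Minimal importance-sampling reconciliation for a two-level hierarchy:
# base forecasts are Poisson(3) and Poisson(5) for the bottom series and
# Poisson(9) for the upper series; coherence requires U = B1 + B2.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
N = 100_000

b = rng.poisson(lam=[3.0, 5.0], size=(N, 2))   # samples of bottom base forecasts
w = poisson.pmf(b.sum(axis=1), mu=9.0)         # weight by upper base forecast at the sum
w /= w.sum()

idx = rng.choice(N, size=N, p=w)               # resample -> reconciled bottom samples
b_rec = b[idx]
print("reconciled bottom means:", b_rec.mean(axis=0))
print("reconciled total mean:  ", b_rec.sum(axis=1).mean())
```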