220 research outputs found

    An Effective Approach to Nonparametric Quickest Detection and Its Decentralized Realization

    This dissertation studies nonparametric quickest detection and its decentralized implementation in a distributed environment. Quickest detection schemes are geared toward detecting a change in the state of a data stream or a real-time process. Classical quickest detection schemes invariably assume knowledge of the pre-change and post-change distributions, which may not be available in many applications. A distribution-free nonparametric quickest detection procedure is presented, based on a novel distance measure referred to as the Q-Q distance, computed from the Quantile-Quantile plot. Theoretical analysis of the distance measure and the detection procedure is presented to justify the proposed algorithm and provide performance guarantees. The Q-Q distance based detection procedure performs comparably to classical parametric detection procedures and better than other nonparametric procedures, and it is most effective when detecting small changes. As technology advances, distributed sensing and detection become feasible. Existing decentralized detection approaches are largely parametric. The decentralized realization of the Q-Q distance based nonparametric quickest detection scheme is further studied, where data streams are simultaneously collected from multiple, distributively located channels to jointly reach a detection decision. Two implementation schemes, binary quickest detection and local decision fusion, are described. Experimental results show that the proposed method performs comparably to the benchmark parametric cumulative sum (CUSUM) test in binary detection. Finally, the dissertation concludes with a summary of the contributions to the state of the art.
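
    As a rough illustration of the idea (not the dissertation's exact statistic or stopping rule), the sketch below compares the empirical quantiles of a reference sample and a sliding window on a common probability grid and raises an alarm when their mean absolute gap exceeds a threshold; the grid size, window length, and threshold are illustrative choices.

```python
import numpy as np

def qq_distance(reference, window, n_grid=50):
    """Q-Q distance sketch: mean absolute gap between the empirical
    quantiles of two samples, evaluated on a shared probability grid."""
    p = (np.arange(1, n_grid + 1) - 0.5) / n_grid
    return np.mean(np.abs(np.quantile(reference, p) - np.quantile(window, p)))

def detect_change(stream, reference, window_size=100, threshold=0.2):
    """Slide a window over the stream; declare a change at the first
    index where the Q-Q distance to the reference exceeds the threshold."""
    for t in range(window_size, len(stream) + 1):
        if qq_distance(reference, stream[t - window_size:t]) > threshold:
            return t  # alarm time
    return None  # no change detected

# Example: pre-change N(0, 1), post-change N(0.5, 1) (a small mean shift).
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 1000)
stream = np.concatenate([rng.normal(0.0, 1.0, 500),
                         rng.normal(0.5, 1.0, 500)])
print(detect_change(stream, reference))
```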

    Mean Estimation from One-Bit Measurements

    We consider the problem of estimating the mean of a symmetric log-concave distribution under the constraint that only a single bit per sample from this distribution is available to the estimator. We study the mean squared error as a function of the sample size (and hence the number of bits). We consider three settings: first, a centralized setting, where an encoder may release n bits given a sample of size n, and for which there is no asymptotic penalty for quantization; second, an adaptive setting in which each bit is a function of the current observation and previously recorded bits, where we show that the optimal relative efficiency compared to the sample mean is precisely the efficiency of the median; lastly, we show that in a distributed setting where each bit is only a function of a local sample, no estimator can achieve optimal efficiency uniformly over the parameter space. We additionally complement our results in the adaptive setting by showing that one round of adaptivity is sufficient to achieve optimal mean-square error.
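
    A textbook scheme in the spirit of the adaptive setting (not the paper's exact estimator) is sign-based stochastic approximation: each transmitted bit is the sign of the current observation minus the running estimate, and for a symmetric distribution the recursion converges to the median, which equals the mean. The step-size constant below is an arbitrary choice.

```python
import numpy as np

def adaptive_one_bit_mean(samples, c=1.0):
    """Robbins-Monro recursion driven by one bit per sample: the bit is
    sign(X_t - theta_t), a function of the current observation and of the
    previously recorded bits (through theta_t). For a symmetric
    distribution, theta_t converges to the median = mean."""
    theta = 0.0
    for t, x in enumerate(samples, start=1):
        bit = 1.0 if x > theta else -1.0   # the single transmitted bit
        theta += (c / t) * bit             # decaying step size
    return theta

rng = np.random.default_rng(1)
x = rng.laplace(loc=2.0, scale=1.0, size=100_000)  # symmetric log-concave
print(adaptive_one_bit_mean(x))  # close to the true mean 2.0
```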

    Cross validation of bi-modal health-related stress assessment

    This study explores the feasibility of objective and ubiquitous stress assessment. 25 post-traumatic stress disorder patients participated in a controlled storytelling (ST) study and an ecologically valid reliving (RL) study. The two studies were meant to represent an early and a late therapy session, and each consisted of a "happy" and a "stress triggering" part. Two instruments were chosen to assess the stress level of the patients at various points in time during therapy: (i) speech, used as an objective and ubiquitous stress indicator, and (ii) the subjective unit of distress (SUD), a clinically validated Likert scale. In total, 13 statistical parameters were derived from each of five speech features: amplitude, zero-crossings, power, high-frequency power, and pitch. To model the emotional state of the patients, 28 parameters were selected from this set by means of a linear regression model and, subsequently, compressed into 11 principal components. The SUD and speech model were cross-validated using three machine learning algorithms. Between 90% (2 SUD levels) and 39% (10 SUD levels) correct classification was achieved. The two sessions could be discriminated in 89% (ST) and 77% (RL) of the cases. This report fills a gap between laboratory and clinical studies, and its results emphasize the usefulness of Computer Aided Diagnostics (CAD) for mental health care.
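
    A minimal sketch of this kind of pipeline, assuming scikit-learn: 65 speech parameters (13 statistics x 5 features) are reduced to 28 by regression-based selection, compressed to 11 principal components, and cross-validated with a classifier. The random data, the selection criterion, and the k-NN learner are stand-ins, not the study's exact setup.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# X: one row per speech segment, 65 columns (13 statistics x 5 features);
# y: SUD level per segment. Random data stands in for the real corpus.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 65))
y = rng.integers(0, 2, size=200)           # e.g. 2 SUD levels

pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(f_regression, k=28),       # keep 28 parameters
    PCA(n_components=11),                  # compress to 11 components
    KNeighborsClassifier(),                # one of several possible learners
)
print(cross_val_score(pipeline, X, y, cv=5).mean())
```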

    Real-Time Localization Using Software Defined Radio

    Service providers make use of cost-effective wireless solutions to identify, localize, and possibly track users through their carried mobile devices (MDs) to support added services, such as geo-advertisement, security, and management. Indoor and outdoor hotspot areas play a significant role for such services. However, GPS does not work in many of these areas. To solve this problem, service providers leverage available indoor radio technologies, such as WiFi, GSM, and LTE, to identify and localize users. We focus our research on passive services provided by third parties, which are responsible for (i) data acquisition and (ii) processing, and on network-based services, where (i) and (ii) are done inside the serving network. For a better understanding of the parameters that affect indoor localization, we investigate several factors that affect indoor signal propagation for both Bluetooth and WiFi technologies. For GSM-based passive services, we first developed a data acquisition module: a GSM receiver that can overhear GSM uplink messages transmitted by MDs while remaining invisible. A set of optimizations was made to the receiver components to support wideband capturing of the GSM spectrum while operating in real time. Processing the wide GSM spectrum is made possible by a proposed distributed processing approach over an IP network. Then, to overcome the lack of information about tracked devices' radio settings, we developed two novel localization algorithms that rely on proximity-based solutions to estimate devices' locations in real environments. Given the challenging effects of indoor environments on radio signals, such as NLOS reception and multipath propagation, we developed an original algorithm to detect and remove contaminated radio signals before they are fed to the localization algorithm. To improve the localization algorithm, we extended our work with a hybrid approach that uses both WiFi and GSM interfaces to localize users. For network-based services, we used a software implementation of an LTE base station to develop our algorithms, which characterize the indoor environment before applying the localization algorithm. Experiments were conducted without any special hardware, any prior knowledge of the indoor layout, or any offline calibration of the system.
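
    To make the proximity-based idea concrete, here is a generic weighted-centroid baseline (not the dissertation's algorithm): receivers with known positions weight their coordinates by the signal strength they observe for a device, so the estimate is pulled toward the receivers that hear the device best. The positions and RSSI values are made-up examples.

```python
import numpy as np

def weighted_centroid(anchors, rssi_dbm):
    """Generic proximity-based position estimate: weight each anchor
    (receiver with known position) by its received signal strength."""
    weights = 10 ** (np.asarray(rssi_dbm) / 10.0)   # dBm -> linear power
    weights /= weights.sum()
    return weights @ np.asarray(anchors)            # convex combination

# Three receivers at known indoor positions (meters) and the RSSI they
# report for one uplink burst from the tracked device.
anchors = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
rssi = [-55.0, -70.0, -62.0]   # strongest at anchor 0: device is nearby
print(weighted_centroid(anchors, rssi))
```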

    Detection of Online Social Attackers Based on Ego-Network Analysis

    Ph.D. dissertation, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2020 (advisor: 김종권). In the last decade we have witnessed the explosive growth of online social networking services (SNSs) such as Facebook, Twitter, Weibo, and LinkedIn. While SNSs provide diverse benefits – for example, fostering interpersonal relationships, community formation, and news propagation – they have also attracted unwelcome nuisances. Spammers abuse SNSs as vehicles to spread spam rapidly and widely. Spam, i.e., unsolicited or inappropriate messages, significantly impairs the credibility and reliability of services. Therefore, detecting spammers has become an urgent and critical issue in SNSs. This dissertation deals with spamming in Twitter and Weibo. Instead of spreading annoying messages to the public, a spammer follows (subscribes to) many normal users and tries to induce them to follow back. Sometimes a spammer builds a link farm to increase a target account's follower count and explicit influence. Based on the assumption that the online relationships of spammers differ from those of normal users, I propose classification schemes that detect online social attackers, including spammers. I first focus on ego-network social relations and devise two kinds of features: structural features based on the Triad Significance Profile (TSP) and relational semantic features based on hierarchical homophily in an ego-network. Experiments on real Twitter and Weibo datasets demonstrate that the proposed approach is very practical. The proposed features are scalable because, instead of analyzing the whole network, they inspect user-centered ego-networks. My performance study shows that the proposed methods yield significantly better performance than prior schemes in terms of true positives and false positives.
    Contents: 1 Introduction; 2 Related Work (2.1 OSN Spammer Detection Approaches: Contents-based, Social Network-based, Subnetwork-based, and Behavior-based; 2.2 Link Spam Detection; 2.3 Data Mining Schemes for Spammer Detection; 2.4 Sybil Detection); 3 Triad Significance Profile Analysis (3.1 Motivation; 3.2 Twitter Dataset; 3.3 Indegree and Outdegree of Dataset; 3.4 Twitter Spammer Detection with TSP; 3.5 TSP-Filtering; 3.6 Performance Evaluation of TSP-Filtering); 4 Hierarchical Homophily Analysis (4.1 Motivation; 4.2 Hierarchical Homophily in OSN, covering Basic Analysis of Datasets, Status Gap Distribution and Assortativity, and Hierarchical Gap Distribution; 4.3 Performance Evaluation of HH-Filtering); 5 Overall Performance Evaluation; 6 Conclusion; Bibliography.
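
    As a rough, hypothetical illustration of TSP-style structural features (not the thesis's exact procedure), the sketch below z-scores the 16 directed triad counts of an ego-network against degree-preserving randomizations and normalizes the result to unit length. It assumes a recent networkx (3.x, for directed_edge_swap); the random graph and the number of reference randomizations are arbitrary.

```python
import numpy as np
import networkx as nx

def triad_significance_profile(G, n_random=20, seed=0):
    """Z-score each of the 16 directed triad counts of G against
    degree-preserving random references, then normalize to unit length."""
    census_obs = nx.triadic_census(G)
    keys = sorted(census_obs)
    observed = np.array([census_obs[k] for k in keys], float)
    rng = np.random.default_rng(seed)
    randoms = []
    for _ in range(n_random):
        R = G.copy()  # randomize while preserving in/out degrees
        nx.directed_edge_swap(R, nswap=2 * R.number_of_edges(),
                              max_tries=100 * R.number_of_edges(),
                              seed=int(rng.integers(1 << 31)))
        census = nx.triadic_census(R)
        randoms.append([census[k] for k in keys])
    randoms = np.array(randoms, float)
    z = (observed - randoms.mean(0)) / (randoms.std(0) + 1e-9)
    return z / np.linalg.norm(z)

# Toy ego-network standing in for a user-centered subgraph.
ego = nx.gnp_random_graph(30, 0.15, directed=True, seed=42)
print(triad_significance_profile(ego).round(2))
```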

    Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles

    We examine a network of learners that address the same classification task but must learn from different data sets. The learners cannot share data but instead share their models. Models are shared only once, so as to limit the network load. We introduce DELCO (standing for Decentralized Ensemble Learning with COpulas), a new approach for aggregating the predictions of the classifiers trained by each learner. The proposed method aggregates the base classifiers using a probabilistic model relying on Gaussian copulas. Experiments on logistic regression ensembles demonstrate competitive accuracy and increased robustness in the case of dependent classifiers. A companion Python implementation can be downloaded at https://github.com/john-klein/DELC
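
    A toy sketch conveys the flavor of copula-based aggregation, assuming numpy and scipy; the rank-based copula fit, the omission of the marginal score densities, and all names below are simplifications of mine, not DELCO's actual model. Per class, a Gaussian copula is fit to the base classifiers' scores on validation data, and a test point is labeled with the class under which its score vector has the highest copula-plus-prior log-likelihood.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

class GaussianCopulaFusion:
    """Toy copula-based fusion of dependent classifiers (illustrative only;
    assumes a few dozen validation points per class)."""

    def fit(self, scores, labels):
        # scores: (n_samples, n_classifiers) probability-of-class-1 outputs
        self.classes_ = np.unique(labels)
        self.params_ = {}
        for c in self.classes_:
            s = scores[labels == c]
            # probability-integral transform via per-classifier ranks
            u = (np.argsort(np.argsort(s, 0), 0) + 1) / (len(s) + 1)
            z = norm.ppf(u)
            self.params_[c] = (s, np.corrcoef(z, rowvar=False),
                               len(s) / len(scores))
        return self

    def predict(self, scores):
        out = []
        for row in scores:
            best, best_ll = None, -np.inf
            for c in self.classes_:
                train, corr, prior = self.params_[c]
                # empirical CDF of each classifier's validation scores
                u = np.clip([(train[:, k] <= row[k]).mean()
                             for k in range(len(row))], 1e-3, 1 - 1e-3)
                z = norm.ppf(u)
                # Gaussian copula log-density + class prior (marginal
                # score densities are deliberately ignored in this toy)
                ll = (multivariate_normal.logpdf(z, cov=corr,
                                                 allow_singular=True)
                      - norm.logpdf(z).sum() + np.log(prior))
                if ll > best_ll:
                    best, best_ll = c, ll
            out.append(best)
        return np.array(out)

# Synthetic demo: three correlated base classifiers, two classes.
rng = np.random.default_rng(4)
val_scores = rng.uniform(size=(200, 3))
val_labels = (val_scores.mean(1) + rng.normal(0, 0.1, 200) > 0.5).astype(int)
fusion = GaussianCopulaFusion().fit(val_scores, val_labels)
print(fusion.predict(val_scores[:5]))
```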

    Quantile Function-based Models for Resource Utilization and Power Consumption of Applications

    Server consolidation is currently widely employed to improve the energy efficiency of data centers. While it is a promising technique, server consolidation may lead to resource interference between applications and thus to reduced application performance. Current approaches to accounting for possible resource interference are not well suited to respect the variation in the workloads of applications. As a consequence, these approaches cannot prevent resource interference if the workload for applications varies. It is assumed that having models for the resource utilization and power consumption of applications as functions of the workload to the applications can improve decision making and help to prevent resource interference in scenarios with varying workload. This thesis aims to develop such models for selected applications. To produce varying workload that resembles the statistical properties of real-world workload, a workload generator is developed in a first step. Usually, the measurement data for such models originates from different sensors and equipment, all producing data at different frequencies. In order to account for these different frequencies, in a second step this thesis particularly investigates the feasibility of employing quantile functions as model inputs. Complementarily, since conventional goodness-of-fit tests are not appropriate for this approach, an alternative way to assess the estimation error is presented.
    Contents: 1 Introduction; 2 Thesis Overview (2.1 Testbed; 2.2 Contributions and Thesis Structure; 2.3 Scope, Assumptions, and Limitations); 3 Generation of Realistic Workload (3.1 Statistical Properties of Internet Traffic; 3.2 Statistical Properties of Video Server Traffic; 3.3 Implementation of Workload Generation; 3.4 Summary); 4 Models for Resource Utilization and for Power Consumption (4.1 Introduction; 4.2 Prior Work; 4.3 Test Cases; 4.4 Applying Regression to Samples of Different Length; 4.5 Models for Resource Utilization as Function of Request Size; 4.6 Models for Power Consumption as Function of Resource Utilization; 4.7 Summary); 5 Conclusion & Future Work (5.1 Summary; 5.2 Future Work); Appendices.
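
    A minimal illustration of the core idea, assuming numpy and scikit-learn: measurement series recorded at different frequencies (and hence of different lengths) are reduced to their empirical quantile functions on a fixed probability grid, which makes them comparable, fixed-size regression inputs. The synthetic utilization/power data, grid size, and linear model below are illustrative assumptions, not the thesis's actual models.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def quantile_features(series, n_points=20):
    """Evaluate the empirical quantile function on a fixed probability
    grid, so samples of different length map to vectors of equal size."""
    p = np.linspace(0.05, 0.95, n_points)
    return np.quantile(series, p)

# Hypothetical measurements: CPU utilization sampled at 10 Hz over
# 60-second windows, paired with one mean power reading per window.
rng = np.random.default_rng(3)
windows = [rng.uniform(0.1 + 0.02 * i, 0.4 + 0.02 * i, size=600)
           for i in range(30)]                      # 30 observation windows
X = np.array([quantile_features(w) for w in windows])
y = np.array([120 + 80 * w.mean() + rng.normal(0, 1) for w in windows])

# Power (W) regressed on the utilization quantile function.
model = LinearRegression().fit(X, y)
print(model.predict(X[:3]).round(1))
```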