Doctor of Philosophy
dissertation
The next-generation mobile network (i.e., the 5G network) is expected to host emerging use cases with a wide range of requirements, from Internet of Things (IoT) devices that prefer a low-overhead, scalable network to remote machine operation and remote healthcare services that require reliable end-to-end communications. Improving scalability and reliability is among the most important challenges in designing the next-generation mobile architecture. The current (4G) mobile core network relies heavily on hardware-based proprietary components. These core networks are expensive and are therefore available in only a limited number of locations in the country. This leads to high end-to-end latency, because of the long latency between base stations and the mobile core, and it limits innovation and the network's ability to evolve. Moreover, at the protocol level, the current mobile network architecture was designed for a limited number of smartphones streaming large amounts of high-quality traffic, not for a massive number of low-capability devices sending small, sporadic traffic. This results in high-overhead control and data planes in the mobile core network that are not suitable for a massive number of future IoT devices. In terms of reliability, network operators have already deployed multiple monitoring systems to detect service disruptions and fix problems when they occur. However, detecting all service disruptions is challenging. First, there is a complex relationship between network status and user-perceived service experience. Second, service disruptions can happen for reasons that are beyond the network itself. With technology advances in Software-Defined Networking (SDN) and Network Function Virtualization (NFV), the next-generation mobile network is expected to be NFV-based and deployed on NFV platforms.
However, in contrast to telecom-grade hardware with built-in redundancy, the commodity off-the-shelf (COTS) hardware in NFV platforms is often not comparable in terms of reliability. The availability of telecom-grade mobile core network hardware is typically 99.999% (i.e., "five-9s" availability), while most NFV platforms guarantee only "three-9s" availability, which allows roughly two orders of magnitude more downtime. Therefore, an NFV-based mobile core network needs extra mechanisms to guarantee its availability. This Ph.D. dissertation focuses on using SDN/NFV, data analytics, and distributed-system techniques to enhance the scalability and reliability of the next-generation mobile core network. The dissertation makes the following contributions. First, it presents SMORE, a practical offloading architecture that reduces end-to-end latency and enables new functionality in mobile networks. It then presents SIMECA, a lightweight and scalable mobile core network designed for a massive number of future IoT devices. Second, it presents ABSENCE, a passive service monitoring system that uses customer usage data and data analytics to detect silent failures in an operational mobile network. Lastly, it presents ECHO, a distributed mobile core network architecture that improves the availability of an NFV-based mobile core network in public clouds.
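To make the gap between "five-9s" and "three-9s" availability concrete, the following minimal sketch converts an availability figure into allowed downtime per year. The function name and the 365-day year are illustrative assumptions and do not come from the dissertation.

```python
# Illustrative arithmetic only: convert an availability fraction into
# the downtime it permits per year (assuming a 365-day year).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Return the minutes of downtime per year allowed by `availability`."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(0.99999)  # telecom-grade hardware
three_nines = downtime_minutes_per_year(0.999)   # typical NFV platform guarantee
ratio = three_nines / five_nines

print(f"five-9s:  {five_nines:.3f} min/year")
print(f"three-9s: {three_nines:.1f} min/year")
print(f"ratio:    {ratio:.0f}x")
```

Under these assumptions, five-9s availability permits about 5.3 minutes of downtime per year and three-9s about 525.6 minutes (roughly 8.8 hours), a factor of 100.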
Converged Networks and Traffic Tomography by Using Evolutionary Algorithms
Traffic tomography is now an integral part of converged networks and of systems for detecting their behavioral characteristics. This dissertation investigates its implementation using evolutionary algorithms. The research focused mainly on innovating and solving the behavioral detection of data flows in networks, and of their anomalies, using network tomography and evolutionary algorithms. As part of the dissertation, a new algorithm was proposed that builds on the statistical method of survival analysis combined with a genetic algorithm. The proposed algorithm was tested in a custom-built model of a network probe, using the Python programming language and Cisco laboratory network devices. The tests demonstrated the basic functionality of the proposed solution.
Combating Attacks and Abuse in Large Online Communities
Internet users today are connected more widely and ubiquitously than ever before. As a result, various online communities have formed, ranging from online social networks (Facebook, Twitter), to mobile communities (Foursquare, Waze), to content- and interest-based networks (Wikipedia, Yelp, Quora). While users benefit from easy access to information and social interactions, there is growing concern for users' security and privacy in the face of attacks such as spam, phishing, malware infection, and identity theft. Combating attacks and abuse in online communities is challenging. First, today's online communities are increasingly dependent on users and user-generated content. Securing online systems demands a deep understanding of complex and often unpredictable human behaviors. Second, online communities can easily have millions or even billions of users, which requires the corresponding security mechanisms to be highly scalable. Finally, cybercriminals are constantly evolving to launch new types of attacks, which further demands highly robust security defenses. In this thesis, we take concrete steps towards measuring, understanding, and defending against attacks and abuse in online communities. We begin with a series of empirical measurements to understand user behaviors in different online services and the unique security and privacy challenges that users face. This effort covers a broad set of popular online services, including question-and-answer social networks (Quora), anonymous social networks (Whisper), and crowdsourced mobile communities (Waze). Despite the differences among specific online communities, our study provides a first look at their user activity patterns based on empirical data, and reveals the need for reliable mechanisms to curate user content, protect privacy, and defend against emerging attacks. Next, we turn our attention to attacks targeting online communities, with a focus on spam campaigns.
While traditional spam is mostly generated by automated software, attackers today have started to introduce "human intelligence" into their attacks. This is malicious crowdsourcing (or crowdturfing), in which a large group of real users is organized to carry out malicious campaigns, such as writing fake reviews or spreading rumors on social media. Using collective human effort, attackers can easily bypass many existing defenses (e.g., CAPTCHA). To understand the crowdturfing ecosystem, we first use measurements to examine its campaign organization, workers, and revenue in detail. Based on insights from empirical data, we develop effective machine learning classifiers to detect crowdturfing activities. At the same time, considering the adversarial nature of crowdturfing, we also build practical adversarial models to simulate how attackers can evade or disrupt machine-learning-based defenses. To aid in this effort, we next explore using user behavior models to detect a wider range of attacks. Instead of making assumptions about attacker behavior, our idea is to model normal user behaviors and capture (malicious) behaviors that deviate from the norm. In this way, we can detect previously unknown attacks. Our behavior model is based on detailed clickstream data, which are sequences of click events generated by users when using the service. We build a similarity graph where each user is a node and the edges are weighted by clickstream similarity. By partitioning this graph, we obtain "clusters" of users with similar behaviors. We then use a small set of known good users to "color" these clusters and so differentiate the malicious ones. This technique has been adopted by real-world social networks (Renren and LinkedIn), and has already detected unexpected attacks. Finally, we extend the clickstream model to understand finer-grained behaviors of attackers (and real users), and to track how user behavior changes over time.
In summary, this thesis illustrates a data-driven approach to understanding and defending against attacks and abuse in online communities. Our measurements have revealed new insights about how attackers are evolving to bypass existing security defenses today. In addition, our data-driven systems provide new solutions for online services to gain a deep understanding of their users, and defend them from emerging attacks and abuse.
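The clickstream approach described in this abstract (a similarity graph over users, partitioned into clusters that are then "colored" with known good users) can be sketched roughly as follows. This is a simplified illustration, not the thesis's implementation: the click-bigram features, the Jaccard similarity, the 0.5 threshold, the use of connected components in place of a weighted graph partitioner, and all names and sample data are assumptions made for the example.

```python
from itertools import combinations


def bigrams(clicks):
    """Represent a clickstream by its set of consecutive click pairs."""
    return {(a, b) for a, b in zip(clicks, clicks[1:])}


def jaccard(s, t):
    """Jaccard similarity between two sets (0.0 if both are empty)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def cluster_users(clickstreams, threshold=0.5):
    """Link users whose similarity meets the threshold, then return the
    connected components as clusters (a stand-in for graph partitioning)."""
    feats = {user: bigrams(clicks) for user, clicks in clickstreams.items()}
    neighbors = {user: set() for user in clickstreams}
    for u, v in combinations(clickstreams, 2):
        if jaccard(feats[u], feats[v]) >= threshold:
            neighbors[u].add(v)
            neighbors[v].add(u)

    clusters, seen = [], set()
    for start in clickstreams:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            user = stack.pop()
            if user in component:
                continue
            component.add(user)
            stack.extend(neighbors[user] - component)
        seen |= component
        clusters.append(component)
    return clusters


def flag_uncolored(clusters, known_good):
    """Return clusters that contain no known good user."""
    return [c for c in clusters if not (c & known_good)]


# Hypothetical sample data, invented for illustration.
clickstreams = {
    "alice": ["home", "profile", "photo", "home"],
    "bob": ["home", "profile", "photo", "message"],
    "carol": ["home", "profile", "photo", "home", "profile"],
    "bot1": ["invite", "invite", "invite", "invite"],
    "bot2": ["invite", "invite", "invite"],
}
clusters = cluster_users(clickstreams)
suspicious = flag_uncolored(clusters, known_good={"alice"})
print(suspicious)
```

With this toy data, the three browsing users fall into one cluster that is colored by the known good user, while the two accounts that only send invites form an uncolored cluster and are flagged.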
A Large-Scale Network Data Analysis via Sparse and Low Rank Reconstruction
With the rapid growth of data communications in size and complexity, the threat of malicious activities and computer crimes has increased accordingly. Efficient data processing techniques for network operation and management over large-scale network traffic are therefore essential. Several mathematical approaches to flow-level traffic data have been proposed because of the importance of analyzing the structure and state of the network. In contrast to state-of-the-art studies, we first propose a new decomposition model for packet-level traffic data based on the accelerated proximal gradient method. In addition, we present the iterative scheme of the algorithm for the network anomaly detection problem, which we term NAD-APG. Based on this approach, we carry out intrusion detection on packet-level network traffic data, whether or not it is polluted by noise. Finally, we design a prototype system for detecting network anomalies such as Probe and R2L attacks. The experiments show that our approach is effective in revealing the patterns of network traffic data and in detecting attacks in large-scale network traffic. Moreover, the experiments demonstrate the robustness of the algorithm even when the network traffic is polluted by large-volume anomalies and noise.
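The abstract does not state the decomposition model behind NAD-APG. As an illustration only, a common relaxed sparse plus low-rank formulation and its accelerated proximal gradient iteration are written out below. The symbols are assumptions introduced here and are not taken from the paper: $D$ is the traffic data matrix, $L$ the low-rank part, $S$ the sparse part, $\lambda$ and $\mu$ are weights, and $t_k$ is the momentum sequence.

```latex
% Assumed model: low-rank L plus sparse S approximating the data matrix D.
\min_{L,S}\; \mu\|L\|_{*} + \mu\lambda\|S\|_{1} + \tfrac{1}{2}\|D - L - S\|_{F}^{2}

% Extrapolation (momentum) step, with t_0 = t_1 = 1:
Y_k^{L} = L_k + \frac{t_{k-1}-1}{t_k}\,(L_k - L_{k-1}), \qquad
Y_k^{S} = S_k + \frac{t_{k-1}-1}{t_k}\,(S_k - S_{k-1})

% Gradient of the smooth term at (Y_k^L, Y_k^S); its Lipschitz constant is 2,
% which gives the step size 1/2 used below.
G_k = Y_k^{L} + Y_k^{S} - D

% Proximal steps: singular value thresholding for L, soft thresholding for S.
L_{k+1} = \mathcal{D}_{\mu/2}\!\left(Y_k^{L} - \tfrac{1}{2} G_k\right), \qquad
S_{k+1} = \mathcal{S}_{\lambda\mu/2}\!\left(Y_k^{S} - \tfrac{1}{2} G_k\right)

t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^{2}}}{2}

% Operators: entrywise shrinkage, and shrinkage of the singular values
% of X = U \Sigma V^{\top}.
\mathcal{S}_{\tau}(x) = \operatorname{sign}(x)\,\max(|x| - \tau,\, 0), \qquad
\mathcal{D}_{\tau}(X) = U\,\mathcal{S}_{\tau}(\Sigma)\,V^{\top}
```

In a formulation of this kind, large entries of $S$ would mark candidate anomalies, while $L$ captures the regular traffic pattern.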
Spectral Properties of the Koopman Operator in the Analysis of Nonstationary Dynamical Systems
The dominating methodology used in the study of dynamical systems is the geometric picture introduced by Poincaré. The focus is on the structure of the state space and the asymptotic behavior of trajectories. Special solutions such as fixed points and limit cycles, along with their stable and unstable manifolds, are of interest due to their ability to organize the trajectories in the surrounding state space. Another viewpoint that is becoming more prevalent is the operator-theoretic / functional-analytic one, which describes the system in terms of the evolution of functions or measures defined on the state space. Part I of this doctoral dissertation focuses on the Koopman, or composition, operator that determines how a function on the state space evolves as the state trajectories evolve. Most current studies involving the Koopman operator have dealt with its spectral properties that are induced by dynamical systems that are, in some sense, stationary (in the probabilistic sense). The dynamical systems studied are either measure-preserving or initial conditions for trajectories are restricted to an attractor for the system. In these situations, only the point spectrum on the unit circle is considered; this part of the spectrum is called the unimodular spectrum. This work investigates relaxations of these situations in two different directions. The first is an extension of the spectral analysis of the Koopman operator to dynamical systems possessing either dissipation or expansion in regions of their state space. The second is to consider switched, stochastically-driven dynamical systems and the associated collection of semigroups of Koopman operators. In the first direction, we develop the Generalized Laplace Analysis (GLA) for both spectral operators of scalar type (in the sense of Dunford) and non-spectral operators. The GLA is a method of constructing eigenfunctions of the Koopman operator corresponding to non-unimodular eigenvalues.
It represents an extension of the ergodic theorems proven for ergodic, measure-preserving, on-attractor dynamics to the case where we have off-attractor dynamics. We also give a general procedure for constructing an appropriate Banach space of functions on which the Koopman operator is spectral. We explicitly construct these spaces for attracting fixed points and limit cycles. The spaces that we introduce and construct are generalizations of the familiar Hilbert Hardy spaces in the complex unit disc. In the second direction, we develop the theory of switched semigroups of Koopman operators. Each semigroup is assumed to be spectral of scalar type with unimodular point spectrum, but possibly non-unimodular continuous spectrum. The functions evolve by applying one semigroup for a period of time, then switching to another semigroup. We develop an approximation of the vector-valued function evolution by a linear approximation in the vector space that the functions map into. A basis for this linear approximation is constructed from the vector-valued modes that are coefficients of the projections of the vector-valued observable onto scalar-valued eigenfunctions of the Koopman operator. The unmodeled modes show up as noisy dynamics in the output space. We apply this methodology to traffic matrices of an Internet Service Provider's (ISP's) network backbone. Traffic matrices measure the traffic volume moving between an ingress and egress router for the network's backbone. It is shown that on each contiguous interval of time in which a single semigroup acts, the modal dynamics are deterministic and periodic with Gaussian or nearly-Gaussian noise superimposed. Part II of the dissertation represents a divergence from the first part in that it does not deal with the Koopman operator explicitly.
In the second part, we consider the problem of using exponentially mixing dynamical systems to generate trajectories for an agent to follow in its search for a physical target in a large domain. The domain is a compact subset of the n-dimensional Euclidean space R^n. It is assumed that the size of the target is unknown and can take any value in some continuous range. Furthermore, it is assumed that the target can be located anywhere in the domain with equal probability. We cast this problem as one in the field of quantitative recurrence theory, a relatively new sub-branch of ergodic theory. We give constructive proofs for upper bounds of hitting times of small metric balls in R^n for mixing transformations of various speeds. The upper bounds and limit laws we derive say, approximately, that the hitting time is bounded above by some constant multiple of the inverse of the measure of the metric ball. From these results, we derive upper bounds for the expected hitting time, with respect to the range of target sizes [delta, V), to be of order O(-ln delta). First-order, continuous-time dynamics are constructed from discrete-time mixing transformations, and upper bounds for these hitting times are shown to be proportional to the discrete-time case.
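The step from the inverse-measure bound to the $O(-\ln\delta)$ expected hitting time can be illustrated with a short calculation. It is a sketch under simplifying assumptions that are not stated in the abstract: the target size $v$ is identified with the measure of the metric ball and drawn uniformly from $[\delta, V)$, and the hitting time satisfies $\tau(v) \le C/v$ for a constant $C$ independent of $v$.

```latex
% Assumed: v uniform on [\delta, V), and \tau(v) \le C / v.
\mathbb{E}[\tau]
  \;\le\; \frac{1}{V-\delta}\int_{\delta}^{V} \frac{C}{v}\,dv
  \;=\; \frac{C}{V-\delta}\,\ln\frac{V}{\delta}
  \;=\; \frac{C}{V-\delta}\,\bigl(\ln V - \ln\delta\bigr)
  \;=\; O(-\ln\delta) \quad \text{as } \delta \to 0^{+}.
```

For fixed $V$, the prefactor $C/(V-\delta)$ stays bounded as $\delta \to 0^{+}$, so the logarithmic term dominates the bound.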