
    Self-Learning Classifier for Internet traffic

    Network visibility is a critical part of traffic engineering, network management, and security. Recently, unsupervised algorithms have been envisioned as a viable alternative for automatically identifying classes of traffic. However, the accuracy achieved so far does not allow them to be used for traffic classification in practical scenarios. In this paper, we propose SeLeCT, a Self-Learning Classifier for Internet traffic. It uses unsupervised algorithms along with an adaptive learning approach to let classes of traffic emerge automatically, so that they can be identified and (easily) labeled. SeLeCT automatically groups flows into pure (or homogeneous) clusters, alternating simple clustering and filtering phases to remove outliers. SeLeCT uses an adaptive learning approach to boost its ability to spot new protocols and applications. Finally, SeLeCT also simplifies label assignment (which is still based on some manual intervention) so that proper class labels can be easily discovered. We evaluate the performance of SeLeCT using traffic traces collected in different years from various ISPs located on 3 different continents. Our experiments show that SeLeCT achieves overall accuracy close to 98%. Unlike state-of-the-art classifiers, the biggest advantage of SeLeCT is its ability to help discover new protocols and applications in an almost automated fashion.

    Self-learning classifier for internet traffic

    A method for classifying network traffic, including (1) processing a first working set portion of a flow batch for a first iteration by dividing the first working set portion into clusters and filtering a cluster by (i) identifying a first server port as the most frequently occurring server port compared to all other server ports in the cluster, and (ii) in response to determining that a first frequency of occurrence of the first server port in the cluster exceeds a pre-determined threshold: (a) identifying the cluster as a dominatedPort cluster, (b) removing the cluster from the first working set portion to generate a remainder as a second working set portion, and (c) removing from the cluster one or more flows having a server port different from the first server port and adding these flows to the second working set portion; and (2) processing the second working set portion for a second iteration.
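
The filtering step of the claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: clusters are assumed to be already computed (the clustering algorithm is out of scope), the "frequency of occurrence" is assumed to be the fraction of flows in the cluster, and the names `Flow`, `filter_cluster`, and `process_iteration` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    flow_id: int
    server_port: int


def filter_cluster(cluster, threshold):
    """Apply the dominatedPort test to one cluster.

    Returns (port, kept, returned): `port` is the dominating server port or
    None, `kept` are the flows that stay in the dominatedPort cluster, and
    `returned` are the flows passed on to the next working set.
    """
    if not cluster:
        return None, [], []
    port, count = Counter(f.server_port for f in cluster).most_common(1)[0]
    # Assumption: "frequency of occurrence" is a fraction of the cluster size.
    if count / len(cluster) > threshold:
        kept = [f for f in cluster if f.server_port == port]
        returned = [f for f in cluster if f.server_port != port]
        return port, kept, returned
    # Not dominated: the whole cluster stays in the remainder.
    return None, [], list(cluster)


def process_iteration(clusters, threshold):
    """One iteration over already-computed clusters (hypothetical helper)."""
    dominated = {}
    next_working_set = []
    for cluster_id, cluster in enumerate(clusters):
        port, kept, returned = filter_cluster(cluster, threshold)
        if port is not None:
            dominated[cluster_id] = (port, kept)
        next_working_set.extend(returned)
    return dominated, next_working_set
```

With a threshold of 0.7, a cluster of three port-80 flows and one port-443 flow is marked as dominated by port 80, while the port-443 outlier goes back into the working set for the next iteration, together with every flow of any non-dominated cluster.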

    DNS to the rescue: Discerning Content and Services in a Tangled Web

    A careful perusal of the Internet's evolution reveals two major trends: the explosion of cloud-based services and of video streaming applications. In both cases, the owner of the content (e.g., CNN, YouTube, or Zynga) and the organization serving it (e.g., Akamai, Limelight, or Amazon EC2) are decoupled, making it harder to understand the association between the content, its owner, and the host where the content resides. This has created a tangled World Wide Web that is very hard to unwind, impairing the ability of ISPs and network administrators to control the traffic flowing on their networks. In this paper, we present DN-Hunter, a system that leverages the information provided by DNS traffic to discern the tangle. By parsing DNS queries, DN-Hunter tags traffic flows with the associated domain name. This association has several applications and reveals a large amount of useful information. It (i) provides fine-grained traffic visibility even when the traffic is encrypted (i.e., TLS/SSL flows), thus enabling more effective policy controls; (ii) identifies flows even before they begin, thus providing superior network management capabilities to administrators; (iii) understands and tracks (over time) the different CDNs and cloud providers that host content for a particular resource; (iv) discerns all the services/content hosted by a given CDN or cloud provider in a particular geography and time; and (v) provides insights into all applications/services running on any given layer-4 port number. We conduct an extensive experimental analysis and show that results from real traffic traces, ranging from FTTH to 4G ISPs, support our hypothesis. Simply put, the information provided by DNS traffic is one of the key components required to unveil the tangled web and to bring the capability of controlling the traffic back to the network carrier.
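
The flow-tagging idea can be sketched as a lookup table filled from DNS responses and consulted when a flow starts. This is a hedged illustration, not DN-Hunter's actual design: it assumes DNS responses have already been parsed into a client IP, a queried name, and the list of answer IPs, and the class and method names (`DnsTagger`, `observe_dns_response`, `tag_flow`) and the `max_entries` bound are hypothetical.

```python
from collections import OrderedDict


class DnsTagger:
    """Maps (client IP, server IP) pairs to the domain name the client resolved."""

    def __init__(self, max_entries=100_000):
        self.max_entries = max_entries
        # Insertion-ordered so the oldest mapping can be evicted first.
        self._table = OrderedDict()

    def observe_dns_response(self, client_ip, domain, answer_ips):
        """Record every address returned to this client for `domain`."""
        for ip in answer_ips:
            key = (client_ip, ip)
            self._table.pop(key, None)  # re-insert so the entry becomes the newest
            self._table[key] = domain
            if len(self._table) > self.max_entries:
                self._table.popitem(last=False)  # evict the oldest entry

    def tag_flow(self, client_ip, server_ip):
        """Return the domain for a new flow, or None if no prior DNS answer matched."""
        return self._table.get((client_ip, server_ip))
```

Keying on the client as well as the server address is the design choice that matters here: a single CDN or cloud address can serve many unrelated domains, so the server IP alone is ambiguous, whereas the name a given client just resolved to that address is a much stronger hint. Because the DNS answer arrives before the connection is opened, the tag is available at the flow's first packet, even if the payload is encrypted.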

    Automatic parsing of binary-based application protocols using network traffic

    A method for analyzing a binary-based application protocol of a network. The method includes obtaining conversations from the network; extracting the content of a candidate field from a message in each conversation; calculating a randomness measure of the content, representing the level of randomness of the content across all conversations; calculating a correlation measure of the content, representing the level of correlation, across all conversations, between the content and an attribute of the conversation in which the message containing the candidate field is located; and selecting, based on the randomness measure and the correlation measure and using a pre-determined field selection criterion, the candidate offset from a set of candidate offsets as the offset defined by the protocol.
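
The two measures can be illustrated with a short Python sketch. Several choices here are assumptions, not taken from the claim: the randomness measure is normalized Shannon entropy, the correlation measure is Pearson correlation, the conversation attribute is the message length, each conversation is reduced to one `(message, attribute)` pair, and the thresholds in `select_offset` stand in for the unspecified "pre-determined field selection criterion". All function names are hypothetical.

```python
import math
from collections import Counter


def field_values(conversations, offset, width):
    """Read a big-endian unsigned field at `offset` from each conversation's message."""
    return [int.from_bytes(msg[offset:offset + width], "big") for msg, _ in conversations]


def randomness(values):
    """Shannon entropy of the values, normalized to [0, 1] by log2(number of conversations)."""
    n = len(values)
    if n < 2:
        return 0.0
    entropy = sum((c / n) * math.log2(n / c) for c in Counter(values).values())
    return entropy / math.log2(n)


def correlation(xs, ys):
    """Pearson correlation; 0.0 when either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_offset(conversations, width=1, min_randomness=0.1, min_correlation=0.9):
    """Hypothetical criterion: the field must vary across conversations and track the attribute."""
    shortest = min(len(msg) for msg, _ in conversations)
    attributes = [attr for _, attr in conversations]
    best, best_corr = None, 0.0
    for offset in range(shortest - width + 1):
        values = field_values(conversations, offset, width)
        corr = abs(correlation(values, attributes))
        if randomness(values) >= min_randomness and corr >= min_correlation and corr > best_corr:
            best, best_corr = offset, corr
    return best
```

On messages made of a constant magic byte, a length byte, and a per-conversation nonce, the magic byte is rejected because its randomness is zero, the nonce is rejected because it does not track message length, and the length byte is selected. Randomness alone would not separate the length field from the nonce, which is why the claim combines it with a correlation measure.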

    Communication-Efficient Distributed Monitoring of Thresholded Counts

    Monitoring is an issue of primary concern in current and next-generation networked systems. For example, the objective of sensor networks is to monitor their surroundings for a variety of applications, such as atmospheric conditions, wildlife behavior, and troop movements. Similarly, monitoring in data networks is critical not only for accounting and management, but also for detecting anomalies and attacks. Such monitoring applications are inherently continuous and distributed, and must be designed to minimize the communication overhead that they introduce. In this context, we introduce and study a fundamental class of problems called “thresholded counts”, where we must return, with a user-specified accuracy, the aggregate frequency count of an event that is continuously monitored by distributed nodes whenever the actual count exceeds a given threshold value. In this paper, we propose to address the problem of thresholded counts by setting local thresholds at each monitoring node and initiating communication only when the locally observed data exceeds these local thresholds. We explore algorithms in two categories: static thresholds and adaptive thresholds. In the static case, we consider thresholds based on a linear combination of two alternative strategies, and show that there exists an optimal blend of the two strategies that minimizes communication overhead. We further show that this optimal blend can be found using a steepest descent search. In the adaptive case, we propose algorithms that adjust the local thresholds based on the observed distributions of updated information in the distributed monitoring system. We use extensive simulations not only to verify the accuracy of our algorithms and validate our theoretical results, but also to evaluate the performance of the two approaches. We find that both approaches yield significant savings over the naive approach of performing processing at a centralized location.
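
The core mechanism, local thresholds that suppress communication until enough unreported events accumulate, can be sketched in Python. This is deliberately the simplest static scheme, not the paper's blended or adaptive algorithms: it assumes the allowed error is `epsilon * threshold` and splits it uniformly across nodes, so each node reports only after `delta` new events. The classes `Coordinator` and `Node` are hypothetical.

```python
import math


class Coordinator:
    """Collects node reports and decides whether the global threshold was crossed."""

    def __init__(self, threshold, epsilon, num_nodes):
        self.threshold = threshold
        # Hypothetical uniform split of the error budget epsilon * threshold.
        self.delta = max(1, math.floor(epsilon * threshold / num_nodes))
        self.reported = [0] * num_nodes
        self.messages = 0

    def receive(self, node_id, count):
        self.reported[node_id] = count
        self.messages += 1

    def estimate(self):
        return sum(self.reported)

    def threshold_exceeded(self):
        return self.estimate() >= self.threshold


class Node:
    """Counts events locally and reports only when its local threshold is reached."""

    def __init__(self, node_id, coordinator):
        self.node_id = node_id
        self.coordinator = coordinator
        self.count = 0
        self.last_sent = 0

    def observe(self, n=1):
        self.count += n
        if self.count - self.last_sent >= self.coordinator.delta:
            self.coordinator.receive(self.node_id, self.count)
            self.last_sent = self.count
```

After every `observe` call, each node holds fewer than `delta` unreported events, so with `num_nodes` nodes the coordinator underestimates the true total by at most `num_nodes * (delta - 1)`, which stays within `epsilon * threshold`. With `threshold=1000`, `epsilon=0.1`, and 4 nodes, `delta` is 25, and 1000 events spread evenly over the nodes cost 40 messages instead of the 1000 a naive centralized scheme would send.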

    Characterizing Data Services in a 3G Network: Usage, Mobility and Access Issues

    Although 3G networks have been widely deployed to cope with the increasing demand for wireless data services, little is known about how these networks are used from the network perspective. In this paper, we present an analysis of data services based on a nationwide 3G network trace collected from one of the largest cellular network service providers in North America. Our work differs from previous studies by examining data service usage and mobility patterns along various dimensions, including application breakdown, user roles, device types, and diurnal characteristics. We also look into various access issues, such as termination failures and frequent registrations, to better understand how the network performs. Our results are important for cellular network operators and protocol designers seeking to improve data service performance and user satisfaction.

    SeLeCT: Self-Learning Classifier for Internet Traffic

    Network visibility is a critical part of traffic engineering, network management, and security. The most popular current solutions, Deep Packet Inspection (DPI) and statistical classification, rely heavily on the availability of a training set. Besides the cumbersome need to regularly update signatures, their visibility is limited to the classes the classifier has been trained for. Unsupervised algorithms have been envisioned as a viable alternative for automatically identifying classes of traffic. However, the accuracy achieved so far does not allow them to be used for traffic classification in practical scenarios. To address these issues, we propose SeLeCT, a Self-Learning Classifier for Internet Traffic. It uses unsupervised algorithms along with an adaptive seeding approach to let classes of traffic emerge automatically, so that they can be identified and labeled. Unlike traditional classifiers, it requires neither a-priori knowledge of signatures nor a training set to extract the signatures. Instead, SeLeCT automatically groups flows into pure (or homogeneous) clusters using simple statistical features. SeLeCT simplifies label assignment (which is still based on some manual intervention) so that proper class labels can be easily discovered. Furthermore, SeLeCT uses an iterative seeding approach to boost its ability to cope with new protocols and applications. We evaluate the performance of SeLeCT using traffic traces collected in different years from various ISPs located on 3 different continents. Our experiments show that SeLeCT achieves excellent precision and recall, with overall accuracy close to 98%. Unlike state-of-the-art classifiers, the biggest advantage of SeLeCT is its ability to discover new protocols and applications in an almost automated fashion.
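
One way to picture how seeding can simplify label assignment is sketched below. It is an assumption-laden illustration, not SeLeCT's actual procedure: a few flows with already-known labels ("seeds") are assumed to be mixed into the batch before clustering, each cluster takes the majority label of the seeds it contains, and clusters with no seed are set aside as candidate new classes for manual inspection. The function name `label_clusters` and its input shapes are hypothetical.

```python
from collections import Counter


def label_clusters(clusters, seed_labels):
    """Propagate seed labels to clusters by majority vote.

    clusters: dict mapping cluster id -> list of flow ids (clustering assumed done).
    seed_labels: dict mapping flow id -> known label for the hypothetical seed flows.
    Returns (labels, unlabeled): labels per cluster, and cluster ids with no seed.
    """
    labels = {}
    unlabeled = []
    for cluster_id, flows in clusters.items():
        seen = [seed_labels[f] for f in flows if f in seed_labels]
        if seen:
            # Ties go to the label encountered first (Counter.most_common ordering).
            labels[cluster_id] = Counter(seen).most_common(1)[0][0]
        else:
            # No seed landed here: a candidate new protocol or application.
            unlabeled.append(cluster_id)
    return labels, unlabeled
```

Under this reading, manual effort is confined to the seedless clusters, and once an analyst labels one of them, some of its flows could serve as seeds for the next batch.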