108 research outputs found

    No NAT'd User left Behind: Fingerprinting Users behind NAT from NetFlow Records alone

    Full text link
    It is generally recognized that the traffic generated by an individual connected to a network acts as his biometric signature. Several tools exploit this fact to fingerprint and monitor users. Often, though, these tools assume to access the entire traffic, including IP addresses and payloads. This is not feasible on the grounds that both performance and privacy would be negatively affected. In reality, most ISPs convert user traffic into NetFlow records for a concise representation that does not include, for instance, any payloads. More importantly, large and distributed networks are usually NAT'd, thus a few IP addresses may be associated to thousands of users. We devised a new fingerprinting framework that overcomes these hurdles. Our system is able to analyze a huge amount of network traffic represented as NetFlows, with the intent to track people. It does so by accurately inferring when users are connected to the network and which IP addresses they are using, even though thousands of users are hidden behind NAT. Our prototype implementation was deployed and tested within an existing large metropolitan WiFi network serving about 200,000 users, with an average load of more than 1,000 users simultaneously connected behind 2 NAT'd IP addresses only. Our solution turned out to be very effective, with an accuracy greater than 90%. We also devised new tools and refined existing ones that may be applied to other contexts related to NetFlow analysis

    SLNSpeech: solving extended speech separation problem by the help of sign language

    Full text link
    A speech separation task can be roughly divided into audio-only separation and audio-visual separation. In order to make speech separation technology applied in the real scenario of the disabled, this paper presents an extended speech separation problem which refers in particular to sign language assisted speech separation. However, most existing datasets for speech separation are audios and videos which contain audio and/or visual modalities. To address the extended speech separation problem, we introduce a large-scale dataset named Sign Language News Speech (SLNSpeech) dataset in which three modalities of audio, visual, and sign language are coexisted. Then, we design a general deep learning network for the self-supervised learning of three modalities, particularly, using sign language embeddings together with audio or audio-visual information for better solving the speech separation task. Specifically, we use 3D residual convolutional network to extract sign language features and use pretrained VGGNet model to exact visual features. After that, an improved U-Net with skip connections in feature extraction stage is applied for learning the embeddings among the mixed spectrogram transformed from source audios, the sign language features and visual features. Experiments results show that, besides visual modality, sign language modality can also be used alone to supervise speech separation task. Moreover, we also show the effectiveness of sign language assisted speech separation when the visual modality is disturbed. Source code will be released in http://cheertt.top/homepage/Comment: 33 pages, 8 figures, 5 table

    A parallel approach to pca based malicious activitydetection in distributed honeypot data

    Get PDF
    Model order selection (MOS) schemes, which are frequently employed inseveral signal processing applications, are shown to be effective tools for the detectionof malicious activities in honeypot data. In this paper, we extend previous results byproposing an efficient and parallel MOS method for blind automatic malicious activitydetection in distributed honeypots. Our proposed scheme does not require any previousinformation on attacks or human intervention. We model network traffic data as signalsand noise and then apply modified signal processing methods. However, differently fromthe previous centralized solutions, we propose that the data colected by each honeypotnode be processed by nodes in a cluster (that may consist of the collection nodesthemselves) and then grouped to obtain the final results. This is achieved by having eachnode locally compute the Eigenvalue Decomposition (EVD) to its own sample correlationmatrix (obtained from the honeypot data) and transmit the resulting eigenvalues to acentral node, where the global eigenvalues and final model order are computed. Themodel order computed from the global eigenvalues through RADOI represents the numberof malicious activities detected in the analysed data. The feasibility of the proposedapproach is demonstrated through simulation experiments

    Forensic Memory Classification using Deep Recurrent Neural Networks

    Get PDF
    The goal of this project is to advance the application of machine learning frameworks and tools in the process of malware detection. Specifically, a deep neural network architecture is proposed to classify application modules as benign or malicious, using the lower level memory block patterns that make up these modules. The modules correspond to blocks of functionality within files used in kernel and OS level processes as well as user level applications. The learned model is proposed to reside in an isolated core with strict communication restrictions to achieve incorruptibility as well as efficiency, therefore providing a probabilistic memory-level view of the system that is consistent with the user-level view. The lower level memory blocks are constructed using basic block sequences of varying sizes that are fed as input into Long-Short Term Memory models. Four configurations of the LSTM model are explored, by adding bi-directionality as well as Attention. Assembly level data from 50 PE files are extracted and basic blocks are constructed using the IDA Disassembler toolkit. The results show that longer basic block sequences result in richer LSTM hidden layer representations. The hidden states are fed as features into Max pooling layers or Attention layers, depending on the configuration being tested, and the final classification is performed using Logistic Regression with a single hidden layer. The bidirectional LSTM with Attention proved to be the best model, used on basic block sequences of size 29. The differences between the model’s ROC curves indicate a strong reliance on lower level, instructional features, as opposed to metadata or String features, that speak to the success of using entire assembly instructions as data, as opposed to just opcodes or higher level features

    A real-time FPGA-based implementation of a high-performance MIMO-OFDM mobile WiMAX transmitter

    Get PDF
    The Multiple Input Multiple Output (MIMO)-Orthogonal Frequency Division Multiplexing (OFDM) is considered a key technology in modern wireless-access communication systems. The IEEE 802.16e standard, also denoted as mobile WiMAX, utilizes the MIMO-OFDM technology and it was one of the first initiatives towards the roadmap of fourth generation systems. This paper presents the PHY-layer design, implementation and validation of a high-performance real-time 2x2 MIMO mobile WiMAX transmitter that accounts for low-level deployment issues and signal impairments. The focus is mainly laid on the impact of the selected high bandwidth, which scales the implementation complexity of the baseband signal processing algorithms. The latter also requires an advanced pipelined memory architecture to timely address the datapath operations that involve high memory utilization. We present in this paper a first evaluation of the extracted results that demonstrate the performance of the system using a 2x2 MIMO channel emulation.Postprint (published version

    The Embedding Capacity of Information Flows Under Renewal Traffic

    Full text link
    Given two independent point processes and a certain rule for matching points between them, what is the fraction of matched points over infinitely long streams? In many application contexts, e.g., secure networking, a meaningful matching rule is that of a maximum causal delay, and the problem is related to embedding a flow of packets in cover traffic such that no traffic analysis can detect it. We study the best undetectable embedding policy and the corresponding maximum flow rate ---that we call the embedding capacity--- under the assumption that the cover traffic can be modeled as arbitrary renewal processes. We find that computing the embedding capacity requires the inversion of very structured linear systems that, for a broad range of renewal models encountered in practice, admits a fully analytical expression in terms of the renewal function of the processes. Our main theoretical contribution is a simple closed form of such relationship. This result enables us to explore properties of the embedding capacity, obtaining closed-form solutions for selected distribution families and a suite of sufficient conditions on the capacity ordering. We evaluate our solution on real network traces, which shows a noticeable match for tight delay constraints. A gap between the predicted and the actual embedding capacities appears for looser constraints, and further investigation reveals that it is caused by inaccuracy of the renewal traffic model rather than of the solution itself.Comment: Sumbitted to IEEE Trans. on Information Theory on March 10, 201

    MDFRCNN: Malware Detection using Faster Region Proposals Convolution Neural Network

    Get PDF
    Technological advancement of smart devices has opened up a new trend: Internet of Everything (IoE), where all devices are connected to the web. Large scale networking benefits the community by increasing connectivity and giving control of physical devices. On the other hand, there exists an increased ‘Threat’ of an ‘Attack’. Attackers are targeting these devices, as it may provide an easier ‘backdoor entry to the users’ network’.MALicious softWARE (MalWare) is a major threat to user security. Fast and accurate detection of malware attacks are the sine qua non of IoE, where large scale networking is involved. The paper proposes use of a visualization technique where the disassembled malware code is converted into gray images, as well as use of Image Similarity based Statistical Parameters (ISSP) such as Normalized Cross correlation (NCC), Average difference (AD), Maximum difference (MaxD), Singular Structural Similarity Index Module (SSIM), Laplacian Mean Square Error (LMSE), MSE and PSNR. A vector consisting of gray image with statistical parameters is trained using a Faster Region proposals Convolution Neural Network (F-RCNN) classifier. The experiment results are promising as the proposed method includes ISSP with F-RCNN training. Overall training time of learning the semantics of higher-level malicious behaviors is less. Identification of malware (testing phase) is also performed in less time. The fusion of image and statistical parameter enhances system performance with greater accuracy. The benchmark database from Microsoft Malware Classification challenge has been used to analyze system performance, which is available on the Kaggle website. An overall average classification accuracy of 98.12% is achieved by the proposed method
    • 

    corecore