108 research outputs found
No NAT'd User left Behind: Fingerprinting Users behind NAT from NetFlow Records alone
It is generally recognized that the traffic generated by an individual
connected to a network acts as a biometric signature. Several tools exploit
this fact to fingerprint and monitor users. Often, though, these tools assume
access to the entire traffic, including IP addresses and payloads. This is not
feasible in practice, since both performance and privacy would be negatively
affected. In reality, most ISPs convert user traffic into NetFlow records for a
concise representation that does not include, for instance, any payloads. More
importantly, large and distributed networks are usually NAT'd, so a few IP
addresses may be associated with thousands of users. We devised a new
fingerprinting framework that overcomes these hurdles. Our system is able to
analyze a huge amount of network traffic represented as NetFlow records, with the
intent to track people. It does so by accurately inferring when users are
connected to the network and which IP addresses they are using, even though
thousands of users are hidden behind NAT. Our prototype implementation was
deployed and tested within an existing large metropolitan WiFi network serving
about 200,000 users, with an average load of more than 1,000 users
simultaneously connected behind 2 NAT'd IP addresses only. Our solution turned
out to be very effective, with an accuracy greater than 90%. We also devised
new tools and refined existing ones that may be applied to other contexts
related to NetFlow analysis.
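As a loose illustration of the attribution problem described above (not the paper's actual algorithm), NAT'd session attribution can be sketched as nearest-neighbor matching of behavioral signatures built from NetFlow fields; the user names, hosts, and the Jaccard-similarity rule below are all hypothetical:

```python
# Hypothetical sketch: attribute an anonymous NAT'd session to a known
# user by comparing behavioral signatures built from NetFlow fields.

def signature(flows):
    """A session's signature: the set of (dst_host, dst_port) services."""
    return {(f["dst"], f["dport"]) for f in flows}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def best_match(unknown_flows, known_signatures):
    """Pick the known user whose signature is most similar."""
    sig = signature(unknown_flows)
    return max(known_signatures, key=lambda u: jaccard(sig, known_signatures[u]))

# Signatures learned from earlier, attributable sessions (hypothetical)
known = {
    "alice": {("mail.example.com", 993), ("news.example.org", 443)},
    "bob":   {("game.example.net", 3074), ("cdn.example.com", 443)},
}
# An anonymous session observed behind the NAT
session = [
    {"dst": "mail.example.com", "dport": 993},
    {"dst": "cdn.example.com", "dport": 443},
    {"dst": "news.example.org", "dport": 443},
]
print(best_match(session, known))  # → alice
```

A real system would need richer features (timing, volumes, flow durations) to reach the reported accuracy, but the matching structure is the same.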
SLNSpeech: solving extended speech separation problem by the help of sign language
Speech separation tasks can be roughly divided into audio-only separation
and audio-visual separation. To make speech separation technology applicable
to the real-world scenarios of the hearing-impaired, this paper presents an extended
speech separation problem which refers in particular to sign language assisted
speech separation. However, most existing datasets for speech separation consist
of audios and videos containing only audio and/or visual modalities. To address the
extended speech separation problem, we introduce a large-scale dataset named
Sign Language News Speech (SLNSpeech) dataset, in which the three modalities of
audio, visual, and sign language coexist. Then, we design a general deep
learning network for the self-supervised learning of three modalities,
particularly, using sign language embeddings together with audio or
audio-visual information for better solving the speech separation task.
Specifically, we use a 3D residual convolutional network to extract sign language
features and a pretrained VGGNet model to extract visual features. After that,
an improved U-Net with skip connections in feature extraction stage is applied
for learning the embeddings among the mixed spectrogram transformed from source
audios, the sign language features and the visual features. Experimental results
show that, besides the visual modality, the sign language modality can also be
used alone to supervise the speech separation task. Moreover, we also show the
effectiveness of sign language assisted speech separation when the visual
modality is disturbed. Source code will be released at
http://cheertt.top/homepage/
Comment: 33 pages, 8 figures, 5 tables
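As a rough illustration of the mixed-spectrogram formulation (not the paper's network), the sketch below separates a two-tone mixture with an oracle binary mask over framed FFT magnitudes; in the paper, a U-Net conditioned on sign language or visual embeddings would predict such a mask instead:

```python
import numpy as np

def stft(x, win=256, hop=128):
    """Complex STFT via framed, Hann-windowed real FFTs."""
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

t = np.arange(16000) / 16000.0
s1 = np.sin(2 * np.pi * 440 * t)    # stand-in for speaker 1
s2 = np.sin(2 * np.pi * 1760 * t)   # stand-in for speaker 2
mix = s1 + s2

S1, S2, M = stft(s1), stft(s2), stft(mix)
# Oracle binary mask: keep the time-frequency bins where source 1 dominates
mask = np.abs(S1) > np.abs(S2)
est1 = M * mask                      # masked mixture approximates speaker 1
print(float(np.abs(est1).sum() / np.abs(M).sum()))  # roughly half the energy
```

The learned model's job is exactly to approximate `mask` from the mixture plus the extra modality, without access to the clean sources.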
A parallel approach to PCA based malicious activity detection in distributed honeypot data
Model order selection (MOS) schemes, which are frequently employed in several signal processing applications, are shown to be effective tools for the detection of malicious activities in honeypot data. In this paper, we extend previous results by proposing an efficient and parallel MOS method for blind automatic malicious activity detection in distributed honeypots. Our proposed scheme does not require any previous information on attacks or human intervention. We model network traffic data as signals and noise and then apply modified signal processing methods. However, differently from the previous centralized solutions, we propose that the data collected by each honeypot node be processed by nodes in a cluster (which may consist of the collection nodes themselves) and then grouped to obtain the final results. This is achieved by having each node locally compute the Eigenvalue Decomposition (EVD) of its own sample correlation matrix (obtained from the honeypot data) and transmit the resulting eigenvalues to a central node, where the global eigenvalues and final model order are computed. The model order computed from the global eigenvalues through RADOI represents the number of malicious activities detected in the analysed data. The feasibility of the proposed approach is demonstrated through simulation experiments.
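The node/central split described above can be sketched as follows; the eigenvalue-gap criterion used here is a simplified stand-in for RADOI, and the synthetic data, node count, and signal strengths are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def node_eigenvalues(data):
    """Each honeypot node computes the EVD of its own sample
    correlation matrix and ships only the eigenvalues upstream."""
    r = data @ data.T / data.shape[1]
    return np.sort(np.linalg.eigvalsh(r))[::-1]   # descending order

def model_order(global_eigs):
    """Simple eigenvalue-gap criterion (stand-in for RADOI):
    the model order is where the largest relative drop occurs."""
    ratios = global_eigs[:-1] / global_eigs[1:]
    return int(np.argmax(ratios)) + 1

# Synthetic traffic at 3 nodes: 2 strong "attack" signals plus noise
m, n, k = 8, 400, 2
mixing = rng.normal(size=(m, k))
signals = rng.normal(size=(k, n)) * 5.0
nodes = [mixing @ signals + rng.normal(size=(m, n)) for _ in range(3)]

# Central node combines the per-node eigenvalue profiles
global_eigs = np.mean([node_eigenvalues(d) for d in nodes], axis=0)
print(model_order(global_eigs))
```

Only the m eigenvalues per node cross the network, not the raw traffic, which is what makes the scheme cheap to distribute.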
Forensic Memory Classification using Deep Recurrent Neural Networks
The goal of this project is to advance the application of machine learning frameworks and tools in the process of malware detection. Specifically, a deep neural network architecture is proposed to classify application modules as benign or malicious, using the lower level memory block patterns that make up these modules. The modules correspond to blocks of functionality within files used in kernel and OS level processes as well as user level applications. The learned model is proposed to reside in an isolated core with strict communication restrictions to achieve incorruptibility as well as efficiency, therefore providing a probabilistic memory-level view of the system that is consistent with the user-level view. The lower level memory blocks are constructed using basic block sequences of varying sizes that are fed as input into Long Short-Term Memory models. Four configurations of the LSTM model are explored, by adding bi-directionality as well as Attention. Assembly level data from 50 PE files are extracted and basic blocks are constructed using the IDA Disassembler toolkit. The results show that longer basic block sequences result in richer LSTM hidden layer representations. The hidden states are fed as features into Max pooling layers or Attention layers, depending on the configuration being tested, and the final classification is performed using Logistic Regression with a single hidden layer. The bidirectional LSTM with Attention proved to be the best model, used on basic block sequences of size 29. The differences between the models' ROC curves indicate a strong reliance on lower level, instructional features, as opposed to metadata or string features, which speaks to the success of using entire assembly instructions as data, as opposed to just opcodes or higher level features.
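As a minimal sketch of the attention-pooling step that feeds the final classifier: the hidden states below are random arrays standing in for BiLSTM outputs, and the single scoring vector is an assumed simplification of the attention layer, not the project's exact architecture:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(hidden, w):
    """Score each timestep's hidden state with a vector w, normalize the
    scores with softmax, and return the weighted average (context) vector."""
    alpha = softmax(hidden @ w)        # attention weights over timesteps
    return alpha @ hidden, alpha

rng = np.random.default_rng(0)
T, d = 29, 16                          # basic-block sequence length 29
hidden = rng.normal(size=(T, d))       # stand-in for BiLSTM hidden states
w = rng.normal(size=d)                 # stand-in for a learned scoring vector

context, alpha = attention_pool(hidden, w)
print(context.shape)                   # one d-dimensional feature vector
```

Max pooling would instead take `hidden.max(axis=0)`; attention replaces that hard selection with a learned soft weighting over the 29 timesteps.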
A real-time FPGA-based implementation of a high-performance MIMO-OFDM mobile WiMAX transmitter
Multiple Input Multiple Output (MIMO)-Orthogonal
Frequency Division Multiplexing (OFDM) is considered a key technology
in modern wireless-access communication systems. The IEEE 802.16e
standard, also denoted as mobile WiMAX, utilizes the MIMO-OFDM
technology and was one of the first initiatives on the roadmap towards
fourth generation systems. This paper presents the PHY-layer design, implementation
and validation of a high-performance real-time 2x2 MIMO
mobile WiMAX transmitter that accounts for low-level deployment issues
and signal impairments. The focus lies mainly on the impact of
the selected high bandwidth, which scales the implementation complexity
of the baseband signal processing algorithms. The latter also requires
an advanced pipelined memory architecture to timely address the datapath
operations that involve high memory utilization. We present in this
paper a first evaluation of the extracted results that demonstrate the
performance of the system using a 2x2 MIMO channel emulation.
The Embedding Capacity of Information Flows Under Renewal Traffic
Given two independent point processes and a certain rule for matching points
between them, what is the fraction of matched points over infinitely long
streams? In many application contexts, e.g., secure networking, a meaningful
matching rule is that of a maximum causal delay, and the problem is related to
embedding a flow of packets in cover traffic such that no traffic analysis can
detect it. We study the best undetectable embedding policy and the
corresponding maximum flow rate, which we call the embedding capacity, under
the assumption that the cover traffic can be modeled as arbitrary renewal
processes. We find that computing the embedding capacity requires the inversion
of very structured linear systems that, for a broad range of renewal models
encountered in practice, admits a fully analytical expression in terms of the
renewal function of the processes. Our main theoretical contribution is a
simple closed form for this relationship. This result enables us to explore
properties of the embedding capacity, obtaining closed-form solutions for
selected distribution families and a suite of sufficient conditions on the
capacity ordering. We evaluate our solution on real network traces, observing
a close match for tight delay constraints. A gap between the predicted and
the actual embedding capacities appears for looser constraints, and further
investigation reveals that it is caused by inaccuracy of the renewal traffic
model rather than of the solution itself.
Comment: Submitted to IEEE Trans. on Information Theory on March 10, 201
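The maximum-causal-delay matching rule can be illustrated with a small simulation; the greedy first-fit policy and the Poisson streams below are assumptions for the sketch, whereas the paper computes the capacity analytically via the renewal function for general renewal processes:

```python
import random

def renewal_stream(n, rate, rng):
    """Arrival times of a Poisson process (exponential renewal gaps)."""
    t, out = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        out.append(t)
    return out

def matched_fraction(flow, cover, max_delay):
    """Greedy first-fit causal matching: each flow packet is embedded in
    the earliest unused cover point within [t, t + max_delay]."""
    j, matched = 0, 0
    for t in flow:
        while j < len(cover) and cover[j] < t:
            j += 1                     # cover points in the past are unusable
        if j < len(cover) and cover[j] <= t + max_delay:
            matched += 1
            j += 1                     # consume the matched cover point
    return matched / len(flow)

rng = random.Random(42)
flow = renewal_stream(20000, 1.0, rng)     # flow to embed, rate 1
cover = renewal_stream(40000, 2.0, rng)    # cover traffic, rate 2
print(matched_fraction(flow, cover, max_delay=0.5))
```

Sweeping `max_delay` and the cover model in such a simulation gives an empirical check on the closed-form capacity, which is how a gap like the one reported for loose constraints would surface.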
Multimedia delivery in the future internet
The term 'Networked Media' implies that all kinds of media including text, image, 3D graphics, audio
and video are produced, distributed, shared, managed and consumed on-line through various networks,
like the Internet, Fiber, WiFi, WiMAX, GPRS, 3G and so on, in a convergent manner [1]. This white
paper is the contribution of the Media Delivery Platform (MDP) cluster and aims to cover the
challenges of the Networked Media in the transition to the Future of the Internet.
The Internet has evolved and changed the way we work and live. End users of the Internet have been confronted
with a bewildering range of media, services and applications and of technological innovations concerning
media formats, wireless networks, terminal types and capabilities. And there is little evidence that the pace
of this innovation is slowing. Today, over one billion users access the Internet on a regular basis, more
than 100 million users have downloaded at least one (multi)media file, and over 47 million of them do so
regularly, searching in more than 160 Exabytes of content. In the near future these numbers are expected
to rise exponentially. It is expected that Internet content will increase by at least a factor of 6, rising
to more than 990 Exabytes before 2012, fuelled mainly by the users themselves. Moreover, it is envisaged
that in a near- to mid-term future, the Internet will provide the means to share and distribute (new)
multimedia content and services with superior quality and striking flexibility, in a trusted and personalized
way, improving citizens' quality of life, working conditions, edutainment and safety.
In this evolving environment, new transport protocols, new multimedia encoding schemes, cross-layer
in-network adaptation, machine-to-machine communication (including RFIDs), rich 3D content as well as
community networks and the use of peer-to-peer (P2P) overlays are expected to generate new models of
interaction and cooperation, and be able to support enhanced perceived quality-of-experience (PQoE) and
innovative applications 'on the move', like virtual collaboration environments, personalised services/
media, virtual sport groups, on-line gaming, edutainment. In this context, the interaction with content
combined with interactive/multimedia search capabilities across distributed repositories, opportunistic P2P
networks and the dynamic adaptation to the characteristics of diverse mobile terminals are expected to
contribute towards such a vision.
Based on work that has taken place in a number of EC co-funded projects, in Framework Program 6 (FP6)
and Framework Program 7 (FP7), a group of experts and technology visionaries have voluntarily
contributed to this white paper, aiming to describe the status, the state of the art, the challenges and the way
ahead in the area of Content Aware media delivery platforms.
MDFRCNN: Malware Detection using Faster Region Proposals Convolution Neural Network
Technological advancement of smart devices has opened up a new trend: the Internet of Everything (IoE), where all devices are connected to the web. Large scale networking benefits the community by increasing connectivity and giving control of physical devices. On the other hand, there exists an increased 'threat' of an 'attack'. Attackers are targeting these devices, as they may provide an easier 'backdoor entry to the user's network'. MALicious softWARE (MalWare) is a major threat to user security. Fast and accurate detection of malware attacks is the sine qua non of IoE, where large scale networking is involved. The paper proposes the use of a visualization technique where the disassembled malware code is converted into gray images, as well as the use of Image Similarity based Statistical Parameters (ISSP) such as Normalized Cross correlation (NCC), Average difference (AD), Maximum difference (MaxD), Singular Structural Similarity Index Module (SSIM), Laplacian Mean Square Error (LMSE), MSE and PSNR. A vector consisting of the gray image with statistical parameters is trained using a Faster Region proposals Convolution Neural Network (F-RCNN) classifier. The experimental results are promising, as the proposed method includes ISSP with F-RCNN training. The overall training time for learning the semantics of higher-level malicious behaviors is low, and identification of malware (the testing phase) is also performed in less time. The fusion of image and statistical parameters enhances system performance with greater accuracy. The benchmark database from the Microsoft Malware Classification Challenge, available on the Kaggle website, has been used to analyze system performance. An overall average classification accuracy of 98.12% is achieved by the proposed method.
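The gray-image conversion and the NCC feature can be sketched as follows; the image width, zero padding, and the identical-binary example are illustrative choices, not the paper's settings:

```python
import numpy as np

def bytes_to_gray(blob, width=16):
    """Pad raw bytes and reshape them into a gray image (values 0..255)."""
    arr = np.frombuffer(blob, dtype=np.uint8)
    arr = np.pad(arr, (0, (-len(arr)) % width))   # zero-pad to a full row
    return arr.reshape(-1, width).astype(np.float64)

def ncc(a, b):
    """Normalized cross-correlation between two equal-size gray images."""
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

blob = bytes(range(256))               # stand-in for a disassembled binary
img1 = bytes_to_gray(blob)
img2 = bytes_to_gray(blob)             # an identical binary
print(ncc(img1, img2))                 # ≈ 1.0 for identical images
```

Each of the other ISSP parameters (AD, MaxD, SSIM, LMSE, MSE, PSNR) is likewise a scalar computed from a pair of such images, and the resulting vector is what augments the image input to the F-RCNN classifier.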