No NAT'd User left Behind: Fingerprinting Users behind NAT from NetFlow Records alone
It is generally recognized that the traffic generated by an individual
connected to a network acts as his biometric signature. Several tools exploit
this fact to fingerprint and monitor users. Often, though, these tools assume
access to the entire traffic, including IP addresses and payloads. This is not
feasible on the grounds that both performance and privacy would be negatively
affected. In reality, most ISPs convert user traffic into NetFlow records for a
concise representation that does not include, for instance, any payloads. More
importantly, large and distributed networks are usually NAT'd, thus a few IP
addresses may be associated with thousands of users. We devised a new
fingerprinting framework that overcomes these hurdles. Our system is able to
analyze a huge amount of network traffic represented as NetFlows, with the
intent to track people. It does so by accurately inferring when users are
connected to the network and which IP addresses they are using, even though
thousands of users are hidden behind NAT. Our prototype implementation was
deployed and tested within an existing large metropolitan WiFi network serving
about 200,000 users, with an average load of more than 1,000 users
simultaneously connected behind 2 NAT'd IP addresses only. Our solution turned
out to be very effective, with an accuracy greater than 90%. We also devised
new tools and refined existing ones that may be applied to other contexts
related to NetFlow analysis.
Data-driven design of intelligent wireless networks: an overview and tutorial
Data science or "data-driven research" is a research approach that uses real-life data to gain insight into the behavior of systems. It enables the analysis of small, simple as well as large and more complex systems in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas such as large-scale social networks, advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware-specific influences. These interactions can lead to a difference between real-world functioning and design-time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; (v) provides the reader with the necessary datasets and scripts to go through the tutorial steps themselves.
Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling
Spambot detection in online social networks is a long-lasting challenge
involving the study and design of detection techniques capable of efficiently
identifying ever-evolving spammers. Recently, a new wave of social spambots has
emerged, with advanced human-like characteristics that allow them to go
undetected even by current state-of-the-art algorithms. In this paper, we show
that efficient spambot detection can be achieved via an in-depth analysis of
their collective behaviors exploiting the digital DNA technique for modeling
the behaviors of social network users. Inspired by its biological counterpart,
in the digital DNA representation the behavioral lifetime of a digital account
is encoded in a sequence of characters. Then, we define a similarity measure
for such digital DNA sequences. We build upon digital DNA and the similarity
between groups of users to characterize both genuine accounts and spambots.
Leveraging such characterization, we design the Social Fingerprinting
technique, which is able to discriminate among spambots and genuine accounts in
both a supervised and an unsupervised fashion. We finally evaluate the
effectiveness of Social Fingerprinting and we compare it with three
state-of-the-art detection algorithms. Among the peculiarities of our approach
is the possibility to apply off-the-shelf DNA analysis techniques to study
online users' behaviors and to efficiently rely on a limited number of
lightweight account characteristics.
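The encoding step described in this abstract can be illustrated with a minimal sketch. The abstract does not state the alphabet or the similarity measure, so everything here is an assumption for illustration: the three-letter `ACTION_CODES` mapping is hypothetical, and the longest common substring is used as one plausible similarity between two sequences.

```python
from typing import Iterable

# Hypothetical alphabet: one character per action type (assumed, not from the abstract).
ACTION_CODES = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode_dna(actions: Iterable[str]) -> str:
    """Encode a chronological list of account actions as a character sequence."""
    return "".join(ACTION_CODES[a] for a in actions)


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def similarity(a: str, b: str) -> float:
    """Shared-substring length normalized by the shorter sequence (0.0 to 1.0)."""
    shortest = min(len(a), len(b))
    return longest_common_substring(a, b) / shortest if shortest else 0.0
```

Under these assumptions, accounts that repeat the same action pattern share long substrings and score close to 1.0, while accounts with unrelated behavior score close to 0.0.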
Target Tracking in Confined Environments with Uncertain Sensor Positions
To ensure safety in confined environments such as mines or subway tunnels, a
(wireless) sensor network can be deployed to monitor various environmental
conditions. One of its most important applications is to track personnel,
mobile equipment and vehicles. However, the state-of-the-art algorithms assume
that the positions of the sensors are perfectly known, which is not necessarily
true due to imprecise placement and/or dropping of sensors. Therefore, we
propose an automatic approach for simultaneous refinement of sensors' positions
and target tracking. We divide the considered area into a finite number of cells,
define dynamic and measurement models, and apply a discrete variant of belief
propagation which can efficiently solve this high-dimensional problem, and
handle all non-Gaussian uncertainties expected in this kind of environment.
Finally, we use ray-tracing simulation to generate an artificial mine-like
environment and generate synthetic measurement data. According to our extensive
simulation study, the proposed approach performs significantly better than
standard Bayesian target tracking and localization algorithms, and provides
robustness against outliers. Comment: IEEE Transactions on Vehicular Technology, 201
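The cell-based formulation can be illustrated with a much simpler relative of the approach in this abstract: a one-dimensional discrete Bayes (histogram) filter over tunnel cells. This sketch is not the authors' algorithm. It assumes perfectly known sensor positions, a single target, and a hypothetical motion model in which the target stays in its cell or moves to a neighbouring one; the joint refinement of sensor positions is omitted.

```python
from typing import List


def predict(belief: List[float], p_stay: float = 0.6) -> List[float]:
    """Motion step: the target stays put or moves to a neighbouring cell."""
    n = len(belief)
    p_move = (1.0 - p_stay) / 2.0
    out = [0.0] * n
    for i, b in enumerate(belief):
        out[i] += p_stay * b
        # Mass that would leave the tunnel is kept in the end cell.
        out[max(i - 1, 0)] += p_move * b
        out[min(i + 1, n - 1)] += p_move * b
    return out


def update(belief: List[float], likelihood: List[float]) -> List[float]:
    """Measurement step: weight each cell by its likelihood and renormalize."""
    posterior = [b * l for b, l in zip(belief, likelihood)]
    total = sum(posterior)
    if total == 0.0:
        # The measurement contradicts the prior in every cell; keep the prior.
        return list(belief)
    return [p / total for p in posterior]
```

Because the belief is a plain table over cells, the measurement likelihood can take any shape, which is what lets a discrete formulation represent non-Gaussian uncertainty.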
Improving k-nn search and subspace clustering based on local intrinsic dimensionality
In several novel applications such as multimedia and recommender systems, data is often represented as object feature vectors in high-dimensional spaces. High-dimensional data is always a challenge for state-of-the-art algorithms, because of the so-called curse of dimensionality. As the dimensionality increases, the discriminative ability of similarity measures diminishes to the point where many data analysis algorithms that depend on them, such as similarity search and clustering, lose their effectiveness. One way to handle this challenge is by selecting the most important features, which is essential for providing compact object representations as well as improving the overall search and clustering performance. Having compact feature vectors can further reduce the storage space and the computational complexity of search and learning tasks.
Support-Weighted Intrinsic Dimensionality (support-weighted ID) is a new promising feature selection criterion that estimates the contribution of each feature to the overall intrinsic dimensionality. Support-weighted ID identifies relevant features locally for each object, and penalizes those features that have locally lower discriminative power as well as higher density. In fact, support-weighted ID measures the ability of each feature to locally discriminate between objects in the dataset.
Based on support-weighted ID, this dissertation introduces three main research contributions. First, this dissertation proposes NNWID-Descent, a similarity graph construction method that utilizes the support-weighted ID criterion to identify and retain relevant features locally for each object and enhance the overall graph quality. Second, with the aim of improving the accuracy and performance of cluster analysis, this dissertation introduces k-LIDoids, a subspace clustering algorithm that extends the utility of support-weighted ID within a clustering framework in order to gradually select the subset of informative and important features per cluster. k-LIDoids is able to construct clusters together with finding a low-dimensional subspace for each cluster. Finally, using the compact object and cluster representations from NNWID-Descent and k-LIDoids, this dissertation defines LID-Fingerprint, a new binary fingerprinting and multi-level indexing framework for high-dimensional data. LID-Fingerprint can be used for hiding the information as a protection against passive adversaries as well as for providing an efficient and secure similarity search and retrieval for data stored on the cloud. When compared to other state-of-the-art algorithms, the good practical performance provides evidence for the effectiveness of the proposed algorithms for data in high-dimensional spaces.
Thermal Fingerprinting—Multi-Dimensional Analysis of Computational Loads
Digital fingerprinting is used in several domains to identify and track variable activities and processes. In this paper, we propose a novel approach to categorize and recognize computational tasks based on thermal system information. The concept focuses on all kinds of data center environments to control required cooling capacity dynamically. The concept monitors basic thermal sensor data from each server and chassis entity. The respective characteristic curves are merged with additional general system information, such as CPU load behavior, memory usage, and I/O characteristics. This results in two-dimensional thermal fingerprints, which are unique and achievable. The fingerprints are used as input for an adaptive, pre-active air-conditioning control system. This allows a precise estimation of the data center health status. First test cases and reference scenarios show a huge potential for energy savings without any negative aspects regarding health status or durability. In consequence, we provide a cost-efficient, light-weight, and flexible solution to optimize the energy efficiency of a huge number of existing, conventional data center environments.
Multi-sensor data fusion in mobile devices for the identification of Activities of Daily Living
Following the recent advances in technology and the growing use of mobile devices such as
smartphones, several solutions may be developed to improve the quality of life of users in the
context of Ambient Assisted Living (AAL). Mobile devices have different available sensors, e.g.,
accelerometer, gyroscope, magnetometer, microphone and Global Positioning System (GPS)
receiver, which allow the acquisition of physical and physiological parameters for the
recognition of different Activities of Daily Living (ADL) and the environments in which they are
performed. The definition of ADL includes a well-known set of tasks, which include basic self-care
tasks, based on the types of skills that people usually learn in early childhood, including
feeding, bathing, dressing, grooming, walking, running, jumping, climbing stairs, sleeping,
watching TV, working, listening to music, cooking, eating and others. In the context of AAL,
some individuals (henceforth called user or users) need particular assistance, either because
the user has some sort of impairment, or because the user is old, or simply because users
need/want to monitor their lifestyle. The research and development of systems that provide a
particular assistance to people is increasing in many areas of application. In particular, in the
future, the recognition of ADL will be an important element for the development of a personal
digital life coach, providing assistance to different types of users. To support the recognition
of ADL, the surrounding environments should be also recognized to increase the reliability of
these systems.
The main focus of this Thesis is the research on methods for the fusion and classification of the
data acquired by the sensors available in off-the-shelf mobile devices in order to recognize ADL
in almost real-time, taking into account the large diversity of the capabilities and
characteristics of the mobile devices available in the market. In order to achieve this objective,
this Thesis started with the review of the existing methods and technologies to define the
architecture and modules of the method for the identification of ADL. With this review and
based on the knowledge acquired about the sensors available in off-the-shelf mobile devices,
a set of tasks that may be reliably identified was defined as a basis for the remaining research
and development to be carried out in this Thesis. This review also identified the main stages
for the development of a new method for the identification of the ADL using the sensors
available in off-the-shelf mobile devices; these stages are data acquisition, data processing,
data cleaning, data imputation, feature extraction, data fusion and artificial intelligence. One
of the challenges is related to the different types of data acquired from the different sensors,
but other challenges were found, including the presence of environmental noise, the positioning
of the mobile device during the daily activities, the limited capabilities of the mobile devices
and others. Based on the acquired data, the processing was performed, implementing data
cleaning and feature extraction methods, in order to define a new framework for the
recognition of ADL. The data imputation methods were not applied, because at this stage of
the research their implementation has no influence on the results of the identification
of the ADL and environments, as the features are extracted from a set of data acquired during
a defined time interval and there are no missing values during this stage. The joint selection of
the set of usable sensors and the identifiable set of tasks will then allow the development of a
framework that, considering multi-sensor data fusion technologies and context awareness, in
coordination with other information available from the user context, such as his/her agenda
and the time of the day, will allow to establish a profile of the tasks that the user performs in
a regular activity day. The classification method and the algorithm for the fusion of the features
for the recognition of ADL and its environments need to be deployed in a machine with some
computational power, while the mobile device that will use the created framework can
perform the identification of the ADL using much less computational power. Based on the
results reported in the literature, the method chosen for the recognition of the ADL is composed
of three variants of Artificial Neural Networks (ANN), including simple Multilayer Perceptron
(MLP) networks, Feedforward Neural Networks (FNN) with Backpropagation, and Deep Neural
Networks (DNN).
Data acquisition can be performed with standard methods. After the acquisition, the data must
be processed at the data processing stage, which includes data cleaning and feature extraction
methods. The data cleaning method used for motion and magnetic sensors is the low pass filter,
in order to reduce the noise acquired; but for the acoustic data, the Fast Fourier Transform
(FFT) was applied to extract the different frequencies. When the data is clean, several features
are then extracted based on the types of sensors used, including the mean, standard deviation,
variance, maximum value, minimum value and median of raw data acquired from the motion
and magnetic sensors; the mean, standard deviation, variance and median of the maximum
peaks calculated with the raw data acquired from the motion and magnetic sensors; the five
greatest distances between the maximum peaks calculated with the raw data acquired from
the motion and magnetic sensors; the mean, standard deviation, variance, median and 26 Mel-
Frequency Cepstral Coefficients (MFCC) of the frequencies obtained with FFT based on the raw
data acquired from the microphone data; and the distance travelled calculated with the data
acquired from the GPS receiver. After the extraction of the features, these will be grouped in
different datasets for the application of the ANN methods and to discover the method and
dataset that reports better results. The classification stage was incrementally developed,
starting with the identification of the most common ADL (i.e., walking, running, going upstairs,
going downstairs and standing activities) with motion and magnetic sensors. Next, the
environments were identified with acoustic data, i.e., bedroom, bar, classroom, gym, kitchen,
living room, hall, street and library. After the environments are recognized, and based on the
different sets of sensors commonly available in the mobile devices, the data acquired from the
motion and magnetic sensors were combined with the recognized environment in order to
differentiate some activities without motion, i.e., sleeping and watching TV. The number of recognized activities in this stage was increased with the use of the distance travelled,
extracted from the GPS receiver data, allowing also to recognize the driving activity.
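The cleaning and feature extraction steps for the motion and magnetic sensors can be sketched as follows. This is a minimal illustration and not the thesis implementation: the exponential-smoothing filter and its `alpha` value are assumptions (the abstract only names "the low pass filter"), a peak is assumed to be a sample strictly greater than both neighbours, and distances between peaks are measured in sample indices.

```python
import statistics
from typing import Dict, List


def low_pass(samples: List[float], alpha: float = 0.2) -> List[float]:
    """Exponential smoothing used as a simple low-pass filter (alpha is assumed)."""
    filtered: List[float] = []
    for x in samples:
        prev = filtered[-1] if filtered else x
        filtered.append(prev + alpha * (x - prev))
    return filtered


def peak_indices(samples: List[float]) -> List[int]:
    """Indices of local maxima: samples strictly greater than both neighbours."""
    return [i for i in range(1, len(samples) - 1)
            if samples[i - 1] < samples[i] > samples[i + 1]]


def window_features(samples: List[float]) -> Dict[str, object]:
    """Statistical features of one time window, plus the largest gaps between peaks."""
    peaks = peak_indices(samples)
    gaps = sorted((b - a for a, b in zip(peaks, peaks[1:])), reverse=True)
    return {
        "mean": statistics.fmean(samples),
        "std": statistics.pstdev(samples),
        "variance": statistics.pvariance(samples),
        "max": max(samples),
        "min": min(samples),
        "median": statistics.median(samples),
        "top5_peak_gaps": gaps[:5],  # fewer than five if the window has few peaks
    }
```

A typical call would be `window_features(low_pass(window))` for each fixed-length window of accelerometer, gyroscope or magnetometer readings; the resulting dictionary is one row of the dataset fed to the classifier.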
After the implementation of the three classification methods with different numbers of
iterations, datasets and remaining configurations in a machine with high processing
capabilities, the reported results proved that the best method for the recognition of the most
common ADL and activities without motion is the DNN method, but the best method for the
recognition of environments is the FNN method with Backpropagation. Depending on the
number of sensors used, this implementation reports a mean accuracy between 85.89% and
89.51% for the recognition of the most common ADL, equal to 86.50% for the recognition of
environments, and equal to 100% for the recognition of activities without motion, reporting
an overall accuracy between 85.89% and 92.00%.
The last stage of this research work was the implementation of the structured framework for
the mobile devices, verifying that the FNN method requires a high processing power for the
recognition of environments and the results reported with the mobile application are lower
than the results reported with the machine with high processing capabilities used. Thus, the
DNN method was also implemented for the recognition of the environments with the mobile
devices. Finally, the results reported with the mobile devices show an accuracy between 86.39%
and 89.15% for the recognition of the most common ADL, equal to 45.68% for the recognition
of environments, and equal to 100% for the recognition of activities without motion, reporting
an overall accuracy between 58.02% and 89.15%.
Compared with the literature, the results returned by the implemented framework show only
a residual improvement. However, the results reported in this research work cover the
identification of more ADL than the ones described in other studies. The improvement in the
recognition of ADL based on the mean of the accuracies is equal to 2.93%, but the maximum
number of ADL and environments previously recognized was 13, while the number of ADL and
environments recognized with the framework resulting from this research is 16. In conclusion,
the framework developed has a mean improvement of 2.93% in the accuracy of the recognition
for a larger number of ADL and environments than previously reported.
In the future, the achievements reported by this PhD research may be considered as a starting
point for the development of a personal digital life coach, but the number of ADL and
environments recognized by the framework should be increased and the experiments should be
performed with different types of devices (i.e., smartphones and smartwatches), and the data
imputation and other machine learning methods should be explored in order to attempt to
increase the reliability of the framework for the recognition of ADL and its environments.
Following recent technological advances and the growing use of mobile devices such as
smartphones, several solutions may be developed to improve users' quality of life in the
context of Ambient Assisted Living (AAL; in Portuguese, Ambientes de Vida Assistida, AVA).
Mobile devices integrate several sensors, such as an accelerometer, gyroscope, magnetometer,
microphone and Global Positioning System (GPS) receiver, which allow the acquisition of
various physical and physiological parameters for the recognition of different Activities of
Daily Living (ADL; in Portuguese, Atividades da Vida Diária, AVD) and their environments. The
definition of ADL includes a well-known set of basic self-care tasks, based on the types of
skills that people usually learn in childhood. These tasks include feeding, bathing, dressing,
grooming, walking, running, jumping, climbing stairs, sleeping, watching television, working,
listening to music, cooking and eating, among others. In the context of AAL, some individuals
(commonly called users) need particular assistance, either because the user has some kind of
impairment, because the user is elderly, or simply because the user needs or wants to monitor
and train his or her lifestyle. The research and development of systems that provide some
kind of particular assistance is growing in many application areas. In particular, in the
future, the recognition of ADL will be an important part of the development of a personal
digital assistant, providing low-cost personal assistance to different types of people. To
help in the recognition of ADL, the environments in which they take place should also be
recognized to increase the reliability of these systems.
The main focus of this Thesis is the development of methods for the fusion and classification
of the data acquired from the sensors available in mobile devices, for the recognition of ADL
in almost real time, taking into account the large diversity of the characteristics of the
mobile devices available on the market. To achieve this objective, this Thesis started with a
review of the existing methods and technologies in order to define the architecture and
modules of the new method for the identification of ADL. With this literature review, and
based on the knowledge acquired about the sensors available in off-the-shelf mobile devices,
a set of tasks that can be identified was defined for the research and development carried
out in this Thesis. This review also identifies the main concepts for the development of the
new method for the identification of ADL using the sensors, namely: data acquisition, data
processing, data cleaning, data imputation, feature extraction, data fusion, and the
extraction of results using artificial intelligence methods. One of the challenges is related
to the different types of data acquired by the different sensors, but other challenges were
found, the most relevant being environmental noise, the positioning of the device during the
daily activities, and the limited capabilities of mobile devices.
The different characteristics of people may also influence the creation of the methods, so
people with different lifestyles and physical characteristics were chosen for the acquisition
and identification of the sensor data. Based on the acquired data, the processing was
performed, implementing data cleaning and feature extraction methods, to start the creation
of the new method for the recognition of ADL. The data imputation methods were excluded from
the implementation, because they would not influence the results of the identification of ADL
and environments, given that the features used are extracted from a set of data acquired
during a defined time interval.
The selection of the usable sensors, as well as of the identifiable ADL, will allow the
development of a method that, by using technologies for the fusion of data acquired from
multiple sensors in coordination with other information about the user's context, such as the
user's agenda, makes it possible to establish a profile of the tasks that the user performs
daily. Based on the results reported in the literature, the method chosen for the recognition
of ADL consists of different variants of Artificial Neural Networks (ANN), including
Multilayer Perceptron (MLP), Feedforward Neural Networks (FNN) with Backpropagation and Deep
Neural Networks (DNN). Finally, after the creation of the methods for each stage of the
method for the recognition of ADL and environments, the sequential implementation of the
different methods was carried out on a mobile device for additional tests.
After defining the structure of the method for the recognition of ADL and environments using
mobile devices, it was verified that data acquisition can be performed with common methods.
After the acquisition, the data must be processed in the data processing module, which
includes the data cleaning and feature extraction methods. The data cleaning method used for
motion and magnetic sensors is the low-pass filter, in order to reduce noise, but for the
acoustic data, the Fast Fourier Transform (FFT) was applied to extract the different
frequencies.
After cleaning the data, the different features were extracted based on the types of sensors
used: the mean, standard deviation, variance, maximum value, minimum value and median of the
data acquired by the magnetic and motion sensors; the mean, standard deviation, variance and
median of the maximum peaks calculated from the data acquired by the magnetic and motion
sensors; the five greatest distances between the maximum peaks calculated from the data
acquired by the motion and magnetic sensors; the mean, standard deviation, variance and 26
Mel-Frequency Cepstral Coefficients (MFCC) of the frequencies obtained with FFT from the
microphone data; and the distance calculated from the data acquired by the GPS receiver.
After the extraction of the features, they are grouped into different datasets for the
application of the ANN methods in order to discover the method and the set of features that
reports the best results. The data classification module was incrementally developed,
starting with the identification of the common ADL with magnetic and motion sensors, i.e.,
walking, running, going upstairs, going downstairs and standing. Next, the environments are
identified with data from acoustic sensors, i.e., bedroom, bar, classroom, gym, kitchen,
living room, hall, street and library. Based on the recognized environments and the remaining
sensors available in mobile devices, the data acquired from the magnetic and motion sensors
were combined with the recognized environment to differentiate some activities without motion
(i.e., sleeping and watching television); the number of activities recognized in this stage
increases with the fusion of the distance travelled, extracted from the GPS receiver data,
which also allows the driving activity to be recognized.
After the implementation of the three classification methods with different numbers of
iterations, datasets and configurations on a machine with high processing capacity, the
reported results proved that the best method for the recognition of the common ADL and
activities without motion is the DNN method, but the best method for the recognition of
environments is the FNN method with Backpropagation. Depending on the number of sensors used,
this implementation reports a mean accuracy between 85.89% and 89.51% for the recognition of
the common ADL, equal to 86.50% for the recognition of environments, and equal to 100% for
the recognition of activities without motion, reporting an overall accuracy between 85.89%
and 92.00%.
The last stage of this Thesis was the implementation of the method on mobile devices,
verifying that the FNN method requires high processing power for the recognition of
environments and that the results reported with these devices are lower than the results
reported with the high-processing-capacity machine used in the development of the method.
Thus, the DNN method was also implemented for the recognition of environments with the mobile
devices. Finally, the results reported with the mobile devices show an accuracy between
86.39% and 89.15% for the recognition of the common ADL, equal to 45.68% for the recognition
of environments, and equal to 100% for the recognition of activities without motion,
reporting an overall accuracy between 58.02% and 89.15%.
Compared with the results reported in the literature, the results of the developed method
show a residual improvement, but the results of this Thesis identify more ADL than the other
studies available in the literature. The improvement in the recognition of ADL based on the
mean of the accuracies is equal to 2.93%, but the maximum number of ADL and environments
recognized by the studies available in the literature is 13, while the number of ADL and
environments recognized with the implemented method is 16. Thus, the developed method has an
improvement of 2.93% in recognition accuracy over a larger number of ADL and environments.
As future work, the results reported in this Thesis may be considered a starting point for
the development of a personal digital assistant, but the number of ADL and environments
recognized by the method should be increased and the experiments should be repeated with
different types of mobile devices (i.e., smartphones and smartwatches), and imputation
methods and other data classification methods should be explored in order to try to increase
the reliability of the method for the recognition of ADL and environments.