48 research outputs found
An Empirical Investigation on Snort NIDS versus Supervised Machine Learning Classifiers
With the vast usage of network services, Security became an important issue for all network types. Various techniques emerged to grant network security; among them is Network Intrusion Detection System (NIDS). Many extant NIDSs actively work against various intrusions, but there are still a number of performance issues including high false alarm rates, and numerous undetected attacks. To keep up with these attacks, some of the academic researchers turned towards machine learning (ML) techniques to create software that automatically predict intrusive and abnormal traffic, another approach is to utilize ML algorithms in enhancing Traditional NIDSs which is a more feasible solution since they are widely spread. To upgrade the detection rates of current NIDSs, thorough analyses are essential to identify where ML predictors outperform them. The first step is to provide assessment of most used NIDS worldwide, Snort, and comparing its performance with ML classifiers. This paper provides an empirical study to evaluate performance of Snort and four supervised ML classifiers, KNN, Decision Tree, Bayesian net and Naïve Bays against network attacks, probing, Brute force and DoS. By measuring Snort metric, True Alarm Rate, F-measure, Precision and Accuracy and compares them with the same metrics conducted from applying ML algorithms using Weka tool. ML classifiers show an elevated performance with over 99% correctly classified instances for most algorithms, While Snort intrusion detection system shows a degraded classification of about 25% correctly classified instances, hence identifying Snort weaknesses towards certain attack types and giving leads on how to overcome those weaknesses.
es
ANDRODET: An adaptive Android obfuscation detector
Obfuscation techniques modify an app's source (or machine) code in order to make it more difficult to analyze. This is typically applied to protect intellectual property in benign apps, or to hinder the process of extracting actionable information in the case malware. Since malware analysis often requires considerable resource investment, detecting the particular obfuscation technique used may contribute to apply the right analysis tools, thus leading to some savings. In this paper, we propose ANDRODET, a mechanism to detect three popular types of obfuscation in Android applications, namely identifier renaming, string encryption, and control flow obfuscation. ANDRODET leverages online learning techniques, thus being suitable for resource-limited environments that need to operate in a continuous manner. We compare our results with a batch learning algorithm using a dataset of 34,962 apps from both malware and benign apps. Experimental results show that online learning approaches are not only able to compete with batch learning methods in terms of accuracy, but they also save significant amount of time and computational resources. Particularly, ANDRODET achieves an accuracy of 92.02% for identifier renaming detection, 81.41% for string encryption detection, and 68.32% for control flow obfuscation detection, on average. Also, the overall accuracy of the system when apps might be obfuscated with more than one technique is around 80.66%. (C) 2018 The Authors. Published by Elsevier B.V.This work has been partially supported by MINECO grantTIN2016-79095-C2-2-R (SMOG-DEV) and CAM grant S2013/ICE-3095 (CIBERDINE), co-funded with European FEDER funds. Furthermore, it has been partially supported by the UC3M’sgrant Programa de Ayudas para la Movilida
An optimized context-aware mobile computing model to filter inappropriate incoming calls in smartphone
Requests for communication via mobile devices can be disruptive to the receiver in certain social situation. For example, unsuitable incoming calls may put the receiver in a dangerous condition, as in the case of receiving calls while driving. Therefore, designers of mobile computing interfaces require plans for minimizing annoying calls. To reduce the frequency of these calls, one promising approach is to provide an intelligent and accurate system, based on context awareness with cues of a callee's context allowing informed decisions of when to answer a call. The processing capabilities and advantages of mobile devices equipped with portable sensors provide the basis for new context-awareness services and applications. However, contextawareness mobile computing systems are needed to manage the difficulty of multiple sources of context that affects the accuracy of the systems, and the challenge of energy hungry GPS sensor that affects the battery consumption of mobile phone. Hence, reducing the cost of GPS sensor and increasing the accuracy of current contextawareness call filtering systems are two main motivations of this study. Therefore, this study proposes a new localization mechanism named Improved Battery Life in Context Awareness System (IBCS) to deal with the energy-hungry GPS sensor and optimize the battery consumption of GPS sensor in smartphone for more than four hours. Finally, this study investigates the context-awareness models in smartphone and develops an alternative intelligent model structure to improve the accuracy rate. Hence, a new optimized context-awareness mobile computing model named Optimized Context Filtering (OCF) is developed to filter unsuitable incoming calls based on context information of call receiver. In this regard, a new extended Naive Bayesian classifier was proposed based on the Naive Bayesian classifier by combining the incremental learning strategy with appropriate weight on the new training data. This new classifier is utilized as an inference engine to the proposed model to increase its accuracy rate. The results indicated that 7% improvement was seen in the accuracy rate of the proposed extended naive Bayesian classifier. On the other hand, the proposed model result showed that the OCF model improved the accuracy rate by 14%. These results indicated that the proposed model is a hopeful approach to provide an intelligent call filtering system based on context information for smartphones
Predictive Modelling Approach to Data-Driven Computational Preventive Medicine
This thesis contributes novel predictive modelling approaches to data-driven computational preventive medicine and offers an alternative framework to statistical analysis in preventive medicine research. In the early parts of this research, this thesis presents research by proposing a synergy of machine learning methods for detecting patterns and developing inexpensive predictive models from healthcare data to classify the potential occurrence of adverse health events. In particular, the data-driven methodology is founded upon a heuristic-systematic assessment of several machine-learning methods, data preprocessing techniques, models’ training estimation and optimisation, and performance evaluation, yielding a novel computational data-driven framework, Octopus.
Midway through this research, this thesis advances research in preventive medicine and data mining by proposing several new extensions in data preparation and preprocessing. It offers new recommendations for data quality assessment checks, a novel multimethod imputation (MMI) process for missing data mitigation, a novel imbalanced resampling approach, and minority pattern reconstruction (MPR) led by information theory. This thesis also extends the area of model performance evaluation with a novel classification performance ranking metric called XDistance.
In particular, the experimental results show that building predictive models with the methods guided by our new framework (Octopus) yields domain experts' approval of the new reliable models’ performance. Also, performing the data quality checks and applying the MMI process led healthcare practitioners to outweigh predictive reliability over interpretability. The application of MPR and its hybrid resampling strategies led to better performances in line with experts' success criteria than the traditional imbalanced data resampling techniques. Finally, the use of the XDistance performance ranking metric was found to be more effective in ranking several classifiers' performances while offering an indication of class bias, unlike existing performance metrics
The overall contributions of this thesis can be summarised as follow. First, several data mining techniques were thoroughly assessed to formulate the new Octopus framework to produce new reliable classifiers. In addition, we offer a further understanding of the impact of newly engineered features, the physical activity index (PAI) and biological effective dose (BED). Second, the newly developed methods within the new framework. Finally, the newly accepted developed predictive models help detect adverse health events, namely, visceral fat-associated diseases and advanced breast cancer radiotherapy toxicity side effects. These contributions could be used to guide future theories, experiments and healthcare interventions in preventive medicine and data mining
Sentence-level sentiment tagging across different domains and genres
The demand for information about sentiment expressed in texts has stimulated a growing interest into automatic sentiment analysis in Natural Language Processing (NLP). This dissertation is motivated by an unmet need for high-performance domain-independent sentiment taggers and by pressing theoretical questions in NLP, where the exploration of limitations of specific approaches, as well as synergies between them, remain practically unaddressed. This study focuses on sentiment tagging at the sentence level and covers four genres: news, blogs, movie reviews, and product reviews. It draws comparisons between sentiment annotation at different linguistic levels (words, sentences, and texts) and highlights the key differences between supervised machine learning methods that rely on annotated corpora (corpus-based, CBA) and lexicon-based approaches (LBA) to sentiment tagging. Exploring the performance of supervised corpus-based approach to sentiment tagging, this study highlights the strong domain-dependence of the CBA. I present the development of LBA approaches based on general lexicons, such as WordNet, as a potential solution to the domain portability problem. A system for sentiment marker extraction from WordNet's relations and glosses is developed and used to acquire lists for a lexicon-based system for sentiment annotation at the sentence and text levels. It demonstrates that LBA's performance across domains is more stable than that of CBA. Finally, the study proposes an integration of LBA and CBA in an ensemble of classifiers using a precision-based voting technique that allows the ensemble system to incorporate the best features of both CBA and LBA. This combined approach outperforms both base learners and provides a promising solution to the domain-adaptation problem. The study contributes to NLP (1) by developing algorithms for automatic acquisition of sentiment-laden words from dictionary definitions; (2) by conducting a systematic study of approaches to sentiment classification and of factors affecting their performance; (3) by refining the lexicon-based approach by introducing valence shifter handling and parse tree information; and (4) by development of the combined, CBA/LBA approach that brings together the strengths of the two approaches and allows domain-adaptation with limited amounts of labeled training data
Extracting business performance signals from Twitter news
Social media and social networks underpin a revolution in communication
between people, with the particular feature that much of that communication is open to
all. This provides a massive pool of data that can be exploited by researchers for a wide
variety of different applications. Data from Twitter is of particular interest in this sense,
given its large global usage levels, and the availability of APIs and other tools that enable
easy access to the publicly available stream of tweets. Owing to the wide public
penetration of Twitter, many businesses make use of it to share their latest news,
effectively using Twitter as a gateway to connect to end-users, consumers and/or
investors.
In this thesis, we focus on the potential for extracting information from Twitter that is
relevant to the financial and competitiveness status of a business. We consider a collection
of well-regarded Twitter accounts that are known for communicating recent business
news, and we investigate the automated analysis of the stream of tweets from these
sources, with a view to learning business-relevant information about specific companies.
A key aspect of our approach is the idea of extracting specific areas of business
performance: we explore three such areas: productivity, competitiveness, and industrial
risk. We propose a two-step model which first classifies a tweet into one of these areas,
and then assigns a sentiment value (on a positive/negative scale). The resulting sentiment
values across specific aspects represent novel business indicators that could add
significant value to the toolset used by business analysts. Our experiments are based on a
new manually pre-classified data set (available from a URL provided).
Additionally, we propose n-grams made from non-contiguous words as a novel feature to
enhance performance in this context. Experiments involving a range of feature selection
methods show that these new features provide valuable benefits in comparison with
standard n-gram features.
We also interduce the concept of an extra layer added to the primary classifier, with the
role of filtering out noisy tweets before they enter the system. We use a One-Class SVM
for this purpose.
Broadly, we show that the methods developed in this thesis achieve promising results in
both topic and sentiment classification in the business performance context, suggesting
that twitter can indeed be a useful source of signals related to different aspects of business performance. We also find that our system can provide valuable insight into unseen test
data. However, more research is needed to be able to extract robust signals for industrial
risk, and there seems to be a considerable promise for further development
Techniques for advanced android malware triage
Mención Internacional en el título de doctorAndroid is the leading operating system in smartphones with a big difference.
Statistics show that 88% of all smartphones sold to end users in
the second quarter of 2018 were phones with the Android OS. Regardless
of the operating systems which are running on smartphones, most of
the functionalities of these devices are offered through applications. There
are currently over 2 million apps only on the official Google store, known
as Google Play. This huge market with billions of users is tempting for
attackers to develop and distribute their malicious apps (or malware).
Mobile malware has raised explosively since 2009. Symantec reported
an increase of 54% in the new mobile malware variants in 2017 as compared
to the previous year. Additionally, more incentive has been provided
for profit-driven malware by the growth of black markets. This rise has
happened for Android malware as well since only 20% of devices are running
the newest major version of Android OS based on Symantec report in
2018. Android continued to be the most targeted platform with the biggest
number of attacks in 2015. After that year, attacks against the Android
platform slowed for the first time as attackers were faced with improved
security architectures though Android is still the main appealing target OS
for attackers. Moreover, advanced types of Android malware are found
which make use of extensive anit-analysis techniques to evade static or
dynamic analysis.
To address the security and privacy concerns of complex Android malware,
this dissertation focuses on three main objectives. First of all, we
propose a light-weight yet efficient method to identify risky Android applications.
Next, we present a precise approach to characterize Android
malware based on their malicious behavior. Finally, we propose an adaptive learning system to address the security concerns of obfuscation in Android
malware.
Identifying potentially dangerous and risky applications is an important
step in Android malware analysis. To this end, we develop a triage system
to rank applications based on their potential risk. Our approach, called TriFlow, relies on static features which are quick to obtain. TriFlow combines
a probabilistic model to predict the existence of information flows with a
metric of how significant a flow is in benign and malicious apps. Based
on this, TriFlow provides a score for each application that can be used to
prioritize analysis. It also provides the analysts with an explanatory report
of the associated risk. Our tool can also be used as a complement with
computationally expensive static and dynamic analysis tools.
Another important step towards Android malware analysis lies in their
accurate characterization. Labeling Android malware is challenging yet
crucially important, as it helps to identify upcoming malware samples and
threats. A key challenge is that different researchers and anti-virus vendors
assign labels using their own criteria, and it is not known to what
extent these labels are aligned with the apps’ real behavior. Based on this,
we propose a new behavioral characterization method for Android apps
based on their extracted information flows. As information flows can be
used to track why and how apps use specific pieces of information, a flowbased
characterization provides a relatively easy-to-interpret summary of
the malware sample’s behavior.
Not all Android malware are easy to analyze due to advanced and easyto-apply anti-analysis techniques that are available nowadays. Obfuscation
is the most common anti-analysis technique that Android malware use to
evade detection. Obfuscation techniques modify an app’s source (or machine)
code in order to make it more difficult to analyze. This is typically
applied to protect intellectual property in benign apps, or to hinder the process
of extracting actionable information in the case of malware. Since
malware analysis often requires considerable resource investment, detecting
the particular obfuscation technique used may contribute to apply the
right analysis tools, thus leading to some savings.
Therefore, we propose AndrODet, a mechanism to detect three popular
types of obfuscation in Android applications, namely identifier renaming, string encryption, and control flow obfuscation. AndrODet leverages online
learning techniques, thus being suitable for resource-limited environments
that need to operate in a continuous manner. We compare our results
with a batch learning algorithm using a dataset of 34,962 apps from both
malware and benign apps. Experimental results show that online learning
approaches are not only able to compete with batch learning methods in
terms of accuracy, but they also save significant amount of time and computational
resources.
Finally, we present a number of open research directions based on the
outcome of this thesis.Android es el sistema operativo líder en teléfonos inteligentes (también
denominados con la palabra inglesa smartphones), con una gran diferencia
con respecto al resto de competidores. Las estadísticas muestran que el
88% de todos los smartphones vendidos a usuarios finales en el segundo
trimestre de 2018 fueron teléfonos con sistema operativo Android. Independientemente
de su sistema operativo, la mayoría de las funcionalidades
de estos dispositivos se ofrecen a través de aplicaciones. Actualmente hay
más de 2 millones de aplicaciones solo en la tienda oficial de Google, conocida
como Google Play. Este enorme mercado con miles de millones de
usuarios es tentador para los atacantes, que buscan distribuir sus aplicaciones
malintencionadas (o malware).
El malware para dispositivos móviles ha aumentado de forma exponencial
desde 2009. Symantec ha detectado un aumento del 54% en las nuevas
variantes de malware para dispositivos móviles en 2017 en comparación
con el año anterior. Además, el crecimiento del mercado negro (es decir,
plataformas no oficiales de descargas de aplicaciones) supone un incentivo
para los programas maliciosos con fines lucrativos. Este aumento también
ha ocurrido en el malware de Android, aprovechando la circunstancia de
que solo el 20% de los dispositivos ejecutan la versión mas reciente del sistema
operativo Android, de acuerdo con el informe de Symantec en 2018.
De hecho, Android ha sido la plataforma que ha centrado los esfuerzos de
los atacantes desde 2015, aunque los ataques decayeron ligeramente tras
ese año debido a las mejoras de seguridad incorporadas en el sistema operativo.
En todo caso, existen formas avanzadas de malware para Android
que hacen uso de técnicas sofisticadas para evadir el análisis estático o
dinámico.
Para abordar los problemas de seguridad y privacidad que causa el malware
en Android, esta Tesis se centra en tres objetivos principales. En
primer lugar, se propone un método ligero y eficiente para identificar aplicaciones
de Android que pueden suponer un riesgo. Por otra parte, se presenta
un mecanismo para la caracterización del malware atendiendo a su
comportamiento. Finalmente, se propone un mecanismo basado en aprendizaje
adaptativo para la detección de algunos tipos de ofuscación que son
empleados habitualmente en las aplicaciones maliciosas.
Identificar aplicaciones potencialmente peligrosas y riesgosas es un
paso importante en el análisis de malware de Android. Con este fin, en
esta Tesis se desarrolla un mecanismo de clasificación (llamado TriFlow)
que ordena las aplicaciones según su riesgo potencial. La aproximación
se basa en características estáticas que se obtienen rápidamente, siendo de
especial interés los flujos de información. Un flujo de información existe
cuando un cierto dato es recibido o producido mediante una cierta función
o llamada al sistema, y atraviesa la lógica de la aplicación hasta que
llega a otra función. Así, TriFlow combina un modelo probabilístico para
predecir la existencia de un flujo con una métrica de lo habitual que es
encontrarlo en aplicaciones benignas y maliciosas. Con ello, TriFlow proporciona
una puntuación para cada aplicación que puede utilizarse para
priorizar su análisis. Al mismo tiempo, proporciona a los analistas un informe
explicativo de las causas que motivan dicha valoración. Así, esta
herramienta se puede utilizar como complemento a otras técnicas de análisis
estático y dinámico que son mucho más costosas desde el punto de vista
computacional.
Otro paso importante hacia el análisis de malware de Android radica
en caracterizar su comportamiento. Etiquetar el malware de Android es
un desafío de crucial importancia, ya que ayuda a identificar las próximas
muestras y amenazas de malware. Una cuestión relevante es que los
diferentes investigadores y proveedores de antivirus asignan etiquetas utilizando
sus propios criterios, de modo no se sabe en qué medida estas etiquetas
están en línea con el comportamiento real de las aplicaciones. Sobre
esta base, en esta Tesis se propone un nuevo método de caracterización de
comportamiento para las aplicaciones de Android en función de sus flujos
de información. Como dichos flujos se pueden usar para estudiar el uso de
cada dato por parte de una aplicación, permiten proporcionar un resumen relativamente sencillo del comportamiento de una determinada muestra de
malware.
A pesar de la utilidad de las técnicas de análisis descritas, no todos los
programas maliciosos de Android son fáciles de analizar debido al uso de
técnicas anti-análisis que están disponibles en la actualidad. Entre ellas, la
ofuscación es la técnica más común que se utiliza en el malware de Android
para evadir la detección. Dicha técnica modifica el código de una
aplicación para que sea más difícil de entender y analizar. Esto se suele
aplicar para proteger la propiedad intelectual en aplicaciones benignas o
para dificultar la obtención de pistas sobre su funcionamiento en el caso
del malware. Dado que el análisis de malware a menudo requiere una inversión
considerable de recursos, detectar la técnica de ofuscación que se
ha utilizado en un caso particular puede contribuir a utilizar herramientas
de análisis adecuadas, contribuyendo así a un cierto ahorro de recursos.
Así, en esta Tesis se propone AndrODet, un mecanismo para detectar tres
tipos populares de ofuscación, a saber, el renombrado de identificadores,
cifrado de cadenas de texto y la modificación del flujo de control de la aplicación.
AndrODet se basa en técnicas de aprendizaje automático en línea
(online machine learning), por lo que es adecuado para entornos con recursos
limitados que necesitan operar de forma continua, sin interrupción.
Para medir su eficacia respecto de las técnicas de aprendizaje automático
tradicionales, se comparan los resultados con un algoritmo de aprendizaje
por lotes (batch learning) utilizando un dataset de 34.962 aplicaciones de
malware y benignas. Los resultados experimentales muestran que el enfoque
de aprendizaje en línea no solo es capaz de competir con el basado
en lotes en términos de precisión, sino que también ahorra una gran cantidad
de tiempo y recursos computacionales.
Tras la exposición de las contribuciones anteriormente mencionadas,
esta Tesis concluye con la identificación de una serie de líneas abiertas de
investigación con el fin de alentar el desarrollo de trabajos futuros en esta
dirección.Omid Mirzaei is a Ph.D. candidate in the Computer Security Lab (COSEC)
at the Department of Computer Science and Engineering of Universidad
Carlos III de Madrid (UC3M). His Ph.D. is funded by the Community
of Madrid and the European Union through the research project CIBERDINE
(Ref. S2013/ICE-3095).Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaPresidente: Gregorio Martínez Pérez.- Secretario: Pedro Peris López.- Vocal: Pablo Picazo Sánche
Smart Sensing Technologies for Personalised Coaching
People living in both developed and developing countries face serious health challenges related to sedentary lifestyles. It is therefore essential to find new ways to improve health so that people can live longer and can age well. With an ever-growing number of smart sensing systems developed and deployed across the globe, experts are primed to help coach people toward healthier behaviors. The increasing accountability associated with app- and device-based behavior tracking not only provides timely and personalized information and support but also gives us an incentive to set goals and to do more. This book presents some of the recent efforts made towards automatic and autonomous identification and coaching of troublesome behaviors to procure lasting, beneficial behavioral changes
Uncertainty Estimation, Explanation and Reduction with Insufficient Data
Human beings have been juggling making smart decisions under uncertainties, where we manage to trade off between swift actions and collecting sufficient evidence. It is naturally expected that a generalized artificial intelligence (GAI) to navigate through uncertainties meanwhile predicting precisely. In this thesis, we aim to propose strategies that underpin machine learning with uncertainties from three perspectives: uncertainty estimation, explanation and reduction. Estimation quantifies the variability in the model inputs and outputs. It can endow us to evaluate the model predictive confidence. Explanation provides a tool to interpret the mechanism of uncertainties and to pinpoint the potentials for uncertainty reduction, which focuses on stabilizing model training, especially when the data is insufficient. We hope that this thesis can motivate related studies on quantifying predictive uncertainties in deep learning. It also aims to raise awareness for other stakeholders in the fields of smart transportation and automated medical diagnosis where data insufficiency induces high uncertainty.
The thesis is dissected into the following sections: Introduction. we justify the necessity to investigate AI uncertainties and clarify the challenges existed in the latest studies, followed by our research objective. Literature review. We break down the the review of the state-of-the-art methods into uncertainty estimation, explanation and reduction. We make comparisons with the related fields encompassing meta learning, anomaly detection, continual learning as well. Uncertainty estimation. We introduce a variational framework, neural process that approximates Gaussian processes to handle uncertainty estimation. Two variants from the neural process families are proposed to enhance neural processes with scalability and continual learning. Uncertainty explanation. We inspect the functional distribution of neural processes to discover the global and local factors that affect the degree of predictive uncertainties. Uncertainty reduction. We validate the proposed uncertainty framework on two scenarios: urban irregular behaviour detection and neurological disorder diagnosis, where the intrinsic data insufficiency undermines the performance of existing deep learning models. Conclusion. We provide promising directions for future works and conclude the thesis
Recommended from our members
Foveated Vision Models for Search and Recognition
Computer vision has made a significant progress in recent years thanks to advancement in neural network architectures and computing power. At the sensory level, the current machine vision systems sample the visual data uniformly to make predictions about the scene. This is in contrast with the human vision system that has high visual acuity only in a small central region, the fovea, and much coarser sampling away from the center. There has been a renewed interest, particularly in the context of active vision for robotics navigation and scene exploration, to develop biologically motivated methods that can leverage such foveated computations. While foveated vision offers computational savings at or near the region of interest, it requires eye movements to scan the scene for effective image understanding. The hypothesis is that methods that can leverage non-uniform sampling of the field of view together with eye-movements will lead to a new class of active vision systems that are optimized computationally for specific tasks of interest.Inspired by the above observations, this research provides, for the first time, a comprehensive study of the human visual search in the constrained setting of person identification in the wild. A novel video database is created that systematically tests how different parts of a person contribute towards eye-movements and person identification. Our study shows that the search errors can dominate the overall recognition accuracy in human subject experiments. This calls for new strategies for integrating eye tracking with foveated image representations. Towards this two specific approaches are investigated further.In the first approach, a deep neural network based method is developed to model eye movements. Using the long-short-term-memory to model the successive fixations. The proposed method outperforms state of the state of the art performance while simplifying the feature extraction procedure. The second approach focuses on the foveated image model that leverages multiple fixations. A convolutional neural network method is proposed that works directly with the foveated input images that achieves competitive recognition rates compared to standard neural networks operating on the same number of input pixels. Overall the thesis investigates the requirements and implementations that could support active foveated vision, and lays down the ground work for future studies in this area