33 research outputs found
Intelligent Cooperative Adaptive Weight Ranking Policy via dynamic aging based on NB and J48 classifiers
The increased usage of the World Wide Web leads to increased network traffic and creates a bottleneck in internet performance. For most users, access speed, or response time, is the most critical factor when using the internet. Response time can be reduced with web proxy caching, a technique that stores copies of pages between the client and server sides. If a requested page is cached in the proxy, there is no need to access the origin server. But cache size is limited, so cache replacement algorithms are used to remove pages from the cache when it is full. On the other hand, conventional replacement algorithms such as Least Recently Used (LRU), First In First Out (FIFO), Least Frequently Used (LFU), and the Randomised Policy may discard essential pages just before they are needed. Furthermore, conventional algorithms cannot be well optimized, since evicting intelligently requires a decision before a page is replaced. Hence, this paper proposes an integration of the Adaptive Weight Ranking Policy (AWRP) with intelligent classifiers via a dynamic aging factor (NB-AWRP-DA and J48-AWRP-DA). To enhance the classifiers' predictive power before integrating them with AWRP, particle swarm optimization (PSO), an automated wrapper feature selection method, is used to choose the subset of features that are most relevant and most influence classifier prediction accuracy. Experimental results show that NB-AWRP-DA enhances the performance of the web proxy cache across multiple proxy datasets by 4.008%, 4.087%, and 14.022% over LRU, LFU, and FIFO respectively, while J48-AWRP-DA increases HR by 0.483%, 0.563%, and 10.497% over LRU, LFU, and FIFO respectively. Meanwhile, the BHR of NB-AWRP-DA rises by 0.9911%, 1.008%, and 11.5842% over LRU, LFU, and FIFO respectively, and by 0.0204%, 0.0379%, and 10.6136% over LRU, LFU, and FIFO respectively using J48-AWRP-DA.
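The conventional policies the paper compares against are simple to state; a minimal sketch of LRU replacement (illustrative only, not the paper's AWRP variant) using Python's `OrderedDict`:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU replacement policy: evict the least recently used page."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # key -> page, ordered from least to most recently used

    def get(self, key):
        if key not in self.pages:
            return None  # cache miss: the proxy would fetch from the origin server
        self.pages.move_to_end(key)  # mark as most recently used
        return self.pages[key]

    def put(self, key, page):
        if key in self.pages:
            self.pages.move_to_end(key)
        self.pages[key] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page

cache = LRUCache(2)
cache.put("/a", "page A")
cache.put("/b", "page B")
cache.get("/a")            # "/a" becomes most recently used
cache.put("/c", "page C")  # evicts "/b", the least recently used
```

FIFO and LFU differ only in the eviction criterion (insertion order and access count, respectively); the point the paper makes is that none of these criteria anticipates whether a page is about to be requested again.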
Intelligent cooperative web caching policies for media objects based on J48 decision tree and Naive bayes supervised machine learning algorithms in structured peer-to-peer systems
Web caching plays a key role in delivering web items to end users in the World Wide Web (WWW). On the other hand, cache size is a key limitation of web caching. Furthermore, retrieving the same media object from the origin server many times consumes network bandwidth, and full caching of media objects is not a practical solution: because of the cache's limited capacity, it consumes the cache storage while keeping only a few media objects. Moreover, traditional web caching policies such as Least Recently Used (LRU), Least Frequently Used (LFU), and Greedy Dual Size (GDS) suffer from cache pollution (media objects that are stored in the cache but not frequently visited, which negatively affects the performance of web proxy caching). In this work, intelligent cooperative web caching approaches based on the J48 decision tree and Naïve Bayes (NB) supervised machine learning algorithms are presented. The proposed approaches take advantage of structured peer-to-peer systems, in which the contents of peers' caches are shared using a Distributed Hash Table (DHT), in order to enhance the performance of the web caching policy. The performance of the proposed approaches is evaluated by running a trace-driven simulation on a dataset collected from the IRCache network. The results demonstrate that the new proposed policies improve the performance of the traditional web caching policies LRU, LFU, and GDS in terms of Hit Ratio (HR) and Byte Hit Ratio (BHR). Moreover, the results are compared to the most relevant state-of-the-art web proxy caching policies.
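The DHT-based sharing of peer caches can be illustrated with a toy consistent-hashing ring that maps each media object's URL to the peer responsible for it (an illustrative sketch only; the peer names and SHA-1 placement scheme are assumptions, not the paper's design):

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Toy DHT-style placement: each cached object key maps to a peer on a hash ring."""

    def __init__(self, peers):
        # (hash, peer) pairs sorted by position on the ring
        self.ring = sorted((self._h(p), p) for p in peers)

    @staticmethod
    def _h(value):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def peer_for(self, object_key):
        """Return the peer responsible for caching object_key (first peer clockwise)."""
        hashes = [h for h, _ in self.ring]
        idx = bisect(hashes, self._h(object_key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["peer-1", "peer-2", "peer-3"])
owner = ring.peer_for("http://example.org/video.mp4")
```

Any peer can compute `peer_for` locally, which is what lets a structured overlay answer "who has this object?" without flooding the network.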
Computer-language based data prefetching techniques
Data prefetching has long been used as a technique to improve access times to persistent data. It is based on retrieving data records from persistent storage into main memory before the records are needed. Data prefetching has been applied to a wide variety of persistent storage systems, from file systems to Relational Database Management Systems and NoSQL databases, with the aim of reducing access times to the data maintained by the system and thus improving the execution times of the applications using this data.
However, most existing solutions to data prefetching have been based on information that can be retrieved from the storage system itself, whether in the form of heuristics based on the data schema or of data access patterns detected by monitoring access to the system. These approaches have multiple disadvantages in terms of the rigidity of the heuristics they use, the accuracy of the predictions they make, and/or the time they need to make these predictions, a process often performed while the applications are accessing the data, causing considerable overhead.
In light of the above, this thesis proposes two novel approaches to data prefetching based on predictions made by analyzing the instructions and statements of the computer languages used to access persistent data. The proposed approaches take into consideration how the data is accessed by the higher-level applications, make accurate predictions and are performed without causing any additional overhead.
The first of the proposed approaches aims at analyzing instructions of applications written in object-oriented languages in order to prefetch data from Persistent Object Stores. The approach is based on static code analysis that is done prior to the application execution and hence does not add any overhead. It also includes various strategies to deal with cases that require runtime information unavailable prior to the execution of the application. We integrate this analysis approach into an existing Persistent Object Store and run a series of extensive experiments to measure the improvement obtained by prefetching the objects predicted by the approach.
The second approach analyzes statements and historic logs of the declarative query language SPARQL in order to prefetch data from RDF Triplestores. The approach measures two types of similarity between SPARQL queries in order to detect recurring query patterns in the historic logs. Afterwards, it uses the detected patterns to predict subsequent queries and launch them before they are requested, prefetching the data they need. Our evaluation of the proposed approach shows that it makes high-accuracy predictions and can achieve a high cache hit rate when caching the results of the predicted queries.
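The abstract does not specify the two similarity measures; as an illustration of the general idea, one could compare SPARQL queries by the Jaccard similarity of their triple patterns (the crude WHERE-clause extraction below is an assumption made for the sketch, not the thesis's method):

```python
import re

def triple_patterns(sparql):
    """Crudely extract the triple patterns inside the WHERE clause (illustrative only)."""
    body = re.search(r"\{(.*)\}", sparql, re.S)
    if not body:
        return set()
    # Triple patterns in a basic graph pattern are separated by "."
    return {p.strip() for p in body.group(1).split(".") if p.strip()}

def jaccard_similarity(q1, q2):
    """Jaccard index of the two queries' triple-pattern sets."""
    a, b = triple_patterns(q1), triple_patterns(q2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

q1 = "SELECT ?name WHERE { ?p foaf:name ?name . ?p foaf:age ?age }"
q2 = "SELECT ?name WHERE { ?p foaf:name ?name }"
sim = jaccard_similarity(q1, q2)  # the queries share 1 of 2 distinct patterns
```

Queries whose similarity exceeds a threshold could then be clustered into recurring patterns, and an incoming query matched against a cluster would trigger prefetching of the cluster's likely successors.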
Network Threat Detection Using Machine/Deep Learning in SDN-Based Platforms: A Comprehensive Analysis of State-of-the-Art Solutions, Discussion, Challenges, and Future Research Direction
A revolution in network technology has been ushered in by software-defined networking (SDN), which makes it possible to control the network from a central location and provides an overview of the network's security. Despite this, SDN has a single point of failure, which increases the risk of potential threats. Network intrusion detection systems (NIDS) prevent intrusions into a network and preserve its integrity, availability, and confidentiality. Much work has been done on NIDS, but improvements are still needed in reducing false alarms and increasing threat detection accuracy. Recently, advanced approaches such as deep learning (DL) and machine learning (ML) have been implemented in SDN-based NIDS to overcome security issues within a network. In the first part of this survey paper, we offer an introduction to NIDS theory, as well as recent research that has been conducted on the topic. After that, we conduct a thorough analysis of the most recent ML- and DL-based NIDS approaches to ensure reliable identification of potential security risks. Finally, we focus on the opportunities and difficulties that lie ahead for future research on SDN-based ML and DL for NIDS.
On the Generation of Cyber Threat Intelligence: Malware and Network Traffic Analyses
In recent years, malware authors have drastically changed their course on the subject of threat design and implementation. Malware authors, namely hackers or cyber-terrorists, perpetrate new forms of cyber-crime involving ever more innovative hacking techniques. Motivated by financial or political reasons, attackers target computer systems ranging from personal computers to organizations' networks to collect and steal sensitive data, as well as to blackmail, scam people, or scupper IT infrastructures. Accordingly, IT security experts face new challenges, as they need to counter cyber-threats proactively. The challenge takes on the continuous allure of a fight in which cyber-criminals are obsessed with the idea of outsmarting security defenses, so security experts have to elaborate an effective strategy to counter them. The generation of cyber-threat intelligence is of paramount importance, as stated in the following quote: "the field is owned by who owns the intelligence". In this thesis, we address the problem of generating timely and relevant cyber-threat intelligence for the purpose of detection, prevention, and mitigation of cyber-attacks.
To do so, we initiate a research effort that falls into four parts. First, we analyze prominent cyber-crime toolkits to grasp the inner secrets and workings of advanced threats. We dissect prominent malware like the Zeus and Mariposa botnets to uncover the underlying techniques used to build a networked army of infected machines. Second, we investigate cyber-crime infrastructures, where we elaborate on the generation of cyber-threat intelligence for situational awareness. We adopt a graph-theoretic approach to study infrastructures used by malware to perpetrate malicious activities, and we build a scoring mechanism based on a page-ranking algorithm to measure the badness of infrastructure elements, i.e., domains, IPs, domain owners, etc. In addition, we use the min-hashing technique to evaluate the level of sharing among cyber-threat infrastructures over a period of one year. Third, we use machine learning techniques to fingerprint malicious IP traffic; by fingerprinting, we mean detecting malicious network flows and attributing them to malware families. This research effort relies on a ground truth collected from the dynamic analysis of malware samples. Finally, we investigate the generation of cyber-threat intelligence from passive DNS streams. To this end, we design and implement a system that generates anomalies from passive DNS traffic. Due to the tremendous volume of DNS data, we build the system on top of a cluster computing framework, namely Apache Spark [70]. The integrated analytic system has the ability to detect anomalies observed in DNS records that are potentially generated by widespread cyber-threats.
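The min-hashing step estimates how much two infrastructures overlap without comparing their full element sets; a minimal sketch comparing two hypothetical infrastructures (the domains and IPs below are invented for illustration):

```python
import hashlib

def minhash_signature(items, num_hashes=64):
    """MinHash signature: for each seeded hash function, keep the minimum hash over the set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int(hashlib.md5(f"{seed}:{item}".encode()).hexdigest(), 16)
            for item in items
        ))
    return signature

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots approximates the true Jaccard index."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

# Two hypothetical infrastructures sharing a domain and an IP
infra_a = {"evil.example", "198.51.100.7", "bad.example"}
infra_b = {"evil.example", "198.51.100.7", "other.example"}
estimate = estimated_jaccard(minhash_signature(infra_a), minhash_signature(infra_b))
```

Because two sets agree in any one signature slot with probability equal to their Jaccard similarity, the estimate sharpens as `num_hashes` grows, while signatures stay far smaller than the element sets themselves.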
Performance Evaluation of Smart Decision Support Systems on Healthcare
Medical activity requires responsibility not only for clinical knowledge and skill but also for the management of an enormous amount of information related to patient care. It is through proper treatment of information that experts can consistently build a healthy wellness policy. The primary objective in the development of decision support systems (DSSs) is to provide information to specialists when and where it is needed. These systems provide information, models, and data manipulation tools to help experts make better decisions in a variety of situations.
Most of the challenges that smart DSSs face come from the great difficulty of dealing with large volumes of information, which is continuously generated by the most diverse types of devices and equipment and requires substantial computational resources. This situation makes this type of system liable to fail to retrieve information quickly enough for decision making. As a result of this adversity, information quality and the provision of an infrastructure capable of promoting integration and articulation among different health information systems (HIS) become promising research topics in the field of electronic health (e-health), and for this same reason they are addressed in this research. The work described in this thesis is motivated by the need to propose novel approaches to deal with problems inherent to the acquisition, cleaning, integration, and aggregation of data obtained from different sources in e-health environments, as well as their analysis.
To ensure the success of data integration and analysis in e-health environments, it is essential that machine learning (ML) algorithms ensure system reliability. However, in this type of environment, a fully reliable scenario cannot be guaranteed, which makes smart DSSs susceptible to predictive failures that severely compromise overall system performance. Systems can also have their performance compromised by an overload of information beyond what they can support.
To solve some of these problems, this thesis presents several proposals and studies on the impact of ML algorithms on the monitoring and management of hypertensive disorders related to high-risk pregnancies. The primary goal of the proposals presented in this thesis is to improve the overall performance of health information systems. In particular, ML-based methods are exploited to improve prediction accuracy and optimize the use of monitoring device resources. It was demonstrated that the use of this type of strategy and methodology contributes to a significant increase in the performance of smart DSSs, not only in terms of precision but also in reducing the computational cost of the classification process.
The observed results seek to contribute to advancing the state of the art in AI-based methods and strategies that aim to overcome some of the challenges that emerge from the integration and performance of smart DSSs. With the use of AI-based algorithms, it is possible to quickly and automatically analyze a larger volume of complex data and focus on more accurate results, providing high-value predictions for better decision making in real time and without
human intervention.
Unified processing framework of high-dimensional and overly imbalanced chemical datasets for virtual screening.
Virtual screening in drug discovery involves processing large datasets of unknown molecules in order to find those likely to have the desired effect on a biological target, typically a protein receptor or an enzyme. Molecules are thereby classified as active or inactive in relation to the target. Misclassification of molecules in settings such as drug discovery and medical diagnosis is costly, in both time and money. In the process of discovering a drug, it is mainly the inactive molecules classified as active towards the biological target, i.e., false positives, that cause delays in progress and high late-stage attrition. However, despite the pool of techniques available, selecting the suitable approach in each situation is still a major challenge. This PhD thesis develops a pioneering framework that enables the analysis of the virtual screening of chemical compound datasets across a wide range of settings in a unified fashion. The proposed method provides a better understanding of the dynamics of innovatively combining data processing and classification methods in order to screen massive, potentially high-dimensional, and overly imbalanced datasets more efficiently.
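One common baseline for coping with overly imbalanced screening data, shown here only as an illustrative technique rather than the thesis's framework, is random undersampling of the majority (inactive) class before training a classifier:

```python
import random

def undersample(samples, labels, seed=0):
    """Randomly undersample majority classes so every class has the minority class's size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    minority_size = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, minority_size):  # keep a random subset of each class
            balanced.append((x, y))
    rng.shuffle(balanced)
    return balanced

# Toy dataset: 1 active molecule for every 9 inactive ones (feature vectors are made up)
features = [[i] for i in range(100)]
labels = ["active" if i % 10 == 0 else "inactive" for i in range(100)]
balanced = undersample(features, labels)
```

Undersampling trades information (it discards inactives) for a balanced training set; the thesis's point is precisely that choosing among such processing options for a given dataset is itself a hard problem.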
A semantic sensor web framework for proactive environmental monitoring and control.
Doctor of Philosophy in Computer Science, University of KwaZulu-Natal, Westville, 2017.
Observing and monitoring the natural and built environments is crucial for maintaining and preserving human life. Environmental monitoring applications typically incorporate some sensor technology to continually observe specific features of interest in the physical environment and transmit the data emanating from these sensors to a computing system for analysis. Semantic Sensor Web technology supports semantic enrichment of sensor data and provides expressive analytic techniques for data fusion, situation detection, and situation analysis. Despite the promising successes of Semantic Sensor Web technology, current Semantic Sensor Web frameworks are typically focused on developing applications for detecting and reacting to situations detected from current or past observations. While these reactive applications provide a quick response to detected situations to minimize adverse effects, they are limited when it comes to anticipating future adverse situations and determining proactive control actions to prevent or mitigate them. Most current Semantic Sensor Web frameworks lack two essential mechanisms required to achieve proactive control, namely, mechanisms for anticipating the future and coherent mechanisms for consistent decision processing and planning.
Designing and developing proactive monitoring and control Semantic Sensor Web applications is challenging. It requires incorporating and integrating different techniques for supporting situation detection, situation prediction, decision making, and planning in a coherent framework. This research proposes a coherent Semantic Sensor Web framework for proactive monitoring and control. It incorporates an ontology to facilitate situation detection from streaming sensor observations, statistical machine learning for situation prediction, and Markov Decision Processes for decision making and planning. The efficacy and use of the framework are evaluated through the development of two prototype applications. The first is for proactive monitoring and control of indoor air quality to avoid poor air quality situations. The second is for proactive monitoring and control of electricity usage in blocks of residential houses to prevent strain on the national grid. These applications show the effectiveness of the proposed framework for developing Semantic Sensor Web applications that proactively avert unwanted environmental situations before they occur.
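The Markov Decision Processes used for decision making and planning can be illustrated with a minimal value-iteration sketch over a toy indoor-air-quality MDP (the states, actions, transition probabilities, and costs below are invented for illustration, not taken from the framework):

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-6):
    """Compute V(s) = max_a sum_s' P(s'|s,a) * [R(s,a,s') + gamma * V(s')]."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (reward(s, a, s2) + gamma * V[s2])
                    for s2, p in transition(s, a).items())
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # converged: Bellman updates no longer change the values
            return V

# Toy MDP: keep indoor air "good"; ventilating is costly but restores good air.
states = ["good", "poor"]
actions = ["wait", "ventilate"]

def transition(s, a):
    if a == "ventilate":
        return {"good": 1.0}          # ventilating always restores good air
    return {"good": 0.7, "poor": 0.3} if s == "good" else {"poor": 1.0}

def reward(s, a, s2):
    occupant_benefit = 1.0 if s == "good" else 0.0
    return occupant_benefit - (0.2 if a == "ventilate" else 0.0)

V = value_iteration(states, actions, transition, reward)
```

The resulting values (and the actions that achieve them) form exactly the kind of policy a proactive controller can execute: here the optimal choice is to ventilate even while the air is still good, averting the "poor" situation before it occurs.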