9 research outputs found

    A heterogeneous mobile cloud computing model for hybrid clouds

    Mobile cloud computing is a paradigm that delivers applications to mobile devices by using cloud computing. In this way, mobile cloud computing allows for a rich user experience: since client applications run remotely in the cloud infrastructure, they use fewer resources on the user's mobile devices. In this paper, we present a new mobile cloud computing model in which platforms of volunteer devices provide part of the resources of the cloud, inspired by both the volunteer computing and mobile edge computing paradigms. These platforms may be hierarchical, based on the capabilities of the volunteer devices and the requirements of the services provided by the clouds. We also describe the orchestration between the volunteer platform and public, private or hybrid clouds. As we show, this new model can be an inexpensive solution for different application scenarios, highlighting its benefits in cost savings, elasticity, scalability, load balancing, and efficiency. Moreover, our evaluation shows that the proposed model is a feasible solution for cloud services that have a large number of mobile users. © 2018 Elsevier B.V. All rights reserved. This work has been partially supported by the Spanish MINISTERIO DE ECONOMÍA Y COMPETITIVIDAD under project grant TIN2016-79637-P (TOWARDS UNIFICATION OF HPC AND BIG DATA PARADIGMS).
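    As a rough illustration of the hierarchy described above, the following Python sketch groups volunteer devices into tiers by capability; the device attributes and thresholds are hypothetical assumptions, not values from the paper.

        from dataclasses import dataclass

        @dataclass
        class VolunteerDevice:
            name: str
            cpu_ghz: float  # hypothetical capability metrics
            ram_gb: float

        def assign_tier(dev: VolunteerDevice) -> int:
            """Tier 0 is the most capable; the thresholds are illustrative only."""
            if dev.cpu_ghz >= 2.0 and dev.ram_gb >= 4:
                return 0
            if dev.cpu_ghz >= 1.0 and dev.ram_gb >= 2:
                return 1
            return 2

        devices = [VolunteerDevice("tablet", 2.4, 6.0), VolunteerDevice("phone", 1.2, 2.0)]
        print({d.name: assign_tier(d) for d in devices})  # {'tablet': 0, 'phone': 1}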

    Virtual Organization Clusters: Self-Provisioned Clouds on the Grid

    Virtual Organization Clusters (VOCs) provide a novel architecture for overlaying dedicated cluster systems on existing grid infrastructures. VOCs provide customized, homogeneous execution environments on a per-Virtual-Organization basis, without the cost of physical cluster construction or the overhead of per-job containers. Administrative access and overlay network capabilities are granted to Virtual Organizations (VOs) that choose to implement VOC technology, while the system remains completely transparent to end users and non-participating VOs. Unlike alternative systems that require explicit leases, VOCs are autonomically self-provisioned according to configurable usage policies. As a grid computing architecture, VOCs are designed to be technology-agnostic and are implementable by any combination of software and services that follows the Virtual Organization Cluster Model. As demonstrated through simulation testing and evaluation of an implemented prototype, VOCs are a viable mechanism for increasing end-user job compatibility on grid sites. On existing production grids, where jobs are frequently submitted to a small subset of sites and thus experience high queuing delays relative to average job length, the grid-wide addition of VOCs does not adversely affect mean job sojourn time. By load-balancing jobs among grid sites, VOCs can reduce the total amount of queuing on a grid to a level sufficient to counteract the performance overhead introduced by virtualization.

    A Practical Evaluation of a High-Security Energy-Efficient Gateway for IoT Fog Computing Applications

    Fog computing extends cloud computing to the edge of a network, enabling new Internet of Things (IoT) applications and services, which may involve critical data that require privacy and security. In an IoT fog computing system, three elements can be distinguished: IoT nodes that collect data, the cloud, and interconnected IoT gateways that exchange messages with the IoT nodes and with the cloud. This article focuses on securing IoT gateways, which are assumed to be constrained in terms of computational resources, but are able to offload some processing from the cloud and to reduce the latency of responses to the IoT nodes. However, it is usually taken for granted that IoT gateways have direct access to the electrical grid, which is not always the case: in mission-critical applications like natural disaster relief or environmental monitoring, it is common to deploy IoT nodes and gateways in large areas where electricity comes from solar or wind energy that charges the batteries that power every device. In this article, how to secure IoT gateway communications while minimizing power consumption is analyzed. The throughput and power consumption of Rivest–Shamir–Adleman (RSA) and Elliptic Curve Cryptography (ECC) are considered, since they are widely used but have not been thoroughly analyzed when applied to IoT scenarios. Moreover, the most widespread Transport Layer Security (TLS) cipher suites use RSA as the main public key-exchange algorithm, but the key sizes needed are not practical for most IoT devices and cannot be scaled to high security levels. In contrast, ECC represents a much lighter and more scalable alternative. Thus, RSA and ECC are compared at equivalent security levels, and power consumption and data throughput are measured using a testbed of IoT gateways. The measurements obtained indicate that, in the specific fog computing scenario proposed, ECC is clearly a much better alternative than RSA, obtaining energy consumption reductions of up to 50% and a data throughput that doubles that of RSA in most scenarios. These conclusions are then corroborated by a frame temporal analysis of Ethernet packets. In addition, current data compression algorithms are evaluated, concluding that, when dealing with the small payloads typical of IoT applications, they do not pay off in terms of real data throughput and power consumption. Funding: Galicia, Consellería de Cultura, Educación e Ordenación Universitaria (ED431C 2016-045, ED341D2016/012, ED431G/0); Agencia Estatal de Investigación (España) (TEC2013-47141-C4-1-R, TEC2015-69648-REDC, TEC2016-75067-C4-1-R).
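    To give a flavour of the RSA-versus-ECC comparison reported above, here is a minimal Python benchmark sketch using the cryptography package. The RSA-3072/P-256 pairing is the usual equivalent-security approximation; the timings it prints are illustrative only, not the authors' testbed measurements.

        import time
        from cryptography.hazmat.primitives import hashes
        from cryptography.hazmat.primitives.asymmetric import rsa, ec, padding

        MESSAGE = b"sensor reading: 21.5 C"
        N = 200  # signatures per measurement

        rsa_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
        ec_key = ec.generate_private_key(ec.SECP256R1())

        start = time.perf_counter()
        for _ in range(N):
            rsa_key.sign(MESSAGE, padding.PKCS1v15(), hashes.SHA256())
        rsa_ms = (time.perf_counter() - start) / N * 1000

        start = time.perf_counter()
        for _ in range(N):
            ec_key.sign(MESSAGE, ec.ECDSA(hashes.SHA256()))
        ec_ms = (time.perf_counter() - start) / N * 1000

        print(f"RSA-3072 signing: {rsa_ms:.2f} ms/op, ECDSA P-256 signing: {ec_ms:.2f} ms/op")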

    Model-Driven Development of Domain-Specific Interfaces for Complex Event Processing in Service-Oriented Architectures

    Juan Boubeta-Puig has been honoured with the Extraordinary PhD Award. Nowadays, companies and organizations all around the world need to manage huge amounts of data from heterogeneous sources - such as own and third-party applications, web services, sensors, Internet of Things platforms or social networks - every day in order to conduct the decision-making process. To be successful, a decision-making process requires, among other factors, prompt information regarding the value such data have for the business in question. A thorough analysis of data, as well as early action on critical or relevant situations that threaten an organization, will allow the organization to be positioned above its competitors. Nevertheless, this is a complex process since, among other reasons, the data are heterogeneous - they do not share a common format - and should be processed in real time. In this context, Complex Event Processing (CEP) is a technology that allows the analysis and correlation of large volumes of data with the aim of detecting complex and meaningful events, and of inferring valuable knowledge for end users that helps in the decision-making process. To do this, so-called event patterns are used; these patterns specify the conditions that must be met in order to detect such situations of interest. Despite the great advantages that CEP can bring to a business, it is a substantial challenge for users who are business experts but lack the experience and knowledge needed to use this technology. One of the main problems these users face is precisely the definition of event patterns using particular languages, the so-called Event Processing Languages (EPLs). Although some current software solutions provide graphical tools to address this problem, none of them are user-friendly enough, since they require users to hand-write at least a portion of the code for the pattern definition.
On the other hand, current information systems tend to be based on Service-Oriented Architectures (SOAs), since this type of software architecture allows the development of highly scalable distributed systems as well as their integration with own and third-party systems. Recently, these have been combined with Event-Driven Architectures (EDAs), yielding what are known as Event-Driven Service-Oriented Architectures (ED-SOAs or SOAs 2.0), which enable fully decoupled communication between users, applications and services. Integrating CEP with SOA 2.0 is a requirement for detecting relevant or critical situations in these complex and heterogeneous systems in real time, as well as for executing the appropriate actions. In order to respond to these needs, the research carried out in this thesis has focused on the model-driven development of domain-specific interfaces for CEP in SOAs 2.0, with the aim of making it easier for domain experts to define both the event patterns to be detected and the alerts to be notified in real time. To reach this goal, the following contributions have been supplied: a model-driven approach for CEP in SOA 2.0, a modeling language and a graphical editor for CEP domain definition, a modeling language and a reconfigurable graphical editor for event pattern definition and code generation, as well as a technological solution integrating CEP with SOA 2.0. The approach has been evaluated through the development of two case studies, which confirmed its independence from the application domain. Besides, the proposed languages are independent of the code that implements the event patterns and actions, and the user-friendly, intuitive graphical editors hide all implementation details from end users. This is therefore a novel approach for bringing CEP technology closer to any user, positively impacting the decision-making process.
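    As a toy illustration of what an event pattern expresses, the following Python sketch fires when the average of a hypothetical temperature stream over a 60-second sliding window exceeds a threshold; a real deployment would state such a pattern declaratively in an EPL on a CEP engine rather than hand-code it like this.

        from collections import deque
        from dataclasses import dataclass

        @dataclass
        class TemperatureEvent:
            timestamp: float  # seconds since start
            value: float      # degrees Celsius

        class HighTemperaturePattern:
            """Fires when the average temperature in a sliding window exceeds a threshold."""

            def __init__(self, window_s: float = 60.0, threshold: float = 40.0):
                self.window_s = window_s
                self.threshold = threshold
                self.events = deque()

            def on_event(self, ev: TemperatureEvent) -> bool:
                self.events.append(ev)
                # Evict events that have fallen out of the sliding window.
                while self.events and ev.timestamp - self.events[0].timestamp > self.window_s:
                    self.events.popleft()
                avg = sum(e.value for e in self.events) / len(self.events)
                return avg > self.threshold

        pattern = HighTemperaturePattern()
        for t, v in [(0, 35.0), (30, 42.0), (61, 45.0)]:
            if pattern.on_event(TemperatureEvent(t, v)):
                print(f"alert at t={t}s")  # fires at t=61 (avg of 42.0 and 45.0 > 40)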

    New Secure IoT Architectures, Communication Protocols and User Interaction Technologies for Home Automation, Industrial and Smart Environments

    Programa Oficial de Doutoramento en Tecnoloxías da Información e das Comunicacións en Redes Móbiles (5029V01). Thesis by compendium of publications. The Internet of Things (IoT) presents a communication network where heterogeneous physical devices such as vehicles, homes, urban infrastructures or industrial machinery are interconnected and share data. For these communications to be successful, it is necessary to integrate and embed electronic devices that allow for obtaining environmental information (sensors), performing physical actuations (actuators) and sending and receiving data (network interfaces). This integration of embedded systems poses several challenges. These devices must present very low power consumption: in many cases IoT nodes are powered by batteries or constrained power supplies. Moreover, the great number of devices needed in an IoT network makes power efficiency one of the major concerns of these deployments, due to the cost and environmental impact of the energy consumption. This need for low energy consumption leads to resource-constrained devices, which conflicts with the second major concern of IoT: security and data privacy. There are critical urban and industrial systems, such as traffic management, water supply, maritime control, railway control or high-risk industrial manufacturing systems such as oil refineries, that will obtain great benefits from IoT deployments, but for which non-authorized access can pose severe risks to public safety. On the other hand, both these public systems and the ones deployed in private environments (homes, workplaces, malls) present a risk for the privacy and security of their users. These IoT deployments need advanced security mechanisms, both to prevent access to the devices and to protect the data exchanged by them. As a consequence, two main aspects need to be improved: the energy efficiency of IoT devices and the use of lightweight security mechanisms that can be implemented by these resource-constrained devices while still guaranteeing a fair degree of security. The huge amount of data transmitted by this type of network also presents another challenge. There are big data systems capable of processing large amounts of data, but with IoT the granularity and dispersion of the generated information present a scenario very different from the one existing nowadays. Forecasts anticipate a growth from the 15 billion devices installed in 2015 to more than 75 billion devices in 2025. Moreover, there will be many more services exploiting the data produced by these networks, so the resulting traffic will be even higher. The information must not only be processed in real time; data mining processes will also have to be performed on historical data. The main goal of this Ph.D. thesis is to analyze each of these challenges (energy efficiency, security, data processing and user interaction) and to provide solutions that allow for an adequate adoption of IoT in industrial, domestic and, in general, any scenario that can benefit from the interconnection and flexibility that IoT brings.

    Secure Communication in Disaster Scenarios

    During natural disasters or terrorist attacks, the existing communication infrastructure is often overloaded or fails completely. In these situations, mobile devices can be interconnected using wireless ad hoc and disruption-tolerant networking to set up an emergency communication system for civilians and rescue services. Where available, a connection to cloud services on the Internet can be a valuable aid in crisis and disaster management. However, such communication systems carry serious security risks, since attackers could attempt to steal confidential data, inject forged notifications from emergency services, or carry out denial-of-service (DoS) attacks. This dissertation proposes new approaches for communication in emergency networks of mobile devices, ranging from communication between mobile devices to cloud services on servers on the Internet. These approaches improve the security of device-to-device communication, the security of emergency apps on mobile devices, and the security of server systems for cloud services.

    Classification and Analysis of Computer Network Traffic


    Classifying Tor traffic using character analysis

    Tor is a privacy-preserving network that enables users to browse the Internet anonymously. Although the prospect of such anonymity is welcomed in many quarters, Tor can also be used for malicious purposes, prompting the need to monitor Tor network connections. Most traffic classification methods depend on flow-based features, due to traffic encryption. However, these features can be less reliable due to issues like asymmetric routing, and processing multiple packets can be time-intensive. In light of Tor's sophisticated multilayered payload encryption compared with nonTor encryption, our research explored patterns in the encrypted data of both networks, challenging conventional encryption theory, which assumes that ciphertexts should not be distinguishable from random strings of equal length. Our novel approach leverages machine learning to differentiate Tor from nonTor traffic using only the encrypted payload. We focused on extracting statistical hex-character-based features from the encrypted data. For consistent findings, we drew from two datasets: a public one, which was divided into eight application types for more granular insight, and a private one. Both datasets covered Tor and nonTor traffic. We developed a custom Python script called Charcount to extract the relevant data and features accurately. To verify the robustness of our results, we used both Weka and scikit-learn for classification. In our first line of research, we conducted hex character analysis on the encrypted payloads of both Tor and nonTor traffic using statistical testing. Our investigation revealed a significant differentiation rate between Tor and nonTor traffic of 95.42% for the public dataset and 100% for the private dataset. The second phase of our study aimed to distinguish between Tor and nonTor traffic using machine learning, focusing on encrypted payload features that are independent of length. In our evaluations, the public dataset yielded an average accuracy of 93.56% when classified with the Decision Tree (DT) algorithm in scikit-learn, and 95.65% with the J48 algorithm in Weka. For the private dataset, the accuracies were 95.23% and 97.12%, respectively. Additionally, we found that the combination of WrapperSubsetEval+BestFirst with the J48 classifier both enhanced accuracy and optimized processing efficiency. In conclusion, our study both demonstrates the distinction between Tor and nonTor traffic and achieves efficient classification of both types of traffic using features derived exclusively from a single encrypted payload packet. This work holds significant implications for cybersecurity and points towards further advancements in the field.
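    The feature-extraction idea lends itself to a compact sketch. The following Python fragment, a loose approximation of the described approach rather than the authors' Charcount script, computes length-independent hex character frequencies from a payload and trains a scikit-learn decision tree:

        from collections import Counter

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        HEX_CHARS = "0123456789abcdef"

        def hex_char_features(payload: bytes) -> list:
            """Relative frequency of each hex character in the payload's hex dump
            (relative frequencies keep the features independent of payload length)."""
            hexdump = payload.hex()
            counts = Counter(hexdump)
            total = len(hexdump) or 1
            return [counts.get(c, 0) / total for c in HEX_CHARS]

        def train(payloads, labels):
            """Fit a decision tree on labelled payloads (e.g. 1 = Tor, 0 = nonTor)."""
            X = np.array([hex_char_features(p) for p in payloads])
            X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
            clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
            return clf, clf.score(X_te, y_te)  # classifier and held-out accuracy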

    An exploration of the overlap between open source threat intelligence and active internet background radiation

    Organisations and individuals are facing increasingly persistent threats on the Internet from worms, port scanners, and malicious software (malware). These threats constantly evolve as new attack techniques are discovered. To aid in the detection and prevention of such threats, and to stay ahead of the adversaries conducting the attacks, security specialists are utilising Threat Intelligence (TI) data in their defence strategies. TI data can be obtained from a variety of sources such as private routers, firewall logs, public archives, and public or private network telescopes. However, at the rate and ease with which TI is produced and published, specifically Open Source Threat Intelligence (OSINT), its quality is dropping, resulting in fragmented, context-less and variable data. This research utilised two sets of TI data: a collection of OSINT and active Internet Background Radiation (IBR). The data was collected over a period of 12 months, from 37 publicly available OSINT datasets and five IBR datasets. Through the identification and analysis of data common to the OSINT and IBR datasets, this research gained insight into how effective OSINT is at detecting and potentially reducing ongoing malicious Internet traffic. As part of this research, a minimal framework for the collection, processing/analysis, and distribution of OSINT was developed and tested. The research focused on exploring the areas common to the two datasets, with the intention of creating an enriched, contextualised, and reduced set of malicious source IP addresses that could be published for consumers to use in their own environments. The findings pointed towards a persistent group of IP addresses observed in both datasets over the period under research. Using these persistent IP addresses, the research was able to identify the specific services being targeted. Amongst the traffic from these persistent IP addresses were significant numbers of packets from Mirai-like IoT malware on ports 23/TCP and 2323/TCP, as well as general scanning activity on port 445/TCP.
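    The core of the overlap analysis can be sketched in a few lines of Python; the file names below are hypothetical, and each feed is assumed to have been normalised to one IP address per line:

        from pathlib import Path

        def load_ips(path: str) -> set:
            """One IP address per line; blank lines are ignored."""
            return {line.strip() for line in Path(path).read_text().splitlines() if line.strip()}

        osint_ips = load_ips("osint_feeds.txt")  # hypothetical aggregated OSINT indicators
        ibr_ips = load_ips("ibr_sources.txt")    # hypothetical telescope source IPs

        # IPs seen in both OSINT and telescope traffic are the persistent candidates.
        overlap = osint_ips & ibr_ips
        print(f"{len(overlap)} IPs appear in both datasets "
              f"({len(overlap) / max(len(osint_ips), 1):.1%} of the OSINT set)")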