Flashpoint: A Low-latency Serverless Platform for Deep Learning Inference Serving
Recent breakthroughs in Deep Learning (DL) have led to high demand for executing inferences in interactive services such as ChatGPT and GitHub Copilot. However, these interactive services require low-latency inferences, which can only be met with GPUs, resulting in exorbitant operating costs. For instance, ChatGPT reportedly requires millions of U.S. dollars in cloud GPUs to serve its 1+ million users. A potential solution to meet low-latency requirements with acceptable costs is to use serverless platforms. These platforms automatically scale resources to meet user demands. However, current serverless systems have long cold starts, which worsen with larger DL models and lead to poor performance during bursts of requests. Meanwhile, the demand for ever-larger DL models makes it more challenging to deliver an acceptable user experience cost-effectively. While current systems over-provision GPUs to address this issue, they incur high costs in idle resources, which greatly reduces the benefit of using a serverless platform.
In this thesis, we introduce Flashpoint, a GPU-based serverless platform that serves DL inferences with low latencies. Flashpoint achieves this by reducing cold start durations, especially for large DL models, making serverless computing feasible for latency-sensitive DL workloads. To reduce cold start durations, Flashpoint reduces download times by sourcing the DL model data from within the compute cluster rather than slow cloud storage. Additionally, Flashpoint minimizes in-cluster network congestion from redundant packet transfers of the same DL model to multiple machines with multicasting. Finally, Flashpoint also reduces cold start durations by automatically partitioning models and deploying them in parallel on multiple machines. The reduced cold start durations achieved by Flashpoint enable the platform to scale resource allocations elastically and complete requests with low latencies without over-provisioning expensive GPU resources.
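To make the parallel-deployment idea concrete, here is a minimal sketch (ours; peer names, shard counts, and the fetch function are hypothetical, and Flashpoint's real mechanism also involves in-cluster multicast) of fetching model partitions concurrently from several in-cluster peers:

```python
# Minimal sketch of partitioning a model and fetching the parts from several
# in-cluster peers in parallel. All names and data here are placeholders.
import concurrent.futures as cf

def fetch_part(peer: str, part: int) -> bytes:
    # Placeholder: a real system would stream a weight shard from this peer.
    return f"{peer}:part{part}".encode()

def parallel_cold_start(peers: list[str], n_parts: int) -> bytes:
    """Download n_parts model shards concurrently, assigned round-robin to peers."""
    with cf.ThreadPoolExecutor(max_workers=len(peers)) as pool:
        futures = [pool.submit(fetch_part, peers[i % len(peers)], i)
                   for i in range(n_parts)]
        # Join shards in order; cold-start time is bounded by the slowest shard,
        # not by a single sequential download from cloud storage.
        return b"".join(f.result() for f in futures)

model_bytes = parallel_cold_start(["node-a", "node-b", "node-c"], n_parts=6)
```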
We perform large-scale data center simulations that were parameterized with measurements from our prototype implementations. We evaluate the system using six state-of-the-art DL models ranging from 499 MB to 11 GB in size. We also measure the performance of the system on representative real-world traces from Twitter and Microsoft Azure. Our results in the full-scale simulations show that Flashpoint achieves an arithmetic mean of 93.51% shorter average cold start durations, leading to 75.42% and 66.90% respective reductions in average and 99th percentile end-to-end request latencies across the DL models with the same amount of resources. These results show that Flashpoint boosts the performance of serving DL inferences on a serverless platform without increasing costs.
A Cybersecurity Review of the Healthcare Industry
Background: Cybersecurity is not a new concept. It has been a field of discussion and research since the 1960s. Although security defense mechanisms have evolved, attackers' capabilities have grown at an equal or greater pace. Proof of this is the precarious cybersecurity posture of many companies, which has led to a rise in ransomware attacks and to the establishment of large criminal organizations dedicated to cybercrime. This situation demonstrates the need for progress and investment in cybersecurity across many sectors, and it is especially relevant to the protection of critical infrastructures. Critical infrastructures are those strategic infrastructures whose operation is indispensable and admits no alternative, so that their disruption or destruction would severely impact essential services. Healthcare services and infrastructures fall within this category. These infrastructures provide a service whose interruption carries grave consequences, including the loss of human lives. A cyberattack can affect healthcare services and bring them to a total or partial standstill, as recent incidents have shown, in some cases even leading to the loss of human lives. Moreover, such services hold a wealth of highly sensitive personal information. Medical data command a high price on illegal markets and are therefore the target of attacks aimed at stealing them. It should also be noted that, like other sectors, healthcare services are currently undergoing digitalization. This evolution has largely neglected cybersecurity, contributing to the growth and severity of the attacks mentioned above. - Methodology and research: The work presented in this thesis follows an experimental and deductive method. The research has focused on assessing the state of cybersecurity in healthcare infrastructures and on proposing improvements and mechanisms for detecting cyberattacks. The three scientific publications included in this thesis aim to evaluate current problems in healthcare infrastructures and systems and to offer solutions. The first publication, 'Mobile malware detection using machine learning techniques', focused on developing new threat-detection techniques based on artificial intelligence and machine learning. This research produced a method for detecting potentially unwanted and malicious applications in Android mobile environments. Its design and construction took the specific needs of healthcare environments into account, aiming for a simple and viable deployment suited to these centers, and obtained satisfactory results. The second publication, 'Interconnection Between Darknets', sought to identify and detect the theft and sale of medical data on darknets. This research led to the discovery and verification of the interconnection between different darknets. Searching for and analyzing information in these networks showed how different networks share information and references with one another.
Analyzing one darknet thus requires analyzing others in order to obtain a more complete picture of the first. Finally, the last publication, 'Security and privacy issues of data-over-sound technologies used in IoT healthcare devices', investigated and evaluated the security of medical IoT (Internet of Things) devices. For this research, a medical device was acquired: a portable electrocardiograph currently in use in several hospitals. Tests on this device uncovered multiple cybersecurity flaws. These findings exposed the lack of mandatory cybersecurity certifications and reviews for healthcare products currently on the market. Unfortunately, the lack of research funding did not allow the acquisition of additional medical devices for further cybersecurity evaluation. - Conclusions: The work and research described above led to the following conclusions. Given the need for cybersecurity mechanisms in healthcare infrastructures, their particular design and operation must be taken into account. Cybersecurity tests and mechanisms must be applicable in real environments. Unfortunately, healthcare infrastructures today contain technological systems that are impossible to update or modify. Many treatment and diagnostic machines run proprietary software and operating systems to which administrators and staff have no access. Given this situation, measures must be developed that can be applied in this ecosystem and that, as far as possible, reduce and mitigate the risk posed by these systems. This conclusion is tied to the lack of security in medical devices. Most medical devices have not followed a secure design process and have not been subjected to security testing by their manufacturers, since this represents a direct cost in product development. The only solution here is legislation that forces manufacturers to meet security standards. And although regulatory progress has been made, replacing insecure devices will take years or decades. The impossibility of updating devices, or flaws rooted in their hardware, makes it impossible to fix every security flaw that is discovered, forcing the replacement of the device once a satisfactory alternative exists in cybersecurity terms. For this reason, new cybersecurity mechanisms must be designed that can be applied today and can mitigate these risks during this transition period. Finally, regarding data theft: although the preliminary investigations carried out in this thesis did not yield significant discoveries on the theft and sale of data, darknets, and the Tor network in particular, have now become a key element of the Ransomware as a Business (RaaB) model, hosting extortion sites and contact points for these groups.
Optimizing Flow Routing Using Network Performance Analysis
Relevant conferences were attended, at which this work was presented, and several papers were published in the course of this project:
• Muna Al-Saadi, Bogdan V Ghita, Stavros Shiaeles, Panagiotis Sarigiannidis. A novel approach for performance-based clustering and management of network traffic flows, IWCMC, ©2019 IEEE.
• M. Al-Saadi, A. Khan, V. Kelefouras, D. J. Walker, and B. Al-Saadi: Unsupervised Machine Learning-Based Elephant and Mice Flow Identification, Computing Conference 2021.
• M. Al-Saadi, A. Khan, V. Kelefouras, D. J. Walker, and B. Al-Saadi: SDN-Based Routing Framework for Elephant and Mice Flows Using Unsupervised Machine Learning, Network, 3(1), pp. 218-238, 2023.

The main task of a network is to hold and transfer data between its nodes. To achieve this task, the network needs to find the optimal route for the data to travel by employing a particular routing system. This system examines each possible path for the data, chooses a suitable one, and transmits the data packets to where they need to go as fast as possible. It also contributes to enhancing network performance, as an optimal routing algorithm helps the network run efficiently. The clear performance advantage provided by routing procedures is faster data access: for example, the routing algorithm makes a decision that determines the best route based on where the data is stored and on the destination device that is requesting it. On the other hand, a network can handle many types of traffic simultaneously, but it cannot exceed its allowed bandwidth, the maximum data rate that the network can transmit. The overloading problem is real and persists; to avoid it, the network chooses routes based on the available bandwidth. One serious problem in a network is link congestion and uneven load caused by elephant flows. When elephant flows are forwarded, network links become congested with data packets, causing transmission collisions, network congestion, and transmission delay. Consequently, not enough bandwidth is left for mice flows, which then suffer transmission delays.
Traffic engineering (TE) is a network application concerned with measuring and managing network traffic and with designing feasible routing mechanisms that guide the network's traffic so as to improve the utilization of network resources. The main function of traffic engineering is finding an appropriate route that meets the bandwidth requirements of the network, consequently optimizing network performance [1]. Routing optimization plays a key role in traffic engineering by finding efficient routes that achieve the desired network performance [2]. Furthermore, routing optimization can be considered one of the primary goals in the field of networking. In particular, this goal is directly related to traffic engineering, as it rests on one central idea: that traffic is routed according to accurate traffic requirements [3]. Traffic engineering can therefore be seen as one application of routing optimization; routing can also be optimized based on other factors (not just traffic requirements). In addition, these traffic requirements vary depending on whether the analyzed dataset concerns data or traffic control. In this regard, the logically centralized view of the Software Defined Networking (SDN) controller facilitates many aspects compared to traditional routing. The main challenge in all network types is performance optimization, but the situation differs in SDN because the technique changes from a distributed approach to a centralized one. Characteristics of SDN such as centralized control and programmability make it possible to perform routing not only in the traditional distributed manner but also in a centralized manner. The first advantage of centralized routing using SDN is the existence of a path for exchanging information between the controller and the infrastructure devices: since the controller has information about the entire network, flexible routing can be achieved. The second advantage relates to dynamic control of routing, owing to the capability of each device to change its configuration based on the controller's commands [4].
This thesis begins with a wide-ranging review of the importance of network performance analysis, its role in understanding network behavior, and how it contributes to improving network performance. Furthermore, it surveys existing solutions for network performance optimization using machine learning (ML) techniques in traditional networks and in SDN environments. In addition, it highlights recent and ongoing studies of the problem of unfair use of network resources by a particular flow type (elephant flows) and the possible solutions to this problem. Existing solutions are predominantly flow-routing-based and do not consider the relationship between network performance analysis and flow characterization, or how to take advantage of it to optimize flow routing by finding a suitable path for each type of flow. Therefore, attention is given to finding a method that can describe flows based on network performance analysis, to utilizing this method to manage network performance efficiently, and to identifying possible integration with traffic control in SDN. To this end, the characterization of network flows is identified as a mechanism that may give insight into the diversity of flow features based on performance metrics and enable traffic engineering enhancements in an SDN environment. Two different feature sets with respect to network performance metrics are employed to characterize network traffic. Unsupervised machine learning techniques, including Principal Component Analysis (PCA) and k-means cluster analysis, are applied to derive a performance-based traffic clustering model. Afterward, a thresholding-based flow identification paradigm is built using pre-defined parameters and thresholds. Finally, the resulting data clusters are integrated within a unified SDN architectural solution, which improves network management by finding the best routing for each type of flow, and is evaluated against a number of traffic data sources in different performance experiments. The framework's performance was validated by comparing the SDN-Ryu controller with the proposed SDN external application on three factors (throughput, bandwidth, and data transfer rate) across two experiments. Furthermore, the proposed method was validated on different Data Centre Network (DCN) topologies to demonstrate the effectiveness of the network traffic management solution. The overall validation metrics show real gains: the results show that 70% of the time the framework delivers high performance across different flows. The proposed SDN traffic-engineering routing paradigm therefore dynamically provisions network resources among different flow types.
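As a loose sketch of the clustering and thresholding steps in such a pipeline (the feature names, synthetic data, and elephant-flow threshold below are our assumptions, not the thesis's), PCA followed by k-means plus a byte-count threshold might look like:

```python
# Illustrative sketch: performance-based clustering of flows with PCA + k-means,
# then a simple byte-count threshold to separate elephant from mice flows.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic per-flow features: [bytes, duration_s, mean_rate_bps, pkt_count]
flows = np.vstack([
    rng.normal([5e4, 2, 2e5, 50], [2e4, 1, 5e4, 20], size=(90, 4)),     # mice-like
    rng.normal([5e7, 60, 7e6, 4e4], [1e7, 20, 2e6, 1e4], size=(10, 4)),  # elephant-like
])

Xs = StandardScaler().fit_transform(flows)
X2 = PCA(n_components=2).fit_transform(Xs)        # project onto 2 principal components
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X2)

ELEPHANT_BYTES = 1e6                               # assumed pre-defined threshold
is_elephant = flows[:, 0] > ELEPHANT_BYTES         # thresholding-based identification
print(np.bincount(labels), int(is_elephant.sum()), "elephant flows")
```

In an SDN setting, the resulting labels would drive per-flow routing decisions, e.g. steering elephant flows onto high-capacity paths while keeping latency-sensitive mice flows on lightly loaded links.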
Strengthening Privacy and Cybersecurity through Anonymization and Big Data
The abstract is in the attachment.
Timely Classification of Encrypted or Protocol-Obfuscated Internet Traffic Using Statistical Methods
Internet traffic classification aims to identify the type of application or protocol that generated
a particular packet or stream of packets on the network. Through traffic classification,
Internet Service Providers (ISPs), governments, and network administrators can
access basic functions and several solutions, including network management, advanced
network monitoring, network auditing, and anomaly detection. Traffic classification is
essential as it ensures the Quality of Service (QoS) of the network, as well as allowing
efficient resource planning.
With the increase of encrypted or protocol-obfuscated traffic on the Internet and of multilayer data encapsulation, some classical classification methods have lost the interest of the scientific community. The limitations of traditional classification methods based on port numbers and payload inspection in classifying encrypted or obfuscated Internet traffic have led to significant research efforts focused on Machine Learning (ML) based classification approaches using statistical features from the transport layer. In an attempt to increase classification performance, Machine Learning strategies have gained interest from the scientific community and have shown promise for the future of traffic classification, especially for recognizing encrypted traffic.
However, the ML approach also has its own limitations, as some of these methods have high computational resource consumption, which limits their application when classifying large traffic volumes or real-time flows. The limitations of ML application have led to the investigation of alternative approaches, including feature-based procedures and statistical methods. In this sense, statistical analysis methods, such as distances and divergences, have been used to classify traffic in large flows and in real-time.
The main objective of statistical distance is to differentiate flows and find a pattern in
traffic characteristics through statistical properties, which enable classification. Divergences
are functional expressions often related to information theory, which measure the
degree of discrepancy between any two distributions.
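As a rough illustration (our sketch, not the thesis implementation), the following computes the distances and divergences used in this thesis between two normalized packet-size histograms; a nearest-reference classifier of this family assigns a flow to the class whose histogram lies at the smallest distance:

```python
# Distances and divergences between two discrete distributions, e.g. normalized
# packet-size histograms of two flows. eps guards against log(0) / division by 0.
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q)."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the midpoint."""
    m = (np.asarray(p) + np.asarray(q)) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def bc(p, q):
    """Bhattacharyya coefficient, the overlap of the two distributions."""
    return float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))

def hellinger(p, q):
    return float(np.sqrt(max(0.0, 1.0 - bc(p, q))))

def bhattacharyya(p, q, eps=1e-12):
    return float(-np.log(bc(p, q) + eps))

def wootters(p, q):
    # Wootters' statistical distance: arccos of the Bhattacharyya coefficient.
    return float(np.arccos(np.clip(bc(p, q), -1.0, 1.0)))

def euclidean(p, q):
    return float(np.linalg.norm(np.asarray(p) - np.asarray(q)))

# Toy usage: compare a known class's packet-size histogram with a new flow's.
bins = np.linspace(0, 1500, 16)  # packet-size bin edges in bytes
known = np.histogram([64, 64, 128, 1500, 1500], bins=bins)[0].astype(float)
new = np.histogram([64, 128, 128, 1400, 1500], bins=bins)[0].astype(float)
known, new = known / known.sum(), new / new.sum()
print(hellinger(known, new), js(known, new))
```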
This thesis focuses on proposing a new methodological approach to classify encrypted
or obfuscated Internet traffic based on statistical methods that enable the evaluation of
network traffic classification performance, including the use of computational resources
in terms of CPU and memory. A set of traffic classifiers based on the Kullback-Leibler and Jensen-Shannon divergences and on the Euclidean, Hellinger, Bhattacharyya, and Wootters distances was proposed. The following are the four main contributions to the advancement of scientific knowledge reported in this thesis.
First, an extensive literature review on the classification of encrypted and obfuscated Internet traffic was conducted. The results suggest that port-based and payload-based methods are becoming obsolete due to the increasing use of traffic encryption and multilayer data encapsulation. ML-based methods are also becoming limited due to their computational complexity. As an alternative, Support Vector Machine (SVM), which is also an ML method, and the Kolmogorov-Smirnov and Chi-squared tests can be used as references for statistical classification. In parallel, the possibility of using statistical methods for Internet traffic classification has emerged in the literature, with the potential of good results in classification without the need for large computational resources. The potential statistical methods are Euclidean Distance, Hellinger Distance, Bhattacharyya Distance, and Wootters Distance, as well as the Kullback-Leibler (KL) and Jensen-Shannon divergences.
Second, we present a proposal and implementation of a classifier based on SVM for P2P multimedia traffic, comparing the results with the Kolmogorov-Smirnov (KS) and Chi-square tests. The results suggest that SVM classification with a Linear kernel leads to better classification performance than the KS and Chi-square tests, depending on the value assigned to the Self C parameter. The SVM method with a Linear kernel and suitable values for the Self C parameter may be a good choice to identify encrypted P2P multimedia traffic on the Internet.
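A minimal sketch of this kind of classifier, assuming the "Self C" parameter corresponds to the usual SVM regularization constant C and using placeholder per-flow features and labels, might look like:

```python
# Linear-kernel SVM over per-flow statistical features, with a grid search over
# C (assumed to be what the thesis calls the "Self C" parameter). Data is synthetic.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))             # 200 flows x 4 statistical features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in for P2P vs. non-P2P labels

model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
search = GridSearchCV(model, {"svc__C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```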
Third, we present a proposal and implementation of two classifiers based on KL Divergence and Euclidean Distance, which are compared to SVM with a Linear kernel configured with the standard Self C parameter; the SVM shows a reduced ability to classify flows based solely on packet sizes compared to the KL and Euclidean Distance methods. The KL and Euclidean methods were able to classify all tested applications, particularly streaming and P2P, which in almost all cases they identified efficiently and with high accuracy, with reduced consumption of computational resources. Based on the obtained results, it can be concluded that the KL and Euclidean Distance methods are an alternative to SVM, as these statistical approaches can operate in real-time and do not require retraining every time a new type of traffic emerges.
Fourth, we present a proposal and implementation of a set of classifiers for encrypted Internet traffic, based on Jensen-Shannon Divergence and the Hellinger, Bhattacharyya, and Wootters Distances, with their respective results compared to those obtained with methods based on Euclidean Distance, KL, KS, and Chi-square. Additionally, we present a comparative qualitative analysis of the tested methods based on Kappa values and Receiver Operating Characteristic (ROC) curves. The results suggest average accuracy values above 90% for all statistical methods, classified as "almost perfect reliability" in terms of Kappa values, with the exception of KS. This result indicates that these methods are viable options to classify encrypted Internet traffic, especially Hellinger Distance, which showed the best Kappa values compared to the other classifiers. We conclude that the considered statistical methods can be accurate and cost-effective in terms of computational resource consumption to classify network traffic.

Our approach was based on the classification of Internet network traffic, focusing on statistical
distances and divergences. We have shown that it is possible to classify and obtain
good results with statistical methods, balancing classification performance and the
use of computational resources in terms of CPU and memory. The validation of the proposal
supports the argument of this thesis, which proposes the implementation of statistical
methods as a viable alternative to Internet traffic classification compared to methods
based on port numbers, payload inspection, and ML.

Thesis prepared at Instituto de Telecomunicações, Delegação
da Covilhã and at the Department
of Computer Science of the University of Beira Interior, and submitted to the
University of Beira Interior for discussion in public session to obtain the Ph.D. Degree in
Computer Science and Engineering.
This work has been funded by Portuguese FCT/MCTES through national funds and, when applicable, co-funded by EU funds under the project UIDB/50008/2020, and by operation Centro-01-0145-FEDER-000019 - C4 - Centro de Competências em Cloud Computing, co-funded by the European Regional Development Fund (ERDF/FEDER) through the Programa Operacional Regional do Centro (Centro 2020). This work has also been funded by CAPES (Brazilian Federal Agency for Support and Evaluation of Graduate Education) within the Ministry of Education of Brazil under a scholarship supported by the International Cooperation Program CAPES/COFECUB Project 9090134/2013 at the University of Beira Interior.
Toward Dynamic Social-Aware Networking Beyond Fifth Generation
The rise of the intelligent information world presents significant challenges for the telecommunication industry in meeting the service-level requirements of future applications and incorporating societal and behavioral awareness into the Internet of Things (IoT) objects. Social Digital Twins (SDTs), or Digital Twins augmented with social capabilities, have the potential to revolutionize digital transformation and meet the connectivity, computing, and storage needs of IoT devices in dynamic Fifth-Generation (5G) and Beyond Fifth-Generation (B5G) networks.
This research focuses on enabling dynamic social-aware B5G networking. The main contributions of this work include: (i) the design of a reference architecture for the orchestration of SDTs at the network edge to accelerate the service discovery procedure across the Social Internet of Things (SIoT); (ii) a methodology to evaluate the highly dynamic system performance, jointly considering communication and computing resources; (iii) a set of practical conclusions and outcomes helpful in designing future digital twin-enabled B5G networks. Specifically, we propose an orchestration for SDTs and an SIoT-Edge framework aligned with the Multi-access Edge Computing (MEC) architecture ratified by the European Telecommunications Standards Institute (ETSI). We formulate the optimal placement of SDTs as a Quadratic Assignment Problem (QAP) and propose a graph-based approximation scheme considering the different types of IoT devices, their social features, mobility patterns, and the limited computing resources of edge servers. We also study the appropriate intervals for re-optimizing the SDT deployment at the network edge. The results demonstrate that accounting for social features in SDT placement offers considerable improvements in the SIoT browsing procedure. Moreover, recent advancements in wireless communications, edge computing, and intelligent device technologies are expected to promote the growth of SIoT with pervasive sensing and computing capabilities, ensuring seamless connections among SIoT objects.
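For orientation, a placement problem of this kind can be written as a generic QAP; the notation below is ours and only illustrates the style of formulation the abstract names, not the thesis's exact model:

```latex
% Generic QAP form of SDT placement (illustrative notation):
% x_{ik} = 1 iff SDT i is placed on edge server k,
% f_{ij} = social/communication affinity between SDTs i and j,
% d_{kl} = network distance between servers k and l,
% c_k    = computing capacity of server k, r_i = resource demand of SDT i.
\begin{aligned}
\min_{x \in \{0,1\}^{n \times m}} \quad
  & \sum_{i,j} \sum_{k,l} f_{ij}\, d_{kl}\, x_{ik}\, x_{jl} \\
\text{s.t.} \quad
  & \sum_{k} x_{ik} = 1 \quad \forall i, \qquad
    \sum_{i} r_i\, x_{ik} \le c_k \quad \forall k.
\end{aligned}
```

QAP is NP-hard in general, which is why graph-based approximation schemes of the kind proposed here are needed for online re-optimization at the edge.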
We then offer a performance evaluation methodology for eXtended Reality (XR) services in edge-assisted wireless networks and propose fluid approximations to characterize the XR content evolution. The approach captures the time and space dynamics of the content distribution process during its transient phase, including time-varying loads, which are affected by arrival, transition, and departure processes. We examine the effects of XR user mobility on both communication and computing patterns. The results demonstrate that the communication and computing planes are the key barriers to meeting the requirement for real-time transmissions. Furthermore, due to the trend toward immersive, interactive, and contextualized experiences, new use cases affect user mobility patterns and, therefore, system performance.
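As a minimal, hedged illustration of what a fluid approximation of a time-varying load looks like in the simplest single-class case (a generic textbook form, not necessarily the model used in this thesis), the expected content load x(t) obeys a linear ODE with a closed-form transient:

```latex
% Generic fluid limit for a time-varying load x(t):
% \lambda(t) = arrival rate of XR content requests, \mu = per-item departure rate.
\frac{dx(t)}{dt} = \lambda(t) - \mu\, x(t),
\qquad
x(t) = e^{-\mu t}\, x(0) + \int_0^t e^{-\mu (t-s)}\, \lambda(s)\, ds .
```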
Flames: Benchmarking Value Alignment of Chinese Large Language Models
The widespread adoption of large language models (LLMs) across various
regions underscores the urgent need to evaluate their alignment with human
values. Current benchmarks, however, fall short of effectively uncovering
safety vulnerabilities in LLMs. Despite numerous models achieving high scores
and 'topping the chart' in these evaluations, there is still a significant gap
in LLMs' deeper alignment with human values and achieving genuine harmlessness.
To this end, this paper proposes the first highly adversarial benchmark named
Flames, consisting of 2,251 manually crafted prompts, ~18.7K model responses
with fine-grained annotations, and a specified scorer. Our framework
encompasses both common harmlessness principles, such as fairness, safety,
legality, and data protection, and a unique morality dimension that integrates
specific Chinese values such as harmony. Based on the framework, we carefully
design adversarial prompts that incorporate complex scenarios and jailbreaking
methods, mostly with implicit malice. By prompting mainstream LLMs with such
adversarially constructed prompts, we obtain model responses, which are then
rigorously annotated for evaluation. Our findings indicate that all the evaluated LLMs demonstrate relatively poor performance on Flames, particularly in the safety and fairness dimensions. Claude emerges as the best-performing model overall, but its harmless rate is only 63.08%, while GPT-4 scores only 39.04%. The complexity of Flames far exceeds that of existing benchmarks,
setting a new challenge for contemporary LLMs and highlighting the need for
further alignment of LLMs. To efficiently evaluate new models on the benchmark,
we develop a specified scorer capable of scoring LLMs across multiple
dimensions, achieving an accuracy of 77.4%. The Flames Benchmark is publicly available at https://github.com/AIFlames/Flames.
BlockTorrent: A Privacy-Preserving Data Availability Protocol for Multiple Stakeholder Scenarios
As industries across the globe continue to digitize their processes, the need for a mechanism to share private data between multiple stakeholders is becoming increasingly apparent. However, sharing data poses challenges around privacy and accessibility, particularly in disputes between stakeholders with a shared interest, such as a supply chain. Auditors currently rely on stakeholders' compliance in order to verify data, and malicious parties may falsify the data before passing it on to the auditor. Using supply chains as a case study, we present BlockTorrent, a protocol that addresses these challenges and helps facilitate data sharing between supply chain participants, named after its integration of blockchain technology and the BitTorrent protocol. BlockTorrent allows participants to securely share their data in near real-time with other participants without the risk of information leakage or data falsification, whilst guaranteeing data availability for auditors. This is achieved using a novel combination of distributed storage and on-chain secret sharing. This thesis provides an implementation and evaluation of BlockTorrent, highlighting its performance, together with a security discussion showing that a system like BlockTorrent can reach transaction throughput as high as 500 tps and be viable in a real-world environment. Lastly, the thesis discusses the privacy challenges that were considered when designing BlockTorrent.
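As an illustrative sketch of the secret-sharing building block (a textbook Shamir (k, n) scheme over a prime field; BlockTorrent's actual on-chain construction and parameters are not specified here), shares could be distributed so that only a quorum of stakeholders or auditors can reconstruct a data key:

```python
# Shamir (k, n) secret sharing over GF(P): any k of n shares reconstruct the
# secret; fewer than k reveal nothing. Purely illustrative parameters.
import secrets

P = 2**127 - 1  # a Mersenne prime used as the field modulus

def split(secret: int, k: int, n: int):
    """Produce n shares of `secret`, any k of which suffice to reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(k - 1)]
    def f(x):  # evaluate the random degree-(k-1) polynomial at x
        return sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

shares = split(secret=123456789, k=3, n=5)
assert reconstruct(shares[:3]) == 123456789  # any 3 of the 5 shares suffice
```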
Towards Smart and Green Features of Cloud Computing in Healthcare Services: A Systematic Literature Review
Background: The healthcare sector has been facing multilateral challenges regarding the quality of services and access to healthcare innovations. As the population grows, the sector requires faster and more reliable services, but the opposite is true in developing countries. As a robust technology, cloud computing has numerous features and benefits that are still to be explored. The intervention of the latest technologies in healthcare is crucial to shifting toward next-generation healthcare systems. In developing countries like Ethiopia, cloud features are still far from being systematically explored to design smart and green healthcare services.
Objective: To uncover contextualized research gaps in existing studies on smart and green features of cloud computing in healthcare information services.
Methods: We conducted a systematic review of research publications indexed in Scopus, Web of Science, IEEE Xplore, PubMed, and ProQuest. 52 research articles were screened based on significant selection criteria and systematically reviewed. Extensive efforts have been made to rigorously review recent, contemporary, and relevant research articles.
Results: This study presented a summary of parameters, proposed solutions from the reviewed articles, and identified research gaps. These identified research gaps are related to security and privacy concerns, data repository standardization, data shareability, self-health data access control, service collaboration, energy efficiency/greenness, consolidation of health data repositories, carbon footprint, and performance evaluation.
Conclusion: The paper consolidated research gaps from multiple research investigations into a single paper, allowing researchers to develop innovative solutions for improving healthcare services. Based on a rigorous analysis of the literature, the existing systems overlooked green computing features and were highly vulnerable to security violations. Several studies reveal that security and privacy threats have been seriously hampering the exponential growth of cloud computing. 54 percent of the reviewed articles focused on security and privacy concerns.
Keywords: Cloud computing, Consolidation, Green computing, Green features, Healthcare services, Systematic literature review
Themelio: a new blockchain paradigm
Public blockchains hold great promise in building protocols that uphold security properties like transparency and consistency based on internal, incentivized cryptoeconomic mechanisms rather than preexisting trust in participants. Yet user-facing blockchain applications beyond "internal" immediate derivatives of blockchain incentive models, like cryptocurrency and decentralized finance, have not achieved widespread development or adoption.
We propose that this is not primarily due to "engineering" problems in aspects such as scaling, but due to an overall lack of transferable endogenous trust—the twofold ability to uphold strong, internally-generated security guarantees and to translate them into application-level security. Yet we argue that blockchains, due to their foundation on game-theoretic incentive models rather than trusted authorities, are uniquely suited for building transferable endogenous trust, despite their current deficiencies. We then engage in a survey of existing public blockchains and the difficulties and crises that they have faced, noting that in almost every case, problems such as governance disputes and ecosystem inflexibility stem from a lack of transferable endogenous trust.
Next, we introduce Themelio, a decentralized, public blockchain designed to support a new blockchain paradigm focused on transferable endogenous trust. Here, the blockchain is used as a low-level, stable, and simple root of trust, capable of sharing this trust with applications through scalable light clients. This contrasts with current blockchains, which are either applications or application execution platforms. We present evidence that this new paradigm is crucial to achieving flexible deployment of blockchain-based trust.
We then describe the Themelio blockchain in detail, focusing on three areas key to its overall theme of transferable, strong endogenous trust: a traditional yet enhanced UTXO model with features that allow powerful programmability and light-client composability, a novel proof-of-stake system with unique cryptoeconomic guarantees against collusion, and Themelio's unique cryptocurrency "mel", which achieves stablecoin-like low volatility without sacrificing decentralization and security.
Finally, we explore the wide variety of novel, partly off-chain applications enabled by Themelio's decoupled blockchain paradigm. This includes Astrape, a privacy-protecting off-chain micropayment network, Bitforest, a blockchain-based PKI that combines blockchain-backed security guarantees with the performance and administration benefits of traditional systems, as well as sketches of further applications.