RGen: Generador de datos para benchmarking de cargas de trabajo Big Data
[Abstract]
This BSc thesis presents the design and implementation of RGen, a parallel data generator
for benchmarking Big Data workloads. The tool is written in Java following the MapReduce
programming paradigm, specifically on top of the Apache Hadoop processing
framework. In addition, RGen can generate data directly on the Hadoop Distributed
File System (HDFS), the cornerstone of storage for batch-processing Big Data frameworks.
RGen combines two tasks in a standalone tool: integrating existing features and
developing new functionalities. The main objective is a complete, parallel
and scalable tool that gathers the functionality needed to generate data for the
different workloads supported by the Big Data Evaluator (BDEv) benchmarking suite,
without depending on third-party software.
The main functionalities developed in this BSc thesis are the generation of text and graphs
that meet the characteristics defined by the 4 Vs of Big Data: Volume, Variety, Velocity and
Veracity. Special emphasis is placed on the last one, since many specific benchmarks require
a large amount of truthful information. Text generation is based on the LDA model, which is
normally used to extract the topics or themes covered in a collection of documents.
Graph generation, in turn, is based on the Kronecker model.
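To illustrate how a topic model such as LDA can be run "forwards" to produce text instead of extracting topics, the sketch below samples one synthetic document. It is a minimal sketch and not RGen's implementation (RGen is written in Java on Hadoop): the two topics, their word probabilities, and the names `sample_dirichlet` and `generate_document` are hypothetical, and a real generator would load its topic-word distributions from a trained model.

```python
import random

# Hypothetical topic-word distributions; a real generator would load these
# from an LDA model trained on a reference corpus.
TOPICS = {
    "sports": {"match": 0.5, "team": 0.3, "goal": 0.2},
    "finance": {"market": 0.4, "stock": 0.4, "bank": 0.2},
}


def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet sample via normalised gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(num_words, alpha=0.5, seed=None):
    """LDA generative process: one topic mixture per document, one topic per word."""
    rng = random.Random(seed)
    names = list(TOPICS)
    theta = sample_dirichlet(alpha, len(names), rng)  # document-topic mixture
    words = []
    for _ in range(num_words):
        topic = rng.choices(names, weights=theta)[0]  # pick a topic for this word
        vocab = TOPICS[topic]
        word = rng.choices(list(vocab), weights=list(vocab.values()))[0]
        words.append(word)
    return " ".join(words)


if __name__ == "__main__":
    print(generate_document(12, seed=42))
```

Because every word is drawn from a distribution learned from real documents, the output keeps realistic word frequencies, which is one plausible reading of the Veracity requirement mentioned above.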
RGen has been developed following well-established software engineering practices.
Design and architectural patterns have been used to obtain an easily maintainable
and extensible tool with clean, high-quality code. Scrum, an agile
development framework based on Sprints, has been used to organize the work.
To evaluate the performance and scalability of the data generator, multiple experiments
have been carried out both in a local environment and on a high-performance cluster.
Different configurations have been evaluated, varying both the number of nodes and the
amount of data generated in parallel.
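Experiments that vary the node count and the data size are commonly summarised with strong-scaling speedup (fixed total data, more nodes) and weak-scaling efficiency (fixed data per node, more nodes). The helpers below are a minimal sketch of those two standard metrics; the function names are assumptions of this example, and the timings in the usage line are placeholders rather than measurements from the thesis.

```python
def strong_scaling_speedup(time_base, time_n):
    """Speedup when the total problem size is fixed and nodes are added."""
    return time_base / time_n


def strong_scaling_efficiency(time_base, time_n, nodes_base, nodes_n):
    """Fraction of the ideal speedup (nodes_n / nodes_base) actually achieved."""
    return strong_scaling_speedup(time_base, time_n) / (nodes_n / nodes_base)


def weak_scaling_efficiency(time_base, time_n):
    """Efficiency when data per node is fixed; 1.0 means the runtime stayed flat."""
    return time_base / time_n


if __name__ == "__main__":
    # Placeholder timings in seconds, chosen only to exercise the functions.
    print(strong_scaling_efficiency(time_base=100.0, time_n=30.0,
                                    nodes_base=1, nodes_n=4))
```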
The developed tool is publicly available for download from the following Git repository:
https://github.com/rubenperez98/RGen.
Bachelor's thesis (UDC.FIC). Computer Engineering. Academic year 2019/202
RGen: Data Generator for Benchmarking Big Data Workloads
Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
[Abstract] This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities in a standalone tool. The main functionalities developed in this work are the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data. Text generation is based on the LDA model, which is normally used to extract the topics or themes covered in a collection of documents. Graph generation, in turn, is based on the Kronecker model. The experimental evaluation carried out on a 16-node cluster has shown that RGen provides very good weak and strong scalability results. RGen is publicly available to download at https://github.com/rubenperez98/RGen (accessed on 30 September 2021).
CITIC, as a research center accredited by the Galician University System, is funded by “Consellería de Cultura, Educación e Universidade from Xunta de Galicia”, supported 80% through ERDF (ERDF Operational Programme Galicia 2014–2020) and the remaining 20% by “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). This project was also supported by the “Consellería de Cultura, Educación e Ordenación Universitaria” via the Consolidation and Structuring of Competitive Research Units–Competitive Reference Groups (ED431C 2018/49 and 2021/30).
Xunta de Galicia; ED431G 2019/01
Xunta de Galicia; ED431C 2018/49
Xunta de Galicia; ED431C 2021/3
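The Kronecker-based graph generation mentioned in the abstract above can be illustrated with the recursive edge-placement scheme commonly used to sample stochastic Kronecker graphs. This is a minimal sketch under stated assumptions and not RGen's code: the function name `kronecker_edges`, the 2x2 initiator values in the test call, and the choice to drop duplicate edges are all hypothetical, and every initiator entry is assumed to be positive.

```python
import random


def kronecker_edges(initiator, levels, num_edges, seed=None):
    """Sample directed edges of a stochastic Kronecker graph with 2**levels nodes.

    Each edge is placed by descending `levels` times into one quadrant of the
    adjacency matrix, choosing quadrants in proportion to the initiator entries.
    """
    if num_edges > 4 ** levels:
        raise ValueError("more edges requested than adjacency cells available")
    rng = random.Random(seed)
    cells = [(r, c) for r in range(2) for c in range(2)]
    weights = [initiator[r][c] for r, c in cells]
    edges = set()
    while len(edges) < num_edges:
        src = dst = 0
        for _ in range(levels):
            r, c = rng.choices(cells, weights=weights)[0]
            src = 2 * src + r  # append one bit to the source node id
            dst = 2 * dst + c  # append one bit to the target node id
        edges.add((src, dst))  # the set silently drops duplicate edges
    return sorted(edges)


if __name__ == "__main__":
    # Hypothetical initiator matrix, used only for illustration.
    for edge in kronecker_edges([[0.9, 0.5], [0.5, 0.1]], levels=3, num_edges=10, seed=1):
        print(edge)
```

Each edge is sampled independently of the others, so in a MapReduce setting the edge budget could in principle be split across map tasks with different seeds; whether RGen parallelises it this way is not stated in the abstract.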
Applying Artificial Intelligence for Operating System Fingerprinting
Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
[Abstract] In computer security, knowing which specific version of an operating system is running on a machine can be useful, for example to assist in a penetration test or to monitor the devices connected to a specific network. One of the most widely used tools providing this functionality is Nmap, which follows a rule-based approach. In this context, applying machine learning techniques seems to be a good option for addressing this task. The present work explores the strengths of different machine learning algorithms for operating system fingerprinting, using the Nmap reference database. Moreover, some optimizations were applied to the method that gave the best results, random forest, obtaining an accuracy higher than 96%.
CITIC, as a research center accredited by the Galician University System, is funded by “Consellería de Cultura, Educación e Universidade from Xunta de Galicia”, supported 80% through ERDF (ERDF Operational Programme Galicia 2014–2020) and the remaining 20% by “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). This project was also supported by the “Consellería de Cultura, Educación e Ordenación Universitaria” via the Consolidation and Structuring of Competitive Research Units–Competitive Reference Groups (ED431C 2018/49) and the COST Action 17124 DigForAsp, supported by COST (European Cooperation in Science and Technology, www.cost.eu, accessed on 25 October 2021).
Xunta de Galicia; ED431G 2019/01
Xunta de Galicia; ED431C 2018/4
Address Space Layout Randomization Comparative Analysis on Windows 10 and Ubuntu 18.04 LTS
Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
[Abstract] Memory management, which keeps the data of each process running in the system, is one of the main tasks of an Operating System. Several types of attacks exploit memory-related vulnerabilities, forcing Operating Systems to feature memory protection techniques that make these vulnerabilities difficult to exploit. One of these techniques is ASLR, whose function is to introduce randomness into the virtual address space of a process. The goal of this work was to measure, analyze and compare the behavior of ASLR on the 64-bit versions of Windows 10 and Ubuntu 18.04 LTS. The results have shown that the implementation of ASLR has improved significantly on these two Operating Systems compared to previous versions. However, some aspects can still be improved, such as partial correlations and a frequency distribution that is not always uniform.
We wish to acknowledge the support received from the Centro de Investigación de Galicia “CITIC”. CITIC, as a research center accredited by the Galician University System, is funded by “Consellería de Cultura, Educación e Universidade from Xunta de Galicia”, supported 80% through ERDF (ERDF Operational Programme Galicia 2014–2020) and the remaining 20% by “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). This work was also supported by the “Consellería de Cultura, Educación e Ordenación Universitaria” via the Consolidation and Structuring of Competitive Research Units–Competitive Reference Groups (ED431C 2018/49) and the COST Action 17124 DigForAsp, supported by COST (European Cooperation in Science and Technology, www.cost.eu, accessed on 20 July 2021).
Xunta de Galicia; ED431G 2019/01
Xunta de Galicia; ED431C 2018/4
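One plausible way to quantify the randomness that ASLR introduces is to collect the load address of the same memory region over many process launches and compute the entropy of the samples. The sketch below is not the paper's measurement method: the function names are hypothetical, and the addresses are assumed to have been collected beforehand as plain integers.

```python
import math
from collections import Counter


def bit_entropies(addresses, width=64):
    """Shannon entropy (between 0 and 1 bit) of each bit position across samples."""
    n = len(addresses)
    result = []
    for bit in range(width):
        ones = sum((a >> bit) & 1 for a in addresses)
        p = ones / n
        if p in (0.0, 1.0):
            result.append(0.0)  # this bit never changes between launches
        else:
            result.append(-(p * math.log2(p) + (1 - p) * math.log2(1 - p)))
    return result


def sample_entropy(addresses):
    """Shannon entropy, in bits, of the empirical address distribution."""
    n = len(addresses)
    counts = Counter(addresses)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Bits with entropy close to 0 are effectively fixed, and a total entropy well below the number of varying bits hints at a non-uniform frequency distribution or at correlated bits, the kind of weakness the abstract reports.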
Improving Authentication in the Amazon Alexa Virtual Assistant by Using a Geofence
Cursos e Congresos, C-155
[Abstract] Amazon Alexa processes voice commands as input to help users perform tasks. To
protect these commands, Amazon Alexa implements some security measures. These security
measures, such as voice recognition and a user PIN, cannot mitigate replay
attacks. To mitigate replay attacks, this paper proposes an authentication method
based on geofencing, consisting of (1) an Android application and (2) an Alexa Skill. Using
the Android application, the user configures a geofence near the Amazon Echo smart
speaker. The developed Alexa Skill only accepts requests when the user is within the established
geofence. This method mitigates replay attacks: an attacker could only attempt a replay attack
while the legitimate user is close to the speaker, which makes such an attack impractical.
This work was supported by the grant ED431C 2022/46 – Competitive Reference Groups GRC – funded by: EU and “Xunta de Galicia” (Spain). This work was also supported by CITIC, funded by “Xunta de Galicia” through the collaboration agreement between the “Consellería de Cultura, Educación, Formación Profesional e Universidades” and the Galician universities
to strengthen the research centres of the “Sistema Universitario de Galicia” (CIGUS). The work is also funded by the “Formación de Profesorado Universitario” (FPU) grant from the Spanish Ministry of Universities to Martiño Rivera Dourado (Grant FPU21/04519).
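The geofence check described in the abstract above can be sketched as a distance test between the user's last reported position and the speaker. This is a minimal sketch under stated assumptions, not the authors' Android application or Alexa Skill: it assumes a circular geofence evaluated with the haversine formula, and the function names and the 50 m default radius are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(user, centre, radius_m):
    """True when `user` lies within `radius_m` metres of `centre`."""
    return haversine_m(*user, *centre) <= radius_m


def accept_request(user_location, speaker_location, radius_m=50.0):
    """Reject the request when the location is unknown or outside the geofence."""
    if user_location is None:
        return False
    return inside_geofence(user_location, speaker_location, radius_m)
```

Rejecting requests when no location is available is the conservative design choice here: a replayed command then fails by default unless the legitimate user's phone has recently reported a position inside the fence.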