7 research outputs found

    RGen: Generador de datos para benchmarking de cargas de trabajo Big Data (RGen: Data Generator for Benchmarking Big Data Workloads)

    [Abstract] This BSc thesis presents the design and implementation of RGen, a parallel data generator for benchmarking Big Data workloads. The tool is developed in Java under the MapReduce programming paradigm, specifically using the Apache Hadoop processing framework. In addition, RGen can generate data directly on the Hadoop distributed file system, the cornerstone of storage for batch-processing Big Data frameworks. RGen combines the integration of existing features with the development of new functionalities in a standalone tool. The main objective is a complete, parallel and scalable tool that gathers the functionalities needed to generate data for the workloads supported by the Big Data Evaluator (BDEv) benchmarking suite, without depending on third-party software. The main functionalities developed in this thesis are the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data: Volume, Variety, Velocity and Veracity. Special emphasis is placed on the last one, since many specific benchmarks require a large amount of truthful information. Text generation relies on the LDA model, which is used to extract the topics or themes covered in a series of documents, while graph generation is based on the Kronecker model.
    RGen has been developed following well-established software engineering practices. Design and architectural patterns have been used to obtain an easily maintainable and extensible tool with clean, high-quality code. Scrum, an agile development framework based on Sprints, has been used to organize the work. To evaluate the performance and scalability of the data generator, experiments have been carried out both in a local environment and on a high-performance cluster, varying the number of nodes and the amount of data generated in parallel. The tool is publicly available for download at the following Git repository: https://github.com/rubenperez98/RGen.
    Bachelor's thesis (UDC.FIC). Computer Engineering. Academic year 2019/202

    RGen: Data Generator for Benchmarking Big Data Workloads

    Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
    [Abstract] This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities in a standalone tool. The main functionalities developed in this work are the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data. On the one hand, the LDA model, which extracts the topics or themes covered in a series of documents, has been used for text generation. On the other hand, graph generation is based on the Kronecker model. The experimental evaluation carried out on a 16-node cluster has shown that RGen provides very good weak and strong scalability results. RGen is publicly available for download at https://github.com/rubenperez98/RGen (accessed on 30 September 2021).
    CITIC, as a research center accredited by the Galician University System, is funded by the “Consellería de Cultura, Educación e Universidade” of the Xunta de Galicia, with 80% supported through the ERDF (ERDF Operational Programme Galicia 2014–2020) and the remaining 20% by the “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). This project was also supported by the “Consellería de Cultura, Educación e Ordenación Universitaria” via the Consolidation and Structuring of Competitive Research Units – Competitive Reference Groups (ED431C 2018/49 and 2021/30).
    Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C 2018/49. Xunta de Galicia; ED431C 2021/3
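The RGen abstract names the Kronecker model for graph generation but does not describe the implementation. As a rough illustration of the idea only, the sketch below builds a deterministic Kronecker graph by repeatedly taking the Kronecker product of a small initiator matrix with itself. The initiator values, the function names and the single-process, in-memory approach are all assumptions made for this example; RGen itself runs on Hadoop MapReduce and may well use a stochastic variant.

```python
def kronecker_product(a, b):
    """Kronecker product of two square 0/1 matrices given as lists of lists."""
    n, m = len(a), len(b)
    return [
        [a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
        for i in range(n * m)
    ]


def kronecker_graph(initiator, power):
    """Adjacency matrix of the `power`-th Kronecker power of `initiator`."""
    result = initiator
    for _ in range(power - 1):
        result = kronecker_product(result, initiator)
    return result


def edge_list(adjacency):
    """List the (source, target) pairs whose adjacency entry is non-zero."""
    return [
        (i, j)
        for i, row in enumerate(adjacency)
        for j, value in enumerate(row)
        if value
    ]


# Hypothetical 2x2 initiator (not taken from RGen): three edges, one self-loop.
INITIATOR = [[1, 1],
             [1, 0]]

graph = kronecker_graph(INITIATOR, 3)  # 2**3 = 8 nodes
edges = edge_list(graph)               # 3**3 = 27 edges
```

Each extra power multiplies the node count by the initiator size and the edge count by the number of non-zero initiator entries, so a small initiator can describe a very large graph.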

    Applying Artificial Intelligence for Operating System Fingerprinting

    Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
    [Abstract] In the field of computer security, knowing which specific version of an operating system is running on a machine can be useful to assist in a penetration test or to monitor the devices connected to a specific network. One of the most widespread tools providing this functionality is Nmap, which follows a rule-based approach. In this context, applying machine learning techniques seems to be a good option for addressing this task. The present work explores the strengths of different machine learning algorithms for operating system fingerprinting, using the Nmap reference database. Moreover, some optimizations were applied to random forest, the method that gave the best results, obtaining an accuracy higher than 96%.
    CITIC, as a research center accredited by the Galician University System, is funded by the “Consellería de Cultura, Educación e Universidade” of the Xunta de Galicia, with 80% supported through the ERDF (ERDF Operational Programme Galicia 2014–2020) and the remaining 20% by the “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). This project was also supported by the “Consellería de Cultura, Educación e Ordenación Universitaria” via the Consolidation and Structuring of Competitive Research Units – Competitive Reference Groups (ED431C 2018/49) and by the COST Action 17124 DigForAsp, supported by COST (European Cooperation in Science and Technology, www.cost.eu, accessed on 25 October 2021).
    Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C 2018/4

    Address Space Layout Randomization Comparative Analysis on Windows 10 and Ubuntu 18.04 LTS

    Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
    [Abstract] Memory management, which keeps the data of each process running in the system, is one of the main tasks of an operating system. Several types of attacks exploit memory-related vulnerabilities, forcing operating systems to include memory protection techniques that make these vulnerabilities difficult to exploit. One of these techniques is ASLR, whose function is to introduce randomness into the virtual address space of a process. The goal of this work was to measure, analyze and compare the behavior of ASLR on the 64-bit versions of Windows 10 and Ubuntu 18.04 LTS. The results show that the implementation of ASLR has improved significantly on these two operating systems compared to previous versions. However, some aspects can still be improved, such as partial correlations and a frequency distribution that is not always uniform.
    We wish to acknowledge the support received from the Centro de Investigación de Galicia “CITIC”. CITIC, as a research center accredited by the Galician University System, is funded by the “Consellería de Cultura, Educación e Universidade” of the Xunta de Galicia, with 80% supported through the ERDF (ERDF Operational Programme Galicia 2014–2020) and the remaining 20% by the “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). This work was also supported by the “Consellería de Cultura, Educación e Ordenación Universitaria” via the Consolidation and Structuring of Competitive Research Units – Competitive Reference Groups (ED431C 2018/49) and by the COST Action 17124 DigForAsp, supported by COST (European Cooperation in Science and Technology, www.cost.eu, accessed on 20 July 2021).
    Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C 2018/4

    Improving Authentication in the Amazon Alexa Virtual Assistant by Using a Geofence

    Cursos e Congresos, C-155
    [Abstract] Amazon Alexa processes voice commands as input to help users perform tasks. To protect these commands, Amazon Alexa implements some security measures. These measures, such as voice recognition and a user PIN, cannot mitigate replay attacks. To mitigate replay attacks, in this paper we propose an authentication method based on geofencing, consisting of (1) an Android application and (2) an Alexa Skill. Using the Android application, the user can configure a geofence near the Amazon Echo smart speaker. The developed Alexa Skill only accepts requests when the user is within the established geofence. This method mitigates replay attacks: an attacker could only attempt a replay attack while the legitimate user is close to the speaker, making the attack unfeasible.
    This work was supported by the grant ED431C 2022/46 – Competitive Reference Groups GRC – funded by the EU and “Xunta de Galicia” (Spain). This work was also supported by CITIC, funded by “Xunta de Galicia” through the collaboration agreement between the “Consellería de Cultura, Educación, Formación Profesional e Universidades” and the Galician universities to strengthen the research centres of the “Sistema Universitario de Galicia” (CIGUS). The work is also funded by the “Formación de Profesorado Universitario” (FPU) grant from the Spanish Ministry of Universities to Martiño Rivera Dourado (Grant FPU21/04519).
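The abstract above describes the core rule of the method: a request is accepted only while the user is inside a geofence configured near the speaker. The sketch below illustrates one way such a check could look, assuming a circular geofence and the haversine great-circle distance. The function names, the 50 m radius and the coordinates are hypothetical and are not taken from the paper, which does not describe how the Android application or the Alexa Skill evaluate the geofence.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(user, centre, radius_m):
    """True if `user` lies within `radius_m` metres of `centre`."""
    return haversine_m(*user, *centre) <= radius_m


# Hypothetical values for illustration only.
SPEAKER = (43.3623, -8.4115)   # assumed location of the smart speaker
RADIUS_M = 50.0                # assumed geofence radius

user_at_home = SPEAKER
user_far_away = (44.3623, -8.4115)  # one degree of latitude further north

accept_at_home = inside_geofence(user_at_home, SPEAKER, RADIUS_M)
accept_far_away = inside_geofence(user_far_away, SPEAKER, RADIUS_M)
```

Under this rule a replayed command is rejected whenever the legitimate user's device reports a position outside the radius, which is the behaviour the abstract attributes to the Skill.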

    RGen: Data Generator for Benchmarking Big Data Workloads

    This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities in a standalone tool. The main functionalities developed in this work are the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data. On the one hand, the LDA model, which extracts the topics or themes covered in a series of documents, has been used for text generation. On the other hand, graph generation is based on the Kronecker model. The experimental evaluation carried out on a 16-node cluster has shown that RGen provides very good weak and strong scalability results. RGen is publicly available for download at https://github.com/rubenperez98/RGen (accessed on 30 September 2021).

    Applying Artificial Intelligence for Operating System Fingerprinting

    In the field of computer security, knowing which specific version of an operating system is running on a machine can be useful to assist in a penetration test or to monitor the devices connected to a specific network. One of the most widespread tools providing this functionality is Nmap, which follows a rule-based approach. In this context, applying machine learning techniques seems to be a good option for addressing this task. The present work explores the strengths of different machine learning algorithms for operating system fingerprinting, using the Nmap reference database. Moreover, some optimizations were applied to random forest, the method that gave the best results, obtaining an accuracy higher than 96%.