420 research outputs found

    Bio-inspired computation for big data fusion, storage, processing, learning and visualization: state of the art and future directions

    This overview focuses on research achievements that have recently emerged from the confluence of Big Data technologies and bio-inspired computation. A manifold of reasons can be identified for the profitable synergy between these two paradigms, all rooted in the adaptability, intelligence, and robustness that biologically inspired principles can provide to technologies aimed at managing, retrieving, fusing, and processing Big Data efficiently. We delve into this research field by first analyzing the existing literature in depth, with a focus on advances reported in the last few years. This literature analysis is complemented by an identification of the new trends and open challenges in Big Data that remain unsolved to date and that can be effectively addressed by bio-inspired algorithms. As a second contribution, this work elaborates on how bio-inspired algorithms need to be adapted for use in a Big Data context, in which data fusion becomes crucial as a prior step that enables the processing and mining of several, potentially heterogeneous data sources. This analysis allows the scope and efficiency of existing approaches to be explored and compared across different problems and domains, with the purpose of identifying new potential applications and research niches. Finally, this survey highlights open issues that remain unsolved to date in this research avenue, alongside a prescription of recommendations for future research. This work has received funding support from the Basque Government (Eusko Jaurlaritza) through the Consolidated Research Group MATHMODE (IT1294-19) and the EMAITEK and ELK ARTEK programs. D. Camacho also acknowledges support from the Spanish Ministry of Science and Education under the PID2020-117263GB-100 grant (FightDIS), the Comunidad Autonoma de Madrid under the S2018/TCS-4566 grant (CYNAMON), and the CHIST-ERA 2017 BDSI PACMEL Project (PCI2019-103623, Spain).
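As a minimal illustration of the class of algorithms this survey covers, the sketch below implements a plain particle swarm optimization loop, one of the canonical bio-inspired methods, minimising a toy objective. All parameter values are generic textbook choices and the function names are hypothetical; none of this is taken from the survey itself.

```python
import random

def pso(f, dim, n_particles=20, iters=100, lo=-5.0, hi=5.0, seed=0):
    """Minimal particle swarm optimization: minimise f over [lo, hi]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # swarm-wide best

    w, c1, c2 = 0.7, 1.5, 1.5                      # inertia, cognitive, social weights
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Sphere function: global minimum 0 at the origin.
best, val = pso(lambda x: sum(t * t for t in x), dim=3)
```

The same loop structure carries over to other swarm and evolutionary methods; only the update rule and the representation of a candidate solution change.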

    Extracción y análisis de características para identificación, agrupamiento y modificación de la fuente de imágenes generadas por dispositivos móviles

    Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Informática, Departamento de Ingeniería del Software e Inteligencia Artificial, defended on 02/10/2017. Nowadays, digital images play an important role in our society. The presence of mobile devices with integrated cameras is growing at an unrelenting pace, resulting in the majority of digital images coming from this kind of device. Technological development not only facilitates the generation of these images, but also their malicious manipulation. It is therefore of interest to have tools that can identify the device that generated a given digital image. The source of a digital image can be identified through the features that the generating device imprints on it during the creation process. In recent years, most research on source identification techniques has focused solely on traditional cameras. Forensic analysis techniques for digital images generated by mobile devices are therefore of particular importance, since these devices have specific characteristics that allow for better results, and forensic techniques for digital images generated by other kinds of devices are often not valid. This thesis provides various contributions in two of the main research lines of forensic analysis: the field of identification techniques, and counter-forensics or attacks on these techniques. In the field of digital image source acquisition identification, both closed and open scenarios are addressed. In closed scenarios, the images whose acquisition source is to be determined belong to a group of devices known a priori. Meanwhile, an open scenario is one in which the images under analysis belong to a set of devices that is not known a priori by the forensic analyst. In this case, the objective is not the identification of the concrete image acquisition source, but the classification of the images into groups that each belong to the same mobile device.
Image clustering techniques are of particular interest in real situations, since in many cases the forensic analyst does not know a priori which devices generated certain images. Firstly, techniques are proposed for identifying the device type (computer, scanner, or digital camera of a mobile device) or the class (make and model) of the image acquisition source of mobile devices, which are two relevant branches of the forensic analysis of mobile device images. An approach based on different types of image features and a Support Vector Machine classifier is presented. Secondly, a technique for identification in open scenarios is developed, which consists of grouping digital images of mobile devices according to their acquisition source; that is, a class-grouping of all input images is performed. The proposal is based on the combination of hierarchical clustering and flat clustering using the Sensor Pattern Noise. Lastly, in the area of attacks on forensic techniques, topics related to the robustness of image source identification techniques are addressed. For this, two new algorithms based on the sensor noise and the wavelet transform are designed: one for the destruction of the image identity and another for its forgery. The results obtained by the two algorithms were compared with other tools designed for the same purpose. It is worth mentioning that the solution presented in this work requires a smaller amount and lower complexity of input data than the tools to which it was compared. Finally, these identification techniques have been included in Theia, a tool for the forensic analysis of digital images from mobile devices. Among the different branches of forensic analysis, Theia focuses mainly on the trustworthy identification of the make and model of the mobile camera that generated a given image.
All the proposed algorithms have been implemented and integrated in Theia, thus strengthening its functionality.

    Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks

    Wireless sensor networks (WSNs) are envisioned to revolutionize the paradigm of monitoring complex real-world systems at very high resolution. However, the deployment of a large number of unattended sensor nodes in hostile environments, frequent changes in environment dynamics, and severe resource constraints pose uncertainties and limit the potential use of WSNs in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues for spatio-temporal patterns generated from sensor data. It provides a framework for characterizing spatio-temporal patterns in WSNs. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-offs among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling.
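The rough-set machinery mentioned above can be sketched with the classic lower/upper approximation construction: readings that are indiscernible under the measured attributes form equivalence classes, and a concept is bracketed between the blocks certainly inside it and those possibly inside it. The partition and target set below are hypothetical, and the accuracy ratio is one simple uncertainty measure, not necessarily the dissertation's exact formalism.

```python
def rough_approximations(equiv_classes, target):
    """Rough-set lower/upper approximations of `target` under the
    indiscernibility partition `equiv_classes` (a list of disjoint sets)."""
    lower, upper = set(), set()
    for block in equiv_classes:
        if block <= target:
            lower |= block          # block certainly inside the concept
        if block & target:
            upper |= block          # block possibly inside the concept
    return lower, upper

# Hypothetical sensor readings 0..7 partitioned by discretised (temp, humidity).
partition = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
event = {0, 1, 2, 4}                # readings where the pattern occurred

lower, upper = rough_approximations(partition, event)
accuracy = len(lower) / len(upper)  # 1.0 would mean no uncertainty
```

Here the boundary region `upper - lower` contains the readings whose membership is undecidable from the measured attributes, which is exactly the uncertainty such a formalism quantifies.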

    Trading the stock market: hybrid financial analyses and evolutionary computation

    Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 02-07-2014. This thesis concerns the implementation of a complex and pioneering automated trading system which uses three critical analyses to determine time-decisions and portfolios for investments. To this end, this work delves into automated trading systems and studies time series of historical prices of companies listed on stock markets. The time series are studied using a novel methodology based on clustering by software compressors. This new approach allows a theoretical study of price formation which shows divergence between market prices and prices modelled by random walks, thus supporting the implementation of predictive models based on the analysis of historical patterns. Furthermore, this methodology also provides a tool for studying the behaviour of time series of historical prices from different industrial sectors, seeking patterns among companies in the same industry. Results show clusters of companies pointing out market trends shared by companies developing similar activities, and suggesting that a macroeconomic analysis can benefit investment decisions. Having tested the feasibility of prediction systems based on the analysis of time series of historical prices, and having demonstrated the existence of macroeconomic trends in the industries, we propose the implementation of a hybrid automated trading system through several stages which iteratively describe and test the components of the final system. In the early stages, we implement an automated trading system based on technical and fundamental analysis of companies, which presents high returns and reduces the risk of losses. The implementation uses a methodology guided by a modified version of a genetic algorithm in which we present novel genetic operators that avoid premature convergence and improve final returns. Using the same automated trading system, we propose novel optimization techniques addressing one of the characteristic problems of these systems: the execution time. We present the parallelisation of the system using two parallel computing techniques, first using distributed computation and second implementing a version for graphics processors. Both architectures achieve high speed-ups, reaching 50x and 256x respectively, and thus provide the acceleration required by systems analysing huge amounts of financial data.
Subsequent stages present a change of methodology, from genetic algorithms to grammatical evolution, which allows us to compare the two evolutionary strategies and to implement more advanced features such as more complex rules or the self-generation of new technical indicators. In this context, we describe several versions of the automated trading system guided by different fitness functions, including an innovative multi-objective version, which we test with recent financial data in order to analyse the advantages of each fitness function. Finally, we describe and test the methodology of an automated trading system based on a double layer of grammatical evolution, combining technical, fundamental, and macroeconomic analysis in a hybrid top-down analysis. The results show average returns of 30% with a low number of losing operations.
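A minimal sketch of the evolutionary-trading idea: a generational genetic algorithm evolving the window pair of a simple moving-average crossover rule, scored by backtested final equity. The thesis's actual operators, fitness functions, and grammatical-evolution layers are far more elaborate; the price series here is synthetic, and every name and parameter is a hypothetical placeholder.

```python
import random

def sma(prices, w, i):
    """Simple moving average of the w prices ending at index i (inclusive)."""
    return sum(prices[i - w + 1:i + 1]) / w

def backtest(prices, fast, slow):
    """Go long while the fast SMA is above the slow SMA; return final equity."""
    equity, holding = 1.0, False
    for i in range(slow, len(prices)):
        if holding:
            equity *= prices[i] / prices[i - 1]   # apply previous bar's signal
        holding = sma(prices, fast, i) > sma(prices, slow, i)
    return equity

def evolve(prices, pop_size=20, gens=15, seed=1):
    """Generational GA: elitist selection, window crossover, +/-2 mutation."""
    rng = random.Random(seed)
    pop = [(rng.randint(2, 20), rng.randint(21, 60)) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda g: backtest(prices, *g), reverse=True)
        elite = pop[:pop_size // 2]
        children = []
        for _ in range(pop_size - len(elite)):
            (f1, s1), (f2, s2) = rng.sample(elite, 2)
            fast = max(2, min(20, rng.choice([f1, f2]) + rng.randint(-2, 2)))
            slow = max(21, min(60, rng.choice([s1, s2]) + rng.randint(-2, 2)))
            children.append((fast, slow))
        pop = elite + children
    best = max(pop, key=lambda g: backtest(prices, *g))
    return best, backtest(prices, *best)

# Synthetic upward-drifting price series (not market data).
rng = random.Random(0)
prices = [100.0]
for _ in range(400):
    prices.append(prices[-1] * (1 + 0.001 + rng.gauss(0, 0.01)))

(best_fast, best_slow), fitness = evolve(prices)
```

Note that the fitness here is purely in-sample; a serious system, like the one the thesis describes, must validate any evolved rule on unseen data.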

    New internal and external validation indices for clustering in Big Data

    This thesis, presented as a compendium of research articles, analyses the concept of clustering validation indices and provides new measures of goodness for datasets that could be considered Big Data due to their volume. In addition, these measures have been applied in real projects, and their future application is proposed for the improvement of clustering algorithms. Clustering is one of the most popular unsupervised machine learning techniques. This technique allows us to group data into clusters so that the instances that belong to the same cluster have characteristics or attributes with similar values, and are dissimilar to those that belong to the other clusters. The similarity of the data is normally given by proximity in space, measured using a distance function. In the literature there are so-called clustering validation indices, which can be defined as measures for quantifying the quality of a clustering result. These indices are divided into two types: internal validation indices, which measure the quality of the clustering based on the attributes with which the clusters have been built; and external validation indices, which quantify the quality of the clustering from attributes that have not intervened in the construction of the clusters, normally nominal attributes or labels. In this doctoral thesis, two internal validation indices for clustering are proposed, based on other indices existing in the literature, which enable large amounts of data to be handled and provide results in a reasonable time. The proposed indices have been tested on synthetic datasets and compared with other indices in the literature. The conclusions of this work indicate that these indices offer very promising results in comparison with their competitors. In addition, a new external clustering validation index based on the chi-squared statistical test has been designed. This index enables the quality of the clustering to be measured by basing the result on how the clusters are distributed with respect to a given label.
The results of this index show a significant improvement compared with other external indices in the literature on datasets of different dimensions and characteristics. Furthermore, these proposed indices have been applied in three projects with real data whose corresponding publications are included in this doctoral thesis. For the first project, a methodology was developed to analyse the electrical consumption of buildings in a smart city; an optimal clustering analysis was carried out by applying the aforementioned internal indices. In the second project, both internal and external indices were applied in order to perform a comparative analysis of the Spanish labour market in two different economic periods. This analysis was carried out using data from the Ministry of Labour, Migration, and Social Security, and the results could be taken into account to help decision-making for the improvement of employment policies. In the third project, data from the customers of an electric company were employed to characterise the different types of existing consumers. In this study, consumption patterns were analysed so that electricity companies can offer new rates to consumers. Conclusions show that consumers could adapt their usage to these rates, and hence energy generation could be optimised by eliminating the consumption peaks that currently exist.
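The external-index idea can be illustrated with the plain Pearson chi-squared statistic over the cluster/label contingency table: a clustering aligned with an external label deviates strongly from the independence hypothesis, an uninformative one barely at all. This is a generic sketch of the underlying statistic, not the thesis's exact index.

```python
from collections import Counter

def chi2_statistic(clusters, labels):
    """Pearson chi-squared statistic of the cluster/label contingency table.
    Higher values mean cluster membership is more informative about the label."""
    n = len(clusters)
    joint = Counter(zip(clusters, labels))      # observed cell counts
    c_tot = Counter(clusters)                   # cluster marginals
    l_tot = Counter(labels)                     # label marginals
    chi2 = 0.0
    for c in c_tot:
        for l in l_tot:
            expected = c_tot[c] * l_tot[l] / n  # counts under independence
            observed = joint.get((c, l), 0)
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# A clustering perfectly aligned with the labels vs. an uninformative one.
labels = ["a"] * 5 + ["b"] * 5
aligned = [0] * 5 + [1] * 5
mixed = [0, 1] * 5

chi_aligned = chi2_statistic(aligned, labels)   # strong dependence
chi_mixed = chi2_statistic(mixed, labels)       # near independence
```

In practice such a statistic is normalised (e.g. by the sample size or degrees of freedom) before being used to compare clusterings of different datasets.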

    Machine Learning Algorithms for Robotic Navigation and Perception and Embedded Implementation Techniques

    The abstract is in the attachment.

    Applied Metaheuristic Computing

    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, and facility layout planning. This is partly because classic exact methods are constrained by prior assumptions, and partly because heuristics are problem-dependent and lack generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond the local optimality that impairs traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodologies and innovative applications that drive the advances of AMC.
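A compact example of a metaheuristic steering low-level moves past local optima, in the spirit described above: simulated annealing with 2-opt neighbourhoods on a toy routing instance. The instance and every parameter value are generic illustrative choices, not drawn from any paper in the series.

```python
import math
import random

def tour_length(tour, dist):
    """Total length of a closed tour under the distance matrix `dist`."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def simulated_annealing(dist, iters=20000, t0=10.0, alpha=0.9995, seed=0):
    """Anneal over 2-opt reversals; uphill moves are accepted with
    probability exp(-delta / T), which lets the search escape local optima."""
    rng = random.Random(seed)
    n = len(dist)
    tour = list(range(n))
    rng.shuffle(tour)
    best, best_len = tour[:], tour_length(tour, dist)
    cur_len, t = best_len, t0
    for _ in range(iters):
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # 2-opt reversal
        cand_len = tour_length(cand, dist)
        delta = cand_len - cur_len
        if delta < 0 or rng.random() < math.exp(-delta / t):
            tour, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
        t *= alpha                                 # geometric cooling schedule
    return best, best_len

# Eight hypothetical cities on a circle: the optimal tour follows the circle.
pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in range(8)]
dist = [[math.dist(p, q) for q in pts] for p in pts]

tour, length = simulated_annealing(dist)
```

The acceptance rule is the whole point: a greedy 2-opt search can get stuck, whereas the temperature-controlled uphill moves let the annealer reach the circular optimum.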

    Actas do 10º Encontro Português de Computação Gráfica

    Proceedings of the 10th Encontro Português de Computação Gráfica (EPCG), Lisboa, 1-3 October 2001. Research, development, and teaching in the field of Computer Graphics are, in Portugal, a positive reality with long traditions. The Encontro Português de Computação Gráfica, held within the scope of the activities of the Grupo Português de Computação Gráfica (GPCG), has made it possible to bring together regularly, since the 1st EPCG (also held in Lisbon, back in July 1988), all those who work in this broad area with countless applications. For the first time in the history of these meetings, the 10th EPCG was organized in close connection with the Image Processing and Computer Vision communities, through the Associação Portuguesa de Reconhecimento de Padrões (APRP), thus highlighting the increased collaboration, and convergence, between those two areas and Computer Graphics. This is the book of proceedings of the 10th EPCG.

    Development of novel hybrid method and geometrical configuration-based active noise control system for circular cylinder and slat noise prediction and reduction applications

    This thesis presents a study of the application of a geometrical-configuration-based feedforward adaptive active noise control (ANC) system to the cancellation of the low-frequency range of flow-induced (aeroacoustic) noise, and an investigation of the effects of different geometrical configurations on cancellation performance, measured by the residual noise signal magnitude (in decibels) or the average amount of cancellation (in decibels). The first motivation is that, according to the literature review, passive flow control is limited in practical settings, while active flow control performs better, especially in the low-frequency range. Since the principle of active flow control is the same as that of the ANC technique, it is feasible to apply ANC to cancel the low-frequency range of far-field (aeroacoustic) noise, which provides guidance for future practical experiments. The second motivation is to explore the effects of different geometrical configurations on cancellation performance, which likewise informs future practical experiments. To predict the far-field (aeroacoustic) noise, computational fluid dynamics (CFD) and the Ffowcs Williams and Hawkings (FW-H) equations are used, respectively, for unsteady flow calculation and far-field noise prediction. The proposed ANC system is used to cancel the low-frequency range of the far-field noise. Soft-computing and evolutionary-computing-based techniques are employed as the parameter adjustment mechanism to deal with nonlinearities present in microphones and loudspeakers. A case study on landing gear noise cancellation in a two-dimensional computational domain is completed.
Simulation results validate the accuracy of the obtained acoustic spectrum, with reasonable error attributable to the mesh resolution and computer capacity. It is observed that the two-dimensional approach can only predict discrete values of sound pressure level (SPL) at the fundamental frequency (Strouhal number) and its harmonics. Cancellation results demonstrate the capability of the proposed ANC system in the low-frequency range of far-field (aeroacoustic) noise, and show that, within a reasonable physical distance range, cancellation performance improves when the detector is placed closer to the secondary source than to the primary source. This conclusion is the main innovative contribution of this thesis and provides useful guidance for future practical experiments, although detailed physical distance values depend on the individual case.
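The feedforward ANC principle can be sketched with a plain LMS adaptive filter: a reference pick-up of the noise drives an FIR filter whose output is subtracted at the error microphone, and the residual drives the weight update. This omits the secondary-path modelling (FxLMS) and the soft-computing parameter tuning the thesis uses; the signals and the one-sample-delay path model below are hypothetical.

```python
import math

def lms_anc(reference, primary, taps=8, mu=0.01):
    """Adaptive FIR cancellation with the LMS rule: the filter output is
    subtracted from the primary signal and the residual drives the update."""
    w = [0.0] * taps
    buf = [0.0] * taps                  # reference delay line, newest first
    residuals = []
    for x, d in zip(reference, primary):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))
        e = d - y                       # residual after cancellation
        w = [wi + mu * e * xi for wi, xi in zip(w, buf)]
        residuals.append(e)
    return residuals

# Hypothetical setup: tonal noise reaches the error microphone through a
# path modelled as a one-sample delay with 0.8 gain.
n = 4000
ref = [math.sin(2 * math.pi * 0.05 * k) for k in range(n)]
primary = [0.0] + [0.8 * ref[k] for k in range(n - 1)]

res = lms_anc(ref, primary)
early = sum(e * e for e in res[:200]) / 200    # power before convergence
late = sum(e * e for e in res[-200:]) / 200    # power after convergence
```

For a tonal disturbance like this, the filter converges to the path's delay-and-gain response and the residual power collapses, which is the behaviour the low-frequency cancellation results describe.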