    Gaining deep knowledge of Android malware families through dimensionality reduction techniques

    This research proposes the analysis and subsequent characterisation of Android malware families by means of low-dimensional visualisations obtained with dimensionality reduction techniques. The well-known Malgenome data set, from the Android Malware Genome Project, has been thoroughly analysed through the following six dimensionality reduction techniques: Principal Component Analysis, Maximum Likelihood Hebbian Learning, Cooperative Maximum Likelihood Hebbian Learning, Curvilinear Component Analysis, Isomap and Self-Organizing Map. The results enable a clear visual analysis of the structure of this high-dimensional data set, letting us gain deep knowledge about the nature of these Android malware families. Interesting conclusions are drawn from the real-life data set under analysis.
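    As a rough illustration of the kind of pipeline described above, the sketch below projects a feature matrix to two dimensions with scikit-learn's PCA and Isomap (two of the six techniques named in the abstract). The matrix X and family labels y are random placeholders, not the Malgenome features.

```python
# Hedged sketch: 2-D projections with PCA and Isomap, as in the study above.
# X and y are random placeholders standing in for the Malgenome feature
# vectors and malware family labels, which are not reproduced here.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))       # placeholder: samples x features
y = rng.integers(0, 5, size=300)     # placeholder: family labels

X_std = StandardScaler().fit_transform(X)   # put features on a common scale

for name, reducer in [("PCA", PCA(n_components=2)),
                      ("Isomap", Isomap(n_components=2, n_neighbors=10))]:
    emb = reducer.fit_transform(X_std)      # project to 2 dimensions
    plt.figure()
    plt.scatter(emb[:, 0], emb[:, 1], c=y, s=10)
    plt.title(f"{name} projection of malware feature vectors")
plt.show()
```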

    Exploring the spectroscopic diversity of type Ia supernovae with DRACULA: a machine learning approach

    The existence of multiple subclasses of type Ia supernovae (SNeIa) has been the subject of great debate in the last decade. One major challenge inevitably met when trying to infer the existence of one or more subclasses is the time-consuming, and subjective, process of subclass definition. In this work, we show how machine learning tools facilitate the identification of subtypes of SNeIa through the establishment of a hierarchical group structure in the continuous space of spectral diversity formed by these objects. Using deep learning, we were able to perform such identification in a four-dimensional feature space (+1 for time evolution), whereas standard Principal Component Analysis barely achieves similar results using 15 principal components. This is evidence that the progenitor system and the explosion mechanism can be described by a small number of initial physical parameters. As a proof of concept, we show that our results are in close agreement with a previously suggested classification scheme and that our proposed method can grasp the main spectral features behind the definition of such subtypes. This allows the confirmation of the velocity of lines as a first-order effect in the determination of SNIa subtypes, followed by 91bg-like events. Given the expected data deluge in the forthcoming years, our proposed approach is essential to allow a quick and statistically coherent identification of SNeIa subtypes (and outliers). All tools used in this work were made publicly available in the Python package Dimensionality Reduction And Clustering for Unsupervised Learning in Astronomy (DRACULA) and can be found within COINtoolbox (https://github.com/COINtoolbox/DRACULA). Comment: 16 pages, 12 figures, accepted for publication in MNRAS.
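    DRACULA's own API is not reproduced here; as a minimal stand-in, the sketch below shows the general pattern the abstract describes, dimensionality reduction followed by hierarchical clustering, using scikit-learn on a placeholder spectral matrix.

```python
# Hedged sketch of dimensionality reduction + hierarchical clustering, the
# pattern described above, using scikit-learn rather than DRACULA's own API.
# The "spectra" are random placeholders, not real SNeIa spectra.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(42)
spectra = rng.normal(size=(200, 1000))   # placeholder: objects x wavelength bins

# Reduce to a small feature space, then impose a hierarchical group structure.
features = PCA(n_components=4).fit_transform(spectra)
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(features)
print(np.bincount(labels))               # cluster sizes
```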

    Machine learning methods for the characterization and classification of complex data

    This thesis presents novel methods for the analysis and classification of medical images and, more generally, complex data. First, an unsupervised machine learning method is proposed to order anterior chamber OCT (Optical Coherence Tomography) images according to a patient's risk of developing angle-closure glaucoma. In a second study, two outlier-finding techniques are proposed to improve the results of the above-mentioned machine learning algorithm; we also show that they are applicable to a wide variety of data, including fraud detection in credit card transactions. In a third study, the topology of the vascular network of the retina is analyzed, treating it as a complex tree-like network, and we show that structural differences reveal the presence of glaucoma and diabetic retinopathy. In a fourth study, we use a model of a laser with optical injection that presents extreme events in its intensity time series to evaluate machine learning methods for forecasting such extreme events.
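    The abstract does not detail the thesis's own outlier-finding techniques; purely as a generic stand-in for the fraud-detection use case it mentions, the sketch below flags anomalies with scikit-learn's IsolationForest on placeholder transaction features.

```python
# Hedged sketch: generic anomaly detection with IsolationForest, standing in
# for the (unspecified) outlier-finding techniques of the thesis. All data
# below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
normal = rng.normal(0.0, 1.0, size=(980, 8))   # placeholder legitimate transactions
fraud = rng.normal(4.0, 1.0, size=(20, 8))     # placeholder anomalous transactions
X = np.vstack([normal, fraud])

clf = IsolationForest(contamination=0.02, random_state=0).fit(X)
flags = clf.predict(X)                         # -1 = outlier, +1 = inlier
print("flagged as outliers:", int((flags == -1).sum()))
```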

    Neural manifold analysis of brain circuit dynamics in health and disease

    Recent developments in experimental neuroscience make it possible to record the activity of thousands of neurons simultaneously. However, the development of analysis approaches for such large-scale neural recordings has been slower than for those applicable to single-cell experiments. One approach that has gained recent popularity is neural manifold learning. This approach takes advantage of the fact that, even though neural datasets may be very high-dimensional, the dynamics of neural activity often tends to traverse a much lower-dimensional space. The topological structures formed by these low-dimensional neural subspaces are referred to as “neural manifolds”, and may potentially provide insight linking neural circuit dynamics with cognitive function and behavioral performance. In this paper we review a number of linear and non-linear approaches to neural manifold learning, including principal component analysis (PCA), multi-dimensional scaling (MDS), Isomap, locally linear embedding (LLE), Laplacian eigenmaps (LEM), t-SNE, and uniform manifold approximation and projection (UMAP). We outline these methods under a common mathematical nomenclature and compare their advantages and disadvantages with respect to their use for neural data analysis. We apply them to a number of datasets from the published literature, comparing the manifolds that result from their application to hippocampal place cells, motor cortical neurons during a reaching task, and prefrontal cortical neurons during a multi-behavior task. We find that in many circumstances linear algorithms produce results similar to those of non-linear methods, although in particular cases, where the behavioral complexity is greater, non-linear methods tend to find lower-dimensional manifolds, at the possible expense of interpretability. We demonstrate that these methods are applicable to the study of neurological disorders through simulation of a mouse model of Alzheimer’s Disease, and speculate that neural manifold analysis may help us to understand the circuit-level consequences of molecular and cellular neuropathology.
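    Most of the reviewed methods are available in scikit-learn, which makes a side-by-side comparison straightforward. The sketch below applies several of them to the same matrix; the trials-by-neurons firing-rate data are a synthetic placeholder, and UMAP is omitted because it lives in the separate umap-learn package.

```python
# Hedged sketch: applying several of the manifold-learning methods reviewed
# above to one data matrix. "rates" is a synthetic placeholder for a
# trials x neurons firing-rate matrix; real recordings are not reproduced.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import (MDS, Isomap, LocallyLinearEmbedding,
                              SpectralEmbedding, TSNE)

rng = np.random.default_rng(7)
rates = rng.poisson(5.0, size=(500, 100)).astype(float)  # placeholder activity

methods = {
    "PCA": PCA(n_components=2),
    "MDS": MDS(n_components=2),
    "Isomap": Isomap(n_components=2),
    "LLE": LocallyLinearEmbedding(n_components=2),
    "LEM": SpectralEmbedding(n_components=2),   # Laplacian eigenmaps
    "t-SNE": TSNE(n_components=2, perplexity=30.0),
}
embeddings = {name: m.fit_transform(rates) for name, m in methods.items()}
for name, emb in embeddings.items():
    print(name, emb.shape)                      # each is (500, 2)
```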

    Preface

    DAMSS-2018 is the jubilee 10th international workshop on data analysis methods for software systems, organized in Druskininkai, Lithuania, at the end of the year, in the same place and at the same time every year. Ten years have passed since the first workshop. The history of the workshop starts in 2009 with 16 presentations. The idea for such a workshop came up at the Institute of Mathematics and Informatics; the Lithuanian Academy of Sciences and the Lithuanian Computer Society supported it, and it received approval both in the Lithuanian research community and abroad. The number of presentations this year is 81, with 113 registered participants from 13 countries. In 2010, the Institute of Mathematics and Informatics became a member of Vilnius University, the largest university in Lithuania. In 2017, the institute changed its name to the Institute of Data Science and Digital Technologies, a name that reflects its recent activities. The renewed institute has eight research groups: Cognitive Computing, Image and Signal Analysis, Cyber-Social Systems Engineering, Statistics and Probability, Global Optimization, Intelligent Technologies, Education Systems, and Blockchain Technologies. The main goal of the workshop is to introduce the research undertaken at Lithuanian and foreign universities in the fields of data science and software engineering. The annual organization of the workshop allows fast interchange of new ideas among the research community. Eleven companies supported the workshop this year, which shows that its topics are relevant to business, too. Topics of the workshop cover big data, bioinformatics, data science, blockchain technologies, deep learning, digital technologies, high-performance computing, visualization methods for multidimensional data, machine learning, medical informatics, ontological engineering, optimization in data science, business rules, and software engineering. Seeking to facilitate relations between science and business, a special session and panel discussion is organized this year about topical business problems that may be solved together with the research community. This book gives an overview of all presentations of DAMSS-2018.

    Object recognition in infrared imagery using appearance-based methods

    Abstract unavailable; please refer to the PDF.

    Artificial Intelligence and Dimensionality Reduction: Tools for Approaching Future Communications

    ACKNOWLEDGMENT: The authors would like to thank the Fraunhofer-Heinrich-Hertz-Institut for acquiring and sharing the data associated with the rooftop and auditorium communication scenarios, the NextG Channel Model Alliance for creating a space to share public databases of propagation measurements, José Francisco Cortés-Gómez for the graphical support, Carmelo García-García for his help in the measurement acquisition, and Sohrab Vafa, Pablo Padilla and Francisco Luna-Valero for their valuable comments.
    This article presents a novel application of the t-distributed Stochastic Neighbor Embedding (t-SNE) clustering algorithm to the telecommunication field. t-SNE is a dimensionality reduction algorithm that allows the visualization of large datasets in a 2D plot. We demonstrate the applicability of this algorithm on a communication channel dataset formed by several scenarios (anechoic, reverberation, indoor and outdoor) and described by six channel features. Applying this artificial intelligence (AI) technique, we are able to separate the different environments into several clusters, allowing a clear visualization of the scenarios. Throughout the article, it is shown that t-SNE can cluster into several subclasses, obtaining internal classifications within the scenarios themselves. A comparison of t-SNE with other dimensionality reduction techniques (PCA, Isomap) is also provided. Furthermore, post-processing techniques are used to modify communication scenarios, recreating a real communication scenario from measurements acquired in an anechoic chamber. Dimensionality reduction and classification using t-SNE and Variational AutoEncoders show good performance in distinguishing between the recreation and the real communication scenario. The combination of these two techniques opens up the possibility of new scenario recreations for future mobile communications. This work shows the potential of AI as a powerful tool for clustering, classification and generation of new 5G propagation scenarios.
    Funding: Spanish Program of Research, Development, and Innovation under Project RTI2018-102002-A-I00; Junta de Andalucía under Project B-TIC-402-UGR18 and Project P18.RT.4830; Ministerio de Universidades, Gobierno de España under Predoctoral Grant FPU19/0125
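    As a hedged sketch of the visualisation step described above, the code below runs t-SNE on a placeholder matrix of six channel features per measurement; the actual channel measurements and feature definitions are not reproduced here.

```python
# Hedged sketch: t-SNE on six channel features per measurement, mirroring the
# visualisation step described above. "features" and "scenario" are random
# placeholders, not the measured channel data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE

rng = np.random.default_rng(3)
features = rng.normal(size=(400, 6))     # placeholder: measurements x 6 features
scenario = rng.integers(0, 4, size=400)  # placeholder scenario labels

emb = TSNE(n_components=2, perplexity=30.0, init="pca").fit_transform(
    StandardScaler().fit_transform(features))
print(emb.shape)  # (400, 2); colour by `scenario` when plotting to see clusters
```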

    Spectral Target Detection Using Schroedinger Eigenmaps

    Applications of optical remote sensing include environmental monitoring, military monitoring, meteorology, mapping and surveillance. Many of these tasks involve the detection of specific objects or materials, usually few or small, which are surrounded by other materials that clutter the scene and hide the relevant information. This target detection process has lately been boosted by the use of hyperspectral imagery (HSI), since its high spectral dimension provides the detailed spectral information that is desirable in data exploitation. Typical spectral target detectors rely on statistical or geometric models to characterize the spectral variability of the data. In many cases, however, these parametric models do not fit HSI data well, which impacts the detection performance. On the other hand, non-linear transformation methods, mainly based on manifold learning algorithms, have shown potential for HSI transformation, dimensionality reduction and classification. In target detection, non-linear transformation algorithms are used as preprocessing techniques that transform the data to a more suitable lower-dimensional space, where the statistical or geometric detectors are then applied. One of these non-linear manifold methods is the Schroedinger Eigenmaps (SE) algorithm, which was introduced as a technique for semi-supervised classification. The core tool of the SE algorithm is the Schroedinger operator, which includes a potential term that encodes prior information about the materials present in a scene and enables the embedding to be steered in convenient directions so that similar pixels cluster together. A novel target detection methodology based on the SE algorithm is proposed for the first time in this thesis. The proposed methodology includes not only the transformation of the data to a lower-dimensional space but also the definition of a detector that capitalizes on the theory behind SE. The fact that target pixels and similar pixels are clustered in a predictable region of the low-dimensional representation is used to define a decision rule that distinguishes target pixels from the rest of the pixels in a given image. In addition, a knowledge propagation scheme is used to combine spectral and spatial information as a means to propagate the potential constraints to nearby points. The propagation scheme is introduced to reinforce weak connections and improve the separability between most of the target pixels and the background. Experiments using different HSI data sets are carried out to test the proposed methodology. The assessment is performed quantitatively and qualitatively, comparing the SE-based methodology against two other detection methodologies that use linear/non-linear algorithms as transformations, together with the well-known Adaptive Coherence/Cosine Estimator (ACE) detector. Overall, the results show that the SE-based detector outperforms the other two methodologies, which indicates the usefulness of the SE transformation in spectral target detection problems.
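    As a minimal sketch of the operator at the core of the approach (not the thesis's full detection methodology), the code below augments a k-nearest-neighbour graph Laplacian with a diagonal potential encoding prior knowledge of a few target pixels, then embeds the data with the smallest eigenvectors of L + alpha*V; all data and indices are placeholders.

```python
# Hedged sketch of the Schroedinger Eigenmaps operator: a graph Laplacian L
# plus a diagonal potential V encoding known target pixels, embedded via the
# smallest eigenvectors of L + alpha * V. Data and target indices are random
# placeholders, not real HSI pixels.
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))            # placeholder: pixels x spectral bands
target_idx = [0, 1, 2]                    # placeholder: known target pixels

D = kneighbors_graph(X, n_neighbors=8, mode="distance").toarray()
sigma = np.median(D[D > 0])               # heat-kernel bandwidth from the data
W = np.where(D > 0, np.exp(-(D / sigma) ** 2), 0.0)
W = np.maximum(W, W.T)                    # symmetrise the kNN graph
L = laplacian(W)

V = np.zeros_like(L)                      # potential steering the embedding:
V[target_idx, target_idx] = 1.0           # pulls target pixels together

alpha = 10.0
vals, vecs = eigh(L + alpha * V)          # ascending eigenvalues
embedding = vecs[:, 1:4]                  # skip the near-trivial eigenvector
print(embedding.shape)                    # (200, 3) low-dimensional coordinates
```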