
    Analysis and Knowledge Discovery by Means of Self-Organizing Maps for Gaia Data Releases

    This version of the article has been accepted for publication after peer review and is subject to Springer Nature’s AM terms of use, but it is not the Version of Record and does not reflect post-acceptance improvements or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/978-3-319-46681-1_17
    Final accepted version of: Álvarez, M.A., Dafonte, C., Garabato, D., Manteiga, M. (2016). Analysis and Knowledge Discovery by Means of Self-Organizing Maps for Gaia Data Releases. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9950. Springer, Cham. https://doi.org/10.1007/978-3-319-46681-1_17
    [Abstract] A billion stars: this is the approximate number of visible objects that the Gaia satellite is expected to observe, representing roughly 1% of the objects in the Galaxy. It constitutes the biggest amount of data gathered to date: by the end of the mission, the data archive will exceed 1 Petabyte. To process this data, the Gaia mission conceived the Data Processing and Analysis Consortium, which will apply data mining techniques such as Self-Organizing Maps. This paper shows a useful technique for source clustering, focusing on the development of an advanced visualization tool based on this technique.

    On-line relational and multiple relational SOM

    In some applications, and in order to address real-world situations better, data may be more complex than simple numerical vectors. In some examples, data can be known only through their pairwise dissimilarities or through multiple dissimilarities, each of them describing a particular feature of the data set. Several variants of the Self Organizing Map (SOM) algorithm were introduced to generalize the original algorithm to the framework of dissimilarity data. Whereas median SOM is based on a rough representation of the prototypes, relational SOM allows representing these prototypes by a virtual linear combination of all elements in the data set, referring to a pseudo-Euclidean framework. In the present article, an on-line version of relational SOM is introduced and studied. Similarly to the situation in the Euclidean framework, this on-line algorithm provides a better organization and is much less sensitive to prototype initialization than standard (batch) relational SOM. In a more general case, this stochastic version allows us to integrate an additional stochastic gradient descent step in the algorithm which can tune the respective weights of several dissimilarities in an optimal way: the resulting multiple relational SOM thus has the ability to integrate several sources of data of different types, or to make a consensus between several dissimilarities describing the same data. The algorithms introduced in this manuscript are tested on several data sets, including categorical data and graphs. On-line relational SOM is currently available in the R package SOMbrero that can be downloaded at http://sombrero.r-forge.r-project.org or directly tested on its Web User Interface at http://shiny.nathalievilla.org/sombrero

    Classification of Ground-Truth Fire Debris Samples Using Neural Networks

    Fire debris samples are currently analyzed according to ASTM E1618-19, which is the Standard Test Method for Ignitable Liquid Residues in Extracts from Fire Debris Samples by Gas Chromatography-Mass Spectrometry. This method requires that an analyst make a visual comparison to an appropriate reference sample using the total ion and the extracted ion chromatograms. The analyst then provides an opinion about whether an ignitable liquid residue is present in the sample. The method is inherently subjective due to the visual interpretation that is needed. In order to automate this process, this work uses neural networks and a subset of the ions specified in ASTM E1618-19, which represent many of the compounds present in ignitable liquids, to cluster and classify ground-truth fire debris samples. The first part of this work demonstrates that these ions provide sufficient information to allow for the clustering of the ignitable liquid classes defined in ASTM E1618-19 and substrate pyrolysis extracts using self-organizing maps. Classification using self-organizing maps resulted in a 96% correct classification rate on an independent test set. The latter portion of this work demonstrates the use of the ASTM ions in conjunction with feedforward neural networks to evaluate laboratory-prepared ground-truth fire debris samples. An optimal neural network model was selected from a set of candidate models that were trained on in-silico fire debris samples. Receiver operating characteristic curves were used to select an optimal decision threshold for classifying a fire debris sample as positive or negative for ignitable liquid residues using a false positive to false negative cost ratio of 10. The use of this threshold for classification resulted in a somewhat conservative model with a true positive rate of 0.59 and a false positive rate of 0.07 for a set of laboratory-generated ground-truth fire debris samples.
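The cost-weighted threshold selection mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact selection rule, so the rule below (minimise `cost_fp * FP + cost_fn * FN` over candidate thresholds) is an assumption, and the function name `pick_threshold`, the scores, and the labels are all hypothetical toy values, not data from the paper.

```python
def pick_threshold(scores, labels, cost_fp=10.0, cost_fn=1.0):
    """Return (threshold, tpr, fpr) minimising cost_fp*FP + cost_fn*FN.

    Assumes labels are 0/1 and that both classes are present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    # Each distinct score is a candidate threshold (predict positive if
    # score >= t); +inf covers the "predict nothing positive" corner.
    for t in sorted(set(scores)) + [float("inf")]:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = positives - tp
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[0]:
            best = (cost, t, tp / positives, fp / negatives)
    return best[1:]


# Hypothetical classifier scores and ground-truth labels (1 = residue present).
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = [0, 0, 1, 0, 1, 1, 1, 1]
threshold, tpr, fpr = pick_threshold(scores, labels)
```

Because a false positive is weighted ten times as heavily as a false negative, the chosen threshold sits high enough to exclude the negative example scored 0.6, even though that costs one missed positive. This is the same trade-off that the abstract describes as "somewhat conservative".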

    Low Cost Automated Security Audit System

    Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
    [Abstract] In recent years, a quick transition towards digitization has been observed in most organizations. Along with it, certain inherent problems have appeared, such as the increase in cyber threats. Large organizations are able to adapt easily, but this does not happen with small and medium-sized companies. Currently, there are very few solutions aimed at fulfilling the needs of these small enterprises, so we have worked on a tool for them. Our tool is capable of displaying key, easy-to-interpret information related to each organization’s network assets. To achieve this, we used passive and active analysis techniques and successfully evaluated the viability of using machine learning techniques to get more meaningful information. All of the information obtained is displayed in a simple web application, which is designed to be used by managers in organizations without them needing to handle complex concepts and vocabulary.
    CITIC, as a Research Center accredited by the Galician University System, is funded by “Consellería de Cultura, Educación e Universidade from Xunta de Galicia”, supported 80% through ERDF, ERDF Operational Programme Galicia 2014–2020, and the remaining 20% by “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). This work was also funded by the research consolidation grant ED431B 2021/36, Art.83 collaboration F19/17, the Ministry of Economy and Competitiveness of Spain, and the FEDER funds of the European Union (Project PID2019-111388GB-I00).

    Neural Networks for Hyperspectral Imaging of Historical Paintings: A Practical Review

    Hyperspectral imaging (HSI) has become widely used in cultural heritage (CH). This very efficient method for artwork analysis is connected with the generation of large amounts of spectral data. The effective processing of such heavy spectral datasets remains an active research area. Along with the firmly established statistical and multivariate analysis methods, neural networks (NNs) represent a promising alternative in the field of CH. Over the last five years, the application of NNs for pigment identification and classification based on HSI datasets has drastically expanded due to the flexibility of the types of data they can process, and their superior ability to extract structures contained in the raw spectral data. This review provides an exhaustive analysis of the literature related to NNs applied for HSI data in the CH field. We outline the existing data processing workflows and propose a comprehensive comparison of the applications and limitations of the various input dataset preparation methods and NN architectures. By leveraging NN strategies in CH, the paper contributes to a wider and more systematic application of this novel data analysis method

    k-Means

    The k-means clustering algorithm (k-means for short) provides a method of finding structure in input examples. It is also called the Lloyd–Forgy algorithm as it was independently introduced by both Stuart Lloyd and Edward Forgy. k-means, like other algorithms you will study in this part of the book, is an unsupervised learning algorithm and, as such, does not require labels associated with input examples. Recall that unsupervised learning algorithms provide a way of discovering some inherent structure in the input examples. This is in contrast with supervised learning algorithms, which require input examples and associated labels so as to fit a hypothesis function that maps input examples to one or more output variables.
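As an illustration of how k-means finds structure without labels, here is a minimal sketch of the standard alternating assign-and-update iteration. It is not taken from the chapter itself: the function name `kmeans`, the random choice of initial centroids from the input examples, the fixed seed, and the toy one-dimensional data are all assumptions made for the example.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Assumed initialisation: pick k distinct input examples as centroids.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        new_assignment = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centroids are stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


# Toy unlabeled input: two well-separated groups of one-dimensional examples.
points = [(1.0,), (1.5,), (2.0,), (10.0,), (10.5,), (11.0,)]
centroids, labels = kmeans(points, k=2)
```

No labels are passed in; the grouping emerges from the distances alone. The same function accepts tuples of any fixed dimension. On less cleanly separated data the result can depend on the initial centroids, so running it with several seeds is a common precaution.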

    A survey of outlier detection methodologies

    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but now principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.