49 research outputs found

    The MUSE-Wide Survey: A first catalogue of 831 emission line galaxies

    Get PDF
    We present a first instalment of the MUSE-Wide survey, covering an area of 22.2 arcmin^2 (corresponding to ∼20% of the final survey) in the CANDELS/Deep area of the Chandra Deep Field South. We use the MUSE integral field spectrograph at the ESO VLT to conduct a full-area spectroscopic mapping at a depth of 1h exposure time per 1 arcmin^2 pointing. We searched for compact emission line objects using our newly developed LSDCat software based on a 3-D matched filtering approach, followed by interactive classification and redshift measurement of the sources. Our catalogue contains 831 distinct emission line galaxies with redshifts ranging from 0.04 to 6. Roughly one third (237) of the emission line sources are Lyman α emitting galaxies with 3 < z < 6, only four of which had previously measured spectroscopic redshifts. At lower redshifts, 351 galaxies are detected primarily by their [OII] emission line (0.3 ≲ z ≲ 1.5), 189 by their [OIII] line (0.21 ≲ z ≲ 0.85), and 46 by their Hα line (0.04 ≲ z ≲ 0.42). Comparing our spectroscopic redshifts to photometric redshift estimates from the literature, we find excellent agreement for z < 1.5, with a median Δz of only ∼4 × 10^-4 and an outlier rate of 6%, but a significant systematic offset of Δz = 0.26 and an outlier rate of 23% for Lyα emitters at z > 3. Together with the catalogue we also release 1D PSF-weighted extracted spectra and small 3D datacubes centred on each of the 831 sources. Comment: 24 pages, 14 figures, accepted for publication in A&A, data products are available for download from http://muse-vlt.eu/science/muse-wide-survey/ and later via the CD
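
    As a rough illustration of the kind of comparison quoted above, the sketch below computes a median Δz and an outlier fraction from paired spectroscopic and photometric redshifts. The definitions are assumptions on our part (the abstract does not state them): Δz is taken as z_phot − z_spec, and the outlier criterion |Δz|/(1 + z_spec) > 0.15 is a commonly used convention, not necessarily the paper's.

    import numpy as np

    def compare_redshifts(z_spec, z_phot, outlier_threshold=0.15):
        # Illustrative only: Delta-z = z_phot - z_spec, and a source counts
        # as an outlier when |Delta-z| / (1 + z_spec) exceeds the threshold
        # (an assumed convention, not necessarily the paper's definition).
        z_spec = np.asarray(z_spec, dtype=float)
        z_phot = np.asarray(z_phot, dtype=float)
        dz = z_phot - z_spec
        median_dz = np.median(dz)
        outlier_rate = np.mean(np.abs(dz) / (1.0 + z_spec) > outlier_threshold)
        return median_dz, outlier_rate

    # Hypothetical usage:
    # median_dz, frac = compare_redshifts([0.50, 0.80, 3.20], [0.501, 0.79, 3.46])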

    LSDCat: Detection and cataloguing of emission-line sources in integral-field spectroscopy datacubes

    Full text link
    We present a robust, efficient, and user-friendly algorithm for detecting faint emission-line sources in large integral-field spectroscopic datacubes, together with the public release of the software package LSDCat (Line Source Detection and Cataloguing). LSDCat uses a 3-dimensional matched-filter approach, combined with thresholding in signal-to-noise, to build a catalogue of individual line detections. In a second pass, the detected lines are grouped into distinct objects, and the positions, spatial extents, and fluxes of the detected lines are determined. LSDCat requires only a small number of input parameters, and we provide guidelines for choosing appropriate values. The software is coded in Python and is capable of processing very large datacubes in a short time. We verify the implementation with a source insertion and recovery experiment utilising a real datacube taken with the MUSE instrument at the ESO Very Large Telescope. Comment: 14 pages. Accepted for publication in Astronomy & Astrophysics. The LSDCat software is available at https://bitbucket.org/Knusper2000/lsdcat, v2: corrected typos and language editing
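
    To make the approach concrete, here is a minimal sketch of 3-D matched filtering with signal-to-noise thresholding on a (wavelength, y, x) datacube. It is not the LSDCat implementation; the Gaussian template, the parameter values, and the function names are illustrative assumptions.

    import numpy as np
    from scipy.ndimage import convolve

    def gaussian_template(spectral_sigma, spatial_sigma, half_size=7):
        # Normalised 3-D Gaussian template on a (wavelength, y, x) grid.
        ax = np.arange(-half_size, half_size + 1, dtype=float)
        gz = np.exp(-0.5 * (ax / spectral_sigma) ** 2)
        gy = np.exp(-0.5 * (ax / spatial_sigma) ** 2)
        kernel = gz[:, None, None] * gy[None, :, None] * gy[None, None, :]
        return kernel / kernel.sum()

    def matched_filter_detections(cube, variance, spectral_sigma=1.5,
                                  spatial_sigma=2.0, sn_threshold=8.0):
        # Cross-correlate the datacube with the template. For a linear filter
        # with weights w, the filtered variance is the convolution of the
        # per-voxel variances with w**2, so S/N = filtered / sqrt(filtered_var).
        w = gaussian_template(spectral_sigma, spatial_sigma)
        filtered = convolve(cube, w, mode="nearest")
        filtered_var = convolve(variance, w ** 2, mode="nearest")
        sn = filtered / np.sqrt(filtered_var)
        return sn > sn_threshold  # boolean mask of candidate emission-line voxels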

    Cubes convexes

    Full text link
    In various approaches, data cubes are pre-computed in order to answer OLAP queries efficiently. The notion of data cube has been adapted in various ways: iceberg cubes, range cubes, or differential cubes. In this paper, we introduce the concept of the convex cube, which captures all the tuples of a datacube satisfying a combination of constraints. It can be represented in a very compact way in order to optimize both computation time and required storage space. The convex cube is not just another structure appended to the list of cube variants; rather, we propose it as a unifying structure that we use to characterize, in a simple, sound, and homogeneous way, the other cited types of cubes. Finally, we introduce the concept of the emerging cube, which captures significant trend reversals.
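
    As a toy illustration of the idea of cube tuples satisfying a combination of constraints (not the paper's formalism), the sketch below enumerates aggregated cube tuples and keeps those whose COUNT lies between a minimum and a maximum threshold; all names and thresholds are made up.

    from collections import Counter
    from itertools import combinations

    def constrained_cube_tuples(rows, dims, min_count=2, max_count=10):
        # Enumerate all aggregated cube tuples over non-empty subsets of `dims`
        # and keep those whose COUNT lies between `min_count` and `max_count`,
        # i.e. a conjunction of a minimum- and a maximum-frequency constraint.
        # `rows` is a list of dicts, `dims` a list of dimension names.
        kept = []
        for k in range(1, len(dims) + 1):
            for group in combinations(dims, k):
                counts = Counter(tuple(row[d] for d in group) for row in rows)
                for values, count in counts.items():
                    if min_count <= count <= max_count:
                        kept.append((dict(zip(group, values)), count))
        return kept

    # Hypothetical usage:
    # rows = [{"city": "Lyon", "product": "A"}, {"city": "Lyon", "product": "B"},
    #         {"city": "Paris", "product": "A"}]
    # constrained_cube_tuples(rows, ["city", "product"], min_count=2, max_count=3)
    # -> [({"city": "Lyon"}, 2), ({"product": "A"}, 2)]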

    Developing a model and a language to identify and specify the integrity constraints in spatial datacubes

    Get PDF
    Data quality in spatial datacubes is important, given that these data are used as a basis for decision making in large organizations. Indeed, poor data quality in these cubes could lead to poor decisions. Integrity constraints play a key role in improving the logical consistency of any database, one of the main elements of data quality. Different models of spatial datacubes have been proposed in recent years, but none explicitly includes integrity constraints. As a consequence, the integrity constraints of spatial datacubes are handled in a non-systematic, pragmatic way, which makes the process of verifying data consistency in spatial datacubes inefficient. This thesis provides a theoretical framework for identifying integrity constraints in spatial datacubes, as well as a formal language for specifying them. To this end, we first proposed a formal model of spatial datacubes that describes their different components. Based on this model, we then identified and categorized the different types of integrity constraints in spatial datacubes. Moreover, since spatial datacubes typically contain both spatial and temporal data, we proposed a classification of integrity constraints for databases dealing with space and time. We then presented a formal language for specifying the integrity constraints of spatial datacubes. This language is based on a controlled natural language hybridized with pictograms. Several examples of integrity constraints for spatial datacubes are defined using this language. Spatial datacube designers (analysts) can use the proposed framework to identify integrity constraints and to specify them at the design stage of spatial datacubes. Moreover, the proposed formal language for specifying integrity constraints is close to the way end users express their integrity constraints. Consequently, using this language, end users can check and validate the integrity constraints defined by the analyst at the design stage.
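
    As a generic illustration of the kind of integrity constraint such a framework deals with (not the thesis's model or pictogram-based language), the sketch below checks a simple summarizability constraint: the measure stored for a parent member of a spatial hierarchy must equal the sum over its children. All member names are hypothetical.

    def summarizability_violations(cube, hierarchy):
        # Check that, for every parent member of a spatial hierarchy, the
        # stored measure equals the sum of its children's measures.
        # `cube` maps member name -> measure value; `hierarchy` maps a parent
        # to the list of its children.
        violations = []
        for parent, children in hierarchy.items():
            expected = sum(cube[child] for child in children)
            if cube.get(parent) != expected:
                violations.append((parent, cube.get(parent), expected))
        return violations

    # Hypothetical usage:
    # cube = {"Country": 95, "RegionA": 60, "RegionB": 40}
    # hierarchy = {"Country": ["RegionA", "RegionB"]}
    # summarizability_violations(cube, hierarchy)  # -> [("Country", 95, 100)]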

    Discovering correlated parameters in Semiconductor Manufacturing processes: a Data Mining approach

    Get PDF
    Data mining tools are nowadays becoming more and more popular in the semiconductor manufacturing industry, and especially in yield-oriented enhancement techniques. This is because conventional approaches fail to extract hidden relationships between numerous complex process control parameters. In order to highlight correlations between such parameters, we propose in this paper a complete knowledge discovery in databases (KDD) model. The mining core of the model uses a new method derived from association rule programming and is based on two concepts: decision correlation rules and contingency vectors. The first concept results from a cross-fertilization between correlation rules and decision rules. It enables relevant links to be highlighted between sets of values of a relation and the values of sets of targets belonging to the same relation. Decision correlation rules are built on the twofold basis of the chi-squared measure and the support of the extracted values. Due to the very nature of the problem, levelwise algorithms only allow results to be extracted with long execution times and huge memory consumption. To offset these two problems, we propose an algorithm based both on the lectic order and on contingency vectors, an alternative representation of contingency tables. This algorithm is the basis of our KDD model software, called MineCor. An overall presentation of its other functions, of some significant experimental results, and of the associated performance is provided and discussed.
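
    To make the two measures concrete, the sketch below computes the chi-squared statistic and the support for one candidate rule from its contingency table. It is not the MineCor implementation; the table layout and the names are our assumptions.

    import numpy as np
    from scipy.stats import chi2_contingency

    def rule_measures(contingency):
        # Rows of `contingency`: itemset holds / does not hold;
        # columns: values of the target attribute. Returns the chi-squared
        # statistic of the table and the support of the itemset, the two
        # measures on which decision correlation rules are said to be built.
        table = np.asarray(contingency, dtype=float)
        chi2, _p, _dof, _expected = chi2_contingency(table)
        support = table[0].sum() / table.sum()
        return chi2, support

    # Hypothetical usage (counts of wafers by itemset presence vs. yield class):
    # chi2, support = rule_measures([[30, 10], [20, 40]])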

    Contributions à l'Optimisation de Requêtes Multidimensionnelles

    Get PDF
    Analyzing data consists in choosing a subset of the dimensions that describe it in order to extract useful information. However, the "interesting" dimensions are rarely known a priori. The analysis then becomes an exploratory activity in which each pass translates into a query. It therefore becomes essential to propose query optimization solutions that take a global view of the process, rather than trying to optimize each query independently of the others. We present our contributions within this exploratory approach, focusing on three types of queries: (i) the computation of borders, (ii) so-called OLAP (On-Line Analytical Processing) queries over data cubes, and (iii) skyline-style preference queries.
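
    As a small illustration of the third query type, skyline preference queries, the sketch below returns the points of a dataset that are not dominated by any other point. The quadratic algorithm and the "smaller is better" convention are illustrative choices, not those of the thesis.

    def skyline(points):
        # A point dominates another if it is at least as good on every
        # dimension and strictly better on at least one; here "better"
        # means smaller. O(n^2) for clarity, not efficiency.
        def dominates(a, b):
            return (all(x <= y for x, y in zip(a, b))
                    and any(x < y for x, y in zip(a, b)))

        return [p for p in points
                if not any(dominates(q, p) for q in points if q != p)]

    # Example: hotels as (price, distance); cheaper and closer is better.
    # skyline([(50, 8), (80, 2), (60, 5), (90, 9)]) -> [(50, 8), (80, 2), (60, 5)]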

    Interactive exploration of millions of healthcare records in Brazil

    Get PDF
    The analysis of healthcare data is challenging due to its large volume, complexity, and heterogeneity. Interactive data visualization techniques are indispensable to support the analysis of such large healthcare systems. In this dissertation, we propose case studies developed for data from SUS, the Brazilian Unified Healthcare System, one of the largest public healthcare systems in the world. We present visual analytics prototypes on a state-of-the-art datacube structure that supports the interactive visual exploration of millions of records. We demonstrate how the data exploration provided by our prototypes can help with the essential tasks in analyzing big healthcare data, including data from COVID-19 in Brazil.

    The Need for Accurate Pre-processing and Data Integration for the Application of Hyperspectral Imaging in Mineral Exploration

    Get PDF
    Hyperspectral imaging is a key technology in non-invasive mineral analysis, whether at laboratory scale or as a remote sensing method. Rapid developments in sensor design and computer technology with respect to miniaturization, image resolution, and data quality are opening up new fields of application in the exploration of mineral resources, such as drone-borne data acquisition or digital outcrop and drill-core mapping. However, generally applicable data processing routines are mostly lacking, which hampers the establishment of these promising approaches. Particular challenges concern the necessary radiometric and geometric data corrections, spatial georeferencing, and integration with other data sources. The present work describes innovative workflows for solving these problems and demonstrates the importance of the individual steps. It shows the potential of appropriately processed spectral image data for complex tasks in mineral exploration and the geosciences. Hyperspectral imaging (HSI) is one of the key technologies in current non-invasive material analysis. Recent developments in sensor design and computer technology allow the acquisition and processing of datasets with high spectral and spatial resolution. In contrast to active spectroscopic approaches such as X-ray fluorescence or laser-induced breakdown spectroscopy, passive hyperspectral reflectance measurements in the visible and infrared parts of the electromagnetic spectrum are considered rapid, non-destructive, and safe. Compared to true color or multi-spectral imagery, a much larger range of substances, and even small compositional changes, can be differentiated and analyzed. Applications of hyperspectral reflectance imaging can be found in a wide range of scientific and industrial fields, especially when physically inaccessible or sensitive samples and processes need to be analyzed. In geosciences, this method offers a possibility to obtain spatially continuous compositional information on samples, outcrops, or regions that might otherwise be inaccessible, or too large, dangerous, or environmentally valuable, for traditional exploration at reasonable expenditure. Depending on the spectral range and resolution of the deployed sensor, HSI can provide information about the distribution of rock-forming and alteration minerals, specific chemical compounds, and ions. Traditional operational applications comprise spaceborne, airborne, and lab-scale measurements with a usually (near-)nadir viewing angle. The diversity of available sensors, in particular the ongoing miniaturization, enables their usage from a wide range of distances and viewing angles on a large variety of platforms. Many recent approaches focus on the application of hyperspectral sensors at intermediate to close sensor-target distances (one to several hundred meters), between airborne and lab-scale, usually implying exceptional acquisition parameters. These comprise unusual viewing angles, as for the imaging of vertical targets; specific geometric and radiometric distortions associated with the deployment of small moving platforms such as unmanned aerial systems (UAS); or the extreme size and complexity of data created by large imaging campaigns. Accurate geometric and radiometric data corrections using established methods are often not possible. 
Another important challenge results from the overall variety of spatial scales, sensors, and viewing angles, which often impedes a combined interpretation of the datasets, for example in a 2D geographic information system (GIS). Recent studies have mostly had to work with at least partly uncorrected data, which cannot place the results in a meaningful spatial context. These major unsolved challenges of hyperspectral imaging in mineral exploration motivated this work. The core aim is the development of tools that bridge data acquisition and interpretation by providing full image processing workflows, from the acquisition of raw data in the field or lab to fully corrected, validated, and spatially registered at-target reflectance datasets, which are valuable for subsequent spectral analysis, image classification, or fusion in different operational environments at multiple scales. I focus on promising emerging HSI approaches, i.e.: (1) the use of lightweight UAS platforms, (2) mapping of inaccessible vertical outcrops, sometimes at up to several kilometers' distance, (3) multi-sensor integration for versatile sample analysis at the near-field or lab scale, and (4) the combination of reflectance HSI with other spectroscopic methods such as photoluminescence (PL) spectroscopy for the characterization of valuable elements in low-grade ores. In each topic, the state of the art is analyzed, tailored workflows are developed to meet the key challenges, and the potential of the resulting datasets is showcased on prominent mineral-exploration-related examples. Combined in a Python toolbox, the developed workflows aim to be versatile with regard to the utilized sensors and desired applications.
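
    As an illustration of one standard radiometric correction step of the kind discussed above (not the thesis's specific workflow), the sketch below converts raw sensor counts to relative at-target reflectance using dark-current and white-reference measurements; the array shapes and names are assumptions.

    import numpy as np

    def to_reflectance(raw_cube, white_ref, dark_ref):
        # raw_cube  : (lines, samples, bands) raw digital numbers
        # white_ref : (bands,) mean signal over a calibration panel
        # dark_ref  : (bands,) mean dark-current signal
        # Returns relative reflectance, clipped to non-negative values.
        numerator = raw_cube - dark_ref                      # broadcast over bands
        denominator = np.clip(white_ref - dark_ref, 1e-6, None)
        return np.clip(numerator / denominator, 0.0, None)

    # Hypothetical usage:
    # refl = to_reflectance(cube, white.mean(axis=(0, 1)), dark.mean(axis=(0, 1)))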

    Visualization of large amounts of multidimensional multivariate business-oriented data

    Get PDF
    Many large businesses store large amounts of business-oriented data in data warehouses. These data warehouses contain fact tables, which themselves contain rows representing business events, such as an individual sale or delivery. This data contains multiple dimensions (independent variables that are categorical) and very often also multiple measures (dependent variables that are usually continuous), which makes it complex for casual business users to analyze and visualize. We propose two techniques, GPLOM and VisReduce, that respectively handle the visualization front-end for complex datasets and the back-end processing necessary to visualize large datasets. Scatterplot matrices (SPLOMs), parallel coordinates, and glyphs can all be used to visualize the multiple measures in multidimensional multivariate data. However, these techniques are not well suited to visualizing many dimensions. To visualize multiple dimensions, “hierarchical axes” that “stack dimensions” have been used in systems like Polaris and Tableau. However, this approach does not scale well beyond a small number of dimensions. Emerson et al. (2013) extend the matrix paradigm of the SPLOM to simultaneously visualize several categorical and continuous variables, displaying many kinds of charts in the matrix depending on the kinds of variables involved. We propose a variant of their technique, called the Generalized Plot Matrix (GPLOM). The GPLOM restricts Emerson et al. (2013)’s technique to only three kinds of charts (scatterplots for pairs of continuous variables, heatmaps for pairs of categorical variables, and barcharts for pairings of a categorical and a continuous variable), in an effort to make it easier to understand for casual business users. At the same time, the GPLOM extends Emerson et al. (2013)’s work by demonstrating interactive techniques suited to the matrix of charts. We discuss the visual design and interactive features of our GPLOM prototype, including a textual search feature allowing users to quickly locate values or variables by name. We also present a user study that compared Tableau and our GPLOM prototype, which found that GPLOM is significantly faster in certain cases and not significantly slower in others. Furthermore, the performance and responsiveness of visual analytics systems for exploratory data analysis of large datasets have been a long-standing problem, which GPLOM also encounters. We propose a method called VisReduce that incrementally computes visualizations in a distributed fashion by combining a modified MapReduce-style algorithm with a compressed columnar data store, resulting in significant improvements in performance and responsiveness when constructing commonly encountered information visualizations, e.g., bar charts, scatterplots, heat maps, cartograms, and parallel coordinate plots. We compare our method with one that queries three other readily available database and data warehouse systems (PostgreSQL, Cloudera Impala, and the MapReduce-based Apache Hive) in order to build visualizations. We show that VisReduce's end-to-end approach allows for greater speed and guaranteed end-user responsiveness, even in the face of large, long-running queries.
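
    The chart-selection rule that the GPLOM restricts itself to can be sketched as follows; this is an illustration of the rule as described in the abstract, not the prototype's code, and the function and type names are ours.

    from itertools import combinations

    def gplom_chart_types(variable_types):
        # For every pair of variables, use a scatterplot when both are
        # continuous, a heatmap when both are categorical, and a bar chart
        # for a mixed pair. `variable_types` maps a variable name to
        # "continuous" or "categorical".
        matrix = {}
        for a, b in combinations(variable_types, 2):
            kinds = {variable_types[a], variable_types[b]}
            if kinds == {"continuous"}:
                matrix[(a, b)] = "scatterplot"
            elif kinds == {"categorical"}:
                matrix[(a, b)] = "heatmap"
            else:
                matrix[(a, b)] = "barchart"
        return matrix

    # Hypothetical usage:
    # gplom_chart_types({"price": "continuous", "region": "categorical", "qty": "continuous"})
    # -> {("price", "region"): "barchart", ("price", "qty"): "scatterplot", ("region", "qty"): "barchart"}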

    Earth Observation Open Science and Innovation

    Get PDF
    geospatial analytics; social observatory; big earth data; open data; citizen science; open innovation; earth system science; crowdsourced geospatial data; science in society; data science