    The GalMer database: Galaxy Mergers in the Virtual Observatory

    We present the GalMer database, a library of galaxy merger simulations made available to users through tools compatible with Virtual Observatory (VO) standards, adapted specially for this theoretical database. Investigating the physics of galaxy formation through hierarchical merging requires simulating galaxy interactions while varying a large number of parameters: morphological types, mass ratios, orbital configurations, etc. On the one hand, these simulations have to be run in a cosmological context able to provide a large number of galaxy pairs, with boundary conditions given by large-scale simulations; on the other hand, the resolution has to be high enough at galaxy scales to provide realistic physics. The GalMer database, a library of thousands of galaxy merger simulations at moderate spatial resolution, is a compromise between the diversity of initial conditions and the detail of the underlying physics. We provide all coordinates and data of the simulated particles in FITS binary tables. The main advantages of the database are its VO access interfaces and value-added services that allow users to compare the results of the simulations directly to observations: stellar population modelling, dust extinction, spectra, images, and visualisation using dedicated VO tools. The GalMer value-added services can be used as a virtual telescope producing broadband images, 1D spectra, and 3D spectral datacubes, making the database well suited to use by observers. We present several examples of scientific usage of the GalMer database, obtained from the analysis of simulations and the modelling of their stellar population properties, including: (1) studies of star formation efficiency in interactions; (2) the creation of old counter-rotating components; (3) the reshaping of metallicity profiles in elliptical galaxies; (4) orbital-to-internal angular momentum transfer; and (5) the reproduction of the observed colour bimodality of galaxies.
    Comment: 15 pages, 11 figures, 10 tables; accepted to A&A. Visualisation of GalMer simulations, access to snapshot files, and the value-added tools described in the paper are available at http://galmer.obspm.fr
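
    Since the snapshots are distributed as FITS binary tables, they can be inspected with standard FITS tooling. Below is a minimal sketch using astropy; the file name and the column names (X, Y, Z, MASS) are illustrative assumptions, not the documented GalMer schema.

```python
# Minimal sketch: reading particle data from a GalMer-style snapshot.
# The file name and the column names (X, Y, Z, MASS) are illustrative
# assumptions, not the database's documented schema.
import numpy as np
from astropy.io import fits

with fits.open("galmer_snapshot.fits") as hdul:
    hdul.info()                    # list HDUs; particle data sits in a binary table
    particles = hdul[1].data       # first extension: the FITS binary table
    xyz = np.column_stack([particles["X"], particles["Y"], particles["Z"]])
    mass = particles["MASS"]
    # Example use: mass-weighted centre of the snapshot.
    com = np.average(xyz, axis=0, weights=mass)
    print("centre of mass:", com)
```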

    Improving knowledge about the risks of inappropriate uses of geospatial data by introducing a collaborative approach in the design of geospatial databases

    The increased availability of geospatial information is nowadays a reality that many organizations, and even the general public, are trying to turn to financial advantage; the reusability of datasets is now a viable alternative that may help organizations achieve cost savings. The quality of these datasets may vary and be debatable depending on the usage context, and the issue of geospatial data misuse becomes all the more important given the disparity in expertise among geospatial data end-users. Managing the risks of geospatial data misuse has been the subject of several studies over the past fifteen years. In this context, several approaches have been proposed to address these risks: some are preventive, while others are palliative and manage the risk after its consequences have occurred. However, these approaches are often based on ad hoc, non-systemic initiatives. Thus, during the design of a geospatial database, risk analysis is often carried out in accordance with neither the principles of requirements engineering nor the guidance and recommendations of ISO standards. In this thesis, we hypothesize that it is possible to define a new preventive approach for the identification and analysis of risks associated with inappropriate uses of geospatial data. We believe that the expertise and knowledge held by experts (i.e. geospatial IT experts) and by professional users of geospatial data within their institutional roles (i.e. application-domain experts) are key elements for assessing the risks of misuse of these data; hence the importance of enriching that knowledge. We therefore review the design process for geospatial databases and propose a collaborative, user-centric approach to requirements analysis. Under this approach, expert and professional users are involved in a collaborative process that favours the a priori identification of inappropriate use cases. Then, after reviewing research on risk analysis, we propose a systematic integration of risk analysis into the geospatial database design process, using the Delphi technique. Finally, still within a collaborative approach, an ontological risk repository is proposed to enrich knowledge about the risks of data misuse and to disseminate this knowledge to designers, developers, and end-users. The approach is implemented on a web platform to put the concepts into practice and demonstrate its feasibility.
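
    To illustrate the kind of consensus mechanism the Delphi technique provides, here is a toy sketch; the 1-9 scoring scale, the interquartile-range consensus threshold, and the sample scores are assumptions made for illustration, not the protocol defined in the thesis.

```python
# Toy sketch of Delphi-style consensus on misuse-risk scores.
# The scale, the threshold, and the scores are illustrative assumptions.
import statistics

def delphi_round(scores: list[float]) -> tuple[float, float]:
    """Return (median, interquartile range) for one round of expert scores."""
    q = statistics.quantiles(scores, n=4)   # quartile cut points
    return statistics.median(scores), q[2] - q[0]

# Round 1: experts rate the risk of one misuse scenario on a 1-9 scale.
median, iqr = delphi_round([7, 8, 3, 6, 9, 5, 7])
print(f"round 1: median={median}, IQR={iqr}")

# Experts see the group result and revise; stop when the IQR falls below 2.
median, iqr = delphi_round([7, 7, 5, 6, 8, 6, 7])
print(f"round 2: median={median}, IQR={iqr}, consensus={iqr <= 2}")
```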

    Towards development of fuzzy spatial datacubes: fundamental concepts with example for multidimensional coastal erosion risk assessment and representation

    Current Geospatial Business Intelligence (GeoBI) systems typically do not take into account the uncertainty related to the vagueness and fuzziness of objects; they assume that objects have well-defined, exact semantics, geometry, and temporality. The representation of fuzzy zones by polygons with well-defined boundaries is an example of such approximation. This thesis uses an application in Coastal Erosion Risk Assessment (CERA) to illustrate the problems. CERA polygons are created by aggregating a set of spatial units defined either by stakeholders' interests or by national census divisions. Despite the spatiotemporal variation of the multiple criteria involved in estimating coastal erosion risk, each polygon typically has a single risk value attributed homogeneously across its spatial extent. In reality, risk values change gradually within polygons and from one polygon to another, so the transition between zones is not properly represented by crisp object models. The main objective of this thesis is to develop a new approach combining the GeoBI paradigm with fuzzy concepts to account for spatial uncertainty in the representation of risk zones; ultimately, we assume this should improve coastal erosion risk assessment. To do so, a comprehensive GeoBI-based conceptual framework is developed, with an application to CERA. A fuzzy-based risk representation approach is then developed to handle the inherent spatial uncertainty related to the vagueness and fuzziness of objects. Fuzzy membership functions are defined from an expert-based vulnerability index. Instead of determining well-defined boundaries between risk zones, the proposed approach permits a smooth transition from one zone to another. The membership values of multiple indicators (e.g. the slope and elevation of the region under study, infrastructure, houses, the hydrology network, and so on) are then aggregated, based on the risk formula and fuzzy IF-THEN rules, to represent risk zones. The key elements of a fuzzy spatial datacube are also formally defined by combining fuzzy set theory with the GeoBI paradigm, and some fuzzy spatial aggregation operators are formally defined as well. The main contribution of this study is the combination of fuzzy set theory and GeoBI, which makes spatial knowledge discovery more compatible with human reasoning and perception. Hence, an analytical conceptual framework is proposed, based on the GeoBI paradigm, to develop a fuzzy spatial datacube within a Spatial Online Analytical Processing (SOLAP) system to assess coastal erosion risk. This requires developing a framework to design a conceptual model based on risk parameters, implementing fuzzy spatial objects in a spatial multidimensional database, and aggregating fuzzy spatial objects to handle the multi-scale representation of risk zones. To validate the proposed approach, it is applied to the Perce region (Eastern Quebec, Canada) as a case study.
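
    As a rough illustration of this fuzzy machinery, the sketch below defines a trapezoidal membership function and combines two indicators with a Mamdani-style IF-THEN rule (AND as minimum); the breakpoints, the two indicators, and the single rule are illustrative assumptions, not the thesis's calibrated vulnerability model.

```python
# Illustrative sketch: fuzzy risk zoning instead of crisp polygon boundaries.
# Breakpoints, indicators, and the single rule are assumptions for demonstration.
import numpy as np

def trapezoid(x, a, b, c, d):
    """Membership rising on [a, b], flat on [b, c], falling on [c, d]."""
    return np.clip(np.minimum((x - a) / (b - a), (d - x) / (d - c)), 0.0, 1.0)

vulnerability = np.linspace(0.0, 1.0, 11)   # normalized vulnerability index
exposure = np.linspace(1.0, 0.0, 11)        # hypothetical second indicator

mu_vuln_high = trapezoid(vulnerability, 0.4, 0.6, 1.0, 1.01)
mu_expo_high = trapezoid(exposure, 0.4, 0.6, 1.0, 1.01)

# Rule: IF vulnerability is high AND exposure is high THEN risk is high.
# AND as minimum; multiple rules would be combined with maximum.
mu_risk_high = np.minimum(mu_vuln_high, mu_expo_high)
print(np.round(mu_risk_high, 2))  # gradual transition rather than a crisp edge
```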

    Security and Privacy for Big Data: A Systematic Literature Review

    Big data is currently a hot research topic, with four million hits on Google Scholar in October 2016. One reason for the popularity of big data research is the knowledge that can be extracted from analyzing these large data sets. However, data can contain sensitive information, so data must be sufficiently protected as it is stored and processed. Furthermore, it might also be necessary to provide meaningful, proven privacy guarantees if the data can be linked to individuals. To the best of our knowledge, there exists no systematic overview of the overlap between big data and the area of security and privacy. Consequently, this review aims to explore security and privacy research within big data by outlining and providing structure to the research that currently exists. Moreover, we investigate which papers connect security and privacy with big data, and which categories these papers cover. Ultimately: is security and privacy research for big data different from the rest of the research within the security and privacy domain? To answer these questions, we perform a systematic literature review (SLR) in which we collect recent papers from top conferences and categorize them to provide an overview of the security and privacy topics present within the context of big data. Within each category we also present a qualitative analysis of papers representative of that specific area. Furthermore, we explore and visualize the relationships between the categories. Thus, the objective of this review is to provide a snapshot of the current state of security and privacy research for big data, and to identify where further research is required.

    Scientific data mining, integration, and visualization

    This report summarises the workshop on Scientific Data Mining, Integration and Visualization (SDMIV) held at the e-Science Institute, Edinburgh (eSI) on 24-25 October 2002, and presents a set of recommendations arising from the discussion that took place there. The aims of the workshop were threefold: (A) to inform researchers in the SDMIV communities of the infrastructural advances being made by computing initiatives, such as the Grid; (B) to feed back requirements from the SDMIV areas to those developing the computational infrastructure; and (C) to foster interaction among all these communities, since the coordinated efforts of all of them will be required to realise the potential for scientific knowledge extraction offered by e-science initiatives worldwide.

    Linked Open Data - Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project

    Database Management; Artificial Intelligence (incl. Robotics); Information Systems and Communication Services

    Using Semantic Web technologies in the development of data warehouses: A systematic mapping

    The exploration and use of Semantic Web technologies have attracted considerable attention from researchers examining data warehouse (DW) development. However, the impact of this research and the maturity level of its results are still unclear. The objective of this study is to examine recently published research articles that consider the use of Semantic Web technologies in the DW arena, with the intention of summarizing their results, classifying the contributions to the field according to publication type, evaluating the maturity level of the results, and identifying future research challenges. Three main conclusions were derived from this study: (a) there is a major technological gap that inhibits the wide adoption of Semantic Web technologies in the business domain; (b) there is limited evidence that the results of the analyzed studies are applicable and transferable to industrial use; and (c) interest in researching the relationship between DWs and the Semantic Web has decreased because new paradigms, such as linked open data, have attracted the interest of researchers.
    This study was supported by the Universidad de La Frontera, Chile, grants DI15-0020 and DI17-0043.

    QB4OLAP: Enabling business intelligence over semantic web data

    Awarded first prize by the Academia Nacional de Ingeniería.
    The World-Wide Web was initially conceived as a repository of information tailored for human consumption. In the last decade, the idea of transforming the Web into a machine-understandable web of data has gained momentum. To this end, the World Wide Web Consortium (W3C) maintains a set of standards, referred to as the Semantic Web (SW), which allow data and metadata to be shared openly. Among these are the Resource Description Framework (RDF), which represents data as graphs; RDF-S and OWL, which describe the data structure via ontologies or vocabularies; and SPARQL, the RDF query language. On top of the RDF data model, standards and recommendations can be built to represent data that adheres to other models. The multidimensional (MD) model views data in an n-dimensional space, usually called a data cube, composed of dimensions and facts. The former reflect the perspectives from which the data are viewed; the latter correspond to points in this space, associated with (usually) quantitative data, also known as measures. Facts can be aggregated, disaggregated, and filtered using the dimensions, a process called Online Analytical Processing (OLAP). Although the RDF Data Cube Vocabulary (QB) is the W3C standard for representing statistical data, which resembles MD data, it does not include key features needed for OLAP analysis, such as dimension hierarchies, dimension level attributes, and aggregate functions. To enable this kind of analysis over SW data cubes, in this thesis we propose the QB4OLAP vocabulary, an extension of QB. A problem remains, however: writing efficient analytical queries over SW data cubes requires a deep knowledge of RDF and SPARQL, unlikely to be found in typical OLAP users. We address this problem by allowing analytical users to write queries using what they know best, OLAP operations over data cubes, without dealing with SW technicalities. For this, we devised CQL, a simple, high-level query language over data cubes, and we use the structural metadata provided by QB4OLAP to translate CQL queries into SPARQL ones. We adapt general-purpose SPARQL query optimization techniques and propose query improvement strategies to produce efficient SPARQL queries. We evaluate our implementation by adapting the well-known Star Schema Benchmark, a standard for evaluating OLAP systems, which allows us to compare our proposal against existing ones in a fair way. Our approach outperforms existing ones, and our improvement strategies increase system performance by up to a factor of ten. Finally, our experiments allow us to study which combinations of improvement strategies fit best in an analytical scenario.
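
    To give a flavour of the translation target, the sketch below runs the kind of SPARQL 1.1 aggregate query that a CQL roll-up (e.g. from city to country level) would compile to, over a tiny in-memory cube; the URIs, property names, and data are invented for illustration and deliberately simplified, so they do not use the QB4OLAP vocabulary itself.

```python
# Illustrative sketch: the SPARQL 1.1 aggregation a CQL roll-up translates to.
# The tiny cube, URIs, and property names are invented for illustration.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/cube#> .
ex:o1 ex:city ex:Paris ; ex:country ex:France ; ex:sales 10 .
ex:o2 ex:city ex:Lyon  ; ex:country ex:France ; ex:sales  5 .
""", format="turtle")

# Roll up sales from the city level to the country level (SUM over the hierarchy).
query = """
PREFIX ex: <http://example.org/cube#>
SELECT ?country (SUM(?sales) AS ?total)
WHERE { ?obs ex:country ?country ; ex:sales ?sales . }
GROUP BY ?country
"""
for row in g.query(query):
    print(row.country, row.total)   # -> http://example.org/cube#France 15
```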