681 research outputs found

    Big Data Guided Resources Businesses – Leveraging Location Analytics and Managing Geospatial-temporal Knowledge

    Location data grow rapidly alongside fast-changing logistics and business rules. Because resource businesses run diverse ventures locally and globally, location-based information systems are in high demand in resource industries. Data sources in these industries are spatial-temporal and petabytes in size. Managing such volumes and varieties of data across temporal and geographic dimensions with existing modelling methods is challenging. Current relational database models face implementation challenges, including the interpretation of data views. Multidimensional models are articulated to integrate resource databases with spatial-temporal attribute dimensions. Location and period attribute dimensions are incorporated into various schemas to minimise ambiguity during database operations, ensuring the uniqueness and monotonic characteristics of resource data. We develop an integrated framework compatible with the multidimensional repository and implement its metadata in resource industries. Resource metadata with spatial-temporal attributes gives business research analysts scope to interpret data views in new geospatial knowledge domains for financial decision support.
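
    A minimal sketch of the kind of multidimensional schema this abstract describes: a fact table for resource data keyed by a location dimension and a time dimension, so the composite key preserves the uniqueness of each (place, period) record. All table and column names are hypothetical illustrations, not the authors' framework.

# Hypothetical star schema with spatial and temporal dimensions (sketch only).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_location (
    location_id INTEGER PRIMARY KEY,
    site        TEXT,   -- e.g., a mine or well name
    region      TEXT,
    country     TEXT,
    latitude    REAL,
    longitude   REAL
);
CREATE TABLE dim_time (
    time_id INTEGER PRIMARY KEY,
    day     TEXT,       -- ISO date
    quarter TEXT,
    year    INTEGER
);
CREATE TABLE fact_production (
    location_id   INTEGER REFERENCES dim_location(location_id),
    time_id       INTEGER REFERENCES dim_time(time_id),
    output_tonnes REAL,
    cost_usd      REAL,
    PRIMARY KEY (location_id, time_id)  -- one record per place and period
);
""")

# A typical roll-up over both dimensions: output by region and year.
rows = conn.execute("""
SELECT l.region, t.year, SUM(f.output_tonnes)
FROM fact_production f
JOIN dim_location l ON f.location_id = l.location_id
JOIN dim_time t     ON f.time_id     = t.time_id
GROUP BY l.region, t.year
""").fetchall()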

    A quality-aware spatial data warehouse for querying hydroecological data

    Addressing data quality issues in information systems remains a challenging task. Many approaches tackle this issue only at the extract, transform and load steps. Here we define a comprehensive method to gain greater insight into data quality characteristics within the data warehouse. Our novel architecture was implemented for a hydroecological case study in which massive French watercourse sampling data are collected. The method models and makes effective use of spatial, thematic and temporal accuracy, consistency and completeness for multidimensional data in order to offer analysts a "data quality"-oriented framework. The results of experiments carried out on the Saône River dataset demonstrate the relevance of our approach.
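
    A minimal sketch of the quality-aware idea, assuming one quality tag per measurement covering accuracy, consistency and completeness, with aggregation propagating the weakest score so analysts see the quality of an aggregate alongside its value. The names and the min-based propagation rule are hypothetical simplifications, not the paper's architecture.

# Hypothetical quality-tagged measurements for a spatial data warehouse.
from dataclasses import dataclass

@dataclass
class QualityTag:
    accuracy: float      # 0..1, e.g., spatial positional accuracy
    consistency: float   # 0..1, agreement with integrity rules
    completeness: float  # 0..1, share of expected samples present

@dataclass
class Measurement:
    station: str         # sampling station on the watercourse
    period: str          # e.g., "2010-Q2"
    value: float         # hydroecological indicator
    quality: QualityTag

def aggregate_with_quality(cells: list) -> tuple:
    """Average the measure and propagate the weakest quality scores,
    so the analyst sees the quality of the aggregate, not just its value."""
    mean = sum(c.value for c in cells) / len(cells)
    quality = QualityTag(
        accuracy=min(c.quality.accuracy for c in cells),
        consistency=min(c.quality.consistency for c in cells),
        completeness=min(c.quality.completeness for c in cells),
    )
    return mean, quality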

    Automating the multidimensional design of data warehouses

    Previous experiences in the data warehouse field have shown that the multidimensional conceptual schema of the data warehouse must be derived from a hybrid approach, i.e., one that considers both the end-user requirements and the data sources as first-class citizens. As in any other system, requirements guarantee that the system devised meets the end-user's needs. In addition, since data warehouse design is a reengineering process, it must consider the underlying data sources of the organization: (i) to guarantee that the data warehouse can be populated from data available within the organization, and (ii) to allow the end-user to discover unknown additional analysis capabilities.

    Several methods for supporting the data warehouse modeling task have been proposed, but they suffer from significant drawbacks. In short, requirement-driven approaches assume that requirements are exhaustive (and therefore do not consider that the data sources may contain further interesting evidence for analysis), whereas data-driven approaches (those leading the design task from a thorough analysis of the data sources) aim to discover as much multidimensional knowledge as possible from the sources and, as a consequence, generate too many results, which mislead the user. Furthermore, automating the design task is essential in this scenario: it removes the dependency on an expert's ability to properly apply the chosen method, and it frees the designer from analyzing the data sources, a tedious and time-consuming task that can be unfeasible for large databases. At present, automatable methods follow a data-driven approach, whereas requirement-driven approaches overlook automation, since they work with requirements expressed at a high level of abstraction. The same situation arises in the data-driven and requirement-driven stages of current hybrid approaches, which follow a sequential design and suffer from the same drawbacks as the pure approaches.

    In this thesis we introduce two approaches for automating the multidimensional design of the data warehouse: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both were devised to overcome the limitations of current approaches; they start from opposite initial assumptions, but both treat the end-user requirements and the data sources as first-class citizens.

    1. MDBE follows a classical approach, in which the end-user requirements are well known beforehand. It benefits from the knowledge captured in the data sources but guides the design task according to the requirements and, consequently, can handle semantically poor data sources: given high-quality end-user requirements, the process can be driven by the knowledge they contain, compensating for sources that do not capture the domain well.

    2. AMDO, in contrast, assumes a scenario in which the available data sources are semantically rich. The process is therefore guided by a thorough analysis of the data sources, whose output is then shaped and adapted according to the end-user requirements. In this context, high-quality data sources compensate for the lack of expressive end-user requirements.

    Together, our methods establish a combined framework that can be used to decide, according to the inputs available in each scenario, which approach to follow. For example, one cannot follow the same approach in a scenario where the end-user requirements are clear and well known and in one where they are not evident or cannot be easily elicited (e.g., when users are unaware of the analysis capabilities of their own sources). The need to have requirements beforehand is smoothed by having semantically rich data sources; lacking that, requirements gain relevance for extracting the multidimensional knowledge from the sources. The combination of the two approaches is thus exhaustive with regard to the scenarios discussed in the literature.
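
    A toy illustration of the data-driven step such hybrid methods build on: tables with numeric columns and several outgoing foreign keys are fact candidates, and tables referenced by foreign keys are dimension candidates. This is a drastic simplification for intuition only, not the MDBE or AMDO algorithms, and the schema below is hypothetical.

# Hypothetical relational schema: table -> (numeric columns, outgoing FKs).
schema = {
    "sales":   (["amount", "quantity"], ["product", "store", "date"]),
    "product": ([], []),
    "store":   ([], ["region"]),
    "region":  ([], []),
    "date":    ([], []),
}

def candidates(schema):
    # Fact candidates: hold measures and link to at least two contexts.
    facts = [t for t, (numeric, fks) in schema.items()
             if numeric and len(fks) >= 2]
    # Dimension candidates: referenced by some foreign key.
    referenced = {fk for _, fks in schema.values() for fk in fks}
    dims = sorted(t for t in schema if t in referenced)
    return facts, dims

facts, dims = candidates(schema)
print(facts)  # ['sales']
print(dims)   # ['date', 'product', 'region', 'store']

    A requirement-driven method such as MDBE would start from the measures and levels the user asks for and validate them against candidates like these, whereas a source-driven method such as AMDO would propose the candidates first and let the requirements filter and shape them.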

    Easier surveillance of climate-related health vulnerabilities through a Web-based spatial OLAP application

    Background: Climate change has a significant impact on population health. Population vulnerabilities depend on several determinants of different types, including biological, psychological, environmental, social and economic ones. Surveillance of climate-related health vulnerabilities must take these different factors into account, along with their interdependence and their inherent spatial and temporal aspects on several scales, to support informed analyses. Currently used technology includes commercial off-the-shelf Geographic Information Systems (GIS) and Database Management Systems with spatial extensions. It has been widely recognized that such OLTP (On-Line Transaction Processing) systems were not designed to support complex, multi-temporal and multi-scale analysis as required above. On-Line Analytical Processing (OLAP) is central to the field known as Business Intelligence (BI), a key field for such decision-support systems. In the last few years, a few projects have combined OLAP and GIS to improve spatio-temporal analysis and geographic knowledge discovery, giving rise to SOLAP (Spatial OLAP) and a new research area. This paper presents how SOLAP and climate-related health vulnerability data were investigated and combined to facilitate surveillance.

    Results: Based on recent spatial decision-support technologies, this paper presents a spatio-temporal web-based application that goes beyond GIS applications with regard to speed, ease of use, and interactive analysis capabilities. It supports the multi-scale exploration and analysis of integrated socio-economic, health and environmental geospatial data over several periods. The project was meant to validate the potential of recent technologies to contribute to a better understanding of the interactions between public health and climate change, and to facilitate future decision-making by public health agencies and municipalities in Canada and elsewhere. It also aimed to integrate an initial collection of geo-referenced multi-scale indicators identified by Canadian specialists and end-users as relevant for the surveillance of the public health impacts of climate change. The system was developed in a multidisciplinary context involving researchers, policy makers and practitioners, using BI and web-mapping concepts (more particularly SOLAP technologies), while exploring new solutions for frequent automatic updating of data and for providing contextual warnings for users (to minimize the risk of data misinterpretation). According to the project participants, the final system succeeds in facilitating surveillance activities in a way not achievable with today's GIS. Regarding the experiments on frequent automatic updating and contextual user warnings, the results indicate that these are meaningful and achievable goals, but they still require research and development before they can be successfully implemented in a surveillance context spanning multiple organizations.

    Conclusion: Surveillance of climate-related health vulnerabilities may be supported more efficiently by combining BI and GIS concepts, and more specifically by SOLAP technologies, which facilitate and accelerate multi-scale spatial and temporal analysis to the point where users can maintain an uninterrupted train of thought, focusing on what they want rather than on how to get it, and obtain instant answers even to the most complex queries (e.g., aggregated, temporal or comparative ones) that take minutes or hours with OLTP systems. The developed system respects Newell's cognitive band of 10 seconds when performing knowledge discovery (exploring data, looking for hypotheses, validating models). It provides new operators for easily and rapidly exploring multidimensional data at different levels of granularity, for different regions and epochs, and for visualizing the results in synchronized maps, tables and charts. It is naturally adapted to multi-scale indicators such as those used in the surveillance community, as confirmed by the project's end-users.
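
    A minimal sketch of the multi-scale roll-up and drill-down operations that SOLAP accelerates, using an in-memory table; the indicator, column names and figures are invented for illustration and are not the project's data.

# Hypothetical vulnerability indicator at two spatial granularities.
import pandas as pd

df = pd.DataFrame({
    "province":     ["QC", "QC", "QC", "ON"],
    "municipality": ["Montreal", "Montreal", "Quebec City", "Toronto"],
    "year":         [2008, 2009, 2008, 2008],
    "heat_deaths":  [12, 9, 4, 15],
    "pop_65_plus":  [260_000, 262_000, 90_000, 330_000],
})

# Roll-up: province level, all years pooled.
by_province = df.groupby("province")[["heat_deaths", "pop_65_plus"]].sum()

# Drill-down: municipality x year, with a derived rate as the measure shown.
detail = df.groupby(["municipality", "year"]).sum(numeric_only=True)
detail["deaths_per_100k_seniors"] = (
    detail["heat_deaths"] / detail["pop_65_plus"] * 100_000
)
print(by_province)
print(detail)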

    Multidimensional query recommendation

    In this master thesis we will first summarize the recent efforts to support analytical tasks over relational sources. These efforts have pointed out the necessity to come up with flexible, powerful means for analyzing the issued queries and exploit them in decision oriented processes (such as query recommendation or similar). Issued queries should be decomposed, stored and manipulated in a dedicated subsystem. With this aim, we present a novel approach for representing SQL analytical queries in terms of a multidimensional algebra, which better characterizes the analytical efforts of the user. This thesis discusses how a SQL query can be formulated as a multidimensional algebraic characterization. Then, we discuss how to normalize them in order to bridge (i.e. collapse) several SQL queries into a single characterization (representing the analytical session), according to their logical connections. Afterwards, we talk about how this characterization can be exploited in a wide range of decisional tasks such as query recommendation and others. Finally, we present an implementation example of this approach with limiting it with normalization phase because it surprisingly turned out that it is hard enough to achieve with regards to time available. This implementation may later be upgraded to demonstrate the full potential of this novel approach. 1.3 Scop
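
    A minimal sketch of the multidimensional characterization described above: an SQL query is reduced to the measures it aggregates, the dimension levels it groups by, and its selection predicates, and two logically connected queries of a session are collapsed into one characterization. The names and the merge rule are hypothetical simplifications of the thesis' algebra, not its implementation.

# Hypothetical multidimensional characterization of SQL analytical queries.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MDQuery:
    measures: frozenset    # e.g., {"SUM(amount)"}
    group_by: frozenset    # dimension levels, e.g., {"store.region"}
    selections: frozenset  # predicates, e.g., {"date.year = 2009"}

def collapse(a: MDQuery, b: MDQuery) -> Optional[MDQuery]:
    """Collapse two queries into one characterization when they share the
    same analytical subject (measures); otherwise keep them separate."""
    if a.measures != b.measures:
        return None
    return MDQuery(
        measures=a.measures,
        group_by=a.group_by | b.group_by,        # finest requested granularity
        selections=a.selections & b.selections,  # restrictions common to both
    )

q1 = MDQuery(frozenset({"SUM(amount)"}), frozenset({"store.region"}),
             frozenset({"date.year = 2009"}))
q2 = MDQuery(frozenset({"SUM(amount)"}), frozenset({"product.category"}),
             frozenset({"date.year = 2009"}))
print(collapse(q1, q2))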