Automating the multidimensional design of data warehouses
Previous experiences in the data warehouse field have shown that the multidimensional conceptual schema of a data warehouse must be derived from a hybrid approach, i.e., one that considers both the end-user requirements and the data sources as first-class citizens. As in any other system, requirements guarantee that the system devised meets the end-user's needs. In addition, since data warehouse design is a reengineering process, it must consider the organization's underlying data sources: (i) to guarantee that the data warehouse can be populated from data available within the organization, and (ii) to allow the end-user to discover unknown additional analysis capabilities.
Currently, several methods for supporting the data warehouse modeling task have been proposed. However, they suffer from some significant drawbacks. In short, requirement-driven approaches assume that requirements are exhaustive (and therefore do not consider that the data sources may contain alternative, interesting evidence for analysis), whereas data-driven approaches (i.e., those that lead the design task from a thorough analysis of the data sources) aim to discover as much multidimensional knowledge as possible from the data sources. As a consequence, data-driven approaches generate too many results, which mislead the user. Furthermore, automating the design task is essential in this scenario, as it removes both the dependency on an expert's ability to properly apply the chosen method and the need to analyze the data sources manually, a tedious and time-consuming task that can become infeasible with large databases. Current automatable methods follow a data-driven approach, whereas current requirement-driven approaches overlook automation, since they tend to work with requirements expressed at a high level of abstraction that a computer cannot process. The same situation recurs in the data-driven and requirement-driven stages of current hybrid approaches, which follow a sequential approach and suffer from the same drawbacks as pure data-driven or requirement-driven approaches.
In this thesis we introduce two different approaches for automating the multidimensional design of the data warehouse: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both approaches were devised to overcome the limitations of current approaches. Importantly, they start from opposite initial assumptions, but both treat the end-user requirements and the data sources as first-class citizens.
1. MDBE follows a classical approach, in which the end-user requirements are well known beforehand. It benefits from the knowledge captured in the data sources but guides the design task according to the requirements and, consequently, can handle semantically poorer data sources. In other words, given high-quality end-user requirements, we can guide the process from the knowledge they contain and compensate for data sources of poor semantic quality.
2. AMDO, in contrast, assumes a scenario in which the available data sources are semantically richer. Thus, it is guided by a thorough analysis of the data sources, whose output is then shaped according to the end-user requirements. In this context, high-quality data sources compensate for a lack of expressive end-user requirements.
Importantly, our methods establish a combined and comprehensive framework that can be used to decide, according to the inputs available in each scenario, which approach is best to follow. For example, we cannot follow the same approach in a scenario where the end-user requirements are clear and well known and in one where they are not evident or cannot be easily elicited (e.g., when users are not aware of the analysis capabilities of their own sources). The need to have requirements beforehand is lessened by semantically rich data sources; without such sources, requirements gain relevance for extracting the multidimensional knowledge from the sources. Thus, we claim to provide two approaches whose combination is exhaustive with regard to the scenarios discussed in the literature.
Postprint (published version)
On the mismatch between multidimensionality and SQL
ROLAP tools are intended to ease information analysis and navigation through the whole Data Warehouse. These tools automatically generate a query according to the multidimensional operations performed by the end-user: they use relational database technology to implement multidimensionality and, consequently, translate multidimensional operations into SQL automatically. In this paper, we consider this automatic translation process in detail, presenting an exhaustive comparison (both theoretical and practical) between the multidimensional algebra and the relational one. First, we discuss the need for a multidimensional algebra with regard to the relational one; then we thoroughly study the considerations required to guarantee the correctness of a cube-query (an SQL query that makes multidimensional sense). To this end, we analyze the expressiveness of the multidimensional algebra with regard to SQL, pointing out the features a query must satisfy to make multidimensional sense, and we also focus on the problems that can arise in a cube-query due to intrinsic restrictions of SQL. The SQL translation of an isolated operation is not a problem, but when the modifications brought about by a set of operations are combined in a single cube-query, conflicts derived from SQL may emerge depending on the operations involved. If these problems are not detected and treated appropriately, the automatic translation can retrieve unexpected results.
Postprint (published version)
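A minimal sketch of the kind of order-dependent conflict described above, assuming a toy star schema (a hypothetical `facts` table with a city -> region hierarchy and a `sales` measure) and two hypothetical operations, `rollup` and `dice`. It is not the paper's translation algorithm; it only shows that the same selection must land in `WHERE` or in `HAVING` depending on whether it is issued before or after a roll-up, and that a further roll-up after filtering aggregated cells cannot be expressed in one flat query.

```python
import sqlite3

# Hypothetical star schema: one fact row per city, hierarchy city -> region.
FACTS = [
    ("Barcelona", "Catalonia", 130.0),
    ("Girona", "Catalonia", 30.0),
    ("Lyon", "Auvergne-Rhone-Alpes", 90.0),
]
LEVELS = ("city", "region")  # from finest to coarsest


def to_cube_query(operations):
    """Translate an ordered list of (operation, argument) pairs into one SQL query.

    ("rollup", level)   aggregates the sales measure up to `level`.
    ("dice", threshold) keeps cells whose sales measure exceeds `threshold`.
    """
    level, rolled_up = "city", False
    where, having = [], []
    for op, arg in operations:
        if op == "rollup":
            if arg not in LEVELS or LEVELS.index(arg) < LEVELS.index(level):
                raise ValueError(f"cannot roll up from {level!r} to {arg!r}")
            if having:
                # A flat SELECT has a single GROUP BY/HAVING pair, so aggregating
                # again after filtering aggregated cells needs a nested query.
                raise ValueError("conflict: roll-up after a selection on aggregated cells")
            level, rolled_up = arg, True
        elif op == "dice":
            # SQL evaluates WHERE before GROUP BY, so a selection issued after a
            # roll-up refers to aggregated cells and must be moved to HAVING.
            if rolled_up:
                having.append(f"SUM(sales) > {float(arg)}")
            else:
                where.append(f"sales > {float(arg)}")
        else:
            raise ValueError(f"unknown operation: {op!r}")
    sql = f"SELECT {level}, SUM(sales) FROM facts"
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += f" GROUP BY {level}"
    if having:
        sql += " HAVING " + " AND ".join(having)
    return sql + f" ORDER BY {level}"


def run(operations):
    """Load the toy facts into an in-memory database and execute the cube-query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE facts (city TEXT, region TEXT, sales REAL)")
        con.executemany("INSERT INTO facts VALUES (?, ?, ?)", FACTS)
        return con.execute(to_cube_query(operations)).fetchall()
    finally:
        con.close()


# Same two operations, different order, different answers.
print(run([("dice", 50), ("rollup", "region")]))
# → [('Auvergne-Rhone-Alpes', 90.0), ('Catalonia', 130.0)]
print(run([("rollup", "region"), ("dice", 50)]))
# → [('Auvergne-Rhone-Alpes', 90.0), ('Catalonia', 160.0)]
```

The Catalonia cell differs (130 vs. 160) because filtering before the roll-up discards the Girona fact, whereas filtering after it tests the aggregated regional total. A translator that ignored operation order would silently return one of these answers for both user sessions.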
SPARQL Query Recommendation by Example: Assessing the Impact of Structural Analysis on Star-Shaped Queries
One of the existing query recommendation strategies for unknown datasets is "by example", i.e. based on a query that the user already knows how to formulate on another dataset within a similar domain. In this paper we measure what contribution a structural analysis of the query and the datasets can bring to a recommendation strategy, to go alongside approaches that provide a semantic analysis. Here we concentrate on the case of star-shaped SPARQL queries over RDF datasets.
The illustrated strategy performs a least general generalization on the given query, computes its specializations that are satisfiable by the target dataset, and organizes them into a graph. It then visits the graph to recommend first the reformulated queries that reflect the original query as closely as possible. This approach does not rely upon a semantic mapping between the two datasets. An implementation as part of the SQUIRE query recommendation library is discussed.
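A greatly simplified sketch of the structural idea, not SQUIRE's algorithm or API. It assumes a star-shaped query given as (predicate, object-variable) pairs on one subject `?s`, with distinct object variables. As a crude stand-in for the least general generalization, every predicate is turned into a free slot; specializations are obtained by filling the slots with predicates that one target subject actually has (so each candidate is satisfiable), and candidates are ranked by how many original predicates they keep instead of by a graph visit. The functions `recommend` and `to_sparql` and the toy data are hypothetical.

```python
from itertools import permutations


def recommend(star_query, target_triples):
    """Rank star-shaped reformulations of `star_query` that the target can satisfy.

    star_query: list of (predicate, object_variable) patterns sharing subject ?s.
    target_triples: iterable of (subject, predicate, object) triples.
    Returns (score, predicates) pairs, best first, where score counts the
    predicates kept unchanged from the original query.
    """
    original = [p for p, _ in star_query]
    k = len(original)
    # Outgoing predicates of every subject in the target dataset.
    preds_by_subject = {}
    for s, p, _ in target_triples:
        preds_by_subject.setdefault(s, set()).add(p)
    best = {}  # unordered predicate set -> best (score, ordered predicates)
    for preds in preds_by_subject.values():
        # Fill the k generalized slots with distinct predicates of one subject:
        # that subject is then an answer, so the candidate is satisfiable.
        for choice in permutations(sorted(preds), k):
            score = sum(a == b for a, b in zip(choice, original))
            key = frozenset(choice)
            if key not in best or score > best[key][0]:
                best[key] = (score, choice)
    # Closest reformulations first; ties broken alphabetically for determinism.
    return sorted(best.values(), key=lambda item: (-item[0], item[1]))


def to_sparql(star_query, predicates):
    """Rebuild a SPARQL string, keeping the original object variables."""
    patterns = " . ".join(f"?s {p} {o}" for p, (_, o) in zip(predicates, star_query))
    return f"SELECT * WHERE {{ {patterns} }}"


query = [("foaf:name", "?n"), ("foaf:mbox", "?m")]
target = [
    ("ex:a", "foaf:name", '"A"'), ("ex:a", "schema:email", '"a@example.org"'),
    ("ex:b", "foaf:name", '"B"'), ("ex:b", "foaf:homepage", "<http://example.org/b>"),
]
for score, preds in recommend(query, target):
    print(score, to_sparql(query, preds))
# 1 SELECT * WHERE { ?s foaf:name ?n . ?s foaf:homepage ?m }
# 1 SELECT * WHERE { ?s foaf:name ?n . ?s schema:email ?m }
```

Structure alone ranks the `foaf:homepage` and `schema:email` reformulations equally, even though only the latter is a plausible substitute for `foaf:mbox`; this is the gap that the semantic approaches mentioned above are meant to fill.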
Evaluating FAIR Digital Object and Linked Data as distributed object systems
FAIR Digital Object (FDO) is an emerging concept that is highlighted by the
European Open Science Cloud (EOSC) as a potential candidate for building an
ecosystem of machine-actionable research outputs. In this work we
systematically evaluate FDO and its implementations as a global distributed
object system, by using five different conceptual frameworks that cover
interoperability, middleware, FAIR principles, EOSC requirements and FDO
guidelines themselves.
We compare the FDO approach with established Linked Data practices and the
existing Web architecture, and provide a brief history of the Semantic Web
while discussing why these technologies may have been difficult to adopt for
FDO purposes. We conclude with recommendations for both Linked Data and FDO
communities to further their adaptation and alignment.
Comment: 40 pages, submitted to PeerJ C
Supporting authoring of adaptive hypermedia
It is well known that students benefit from personalised attention. However, teachers are
frequently unable to provide this, most often due to time constraints. An Adaptive
Hypermedia (AH) system can offer a richer learning experience, by giving personalised
attention to students. The authoring process, however, is time consuming and cumbersome.
Our research explores the two main aspects of authoring AH: authoring of content and
adaptive behaviour. The research proposes possible solutions to overcome the hurdles
to the acceptance of AH in education.
Automation methods can help authors: for example, teachers could create linear lessons and
our prototype can add content alternatives for adaptation.
Creating adaptive behaviour is more complex. Rule-based systems, XML-based conditional
inclusion, Semantic Web reasoning and reusable, portable scripting in a programming
language have been proposed. These methods all require specialised knowledge. Hence,
authoring adaptive behaviour is difficult and teachers cannot be expected to create such
strategies. We investigate three ways to address this issue.
1. Reusability: We investigate limitations regarding adaptation engines, which
influence the authoring and reuse of adaptation strategies. We propose a metalanguage,
as a supplement to the existing LAG adaptation language, showing how
it can overcome such limitations.
2. Standardisation: There are no widely accepted standards for AH. The IMS Learning
Design (IMS-LD) specification has similar goals to Adaptive
Educational Hypermedia (AEH). Investigation shows that IMS-LD is more limited
in terms of adaptive behaviour, but the authoring process focuses more on learning
sequences and outcomes.
3. Visualisation: Another way is to simplify the authoring process of strategies using
a visual tool. We define a reference model and a tool, the Conceptual Adaptation
Model (CAM) and GRAPPLE Authoring Tool (GAT), which allow specification
of an adaptive course in a graphical way. A key feature is the separation between
content, strategy and adaptive course, which increases reusability compared to
approaches that combine all factors in one model.
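A minimal sketch of why separating content from strategy aids reuse, written in Python rather than LAG or GAT notation; the `Concept` model, the prerequisite-based rule and the 0.5 knowledge threshold are hypothetical illustrations, not the CAM reference model. The strategy is written once and applied unchanged to two unrelated courses.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A unit of course content; prerequisites name other concepts (hypothetical model)."""
    name: str
    prerequisites: list[str] = field(default_factory=list)


def prerequisite_strategy(course, knowledge, threshold=0.5):
    """Hypothetical adaptation strategy, independent of any particular course.

    Recommends concepts the learner does not yet know (knowledge < threshold)
    whose prerequisites are all known (knowledge >= threshold).
    """
    return [
        c.name
        for c in course
        if knowledge.get(c.name, 0.0) < threshold
        and all(knowledge.get(p, 0.0) >= threshold for p in c.prerequisites)
    ]


# Two unrelated courses reuse the same strategy without modification.
algebra = [Concept("variables"), Concept("equations", ["variables"]),
           Concept("systems", ["equations"])]
web = [Concept("html"), Concept("css", ["html"])]

print(prerequisite_strategy(algebra, {"variables": 0.8}))  # ['equations']
print(prerequisite_strategy(web, {}))                      # ['html']
```

Because the rule never mentions a specific concept, a teacher could attach it to a new course without touching its logic; combining content and rules in one model would instead require rewriting the rules for every course.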
A conceptual framework and a risk management approach for interoperability between geospatial datacubes
Today, geospatial databases are widely used and are implemented in many forms (e.g., centralized transactional systems, distributed databases, multidimensional datacubes). Among these, the multidimensional datacube is best suited to supporting interactive analysis and guiding an organization's strategic decisions, especially when different epochs and levels of information granularity are involved. However, users may need to work with several geospatial datacubes that are semantically heterogeneous and that have different degrees of appropriateness to the context of use. Resolving the semantic problems caused by this heterogeneity and by these differences in appropriateness, in a manner transparent to users, has been the principal aim of interoperability over the last fifteen years. Despite successful initiatives, however, current solutions have evolved in a non-systematic way, and no solution has been found that addresses the specific semantic problems of interoperability between geospatial datacubes. In this thesis, we hypothesize that it is possible to define an approach that addresses these semantic problems to support interoperability between geospatial datacubes. To that end, we first describe interoperability between geospatial datacubes. We then define and categorize the semantic heterogeneity problems that may occur during the interoperability process of different geospatial datacubes.
To resolve semantic heterogeneity between geospatial datacubes, we propose a conceptual framework that is essentially based on human communication. In this framework, software agents representing the geospatial datacubes involved in the interoperability process communicate with each other to exchange information about the content of those datacubes. Then, to help the agents make appropriate decisions during the interoperability process, we evaluate a set of indicators of the external quality (fitness-for-use) of geospatial datacube schemas and of their production context (e.g., metadata). Finally, we implement the proposed approach to show its feasibility.
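The abstract does not state how the fitness-for-use indicators are combined, so the following is only a sketch under that gap: it swaps in simple additive weighting (a weighted mean of indicator scores in [0, 1]) to show how an agent might rank candidate datacubes. The indicator names, weights, and the `minimum` cut-off (standing in for an acceptable-risk level) are all hypothetical.

```python
def fitness_for_use(indicators, weights):
    """Weighted mean of indicator scores in [0, 1]; missing indicators count as 0.

    Simple additive weighting is an assumption here, not the thesis's method.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * indicators.get(name, 0.0) for name, w in weights.items()) / total


def rank_datacubes(candidates, weights, minimum=0.0):
    """Return candidate names ordered by fitness, dropping those below `minimum`."""
    scored = {name: fitness_for_use(ind, weights) for name, ind in candidates.items()}
    return sorted((n for n, s in scored.items() if s >= minimum), key=lambda n: -scored[n])


# Hypothetical indicators for three candidate datacubes.
weights = {"temporal_coverage": 2.0, "spatial_granularity": 1.0, "metadata_completeness": 1.0}
candidates = {
    "cube_a": {"temporal_coverage": 0.9, "spatial_granularity": 0.5, "metadata_completeness": 0.4},
    "cube_b": {"temporal_coverage": 0.5, "spatial_granularity": 1.0, "metadata_completeness": 0.9},
    "cube_c": {"temporal_coverage": 0.2},
}
print(rank_datacubes(candidates, weights, minimum=0.5))  # ['cube_b', 'cube_a']
```

Here `cube_b` (0.725) outranks `cube_a` (0.675) despite weaker temporal coverage, and `cube_c` (0.1) is rejected as too risky to use; a real framework would also need to justify the weights for the specific context of use.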