Search CORE

3,427 research outputs found

The potential of semantic paradigm in warehousing of big data

Author: Boris Vrdoljak
Marina Ptiček
Marko Gulić
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2019
Field of study

Big data have analytical potential that was hard to realize with available technologies. After new storage paradigms intended for big data such as NoSQL databases emerged, traditional systems got pushed out of the focus. The current research is focused on their reconciliation on different levels or paradigm replacement. Similarly, the emergence of NoSQL databases has started to push traditional (relational) data warehouses out of the research and even practical focus. Data warehousing is known for the strict modelling process, capturing the essence of the business processes. For that reason, a mere integration to bridge the NoSQL gap is not enough. It is necessary to deal with this issue on a higher abstraction level during the modelling phase. NoSQL databases generally lack clear, unambiguous schema, making the comprehension of their contents difficult and their integration and analysis harder. This motivated involving semantic web technologies to enrich NoSQL database contents by additional meaning and context. This paper reviews the application of semantics in data integration and data warehousing and analyses its potential in integrating NoSQL data and traditional data warehouses with some focus on document stores. Also, it gives a proposal of the future pursuit directions for the big data warehouse modelling phases

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

An overview of decision table literature 1982-1995.

Author: Vanthienen Jan
Verhelle M
Publication venue
Publication date
Field of study

This report gives an overview of the literature on decision tables over the past 15 years. As much as possible, for each reference, an author supplied abstract, a number of keywords and a classification are provided. In some cases own comments are added. The purpose of these comments is to show where, how and why decision tables are used. The literature is classified according to application area, theoretical versus practical character, year of publication, country or origin (not necessarily country of publication) and the language of the document. After a description of the scope of the interview, classification results and the classification by topic are presented. The main body of the paper is the ordered list of publications with abstract, classification and comments.

Research Papers in Economics

Enabling Ubiquitous OLAP Analyses

Author: Graziani Simone <1988>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 20/04/2018
Field of study

An OLAP analysis session is carried out as a sequence of OLAP operations applied to multidimensional cubes. At each step of a session, an operation is applied to the result of the previous step in an incremental fashion. Due to its simplicity and flexibility, OLAP is the most adopted paradigm used to explore the data stored in data warehouses. With the goal of expanding the fruition of OLAP analyses, in this thesis we touch several critical topics. We first present our contributions to deal with data extractions from service-oriented sources, which are nowadays used to provide access to many databases and analytic platforms. By addressing data extraction from these sources we make a step towards the integration of external databases into the data warehouse, thus providing richer data that can be analyzed through OLAP sessions. The second topic that we study is that of visualization of multidimensional data, which we exploit to enable OLAP on devices with limited screen and bandwidth capabilities (i.e., mobile devices). Finally, we propose solutions to obtain multidimensional schemata from unconventional sources (e.g., sensor networks), which are crucial to perform multidimensional analyses

AMS Tesi di Dottorato

05441 Abstracts Collection -- Managing and Mining Genome Information: Frontiers in Bioinformatics

Author: Blazewicz Jacek
Freytag Johann Christoph
Vingron Martin
Publication venue: Dagstuhl Seminar Proceedings. 05441 - Managing and Mining Genome Information: Frontiers in Bioinformatics
Publication date: 01/01/2006
Field of study

From 30.10.05 to 04.11.05, the Dagstuhl Seminar 05441 ``Managing and Mining Genome Information: Frontiers in Bioinformatics\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available

Dagstuhl Research Online Publication Server

Une nouvelle approche mixte d'enrichissement de dimensions dans un schéma multidimensionnel en constellation Application à la biodiversité des oiseaux

Author: Bimonte Sandro
Faivre Bruno
Journaux Ludovic
Sautot Lucile
Publication venue: HAL CCSD
Publication date: 02/04/2015
Field of study

International audienceLes entrepôts de données (DW) et les systèmes OLAP sont des technologies d'analyse en ligne pour de grands volumes de données, basés sur les be-soins des utilisateurs. Leur succès dépend essentiellement de la phase de conception où les exigences fonctionnelles sont confrontées aux sources de données (méthodologie de conception mixte). Cependant, les méthodes de conception existantes semblent parfois inefficaces, lorsque les décideurs définissent des exi-gences fonctionnelles qui ne peuvent être déduites à partir des sources de don-nées (approche centrée sur les données), ou lorsque le décideur n'a pas intégré tous ces besoins durant la phase de conception (approche centrée sur l'utilisa-teur). Cet article propose une nouvelle méthodologie mixte d'enrichissement de schémas en constellation, où l'approche classique de conception est améliorée grâce à la fouille de données dans le but de créer de nouvelles hiérarchies au sein d'une dimension. Un prototype associé est également présenté

HAL-uB

Hal-Diderot

Automating the multidimensional design of data warehouses

Author: Romero Moral Óscar
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2010
Field of study

Les experiències prèvies en l'àmbit dels magatzems de dades (o data warehouse), mostren que l'esquema multidimensional del data warehouse ha de ser fruit d'un enfocament híbrid; això és, una proposta que consideri tant els requeriments d'usuari com les fonts de dades durant el procés de disseny.Com a qualsevol altre sistema, els requeriments són necessaris per garantir que el sistema desenvolupat satisfà les necessitats de l'usuari. A més, essent aquest un procés de reenginyeria, les fonts de dades s'han de tenir en compte per: (i) garantir que el magatzem de dades resultant pot ésser poblat amb dades de l'organització, i, a més, (ii) descobrir capacitats d'anàlisis no evidents o no conegudes per l'usuari.Actualment, a la literatura s'han presentat diversos mètodes per donar suport al procés de modelatge del magatzem de dades. No obstant això, les propostes basades en un anàlisi dels requeriments assumeixen que aquestos són exhaustius, i no consideren que pot haver-hi informació rellevant amagada a les fonts de dades. Contràriament, les propostes basades en un anàlisi exhaustiu de les fonts de dades maximitzen aquest enfocament, i proposen tot el coneixement multidimensional que es pot derivar des de les fonts de dades i, conseqüentment, generen massa resultats. En aquest escenari, l'automatització del disseny del magatzem de dades és essencial per evitar que tot el pes de la tasca recaigui en el dissenyador (d'aquesta forma, no hem de confiar únicament en la seva habilitat i coneixement per aplicar el mètode de disseny elegit). A més, l'automatització de la tasca allibera al dissenyador del sempre complex i costós anàlisi de les fonts de dades (que pot arribar a ser inviable per grans fonts de dades).Avui dia, els mètodes automatitzables analitzen en detall les fonts de dades i passen per alt els requeriments. En canvi, els mètodes basats en l'anàlisi dels requeriments no consideren l'automatització del procés, ja que treballen amb requeriments expressats en llenguatges d'alt nivell que un ordenador no pot manegar. Aquesta mateixa situació es dona en els mètodes híbrids actual, que proposen un enfocament seqüencial, on l'anàlisi de les dades es complementa amb l'anàlisi dels requeriments, ja que totes dues tasques pateixen els mateixos problemes que els enfocament purs.En aquesta tesi proposem dos mètodes per donar suport a la tasca de modelatge del magatzem de dades: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Totes dues consideren els requeriments i les fonts de dades per portar a terme la tasca de modelatge i a més, van ser pensades per superar les limitacions dels enfocaments actuals.1. MDBE segueix un enfocament clàssic, en el que els requeriments d'usuari són coneguts d'avantmà. Aquest mètode es beneficia del coneixement capturat a les fonts de dades, però guia el procés des dels requeriments i, conseqüentment, és capaç de treballar sobre fonts de dades semànticament pobres. És a dir, explotant el fet que amb uns requeriments de qualitat, podem superar els inconvenients de disposar de fonts de dades que no capturen apropiadament el nostre domini de treball.2. A diferència d'MDBE, AMDO assumeix un escenari on es disposa de fonts de dades semànticament riques. Per aquest motiu, dirigeix el procés de modelatge des de les fonts de dades, i empra els requeriments per donar forma i adaptar els resultats generats a les necessitats de l'usuari. En aquest context, a diferència de l'anterior, unes fonts de dades semànticament riques esmorteeixen el fet de no tenir clars els requeriments d'usuari d'avantmà.Cal notar que els nostres mètodes estableixen un marc de treball combinat que es pot emprar per decidir, donat un escenari concret, quin enfocament és més adient. Per exemple, no es pot seguir el mateix enfocament en un escenari on els requeriments són ben coneguts d'avantmà i en un escenari on aquestos encara no estan clars (un cas recorrent d'aquesta situació és quan l'usuari no té clares les capacitats d'anàlisi del seu propi sistema). De fet, disposar d'uns bons requeriments d'avantmà esmorteeix la necessitat de disposar de fonts de dades semànticament riques, mentre que a l'inversa, si disposem de fonts de dades que capturen adequadament el nostre domini de treball, els requeriments no són necessaris d'avantmà. Per aquests motius, en aquesta tesi aportem un marc de treball combinat que cobreix tots els possibles escenaris que podem trobar durant la tasca de modelatge del magatzem de dades.Previous experiences in the data warehouse field have shown that the data warehouse multidimensional conceptual schema must be derived from a hybrid approach: i.e., by considering both the end-user requirements and the data sources, as first-class citizens. Like in any other system, requirements guarantee that the system devised meets the end-user necessities. In addition, since the data warehouse design task is a reengineering process, it must consider the underlying data sources of the organization: (i) to guarantee that the data warehouse must be populated from data available within the organization, and (ii) to allow the end-user discover unknown additional analysis capabilities.Currently, several methods for supporting the data warehouse modeling task have been provided. However, they suffer from some significant drawbacks. In short, requirement-driven approaches assume that requirements are exhaustive (and therefore, do not consider the data sources to contain alternative interesting evidences of analysis), whereas data-driven approaches (i.e., those leading the design task from a thorough analysis of the data sources) rely on discovering as much multidimensional knowledge as possible from the data sources. As a consequence, data-driven approaches generate too many results, which mislead the user. Furthermore, the design task automation is essential in this scenario, as it removes the dependency on an expert's ability to properly apply the method chosen, and the need to analyze the data sources, which is a tedious and timeconsuming task (which can be unfeasible when working with large databases). In this sense, current automatable methods follow a data-driven approach, whereas current requirement-driven approaches overlook the process automation, since they tend to work with requirements at a high level of abstraction. Indeed, this scenario is repeated regarding data-driven and requirement-driven stages within current hybrid approaches, which suffer from the same drawbacks than pure data-driven or requirement-driven approaches.In this thesis we introduce two different approaches for automating the multidimensional design of the data warehouse: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both approaches were devised to overcome the limitations from which current approaches suffer. Importantly, our approaches consider opposite initial assumptions, but both consider the end-user requirements and the data sources as first-class citizens.1. MDBE follows a classical approach, in which the end-user requirements are well-known beforehand. This approach benefits from the knowledge captured in the data sources, but guides the design task according to requirements and consequently, it is able to work and handle semantically poorer data sources. In other words, providing high-quality end-user requirements, we can guide the process from the knowledge they contain, and overcome the fact of disposing of bad quality (from a semantical point of view) data sources.2. AMDO, as counterpart, assumes a scenario in which the data sources available are semantically richer. Thus, the approach proposed is guided by a thorough analysis of the data sources, which is properly adapted to shape the output result according to the end-user requirements. In this context, disposing of high-quality data sources, we can overcome the fact of lacking of expressive end-user requirements.Importantly, our methods establish a combined and comprehensive framework that can be used to decide, according to the inputs provided in each scenario, which is the best approach to follow. For example, we cannot follow the same approach in a scenario where the end-user requirements are clear and well-known, and in a scenario in which the end-user requirements are not evident or cannot be easily elicited (e.g., this may happen when the users are not aware of the analysis capabilities of their own sources). Interestingly, the need to dispose of requirements beforehand is smoothed by the fact of having semantically rich data sources. In lack of that, requirements gain relevance to extract the multidimensional knowledge from the sources.So that, we claim to provide two approaches whose combination turns up to be exhaustive with regard to the scenarios discussed in the literaturePostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

SDK development for bridging heterogeneous data sources through connect bridge platform

Author: Nóbrega Filipa José Faria
Publication venue
Publication date: 14/12/2018
Field of study

Nesta dissertação apresentou-se um SDK para a criação de conectores a integrar com o CB Server, que pretende: acelerar o desenvolvimento, garantir melhores práticas e simplificar as diversas atividades e tarefas no processo de desenvolvimento. O SDK fornece uma API pública e simples, suportada por um conjunto de ferramentas, que facilitam o processo de desenvolvimento, explorando as facilidades disponibilizadas através da API. Para analisar a exatidão, viabilidade, integridade e acessibilidade da solução apresentam-se dois exemplos e casos de estudo. Através dos casos de estudo foi possível identificar uma lista de problemas, de pontos sensíveis e melhorias na solução proposta. Para avaliar a usabilidade da API, uma metodologia baseada em vários métodos de avaliação de usabilidade foi estabelecida. O múltiplo caso de estudo funciona como o principal método de avaliação, combinando vários métodos de pesquisa. O caso de estudo consiste em três fases de avaliação: um workshop, uma avaliação heurística e uma análise subjetiva. O caso de estudo envolveu três engenheiros de software (incluindo programadores e avaliadores). A metodologia aplicada gerou resultados com base num método de inspeção, testes de utilizador e entrevistas. Identificou-se não só pontos sensíveis e falhas no código-fonte, mas também problemas estruturais, de documentação e em tempo de execução, bem como problemas relacionados com a experiência do utilizador. O contexto do estudo é apresentado de modo a tirar conclusões acerca dos resultados obtidos. O trabalho futuro incluirá o desenvolvimento de novas funcionalidades. Adicionalmente, pretende-se resolver problemas encontrados na metodologia aplicada para avaliar a usabilidade da API, nomeadamente problemas e falhas no código fonte (por exemplo, validações) e problemas estruturais.In this dissertation, we present an SDK for the creation of connectors to integrate with CB Server which accelerates deployment, ensures best practices and simplifies the various activities and tasks in the development process. The SDK provides a public and simple API leveraged by a set of tools around the API developed which facilitate the development process by exploiting the API facilities. To analyse the correctness, feasibility, completeness, and accessibility of our solution, we presented two examples and case studies. From the case studies, we derived a list of issues found in our solution and a set of proposals for improvement. To evaluate the usability of the API, a methodology based on several usability evaluation methods has been established. Multiple case study works as the main evaluation method, combining several research methods. The case study consists of three evaluation phases – a hands-on workshop, a heuristic evaluation and subjective analysis. The case study involved three computer science engineers (including novice and expert developers and evaluators). The applied methodology generated insights based on an inspection method, a user test, and interviews. We identify not only problems and flaws in the source code, but also runtime, structural and documentation problems, as well as problems related to user experience. To help us draw conclusion from the results, we point out the context of the study. Future work will include the development of new functionalities. Additionally, we aim to solve problems found in the applied methodology to evaluate the usability of the API, namely problems and flaws in the source code (e.g. validations) and structural problems

Repositório Digital da Universidade da Madeira

Recommended from our members

Controversy Analysis and Detection

Author: Dori-Hacohen Shiri
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/11/2017
Field of study

Seeking information on a controversial topic is often a complex task. Alerting users about controversial search results can encourage critical literacy, promote healthy civic discourse and counteract the filter bubble effect, and therefore would be a useful feature in a search engine or browser extension. Additionally, presenting information to the user about the different stances or sides of the debate can help her navigate the landscape of search results beyond a simple list of 10 links . This thesis has made strides in the emerging niche of controversy detection and analysis. The body of work in this thesis revolves around two themes: computational models of controversy, and controversies occurring in neighborhoods of topics. Our broad contributions are: (1) Presenting a theoretical framework for modeling controversy as contention among populations; (2) Constructing the first automated approach to detecting controversy on the web, using a KNN classifier that maps from the web to similar Wikipedia articles; and (3) Proposing a novel controversy detection in Wikipedia by employing a stacked model using a combination of link structure and similarity. We conclude this work by discussing the challenging technical, societal and ethical implications of this emerging research area and proposing avenues for future work

ScholarWorks@UMass Amherst