Adding semantic modules to improve goal-oriented analysis of data warehouses using I-star
The success rate of data warehouse (DW) development is improved by performing a requirements elicitation stage in which the users’ needs are modeled. Currently, among the different proposals for modeling requirements, there is a special focus on goal-oriented models, and in particular on the i* framework. In order to adapt this framework for DW development, we previously developed a UML profile for DWs. However, like the general i* framework, the proposal lacks modularity. This has an especially negative impact on DW development, since DW requirement models tend to include a huge number of elements with crossed relationships between them. In turn, the readability of the models decreases, harming their utility and increasing the error rate and development time. In this paper, we propose an extension of our i* profile for DWs that considers the modularization of goals, and we provide a set of guidelines for correctly applying our proposal. Furthermore, we have performed an experiment to assess the validity of our proposal. The benefits of our proposal are an increase in the modularity and scalability of the models which, in turn, increases the error correction capability and makes complex models easier for DW developers and non-expert users to understand.
This work has been partially supported by the ProS-Req (TIN2010-19130-C02-01), MESOLAP (TIN2010-14860) and SERENIDAD (PEII-11-0327-7035) projects from the Spanish Ministry of Education and the Junta de Comunidades de Castilla-La Mancha, respectively. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298).
Public university employees’ perception of electronic information sharing between universities and the Ministry of Higher Education and Scientific Research
Electronic information sharing benefits organizations and institutions in various ways, including increasing the accuracy and timeliness of information, improving accountability and decision making, and minimizing the cost of information management. There is a high degree of information sharing between Iraqi public universities and the Ministry of Higher Education and Scientific Research (MOHESR); however, electronic information sharing between them remains limited, which causes difficulties and delays in decision making. This limitation also creates challenges and barriers to supporting the decentralization principle adopted by the public universities in university governance. Thus, there is a need for a study that identifies possible steps and strategies to increase electronic information sharing between the ministry and the universities. The main objective of this study is to propose a model of electronic information sharing between Iraqi public universities and MOHESR. Social Exchange Theory, Critical Mass Theory and Transactive Memory System Theory have been used to frame the problem and achieve the objectives. Purposive sampling was used, and multiple linear regression analyses were applied for data analysis. A total of 660 questionnaires were distributed in five universities in Iraq, of which 274 (42%) were returned. Of the 16 factors proposed, ten were found to be significant: IT capability, information quality, compatibility, complexity, data warehouse, top management, policy/legal framework, interagency trust, upper-level leadership and social network. Based on the results obtained, the study presents a model of electronic information sharing between public universities in Iraq and MOHESR. A comprehensive understanding of this model will contribute to improving the planning and implementation of three dimensions of the public universities (technological, organizational and environmental) as they work to improve electronic information sharing in the future. According to the findings, it can be concluded that these three dimensions and ten factors can essentially increase electronic information sharing between public universities and MOHESR.
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and in large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as issues of data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled in many follow-up research efforts since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both the research and industrial communities. We also cover a set of systems that have been introduced to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.
Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
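The programming model summarized above can be illustrated with a minimal, in-memory word-count sketch. The function names are illustrative only; a real MapReduce runtime distributes each phase across a cluster and handles scheduling and fault tolerance.

```python
from collections import defaultdict
from itertools import chain

# Map phase: emit (word, 1) pairs for each input record.
def map_fn(record):
    for word in record.split():
        yield (word, 1)

# Shuffle phase: group intermediate pairs by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: aggregate all values emitted for a key.
def reduce_fn(key, values):
    return (key, sum(values))

def mapreduce(records):
    pairs = chain.from_iterable(map_fn(r) for r in records)
    return dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())

print(mapreduce(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

The appeal of the model is visible even in this toy version: the application author writes only `map_fn` and `reduce_fn`, while grouping and aggregation are handled by the framework.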
Automatic physical database design: recommending materialized views
This work discusses physical database design, focusing on the problem of selecting materialized views to improve the performance of a database system. We first address the satisfiability and implication problems for mixed arithmetic constraints; the results are used to support the construction of a search space for view selection problems. We propose an approach for constructing a search space based on identifying maximum commonalities among queries and on rewriting queries using views. These commonalities are used to define candidate views for materialization, from which an optimal or near-optimal set can be chosen as a solution to the view selection problem. Using a search space constructed this way, we address a specific instance of the view selection problem that aims at minimizing the view maintenance cost of multiple materialized views using multi-query optimization techniques. Further, we study the same problem in the context of a commercial database management system in the presence of memory and time restrictions, and we suggest a heuristic approach for maintaining the views while guaranteeing that the restrictions are satisfied. Finally, we consider a dynamic version of the view selection problem where the workload is a sequence of query and update statements; in this case, views can be created (materialized) and dropped during the execution of the workload. We have implemented our approaches to the dynamic view selection problem and performed extensive experimental testing. Our experiments show that our approaches in most cases perform better than previous ones in terms of effectiveness and efficiency.
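The idea of deriving candidate views from commonalities among queries can be sketched as follows. This is a deliberate simplification of the approach described above: each query is modeled as a flat set of relational atoms (table names and predicate strings are made up for illustration), and a candidate view is a maximal pairwise intersection.

```python
from itertools import combinations

# Each query is modeled as a frozenset of relational atoms (table scans
# and selection predicates); the atom names here are illustrative.
queries = [
    frozenset({"orders", "customers", "year=2024"}),
    frozenset({"orders", "customers", "region=EU"}),
    frozenset({"orders", "lineitem"}),
]

def candidate_views(queries):
    """Candidate views = maximal pairwise commonalities among queries."""
    candidates = set()
    for q1, q2 in combinations(queries, 2):
        common = q1 & q2
        if common:
            candidates.add(common)
    # Keep only maximal candidates, i.e. those not strictly contained
    # in a larger commonality.
    return {c for c in candidates
            if not any(c < other for other in candidates)}

for view in sorted(candidate_views(queries), key=sorted):
    print(sorted(view))  # ['customers', 'orders']
```

A cost model would then choose which of these candidates to actually materialize; that selection step, and the rewriting of queries to use the chosen views, is where the work above goes well beyond this sketch.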
Strategy and methodology for enterprise data warehouse development. Integrating data mining and social networking techniques for identifying different communities within the data warehouse.
Data warehouse technology has been successfully integrated into the information infrastructure of major organizations as a potential solution for eliminating redundancy and providing comprehensive data integration. Recognizing the importance of a data warehouse as the main data repository within an organization, this dissertation addresses different aspects of data warehouse architecture and performance.
Many data warehouse architectures have been presented by industry analysts and research organizations. These architectures vary from independent, physical business-unit-centric data marts to the centralised two-tier hub-and-spoke data warehouse. The operational data store is a third tier which was offered later to address business requirements for intra-day data loading. While the industry-available architectures are all valid, I found them to be suboptimal in efficiency (cost) and effectiveness (productivity).
In this dissertation, I advocate a new architecture (the Hybrid Architecture) which encompasses the industry-advocated architectures. The Hybrid Architecture demands the acquisition, loading and consolidation of enterprise atomic and detailed data into a single integrated enterprise data store (the Enterprise Data Warehouse), in which business-unit-centric Data Marts and Operational Data Stores (ODS) are built in the same instance of the Enterprise Data Warehouse.
For the purpose of highlighting the role of data warehouses for different
applications, we describe an effort to develop a data warehouse for a geographical
information system (GIS). We further study the importance of data practices, quality and
governance for financial institutions by commenting on the RBC Financial Group case.
The development and deployment of the Enterprise Data Warehouse based on the Hybrid Architecture spawned its own issues and challenges. Organic data growth and business requirements to load additional new data will significantly increase the amount of stored data, and consequently the number of users will also increase significantly. Enterprise data warehouse obesity, performance degradation and navigation difficulties are chief amongst these issues and challenges.
Association rules mining and social networks have been adopted in this thesis to
address the above mentioned issues and challenges. We describe an approach that uses
frequent pattern mining and social network techniques to discover different communities
within the data warehouse. These communities include sets of tables frequently accessed
together, sets of tables retrieved together most of the time and sets of attributes that
mostly appear together in the queries. We concentrate on tables in the discussion;
however, the model is general enough to discover other communities. We first build a
frequent pattern mining model by considering each query as a transaction and the tables
as items. Then, we mine closed frequent itemsets of tables; these itemsets include tables
that are mostly accessed together and hence should be treated as one unit in storage and
retrieval for better overall performance. We utilize social network construction and
analysis to find maximum-sized sets of related tables; this is a more robust approach as
opposed to a union of overlapping itemsets. We derive the Jaccard distance between the
closed itemsets and construct the social network of tables by adding links that represent
distance above a given threshold. The constructed network is analyzed to discover
communities of tables that are mostly accessed together. The reported test results are promising and demonstrate the applicability and effectiveness of the developed approach.
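The community-discovery step described above can be sketched in a few lines of Python. The table names, the pre-mined itemsets and the threshold are all illustrative, the closed-itemset mining itself is elided, and links are added here when the Jaccard distance falls below a threshold (a simplification of the network construction in the thesis):

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Jaccard distance between two sets of tables."""
    return 1 - len(a & b) / len(a | b)

# Closed frequent itemsets of tables, as mined from a query log
# (illustrative values; the mining step is elided here).
itemsets = [frozenset({"sales", "customer"}),
            frozenset({"sales", "product"}),
            frozenset({"store", "region"})]

def communities(itemsets, threshold=0.7):
    """Link itemsets whose Jaccard distance is under the threshold,
    then read off connected components as table communities."""
    parent = list(range(len(itemsets)))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    for i, j in combinations(range(len(itemsets)), 2):
        if jaccard_distance(itemsets[i], itemsets[j]) < threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i, s in enumerate(itemsets):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())

# The two sales-related itemsets overlap enough to merge into one
# community; {store, region} remains separate.
print(communities(itemsets))
```

Merging linked itemsets into components, rather than taking unions of overlapping itemsets directly, is what gives the maximum-sized sets of related tables mentioned above.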
Cloud BI: A Multi-party Authentication Framework for Securing Business Intelligence on the Cloud
Business intelligence (BI) has emerged as a key technology to be hosted on Cloud computing. BI offers a method to analyse data, thereby enabling informed decision making to improve business performance and profitability. However, within the shared domains of Cloud computing, BI is exposed to increased security and privacy threats, because an unauthorised user may be able to gain access to highly sensitive, consolidated business information. The business process contains collaborating services and users from multiple Cloud systems in different security realms which need to be engaged dynamically at runtime. If the heterogeneous Cloud systems located in different security realms do not have direct authentication relationships, then it is technically difficult to enable secure collaboration. In order to address these security challenges, a new authentication framework is required to establish trust relationships among these BI service instances and users by distributing a common session secret to all participants of a session. The author addresses this challenge by designing and implementing a multi-party authentication framework for dynamic secure interactions when members of different security realms want to access services. The framework takes advantage of the trust relationship between session members in different security realms to enable a user to obtain security credentials to access Cloud resources in a remote realm. This mechanism helps Cloud session users authenticate their session membership, improving the authentication processes within multi-party sessions. The correctness of the proposed framework has been verified using BAN logic, and the performance and overhead have been evaluated via simulation in a dynamic environment. A prototype authentication system has been designed, implemented and tested based on the proposed framework.
The research concludes that the proposed framework and its supporting protocols are an effective functional basis for practical implementation testing, as the framework achieves good scalability and imposes only a minimal performance overhead, comparable with other state-of-the-art methods.
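A hypothetical sketch of the common-session-secret idea follows. The class names are made up, and the direct assignment in `enroll` stands in for a secure distribution channel; the actual framework spans multiple security realms and is considerably richer than this single-process illustration.

```python
import hashlib
import hmac
import os

class SessionCoordinator:
    """Holds and distributes the common session secret."""
    def __init__(self):
        self.secret = os.urandom(32)   # common session secret

    def enroll(self, member):
        # Stand-in for secure credential distribution to a session member.
        member.secret = self.secret

class Member:
    def respond(self, challenge):
        # Prove session membership by keying an HMAC with the shared secret.
        return hmac.new(self.secret, challenge, hashlib.sha256).digest()

def verify(coordinator, challenge, response):
    expected = hmac.new(coordinator.secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

coord = SessionCoordinator()
alice = Member()
coord.enroll(alice)
challenge = os.urandom(16)
print(verify(coord, challenge, alice.respond(challenge)))  # True
```

The point of the shared secret is that any two enrolled participants can authenticate each other's session membership without a pre-existing direct authentication relationship, which is exactly the gap between security realms that the framework targets.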
Research and Development of a General Purpose Instrument DAQ-Monitoring Platform applied to the CLOUD/CERN experiment
The current scientific environment has experimentalists and system administrators allocating large amounts of time to data access, parsing and gathering, as well as to instrument management. This is a growing challenge, since there is an increasing number of large collaborations with significant instrument resources, remote instrumentation sites and continuously improved and upgraded scientific instruments. DAQBroker is a new software platform designed to monitor networks of scientific instruments while also providing simple data access methods for any user. Data can be stored in one or several local or remote databases running on any of the most popular relational database systems (MySQL, PostgreSQL, Oracle). The platform also provides the necessary tools for creating and editing the metadata associated with different instruments, performing data manipulation and generating events based on instrument measurements, regardless of the user’s familiarity with individual instruments. Time series stored in a DAQBroker database also benefit from several statistical methods for time series classification, comparison and event detection, as well as from multivariate time series analysis methods to determine the most statistically relevant time series, rank the most influential time series and determine the periods of most activity during specific experimental periods. This thesis presents the architecture behind the framework, assesses its performance under controlled conditions and presents a use case from the CLOUD experiment at CERN, Switzerland. The univariate and multivariate time series statistical methods applied in this framework are also studied.