    Cloud BI: Future of business intelligence in the Cloud

    In self-hosted environments, it was feared that business intelligence (BI) would eventually face a resource crunch due to the never-ending expansion of data warehouses and the online analytical processing (OLAP) demands placed on the underlying network. Cloud computing has raised new hope for the future prospects of BI. However, how will BI be implemented on the Cloud, and what will the traffic and demand profile look like? This research attempts to answer these key questions with regard to taking BI to the Cloud. Cloud hosting of BI is demonstrated with the help of a simulation in OPNET comprising a Cloud model in which multiple OLAP application servers apply parallel query loads to an array of servers hosting relational databases. The simulation results show that extensible parallel processing by database servers on the Cloud can efficiently satisfy OLAP application demands.
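    The parallel fan-out of OLAP queries over an array of database servers can be sketched as follows. This is an illustrative stand-in, not the OPNET simulation itself: in-memory SQLite databases play the role of the relational servers, and the table, data, and query are assumptions.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Each in-memory SQLite database stands in for one relational database
# server in the simulated Cloud array.
def make_server(rows):
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

servers = [
    make_server([("north", 100.0), ("south", 250.0)]),
    make_server([("north", 50.0), ("east", 75.0)]),
]

def olap_query(conn):
    # A partial aggregate, as an OLAP application server would issue in parallel.
    return dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))

# Fan the same query out across the server array, then merge the partial results.
with ThreadPoolExecutor(max_workers=len(servers)) as pool:
    partials = list(pool.map(olap_query, servers))

totals = {}
for part in partials:
    for region, amount in part.items():
        totals[region] = totals.get(region, 0.0) + amount

print(totals)
```

    The merge step mirrors how partial aggregates from parallel database servers are combined into one OLAP answer.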

    A unified view of data-intensive flows in business intelligence systems: a survey

    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet the complex requirements of next-generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus need a clear understanding of the foundations of data-intensive flows and of the challenges of moving towards next-generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next-generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that are still to be addressed, and show how current solutions can be applied to address them.
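    The batched ETL pattern the survey contrasts with runtime integration can be sketched minimally as follows; the source records, field names, and warehouse schema are illustrative assumptions.

```python
import sqlite3

# Minimal batched ETL sketch: extract from heterogeneous sources, transform
# into a common analysis-ready schema, load into a warehouse table.
crm_rows = [{"customer": "acme", "revenue": "1200.50"}]   # e.g. a CRM export
erp_rows = [{"cust_name": "ACME", "sales_eur": 800.0}]    # e.g. an ERP export

def transform(rows, name_key, value_key):
    # Unify naming and types from each source into one (customer, amount) shape.
    return [(r[name_key].lower(), float(r[value_key])) for r in rows]

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE revenue (customer TEXT, amount REAL)")
for rows, nk, vk in [(crm_rows, "customer", "revenue"),
                     (erp_rows, "cust_name", "sales_eur")]:
    dw.executemany("INSERT INTO revenue VALUES (?, ?)", transform(rows, nk, vk))

total = dw.execute(
    "SELECT SUM(amount) FROM revenue WHERE customer = 'acme'"
).fetchone()[0]
print(total)  # 2000.5
```

    A more operational flow would run the same transform per incoming record at query time instead of in a nightly batch.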

    Towards hypermedia support in database systems

    The general goal of our research is to automatically generate links and other hypermedia-related services for analytical applications. Using a dynamic hypermedia engine (DHE), the following features have been automated for database systems. Links are generated based on the database's relational (physical) schema and its original (non-normalized) entity-relationship specification. Database application developers may also specify the relationships between different classes of database elements. These elements can be controlled by the same or a different database application, or even by another software system. A DHE prototype has been developed and illustrates the above for a relational database management system. The DHE is the only approach to automated linking that specializes in adding hyperlinks automatically to analytical applications that generate their displays dynamically (e.g., as the result of a user query). The DHE's linking is based on the structure of the application, not on keyword search or lexical analysis of the display values within its screens and documents. The DHE aims to provide hypermedia functionality without altering applications, by building application wrappers as an intermediary between the applications and the engine.
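    Structure-based link generation of the kind the DHE performs can be sketched as follows; the schema, tables, and URL pattern are assumptions for illustration, not the DHE's actual wrapper interface.

```python
# A toy schema description: each table lists its primary key and its
# foreign keys (column -> referenced table). These names are invented.
schema = {
    "order": {"pk": "order_id", "fks": {"customer_id": "customer"}},
    "customer": {"pk": "customer_id", "fks": {}},
}

def links_for_row(table, row):
    """Generate a hyperlink for every foreign-key value in a displayed row,
    based purely on the schema structure rather than the display text."""
    out = []
    for fk_col, target in schema[table]["fks"].items():
        if fk_col in row:
            out.append((fk_col, f"/{target}/{row[fk_col]}"))
    return out

print(links_for_row("order", {"order_id": 7, "customer_id": 42}))
# [('customer_id', '/customer/42')]
```

    Because the links derive from the schema, they work even when the application renders its displays dynamically from query results.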

    A framework to evaluate big data fabric tools

    A huge growth in data and information needs has led organizations to search for the data integration tools most appropriate to different types of business. The management of a large dataset requires the exploitation of appropriate resources and new methods, as well as the possession of powerful technologies. This has led to a surge of ideas, technologies, and tools offered by different suppliers. For this reason, it is important to understand the key factors that determine the need to invest in a big data project, and then to categorize these technologies so as to simplify choosing the one that best fits the problem at hand. The objective of this study is to create a model that will serve as a basis for evaluating the different alternatives and solutions capable of overcoming the major challenges of data integration. Finally, a brief analysis of three major data fabric solutions available on the market is also carried out, covering Talend Data Fabric, IBM InfoSphere, and the Informatica Platform.
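    An evaluation model of the kind described typically reduces to weighted-criteria scoring. The sketch below is a hedged illustration: the criteria, weights, and ratings are invented, and are not measurements of the named products.

```python
# Illustrative evaluation criteria and their relative importance (must sum to 1).
weights = {"integration": 0.40, "scalability": 0.35, "cost": 0.25}

# Hypothetical 1-5 ratings for two anonymous candidate tools.
tools = {
    "Tool A": {"integration": 4, "scalability": 5, "cost": 2},
    "Tool B": {"integration": 3, "scalability": 3, "cost": 5},
}

def score(ratings):
    # Weighted sum of the ratings over all evaluation criteria.
    return sum(weights[c] * ratings[c] for c in weights)

ranked = sorted(tools, key=lambda t: score(tools[t]), reverse=True)
print(ranked)  # ['Tool A', 'Tool B']
```

    Changing the weights to match a given business context changes the ranking, which is exactly the point of making the criteria explicit.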

    From access and integration to mining of secure genomic data sets across the grid

    The UK Department of Trade and Industry (DTI) funded BRIDGES project (Biomedical Research Informatics Delivered by Grid Enabled Services) has developed a Grid infrastructure to support cardiovascular research. This includes the provision of a compute Grid and a data Grid infrastructure with security at its heart. In this paper we focus on the BRIDGES data Grid. A primary aim of the BRIDGES data Grid is to help control the complexity of accessing and integrating a myriad of genomic data sets through simple Grid-based tools. We outline these tools and how they are delivered to end-user scientists. We also describe how these tools are to be extended in the BBSRC-funded Grid Enabled Microarray Expression Profile Search (GEMEPS) project to support a richer vocabulary of search capabilities for mining microarray data sets. As with BRIDGES, fine-grained Grid security underpins GEMEPS.

    Online View Selection for the Web

    View materialization has been shown to ameliorate the scalability problem of data-intensive web servers. However, unlike data warehouses, which are off-line during updates, most web servers maintain their back-end databases online and perform updates concurrently with user accesses. In such environments, the selection of views to materialize must be performed online, and both performance and data freshness must be considered. In this paper, we discuss the Online View Selection problem: select which views to materialize in order to maximize performance while maintaining freshness at acceptable levels. We define Quality of Service and Quality of Data metrics and present OVIS(theta), an adaptive algorithm for the Online View Selection problem. OVIS(theta) evolves its materialization decisions to match the constantly changing access/update patterns on the Web. The algorithm is also able to identify infeasible freshness levels, effectively avoiding saturation at the server. We performed extensive experiments under various workloads, which showed that our online algorithm comes close to the optimal off-line selection algorithm. (Also UMIACS-TR-2002-2.)
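    The core trade-off behind online view selection can be sketched with a toy cost model: materialize a view when the query work it saves outweighs the work of keeping it fresh. This is an illustrative assumption, not the OVIS(theta) algorithm itself.

```python
# Simplified decision rule for one candidate view. All rates and costs
# are in arbitrary consistent units (e.g. work per second); the numbers
# below are invented for illustration.
def should_materialize(access_rate, update_rate, query_cost, refresh_cost):
    """Materialize when saved query work exceeds view-maintenance work."""
    benefit = access_rate * query_cost        # work saved by answering from the view
    maintenance = update_rate * refresh_cost  # work to refresh the view on updates
    return benefit > maintenance

# A hot, rarely updated view is worth materializing...
print(should_materialize(access_rate=100, update_rate=2,
                         query_cost=0.5, refresh_cost=3.0))   # True
# ...while an update-heavy, rarely read view is better left virtual.
print(should_materialize(access_rate=5, update_rate=50,
                         query_cost=0.5, refresh_cost=3.0))   # False
```

    An online algorithm re-evaluates this decision continuously as the measured access and update rates drift, which is what lets it track changing Web workloads.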

    Design and Development of a Software Module for Minimizing Transportation Cost

    The goal of this project is to design and develop a software module to solve a transportation problem: minimizing the cost of transporting finished goods from multiple origins to multiple destinations. The transportation problem will be modeled as a linear programming model using AMPL linear programming (LP) software. A graphical user interface (GUI) will be developed to enable the user to enter the data and parameters for the transportation problem. The GUI will be developed in the C# programming language within the Microsoft .NET framework. The GUI will also enable the user to launch the AMPL module to solve the transportation problem and calculate the optimum transportation cost. A relational database will be designed and developed to store the parameters and data for the AMPL LP module. Both the AMPL LP model and the GUI will be interfaced with the relational database.
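    The underlying optimization can be illustrated on a tiny balanced instance. The module described delegates this to an AMPL LP model; the brute-force enumeration below is only a sketch to show the structure of the problem, and the supplies, demands, and costs are assumptions.

```python
supply = [20, 30]        # units available at origins O1, O2
demand = [25, 25]        # units required at destinations D1, D2
cost = [[4, 6],          # cost[i][j]: per-unit cost, origin i -> destination j
        [5, 3]]

best = None
# In a balanced 2x2 instance, fixing the shipment x11 (O1 -> D1)
# determines the entire plan via the supply and demand constraints.
for x11 in range(min(supply[0], demand[0]) + 1):
    x12 = supply[0] - x11          # remainder of O1's supply goes to D2
    x21 = demand[0] - x11          # D1's remaining demand comes from O2
    x22 = supply[1] - x21          # remainder of O2's supply goes to D2
    if min(x12, x21, x22) < 0 or x22 != demand[1] - x12:
        continue                   # infeasible allocation
    total = (cost[0][0] * x11 + cost[0][1] * x12 +
             cost[1][0] * x21 + cost[1][1] * x22)
    if best is None or total < best[0]:
        best = (total, (x11, x12, x21, x22))

print(best)  # (180, (20, 0, 5, 25))
```

    An LP solver such as AMPL's handles the same constraint structure for arbitrary numbers of origins and destinations, where enumeration is no longer possible.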

    Handling Live Sensor Data on the Semantic Web

    The increasing interconnection of objects in the Internet of Things and the ubiquitous flood of data and information demand new technologies for data processing and data storage, in particular on the Internet and the Semantic Web. Because of human limitations in data collection and analysis, automatic methods are used more and more. Above all, sensors and similar data producers are accurate, fast, and versatile, and can provide continuous monitoring even in places that are hard for people to reach. Traditional information processing, however, has focused on documents and document-related information, which have different requirements from sensor data: the focus there is static information of a certain scope, in contrast to large quantities of live data that are only meaningful when combined with other data and background information. This paper evaluates the current status quo in the processing of sensor and sensor-related data with the help of the promising approaches of the Semantic Web and the Linked Data movement. This includes the use of existing sensor standards such as Sensor Web Enablement (SWE) as well as the utilization of various ontologies. Based on a proposed abstract approach for the development of a semantic application, covering the process from data collection to presentation, important points such as modeling, deploying, and evaluating semantic sensor data are discussed. Besides related work on current and future developments around known difficulties of RDF/OWL, such as the handling of time, space, and physical units, a sample application demonstrates the key points. In addition, techniques for disseminating information, such as polling, notification, and streaming, are covered, with data stream management systems (DSMS) as examples for processing real-time data. Finally, the overview points out remaining weaknesses and thereby enables the improvement of existing solutions, so that semantic sensor applications can be developed easily in the future.
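    The kind of windowed computation a DSMS performs over live sensor readings can be sketched as follows; the readings and window size are illustrative assumptions, not part of the sample application described above.

```python
from collections import deque

def sliding_mean(readings, window=3):
    """Yield the mean of the last `window` readings as each one arrives,
    the way a stream operator produces continuous output over live data."""
    buf = deque(maxlen=window)   # bounded buffer: old readings fall out
    for value in readings:
        buf.append(value)
        yield sum(buf) / len(buf)

temps = [20.0, 21.0, 23.0, 22.0]   # hypothetical live temperature stream
print([round(m, 2) for m in sliding_mean(temps)])  # [20.0, 20.5, 21.33, 22.0]
```

    Unlike a document-centric query, this computation never "finishes": each new reading immediately refines the answer, which is the defining property of stream processing.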

    Support for taxonomic data in systematics

    The Systematics community works to increase our understanding of biological diversity by identifying and classifying organisms and using phylogenies to understand the relationships between those organisms. It has made great progress in the building of phylogenies and in the development of algorithms. However, it has insufficient provision for preserving research outcomes and making them widely accessible and queryable, and this is where database technologies can help. This thesis makes a contribution in the area of database usability by addressing the query needs present in the community, as supported by the analysis of query logs. It formulates clearly the user requirements in the area of phylogeny and classification queries. It then reports on the use of warehousing techniques in the integration of data from many sources to satisfy those requirements. It shows how to perform query expansion with synonyms and vernacular names, and how to implement hierarchical query expansion effectively. A detailed analysis of the improvements offered by those query expansion techniques is presented. This is supported by an exposition of the database techniques underlying this development, and of the user and programming interfaces (web services) that make this novel development available to both end users and programs.
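    The two expansion techniques described, synonym expansion and hierarchical expansion down a classification, can be sketched as follows; the synonym table and the fragment of taxonomy are illustrative assumptions.

```python
# Hypothetical synonym/vernacular table and parent -> children taxonomy.
synonyms = {"puma concolor": ["felis concolor", "cougar", "mountain lion"]}
children = {"felidae": ["panthera", "puma"], "puma": ["puma concolor"]}

def expand_synonyms(term):
    # A query term also matches its scientific synonyms and vernacular names.
    return [term] + synonyms.get(term, [])

def expand_hierarchy(taxon):
    """Depth-first walk: a query on a higher taxon also matches every
    descendant taxon in the classification."""
    out = [taxon]
    for child in children.get(taxon, []):
        out.extend(expand_hierarchy(child))
    return out

print(expand_synonyms("puma concolor"))
print(expand_hierarchy("felidae"))  # ['felidae', 'panthera', 'puma', 'puma concolor']
```

    In a warehouse, the recursive walk is usually replaced by a precomputed transitive-closure or nested-set encoding of the hierarchy, which is what makes hierarchical expansion efficient at query time.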