6,955 research outputs found

    A unified view of data-intensive flows in business intelligence systems : a survey

    Get PDF
    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges.Peer ReviewedPostprint (author's final draft

    Query Optimizationin Teradata Warehouse, Journal of Telecommunications and Information Technology, 2010, nr 3

    Get PDF
    The time necessary for data processing is becoming shorter and shorter nowadays. This thesis presents a definition of the active data warehousing (ADW) paradigm. One of the data warehouses which is consistent with this paradigm is teradata warehouse. Therefore, the basic elements of the teradata architecture are described, such as processors parsing engine (PE) and access module processor (AMP). Emphasis was put on the analysis of query optimization methods. There is presented the impact of a primary index on the time of query execution. Furthermore, this paper shows different methods of optimization of data selection, data joins and data aggregation. All these methods can help to minimize the time for data processing. This paper presents experiments which show the usage of different methods of query optimization. At the end some conclusions about different index usage are included

    Comprehensive characterization of an open source document search engine

    Get PDF
    This work performs a thorough characterization and analysis of the open source Lucene search library. The article describes in detail the architecture, functionality, and micro-architectural behavior of the search engine, and investigates prominent online document search research issues. In particular, we study how intra-server index partitioning affects the response time and throughput, explore the potential use of low power servers for document search, and examine the sources of performance degradation ands the causes of tail latencies. Some of our main conclusions are the following: (a) intra-server index partitioning can reduce tail latencies but with diminishing benefits as incoming query traffic increases, (b) low power servers given enough partitioning can provide same average and tail response times as conventional high performance servers, (c) index search is a CPU-intensive cache-friendly application, and (d) C-states are the main culprits for performance degradation in document search.Web of Science162art. no. 1

    Heterogeneous biomedical database integration using a hybrid strategy: a p53 cancer research database.

    Get PDF
    Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.)

    Some Considerations about Modern Database Machines

    Get PDF
    Optimizing the two computing resources of any computing system - time and space - has al-ways been one of the priority objectives of any database. A current and effective solution in this respect is the computer database. Optimizing computer applications by means of database machines has been a steady preoccupation of researchers since the late seventies. Several information technologies have revolutionized the present information framework. Out of these, those which have brought a major contribution to the optimization of the databases are: efficient handling of large volumes of data (Data Warehouse, Data Mining, OLAP – On Line Analytical Processing), the improvement of DBMS – Database Management Systems facilities through the integration of the new technologies, the dramatic increase in computing power and the efficient use of it (computer networks, massive parallel computing, Grid Computing and so on). All these information technologies, and others, have favored the resumption of the research on database machines and the obtaining in the last few years of some very good practical results, as far as the optimization of the computing resources is concerned.Database Optimization, Database Machines, Data Warehouse, OLAP – On Line Analytical Processing, OLTP – On Line Transaction Processing, Parallel Processing

    Database Systems - Present and Future

    Get PDF
    The database systems have nowadays an increasingly important role in the knowledge-based society, in which computers have penetrated all fields of activity and the Internet tends to develop worldwide. In the current informatics context, the development of the applications with databases is the work of the specialists. Using databases, reach a database from various applications, and also some of related concepts, have become accessible to all categories of IT users. This paper aims to summarize the curricular area regarding the fundamental database systems issues, which are necessary in order to train specialists in economic informatics higher education. The database systems integrate and interfere with several informatics technologies and therefore are more difficult to understand and use. Thus, students should know already a set of minimum, mandatory concepts and their practical implementation: computer systems, programming techniques, programming languages, data structures. The article also presents the actual trends in the evolution of the database systems, in the context of economic informatics.database systems - DBS, database management systems – DBMS, database – DB, programming languages, data models, database design, relational database, object-oriented systems, distributed systems, advanced database systems
    corecore