Water for People, Water for Life
This report documents the serious water crisis we are facing at the beginning of the 21st century. This crisis is one of water governance, essentially caused by the ways in which we mismanage water. But the real tragedy is the effect it has on the everyday lives of poor people, who are blighted by the burden of water-related disease, living in degraded and often dangerous environments, struggling to get an education for their children and to earn a living, and to get enough to eat. The executive summary offers an analysis of the problem as well as pilot case studies for water management and recommendations for future action
Design and Implementation of an Enterprise Data Warehouse
The reporting and sharing of information has been synonymous with databases as long as there have been systems to host them. Now more than ever, users expect the sharing of information in an immediate, efficient, and secure manner. However, due to the sheer number of databases within the enterprise, getting the data in an effective fashion requires a coordinated effort between the existing systems. There is a very real need today for a single location for storing and sharing data that users can easily use to make improved business decisions, rather than traversing the multiple databases that exist today. An enterprise data warehouse provides such a location.
The Thesis involves a description of data warehousing techniques, design, expectations, and challenges regarding data cleansing and transforming existing data, as well as other challenges associated with extracting from transactional databases. The Thesis also includes a technical piece discussing database requirements and technologies used to create and refresh the data warehouse. The Thesis discusses how data from databases and other data warehouses could integrate. In addition, there is discussion of specific data marts within the warehouse to satisfy a specific need. Finally, there are explanations for how users will consume the data in the enterprise data warehouse, such as through reporting and other business intelligence.
This discussion also covers the system architecture by which data from databases and other data warehouses in different departments could be integrated. An Enterprise Data Warehouse prototype shows how a pair of different databases undergoes the Extract, Transform and Load (ETL) process and is loaded into a set of star schemas, which makes reporting easier. Separately, an important piece of this thesis takes an actual example of data and compares performance by running the same queries against two separate databases, one transactional and one data warehouse. As the queries grow in difficulty, the gap between the recorded run times in the two environments widens.
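The ETL-into-star-schema step described in this abstract can be illustrated with a minimal sketch. It is not the thesis prototype: the table names (`customers`, `orders`, `dim_customer`, `dim_date`, `fact_sales`), the sample rows, and the cleansing rule (trimming and title-casing text) are all assumptions made for illustration, and an in-memory SQLite database stands in for both the transactional source and the warehouse.

```python
import sqlite3


def build_source(conn):
    """Create a tiny transactional source (hypothetical schema and data)."""
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                             order_date TEXT, amount REAL);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                     [(1, " alice ", "north"), (2, "Bob", "South")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                     [(1, 1, "2024-01-05", 100.0),
                      (2, 1, "2024-02-10", 50.0),
                      (3, 2, "2024-01-20", 75.0)])


def run_etl(conn):
    """Extract from the source tables, cleanse, and load a small star schema."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY,
                                   name TEXT, region TEXT);
        CREATE TABLE dim_date (date_key TEXT PRIMARY KEY,
                               year INTEGER, month INTEGER);
        CREATE TABLE fact_sales (customer_key INTEGER, date_key TEXT, amount REAL);
    """)
    # Extract: read the raw transactional rows.
    customers = conn.execute("SELECT id, name, region FROM customers").fetchall()
    orders = conn.execute(
        "SELECT customer_id, order_date, amount FROM orders").fetchall()
    # Transform: trim and normalise text; derive year and month from ISO dates.
    dim_customer = [(cid, name.strip().title(), region.strip().title())
                    for cid, name, region in customers]
    dim_date = sorted({(d, int(d[:4]), int(d[5:7])) for _, d, _ in orders})
    # Load: dimensions first, then the fact table that references them.
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", dim_customer)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", dim_date)
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", orders)


def sales_by_region_and_month(conn):
    """A typical reporting query against the star schema."""
    return conn.execute("""
        SELECT c.region, d.year, d.month, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_customer c ON c.customer_key = f.customer_key
        JOIN dim_date d ON d.date_key = f.date_key
        GROUP BY c.region, d.year, d.month
        ORDER BY c.region, d.year, d.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_source(conn)
run_etl(conn)
report = sales_by_region_and_month(conn)
```

The reporting query only joins the fact table to its dimensions, which is the property that tends to make star schemas easier to report on than a normalised transactional schema. A timing comparison like the one in the thesis would need realistic data volumes and is not attempted here.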
Big Data, Cloud, and Earth Science
Given the reality of a Big Data future, we need to reevaluate our ability to quickly process immense amounts of data while maintaining our responsibility of stewarding our archives. At National Aeronautics and Space Administration (NASA), our Science Missions Directorate and Earth Science Data Systems have been exploring the commercial cloud as a potential mechanism for data ingest, archive, and distribution since 2015. This talk will discuss our timeline and strategies being employed and will discuss the process of introducing commercial cloud entities into a largely on-premise hardware-reliant approach to ingest, archive, and distribution. We will address topics such as vendor lock-in, varying compute and costing strategies, and end-user adoption. Please join us for an open and authentic conversation about the opportunities we are embracing and the risks we are undertaking as a part of this evolution
Business Intelligence and Analytics in Small and Medium-Sized Enterprises
This thesis presents a study of Business Intelligence and Analytics (BI&A) adoption in small and medium-sized enterprises (SMEs). Although the importance of BI&A is widely accepted, empirical research shows SMEs still lag in BI&A proliferation. Thus, it is crucial to understand the phenomenon of BI&A adoption in SMEs.
This thesis will investigate and explore BI&A adoption in SMEs, addressing the main research question: How can we understand the phenomenon of BI&A adoption in SMEs? The adoption term in this thesis refers to all the IS adoption stages, including investment, implementation, utilization, and value creation. This research uses a combination of a literature review, a qualitative exploratory approach, and a ranking-type Delphi study with a grounded Delphi approach. The empirical part includes interviews with 38 experts and Delphi surveys with 39 experts from various Norwegian industries.
The research strategy investigates the factors influencing BI&A adoption in SMEs. The study examined the investment, implementation, utilization, and value creation of BI&A technologies in SMEs. A thematic analysis was adopted to collate the qualitative expert interview data and search for potential themes. The Delphi survey findings were further examined using the grounded Delphi method. To better understand the study’s findings, three theoretical perspectives were applied: resource-based view theory, dynamic capabilities, and IS value process models.
The thesis’ research findings are presented in five articles published in international conference proceedings and journals. This thesis summary will coherently integrate and discuss these results.
Meeting Funders’ Data Policies: Blueprint for a Research Data Management Service Group (RDMSG)
This report summarizes the elements that we expect to be required in data management plans, describes Cornell’s current capabilities and needs in meeting such requirements, and proposes a structure for a virtual organization that builds on the collaboration between the DRSG, CAC, CUL and CISER. The proposed organization also includes Cornell Information Technologies (CIT) and Weill Cornell Medical College Information Technologies and Services (WCMC-ITS) to further develop and provide this support
Strategy and methodology for enterprise data warehouse development. Integrating data mining and social networking techniques for identifying different communities within the data warehouse.
Data warehouse technology has been successfully integrated into the information
infrastructure of major organizations as a potential solution for eliminating redundancy and
providing for comprehensive data integration. Realizing the importance of a data
warehouse as the main data repository within an organization, this dissertation addresses
different aspects related to the data warehouse architecture and performance issues.
Many data warehouse architectures have been presented by industry analysts and
research organizations. These architectures vary from the independent and physical
business unit centric data marts to the centralised two-tier hub-and-spoke data warehouse.
The operational data store is a third tier which was offered later to address the business
requirements for inter-day data loading. While the industry-available architectures are all
valid, I found them to be suboptimal in efficiency (cost) and effectiveness (productivity).
In this dissertation, I am advocating a new architecture (The Hybrid Architecture)
which encompasses the industry advocated architecture. The hybrid architecture demands
the acquisition, loading and consolidation of enterprise atomic and detailed data into a
single integrated enterprise data store (The Enterprise Data Warehouse) where
business-unit-centric Data Marts and Operational Data Stores (ODS) are built in the same instance
of the Enterprise Data Warehouse.
For the purpose of highlighting the role of data warehouses for different
applications, we describe an effort to develop a data warehouse for a geographical
information system (GIS). We further study the importance of data practices, quality and
governance for financial institutions by commenting on the RBC Financial Group case.
The development and deployment of the Enterprise Data Warehouse based on the
Hybrid Architecture spawned its own issues and challenges. Organic data growth and
business requirements to load additional new data will significantly increase the amount
of stored data. Consequently, the number of users will increase significantly. Enterprise
data warehouse obesity, performance degradation and navigation difficulties are chief
amongst the issues and challenges.
Association rules mining and social networks have been adopted in this thesis to
address the above mentioned issues and challenges. We describe an approach that uses
frequent pattern mining and social network techniques to discover different communities
within the data warehouse. These communities include sets of tables frequently accessed
together, sets of tables retrieved together most of the time and sets of attributes that
mostly appear together in the queries. We concentrate on tables in the discussion;
however, the model is general enough to discover other communities. We first build a
frequent pattern mining model by considering each query as a transaction and the tables
as items. Then, we mine closed frequent itemsets of tables; these itemsets include tables
that are mostly accessed together and hence should be treated as one unit in storage and
retrieval for better overall performance. We utilize social network construction and
analysis to find maximum-sized sets of related tables; this is a more robust approach as
opposed to a union of overlapping itemsets. We derive the Jaccard distance between the
closed itemsets and construct the social network of tables by adding links that represent
distance above a given threshold. The constructed network is analyzed to discover
communities of tables that are mostly accessed together. The reported test results are
promising and demonstrate the applicability and effectiveness of the developed approach.
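The pipeline outlined in this abstract (queries as transactions, closed frequent itemsets of tables, a Jaccard-based network, communities) can be sketched as follows. This is one interpretation, not the dissertation's implementation. It makes three assumptions: itemsets are enumerated by brute force rather than with a dedicated mining algorithm, links are added when Jaccard similarity (one minus the Jaccard distance) is at or above a threshold, and a community is a connected component of the resulting network of closed itemsets. All function names, table names, and thresholds are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of table sets appearing in >= min_support queries."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                result[itemset] = count
                found = True
        if not found:  # no frequent set of this size, so none larger either
            break
    return result


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {s: c for s, c in frequent.items()
            if not any(s < other and c == oc for other, oc in frequent.items())}


def jaccard_similarity(a, b):
    return len(a & b) / len(a | b)


def table_communities(transactions, min_support, min_similarity):
    """Link closed itemsets that are similar enough; merge each component's tables."""
    frequent = frequent_itemsets(transactions, min_support)
    nodes = [s for s in closed_itemsets(frequent) if len(s) > 1]
    parent = list(range(len(nodes)))  # union-find over itemset indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(nodes)), 2):
        if jaccard_similarity(nodes[i], nodes[j]) >= min_similarity:
            parent[find(i)] = find(j)

    groups = {}
    for i, itemset in enumerate(nodes):
        groups.setdefault(find(i), set()).update(itemset)
    return sorted(sorted(tables) for tables in groups.values())


# Hypothetical query log: each query is the set of tables it touches.
queries = [
    {"orders", "customers", "products"},
    {"orders", "customers", "products"},
    {"orders", "customers"},
    {"employees", "departments"},
    {"employees", "departments"},
    {"orders", "employees"},
]
communities = table_communities(queries, min_support=2, min_similarity=0.5)
```

Because links require a minimum similarity rather than any overlap at all, two itemsets that share a single table are not automatically merged, which is the sense in which this differs from taking a plain union of overlapping itemsets.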
Personal Health Train Architecture with Dynamic Cloud Staging
Scientific advances, especially in the healthcare domain, can be accelerated by making data available for analysis. However, in traditional data analysis systems, data need to be moved to a central processing unit that performs analyses, which may be undesirable, e.g. due to privacy regulations in case these data contain personal information. This paper discusses the Personal Health Train (PHT) approach in which data processing is brought to the (personal health) data rather than the other way around, allowing access to (private) data to be controlled and ethical and legal concerns to be observed. This paper introduces the PHT architecture and discusses the data staging solution that allows processing to be delegated to components spawned in a private cloud environment in case the (health) organisation hosting the data has limited resources to execute the required processing. This paper shows the feasibility and suitability of the solution with a relatively simple, yet representative, case study of data analysis of Covid-19 infections, which is performed by components that are created on demand and run in the Amazon Web Services platform. This paper also shows that the performance of our solution is acceptable, and that our solution is scalable. This paper demonstrates that the PHT approach enables data analysis with controlled access, preserving privacy and complying with regulations such as GDPR, while the solution is deployed in a private cloud environment.
Review of modern business intelligence and analytics in 2015: How to tame the big data in practice?: Case study - What kind of modern business intelligence and analytics strategy to choose?
The objective of this study was to determine the state-of-the-art architecture of modern business intelligence and analytics. Furthermore, the status quo of the business intelligence and analytics architecture in an anonymous case company was examined. Based on these findings, a future strategy was designed to guide the case company towards a better business intelligence and analytics environment. This objective was selected due to increasing interest in the topic of big data. Thus, understanding how to move from traditional business intelligence practices to modern ones, and what options are available, was seen as the key question to be solved for any company to gain competitive advantage in the near future.
The study was conducted as a qualitative single-case study. The case study included two parts: an analytics maturity assessment, and an analysis of the business intelligence and analytics architecture. The survey included over 30 questions and was sent to 25 analysts and other individuals, for example managers, who spent a significant amount of time dealing with or reading financial reports. The architecture analysis was conducted by gathering relevant information at a high level. Furthermore, a big picture was drawn to illustrate the architecture. The two parts combined were used to construct the actual current maturity level of business intelligence and analytics in the case company. Three theoretical frameworks were used: the first regarding the architecture, the second regarding the maturity level, and the third regarding reporting tools. The first, higher-level framework consisted of the modern data warehouse architecture and Hadoop solution from D'Antoni and Lopez (2014). The second framework included the analytics maturity assessment from the data warehouse institute (2015). Finally, the third framework analyzed the advanced analytics tools from Sallam et al. (2015).
The findings of this study suggest that a modern business intelligence and analytics solution can include both data warehouse and Hadoop components. These two components are not mutually exclusive. Instead, Hadoop augments the data warehouse, taking it to another level. This thesis shows how companies can evaluate their current maturity level and design a future strategy by benchmarking their own actions against the state-of-the-art solution. To keep up with the fast pace of development, research must be continuous. Therefore, a future study detailing a path for implementing Hadoop, for example, would be a great addition to this field.