
    Data Mining


    Proximal business intelligence on the semantic web

    This is the post-print version of this article. Copyright © 2010 Springer. Ubiquitous information systems (UBIS) extend current information system thinking to explicitly differentiate technology between devices and software components with relation to people and process. Adapting business data and management information to support specific user actions in context is an ongoing topic of research. Approaches typically focus on providing mechanisms to improve specific information access and transcoding, but not on how the information can be accessed in a mobile, dynamic and ad hoc manner. Although web ontologies have been used to facilitate the loading of data warehouses, less research has been carried out on ontology-based mobile reporting. This paper explores how business data can be modeled and accessed using the Web Ontology Language and then reused to provide the invisibility of pervasive access, uncovering more effective architectural models for adaptive information system strategies of this type. This exploratory work is guided in part by a vision of business intelligence that is highly distributed, mobile and fluid, adapting to sensory understanding of the underlying environment in which it operates. A proof-of-concept mobile and ambient data access architecture is developed in order to further test the viability of such an approach. The paper concludes with an ontology engineering framework for systems of this type, named UBIS-ONTO.

    ETL for data science?: A case study

    Big data has driven data science development and research in recent years. However, there is a problem: most data science projects do not make it to production. This can happen because many data scientists do not use a reference data science methodology. Another aggravating element is the data itself, its quality and its processing. The problem can be mitigated through research, progress and documented case studies on the topic, fostering knowledge dissemination and reuse. In particular, data mining can benefit from the knowledge of other mature fields that explore similar matters, such as data warehousing. To address the problem, this dissertation performs a case study of the project “IA-SI - Artificial Intelligence in Incentives Management”, which aims to improve the management of European grant funds through data mining. The key contributions of this study, to academia and to the project’s development and success, are: (1) A combined process model of the most used data mining process models and their tasks, extended with the ETL subsystems and other selected data warehousing best practices. (2) Application of this combined process model to the project and all its documentation. (3) Contribution to the project’s prototype implementation, regarding the data understanding and data preparation tasks. This study concludes that CRISP-DM is still a reference, as it includes all the other data mining process models’ tasks and detailed descriptions, and that its combination with data warehousing best practices is useful to the IA-SI project and potentially to other data mining projects.

    Recent Developments in Data Warehousing

    Data warehousing is a strategic business and IT initiative in many organizations today. Data warehouses can be developed in two alternative ways, the data mart strategy and the enterprise-wide data warehouse strategy, and each has advantages and disadvantages. To create a data warehouse, data must be extracted from source systems, transformed, and loaded into an appropriate data store. Depending on the business requirements, either relational or multidimensional database technology can be used for the data stores. To provide a multidimensional view of the data using a relational database, a star schema data model is used. Online analytical processing can be performed on both kinds of database technology. Metadata about the data in the warehouse is important for IT and end users. A variety of data access tools and applications can be used with a data warehouse: SQL queries, management reporting systems, managed query environments, DSS/EIS, enterprise intelligence portals, data mining, and customer relationship management. A data warehouse can be used to support a variety of users: executives, managers, analysts, operational personnel, customers, and suppliers. Data warehousing concepts are brought to life through a case study of Harrah's Entertainment, a firm that became a leader in the gaming industry with its CRM business strategy supported by data warehousing.
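
    The extract, transform, load sequence and the star schema mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the table names (`dim_customer`, `dim_property`, `dim_date`, `fact_visit`), the sample rows, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for the relational data store.

```python
import sqlite3

# Hypothetical rows standing in for a source-system extract (illustrative values only).
SOURCE_ROWS = [
    {"customer": " alice ", "property": "Property A", "date": "2001-03-01", "spend": "120.50"},
    {"customer": "Bob", "property": "Property B", "date": "2001-03-01", "spend": "80"},
    {"customer": "alice", "property": "Property B", "date": "2001-03-02", "spend": "40.25"},
]


def transform(row):
    """Clean one extracted row: trim and normalize names, convert spend to a number."""
    return {
        "customer": row["customer"].strip().title(),
        "property": row["property"].strip(),
        "date": row["date"],
        "spend": float(row["spend"]),
    }


def dimension_key(conn, table, value):
    """Return the surrogate key for a dimension value, inserting it if it is new."""
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (value,))
    return conn.execute(f"SELECT id FROM {table} WHERE name = ?", (value,)).fetchone()[0]


def load(conn, rows):
    """Create a small star schema (one fact table, three dimensions) and load it."""
    conn.executescript(
        """
        CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_property (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_date     (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_visit (
            customer_id INTEGER REFERENCES dim_customer(id),
            property_id INTEGER REFERENCES dim_property(id),
            date_id     INTEGER REFERENCES dim_date(id),
            spend       REAL
        );
        """
    )
    for row in rows:
        conn.execute(
            "INSERT INTO fact_visit VALUES (?, ?, ?, ?)",
            (
                dimension_key(conn, "dim_customer", row["customer"]),
                dimension_key(conn, "dim_property", row["property"]),
                dimension_key(conn, "dim_date", row["date"]),
                row["spend"],
            ),
        )
    conn.commit()


def spend_by_customer(conn):
    """A simple multidimensional-style query: total spend sliced by customer."""
    return conn.execute(
        """
        SELECT c.name, SUM(f.spend)
        FROM fact_visit AS f
        JOIN dim_customer AS c ON c.id = f.customer_id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, [transform(r) for r in SOURCE_ROWS])
totals = spend_by_customer(conn)
print(totals)
```

    Grouping the fact table by a different dimension (property or date) gives another slice of the same data, which is the multidimensional view the star schema is meant to provide on relational technology.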

    UVM Big Data? Aggregating Campus Databases and Creating a Data Warehouse to Improve Student Retention Rates at the University of Vermont

    One of the biggest concerns of universities across the United States is the student retention rate. Because it is much more cost effective to keep an existing student enrolled than to enroll a new student, improving a university’s retention rate translates to cost savings for that institution. UVM’s first-year retention rate is currently 85.8%, which places it above many other public universities, but below most of UVM’s aspirant schools. UVM conducted a study in 2011 in an effort to determine why students leave after their first year, but retention rates since the study have increased only marginally. Some universities have been using data mining techniques to determine factors correlated with student retention, such as living off campus or an income level below the poverty line. This thesis recommends that UVM create a data warehouse aggregating all student-related data from across campus in an attempt to improve student retention. There is currently no central repository of student-related data from sources such as Residential Life, Blackboard, Student Health Services, and Undergraduate Admissions. Data mining techniques could be used with this data warehouse to discover patterns between different fields of data and a student’s likelihood to withdraw from UVM. For example, what if there is a correlation between a student’s dorm room view and their likelihood to leave UVM? How does a student’s frequency of Blackboard use impact their chance of staying enrolled? This thesis explores the technical and logistical considerations involved in a large data warehousing project. While building a data warehouse may seem operationally daunting, the insights it could generate would be very beneficial for decision support for many years.