3,558 research outputs found

    CASP-DM: Context Aware Standard Process for Data Mining

    We propose an extension of the Cross Industry Standard Process for Data Mining (CRISP-DM) which addresses specific challenges of machine learning and data mining for context handling and model reuse. This new general context-aware process model is mapped to the CRISP-DM reference model, proposing some new or enhanced outputs.

    Collaborative analysis of multi-gigapixel imaging data using Cytomine

    Motivation: Collaborative analysis of massive imaging datasets is essential to enable scientific discoveries. Results: We developed Cytomine to foster active and distributed collaboration of multidisciplinary teams for large-scale image-based studies. It uses web development methodologies and machine learning in order to readily organize, explore, share and analyze (semantically and quantitatively) multi-gigapixel imaging data over the internet. We illustrate how it has been used in several biomedical applications.

    FIN-DM: a data mining process model for financial services

    Data mining is a set of rules, processes, and algorithms that allow companies to increase revenues, reduce costs, optimize products and customer relationships, and achieve other business goals by extracting actionable insights from the data they collect on a day-to-day basis. Data mining and analytics projects require a well-defined methodology and processes. Several standard process models for conducting data mining and analytics projects are available. Among them, the most notable and widely adopted standard model is CRISP-DM. It is industry-agnostic and is often adapted to meet sector-specific requirements. Industry-specific adaptations of CRISP-DM have been proposed across several domains, including healthcare, education, industrial and software engineering, and logistics. However, until now, there has been no adaptation of CRISP-DM for the financial services industry, which has its own set of domain-specific requirements. This PhD thesis addresses this gap by designing, developing, and evaluating a sector-specific data mining process for financial services (FIN-DM). The thesis investigates how standard data mining processes are used across various industry sectors and in financial services. The examination identified a number of adaptation scenarios of traditional frameworks. It also suggested that these approaches do not pay sufficient attention to turning data mining models into software products integrated into organizations' IT architectures and business processes. In the financial services domain, the main discovered adaptation scenarios concerned technology-centric aspects (scalability), business-centric aspects (actionability), and human-centric aspects (mitigating discriminatory effects) of data mining. Next, a case study in an actual financial services organization revealed 18 perceived gaps in the CRISP-DM process. Using the data and results from these studies, the thesis outlines an adaptation of CRISP-DM for the financial sector, named the Financial Industry Process for Data Mining (FIN-DM). FIN-DM extends CRISP-DM to support privacy-compliant data mining, to tackle AI ethics risks, to fulfill risk management requirements, and to embed quality assurance as part of the data mining life cycle. https://www.ester.ee/record=b547227

    Modelling of E-Governance Framework for Mining Knowledge from Massive Grievance Redressal Data

    With the massive proliferation of online applications for citizens, backed by abundant resources, there has been a tremendous rise in the usage of e-governance platforms. Entrepreneurs, players, politicians, students and many others depend heavily on web-based grievance redressal networking sites, which generate loads of grievance data that are not only challenging but nearly impossible to understand. The prime reason is that grievance data are massive in size and highly unstructured. Because of this, the proposed system explores the feasibility of performing a knowledge discovery process on grievance data using conventional data mining algorithms. Designed in Java, considering a massive number of online e-governance frameworks drawing on citizens' grievance discussion forums, the proposed system evaluates the effectiveness of performing data mining on Big Data.
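    The abstract describes applying conventional data mining algorithms to unstructured grievance text (the system itself is reported as a Java implementation). A minimal sketch of that idea, assuming scikit-learn and a hypothetical CSV of grievance texts:

        # Minimal sketch: group unstructured grievance texts with conventional
        # data mining (TF-IDF features + k-means). File name, column name and
        # cluster count are hypothetical placeholders.
        import pandas as pd
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.cluster import KMeans

        grievances = pd.read_csv("grievances.csv")["text"]      # hypothetical input file
        features = TfidfVectorizer(max_features=5000, stop_words="english").fit_transform(grievances)

        labels = KMeans(n_clusters=10, random_state=0).fit_predict(features)
        print(pd.Series(labels).value_counts())                 # size of each grievance theme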

    A service oriented architecture to provide data mining services for non-expert data miners

    In today's competitive market, companies need to use knowledge discovery techniques to make better, more informed decisions. But these techniques are out of the reach of most users, as the knowledge discovery process requires a great deal of expertise. Additionally, business intelligence vendors are moving their systems to the cloud in order to provide services which offer companies cost savings, better performance and faster access to new applications. This work joins both facets. It describes a data mining service addressed to non-expert data miners which can be delivered as Software-as-a-Service. Its main advantage is that, by simply indicating where the data file is, the service itself is able to perform the entire process. © 2012 Elsevier B.V. All rights reserved.
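    To make the "by simply indicating where the data file is" idea concrete, here is a minimal sketch of an automated mining routine such a service could expose, assuming scikit-learn; the function name, target column and model choice are illustrative assumptions, not the paper's API:

        # Minimal sketch of a SaaS-style mining routine: the non-expert caller supplies
        # only a data file location; preprocessing, modelling and evaluation are automatic.
        # Target column and model choice are assumptions for illustration.
        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        def mine(data_url: str, target: str = "label") -> dict:
            data = pd.read_csv(data_url)                      # the only input the user provides
            X = pd.get_dummies(data.drop(columns=[target]))   # naive automatic preprocessing
            y = data[target]
            model = RandomForestClassifier(n_estimators=200, random_state=0)
            scores = cross_val_score(model, X, y, cv=5)       # automatic evaluation
            model.fit(X, y)
            return {"cv_accuracy": float(scores.mean()), "model": model}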

    Customer lifetime value: a framework for application in the insurance industry - building a business process to generate and maintain an automatic estimation agent

    Research project submitted as partial fulfilment for the Master Degree in Statistics and Information Management, specialization in Knowledge Management and Business Intelligence. In recent years the topic of Customer Lifetime Value (CLV), or in its expanded version, Customer Equity (CE), has become popular as a strategic tool across several industries, in particular in retail and services. Although the core concepts of CLV modelling have been studied for several years and the mathematics that underpins the concept is well understood, the application to specific industries is not trivial. The complexities associated with the development of a CLV programme as a business process are not insignificant, causing a myriad of obstacles to its implementation. This research project builds a framework to develop and implement the CLV concept as a maintainable business process, with a focus on the insurance industry, in particular the non-life line of business. Key concepts, such as churn modelling, portfolio stationary premiums, fiscal policies and balance sheet information, must be integrated into the CLV framework. In addition, an automatic estimation machine (AEM) is developed to standardize CLV calculations. The concept of the AEM is important, given that CLV information must be "fit for purpose" when used in other business processes. The field work is carried out in a Portuguese bancassurance company which is part of an important Portuguese financial group. Firstly, this is done by investigating how to translate and apply the known CLV concepts to the insurance industry context. Secondly, a sensitivity study is done to establish the optimum parameter strategy, by incorporating and comparing several data mining concepts applied to churn prediction and customer base segmentation. Scenarios for balance sheet information usage and other actuarial concepts are analyzed to calibrate the cash flow component of the CLV framework. Thirdly, an automatic estimation agent is defined for application to the current or the expanding firm portfolio; the advantages of using the SOA approach for deployment are also verified. Additionally, a comparative impact study is done between two valuation views: the premium/cost driven versus the CLV driven. Finally, a framework for a BPM is presented, not only for building the AEM but also for its maintenance according to an explicit performance threshold.
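    The CLV mathematics the abstract leans on is usually a discounted sum of expected future margins weighted by retention; a minimal sketch, assuming a constant retention rate (one minus the modelled churn probability), a flat discount rate and purely illustrative figures:

        # Minimal sketch of the discounted-margin CLV computation:
        # CLV = sum over periods t of margin * retention^t / (1 + discount)^t.
        # All figures below are illustrative, not taken from the study.
        def customer_lifetime_value(margin_per_year: float, retention_rate: float,
                                    discount_rate: float, horizon_years: int) -> float:
            return sum(
                margin_per_year * retention_rate ** t / (1 + discount_rate) ** t
                for t in range(1, horizon_years + 1)
            )

        # e.g. 120/year net margin, 85% yearly retention, 5% discount rate, 10-year horizon
        print(round(customer_lifetime_value(120, 0.85, 0.05, 10), 2))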

    ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments

    This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. That also enables model-based anomaly detection or efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from the ALOJA data-sets and framework to improve the design and deployment of Big Data applications. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051.
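    As a rough illustration of the kind of modelling ALOJA-ML automates (not its actual code), a minimal sketch assuming scikit-learn and hypothetical column names for configuration features and measured runtimes:

        # Minimal sketch: learn execution time from observed Hadoop executions and
        # score how well it generalizes to unseen configurations. The file and the
        # column names ("exec_time", configuration/hardware features) are hypothetical.
        import pandas as pd
        from sklearn.ensemble import GradientBoostingRegressor
        from sklearn.model_selection import train_test_split

        runs = pd.read_csv("hadoop_executions.csv")              # hypothetical repository export
        X = pd.get_dummies(runs.drop(columns=["exec_time"]))     # configuration + hardware features
        y = runs["exec_time"]

        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
        model = GradientBoostingRegressor().fit(X_train, y_train)
        print("held-out R^2:", model.score(X_test, y_test))      # forecast quality for new configs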

    Applying case based reasoning for prioritizing areas of business management

    Determining the importance of different management areas in a company provides guidance about where to focus analysis and action. To do so, it is necessary to decompose management into a coherent set of specific management areas and provide a way for the company to determine how important each of these areas is for it. This paper presents a novel system that guides companies towards a classification of the management areas that matter most to them. It is based on a case based reasoning system because the variability and evolution of companies over time require techniques with learning capabilities. The proposed system offers an automatic self-assessment that gives companies an ordered list of their most important management areas. The system was implemented a year ago for the evaluation of Spanish companies and is currently in production, providing relevant information about the management areas of these companies.
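    A minimal sketch of the retrieve-and-reuse step at the core of case based reasoning, with an entirely hypothetical case representation (a company profile vector plus stored importance scores per management area):

        # Minimal sketch of the CBR cycle: retrieve the most similar past company cases,
        # then reuse their stored rankings of management areas. Case fields, area names
        # and scores are hypothetical; in practice profile features would be normalized.
        import numpy as np

        cases = [   # (company profile vector, importance score per management area)
            (np.array([50, 5.0, 0.30]), {"finance": 0.9, "marketing": 0.4, "operations": 0.7}),
            (np.array([12, 0.8, 0.10]), {"finance": 0.5, "marketing": 0.8, "operations": 0.6}),
            (np.array([45, 4.2, 0.25]), {"finance": 0.8, "marketing": 0.5, "operations": 0.9}),
        ]

        def prioritize(profile: np.ndarray, k: int = 2) -> list:
            nearest = sorted(cases, key=lambda c: np.linalg.norm(c[0] - profile))[:k]  # retrieve
            areas = nearest[0][1].keys()
            scores = {a: sum(c[1][a] for c in nearest) / k for a in areas}             # reuse
            return sorted(scores, key=scores.get, reverse=True)

        print(prioritize(np.array([48, 4.5, 0.28])))   # ordered list of management areas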

    An adaptive methodology for the improvement of knowledge acquisition by a multimedia web tool

    Adaptive learning is a method that personalizes teaching-learning strategies in accordance with the needs and preferences of each student. This article describes the design, implementation and testing of a web application developed with adaptive learning in order to improve student knowledge acquisition and to simplify the teacher’s work. The tool uses EventSource technologies combined with heuristic functions to produce a predictive algorithm, which adapts to each student in a customized way by presenting content adjusted to their cognitive needs. The design is based on the hypothesis that the acquisition of knowledge can be improved by a computing application which presents the syllabus to be learned in various forms. In this way, the application tracks students’ progress through the content of the material, which is classified by branches of knowledge. The tool was applied to one group of students, and the data obtained were compared with the results of the remaining students, who followed the usual knowledge transmission system. The results not only improve academic performance, but also enhance the heuristic decision-making about the content to be taught.
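    To make "heuristic functions producing a predictive algorithm" less abstract, a minimal sketch of choosing the next content item for a student by a heuristic score; the weights and student-model fields are hypothetical, not the article's design:

        # Minimal sketch of heuristic content selection: score each candidate item by a
        # weighted mix of the student's mastery of its branch of knowledge and whether
        # the item has been seen before. Weights and fields are hypothetical.
        def next_item(student: dict, items: list) -> dict:
            def score(item):
                mastery = student["mastery"].get(item["branch"], 0.0)   # 0..1, lower = weaker branch
                freshness = 1.0 if item["id"] not in student["seen"] else 0.2
                return 0.7 * (1.0 - mastery) + 0.3 * freshness          # prefer weak, unseen material
            return max(items, key=score)

        student = {"mastery": {"algebra": 0.4, "geometry": 0.9}, "seen": {"g1"}}
        items = [{"id": "a1", "branch": "algebra"}, {"id": "g1", "branch": "geometry"}]
        print(next_item(student, items)["id"])   # -> "a1"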