366,846 research outputs found

    KTA: a framework for integrating expert knowledge and experiment memory in transcriptome analysis

    This paper addresses the problem of integrating expert knowledge into a data mining process. We present the KTA (integrating expert Knowledge in Transcriptome Analysis) framework, which allows the mining process to be driven by prior knowledge of the application domain. KTA is embedded in the MEDIANTE project for evaluating and using DNA microarrays, the CORESE semantic search engine and the ANNOT module which annotates scientific publications.

    Mining unexpected patterns using decision trees and interestingness measures: a case study of endometriosis

    Because clinical research is carried out in complex environments, prior domain knowledge, constraints, and expert knowledge can enhance the capabilities and performance of data mining. In this paper we propose an unexpected pattern mining model that uses decision trees to compare recovery rates of two different treatments, and to find patterns that contrast with the prior knowledge of domain users. In the proposed model we define interestingness measures to determine whether the patterns found are interesting to the domain. By applying the concept of domain-driven data mining, we repeatedly utilize decision trees and interestingness measures in a closed-loop, in-depth mining process to find unexpected and interesting patterns. We use retrospective data from transvaginal ultrasound-guided aspirations to show that the proposed model can successfully compare different treatments using a decision tree, which is a new usage of that tool. We believe that unexpected, interesting patterns may provide clinical researchers with different perspectives for future research.

    Unexpectedness as a Measure of Interestingness in Knowledge Discovery

    Organizations are taking advantage of "data-mining" techniques to leverage the vast amounts of data captured as they process routine transactions. Data-mining is the process of discovering hidden structure or patterns in data. However, several of the pattern discovery methods in data-mining systems have two drawbacks: they discover too many obvious or irrelevant patterns, and they do not fully leverage the valuable prior domain knowledge that managers have. This research addresses these drawbacks by developing ways to generate interesting patterns by incorporating managers' prior knowledge in the process of searching for patterns in data. Specifically, we focus on providing methods that generate unexpected patterns with respect to managerial intuition by eliciting managers' beliefs about the domain and using these beliefs to seed the search for unexpected patterns in data. Our approach should lead to the development of decision support systems that provide managers with more relevant patterns from data and aid in effective decision making. (Information Systems Working Papers Series)

    A case of using formal concept analysis in combination with emergent self organizing maps for detecting domestic violence.

    In this paper, we propose a framework for iterative knowledge discovery from unstructured text using Formal Concept Analysis and Emergent Self Organizing Maps. We apply the framework to a real life case study using data from the Amsterdam-Amstelland police. The case zooms in on the problem of distilling concepts for domestic violence from the unstructured text in police reports. Our human-centered framework facilitates the exploration of the data and allows for an efficient incorporation of prior expert knowledge to steer the discovery process. This exploration resulted in the discovery of faulty case labellings, common classification errors made by police officers, confusing situations, missing values in police reports, etc. The framework was also used for iteratively expanding a domain-specific thesaurus. Furthermore, we showed how the presented method was used to develop a highly accurate and comprehensible classification model that automatically assigns a domestic or non-domestic violence label to police reports. Keywords: Formal concept analysis; Emergent self organizing map; Text mining; Actionable knowledge discovery; Domestic violence.

    Using metarules to integrate knowledge in knowledge based systems. An application in the woodworking industry

    The current study addresses the integration of knowledge obtained from Data Mining structures and models into existing Knowledge Based solutions. It presents a technique adapted from CommonKADS and spiral methodology to develop an initial knowledge solution using a traditional approach for requirement analysis, knowledge acquisition, and implementation. After an initial prototype is created and verified, the solution is enhanced by incorporating new knowledge obtained from Online Analytical Processing, specifically from Data Mining models and structures, using meta rules. Every meta rule is also verified prior to being included in the selection and translation of rules into the Expert System notation. Once an initial iteration was completed, responses from test cases were compared using an agreement index and a kappa index. The problem domain was restricted to remake and rework operations in a cabinet making company. For the Data Mining models, 8,674 cases of Price of Non Conformance (PONC) covering a period of 3 months were used. Initial results, assessed using the Trillium scale, indicated that the technique presented sufficient formalism to be used in the development of new systems. The use of 50 additional cases randomly selected from different departments indicated that responses from the original system and from the solution that incorporated new knowledge from Data Mining differed significantly. Further inspection of the responses indicated that the new solution, with 68 additional rules, was able to answer, although with an incorrect alternative, in 28 additional cases for which the initial solution was not able to provide a conclusion.
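The comparison of responses described above can be sketched in a few lines. This is a minimal illustration, assuming that "agreement index" means the proportion of matching responses and that "kappa index" means Cohen's kappa; the abstract does not define either term, and the function name and sample labels below are hypothetical.

```python
from collections import Counter


def agreement_and_kappa(responses_a, responses_b):
    """Return (observed agreement, Cohen's kappa) for two response lists.

    Assumption: each list holds one categorical answer per test case,
    in the same case order. This is not the study's own procedure.
    """
    if len(responses_a) != len(responses_b) or not responses_a:
        raise ValueError("need two non-empty lists of equal length")

    n = len(responses_a)
    # Proportion of cases where both systems gave the same answer.
    observed = sum(a == b for a, b in zip(responses_a, responses_b)) / n

    # Agreement expected by chance, from each system's label frequencies.
    counts_a, counts_b = Counter(responses_a), Counter(responses_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both systems always give the same single label; kappa is undefined,
        # so report perfect agreement by convention.
        return observed, 1.0
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical answers from the original and the enhanced system.
original = ["rework", "rework", "remake", "remake"]
enhanced = ["rework", "remake", "remake", "remake"]
observed, kappa = agreement_and_kappa(original, enhanced)
```

Kappa discounts the agreement two systems would reach by chance, which is why it can be much lower than the raw agreement proportion when one answer dominates.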

    A COMPREHENSIVE GEOSPATIAL KNOWLEDGE DISCOVERY FRAMEWORK FOR SPATIAL ASSOCIATION RULE MINING

    Continuous advances in modern data collection techniques help spatial scientists gain access to massive and high-resolution spatial and spatio-temporal data. Thus there is an urgent need to develop effective and efficient methods seeking to find unknown and useful information embedded in big-data datasets of unprecedentedly large size (e.g., millions of observations), high dimensionality (e.g., hundreds of variables), and complexity (e.g., heterogeneous data sources, space–time dynamics, multivariate connections, explicit and implicit spatial relations and interactions). Responding to this line of development, this research focuses on the utilization of the association rule (AR) mining technique for a geospatial knowledge discovery process. Prior attempts have sidestepped the complexity of the spatial dependence structure embedded in the studied phenomenon. Thus, adopting association rule mining in spatial analysis is rather problematic. Interestingly, a very similar predicament afflicts spatial regression analysis with a spatial weight matrix that would be assigned a priori, without validation on the specific domain of application. Besides, a dependable geospatial knowledge discovery process necessitates algorithms supporting automatic and robust but accurate procedures for the evaluation of mined results. Surprisingly, this has received little attention in the context of spatial association rule mining. To remedy the existing deficiencies mentioned above, the foremost goal for this research is to construct a comprehensive geospatial knowledge discovery framework using spatial association rule mining for the detection of spatial patterns embedded in geospatial databases and to demonstrate its application within the domain of crime analysis. It is the first attempt at delivering a complete geospatial knowledge discovery framework using spatial association rule mining.

    Discrimination as a Field of Law

    Introduction: Rheumatoid arthritis (RA) is a chronic inflammatory disease. Treatment strategies emphasize early multi-professional interventions to reduce disease activity and to prevent disability, but there is a lack of knowledge on how optimal treatment can be provided to each individual patient. Aim: To elucidate how clinical manifestations of early RA are associated with disease and disability outcomes, to strive for greater potential to establish prognosis in early RA, and to facilitate implementation of decision support through analyses of the decision-making environment in chronic care. Methods: Multivariate statistics and mathematical modelling, as well as field observations and focus group interviews. Results: Decision support: A prognostic tree that predicted patients with a poor prognosis (moderate or high levels of DAS-28) at one year after diagnosis had a performance of 25% sensitivity, 90% specificity and a positive predictive value of 76%. Implementation of a decision support application at a rheumatology unit should include taking into account incentive structures, workflow and awareness, as well as informal communication structures. Prognosis: A considerable part of the variance in disease activity at one year after diagnosis could be explained by disease progression during the first three months after diagnosis. Using different types of knowledge – different expertise – prior to standardized data mining methods was found to be promising when mining (clinical) data for new patterns that elicit new knowledge. Disease and disability: Women report more fatigue than men in early RA, although the difference is not consistently significant. Fatigue in early RA is closely and rather consistently related to disease activity, pain and activity limitation, as well as to mental health and sleep disturbance. Conclusion: A decision tree was designed to identify patients at risk of poor prognosis at one year after the diagnosis of RA. 
When constructing prediction rules for good or poor prognosis, including more measures of disease and disability progressions showed promise. Using different types of knowledge – different lenses of expertise – prior to standardized data mining methods was also a promising method when mining (clinical) data for new patterns that elicit new knowledge.
Swedish summary (translated): Introduction: Rheumatoid arthritis (RA) is a chronic inflammatory disease. The current treatment strategy builds on early multi-professional interventions to reduce disease activity and lower the risk of future disability. Large amounts of data on medication and outcomes in RA are available today. These data offer opportunities to generate new knowledge that can be used to shape decision support. Aim: To investigate how different clinical manifestations in early RA covary with disability and disease activity, to test methods for establishing prognosis in early RA, and to analyse a context for decision-making in the care of the chronically ill. Methods: Multivariate statistics and mathematical modelling, as well as observational studies and focus group interviews. Results: Decision support: A decision tree was designed to determine which patients have a poor prognosis (moderate or high DAS-28) one year after diagnosis. The decision tree had 25% sensitivity, 90% specificity and a positive predictive value of 76%. When introducing decision support at a rheumatology clinic, it was found necessary to take into account incentive structures, workflow and forms of collaboration. Informal communication structures can also have a large influence on clinical practice. Prognosis: A considerable part of the variance in disease activity one year after diagnosis can be explained by disease progression during the first three months after diagnosis. Formalizing the experience of different experts before applying standardized data mining methods is a promising approach when looking for patterns in (clinical) databases.
Disability and disease activity: Women report more fatigue than men in early RA, but the difference is not consistent over time. Fatigue in early RA is closely related to disease activity, pain and activity limitations, but also to mental health and sleep disturbance. Conclusion: A decision tree has been designed to predict patients with a poor prognosis in early RA. Studies of more measures of disease and disability progression are needed when constructing prediction rules for good or poor prognosis in the future. Using knowledge from different experts (different experts' lenses) when searching for patterns in large data sets in order to generate new knowledge is a promising methodology. Decision support should be implemented with consideration of incentive structures, workflow and forms of collaboration.
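The three performance figures reported for the prognostic tree are all ratios over a 2x2 confusion table. The sketch below shows the standard definitions; the counts are hypothetical, invented only so that the ratios match the reported 25%, 90% and 76%, and are not the study's patient data.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and positive predictive value from counts.

    tp/fn: poor-prognosis patients flagged / missed by the tree.
    fp/tn: good-prognosis patients flagged / correctly cleared.
    """
    if min(tp, fn, fp, tn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("a denominator is zero; metric undefined")
    return {
        "sensitivity": tp / (tp + fn),  # share of poor-prognosis cases found
        "specificity": tn / (tn + fp),  # share of good-prognosis cases cleared
        "ppv": tp / (tp + fp),          # share of flagged cases truly poor
    }


# Hypothetical counts (assumption, not from the thesis).
metrics = diagnostic_metrics(tp=19, fn=57, fp=6, tn=54)
```

With these made-up counts the tree flags few patients, so it misses most poor-prognosis cases (low sensitivity) while most of the patients it does flag are true positives (high positive predictive value).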

    Demand Forecasting: Evidence-Based Methods

    In recent decades, much comparative testing has been conducted to determine which forecasting methods are more effective under given conditions. This evidence-based approach leads to conclusions that differ substantially from current practice. This paper summarizes the primary findings on what to do – and what not to do. When quantitative data are scarce, impose structure by using expert surveys, intentions surveys, judgmental bootstrapping, prediction markets, structured analogies, and simulated interaction. When quantitative data are abundant, use extrapolation, quantitative analogies, rule-based forecasting, and causal methods. Among causal methods, use econometrics when prior knowledge is strong, data are reliable, and few variables are important. When there are many important variables and extensive knowledge, use index models. Use structured methods to incorporate prior knowledge from experiments and experts' domain knowledge as inputs to causal forecasts. Combine forecasts from different forecasters and methods. Avoid methods that are complex, that have not been validated, and that ignore domain knowledge; these include intuition, unstructured meetings, game theory, focus groups, neural networks, stepwise regression, and data mining.
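The advice to combine forecasts can be illustrated with a simple weighted average. This is a minimal sketch, assuming equal weights by default; the abstract does not say how forecasts should be weighted, and the method names and values are hypothetical.

```python
def combine_forecasts(forecasts, weights=None):
    """Combine point forecasts from several methods into one number.

    forecasts: dict mapping a method name to its point forecast.
    weights:   optional dict with the same keys; equal weights if omitted.
    """
    if not forecasts:
        raise ValueError("need at least one forecast")
    if weights is None:
        weights = {name: 1.0 for name in forecasts}
    if set(weights) != set(forecasts):
        raise ValueError("weights and forecasts must have the same keys")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(forecasts[name] * weights[name] for name in forecasts) / total


# Hypothetical demand forecasts (units per month) from three methods.
demand = {"extrapolation": 100.0, "expert_survey": 120.0, "causal_model": 110.0}
combined = combine_forecasts(demand)
```

Equal weighting keeps the combination transparent and needs no fitted parameters; unequal weights can be passed in when there is evidence that one method is more reliable.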