17,184 research outputs found
Semantic process mining tools: core building blocks
Process mining aims at discovering new knowledge based on information hidden in event logs. Two important enablers for such analysis are powerful process mining techniques and the omnipresence of event logs in today's information systems. Most information systems supporting (structured) business processes (e.g. ERP, CRM, and workflow systems) record events in some form (e.g. transaction logs, audit trails, and database tables). Process mining techniques use event logs for all kinds of analysis, e.g., auditing, performance analysis, process discovery, etc. Although current process mining techniques/tools are quite mature, the analysis they support is somewhat limited because it is purely based on labels in logs. This means that these techniques cannot benefit from the actual semantics behind these labels which could cater for more accurate and robust analysis techniques. Existing analysis techniques are purely syntax oriented, i.e., much time is spent on filtering, translating, interpreting, and modifying event logs given a particular question. This paper presents the core building blocks necessary to enable semantic process mining techniques/tools. Although the approach is highly generic, we focus on a particular process mining technique and show how this technique can be extended and implemented in the ProM framework tool
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed
Recommended from our members
A mass spectrometry-guided genome mining approach for natural product peptidogenomics.
Peptide natural products show broad biological properties and are commonly produced by orthogonal ribosomal and nonribosomal pathways in prokaryotes and eukaryotes. To harvest this large and diverse resource of bioactive molecules, we introduce here natural product peptidogenomics (NPP), a new MS-guided genome-mining method that connects the chemotypes of peptide natural products to their biosynthetic gene clusters by iteratively matching de novo tandem MS (MS(n)) structures to genomics-based structures following biosynthetic logic. In this study, we show that NPP enabled the rapid characterization of over ten chemically diverse ribosomal and nonribosomal peptide natural products of previously unidentified composition from Streptomycete bacteria as a proof of concept to begin automating the genome-mining process. We show the identification of lantipeptides, lasso peptides, linardins, formylated peptides and lipopeptides, many of which are from well-characterized model Streptomycetes, highlighting the power of NPP in the discovery of new peptide natural products from even intensely studied organisms
Supporting Data mining of large databases by visual feedback queries
In this paper, we describe a query system that provides visual relevance feedback in querying large databases. Our goal is to support the process of data mining by representing as many data items as possible on the display. By arranging and coloring the data items as pixels according to their relevance for the query, the user gets a visual impression of the resulting data set. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback by the visual representation of the resulting data set. Furthermore, by using multiple windows for different parts of a complex query, the user gets visual feedback for each part of the query and, therefore, may easier understand the overall result. Our system allows to represent the largest amount of data that can be visualized on current display technology, provides valuable feedback in querying the database, and allows the user to find results which, otherwise, would remain hidden in the database
Using Visualization to Support Data Mining of Large Existing Databases
In this paper. we present ideas how visualization technology can be used to improve the difficult process of querying very large databases. With our VisDB system, we try to provide visual support not only for the query specification process. but also for evaluating query results and. thereafter, refining the query accordingly. The main idea of our system is to represent as many data items as possible by the pixels of the display device. By arranging and coloring the pixels according to the relevance for the query, the user gets a visual impression of the resulting data set and of its relevance for the query. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback by the visual representation of the resulting data set. By using multiple windows for different parts of the query, the user gets visual feedback for each part of the query and, therefore, may easier understand the overall result. To support complex queries, we introduce the notion of approximate joins which allow the user to find data items that only approximately fulfill join conditions. We also present ideas how our technique may be extended to support the interoperation of heterogeneous databases. Finally, we discuss the performance problems that are caused by interfacing to existing database systems and present ideas to solve these problems by using data structures supporting a multidimensional search of the database
- …