A Framework for Enterprise Knowledge Discovery from Databases
Knowledge discovery from large databases has become an emerging research topic and application area in recent years, primarily because of the successful introduction of large business information systems to enterprises in the electronic business era. However, translating problems stated from a managerial perspective into data mining tasks stated from an information technology perspective requires multidisciplinary domain knowledge. This paper proposes a practical framework for systematic enterprise knowledge discovery. The six-step framework employs a cause-and-effect diagram to model enterprise processes, a tasks-and-attributes correspondence diagram to define data mining tasks, and a multi-criteria method to assess the mined results, which take the form of association rules. The research also applied the proposed framework to a real case study of knowledge discovery from service records. The mining results proved useful in product design and quality improvement, and the framework demonstrated its applicability in guiding an enterprise to discover knowledge from historical data to tackle existing problems.
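To illustrate how mined association rules might be assessed on several criteria, the following minimal sketch computes support, confidence, and lift for one rule and combines them with a weighted sum. The weighted sum, the function names, and the service records are assumptions made for illustration; the abstract does not say which multi-criteria method the paper uses.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    count_a = sum(1 for t in sets if antecedent <= t)
    count_c = sum(1 for t in sets if consequent <= t)
    count_both = sum(1 for t in sets if (antecedent | consequent) <= t)
    support = count_both / n
    confidence = count_both / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


def weighted_score(metrics, weights):
    """Hypothetical multi-criteria score: a plain weighted sum of the metrics."""
    return sum(weights[name] * metrics[name] for name in weights)


# Hypothetical service records; each record is the set of items it mentions.
records = [
    {"model_A", "overheating"},
    {"model_A", "overheating", "fan_fault"},
    {"model_B", "fan_fault"},
    {"model_A", "battery"},
]

metrics = rule_metrics(records, {"model_A"}, {"overheating"})
score = weighted_score(metrics, {"support": 0.3, "confidence": 0.5, "lift": 0.2})
print(metrics, score)
```

A lift above 1 suggests that the antecedent and consequent occur together more often than they would if they were independent, which is one reason to rank a rule higher.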
Sequential Patterns Post-processing for Structural Relation Patterns Mining
Sequential patterns mining is an important data-mining technique used to identify frequently observed sequential
occurrence of items across ordered transactions over time. It has been extensively studied in the literature, and there
exists a diversity of algorithms. However, more complex structural patterns are often hidden behind sequences.
This article begins with the introduction of a model for the representation of sequential patterns, the
Sequential Patterns Graph, which motivates the search for new structural relation patterns. An integrative
framework for the discovery of these patterns, Postsequential Patterns Mining, is then described; it underpins
the postprocessing of sequential patterns. A corresponding data-mining method based on sequential patterns
postprocessing is proposed and shown to be effective in the search for concurrent patterns. Experiments conducted
on three component algorithms demonstrate that sequential patterns-based concurrent patterns mining provides
an efficient method for structural knowledge discovery.
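As a rough illustration of what "frequently observed sequential occurrence" means, the sketch below computes the support of a candidate pattern in a small sequence database. It is a simplification that treats every sequence element as a single item rather than an itemset; the function names and the data are hypothetical, and it does not reproduce the article's Sequential Patterns Graph or its concurrent patterns mining.

```python
def contains(sequence, pattern):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the match, which enforces order.
    return all(item in remaining for item in pattern)


def support(database, pattern):
    """Fraction of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database) / len(database)


# Hypothetical sequence database: one ordered list of items per customer.
database = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["c", "a"],
]

print(support(database, ["a", "c"]))
print(support(database, ["b", "c"]))
```

A pattern is usually called frequent when its support reaches a user-chosen minimum threshold.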
Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining
Biomedical knowledge is growing at an astounding pace, with the majority of this
knowledge represented as scientific publications. Text mining tools and
methods are automatic approaches for extracting hidden patterns and
trends from this semi-structured and unstructured data. In biomedical text
mining, Literature Based Discovery (LBD) is the process of automatically
discovering novel associations between medical terms otherwise mentioned in
disjoint literature sets. LBD approaches have proven successful in reducing the
discovery time of potential associations that are hidden in the vast amount of
scientific literature. The process focuses on creating concept profiles for
medical terms such as a disease or symptom and connecting them with a drug and
treatment based on the statistical significance of the shared profiles. This
knowledge discovery approach, introduced in 1989, remains a core task in
text mining. Currently, two approaches based on the ABC principle, namely open
discovery and closed discovery, are the most explored in the LBD process. This
review starts with a general introduction to text mining, followed by biomedical
text mining, and introduces various literature resources such as MEDLINE, UMLS,
MeSH, and SemMedDB. This is followed by a brief introduction to the core ABC
principle and its two associated approaches, open discovery and closed
discovery, in the LBD process. The review also discusses deep learning
applications in LBD by reviewing the role of transformer models and neural
network based LBD models and their future aspects. Finally, it reviews the key
biomedical discoveries generated through LBD approaches in biomedicine and
concludes with the current limitations and future directions of LBD.
Comment: 43 pages, 5 figures, 4 tables
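The open discovery variant of the ABC principle can be sketched as follows: start from a term A, collect the intermediate terms B that co-occur with A, and then propose terms C that co-occur with some B but never directly with A. This is a minimal sketch under stated assumptions: the co-occurrence pairs are toy data made up for illustration, ranking by the number of shared B terms is a simplification, and real LBD systems use statistical or semantic weighting instead.

```python
from collections import defaultdict


def open_discovery(cooccurrences, start):
    """Rank candidate C terms linked to start only through intermediate B terms."""
    neighbors = defaultdict(set)
    for x, y in cooccurrences:
        neighbors[x].add(y)
        neighbors[y].add(x)

    b_terms = set(neighbors[start])  # terms that co-occur directly with A
    candidates = defaultdict(set)    # C term -> set of linking B terms
    for b in b_terms:
        for c in neighbors[b]:
            # Keep C only if it is not A itself and has no direct link to A.
            if c != start and c not in b_terms:
                candidates[c].add(b)

    # Most shared intermediates first; ties broken alphabetically.
    return sorted(candidates.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Toy co-occurrence pairs (illustrative only, not extracted from any corpus).
pairs = [
    ("raynaud", "blood_viscosity"),
    ("raynaud", "platelet_aggregation"),
    ("fish_oil", "blood_viscosity"),
    ("fish_oil", "platelet_aggregation"),
    ("aspirin", "platelet_aggregation"),
]

for term, links in open_discovery(pairs, "raynaud"):
    print(term, sorted(links))
```

Closed discovery would instead fix both A and C in advance and search only for the B terms that connect them.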
An Analysis on Different Methods of Data Mining Techniques with Its Purposes and Issues
It is fair to say that advances in technology have had a great impact on human society. They have altered the way business is done, services are provided and received, and organizations are managed. The most direct effect is the complete change in how information is collected, conveyed, and exchanged. Data mining is an emerging domain that is applied specifically to extract meaningful data from large amounts of available content. Various authors have developed prominent methods for data mining. This study reviews data mining and also gives a brief introduction to knowledge discovery in databases. Different data extraction processes are also covered by the author.
Diagnosis of diseases using data mining
Introduction: In the information age, data are the most important asset for health organizations. If data are used in a useful and optimal manner, they can become a financial resource for the organization. Data mining is an appropriate method to transform this potential value into strategic information. Data mining means the extraction of hidden information, the recognition of hidden relationships and patterns, and, in general, the discovery of useful knowledge from large volumes of data. The objective of this review paper was to evaluate the use of data mining in the diagnosis of diseases.
Methods: This research is a review paper based on a structured review of papers published in Science Direct, PubMed, Google Scholar, SID, and Magiran between 2005 and 2015, and of books on the use of data mining in medical science and in the diagnosis of diseases, searched with related keywords.
Results: Nowadays, data mining is used in many medical science studies, including the diagnosis of diseases, the discovery of hidden patterns in data, and so on. Newer approaches such as knowledge discovery in databases, which includes data mining techniques, have become more popular and have become a desired research tool for researchers. Researchers can use them to identify patterns and relationships among a great number of variables. Using them, researchers have been able to predict the outcome of a disease from the information stored in databases.
Several studies have indicated that data mining is widely used in the diagnosis of diseases based on various types of information (medical images, patient characteristics, and so on). Examples include tuberculosis, types of cancer, infectious diseases, and anomalies rarely diagnosed by humans (spots and particular points within the eye, which are a symptom of the onset of blindness resulting from diabetes), as well as determining how to treat patients, predicting the success rate of surgeries, determining the success rate of therapeutic methods in coping with incurable diseases, and so on.
Conclusion: One of the most important and challenging topics in healthcare is the transformation of raw clinical data into meaningful information, given the continuous generation of large amounts of data. In the current competitive environment, health organizations that use technologies such as data mining to improve healthcare quality will achieve success faster. Many research centers in Iran are faced with a large volume of information that is either not analyzed at all or, because traditional methods are used, is time-consuming to analyze and convert into knowledge. By implementing data mining, health organizations can transform their data into a powerful and competitive tool and take new steps in preventing, diagnosing, treating, and providing high-quality services for clients.
Data mining predictive models for pervasive intelligent decision support in intensive care medicine
The introduction of an Intelligent Decision Support System (IDSS) in a critical area like Intensive
Medicine is a complex and difficult process. In this area, professionals do not have much time to
document cases, because direct patient care always comes first. To significantly reduce
manual record keeping and, at the same time, enable the development of an IDSS that can help in
the decision making process, the whole data acquisition process and the knowledge discovery in databases
phases were automated. From data acquisition to knowledge discovery, the entire process is autonomous and
executed in real time. Data mining models induced on-line were used to predict organ failure and outcome.
Preliminary results obtained with a limited population of patients showed that this approach can be applied
successfully.
Fundação para a Ciência e a Tecnologia (FCT)
Knowledge discovery methodology for medical reports
Medical reports contain valuable information, not only for the patient who waits for the results but also in the form of latent knowledge that can be extracted from them. Recently introduced standard structured formats, such as the Digital Imaging and Communications in Medicine Structured Report and the Health Level Seven Clinical Document Architecture, provide an efficient mechanism for generation, distribution, and management. They also provide an intuitive and effective manner of representing information, unlike the traditional plain text format. In this paper we present a knowledge discovery methodology for structured report interchange based on plain text medical reports, using YALE, a leading open-source data mining tool, and the Open-ESB platform, which provides conversion, parsing, and interchange capabilities across different protocols and message formats.
Centro de Imagiologia da Trindade (CIT)
Multidimensional Prediction Models When the Resolution Context Changes
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-23525-7_31
Multidimensional data is systematically analysed at multiple granularities by applying aggregate and disaggregate operators (e.g., by the use of OLAP tools). For instance, in a supermarket we may want to predict sales of tomatoes for next week, but we may also be interested in predicting sales for all vegetables (higher up in the product hierarchy) for next Friday (lower down in the time dimension). While the domain and data are the same, the operating context is different. We explore several approaches for multidimensional data when predictions have to be made at different levels (or contexts) of aggregation. One method relies on the same resolution, another approach aggregates predictions bottom-up, a third approach disaggregates predictions top-down and a final technique corrects predictions using the relation between levels. We show how these strategies behave when the resolution context changes, using several machine learning techniques in four application domains.
This work was supported by the Spanish MINECO under grants TIN 2010-21062-C02-02 and TIN 2013-45732-C4-1-P, and the REFRAME project, granted by the European Coordinated Research on Longterm Challenges in Information and Communication Sciences Technologies ERA-Net (CHIST-ERA), and funded by MINECO in Spain (PCIN-2013-037) and by Generalitat Valenciana PROMETEOII2015/013.
Martínez Usó, A.; Hernández Orallo, J. (2015). Multidimensional Prediction Models When the Resolution Context Changes. In Machine Learning and Knowledge Discovery in Databases. Springer. 509-524. https://doi.org/10.1007/978-3-319-23525-7_31
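The bottom-up and top-down strategies mentioned in the abstract of "Multidimensional Prediction Models When the Resolution Context Changes" can be sketched with the supermarket example. In this minimal sketch, bottom-up sums the predictions made for the children of a node, and top-down splits a parent prediction among its children. Splitting in proportion to historical totals is an assumption made here, not necessarily the paper's method, and all names and numbers are hypothetical.

```python
def bottom_up(child_predictions):
    """Aggregate child-level predictions into one parent-level prediction."""
    return sum(child_predictions.values())


def top_down(parent_prediction, historical_child_totals):
    """Split a parent-level prediction using each child's historical share (assumed rule)."""
    total = sum(historical_child_totals.values())
    if total == 0:
        # No history available: fall back to an equal split.
        share = parent_prediction / len(historical_child_totals)
        return {child: share for child in historical_child_totals}
    return {
        child: parent_prediction * value / total
        for child, value in historical_child_totals.items()
    }


# Hypothetical weekly sales predictions per product (units).
product_predictions = {"tomatoes": 120, "lettuce": 80}
print(bottom_up(product_predictions))  # prediction for all vegetables

# Hypothetical prediction made directly at the vegetables level, split by past sales.
history = {"tomatoes": 600, "lettuce": 400}
print(top_down(300, history))
```

The two routes generally give different answers for the same cell, which is why the choice of strategy matters when the resolution context changes.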
Application of decision trees to the identification of patterns of fatal injuries from external causes in the municipality of Pasto, Colombia
Introduction: The Pan American Health Organization (PAHO) and the World Health Organization (WHO) recognized, in 1993 and 1996 respectively, that violence is a public health problem. This is corroborated by the report on violence and health, in which Latin America showed a homicide rate of 18 per 100,000 people and is considered one of the most violent regions in the world. Objective: To detect criminal patterns with data mining techniques in the data of the Crime Observatory of the municipality of Pasto (Colombia). Materials and methods: The Cross Industry Standard Process for Data Mining (CRISP-DM) was applied, which is one of the methodologies used in the development of data mining projects in academic and industrial environments. The source of information was the Crime Observatory of the municipality of Pasto, which stores the cleaned and transformed historical figures on injuries from external causes (fatal and nonfatal) recorded over 11 years. Results: A decision tree-based classification model was built that allowed the discovery of patterns of deaths from external causes. Homicides happened mostly in commune 5 of Pasto under the following circumstances: during weekends, in the early morning, in the second half of the year, and on public thoroughfares. The victims were adult men of various professions, the homicides were caused by quarrels, and they were committed with a firearm. Conclusion: The generated knowledge will help government and security agencies make effective decisions regarding the implementation of crime prevention and citizen security plans.
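A decision tree picks, at each node, the attribute that best separates the classes. The sketch below computes information gain, one common splitting criterion. The abstract does not state which tree algorithm or criterion was used, so this choice is an assumption, and the records are hypothetical rather than taken from the Crime Observatory.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy reduction obtained by splitting the labels on one attribute."""
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical records: class label plus two candidate attributes.
labels = ["homicide", "homicide", "accident", "accident"]
day_type = ["weekend", "weekend", "weekday", "weekday"]  # separates the classes
district = ["north", "south", "north", "south"]          # carries no information

print(information_gain(day_type, labels))
print(information_gain(district, labels))
```

A tree builder would split first on the attribute with the highest gain and then repeat the procedure within each resulting branch.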