Search CORE

3,479 research outputs found

An overview of recent distributed algorithms for learning fuzzy models in Big Data classification

Author: Francesco Marcelloni
Michela Fazzolari
Pietro Ducange
Publication date: 10/03/2020

Nowadays, huge amounts of data are generated, often in very short time intervals and in various formats, by heterogeneous sources such as social networks and media, mobile devices, internet transactions, networked devices and sensors. These data, identified as Big Data in the literature, are characterized by the popular Vs features, such as Value, Veracity, Variety, Velocity and Volume. In particular, Value focuses on the useful knowledge that may be mined from data. Thus, in recent years, a number of data mining and machine learning algorithms have been proposed to extract knowledge from Big Data. These algorithms have generally been implemented using ad hoc programming paradigms, such as MapReduce, on specific distributed computing frameworks, such as Apache Hadoop and Apache Spark. In the context of Big Data, fuzzy models currently play a significant role, thanks to their ability to handle vague and imprecise data and their inherent interpretability. In this work, we give an overview of the most recent distributed learning algorithms for generating fuzzy classification models for Big Data. In particular, we first show some design and implementation details of these learning algorithms. Thereafter, we compare them in terms of accuracy and interpretability. Finally, we discuss their scalability.

Open Access Repository

Encapsulation of Soft Computing Approaches within Itemset Mining - A Survey

Author: Dr. Jyothi Pillai
O.P.Vyas
Publication venue: Global Journals Inc. (US)
Publication date: 07/06/2012

Data Mining discovers patterns and trends by extracting knowledge from large databases. Soft Computing techniques such as fuzzy logic, neural networks, genetic algorithms and rough sets aim to exploit the tolerance for imprecision and uncertainty to achieve tractability, robustness and low-cost solutions. Fuzzy logic and rough sets are suitable for handling different types of uncertainty. Neural networks provide good learning and generalization. Genetic algorithms provide efficient search algorithms for selecting a model from mixed media data. Data mining refers to information extraction, while soft computing is used for information processing. For effective knowledge discovery from large databases, Soft Computing and Data Mining can be merged. Association rule mining (ARM) and itemset mining focus on finding the most frequent itemsets and the corresponding association rules, extracting rare itemsets, and incorporating temporal and fuzzy concepts into the discovered patterns. This survey paper explores the usage of soft computing approaches in itemset utility mining.
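As an illustration of the frequent itemset and association rule notions mentioned in this abstract, the following is a minimal sketch, not the survey's method. It uses the standard definitions of support and confidence; the toy transactions, the function names and the brute-force enumeration are assumptions made purely for the example.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force enumeration of itemsets whose support meets the threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)

# Hypothetical toy data, not taken from the survey.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

frequent = frequent_itemsets(transactions, min_support=0.5)
rule_conf = confidence({"bread"}, {"milk"}, transactions)
```

Practical miners such as Apriori or FP-growth prune the candidate space instead of enumerating it; the exhaustive loop here only keeps the definitions visible.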

Global Journal of Computer Science and Technology (GJCST)

Intelligent XML Tag Classification Techniques for XML Encryption Improvement

Author: Ammari Faisal
Joan Lu
Maher Abur-rous
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2011

Flexibility, friendliness, and adaptability have been key reasons for using XML to exchange information across different networks, as it provides the common syntax needed by various messaging systems. However, the extensive use of XML as a communication medium has drawn attention to the security standards used to protect exchanged messages and achieve data confidentiality and privacy. This research presents a novel approach to securing XML messages used in various systems efficiently, providing both strong security and high performance. The system model is based on two major modules: the first classifies XML messages and defines which parts of a message are to be secured by assigning an importance level to each tag in the XML message; the second uses the XML Encryption standard proposed earlier by W3C [3] to partially encrypt the parts selected in the classification stage. As a result, the study aims to improve both the performance of the XML encryption process and bulk message handling to achieve data cleansing efficiently.
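The classification stage described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the importance table, the threshold, the tag names and the function names are hypothetical, a fixed lookup stands in for whatever classifier the paper actually uses, and the encryption step itself is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical importance levels per tag name (higher means more sensitive).
# A real system would derive these from a classifier rather than a fixed table.
IMPORTANCE = {"card_number": 3, "name": 2, "amount": 2, "note": 1}

def classify(root, importance, default=0):
    """Pair every element (in document order) with its importance level."""
    return [(elem, importance.get(elem.tag, default)) for elem in root.iter()]

def select_for_encryption(root, importance, threshold):
    """Return the tags whose importance level meets the threshold."""
    return [elem.tag for elem, level in classify(root, importance)
            if level >= threshold]

# Hypothetical message used only for this example.
message = ET.fromstring(
    "<payment><name>Ann</name><card_number>4111</card_number>"
    "<amount>10</amount><note>gift</note></payment>"
)

# Only these elements would be handed to a partial-encryption step.
selected = select_for_encryption(message, IMPORTANCE, threshold=2)
```

With the table above, the `note` element and the `payment` root fall below the threshold and would be left in plain text, which is where the performance gain of partial encryption would come from.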

Crossref

University of Huddersfield Repository

Huddersfield Research Portal

Data mining in manufacturing: a review based on the kind of knowledge

Author: Alok Choudhary
Jennifer Harding
Manoj K. Tiwari
Publication date: 01/01/2009

In modern manufacturing environments, vast amounts of data are collected in database management systems and data warehouses from all involved areas, including product and process design, assembly, materials planning, quality control, scheduling, maintenance, fault detection etc. Data mining has emerged as an important tool for knowledge acquisition from the manufacturing databases. This paper reviews the literature dealing with knowledge discovery and data mining applications in the broad domain of manufacturing with a special emphasis on the type of functions to be performed on the data. The major data mining functions to be performed include characterization and description, association, classification, prediction, clustering and evolution analysis. The papers reviewed have therefore been categorized in these five categories. It has been shown that there is a rapid growth in the application of data mining in the context of manufacturing processes and enterprises in the last 3 years. This review reveals the progressive applications and existing gaps identified in the context of data mining in manufacturing. A novel text mining approach has also been used on the abstracts and keywords of 150 papers to identify the research gaps and find the linkages between knowledge area, knowledge type and the applied data mining tools and techniques

Loughborough University Institutional Repository

A study on the personalization methods of the web

Author: Deramgozin M.M.
Faridpour M
Hajighorbani M
Reza Hashemi S.M.
Publication venue: African Journals Online (AJOL)
Publication date: 22/08/2016

Search engine personalization is one of the various deep personalization methods. Personalization systems that help users find the information they need require contextual and semantic information analysis techniques from the field of data recovery, such as web personalization and the process of optimizing the ways of reaching web pages so that they are consistent with the needs of each user. What helps with the current problem of search engines and accelerates their performance is a proper framework for finding the correct patterns, taking into account the large number of items in users' histories. This approach also improves the advising process of search engines. The aim of this paper is to introduce and analyze some methods for improving the discovery of correct patterns. We discuss the basic concepts of web personalization, consider three approaches to web personalization, and evaluate the methods belonging to each of them.

Keywords: personalization, search engine, user preferences, data mining method

AJOL - African Journals Online

Big data analytics for preventive medicine

Author: Imran M
Razzak MI
Xu G
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Medical data is one of the most rewarding and yet most complicated data to analyze. How can healthcare providers use modern data analytics tools and technologies to analyze and create value from complex data? Data analytics promises to efficiently discover valuable patterns by analyzing large amounts of unstructured, heterogeneous, non-standard and incomplete healthcare data. It not only forecasts but also helps in decision making, and is increasingly regarded as a breakthrough in ongoing advancement, with the goal of improving the quality of patient care and reducing healthcare costs. The aim of this study is to provide a comprehensive and structured overview of extensive research on the advancement of data analytics methods for disease prevention. This review first introduces disease prevention and its challenges, followed by traditional prevention methodologies. We summarize state-of-the-art data analytics algorithms used for classification of disease, clustering (unusually high incidence of a particular disease), anomaly detection (detection of disease) and association, as well as their respective advantages, drawbacks and guidelines for selecting a specific model, followed by a discussion of recent developments and successful applications of disease prevention methods. The article concludes with open research challenges and recommendations.

Deakin Research Online

OPUS - University of Technology Sydney

Federation ResearchOnline

Data Mining for Marketing

Author: Khan Rafi Ahmad
Publication venue: Journal of Marketing and Consumer Research
Publication date: 29/03/2015

This paper gives a brief insight into data mining, its process and the various techniques used for it in the field of marketing. Data mining is the process of extracting hidden valuable information from the data in given data sets. In this paper, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is explained along with the various techniques used for it. With the volume of data growing every day, the need for data mining in marketing is also increasing. It is a powerful technology that helps companies focus on the most important information in their data warehouses. Data mining is the process of collecting data from different sources, interpreting it and finally converting it into useful information that helps increase revenue and curtail costs, thereby providing a competitive edge to the organisation.

International Institute for Science, Technology and Education (IISTE): E-Journals

Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

Author: He Yuanchen
Publication venue: ScholarWorks @ Georgia State University
Publication date: 04/12/2006

Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are highly needed to help humans understand the inherent mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy. Hence, a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis due to their easy interpretability. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression data. With fuzzy granulation, information loss in the process of gene selection is decreased. As a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification. Hence, we expect that the selected genes can be more helpful for further biological studies.

ScholarWorks @ Georgia State University