An overview of recent distributed algorithms for learning fuzzy models in Big Data classification
Nowadays, a huge amount of data is generated, often in very short time intervals and in various formats, by a number of heterogeneous sources such as social networks and media, mobile devices, internet transactions, networked devices and sensors. These data, identified as Big Data in the literature, are characterized by the popular Vs features, such as Value, Veracity, Variety, Velocity and Volume. In particular, Value focuses on the useful knowledge that may be mined from data. Thus, in recent years, a number of data mining and machine learning algorithms have been proposed to extract knowledge from Big Data. These algorithms have generally been implemented by using ad-hoc programming paradigms, such as MapReduce, on specific distributed computing frameworks, such as Apache Hadoop and Apache Spark. In the context of Big Data, fuzzy models are currently playing a significant role, thanks to their capability of handling vague and imprecise data and their innate interpretability. In this work, we give an overview of the most recent distributed learning algorithms for generating fuzzy classification models for Big Data. In particular, we first show some design and implementation details of these learning algorithms. Thereafter, we compare them in terms of accuracy and interpretability. Finally, we discuss their scalability.
Multi-view Fuzzy Representation Learning with Rules based Model
Unsupervised multi-view representation learning has been extensively studied
for mining multi-view data. However, some critical challenges remain. On the
one hand, the existing methods cannot explore multi-view data comprehensively
since they usually learn a common representation between views, given that
multi-view data contains both the common information between views and the
specific information within each view. On the other hand, to mine the nonlinear
relationship between data, kernel or neural network methods are commonly used
for multi-view representation learning. However, these methods are lacking in
interpretability. To this end, this paper proposes a new multi-view fuzzy
representation learning method based on the interpretable Takagi-Sugeno-Kang
(TSK) fuzzy system (MVRL_FS). The method realizes multi-view representation
learning from two aspects. First, multi-view data are transformed into a
high-dimensional fuzzy feature space, while the common information between
views and specific information of each view are explored simultaneously.
Second, a new regularization method based on L_(2,1)-norm regression is
proposed to mine the consistency information between views, while the geometric
structure of the data is preserved through the Laplacian graph. Finally,
extensive experiments on many benchmark multi-view datasets are conducted to
validate the superiority of the proposed method.
Comment: This work has been accepted by IEEE Transactions on Knowledge and
Data Engineering.
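The two ingredients named in this abstract, a TSK-style fuzzy feature mapping and an L_(2,1)-norm regularizer, can be illustrated with a minimal sketch. It is not the MVRL_FS implementation: the Gaussian antecedents, the first-order feature layout, and the function names `firing_strengths`, `tsk_fuzzy_features` and `l21_norm` are assumptions made for illustration only.

```python
import math


def firing_strengths(x, centers, widths):
    """Normalized firing strengths of Gaussian-antecedent rules (assumed form)."""
    raw = []
    for c, s in zip(centers, widths):
        # Product of per-dimension Gaussian memberships, written as one exponent.
        exponent = sum((xi - ci) ** 2 / (2.0 * si ** 2)
                       for xi, ci, si in zip(x, c, s))
        raw.append(math.exp(-exponent))
    total = sum(raw)
    return [r / total for r in raw]


def tsk_fuzzy_features(x, centers, widths):
    """Map x to a K*(d+1)-dimensional feature vector for K rules and d inputs.

    Each rule contributes its firing strength times [1, x_1, ..., x_d],
    a common first-order TSK construction (an assumption here).
    """
    features = []
    for mu in firing_strengths(x, centers, widths):
        features.append(mu)
        features.extend(mu * xi for xi in x)
    return features


def l21_norm(matrix):
    """L_(2,1)-norm: the sum of the Euclidean norms of the rows."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


# Hypothetical toy values: two rules over a two-dimensional input.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [[1.0, 1.0], [1.0, 1.0]]
features = tsk_fuzzy_features([0.2, 0.7], centers, widths)
penalty = l21_norm([[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]])
```

Because the L_(2,1)-norm sums row norms without squaring them, using it as a penalty tends to drive entire rows of a weight matrix to zero, which is why it is often used to select features consistently across outputs.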
State-of-the-art on research and applications of machine learning in the building life cycle
Fueled by big data, powerful and affordable computing resources, and advanced algorithms, machine learning has been explored and applied to buildings research for the past decades and has demonstrated its potential to enhance building performance. This study systematically surveyed how machine learning has been applied at different stages of the building life cycle. By conducting a literature search on the Web of Knowledge platform, we found 9579 papers in this field and selected 153 papers for an in-depth review. The number of published papers is increasing year by year, with a focus on building design, operation, and control. However, no study was found using machine learning in building commissioning. There are successful pilot studies on fault detection and diagnosis of HVAC equipment and systems, load prediction, energy baseline estimation, load shape clustering, occupancy prediction, and learning occupant behaviors and energy use patterns. None of the existing studies were adopted broadly by the building industry, due to common challenges including (1) lack of large-scale labeled data to train and validate the model, (2) lack of model transferability, which limits a model trained with one data-rich building from being used in another building with limited data, (3) lack of strong justification of costs and benefits of deploying machine learning, and (4) the performance might not be reliable and robust for the stated goals, as the method might work for some buildings but could not be generalized to others. Findings from the study can inform future machine learning research to improve occupant comfort, energy efficiency, demand flexibility, and resilience of buildings, as well as inspire young researchers in the field to explore multidisciplinary approaches that integrate building science, computing science, data science, and social science.
Uncertainty and Interpretability Studies in Soft Computing with an Application to Complex Manufacturing Systems
In systems modelling and control theory, the benefits of applying neural networks have been extensively studied, particularly in manufacturing processes, such as the prediction of mechanical properties of heat-treated steels. However, modern industrial processes usually involve large amounts of data and a range of non-linear effects and interactions that might hinder their model interpretation. For example, in steel manufacturing the understanding of complex mechanisms that lead to the mechanical properties which are generated by the heat treatment process is vital. This knowledge is not available via numerical models; therefore, an experienced metallurgist estimates the model parameters to obtain the required properties. This human knowledge and perception can sometimes be imprecise, leading to a kind of cognitive uncertainty such as vagueness and ambiguity when making decisions. In system classification, this may be translated into a system deficiency: for example, small input changes in system attributes may result in a sudden and inappropriate change in class assignment.
In order to address this issue, practitioners and researchers have developed systems that are functionally equivalent to fuzzy systems and neural networks. Such systems provide a morphology that mimics the human ability of reasoning via the qualitative aspects of fuzzy information rather than by its quantitative analysis. Furthermore, these models are able to learn from data sets and to describe the associated interactions and non-linearities in the data. However, like neural networks, a neural fuzzy system may suffer from a loss of interpretability and transparency when making decisions. This is mainly due to the application of adaptive approaches for its parameter identification.
Since the RBF-NN can be treated as a fuzzy inference engine, this thesis presents several methodologies that quantify different types of uncertainty and their influence on the model interpretability and transparency of the RBF-NN during its parameter identification. Particularly, three kinds of uncertainty sources in relation to the RBF-NN are studied, namely: entropy, fuzziness and ambiguity.
First, a methodology based on Granular Computing (GrC), neutrosophic sets and the RBF-NN is presented. The objective of this methodology is to quantify the hesitation produced during the granular compression at the low level of interpretability of the RBF-NN via the use of neutrosophic sets. This study also aims to enhance the distinguishability and hence the transparency of the initial fuzzy partition. The effectiveness of the proposed methodology is tested against a real case study for the prediction of the properties of heat-treated steels.
Secondly, a new Interval Type-2 Radial Basis Function Neural Network (IT2-RBF-NN) is introduced as a new modelling framework. The IT2-RBF-NN takes advantage of the functional equivalence between FLSs of type-1 and the RBF-NN so as to construct an Interval Type-2 Fuzzy Logic System (IT2-FLS) that is able to deal with linguistic uncertainty and perceptions in the RBF-NN rule base. This gave rise to different combinations when optimising the IT2-RBF-NN parameters.
Finally, a twofold study for uncertainty assessment at the high level of interpretability of the RBF-NN is provided. On the one hand, the first study proposes a new methodology to quantify (a) the fuzziness and (b) the ambiguity at each RU, and during the formation of the rule base, via the use of neutrosophic set theory. The aim of this methodology is to calculate the associated fuzziness of each rule and then the ambiguity related to each normalised consequence of the fuzzy rules that result from the overlapping and to the choice with one-to-many decisions, respectively. On the other hand, a second study proposes a new methodology to quantify the entropy and the fuzziness that arise from the redundancy phenomenon during the parameter identification.
To conclude this work, the experimental results obtained through the application of the proposed methodologies for modelling two well-known benchmark data sets and for the prediction of mechanical properties of heat-treated steels led to the publication of three articles in two peer-reviewed journals and one international conference.
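The functional equivalence between the RBF-NN and a type-1 fuzzy system that this abstract relies on can be checked numerically with a small sketch. It assumes a normalized RBF network with Gaussian units sharing a single width, and a zero-order fuzzy system with Gaussian membership functions, product inference and weighted-average defuzzification. The function names and toy values are hypothetical and are not taken from the thesis.

```python
import math


def rbf_output(x, centers, width, weights):
    """Normalized RBF network: weighted average of Gaussian unit activations."""
    activations = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * width ** 2))
        for c in centers
    ]
    return sum(w * a for w, a in zip(weights, activations)) / sum(activations)


def fuzzy_output(x, centers, width, weights):
    """Zero-order fuzzy system: one rule per centre, constant consequents."""
    strengths = []
    for c in centers:
        strength = 1.0
        for xi, ci in zip(x, c):
            # Gaussian membership per input, combined with the product t-norm.
            strength *= math.exp(-((xi - ci) ** 2) / (2.0 * width ** 2))
        strengths.append(strength)
    # Weighted-average defuzzification over the rule consequents.
    return sum(w * s for w, s in zip(weights, strengths)) / sum(strengths)


# Hypothetical toy model: two units/rules over a two-dimensional input.
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [0.0, 1.0]
x = [0.3, 0.9]
y_rbf = rbf_output(x, centers, 0.5, weights)
y_fuzzy = fuzzy_output(x, centers, 0.5, weights)
```

Under these assumptions the two outputs agree up to floating-point error, because the product of per-dimension Gaussian memberships equals a single multivariate Gaussian of the same width. Each hidden unit can therefore be read as one fuzzy rule.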
Developed Clustering Algorithms for Engineering Applications: A Review
Clustering algorithms play a pivotal role in the field of engineering, offering valuable insights into complex datasets. This review paper explores the landscape of developed clustering algorithms with a focus on their applications in engineering. The introduction provides context for the significance of clustering algorithms, setting the stage for an in-depth exploration. The overview section delineates fundamental clustering concepts and elucidates the workings of these algorithms. Categorization of clustering algorithms into partitional, hierarchical, and density-based forms lays the groundwork for a comprehensive discussion. The core of the paper delves into an extensive review of clustering algorithms tailored for engineering applications. Each algorithm is scrutinized in a dedicated subsection, unraveling its specific contributions, applications, and advantages. A comparative analysis assesses the performance of these algorithms, delineating their strengths and limitations. Trends and advancements in the realm of clustering algorithms for engineering applications are thoroughly examined. The review concludes with a reflection on the challenges faced by existing clustering algorithms and proposes avenues for future research. This paper aims to provide a valuable resource for researchers, engineers, and practitioners, guiding them in the selection and application of clustering algorithms for diverse engineering scenarios.
Harnessing Deep Learning Techniques for Text Clustering and Document Categorization
This research paper delves into the realm of deep text clustering algorithms with the aim of enhancing the accuracy of document classification. In recent years, the fusion of deep learning techniques and text clustering has shown promise in extracting meaningful patterns and representations from textual data. This paper provides an in-depth exploration of various deep text clustering methodologies, assessing their efficacy in improving document classification accuracy. Delving into the core of deep text clustering, the paper investigates various feature representation techniques, ranging from conventional word embeddings to contextual embeddings furnished by BERT and GPT models. By critically reviewing and comparing these algorithms, we shed light on their strengths, limitations, and potential applications. Through this comprehensive study, we offer insights into the evolving landscape of document analysis and classification, driven by the power of deep text clustering algorithms. Through an original synthesis of existing literature, this research serves as a beacon for researchers and practitioners in harnessing the prowess of deep learning to enhance the accuracy of document classification endeavors.
Application of Computational Intelligence Techniques to Process Industry Problems
In the last two decades there has been considerable progress in the computational
intelligence research field. The fruits of the effort spent on research in this
field are powerful techniques for pattern recognition, data mining, data modelling, etc.
These techniques achieve high performance on traditional data sets like the UCI
machine learning database. Unfortunately, data sources of this kind usually contain
clean data without problems such as outliers, missing values, feature co-linearity,
etc. that are common in real-life industrial data. Faulty data samples can have
very harmful effects on the models: for example, if presented during training,
they can either cause sub-optimal performance of the trained model or, in the worst
case, destroy the knowledge the model has learnt so far. For these reasons the application
of present modelling techniques to industrial problems has developed into a research
field on its own. Based on the discussion of the properties and issues of the data and the
state-of-the-art modelling techniques in the process industry, in this paper a novel
unified approach to the development of predictive models in the process industry is
presented.