Taming Wild High Dimensional Text Data with a Fuzzy Lash
The bag of words (BOW) model represents a corpus as a matrix whose elements are word frequencies. However, each row of this matrix is a very high-dimensional, sparse vector. Dimension reduction (DR) is a popular way to address the sparsity and high-dimensionality issues. Among the different strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular one that maps all words onto a new basis to represent the BOW. The recent growth of text data and its accompanying challenges imply that the DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy has been developed, the fuzzy approach has not been considered for DR under this strategy. This research investigates the application of fuzzy clustering as a UFT-based DR method that collapses the BOW matrix to provide a lower-dimensional representation of the documents in a corpus rather than of its words. The quantitative evaluation shows that fuzzy clustering produces performance and features superior to Principal Component Analysis (PCA) and Singular Value Decomposition (SVD), two popular DR methods based on the UFT strategy.
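To make the UFT idea concrete, the sketch below (not the paper's implementation) clusters the words of a toy document-term matrix with a basic fuzzy c-means routine and uses the resulting membership matrix as the new basis. The toy matrix, the number of clusters `k`, and the fuzzifier `m` are illustrative assumptions.

```python
import numpy as np

def fuzzy_cmeans(points, k, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Basic fuzzy c-means: returns a (n_points, k) membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((points.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m                                          # fuzzified memberships
        centers = (W.T @ points) / W.sum(axis=0)[:, None]   # (k, n_features)
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = dist ** (-2.0 / (m - 1.0))                  # standard FCM update
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.max(np.abs(U_new - U)) < tol:
            return U_new
        U = U_new
    return U

# Toy BOW matrix: rows = documents, columns = words (frequencies).
bow = np.array([[3, 0, 1, 0],
                [2, 1, 0, 0],
                [0, 4, 0, 2],
                [0, 3, 1, 1]], dtype=float)

# UFT step: cluster the *words* (columns of the BOW), then map every document
# onto the k fuzzy word clusters instead of the original vocabulary.
word_vectors = bow.T                     # one row per word
U = fuzzy_cmeans(word_vectors, k=2)      # (n_words, k) fuzzy memberships
docs_reduced = bow @ U                   # (n_docs, k) lower-dimensional documents
print(docs_reduced)
```

The columns of `U` play the role of the new basis; for the comparison the abstract describes, PCA or SVD from scikit-learn could be dropped into the same place to produce an alternative low-dimensional document representation.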
Machine Learning and Integrative Analysis of Biomedical Big Data
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
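As a hedged illustration of how several of the challenges named above are often handled together (not a method from the review itself), the sketch below concatenates two hypothetical omics blocks (early integration), imputes missing values, scales features, reduces dimensionality, and counters class imbalance with class weighting. All names, shapes, and values are made up.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 60

# Hypothetical omics blocks measured on the same patients.
gene_expr = rng.normal(size=(n_samples, 500))     # transcriptome features
methylation = rng.normal(size=(n_samples, 300))   # epigenome features
methylation[rng.random(methylation.shape) < 0.05] = np.nan  # simulate missing data

# Imbalanced clinical label (e.g., responder vs. non-responder).
y = np.zeros(n_samples, dtype=int)
y[:12] = 1

# Early integration: simple feature concatenation across modalities.
X = np.hstack([gene_expr, methylation])

model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),          # missing data
    ("scale", StandardScaler()),                            # heterogeneous scales
    ("reduce", PCA(n_components=20)),                       # curse of dimensionality
    ("clf", LogisticRegression(class_weight="balanced",     # class imbalance
                               max_iter=1000)),
])

print(cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```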
A Comparison Analysis of Machine Learning Algorithms on Cardiovascular Disease Prediction
People nowadays are engrossed in their daily routines, concentrating on their jobs and other responsibilities while ignoring their health. Because of their hurried lifestyles and disregard for their health, the number of people becoming ill grows daily. Furthermore, a large portion of the population suffers from diseases such as cardiovascular disease. According to the WHO, cardiovascular disease accounts for roughly 35% of deaths worldwide. A person's life can be saved if a heart disease diagnosis is made early enough, but it can also be lost if the diagnosis is incorrect. Therefore, predicting heart disease is becoming increasingly relevant in the medical sector. The volume of data collected by the medical industry and hospitals, on the other hand, can be overwhelming at times. Time-series forecasting and processing using machine learning algorithms can help healthcare practitioners become more efficient. In this study, we discuss heart disease and its risk factors, review machine learning techniques, and compare various heart disease prediction algorithms. Predicting and assessing heart disease is the goal of this research.
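A minimal sketch of the kind of algorithm comparison described above, not the paper's exact protocol: several common classifiers are scored by cross-validation on a synthetic stand-in for a heart disease table (in practice one would load a dataset such as the UCI heart disease data instead).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for a heart disease table (age, blood pressure, cholesterol, ...).
X, y = make_classification(n_samples=300, n_features=13, n_informative=8,
                           random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm_rbf": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "knn": KNeighborsClassifier(n_neighbors=7),
}

# 5-fold cross-validated accuracy for each candidate algorithm.
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
    print(f"{name:20s} {scores.mean():.3f} +/- {scores.std():.3f}")
```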
A Survey on Feature Selection Algorithms
One major component of machine learning is feature analysis, which comprises two main processes: feature selection and feature extraction. Due to its applications in several areas, including data mining, soft computing and big data analysis, feature selection has gained considerable importance. This paper presents an introductory treatment of feature selection and its various inherent approaches. The paper surveys historic developments reported in feature selection with supervised and unsupervised methods. Recent state-of-the-art developments in ongoing feature selection algorithms, including their hybridizations, are also summarized.
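As a hedged sketch of two families of feature selection covered by such surveys (a filter method and a wrapper method, chosen here for illustration rather than taken from the paper), the snippet below ranks features by mutual information and, separately, by recursive elimination around a supervised model; the synthetic data stands in for any labeled dataset.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)

# Filter approach: rank features by mutual information with the class label.
filter_selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
print("filter picks:", sorted(filter_selector.get_support(indices=True)))

# Wrapper approach: recursive feature elimination around a supervised model.
wrapper_selector = RFE(LogisticRegression(max_iter=1000),
                       n_features_to_select=5).fit(X, y)
print("wrapper picks:", sorted(wrapper_selector.get_support(indices=True)))
```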
DOI: 10.17762/ijritcc2321-8169.16043
Analysis of Dimensionality Reduction Techniques on Big Data
Due to digitization, a huge volume of data is being generated across several sectors such as healthcare, production, sales, IoT devices, the Web, and organizations. Machine learning algorithms are used to uncover patterns among the attributes of this data, and hence to make predictions that medical practitioners and managers can use for executive decisions. Not all the attributes in the generated datasets are important for training the machine learning algorithms: some attributes might be irrelevant, and some might not affect the outcome of the prediction. Ignoring or removing these irrelevant or less important attributes reduces the burden on machine learning algorithms. In this work, two prominent dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), are investigated with four popular Machine Learning (ML) algorithms, Decision Tree Induction, Support Vector Machine (SVM), Naive Bayes Classifier, and Random Forest Classifier, using the publicly available Cardiotocography (CTG) dataset from the University of California, Irvine Machine Learning Repository. The experimental results show that PCA outperforms LDA on all measures. Also, the performance of the Decision Tree and Random Forest classifiers is not affected much by using PCA or LDA. To further analyze the performance of PCA and LDA, the experimentation is carried out on Diabetic Retinopathy (DR) and Intrusion Detection System (IDS) datasets. The results show that ML algorithms with PCA produce better results when the dimensionality of the datasets is high; when the dimensionality is low, the ML algorithms without dimensionality reduction yield better results.
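A hedged sketch of the comparison protocol described above (not the authors' code): both reduction techniques are placed in front of the same classifiers and scored by cross-validation. A built-in multi-class dataset stands in for the CTG data, which would be loaded separately; the component counts are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in multi-class dataset; swap in the CTG features/labels in practice.
X, y = load_digits(return_X_y=True)

reducers = {
    "PCA": PCA(n_components=9),
    # LDA can keep at most (n_classes - 1) components.
    "LDA": LinearDiscriminantAnalysis(n_components=9),
}
classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(random_state=0),
}

for red_name, reducer in reducers.items():
    for clf_name, clf in classifiers.items():
        pipe = make_pipeline(StandardScaler(), reducer, clf)
        acc = cross_val_score(pipe, X, y, cv=5).mean()
        print(f"{red_name} + {clf_name:15s} accuracy = {acc:.3f}")
```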
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL models.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.
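Challenge (vi), transfer learning, is commonly tackled in RS by fine-tuning an ImageNet-pretrained CNN on a small scene-classification set. The sketch below is an illustration of that general recipe, not material from the survey: it freezes a ResNet-18 backbone and retrains only a new classification head. The number of classes and the `train_loader` are placeholders.

```python
import torch
from torch import nn, optim
from torchvision import models

NUM_CLASSES = 10          # e.g., land-use scene categories (placeholder)

# Start from ImageNet weights and freeze the convolutional backbone.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new RS classification head.
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CLASSES)

optimizer = optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# `train_loader` is assumed to exist and yield (image_batch, label_batch)
# from an RS scene dataset such as UC Merced or EuroSAT.
def train_one_epoch(train_loader):
    backbone.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(backbone(images), labels)
        loss.backward()
        optimizer.step()
```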
Detection of Epileptic Seizures on EEG Signals Using ANFIS Classifier, Autoencoders and Fuzzy Entropies
Epileptic seizures are among the most serious neurological disorders, and their early diagnosis helps clinicians provide accurate treatment for patients. Electroencephalogram (EEG) signals are widely used for epileptic seizure detection, as they provide specialists with substantial information about the functioning of the brain. In this paper, a novel diagnostic procedure using fuzzy theory and deep learning techniques is introduced. The proposed method is evaluated on the Bonn University dataset with six classification combinations and also on the Freiburg dataset. The tunable-Q wavelet transform (TQWT) is employed to decompose the EEG signals into different sub-bands. In the feature extraction step, 13 different fuzzy entropies are calculated from the TQWT sub-bands, and their computational complexities are reported to help researchers choose the best set for various tasks. Next, an autoencoder (AE) with six layers is employed for dimensionality reduction. Finally, the standard adaptive neuro-fuzzy inference system (ANFIS), as well as its variants optimized with the grasshopper optimization algorithm (ANFIS-GOA), particle swarm optimization (ANFIS-PSO), and breeding swarm optimization (ANFIS-BS), are used for classification. With the proposed method, ANFIS-BS achieves an accuracy of 99.7%.
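As a hedged sketch of one fuzzy entropy variant (the paper computes 13 different ones; this is only the common FuzzyEn formulation and not necessarily any of theirs), the function below scores the regularity of a sub-band signal using exponential fuzzy similarity between embedded template vectors; the parameter defaults are typical choices, not the paper's.

```python
import numpy as np

def fuzzy_entropy(signal, m=2, r=0.2, n=2):
    """FuzzyEn of a 1-D signal: m = embedding dimension,
    r = tolerance (as a fraction of the signal's std), n = fuzzy power."""
    x = np.asarray(signal, dtype=float)
    N = len(x)
    tol = r * np.std(x)

    def phi(dim):
        # Baseline-removed template vectors of length `dim` (N - m of them).
        templates = np.array([x[i:i + dim] - x[i:i + dim].mean()
                              for i in range(N - m)])
        # Chebyshev distance between every pair of templates.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Exponential fuzzy similarity; ignore self-comparisons on the diagonal.
        sim = np.exp(-(dist ** n) / tol)
        np.fill_diagonal(sim, 0.0)
        return sim.sum() / ((N - m) * (N - m - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 8 * np.pi, 500))   # predictable, low entropy
noisy = rng.standard_normal(500)                   # irregular, high entropy
print(fuzzy_entropy(regular), fuzzy_entropy(noisy))
```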