1,221 research outputs found
Noise resistant generalized parametric validity index of clustering for gene expression data
This article has been made available through the Brunel Open Access Publishing Fund. Validity indices have been investigated for decades. However, since the noise-resistance performance of these indices has not been studied in the literature, there is no guideline for determining the best clustering in noisy data sets, especially microarray data sets. In this paper, we propose a generalized parametric validity (GPV) index, which employs two tunable parameters, α and β, to control the proportions of objects considered when calculating the dissimilarities. The greatest advantage of the proposed GPV index is its noise-resistance ability, which results from the flexibility of tuning the parameters. Several rules are set to guide the selection of parameter values. To illustrate the noise-resistance performance of the proposed index, we use the GPV index to assess five clustering algorithms in two gene expression data simulation models with different noise levels, and we compare its ability to determine the number of clusters with that of eight existing indices. We also test the GPV index on three groups of real gene expression data sets. The experimental results suggest that the proposed GPV index has superior noise-resistance ability and provides fairly accurate judgements.
Dual-layer network representation exploiting information characterization
In this paper, a logical dual-layer representation approach is proposed to facilitate the analysis of directed and weighted complex networks. Unlike the single logical layer structure, which has been widely used for directed and weighted flow graphs, the proposed approach replaces the single layer with a dual-layer structure consisting of a provider layer and a requester layer. The new structure characterizes each node by the information that it provides to and requests from the network. Its features are explained, and its implementation and visualization are detailed. We also design two clustering methods with different strategies, which support analysis from different points of view. The effectiveness of the proposed approach is demonstrated using a simplified example. Compared with the layout of a conventional directed graph, the new dual-layer representation reveals deeper insight into complex networks and provides more opportunities for versatile clustering analysis. The National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (Grant Reference Number RP-PG-0310-1004)
Yeast gene CMR1/YDL156W is consistently co-expressed with genes participating in DNA-metabolic processes in a variety of stringent clustering experiments
© 2013 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/3.0/, which permits unrestricted use, provided the original author and source are credited. The binarization of consensus partition matrices (Bi-CoPaM) method has, among its unique features, the ability to perform ensemble clustering over the same set of genes from multiple microarray datasets by using various clustering methods in order to generate tunable tight clusters. We have therefore applied the Bi-CoPaM method to the 500 most synchronized cell-cycle-regulated yeast genes from different microarray datasets to produce four tight, specific and exclusive clusters of co-expressed genes. We found that 19 genes formed the tightest of the four clusters, and that this cluster included the gene CMR1/YDL156W, which was an uncharacterized gene at the time of our investigations. Two very recent proteomic and biochemical studies have independently revealed many facets of the CMR1 protein, although the precise functions of the protein remain to be elucidated. Our computational results complement these biological results and add more evidence to their recent findings of CMR1 as potentially participating in many DNA-metabolism processes such as replication, repair and transcription. Interestingly, our results demonstrate the close co-expression of CMR1 with the replication protein A (RPA), the cohesion complex and the DNA polymerases α, δ and ɛ, and suggest functional relationships between CMR1 and the respective proteins. In addition, the analysis provides further substantial evidence that the expression of the CMR1 gene could be regulated by the MBF complex.
In summary, the application of a novel analytic technique to large biological datasets has provided supporting evidence for a gene of previously unknown function, further hypotheses to test, and a more general demonstration of the value of sophisticated methods for exploring the large datasets now so readily generated in biological experiments. National Institute for Health Researc
From Multiple Independent Metrics to Single Performance Measure Based on Objective Function
Copyright © The Author 2023. It is extremely common in engineering to design algorithms to perform various tasks. In data-driven decision making in any field, one needs to ascertain the quality of an algorithm. A robust assessment of algorithms is therefore essential both for deciding the best algorithm and for improving algorithms. Performing such an assessment objectively is straightforward in the case of a single performance metric, but it is unclear how to do so in the case of multiple metrics. Nonetheless, the F1 measure, which is the harmonic mean (HM) of two metrics, is widely used in cases with two metrics. There are other means as well, e.g., the arithmetic mean (AM) and the geometric mean (GM). As the motivations for using them are intuitive and none of them is based on any objective function, it is difficult to judge objectively which is the best one. In this paper, the single metric case is examined to develop two objective functions that are applicable to any number of metrics. These two objective functions lead to two different performance measures: the distance from the origin (DO) and the distance from the ideal position (DIP). The paper introduces a new concept, the remaining phase space, for evaluating the quality of a performance measure. On closer examination of the original goal and the phase space of the metrics, either HM or DIP is found to be the best amongst these five measures. Specifically, HM is the best measure at the lower performance end, while DIP is clearly the best measure at the higher performance end, which is of much practical interest. Rules for deciding the best algorithm and the order of a set of algorithms are presented. These results are derived in the context of multiple independent and bounded metrics.
Furthermore, several properties and detailed discussions are provided, following which some published results are reviewed in the present context to elucidate some points. 10.13039/501100001809-NSFC, China, through “111 Project” (Grant Number: B20038
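The five measures named in this abstract can be compared on a pair of metrics with a short sketch. The abstract does not give the formulas for DO and DIP, so the versions below are assumptions: both use a Euclidean distance normalised by the square root of the number of metrics, so that every score stays in [0, 1] for metrics bounded in [0, 1]. The function names are hypothetical and are not taken from the paper.

```python
import math

def arithmetic_mean(metrics):
    return sum(metrics) / len(metrics)

def geometric_mean(metrics):
    return math.prod(metrics) ** (1.0 / len(metrics))

def harmonic_mean(metrics):
    # Taken as 0 when any metric is 0, the limiting value of the harmonic mean.
    if any(m == 0 for m in metrics):
        return 0.0
    return len(metrics) / sum(1.0 / m for m in metrics)

def distance_from_origin(metrics):
    # Assumed form: Euclidean distance from (0, ..., 0), scaled into [0, 1].
    return math.sqrt(sum(m * m for m in metrics) / len(metrics))

def distance_from_ideal_score(metrics):
    # Assumed form: 1 minus the scaled Euclidean distance from (1, ..., 1),
    # so that a higher score is better, as with the other measures.
    return 1.0 - math.sqrt(sum((1.0 - m) ** 2 for m in metrics) / len(metrics))

precision, recall = 0.8, 0.6
scores = {
    "AM": arithmetic_mean([precision, recall]),
    "GM": geometric_mean([precision, recall]),
    "HM (F1)": harmonic_mean([precision, recall]),
    "DO": distance_from_origin([precision, recall]),
    "DIP": distance_from_ideal_score([precision, recall]),
}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

With two metrics, the harmonic mean here coincides with the usual F1 measure, which gives a quick check on the other scores.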
Intrinsic dimension estimation-based feature selection and multinomial logistic regression for classification of bearing faults using compressively sampled vibration signals
Acknowledgements: Authors wish to thank Brunel University London for their support. Data Availability Statement: The data presented in the first case study may be available on request from the first author, Hosameldin O. A. Ahmed. Copyright: © 2022 by the authors. As failures of rolling bearings lead to major failures in rotating machines, recent vibration-based rolling bearing fault diagnosis techniques focus on obtaining useful fault features from the huge collection of raw data. However, too many features reduce the classification accuracy and increase the computation time. This paper proposes an effective feature selection technique based on intrinsic dimension estimation of compressively sampled vibration signals. First, compressive sampling (CS) is used to obtain compressed measurements from the collected raw vibration signals. Then, a global dimension estimator, the geodesic minimal spanning tree (GMST), is employed to compute the minimal number of features needed to represent the compressively sampled signals efficiently. Finally, a feature selection process combining stochastic proximity embedding (SPE) and neighbourhood component analysis (NCA) is used to select fewer features for bearing fault diagnosis. With a regression analysis-based predictive modelling technique and the multinomial logistic regression (MLR) classifier, the selected features are assessed in two case studies of rolling bearing vibration signals under different working loads. The experimental results demonstrate that the proposed method can successfully select fewer features, with which the MLR-based trained model achieves high classification accuracy and significantly reduced computation times compared to published research. This research received no external funding
Convolutional-Transformer Model with Long-Range Temporal Dependencies for Bearing Fault Diagnosis Using Vibration Signals
Data Availability Statement: The data presented in the first case study may be available on request from the first author, Hosameldin O. A. Ahmed. Copyright © 2023 by the authors. Fault diagnosis of bearings in rotating machinery is a critical task. Vibration signals are a valuable source of information, but they can be complex and noisy. A transformer model can capture distant relationships, which makes it a promising solution for fault diagnosis. However, its application in this field has been limited. This study aims to contribute to this growing area of research by proposing a novel deep-learning architecture that combines the strengths of CNNs and transformer models for effective fault diagnosis in rotating machinery, so that it captures both local and long-range temporal dependencies in the vibration signals. The architecture starts with CNN-based feature extraction, followed by temporal relationship modelling using the transformer. The transformed features are used for classification. Experimental evaluations are conducted on two datasets with six and ten health conditions. In both case studies, the proposed model achieves high accuracy, precision, recall, F1-score, and specificity, all above 99%, using different training dataset sizes. The results demonstrate the effectiveness of the proposed method in diagnosing bearing faults. The convolutional-transformer model proves to be a promising approach for bearing fault diagnosis. The method shows great potential for improving the accuracy and efficiency of fault diagnosis in rotating machinery. This research received no external funding
High Performance Breast Cancer Diagnosis from Mammograms Using Mixture of Experts with EfficientNet Features (MoEffNet)
Data Statement: In this study, we use three publicly available datasets: MIAS (Mammographic Image Analysis Society database) (https://www.repository.cam.ac.uk/items/b6a97f0c-3b9b40ad-8f18-3d121eef1459), CBIS-DDSM (Curated Breast Imaging Subset of the Digital Database for Screening Mammography) (https://www.cancerimagingarchive.net/collection/cbisddsm/), and INbreast (https://medicalresearch.inescporto.pt/breastresearch/index.php/Get_INbreast_Database). As breast cancer is a leading cause of death for women globally, there is a critical need for better diagnostic tools. To address this challenge, we propose MoEffNet, a cutting-edge framework that offers high-performance breast cancer diagnosis. MoEffNet is characterised by its innovative hybrid integration of EfficientNet and Mixture of Experts (MoEs), two powerful techniques developed to enhance accuracy and efficiency. EfficientNet, known for its robust feature extraction capabilities, utilises compound scaling and depth-wise separable convolutions to capture image features across multiple levels of abstraction. This is combined with the MoEs framework, which employs specialised expert networks to analyse distinct aspects of mammograms. MoEffNet analyses features at various levels: low-level for basic patterns, mid-level for detailed analyses, and high-level for complex contents. Features extracted from various EfficientNet model stages are assigned to specialised experts to optimise diagnostic precision. A dynamic gating mechanism (EffiGate) is introduced to ensure that the most relevant experts contribute to each diagnostic decision, by dynamically adjusting their influence based on input data characteristics. This approach ensures that the most effective experts are utilised for each case, resulting in superior accuracy.
The scalability of MoEffNet is highlighted by its ability to adapt to various computational constraints and accuracy requirements, using EfficientNet’s architecture, which ranges from B0 to B7 models. We have validated MoEffNet’s effectiveness on three mammographic datasets (MIAS, CBIS-DDSM, and INbreast), achieving outstanding results (AUC > 0.99 across all datasets) and outperforming existing methods. In particular, EfficientNet B1 and B2 models with three or four experts achieved the highest accuracy, demonstrating MoEffNet’s potential as a robust diagnostic tool for early breast cancer detection. Through its innovative hybrid model, robust feature extraction, dynamic gating, and specialised expert networks, MoEffNet sets a new benchmark in automated mammogram analysis, offering a powerful tool for more accurate and reliable breast cancer diagnosis. 10.13039/501100007914-Brunel University London “This work was supported in part by Brunel University London research funding scheme.”
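The idea of a gate that adjusts each expert's influence per input can be illustrated with a generic softmax-gated mixture. This is a minimal sketch of the standard mixture-of-experts combination rule, not the EffiGate mechanism itself, whose details the abstract does not give. The function names, the gate scores and the expert outputs are all hypothetical.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_experts(expert_probs, gate_scores):
    """Combine per-expert class probabilities using softmax gate weights.

    expert_probs: one probability vector per expert (all the same length).
    gate_scores:  one raw score per expert, assumed to come from a gating
                  network that looks at the input (not modelled here).
    """
    weights = softmax(gate_scores)
    n_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]

# Three hypothetical experts voting on (benign, malignant).
experts = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
gate = [2.0, 0.5, 0.1]  # the gate favours the first expert for this input
print(mix_experts(experts, gate))
```

Because the gate weights are non-negative and sum to one, the mixed output is again a valid probability vector whenever each expert's output is.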
Three-stage Hybrid Fault Diagnosis for Rolling Bearings with Compressively-sampled data and Subspace Learning Techniques
To avoid the burden of heavy storage requirements and long processing times, this paper proposes a three-stage hybrid method, Compressive Sampling with Correlated Principal and Discriminant Components (CS-CPDC), for bearing fault diagnosis based on compressed measurements. In the first stage, Compressive Sampling (CS) is utilised to obtain compressively-sampled signals from raw vibration data. In the second stage, an effective multi-step feature learning algorithm obtains fewer features from correlated principal and discriminant attributes of the compressively-sampled signals, which are then concatenated to increase the performance. In the third stage, with these concatenated features, a Multi-class Support Vector Machine (SVM) is used to train, validate, and classify bearing faults. Results show that the proposed method, CS-CPDC, offers high classification accuracy with reduced computation time and storage requirements, using fewer measurements. National Science Foundation of China; National Science Foundation of Shanghai
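The first stage described above, compressive sampling, amounts to projecting a length-n signal onto m < n random measurement vectors. The sketch below assumes a Gaussian random measurement matrix, which is a common choice but is not stated in the abstract. The function names, the signal and the sizes are hypothetical, and the later feature-learning and SVM stages are not shown.

```python
import math
import random

def gaussian_measurement_matrix(m, n, seed=0):
    # m x n matrix with i.i.d. N(0, 1/m) entries (assumed choice of matrix).
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def compress(signal, phi):
    # Compressed measurements y = Phi x: one inner product per matrix row.
    return [sum(p * s for p, s in zip(row, signal)) for row in phi]

# A synthetic "vibration" signal: two sinusoids sampled at 256 points.
n = 256
signal = [
    math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 40 * t / n)
    for t in range(n)
]

phi = gaussian_measurement_matrix(m=32, n=n)  # keep 32 of 256 dimensions
measurements = compress(signal, phi)
print(len(signal), "->", len(measurements))
```

The compressed vector is what the later stages would consume in place of the raw signal, which is where the savings in storage and computation come from.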
Modulation classification in MIMO fading channels via expectation maximization with non-data-aided initialization