
    Pancreatic carcinoma, pancreatitis, and healthy controls: metabolite models in a three-class diagnostic dilemma

    Metabolomics, one of the most rapidly growing technologies in the "-omics" field, denotes the comprehensive analysis of low-molecular-weight compounds and their pathways. Cancer-specific alterations of the metabolome can be detected by high-throughput mass-spectrometric metabolite profiling and serve as a considerable source of new markers for the early differentiation of malignant diseases as well as their distinction from benign states. However, a comprehensive framework for the statistical evaluation of marker panels in a multi-class setting has not yet been established. We collected serum samples of 40 pancreatic carcinoma patients, 40 controls, and 23 pancreatitis patients according to standard protocols and generated amino acid profiles by routine mass spectrometry. In an intrinsic three-class bioinformatic approach, we compared these profiles, evaluated their selectivity, and computed multi-marker panels combined with the conventional tumor marker CA19-9. Additionally, we tested for non-inferiority and superiority to determine the diagnostic surplus value of our multi-metabolite marker panels. Compared to CA19-9 alone, the combined amino acid-based metabolite panel had a superior selectivity for the discrimination of healthy controls, pancreatitis, and pancreatic carcinoma patients: volume under the ROC surface (VUS) = 0.891 (95% CI 0.794-0.968). We combined highly standardized samples, a three-class study design, a high-throughput mass-spectrometric technique, and a comprehensive bioinformatic framework to identify metabolite panels selective for all three groups in a single approach. Our results suggest that metabolomic profiling necessitates appropriate evaluation strategies and, despite all its current limitations, can deliver marker panels with high selectivity even in a multi-class setting.
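    The VUS quoted above generalizes the two-class AUC: it is the probability that one randomly drawn sample from each of the three classes is ranked in the correct order by the marker score. The snippet below is a minimal, hypothetical sketch of how an empirical VUS for a single score can be computed; it is not the authors' evaluation pipeline, which also covers multi-marker panels and confidence intervals.

```python
import numpy as np

def empirical_vus(scores_healthy, scores_pancreatitis, scores_carcinoma):
    """Empirical volume under the three-class ROC surface: the fraction
    of cross-class triples ranked in the correct order by the score.
    Chance level for three classes is 1/6; a perfect marker gives 1."""
    h = np.asarray(scores_healthy)[:, None, None]
    p = np.asarray(scores_pancreatitis)[None, :, None]
    c = np.asarray(scores_carcinoma)[None, None, :]
    # Count strictly ordered triples via broadcasting (ties ignored).
    return ((h < p) & (p < c)).mean()

# Toy data mimicking the study's group sizes (40 / 23 / 40).
rng = np.random.default_rng(0)
print(empirical_vus(rng.normal(0.0, 1.0, 40),
                    rng.normal(1.5, 1.0, 23),
                    rng.normal(3.0, 1.0, 40)))
```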

    Understanding the apparent superiority of over-sampling through an analysis of local information for class-imbalanced data

    Data plays a key role in the design of expert and intelligent systems; data preprocessing is therefore a critical step to produce high-quality data and build accurate machine learning models. Over the past decades, increasing attention has been paid to the issue of class imbalance, which is now a research hotspot in a variety of fields. Although resampling methods, either by under-sampling the majority class or by over-sampling the minority class, stand among the most powerful techniques to address this problem, their strengths and weaknesses have typically been discussed based only on the class imbalance ratio. However, several questions remain open and need further exploration. For instance, the subtle differences in performance between the over- and under-sampling algorithms are still poorly understood, and we hypothesize that they could be better explained by analyzing the inner structure of the data sets. Consequently, this paper investigates and illustrates the effects of the resampling methods on the inner structure of a data set by exploiting local neighborhood information, identifying the sample types in both classes, and analyzing their distribution in each resampled set. Experimental results indicate that the resampling methods that produce the highest proportion of safe samples and the lowest proportion of unsafe samples correspond to those with the highest overall performance. The significance of this paper lies in the fact that our findings may contribute to a better understanding of how these techniques perform on class-imbalanced data and why over-sampling has been reported to be usually more efficient than under-sampling. The outcomes of this study may have impact on both research and practice in the design of expert and intelligent systems, since a priori knowledge about the internal structure of the imbalanced data sets could be incorporated into the learning algorithms.
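    As a hedged illustration of the "local neighborhood information" the paper exploits, the sketch below labels samples with the widely used safe/borderline/rare/outlier taxonomy based on how many of their five nearest neighbours share their class; the function name and the exact thresholds are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def sample_types(X, y, k=5):
    """Categorise each sample by the fraction of its k nearest
    neighbours sharing its class (k = 5): 4-5 same-class neighbours
    -> safe, 2-3 -> borderline, 1 -> rare, 0 -> outlier."""
    y = np.asarray(y)
    # Ask for k + 1 neighbours because each point is its own nearest.
    idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)[1]
    same = (y[idx[:, 1:]] == y[:, None]).mean(axis=1)
    labels = np.array(["outlier", "rare", "borderline", "safe"])
    return labels[np.digitize(same, [0.2, 0.4, 0.8])]
```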

    A Patch-as-Filter Method for Same-Different Problems with Few-Shot Learning

    Convolutional Neural Networks (CNNs) have undergone tremendous advancements in recent years, but visual reasoning tasks remain a major challenge, particularly in few-shot learning settings. Little is known, in particular, about solving the Same-Different (SD) task, a type of visual reasoning task that requires seeking pattern repetitions in a single image. In this thesis, we propose a patch-as-filter method for solving SD tasks with few-shot learning. First, a patch in an individual image is detected. Then, transformations are learned to create sample-specific convolutional filters. After applying these filters to the original input images, we finally acquire feature maps indicating the duplicate segments. We show experimentally that our approach achieves state-of-the-art few-shot performance on the Synthetic Visual Reasoning Test (SVRT) SD tasks, improving accuracy by more than 30% on average with only ten training samples. To further evaluate the effectiveness of our approach, we also generate SVRT-like tasks with more difficult visual reasoning concepts; on these, the average accuracy increases by approximately 10% compared to several popular few-shot algorithms. The method proposed here sheds new light on CNN approaches to solving SD tasks with few-shot learning.
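    To make the core idea concrete, here is a minimal, hypothetical sketch of the patch-as-filter step: an image patch is used directly as a convolutional filter, so its response map peaks wherever the pattern repeats. The learned, sample-specific filter transformations described in the thesis are omitted.

```python
import torch
import torch.nn.functional as F

def patch_response(image, patch):
    """Slide a patch over its source image as a convolutional filter;
    high values in the normalised response map indicate repeats of
    the pattern (simplified: no learned filter transformations)."""
    resp = F.conv2d(image[None], patch[None], padding="same")[0]
    return (resp - resp.mean()) / (resp.std() + 1e-8)

# Toy usage: a 3-channel 64x64 image and a 9x9 patch cut from it.
img = torch.rand(3, 64, 64)
patch = img[:, 10:19, 20:29]
heatmap = patch_response(img, patch)  # peaks at duplicate segments
```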

    Learning Point-wise Abstaining Penalty for Point Cloud Anomaly Detection

    LiDAR-based semantic scene understanding is an important module in the modern autonomous driving perception stack. However, identifying Out-Of-Distribution (OOD) points in a LiDAR point cloud is challenging because point clouds lack semantically rich features compared with RGB images. We revisit this problem from the perspective of selective classification, which introduces a selective function into the standard closed-set classification setup. Our solution builds on the basic idea of abstaining from choosing any known category, but learns a point-wise abstaining penalty with a margin-based loss. Synthesizing outliers to approximate unlimited OOD samples is also critical to this idea, so we propose a strong synthesis pipeline that generates outliers originating from various factors: unrealistic object categories, sampling patterns, and sizes. We demonstrate that learning different abstaining penalties, apart from the point-wise penalty, for different types of (synthesized) outliers can further improve performance. We benchmark our method on SemanticKITTI and nuScenes and achieve state-of-the-art results. Risk-coverage analysis further reveals intrinsic properties of different methods. Code and models are publicly available at https://github.com/Daniellli/PAD.gi
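    The following is a speculative sketch of what a margin-based abstaining loss with a learned point-wise penalty might look like; all shapes, names, and the exact formulation are assumptions for illustration, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def abstain_margin_loss(logits, abstain_score, penalty, target, margin=1.0):
    """Illustrative margin-based abstaining loss (assumed formulation).
    logits: (N, K) known-class scores; abstain_score, penalty: (N,);
    target: (N,) long, with -1 marking synthesized outliers.
    In-distribution points must beat the abstain score by `margin`;
    outliers must be absorbed by the abstain channel, weighted by a
    learned point-wise penalty."""
    known = target >= 0
    own = logits[known].gather(1, target[known].unsqueeze(1)).squeeze(1)
    loss_in = F.relu(margin - (own - abstain_score[known]))
    gap = abstain_score[~known] - logits[~known].max(dim=1).values
    loss_out = penalty[~known] * F.relu(margin - gap)
    return torch.cat([loss_in, loss_out]).mean()
```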

    Algorithm selection using edge ML and case-based reasoning

    In practical data mining, a wide range of classification algorithms is employed for prediction tasks. However, selecting the best algorithm is challenging for machine learning practitioners and experts, primarily due to the inherent variability in the characteristics of classification problems, referred to as datasets, and the unpredictable performance of these algorithms. Dataset characteristics are quantified in terms of meta-features, while classifier performance is evaluated using various performance metrics. Assessing classifiers empirically across multiple classification datasets, while considering multiple performance metrics, is computationally expensive and time-consuming, which makes selecting the optimal algorithm an obstacle. Furthermore, the scarcity of sufficient training data, in dimensions representing both the number of datasets and the feature space described by meta-feature perspectives, adds further complexity to algorithm selection with classical machine learning methods. This paper presents an integrated framework called eML-CBR that combines edge ML and case-based reasoning methodologies to address the algorithm selection problem. It adopts a multi-level, multi-view case-based reasoning methodology, considering data from diverse feature dimensions and algorithms from multiple performance aspects, and distributes computations to both edge and centralized nodes. On the edge, the first-level reasoning employs machine learning methods to recommend a family of classification algorithms, while at the second level it recommends a list of the top-k algorithms within that family. This list is further refined by an algorithm conflict resolver module. The eML-CBR framework offers a suite of contributions, including integrated algorithm selection, multi-view meta-feature extraction, innovative performance criteria, improved algorithm recommendation, data scarcity mitigation through incremental learning, and an open-source CBR module. The CBR module, trained on 100 datasets and tested on 52 datasets using 9 decision tree algorithms, achieved an accuracy of 94% for correct classifier recommendations within the top k=3 algorithms, making it highly suitable for practical classification applications.
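    As a toy illustration of the case-based reasoning step (retrieval by meta-feature similarity, then reuse of the retrieved cases' best algorithms), the class below is a hypothetical sketch; it does not reproduce eML-CBR's multi-level edge/cloud pipeline or its conflict resolver module.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

class MetaCaseBase:
    """Retrieve the stored datasets whose meta-features are closest to
    a query dataset and vote on their best-known classifiers."""

    def __init__(self, meta_features, best_algorithms, n_cases=5):
        # meta_features: (n_datasets, n_meta) array of past cases;
        # best_algorithms: the best classifier recorded per dataset.
        self._nn = NearestNeighbors(n_neighbors=n_cases).fit(meta_features)
        self._best = np.asarray(best_algorithms)

    def recommend(self, query_meta_features, k=3):
        _, idx = self._nn.kneighbors(np.atleast_2d(query_meta_features))
        algos, votes = np.unique(self._best[idx[0]], return_counts=True)
        return list(algos[np.argsort(-votes)][:k])  # top-k by vote count
```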

    Generalized Bayesian Network Classifiers


    A computer vision system based on majority-voting ensemble neural network for the automatic classification of three chickpea varieties

    Since different varieties of crops have specific applications, it is important to properly identify each cultivar in order to avoid fake varieties being sold as genuine, i.e., fraud. Although properly trained human experts can accurately identify and classify crop varieties, computer vision systems are needed because conditions such as fatigue and reproducibility can influence the expert's judgment and assessment. Chickpea (Cicer arietinum L.) is an important legume worldwide and has several varieties. Three chickpea varieties with a rather similar visual appearance were studied here: Adel, Arman, and Azad. The purpose of this paper is to present a computer vision system for the automatic classification of those chickpea varieties. First, segmentation was performed using a Hue Saturation Intensity (HSI) color space threshold. Next, color and textural features (from the gray level co-occurrence matrix, GLCM) were extracted from the chickpea sample images. Then, using the hybrid artificial neural network-cultural algorithm (ANN-CA), the sub-optimal combination of the five most effective features (mean of the RGB color space components, mean of the HSI color space components, entropy of the GLCM matrix at 90°, standard deviation of the GLCM matrix at 0°, and mean of the third component in the YCbCr color space) was selected as the set of discriminant features. Finally, an ANN-PSO/ACO/HS majority-voting (MV) ensemble methodology merging three different classifier outputs, namely the hybrid artificial neural network-particle swarm optimization (ANN-PSO), hybrid artificial neural network-ant colony optimization (ANN-ACO), and hybrid artificial neural network-harmony search (ANN-HS), was used. Results showed that the ensemble ANN-PSO/ACO/HS-MV classifier reached an average classification accuracy of 99.10 ± 0.75% over the test set, after averaging 1000 random iterations. European Union (project 585596-EPP-1-2017-1-DE-EPPKA2-CBHE-JP).
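    For illustration, here is a minimal sketch of the majority-voting fusion step; it assumes three arrays of hard class predictions (one per hybrid classifier) and is not the authors' trained ANN-PSO/ACO/HS models.

```python
import numpy as np

def majority_vote(pred_pso, pred_aco, pred_hs):
    """Merge the class predictions of three classifiers by simple
    majority vote; three-way ties fall back to the first classifier."""
    votes = np.stack([pred_pso, pred_aco, pred_hs]).T  # (n_samples, 3)
    merged = []
    for row in votes:
        classes, counts = np.unique(row, return_counts=True)
        merged.append(classes[np.argmax(counts)] if counts.max() > 1 else row[0])
    return np.array(merged)

# Toy usage: the ensemble keeps the prediction at least two models agree on.
print(majority_vote(np.array([0, 1, 2]),
                    np.array([0, 1, 1]),
                    np.array([2, 1, 0])))  # -> [0 1 2]
```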