
    BS-KNN: An Effective Algorithm for Predicting Protein Subchloroplast Localization

    Chloroplasts are organelles found in the cells of green plants and eukaryotic algae that conduct photosynthesis. Knowing a protein’s subchloroplast location provides in-depth insight into the protein’s function and the microenvironment in which it interacts with other molecules. In this paper, we present BS-KNN, a bit-score weighted K-nearest neighbor method for predicting proteins’ subchloroplast locations. The method makes predictions based on the bit-score weighted Euclidean distance calculated from the composition of selected pseudo-amino acids. Our method achieved 76.4% overall accuracy in assigning proteins to four subchloroplast locations in cross-validation. When tested on an independent set that the method had not seen during training and feature selection, it achieved a consistent overall accuracy of 76.0%. The method was also applied to predict the subchloroplast locations of proteins in the chloroplast proteome and validated against proteins in Arabidopsis thaliana. The software and datasets of the proposed method are available at https://edisk.fandm.edu/jing.hu/bsknn/bsknn.html
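    A minimal sketch of composition-based, bit-score weighted KNN voting. It assumes plain 20-residue amino acid composition in place of the selected pseudo-amino acid features, and it assumes that alignment bit scores between the query and each training protein (for example from a BLAST search) are already available. The function names, the vote weight bit_score / (distance + eps), and the example location labels are hypothetical and are not taken from BS-KNN itself.

```python
import math
from collections import defaultdict

# Standard 20 amino acids (plain composition, NOT the paper's pseudo-amino acid features).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for ch in seq.upper():
        if ch in counts:  # ignore non-standard residues
            counts[ch] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def weighted_knn_predict(query_seq, training, bit_scores, k=3, eps=1e-6):
    """Predict a location for query_seq.

    training: list of (sequence, location) pairs.
    bit_scores: precomputed bit scores of the query against each training
        sequence (hypothetical input, same order as training).
    Each of the k nearest neighbours votes with weight bit_score / (distance + eps),
    an illustrative weighting assumed for this sketch.
    """
    q = aa_composition(query_seq)
    neighbours = sorted(
        ((euclidean(q, aa_composition(seq)), bits, loc)
         for (seq, loc), bits in zip(training, bit_scores)),
        key=lambda t: t[0],
    )
    votes = defaultdict(float)
    for dist, bits, loc in neighbours[:k]:
        votes[loc] += bits / (dist + eps)
    return max(votes, key=votes.get)


training = [
    ("AAAA", "stroma"),
    ("AAAG", "stroma"),
    ("LLLL", "thylakoid membrane"),
    ("LLLK", "thylakoid membrane"),
]
print(weighted_knn_predict("AAAAG", training, [50, 50, 50, 50]))  # stroma
```

    Dividing by the distance lets closer neighbours count more, while the bit score lets neighbours with strong sequence similarity to the query dominate; the published method may combine the two differently.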

    Memory-Based Learning of Latent Structures for Generative Adversarial Networks

    Thesis (Master's), Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2019. Advisor: Gunhee Kim (김건희). This study proposes a way to resolve two problems that arise when training Generative Adversarial Network (GAN) models. First, when modeling the distribution of complex random variables such as photographs, a typical GAN uses the standard normal distribution as the prior over its latent variables. With such continuous latent variables, however, it is difficult to reflect the structural discontinuity between different data samples. Second, during training the GAN discriminator forgets information about samples the generator produced in the past, which makes training unstable. Both problems can be greatly alleviated by having the generator and the discriminator jointly learn a shared memory network. If the generator learns the distribution of clusters inherent in the training data, it can avoid the performance loss caused by structural discontinuity; and if the discriminator, when judging an input, consults the cluster distribution learned from the samples the generator produced over the whole course of training, it is less affected by forgetting. The memoryGAN model proposed in this study learns the distribution of clusters inherent in the data through unsupervised learning, mitigating both the structural discontinuity and forgetting problems, and can be applied to most GAN models. Evaluation and visualization experiments on the Fashion-MNIST, CelebA, CIFAR10, and Chairs datasets show that memoryGAN is a probabilistically interpretable model that generates high-quality image samples. In particular, without introducing improved optimization methods or weaker divergences, memoryGAN achieved a high Inception Score on CIFAR10 among unsupervised GAN models.
    We propose an approach to address two issues that commonly occur during the training of unsupervised GANs. First, since GANs use only a continuous latent distribution to embed multiple classes or clusters of data, they often do not correctly handle the structural discontinuity between disparate classes in a latent space. Second, GAN discriminators easily forget past samples produced by the generator, which causes instability during adversarial training. We argue that these two well-known problems of unsupervised GAN training can be largely alleviated by a learnable memory network that both generators and discriminators can access. Generators can effectively learn representations of training samples to understand the underlying cluster distributions of the data, which eases the structural discontinuity problem. At the same time, discriminators can better memorize clusters of previously generated samples, which mitigates the forgetting problem.
We propose a novel end-to-end GAN model named memoryGAN, which involves a memory network that can be trained without supervision and integrated into many existing GAN models. With evaluations on multiple datasets such as Fashion-MNIST, CelebA, CIFAR10, and Chairs, we show that our model is probabilistically interpretable and generates realistic image samples of high visual fidelity. The memoryGAN model also achieves state-of-the-art Inception Scores among unsupervised GAN models on the CIFAR10 dataset, without any optimization tricks or weaker divergences.
    Contents: Introduction; Related Works; The MemoryGAN; Experiments; Conclusion.
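    The shared memory can be pictured as a set of cluster keys that a query embedding attends to. The sketch below assumes a mixture of von Mises-Fisher components with unit-norm keys, a shared concentration kappa, slot-usage counts as the prior, and a per-slot real/fake value read out by the discriminator. These choices and all names are illustrative assumptions in the spirit of the approach described above; the exact parameterization in the thesis may differ.

```python
import math


def normalize(v):
    """Scale v to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def memory_posterior(query, keys, slot_counts, kappa=10.0):
    """Posterior p(slot | query) over memory slots.

    keys: unit-norm key vectors, one per slot (cluster).
    slot_counts: positive usage counts per slot, used as the prior.
    p(i | q) is proportional to slot_counts[i] * exp(kappa * <keys[i], q>),
    i.e. a mixture of von Mises-Fisher components sharing one concentration.
    """
    q = normalize(query)
    logits = [
        math.log(count) + kappa * sum(k * x for k, x in zip(key, q))
        for key, count in zip(keys, slot_counts)
    ]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - top) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]


def memory_discriminator(query, keys, slot_counts, slot_values, kappa=10.0):
    """Probability that query is real: posterior-weighted average of slot
    values (1.0 = slot holds real samples, 0.0 = generated samples)."""
    post = memory_posterior(query, keys, slot_counts, kappa)
    return sum(p * v for p, v in zip(post, slot_values))


keys = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
print(memory_discriminator([0.9, 0.1], keys, [5, 5], [1.0, 0.0]))
```

    Because the posterior is computed from slots that accumulate over training, the discriminator's judgement draws on clusters of earlier samples rather than only on the current batch.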

    Managing the unknown: a survey on Open Set Recognition and tangential areas

    In real-world scenarios, classification models are often required to perform robustly when predicting samples that belong to classes which did not appear during their training stage. Open Set Recognition addresses this issue by devising models capable of detecting samples of unknown classes during the testing phase, while maintaining good performance in classifying samples of known classes. This review comprehensively surveys the recent literature on Open Set Recognition, identifying common practices, limitations, and connections between this field and other machine learning research areas, such as continual learning, out-of-distribution detection, novelty detection, and uncertainty estimation. Our work also uncovers open problems and suggests several research directions that may motivate and articulate future efforts towards safer Artificial Intelligence methods. Comment: 35 pages, 1 figure, 1 table.
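    A common baseline in this area rejects a test sample as unknown when the classifier's maximum softmax probability falls below a threshold. The sketch below assumes a closed-set classifier that already produces logits; the threshold value and class names are hypothetical, and the survey covers many methods that go well beyond this baseline.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(logits, class_names, threshold=0.5):
    """Return the most likely known class, or "unknown" when the maximum
    softmax probability is below threshold (hypothetical default 0.5)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown"
    return class_names[best]


classes = ["cat", "dog", "bird"]
print(open_set_predict([5.0, 0.0, 0.0], classes))   # cat (confident, known class)
print(open_set_predict([0.1, 0.0, 0.05], classes))  # unknown (near-uniform, rejected)
```

    The threshold trades known-class accuracy against unknown-class detection; in practice it would be tuned on validation data.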

    Semi-supervised evidential label propagation algorithm for graph data

    In the task of community detection, useful prior information is often available. In this paper, a semi-supervised clustering approach using a new Evidential Label Propagation strategy (SELP) is proposed to incorporate domain knowledge into the community detection model. The main advantage of SELP is that it can use limited supervised knowledge to guide the detection process. The prior information about community labels is initially expressed in the form of mass functions. A new evidential label propagation rule is then adopted to propagate the labels from labeled data to unlabeled data. Outliers can be identified and assigned to a special class. The experimental results demonstrate the effectiveness of SELP.
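    The propagation idea can be sketched with mass vectors over the singleton classes plus the whole frame Omega (total ignorance). This is a simplified stand-in, not SELP's actual rule: it assumes plain neighbour averaging of mass vectors, clamps the labeled nodes, and flags as outliers the nodes whose largest mass remains on Omega. All names are hypothetical.

```python
def propagate_masses(adjacency, labels, num_classes, iterations=50):
    """Simplified mass propagation on an undirected graph.

    adjacency: dict mapping each node to a list of its neighbours.
    labels: dict mapping the few labeled nodes to a class index.
    Returns a dict of mass vectors of length num_classes + 1, where the
    last entry is the mass on the whole frame Omega (ignorance).
    """
    omega = num_classes
    masses = {}
    for node in adjacency:
        m = [0.0] * (num_classes + 1)
        m[labels[node] if node in labels else omega] = 1.0
        masses[node] = m
    for _ in range(iterations):
        updated = {}
        for node, nbrs in adjacency.items():
            if node in labels or not nbrs:
                updated[node] = masses[node]  # clamp labeled / isolated nodes
            else:  # average the neighbours' mass vectors
                updated[node] = [
                    sum(masses[n][j] for n in nbrs) / len(nbrs)
                    for j in range(num_classes + 1)
                ]
        masses = updated
    return masses


def assign_communities(masses, num_classes):
    """Pick the class with the largest mass; nodes whose largest mass is still
    on Omega are put in a special "outlier" class."""
    result = {}
    for node, m in masses.items():
        best = max(range(num_classes + 1), key=m.__getitem__)
        result[node] = "outlier" if best == num_classes else best
    return result


# Two triangles joined by the edge 2-3, plus an unlabeled pair 6-7.
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
    6: [7], 7: [6],
}
masses = propagate_masses(graph, labels={0: 0, 5: 1}, num_classes=2)
print(assign_communities(masses, num_classes=2))
```

    Nodes in a component that contains no labeled node never receive class mass, so they end up in the outlier class.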

    Landslide Susceptibility Assessment in Western External Rif Chain using Machine Learning Methods

    Landslides are a major natural hazard in the mountainous Rif region of northern Morocco. This study aims to create and compare landslide susceptibility maps for the Western External Rif Chain using three advanced machine learning models: Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbors (KNN). The landslide database, compiled from satellite imagery and field surveys, contains an inventory of 3528 slope movements. A database of 12 conditioning factors was prepared: elevation, slope, aspect, curvature, lithology, rainfall, topographic wetness index (TWI), stream power index (SPI), distance to streams, distance to faults, distance to roads, and land cover. The database was randomly divided into training and validation sets at a 70/30 ratio. The predictive capabilities of the models were evaluated using overall accuracy (Acc), the area under the receiver operating characteristic curve (AUC), the kappa index, and the F-score. The results indicated that RF was the most suitable model for this study area, showing the highest predictive capability (AUC = 0.86) of the three models. This aligns with previous landslide studies, which found that ensemble methods such as RF and XGBoost offer superior accuracy. The most important causal factors of landslides in the study area were slope, rainfall, and elevation, while TWI and SPI had the least influence. By analyzing a larger landslide inventory at a broader scale, this study aims to improve the accuracy and reliability of landslide prediction in a western Mediterranean morphoclimatic context that encompasses a wide variety of lithologies. The resulting maps can serve as a crucial resource for land-use planning and disaster management. DOI: 10.28991/CEJ-2023-09-12-018
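    The evaluation protocol (a random 70/30 split, then accuracy, AUC, kappa, and F-score) can be sketched with the standard library alone. The models themselves are not reimplemented: the scores passed in stand for any classifier's predicted landslide probabilities, F1 is assumed as the F-score, and the 0.5 decision threshold is an assumption.

```python
import random


def split_70_30(samples, seed=0):
    """Randomly split samples into 70 % training and 30 % validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half); O(P*N), fine for a sketch."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, Cohen's kappa and F1 after thresholding the scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    n = len(pairs)
    acc = (tp + tn) / n
    # Chance agreement from the marginal totals of truth and prediction.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (acc - p_e) / (1 - p_e) if p_e < 1 else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"accuracy": acc, "kappa": kappa, "f1": f1}


y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]  # placeholder classifier outputs
print(roc_auc(y_true, scores), threshold_metrics(y_true, scores))
```

    AUC is threshold-free and ranks the models' scores, whereas accuracy, kappa, and F1 depend on the chosen cut-off, which is one reason studies of this kind report both.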

    Comparison of Machine Learning Methods Applied to SAR Images for Forest Classification in Mediterranean Areas

    In this paper, multifrequency synthetic aperture radar (SAR) images from the ALOS/PALSAR, ENVISAT/ASAR, and Cosmo-SkyMed sensors were studied for forest classification in a test area in Central Italy (San Rossore), where detailed in situ measurements were available. A preliminary discrimination of the main land cover classes and forest types was carried out by exploiting the synergy among L-, C-, and X-bands and different polarizations. The SAR data were first inspected to assess their capability to discriminate forest from non-forest and to separate broadleaf from coniferous forests. The temporal average backscattering coefficient (σ°) was computed for each sensor-polarization pair, and each pixel was labeled according to the reference map. Several machine learning classification methods were applied and validated with different feature sets, in order to highlight the contribution of bands and polarizations and to assess the classifiers’ performance. The experimental results indicate that the different surface types are best identified using all bands, followed by the joint L- and X-bands. In the former case, the best overall average accuracy (83.1%) is achieved by random forest classification. Finally, the classification maps at class edges are discussed to highlight misclassification errors.
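    The abstract does not say whether the temporal average of σ° was taken in decibels or in linear power units. The sketch below assumes linear-power averaging, a common convention because dB is a logarithmic scale; the (sensor, polarization) keys in the usage example, including the "CSK" abbreviation, are hypothetical.

```python
import math


def db_to_linear(value_db):
    """Convert a backscattering coefficient from dB to linear power."""
    return 10.0 ** (value_db / 10.0)


def linear_to_db(value):
    """Convert a linear power value back to dB."""
    return 10.0 * math.log10(value)


def temporal_average_sigma0(series_db):
    """Temporal mean of one pixel's sigma0 time series, averaged in linear
    power units and returned in dB."""
    if not series_db:
        raise ValueError("empty time series")
    mean_linear = sum(db_to_linear(v) for v in series_db) / len(series_db)
    return linear_to_db(mean_linear)


def pixel_features(stack):
    """stack maps a (sensor, polarization) pair to that pixel's sigma0
    acquisitions in dB; returns one averaged feature per pair, in sorted key order."""
    return [temporal_average_sigma0(stack[key]) for key in sorted(stack)]


pixel = {
    ("PALSAR", "HV"): [-15.2, -14.8, -15.5],
    ("ASAR", "VV"): [-8.1, -7.6],
    ("CSK", "HH"): [-9.3, -10.0, -9.7],
}
print(pixel_features(pixel))
```

    Averaging -10 dB and -20 dB this way gives about -12.6 dB rather than the -15 dB of a plain dB mean, because the stronger return dominates in linear units.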

    Deep Learning-Based Machinery Fault Diagnostics

    This book offers a compilation in which experts, scholars, and researchers present the most recent advancements in sophisticated fault diagnosis techniques, from theoretical methods to applications. Deep learning methods for analyzing and testing complex mechanical systems are of particular interest. Special attention is given to the representation and analysis of system information, operating condition monitoring, the establishment of technical standards, and scientific support for machinery fault diagnosis.