764 research outputs found
Graph Regularized Non-negative Matrix Factorization By Maximizing Correntropy
Non-negative matrix factorization (NMF) has proved effective in many
clustering and classification tasks. The classic ways to measure the errors
between the original and the reconstructed matrix are distance or
Kullback-Leibler (KL) divergence. However, nonlinear cases are not properly
handled when we use these error measures. As a consequence, alternative
measures based on nonlinear kernels, such as correntropy, are proposed.
However, the current correntropy-based NMF only targets on the low-level
features without considering the intrinsic geometrical distribution of data. In
this paper, we propose a new NMF algorithm that preserves local invariance by
adding graph regularization into the process of max-correntropy-based matrix
factorization. Meanwhile, each feature can learn corresponding kernel from the
data. The experiment results of Caltech101 and Caltech256 show the benefits of
such combination against other NMF algorithms for the unsupervised image
clustering
Identification of Software Bugs by Analyzing Natural Language-Based Requirements Using Optimized Deep Learning Features
© 2024 Tech Science Press. All rights reserved. This is an open access article distributed under the Creative Commons Attribution License, to view a copy of the license, see: https://creativecommons.org/licenses/by/4.0/Software project outcomes heavily depend on natural language requirements, often causing diverse interpretations and issues like ambiguities and incomplete or faulty requirements. Researchers are exploring machine learning to predict software bugs, but a more precise and general approach is needed. Accurate bug prediction is crucial for software evolution and user training, prompting an investigation into deep and ensemble learning methods. However, these studies are not generalized and efficient when extended to other datasets. Therefore, this paper proposed a hybrid approach combining multiple techniques to explore their effectiveness on bug identification problems. The methods involved feature selection, which is used to reduce the dimensionality and redundancy of features and select only the relevant ones; transfer learning is used to train and test the model on different datasets to analyze how much of the learning is passed to other datasets, and ensemble method is utilized to explore the increase in performance upon combining multiple classifiers in a model. Four National Aeronautics and Space Administration (NASA) and four Promise datasets are used in the study, showing an increase in the model’s performance by providing better Area Under the Receiver Operating Characteristic Curve (AUC-ROC) values when different classifiers were combined. It reveals that using an amalgam of techniques such as those used in this study, feature selection, transfer learning, and ensemble methods prove helpful in optimizing the software bug prediction models and providing high-performing, useful end mode.Peer reviewe
AtomAI: A Deep Learning Framework for Analysis of Image and Spectroscopy Data in (Scanning) Transmission Electron Microscopy and Beyond
AtomAI is an open-source software package bridging instrument-specific Python
libraries, deep learning, and simulation tools into a single ecosystem. AtomAI
allows direct applications of the deep convolutional neural networks for atomic
and mesoscopic image segmentation converting image and spectroscopy data into
class-based local descriptors for downstream tasks such as statistical and
graph analysis. For atomically-resolved imaging data, the output is types and
positions of atomic species, with an option for subsequent refinement. AtomAI
further allows the implementation of a broad range of image and spectrum
analysis functions, including invariant variational autoencoders (VAEs). The
latter consists of VAEs with rotational and (optionally) translational
invariance for unsupervised and class-conditioned disentanglement of
categorical and continuous data representations. In addition, AtomAI provides
utilities for mapping structure-property relationships via im2spec and spec2im
type of encoder-decoder models. Finally, AtomAI allows seamless connection to
the first principles modeling with a Python interface, including molecular
dynamics and density functional theory calculations on the inferred atomic
position. While the majority of applications to date were based on atomically
resolved electron microscopy, the flexibility of AtomAI allows straightforward
extension towards the analysis of mesoscopic imaging data once the labels and
feature identification workflows are established/available. The source code and
example notebooks are available at https://github.com/pycroscopy/atomai
Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data
Transductive graph-based semi-supervised learning methods usually build an
undirected graph utilizing both labeled and unlabeled samples as vertices.
Those methods propagate label information of labeled samples to neighbors
through their edges in order to get the predicted labels of unlabeled samples.
Most popular semi-supervised learning approaches are sensitive to initial label
distribution happened in imbalanced labeled datasets. The class boundary will
be severely skewed by the majority classes in an imbalanced classification. In
this paper, we proposed a simple and effective approach to alleviate the
unfavorable influence of imbalance problem by iteratively selecting a few
unlabeled samples and adding them into the minority classes to form a balanced
labeled dataset for the learning methods afterwards. The experiments on UCI
datasets and MNIST handwritten digits dataset showed that the proposed approach
outperforms other existing state-of-art methods
Feature Space Augmentation: Improving Prediction Accuracy of Classical Problems in Cognitive Science and Computer Vison
The prediction accuracy in many classical problems across multiple domains has seen a rise since computational tools such as multi-layer neural nets and complex machine learning algorithms have become widely accessible to the research community. In this research, we take a step back and examine the feature space in two problems from very different domains. We show that novel augmentation to the feature space yields higher performance. Emotion Recognition in Adults from a Control Group: The objective is to quantify the emotional state of an individual at any time using data collected by wearable sensors. We define emotional state as a mixture of amusement, anger, disgust, fear, sadness, anxiety and neutral and their respective levels at any time. The generated model predicts an individual’s dominant state and generates an emotional spectrum, 1x7 vector indicating levels of each emotional state and anxiety. We present an iterative learning framework that alters the feature space uniquely to an individual’s emotion perception, and predicts the emotional state using the individual specific feature space. Hybrid Feature Space for Image Classification: The objective is to improve the accuracy of existing image recognition by leveraging text features from the images. As humans, we perceive objects using colors, dimensions, geometry and any textual information we can gather. Current image recognition algorithms rely exclusively on the first 3 and do not use the textual information. This study develops and tests an approach that trains a classifier on a hybrid text based feature space that has comparable accuracy to the state of the art CNN’s while being significantly inexpensive computationally. Moreover, when combined with CNN’S the approach yields a statistically significant boost in accuracy. Both models are validated using cross validation and holdout validation, and are evaluated against the state of the art
The Mycobacterium tuberculosis transposon sequencing database (MtbTnDB): a large-scale guide to genetic conditional essentiality [preprint]
Characterization of gene essentiality across different conditions is a useful approach for predicting gene function. Transposon sequencing (TnSeq) is a powerful means of generating genome-wide profiles of essentiality and has been used extensively in Mycobacterium tuberculosis (Mtb) genetic research. Over the past two decades, dozens of TnSeq screens have been published, yielding valuable insights into the biology of Mtb in vitro, inside macrophages, and in model host organisms. However, these Mtb TnSeq profiles are distributed across dozens of research papers within supplementary materials, which makes querying them cumbersome and assembling a complete and consistent synthesis of existing data challenging. Here, we address this problem by building a central repository of publicly available TnSeq screens performed in M. tuberculosis, which we call the Mtb transposon sequencing database (MtbTnDB). The MtbTnDB encompasses 64 published and unpublished TnSeq screens, and is standardized, open-access, and allows users easy access to data, visualizations, and functional predictions through an interactive web-app (www.mtbtndb.app). We also present evidence that (i) genes in the same genomic neighborhood tend to have similar TnSeq profiles, and (ii) clusters of genes with similar TnSeq profiles tend to be enriched for genes belonging to the same functional categories. Finally, we test and evaluate machine learning models trained on TnSeq profiles to guide functional annotation of orphan genes in Mtb. In addition to facilitating the exploration of conditional genetic essentiality in this important human pathogen via a centralized TnSeq data repository, the MtbTnDB will enable hypothesis generation and the extraction of meaningful patterns by facilitating the comparison of datasets across conditions. This will provide a basis for insights into the functional organization of Mtb genes as well as gene function prediction
- …