12,764 research outputs found

    py4DSTEM: a software package for multimodal analysis of four-dimensional scanning transmission electron microscopy datasets

    Get PDF
    Scanning transmission electron microscopy (STEM) allows for imaging, diffraction, and spectroscopy of materials on length scales ranging from microns to atoms. By using a high-speed, direct electron detector, it is now possible to record a full 2D image of the diffracted electron beam at each probe position, typically a 2D grid of probe positions. These 4D-STEM datasets are rich in information, including signatures of the local structure, orientation, deformation, electromagnetic fields and other sample-dependent properties. However, extracting this information requires complex analysis pipelines, from data wrangling to calibration to analysis to visualization, all while maintaining robustness against imaging distortions and artifacts. In this paper, we present py4DSTEM, an analysis toolkit for measuring material properties from 4D-STEM datasets, written in the Python language and released with an open source license. We describe the algorithmic steps for dataset calibration and various 4D-STEM property measurements in detail, and present results from several experimental datasets. We have also implemented a simple and universal file format appropriate for electron microscopy data in py4DSTEM, which uses the open source HDF5 standard. We hope this tool will benefit the research community, helps to move the developing standards for data and computational methods in electron microscopy, and invite the community to contribute to this ongoing, fully open-source project

    Applicability of semi-supervised learning assumptions for gene ontology terms prediction

    Get PDF
    Gene Ontology (GO) is one of the most important resources in bioinformatics, aiming to provide a unified framework for the biological annotation of genes and proteins across all species. Predicting GO terms is an essential task for bioinformatics, but the number of available labelled proteins is in several cases insufficient for training reliable machine learning classifiers. Semi-supervised learning methods arise as a powerful solution that explodes the information contained in unlabelled data in order to improve the estimations of traditional supervised approaches. However, semi-supervised learning methods have to make strong assumptions about the nature of the training data and thus, the performance of the predictor is highly dependent on these assumptions. This paper presents an analysis of the applicability of semi-supervised learning assumptions over the specific task of GO terms prediction, focused on providing judgment elements that allow choosing the most suitable tools for specific GO terms. The results show that semi-supervised approaches significantly outperform the traditional supervised methods and that the highest performances are reached when applying the cluster assumption. Besides, it is experimentally demonstrated that cluster and manifold assumptions are complimentary to each other and an analysis of which GO terms can be more prone to be correctly predicted with each assumption, is provided.Postprint (published version

    Solution Path Algorithm for Twin Multi-class Support Vector Machine

    Full text link
    The twin support vector machine and its extensions have made great achievements in dealing with binary classification problems, however, which is faced with some difficulties such as model selection and solving multi-classification problems quickly. This paper is devoted to the fast regularization parameter tuning algorithm for the twin multi-class support vector machine. A new sample dataset division method is adopted and the Lagrangian multipliers are proved to be piecewise linear with respect to the regularization parameters by combining the linear equations and block matrix theory. Eight kinds of events are defined to seek for the starting event and then the solution path algorithm is designed, which greatly reduces the computational cost. In addition, only few points are combined to complete the initialization and Lagrangian multipliers are proved to be 1 as the regularization parameter tends to infinity. Simulation results based on UCI datasets show that the proposed method can achieve good classification performance with reducing the computational cost of grid search method from exponential level to the constant level
    corecore