    Predicting B Cell Receptor Substitution Profiles Using Public Repertoire Data

    B cells develop high-affinity receptors during affinity maturation, a cyclic process of mutation and selection. At the end of affinity maturation, a number of cells sharing the same ancestor (i.e. in the same "clonal family") are released from the germinal center; their amino acid frequency profile reflects the allowed and disallowed substitutions at each position. These clonal-family-specific frequency profiles, called "substitution profiles", are useful for studying the course of affinity maturation as well as for antibody engineering. However, a sequencing experiment most often recovers only a single sequence from each clonal family, making it impossible to construct a clonal-family-specific substitution profile. Given the public release of many large, high-quality B cell receptor datasets, one may ask whether such data can be used in a prediction model for clonal-family-specific substitution profiles. In this paper, we present the method "Substitution Profiles Using Related Families" (SPURF), a penalized tensor regression framework that integrates information from a rich assemblage of datasets to predict the clonal-family-specific substitution profile for any single input sequence. Using this framework, we show that substitution profiles from similar clonal families can be leveraged together with simulated substitution profiles and germline gene sequence information to improve prediction. We fit this model on a large public dataset and validate the robustness of our approach on an external dataset. Furthermore, we provide a command-line tool in an open-source software package (https://github.com/krdav/SPURF) that implements these ideas and provides easy prediction using our pre-fit models. Comment: 23 pages
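
To make the term "substitution profile" concrete, the sketch below computes the empirical per-position amino acid frequencies for a clonal family whose sequences are already aligned to the same length. It is not the SPURF model, which predicts such a profile from a single sequence; the function name `substitution_profile` and the toy sequences are illustrative assumptions.

```python
from collections import Counter


def substitution_profile(aligned_seqs):
    """Return per-position amino acid frequencies for aligned sequences.

    Hypothetical helper for illustration: each element of the result is a
    dict mapping amino acid -> observed frequency at that position.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    profile = []
    for pos in range(length):
        # Count how often each residue occurs at this position.
        counts = Counter(seq[pos] for seq in aligned_seqs)
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile


# Toy clonal family (made-up sequences, not real repertoire data).
family = ["CARDY", "CARDY", "CAKDY", "CTRDY"]
profile = substitution_profile(family)
```

With only one recovered sequence per family, every position would get a frequency of 1.0 for the observed residue, which is the limitation the abstract describes.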

    Elephant Search with Deep Learning for Microarray Data Analysis

    Despite a plethora of research on microarray gene expression data analysis, it remains challenging to analyze large and complex gene expression data effectively and efficiently. Feature (gene) selection is of paramount importance for understanding the differences in biological and non-biological variation between samples. To address this problem, a novel elephant search (ES) based optimization is proposed to select the best gene expressions from the large volume of microarray data. Further, a machine learning method is envisioned that leverages such high-dimensional and complex microarray datasets to extract hidden patterns for meaningful prediction and accurate classification. In particular, stochastic gradient descent based deep learning (DL) with a softmax activation function is used on the reduced features (genes) to better classify samples according to their gene expression levels. The experiments are carried out on nine of the most popular cancer microarray gene selection datasets, obtained from the UCI Machine Learning Repository. The empirical results obtained by the proposed elephant search based deep learning (ESDL) approach are compared with the most recently published work to assess its suitability for future bioinformatics research. Comment: 12 pages, 5 Tables
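
The softmax activation mentioned above turns a vector of raw class scores into class probabilities. The following is a minimal, numerically stable sketch in plain Python; it illustrates the activation only, not the ESDL network or the elephant search step, and the example scores are made up.

```python
import math


def softmax(scores):
    """Map raw class scores to probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow in exp() without
    # changing the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three sample classes.
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
```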

    MM Algorithms for Minimizing Nonsmoothly Penalized Objective Functions

    In this paper, we propose a general class of algorithms for optimizing an extensive variety of nonsmoothly penalized objective functions that satisfy certain regularity conditions. The proposed framework utilizes the majorization-minimization (MM) algorithm as its core optimization engine. The resulting algorithms rely on iterated soft-thresholding, implemented componentwise, allowing for fast, stable updating that avoids the need for any high-dimensional matrix inversion. We establish a local convergence theory for this class of algorithms under weaker assumptions than previously considered in the statistical literature. We also demonstrate the exceptional effectiveness of new acceleration methods, originally proposed for the EM algorithm, in this class of problems. Simulation results and a microarray data example are provided to demonstrate the algorithm's capabilities and versatility. Comment: A revised version of this paper has been published in the Electronic Journal of Statistics
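
As an illustration of componentwise iterated soft-thresholding, the sketch below applies an MM scheme to one special case, the lasso objective 0.5 * ||y - X b||^2 + lam * ||b||_1. It is an assumed, simplified instance and not the general algorithm class or the acceleration methods of the paper. The quadratic term is majorized using the curvature bound L = trace(X^T X), so each update needs no matrix inversion.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_mm(X, y, lam, n_iter=500):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by iterated soft-thresholding.

    X is a list of rows, y a list of responses. Hypothetical helper
    written for illustration only.
    """
    n, p = len(X), len(X[0])
    # trace(X^T X) is an upper bound on the largest eigenvalue of X^T X,
    # so the separable quadratic with curvature L majorizes the loss.
    L = sum(v * v for row in X for v in row)
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p))
                 for i in range(n)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n))
                for j in range(p)]
        # Minimizing the majorizer decouples into one soft-threshold
        # per coefficient.
        beta = [soft_threshold(beta[j] - grad[j] / L, lam / L)
                for j in range(p)]
    return beta


# With an identity design the lasso solution is soft_threshold(y_j, lam).
beta = lasso_mm([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

The trace bound is loose, so convergence is slower than with the exact largest eigenvalue; it is used here only because it needs no linear algebra library.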

    Random Forest as a tumour genetic marker extractor

    Identifying tumour genetic markers is an essential task for biomedicine. In this thesis, we analyse a dataset of chromosomal rearrangements of cancer samples and present a methodology for extracting genetic markers from this dataset by using a Random Forest as a feature selection tool.

    Application to the Analysis of Germinal Center Reactions In Vivo

    Simultaneous detection of multiple cellular and molecular players in their native environment, one of the keys to a full understanding of immune processes, remains challenging for in vivo microscopy. Here, we present a synergistic strategy for spectrally multiplexed in vivo imaging composed of (i) triple two-photon excitation using spatiotemporal synchronization of two femtosecond lasers, (ii) a broad set of fluorophores with emission ranging from blue to near infrared, and (iii) an effective spectral unmixing algorithm. Using our approach, we simultaneously excite and detect seven fluorophores expressed in distinct cellular and tissue compartments, plus second harmonic generation from collagen fibers in lymph nodes. This enables us to visualize the dynamic interplay of all the central cellular players during germinal center reactions. While current in vivo imaging typically enables recording the dynamics of 4 tissue components at a time, our strategy allows a more comprehensive analysis of cellular dynamics involving 8 single-labeled compartments. It makes it possible to investigate the orchestration of multiple cellular subsets determining tissue function, thus opening the way for a mechanistic understanding of complex pathophysiologic processes in vivo. In the future, the design of transgenic mice combining a larger spectrum of fluorescent proteins will reveal the full potential of our method.