Predicting B Cell Receptor Substitution Profiles Using Public Repertoire Data
B cells develop high affinity receptors during the course of affinity
maturation, a cyclic process of mutation and selection. At the end of affinity
maturation, a number of cells sharing the same ancestor (i.e. in the same
"clonal family") are released from the germinal center; their amino acid
frequency profile reflects the allowed and disallowed substitutions at each
position. These clonal-family-specific frequency profiles, called "substitution
profiles", are useful for studying the course of affinity maturation as well as
for antibody engineering purposes. However, most often only a single sequence
is recovered from each clonal family in a sequencing experiment, making it
impossible to construct a clonal-family-specific substitution profile. Given
the public release of many high-quality large B cell receptor datasets, one may
ask whether it is possible to use such data in a prediction model for
clonal-family-specific substitution profiles. In this paper, we present the
method "Substitution Profiles Using Related Families" (SPURF), a penalized
tensor regression framework that integrates information from a rich assemblage
of datasets to predict the clonal-family-specific substitution profile for any
single input sequence. Using this framework, we show that substitution profiles
from similar clonal families can be leveraged together with simulated
substitution profiles and germline gene sequence information to improve
prediction. We fit this model on a large public dataset and validate the
robustness of our approach on an external dataset. Furthermore, we provide a
command-line tool in an open-source software package
(https://github.com/krdav/SPURF) implementing these ideas and providing easy
prediction using our pre-fit models. Comment: 23 page
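As a rough illustration of what a "substitution profile" is, the sketch below computes the empirical per-position amino acid frequencies for a clonal family whose sequences are already aligned to equal length. This is not SPURF itself, which predicts such a profile from a single sequence; the function name `substitution_profile` and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter

# The 20 standard amino acids; any other symbol (e.g. a gap "-") is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitution_profile(aligned_seqs):
    """Return one {amino_acid: frequency} dict per alignment position."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned_seqs)
        # Normalize over standard amino acids only.
        total = sum(counts[a] for a in AMINO_ACIDS)
        profile.append(
            {a: (counts[a] / total if total else 0.0) for a in AMINO_ACIDS}
        )
    return profile


# Hypothetical toy clonal family (assumed to be pre-aligned).
family = ["CARDY", "CARDY", "CTRDY", "CARDF"]
profile = substitution_profile(family)
```

With several sequences per family, the profile can be read off directly as above; the abstract's point is that with only one recovered sequence this direct estimate is unavailable, which is why a prediction model is needed.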
Elephant Search with Deep Learning for Microarray Data Analysis
Despite a plethora of research on microarray gene expression data analysis,
it remains challenging for researchers to analyze large and complex gene
expression data effectively and efficiently. Feature (gene) selection is of
paramount importance for understanding the differences in biological and
non-biological variation between samples. To address this problem, a novel
elephant search (ES) based optimization is proposed to select the best gene
expressions from the large volume of microarray data. Further, a promising
machine learning method is envisioned to leverage such high-dimensional and
complex microarray datasets, extracting hidden patterns to make meaningful
predictions and accurate classifications. In particular, stochastic gradient
descent based deep learning (DL) with a softmax activation function is applied
to the reduced features (genes) to better classify samples according to their
gene expression levels. The experiments are carried out on nine popular cancer
microarray gene selection datasets obtained from the UCI machine learning
repository. The empirical results obtained by the proposed elephant search
based deep learning (ESDL) approach are compared with the most recently
published article to assess its suitability for future bioinformatics
research. Comment: 12 pages, 5 Tabl
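For readers unfamiliar with the softmax activation mentioned above, here is a minimal sketch of how it turns a vector of class scores into class probabilities. The scores are made-up values, not outputs of the ESDL model, and the network and elephant search steps are not shown.

```python
import math


def softmax(scores):
    """Map real-valued class scores to probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three sample classes.
probs = softmax([2.0, 1.0, 0.1])
# The predicted class is the index with the highest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```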
MM Algorithms for Minimizing Nonsmoothly Penalized Objective Functions
In this paper, we propose a general class of algorithms for optimizing an
extensive variety of nonsmoothly penalized objective functions that satisfy
certain regularity conditions. The proposed framework utilizes the
majorization-minimization (MM) algorithm as its core optimization engine. The
resulting algorithms rely on iterated soft-thresholding, implemented
componentwise, allowing for fast, stable updating that avoids the need for any
high-dimensional matrix inversion. We establish a local convergence theory for
this class of algorithms under weaker assumptions than previously considered in
the statistical literature. We also demonstrate the exceptional effectiveness
of new acceleration methods, originally proposed for the EM algorithm, in this
class of problems. Simulation results and a microarray data example are
provided to demonstrate the algorithm's capabilities and versatility. Comment: A revised version of this paper has been published in the Electronic
Journal of Statistic
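The idea of componentwise iterated soft-thresholding can be sketched on one familiar special case, the lasso objective 0.5 * ||y - X b||^2 + lam * ||b||_1. This choice of objective is an assumption made for illustration; the paper covers a broader class of penalties, and neither its exact updates nor its acceleration methods are reproduced here. Each iteration minimizes a quadratic surrogate that majorizes the smooth term, so no matrix inversion is needed.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def mm_lasso(X, y, lam, n_iter=500):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by iterated soft-thresholding.

    X is a list of rows, y a list of responses. The constant L is the
    trace of X^T X, an upper bound on its largest eigenvalue, so the
    quadratic surrogate used at each step majorizes the smooth term.
    """
    n, p = len(X), len(X[0])
    L = sum(v * v for row in X for v in row)
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [
            y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)
        ]
        # Gradient of the smooth term: -X^T (y - X beta).
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Componentwise update: gradient step followed by soft-thresholding.
        beta = [
            soft_threshold(beta[j] - grad[j] / L, lam / L) for j in range(p)
        ]
    return beta


# Toy check with an identity design, where the lasso solution is known
# in closed form: each coefficient is soft_threshold(y_j, lam).
beta_hat = mm_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

Using the trace as the surrogate constant is conservative and can slow convergence; a tighter bound on the largest eigenvalue would allow larger steps.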
Random Forest as a tumour genetic marker extractor
Identifying tumour genetic markers is an essential task for biomedicine. In this thesis, we analyse a dataset of chromosomal rearrangements of cancer samples and present a methodology for extracting genetic markers from this dataset by using a Random Forest as a feature selection tool
Application to the Analysis of Germinal Center Reactions In Vivo
Simultaneous detection of multiple cellular and molecular players in their
native environment, one of the keys to a full understanding of immune
processes, remains challenging for in vivo microscopy. Here, we present a
synergistic strategy for spectrally multiplexed in vivo imaging composed of
(i) triple two-photon excitation using spatiotemporal synchronization of two
femtosecond lasers, (ii) a broad set of fluorophores with emission ranging
from blue to near infrared, and (iii) an effective spectral unmixing algorithm.
Using our approach, we simultaneously excite and detect seven fluorophores
expressed in distinct cellular and tissue compartments, plus second harmonics
generation from collagen fibers in lymph nodes. This enables us to visualize
the dynamic interplay of all the central cellular players during germinal
center reactions. While current in vivo imaging typically enables recording
the dynamics of 4 tissue components at a time, our strategy allows a more
comprehensive analysis of cellular dynamics involving 8 single-labeled
compartments. It enables investigation of the orchestration of multiple cellular
subsets determining tissue function, thus opening the way for a mechanistic
understanding of complex pathophysiologic processes in vivo. In the future,
the design of transgenic mice combining a larger spectrum of fluorescent
proteins will reveal the full potential of our method