Replica conditional sequential monte carlo
© 2019 International Machine Learning Society (IMLS). We propose a Markov chain Monte Carlo (MCMC) scheme to perform state inference in non-linear non-Gaussian state-space models. Current state-of-the-art methods to address this problem rely on particle MCMC techniques and their variants, such as the iterated conditional Sequential Monte Carlo (cSMC) scheme, which uses a Sequential Monte Carlo (SMC) type proposal within MCMC. A deficiency of standard SMC proposals is that they only use observations up to time t to propose states at time t, even when an entire observation sequence is available. More sophisticated SMC proposals based on lookahead techniques could be used, but they can be difficult to put into practice. We propose here replica cSMC, where we build SMC proposals for one replica using information from the entire observation sequence by conditioning on the states of the other replicas. This approach is easily parallelizable, and we demonstrate its excellent empirical performance when compared to the standard iterated cSMC scheme at fixed computational complexity.
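For orientation, the standard iterated cSMC baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the replica scheme proposed in the paper. The toy linear-Gaussian model, the function name `csmc_sweep`, and the parameters `phi`, `sigma_x`, and `sigma_y` are all assumptions made for the example. Note that the proposal at time t uses only `y[t]`, which is the limitation the abstract points out.

```python
import numpy as np

def csmc_sweep(y, x_ref, n_particles, rng, phi=0.9, sigma_x=1.0, sigma_y=1.0):
    """One conditional SMC sweep with a bootstrap proposal (illustrative only).

    Assumed toy model: x_t = phi * x_{t-1} + N(0, sigma_x^2), y_t = x_t + N(0, sigma_y^2),
    with x_0 ~ N(0, sigma_x^2). Particle 0 is pinned to the reference path x_ref.
    """
    T = len(y)
    X = np.empty((T, n_particles))              # particle states
    A = np.zeros((T, n_particles), dtype=int)   # ancestor indices

    X[0] = rng.normal(0.0, sigma_x, n_particles)
    X[0, 0] = x_ref[0]
    logw = -0.5 * ((y[0] - X[0]) / sigma_y) ** 2

    for t in range(1, T):
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Multinomial resampling; the reference particle keeps its own lineage.
        A[t] = rng.choice(n_particles, size=n_particles, p=w)
        A[t, 0] = 0
        # Bootstrap proposal: propagate through the transition, ignoring future data.
        X[t] = phi * X[t - 1, A[t]] + rng.normal(0.0, sigma_x, n_particles)
        X[t, 0] = x_ref[t]
        logw = -0.5 * ((y[t] - X[t]) / sigma_y) ** 2

    # Draw one final particle and trace its ancestry to obtain the new path.
    w = np.exp(logw - logw.max())
    w /= w.sum()
    k = rng.choice(n_particles, p=w)
    path = np.empty(T)
    for t in range(T - 1, -1, -1):
        path[t] = X[t, k]
        k = A[t, k]
    return path
```

Iterating `x_ref = csmc_sweep(y, x_ref, n_particles, rng)` gives a Markov chain over state trajectories; the replica variant described above would instead build its proposals from the states of other replicas.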
Sequencing identifies a distinct signature of circulating microRNAs in early radiographic knee osteoarthritis
OBJECTIVE: MicroRNAs act locally and systemically to impact osteoarthritis (OA) pathophysiology, but comprehensive profiling of the circulating miRNome in early vs late stages of OA has yet to be conducted. Sequencing has emerged as the preferred method for microRNA profiling since it offers high sensitivity and specificity. Our objective is to sequence the miRNome in plasma from 91 patients with early [Kellgren-Lawrence (KL) grade 0 or 1 (n = 41)] or late [KL grade 3 or 4 (n = 50)] symptomatic radiographic knee OA to identify unique microRNA signatures in each disease state.
DESIGN: MicroRNA libraries were prepared using the QIAseq miRNA Library Kit and sequenced on the Illumina NextSeq 550. Counts were produced for microRNAs captured in miRBase and for novel microRNAs. Statistical, bioinformatics, and computational biology approaches were used to refine and interpret the final list of microRNAs.
RESULTS: From 215 differentially expressed microRNAs (FDR < 0.01), 97 microRNAs showed an increase or decrease in expression in ≥85% of samples in the early OA group as compared to the median expression in the late OA group. Increasing this threshold to ≥95%, seven microRNAs were identified: hsa-miR-335-3p, hsa-miR-199a-5p, hsa-miR-671-3p, hsa-miR-1260b, hsa-miR-191-3p, hsa-miR-335-5p, and hsa-miR-543. Four novel microRNAs were present in ≥50% of early OA samples and had 27 predicted gene targets in common with the prioritized set of predicted gene targets from the 97 microRNAs, suggesting common underlying mechanisms.
CONCLUSION: Applying sequencing to well-characterized patient cohorts produced unbiased profiling of the circulating miRNome and identified a unique panel of 11 microRNAs in early radiographic knee OA.
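The consistency filter described in the RESULTS (a microRNA is kept when at least 85% of early OA samples fall on the same side of the late OA median) can be sketched as below. This is one plausible reading of the abstract, not the authors' pipeline. The function name `consistent_direction` and the array layout (rows are microRNAs, columns are samples) are assumptions for illustration.

```python
import numpy as np

def consistent_direction(early, late, threshold=0.85):
    """Flag microRNAs whose early-group expression is consistently shifted.

    early: array of shape (n_mirna, n_early_samples)
    late:  array of shape (n_mirna, n_late_samples)
    Returns a boolean mask of length n_mirna.
    """
    early = np.asarray(early, dtype=float)
    late = np.asarray(late, dtype=float)
    # Per-microRNA median expression in the late group.
    late_median = np.median(late, axis=1, keepdims=True)
    # Fraction of early samples above / below that median.
    frac_up = (early > late_median).mean(axis=1)
    frac_down = (early < late_median).mean(axis=1)
    return np.maximum(frac_up, frac_down) >= threshold
```

Raising `threshold` to 0.95 corresponds to the stricter cut mentioned in the abstract.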
Distances on a Graph
In this article, our ultimate goal is to transform a graph’s adjacency matrix into a distance matrix. Because cluster density is not observable prior to the actual clustering, our goal is to find a distance whose pairwise minimization will lead to densely connected clusters. Our thesis is centered on the widely accepted notion that strong clusters are sets of vertices with high induced subgraph density. We posit that vertices sharing more connections are closer to each other than vertices sharing fewer connections. This definition of distance differs from the usual shortest-path distance. At the cluster level, our thesis translates into low mean intra-cluster distances, which reflect high densities. We compare three distance measures from the literature. Our benchmark is how accurately each measure reflects intra-cluster density when aggregated (averaged) at the cluster level. We conduct our tests on synthetic graphs, where clusters and intra-cluster density are known in advance. We restrict our attention to unweighted graphs with no self-loops or multiple edges, and examine the relationship between mean intra-cluster distances and intra-cluster densities. Our numerical experiments show that Jaccard and Otsuka-Ochiai offer very accurate measures of density when averaged over vertex pairs within clusters.
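As a rough illustration of the idea that vertices sharing more connections are closer, the Jaccard and Otsuka-Ochiai distances can be computed directly from an adjacency matrix. The sketch below assumes closed neighborhoods (each vertex counts as its own neighbor) so that adjacent vertices share at least two members; the abstract does not state which convention the article uses, and the function name `neighborhood_distances` is hypothetical.

```python
import numpy as np

def neighborhood_distances(adj):
    """Return (jaccard, otsuka_ochiai) distance matrices for a 0/1 adjacency matrix.

    Assumes an undirected, unweighted graph with no self-loops.
    """
    a = np.asarray(adj, dtype=float)
    a = a + np.eye(a.shape[0])            # closed neighborhoods (assumption)
    common = a @ a.T                      # |N[i] ∩ N[j]|
    size = a.sum(axis=1)                  # |N[i]|
    union = size[:, None] + size[None, :] - common
    jaccard_sim = common / union
    ochiai_sim = common / np.sqrt(size[:, None] * size[None, :])
    return 1.0 - jaccard_sim, 1.0 - ochiai_sim
```

Averaging either matrix over the vertex pairs inside a cluster gives the mean intra-cluster distance that the article compares against intra-cluster density.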
Fragmentation, Price Formation, and Cross-Impact in Bitcoin Markets
In light of micro-scale inefficiencies due to the highly fragmented bitcoin trading landscape, we use a granular data set comprising order book and trade data from the most liquid bitcoin markets to understand the price formation process at sub-1-second time scales. To this end, we construct a set of features that encapsulate relevant microstructural information over short lookback windows. These features are subsequently leveraged, first to generate a leader–lagger network that quantifies how markets impact one another, and then to train linear models capable of explaining between 10% and 37% of total variation in 500 ms future returns (depending on which market is the prediction target). The results are then compared with those of various PnL calculations that take trading realities, such as transaction costs, into account. The PnL calculations are based on natural taker strategies (meaning they employ market orders) associated with each model. Our findings emphasize the role of a market's fee regime in determining both its propensity to lead or lag and the profitability of our taker strategy. We further derive a natural maker strategy (using only passive limit orders) which, due to the difficulties associated with backtesting maker strategies, we test in a real-world live trading experiment, in which we turned over 1.5 M USD in notional volume. Lending additional confidence to our models, and by extension to the features they are based on, the results indicate a significant improvement over a naive benchmark strategy, which we also deploy in a live trading environment with real capital for the sake of comparison.
Low-rank extended Kalman filtering for online learning of neural networks from streaming data
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream. The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix, which gives a cost per step that is linear in the number of model parameters. In contrast to methods based on stochastic variational inference, our method is fully deterministic and does not require step-size tuning. We show experimentally that this results in much faster (more sample-efficient) learning, which in turn leads to more rapid adaptation to changing distributions and faster accumulation of reward when used as part of a contextual bandit algorithm.
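To see why a low-rank plus diagonal precision matrix permits a per-step cost linear in the number of parameters, consider solving a linear system with precision diag(d) + W Wᵀ, where W has only a few columns. The sketch below applies the Woodbury identity; it illustrates the structural idea only and is not the update rule from the paper. The function name `solve_diag_plus_low_rank` and the symbols `d`, `W`, and `b` are assumptions for the example.

```python
import numpy as np

def solve_diag_plus_low_rank(d, W, b):
    """Solve (diag(d) + W @ W.T) x = b without forming the P x P matrix.

    d: (P,) positive diagonal, W: (P, L) with L << P, b: (P,).
    Cost is O(P * L^2), i.e. linear in P for fixed rank L.
    """
    Dinv_b = b / d                                # D^{-1} b
    Dinv_W = W / d[:, None]                       # D^{-1} W
    small = np.eye(W.shape[1]) + W.T @ Dinv_W     # L x L capacitance matrix
    # Woodbury: x = D^{-1} b - D^{-1} W (I + W^T D^{-1} W)^{-1} W^T D^{-1} b
    return Dinv_b - Dinv_W @ np.linalg.solve(small, W.T @ Dinv_b)
```

Only an L x L system is ever factorized, so memory and time grow linearly with the parameter count P.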