77 research outputs found
Sensitivity of robust optimization problems under drift and volatility uncertainty
We examine optimization problems in which an investor has the opportunity to
trade in stocks with the goal of minimizing her worst-case cost of
cumulative gains and losses. Here, worst-case refers to taking into account all
possible drift and volatility processes for the stocks that fall within an
ε-neighborhood of predefined fixed baseline processes. Although
solving the worst-case problem for a fixed ε is known to be very
challenging in general, we show that it can be approximated as ε → 0 by the
baseline problem (computed using the baseline processes) in the
following sense: firstly, the value of the worst-case problem is equal to the
value of the baseline problem plus ε times a correction term. This
correction term can be computed explicitly and quantifies how sensitive a given
optimization problem is to model uncertainty. Moreover, approximately optimal
trading strategies for the worst-case problem can be obtained using optimal
strategies from the corresponding baseline problem.
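A minimal sketch of the expansion described in this abstract, in illustrative notation (the value function V, the radius ε of the neighborhood, and the correction term Υ are placeholder symbols chosen here, not necessarily the paper's):

```latex
V(\varepsilon) \;=\; V(0) \;+\; \varepsilon\,\Upsilon \;+\; o(\varepsilon)
\qquad \text{as } \varepsilon \downarrow 0,
```

where V(0) is the value of the baseline problem and the explicitly computable Υ quantifies the sensitivity of the optimization problem to model uncertainty.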
Feature-aligned N-BEATS with Sinkhorn divergence
In this study, we propose Feature-aligned N-BEATS as a domain generalization
model for univariate time series forecasting problems. The proposed model is an
extension of the doubly residual stacking architecture of N-BEATS (Oreshkin et
al. [34]) into a representation learning framework. The model is a new
structure that involves marginal feature probability measures (i.e.,
pushforward measures of multiple source domains) induced by the intricate
composition of residual operators of N-BEATS in each stack and aligns them
stack-wise via an entropic regularized Wasserstein distance referred to as the
Sinkhorn divergence (Genevay et al. [14]). The loss function consists of a
typical forecasting loss for multiple source domains and an alignment loss
calculated with the Sinkhorn divergence, which allows the model to learn
invariant features stack-wise across multiple source data sequences while
retaining N-BEATS's interpretable design. We conduct a comprehensive
experimental evaluation of the proposed approach and the results demonstrate
the model's forecasting and generalization capabilities in comparison with
methods based on the original N-BEATS.
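The alignment loss above is built on the Sinkhorn divergence, a debiased form of entropically regularized optimal transport. Below is a minimal NumPy sketch of that divergence between two batches of stack-wise features; the function names, the squared-Euclidean ground cost, and the uniform weights are our choices for illustration, and a practical implementation would use log-domain updates (or a library such as geomloss) for numerical stability.

```python
import numpy as np

def sinkhorn_cost(X, Y, eps=0.1, n_iter=200):
    """Entropic OT cost between two empirical measures with uniform weights.

    X: (n, d) samples, Y: (m, d) samples; eps is the entropic regularization.
    """
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)                  # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):               # Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]       # approximate transport plan
    return (P * C).sum()                  # "sharp" transport cost <P, C>

def sinkhorn_divergence(X, Y, eps=0.1):
    """Debiased Sinkhorn divergence: nonnegative and zero when X == Y."""
    return (sinkhorn_cost(X, Y, eps)
            - 0.5 * sinkhorn_cost(X, X, eps)
            - 0.5 * sinkhorn_cost(Y, Y, eps))
```

The debiasing terms remove the entropic self-cost, which is what makes the divergence suitable as an alignment penalty between source-domain feature measures.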
BOtied: Multi-objective Bayesian optimization with tied multivariate ranks
Many scientific and industrial applications require joint optimization of
multiple, potentially competing objectives. Multi-objective Bayesian
optimization (MOBO) is a sample-efficient framework for identifying
Pareto-optimal solutions. We show a natural connection between non-dominated
solutions and the highest multivariate rank, which coincides with the outermost
level line of the joint cumulative distribution function (CDF). We propose the
CDF indicator, a Pareto-compliant metric for evaluating the quality of
approximate Pareto sets that complements the popular hypervolume indicator. At
the heart of MOBO is the acquisition function, which determines the next
candidate to evaluate by navigating the best compromises among the objectives.
Multi-objective acquisition functions that rely on box decomposition of the
objective space, such as the expected hypervolume improvement (EHVI) and
entropy search, scale poorly to a large number of objectives. We propose an
acquisition function, called BOtied, based on the CDF indicator. BOtied can be
implemented efficiently with copulas, a statistical tool for modeling complex,
high-dimensional distributions. We benchmark BOtied against common acquisition
functions, including EHVI and random scalarization (ParEGO), in a series of
synthetic and real-data experiments. BOtied performs on par with the baselines
across datasets and metrics while being computationally efficient.
Comment: 10 pages (+5 appendix), 9 figures. Submitted to NeurIPS.
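The stated connection between non-dominated solutions and the highest multivariate rank can be illustrated with the empirical joint CDF: a point's CDF value is the fraction of the sample it weakly dominates, and a point with maximal CDF value is necessarily Pareto-optimal. The sketch below is our illustration only; the actual BOtied acquisition estimates the CDF with copulas rather than using the raw empirical CDF.

```python
import numpy as np

def ecdf_scores(Y):
    """Empirical joint CDF value of each point in Y (n, d): the fraction of
    the sample that the point weakly dominates (maximization convention,
    componentwise <=)."""
    return np.array([np.all(Y <= y, axis=1).mean() for y in Y])

def non_dominated_mask(Y):
    """Direct O(n^2) Pareto filter for maximization, for comparison."""
    n = len(Y)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(Y[j] >= Y[i]) and np.any(Y[j] > Y[i]):
                mask[i] = False            # Y[i] is dominated by Y[j]
                break
    return mask
```

If z dominates x, every point weakly dominated by x is also weakly dominated by z, and z additionally dominates itself, so z's CDF score is strictly larger; hence the top-scoring point always lies on the Pareto front.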
Probabilistic prediction of cyanobacteria abundance in a Korean reservoir using a Bayesian Poisson model
There have been increasing reports of harmful algal blooms (HABs) worldwide. However, the factors that influence cyanobacteria dominance and HAB formation can be site-specific and idiosyncratic, making prediction challenging. The drivers of cyanobacteria blooms in Lake Paldang, South Korea, the summer climate of which is strongly affected by the East Asian monsoon, may differ from those in well-studied North American lakes. Using observational data sampled during the growing seasons of 2007–2011, a Bayesian hurdle Poisson model was developed to predict cyanobacteria abundance in the lake. The model allowed cyanobacteria absence (zero counts) and nonzero cyanobacteria counts to be modeled as functions of different environmental factors. The model predictions demonstrated that the principal factor determining the success of cyanobacteria was temperature. Combined with high temperature, increased residence time, indicated by low outflow rates, appeared to increase the probability of cyanobacteria occurrence. A stable water column, represented by low suspended solids, together with high temperature, was a requirement for high abundance of cyanobacteria. Our model results have management implications: the model can be used to probabilistically forecast cyanobacteria watch or alert levels and to develop mitigation strategies for cyanobacteria blooms.
Key Points:
- A Bayesian hurdle Poisson model predicted cyanobacteria abundance
- Temperature, flushing rate, and water column stability were key factors
- The model forecasted cyanobacteria watch and alert levels probabilistically
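The hurdle structure described above splits the data into an occurrence part (zero vs. nonzero counts) and an abundance part (a zero-truncated Poisson for positive counts). The sketch below writes down that likelihood, assuming a logistic link for occurrence and a log link for the Poisson rate; the variable names and link functions are our assumptions for illustration, not the paper's exact Bayesian specification.

```python
import math
import numpy as np

def hurdle_poisson_loglik(y, X, beta_zero, beta_count):
    """Log-likelihood of a hurdle Poisson model (illustrative sketch).

    Two parts, mirroring the absence/abundance split in the abstract:
      * occurrence: p = sigmoid(X @ beta_zero) is the probability of a
        nonzero count,
      * abundance: positive counts follow a zero-truncated Poisson with
        rate lam = exp(X @ beta_count).
    """
    y = np.asarray(y, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(X @ beta_zero)))
    lam = np.exp(X @ beta_count)
    log_fact = np.array([math.lgamma(k + 1.0) for k in y])  # log(y!)
    ll = np.where(
        y == 0,
        np.log1p(-p),                               # hurdle not crossed
        np.log(p)                                   # hurdle crossed ...
        + y * np.log(lam) - lam - log_fact          # ... Poisson density
        - np.log1p(-np.exp(-lam)),                  # zero-truncation
    )
    return float(ll.sum())
```

In a Bayesian fit, this likelihood would be combined with priors on the two coefficient vectors and sampled with MCMC; the hurdle form is what lets different environmental factors drive occurrence and abundance separately.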
T-depth-optimized Quantum Search with Quantum Data-access Machine
Quantum search algorithms offer a remarkable advantage of quadratic reduction
in query complexity using quantum superposition principle. However, how an
actual architecture may access and handle the database in a quantum superposed
state has been largely unexplored so far; the quantum state of data was simply
assumed to be prepared and accessed by a black-box operation -- so-called
quantum oracle, even though this process, if not appropriately designed, may
adversely diminish the quantum query advantage. Here, we introduce an efficient
quantum data-access process, dubbed the quantum data-access machine (QDAM), and
present a general architecture for quantum search algorithms. We analyze the
runtime of our algorithm in view of fault-tolerant quantum computation (FTQC)
consisting of logical qubits within an effective quantum error correction code.
Specifically, we introduce a measure involving two computational complexities,
i.e., the quantum query and T-depth complexities, which can be critical for
assessing performance since logical non-Clifford gates, such as the T (i.e.,
π/8 rotation) gate, are known to be the costliest to implement in FTQC. Our
analysis shows that for searching N data, a QDAM model exhibiting logarithmic,
i.e., O(log N), growth of the T-depth complexity can be constructed. Further
analysis reveals that our QDAM-embedded quantum search retains the quadratic
runtime advantage. Our study thus demonstrates that quantum data search can
truly speed up over classical approaches, with the logarithmic-T-depth QDAM as
a key component.
Comment: 13 pages, 8 figures / Comments welcome
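The quadratic query reduction mentioned in this abstract is the textbook Grover counting argument, sketched numerically below; this back-of-the-envelope comparison is ours and is not the paper's QDAM runtime model, which additionally accounts for T-depth in fault-tolerant execution.

```python
import math

def classical_expected_queries(N):
    """Unstructured search with one marked item: a classical scan needs
    (N + 1) / 2 lookups on average."""
    return (N + 1) / 2

def grover_queries(N):
    """Grover's algorithm finds the marked item with high probability after
    about (pi / 4) * sqrt(N) oracle queries."""
    return math.floor((math.pi / 4) * math.sqrt(N))
```

For N = 10^6, the classical scan expects about 500,000 lookups while Grover needs only 785 oracle calls; the paper's point is that this advantage survives only if the data-access step itself (the QDAM) is cheap, i.e., has O(log N) T-depth.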
Blind Biological Sequence Denoising with Self-Supervised Set Learning
Biological sequence analysis relies on the ability to denoise the imprecise
output of sequencing platforms. We consider a common setting where a short
sequence is read out repeatedly using a high-throughput long-read platform to
generate multiple subreads, or noisy observations of the same sequence.
Denoising these subreads with alignment-based approaches often fails when too
few subreads are available or error rates are too high. In this paper, we
propose a novel method for blindly denoising sets of sequences without directly
observing clean source sequence labels. Our method, Self-Supervised Set
Learning (SSSL), gathers subreads together in an embedding space and estimates
a single set embedding as the midpoint of the subreads in both the latent and
sequence spaces. This set embedding represents the "average" of the subreads
and can be decoded into a prediction of the clean sequence. In experiments on
simulated long-read DNA data, SSSL methods denoise small reads (those with
fewer subreads) with 17% fewer errors and large reads (those with more
subreads) with 8% fewer errors compared to the best baseline.
SSSL improves over baselines on two self-supervised metrics, with a significant
improvement on difficult small reads that comprise over 60% of the test set. By
accurately denoising these reads, SSSL promises to better realize the potential
of high-throughput DNA sequencing data for downstream scientific applications.
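A toy analogue of the midpoint idea above, assuming substitution-only noise and equal-length subreads: where SSSL learns the embedding space, here the "embedding" is just a one-hot profile, so decoding the midpoint reduces to a per-position consensus. The names and alphabet are our illustrative choices, not the SSSL implementation.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Embed a DNA string as an (L, 4) one-hot profile."""
    return np.eye(len(ALPHABET))[[ALPHABET.index(c) for c in seq]]

def denoise_by_midpoint(subreads):
    """Average the subread embeddings (here: one-hot profiles) to get a set
    'midpoint', then decode it back to a sequence by per-position argmax."""
    midpoint = np.mean([one_hot(s) for s in subreads], axis=0)
    return "".join(ALPHABET[i] for i in midpoint.argmax(axis=1))
```

With a learned encoder and decoder, as in SSSL, the same average-then-decode scheme can also absorb insertions and deletions, which this fixed-length toy cannot.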
- …