77 research outputs found

    Sensitivity of robust optimization problems under drift and volatility uncertainty

    We examine optimization problems in which an investor has the opportunity to trade in $d$ stocks with the goal of maximizing her worst-case expected cumulative gains and losses. Here, worst-case refers to taking into account all possible drift and volatility processes for the stocks that fall within an $\varepsilon$-neighborhood of predefined fixed baseline processes. Although solving the worst-case problem for a fixed $\varepsilon>0$ is known to be very challenging in general, we show that it can be approximated as $\varepsilon\to 0$ by the baseline problem (computed using the baseline processes) in the following sense: Firstly, the value of the worst-case problem is equal to the value of the baseline problem plus $\varepsilon$ times a correction term. This correction term can be computed explicitly and quantifies how sensitive a given optimization problem is to model uncertainty. Moreover, approximately optimal trading strategies for the worst-case problem can be obtained using optimal strategies from the corresponding baseline problem.
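
    In symbols (our notation, not necessarily the paper's): writing $v(\varepsilon)$ for the value of the worst-case problem over the $\varepsilon$-neighborhood and $v(0)$ for the baseline value, the first-order expansion described above reads

```latex
v(\varepsilon) \;=\; v(0) \;+\; \varepsilon\,\Delta \;+\; o(\varepsilon)
\qquad \text{as } \varepsilon \downarrow 0,
```

    where $\Delta$ is the explicitly computable correction term quantifying the problem's sensitivity to model uncertainty.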

    Feature-aligned N-BEATS with Sinkhorn divergence

    In this study, we propose Feature-aligned N-BEATS as a domain generalization model for univariate time series forecasting. The proposed model extends the doubly residual stacking architecture of N-BEATS (Oreshkin et al. [34]) into a representation learning framework. It considers the marginal feature probability measures (i.e., pushforward measures of multiple source domains) induced by the intricate composition of residual operators in each N-BEATS stack, and aligns them stack-wise via an entropic regularized Wasserstein distance referred to as the Sinkhorn divergence (Genevay et al. [14]). The loss function combines a typical forecasting loss over multiple source domains with an alignment loss calculated via the Sinkhorn divergence, which allows the model to learn invariant features stack-wise across multiple source data sequences while retaining N-BEATS's interpretable design. We conduct a comprehensive experimental evaluation of the proposed approach, and the results demonstrate the model's forecasting and generalization capabilities in comparison with methods based on the original N-BEATS.
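
    As a concrete reading of the loss described above, here is a minimal sketch (our code, not the authors'; the `model` interface returning per-stack features and the weight `lam` are assumptions), using the Sinkhorn divergence implementation from the GeomLoss library:

```python
# Hedged sketch: per-domain forecasting error plus a stack-wise Sinkhorn
# alignment term across source domains.
import torch
from geomloss import SamplesLoss

sinkhorn = SamplesLoss(loss="sinkhorn", p=2, blur=0.05)  # entropic-regularized OT

def feature_aligned_loss(model, batches, lam=0.1):
    """batches: dict mapping each source domain to an (x, y) pair.
    `model(x)` is assumed to return (forecast, features), where `features`
    is a list with one (batch, dim) tensor per N-BEATS stack."""
    forecast_loss, align_loss = 0.0, 0.0
    domain_feats = []
    for x, y in batches.values():
        y_hat, feats = model(x)                      # assumed interface
        forecast_loss = forecast_loss + torch.mean((y_hat - y) ** 2)
        domain_feats.append(feats)
    # Align the same stack's feature measures across every pair of domains.
    for s in range(len(domain_feats[0])):
        for i in range(len(domain_feats)):
            for j in range(i + 1, len(domain_feats)):
                align_loss = align_loss + sinkhorn(domain_feats[i][s],
                                                   domain_feats[j][s])
    return forecast_loss + lam * align_loss
```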

    BOtied: Multi-objective Bayesian optimization with tied multivariate ranks

    Many scientific and industrial applications require joint optimization of multiple, potentially competing objectives. Multi-objective Bayesian optimization (MOBO) is a sample-efficient framework for identifying Pareto-optimal solutions. We show a natural connection between non-dominated solutions and the highest multivariate rank, which coincides with the outermost level line of the joint cumulative distribution function (CDF). We propose the CDF indicator, a Pareto-compliant metric for evaluating the quality of approximate Pareto sets that complements the popular hypervolume indicator. At the heart of MOBO is the acquisition function, which determines the next candidate to evaluate by navigating the best compromises among the objectives. Multi-objective acquisition functions that rely on box decomposition of the objective space, such as the expected hypervolume improvement (EHVI) and entropy search, scale poorly to large numbers of objectives. We propose an acquisition function, called BOtied, based on the CDF indicator. BOtied can be implemented efficiently with copulas, a statistical tool for modeling complex, high-dimensional distributions. We benchmark BOtied against common acquisition functions, including EHVI and random scalarization (ParEGO), in a series of synthetic and real-data experiments. BOtied performs on par with the baselines across datasets and metrics while being computationally efficient.
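
    A minimal sketch of the multivariate-rank idea (our code and notation; BOtied's actual acquisition additionally involves copula modeling and the surrogate posterior, which this omits). Under a minimization convention, points on the Pareto front tend to sit on the outermost, i.e. lowest, level set of the empirical joint CDF:

```python
# Hedged sketch: score candidates by their empirical joint-CDF level.
import numpy as np

def empirical_cdf_scores(Y):
    """Y: (n, m) array of objective values (smaller is better). For each
    point y_i, estimate F(y_i) = P[Y <= y_i componentwise] from the sample."""
    n = len(Y)
    counts = np.array([np.sum(np.all(Y <= y, axis=1)) for y in Y])
    return counts / n

Y = np.random.rand(100, 3)   # 100 candidates, 3 objectives
scores = empirical_cdf_scores(Y)
best = np.argmin(scores)     # lowest CDF level ~ outermost (Pareto) shell
```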

    Probabilistic prediction of cyanobacteria abundance in a Korean reservoir using a Bayesian Poisson model

    There have been increasing reports of harmful algal blooms (HABs) worldwide. However, the factors that influence cyanobacteria dominance and HAB formation can be site-specific and idiosyncratic, making prediction challenging. The drivers of cyanobacteria blooms in Lake Paldang, South Korea, the summer climate of which is strongly affected by the East Asian monsoon, may differ from those in well-studied North American lakes. Using observational data sampled during the growing seasons of 2007–2011, a Bayesian hurdle Poisson model was developed to predict cyanobacteria abundance in the lake. The model allowed cyanobacteria absence (zero counts) and nonzero cyanobacteria counts to be modeled as functions of different environmental factors. The model predictions demonstrated that the principal factor determining the success of cyanobacteria was temperature. Combined with high temperature, increased residence time, indicated by low outflow rates, appeared to increase the probability of cyanobacteria occurrence. A stable water column, represented by low suspended solids, together with high temperature, was a requirement for high cyanobacteria abundance. Our model results have management implications: the model can be used to probabilistically forecast cyanobacteria watch or alert levels and to develop mitigation strategies for cyanobacteria blooms. Key points:
    • A Bayesian hurdle Poisson model predicted cyanobacteria abundance
    • Temperature, flushing rate, and water column stability were key factors
    • The model forecasted cyanobacteria watch and alert levels probabilistically
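
    For concreteness, a hedged sketch of a hurdle-Poisson likelihood of the kind described above (our notation and covariate names; the actual model was fit in a Bayesian framework with priors over the coefficients, which this sketch omits):

```python
# Hedged sketch: Bernoulli "occurrence" part plus zero-truncated Poisson
# "abundance" part, each driven by its own covariates.
import numpy as np
from scipy.special import gammaln, expit

def hurdle_poisson_loglik(y, X_occ, X_abund, beta_occ, beta_abund):
    """y: cyanobacteria counts; X_occ/X_abund: covariate matrices
    (e.g. temperature, outflow rate, suspended solids); beta_*: coefficients."""
    p = expit(X_occ @ beta_occ)           # P(y > 0), logistic link
    lam = np.exp(X_abund @ beta_abund)    # Poisson mean, log link
    zero = y == 0
    ll = np.sum(np.log1p(-p[zero]))       # log(1 - p) for absent cyanobacteria
    yp, lp, pp = y[~zero], lam[~zero], p[~zero]
    # zero-truncated Poisson: Pois(y | lam) / (1 - exp(-lam))
    ll += np.sum(np.log(pp) + yp * np.log(lp) - lp
                 - gammaln(yp + 1) - np.log1p(-np.exp(-lp)))
    return ll
```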

    $T$-depth-optimized Quantum Search with Quantum Data-access Machine

    Quantum search algorithms offer a remarkable advantage: a quadratic reduction in query complexity using the quantum superposition principle. However, how an actual architecture may access and handle the database in a quantum superposed state has been largely unexplored so far; the quantum state of data was simply assumed to be prepared and accessed by a black-box operation, the so-called quantum oracle, even though this process, if not appropriately designed, may adversely diminish the quantum query advantage. Here, we introduce an efficient quantum data-access process, dubbed the quantum data-access machine (QDAM), and present a general architecture for quantum search algorithms. We analyze the runtime of our algorithm in view of fault-tolerant quantum computation (FTQC), consisting of logical qubits within an effective quantum error correction code. Specifically, we introduce a measure involving two computational complexities, namely quantum query and $T$-depth complexities, which is critical for assessing performance since logical non-Clifford gates, such as the $T$ (i.e., $\pi/8$ rotation) gate, are known to be the costliest to implement in FTQC. Our analysis shows that for $N$ searching data, a QDAM model exhibiting logarithmic, i.e., $O(\log N)$, growth of the $T$-depth complexity can be constructed. Further analysis reveals that our QDAM-embedded quantum search requires $O(\sqrt{N} \times \log N)$ runtime cost. Our study thus demonstrates that the quantum data search algorithm can truly speed up over classical approaches with the logarithmic $T$-depth QDAM as a key component.
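
    A rough back-of-the-envelope reading of the runtime claim (our toy cost model, not the paper's circuit analysis): multiplying the ~$\sqrt{N}$ Grover-style query count by the QDAM's $O(\log N)$ per-query $T$-depth recovers the $O(\sqrt{N}\times\log N)$ scaling:

```python
# Hedged sketch: total T-depth = (number of oracle queries) x (T-depth per
# QDAM query). The constant `c` is an unknown implementation-dependent factor.
import math

def qdam_search_t_depth(N, c=1.0):
    grover_iters = math.pi / 4 * math.sqrt(N)   # quadratic query advantage
    t_depth_per_query = c * math.log2(N)        # logarithmic QDAM T-depth
    return grover_iters * t_depth_per_query     # O(sqrt(N) * log N) overall

for N in (2**10, 2**20, 2**30):
    print(N, round(qdam_search_t_depth(N)))
```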

    Blind Biological Sequence Denoising with Self-Supervised Set Learning

    Biological sequence analysis relies on the ability to denoise the imprecise output of sequencing platforms. We consider a common setting where a short sequence is read out repeatedly using a high-throughput long-read platform to generate multiple subreads, or noisy observations of the same sequence. Denoising these subreads with alignment-based approaches often fails when too few subreads are available or error rates are too high. In this paper, we propose a novel method for blindly denoising sets of sequences without directly observing clean source sequence labels. Our method, Self-Supervised Set Learning (SSSL), gathers subreads together in an embedding space and estimates a single set embedding as the midpoint of the subreads in both the latent and sequence spaces. This set embedding represents the "average" of the subreads and can be decoded into a prediction of the clean sequence. In experiments on simulated long-read DNA data, SSSL denoises small reads of $\leq 6$ subreads with 17% fewer errors and large reads of $>6$ subreads with 8% fewer errors compared to the best baseline. On a real dataset of antibody sequences, SSSL improves over baselines on two self-supervised metrics, with a significant improvement on difficult small reads that comprise over 60% of the test set. By accurately denoising these reads, SSSL promises to better realize the potential of high-throughput DNA sequencing data for downstream scientific applications.
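
    A minimal sketch of the set-embedding step as described (our code; the `encoder`/`decoder` interfaces are assumptions, and this simplification takes the midpoint in latent space only, whereas the paper also uses the sequence space):

```python
# Hedged sketch: embed each subread, average to a set embedding, decode.
import torch

def denoise_subreads(encoder, decoder, subreads):
    """subreads: list of token tensors, each a noisy read of one sequence."""
    z = torch.stack([encoder(s) for s in subreads])  # per-subread embeddings
    z_set = z.mean(dim=0)     # midpoint in latent space ~ the "average" subread
    return decoder(z_set)     # decode to a prediction of the clean sequence
```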
    • …