Centrum Wiskunde & Informatica

CWI's Institutional Repository
    26894 research outputs found

    Learning to judge: LLMs designing and applying evaluation rubrics

    Large language models (LLMs) are increasingly used as evaluators for natural language generation, applying human-defined rubrics to assess system outputs. However, human rubrics are often static and misaligned with how models internally represent language quality. We introduce GER-Eval (Generating Evaluation Rubrics for Evaluation) to investigate whether LLMs can design and apply their own evaluation rubrics. We evaluate the semantic coherence and scoring reliability of LLM-defined criteria and their alignment with human criteria. LLMs reliably generate interpretable and task-aware evaluation dimensions and apply them within models, but their scoring reliability degrades in factual and knowledge-intensive settings. Closed-source models such as GPT4o achieve higher agreement and cross-model generalization than open-weight models such as Llama. Our findings position evaluation as a learned linguistic capability of LLMs, consistent within models but fragmented across them, and call for new methods that jointly model human and LLM evaluative language to improve reliability and interpretability

    Multivariate sensitivity analysis of electric machine efficiency maps and profiles under design uncertainty

    This work introduces the use of multivariate global sensitivity analysis for assessing the impact of uncertain electric machine design parameters on efficiency maps and profiles. Contrary to the common approach of applying variance-based (Sobol') sensitivity analysis elementwise, multivariate sensitivity analysis provides a single sensitivity index per parameter, thus allowing for a holistic estimation of parameter importance over the full efficiency map or profile. Its benefits are demonstrated on permanent magnet synchronous machine models of different fidelity. Computations based on Monte Carlo sampling and polynomial chaos expansions are compared in terms of computational cost. The sensitivity analysis results are subsequently used to simplify the models, by fixing non-influential parameters to their nominal values and allowing random variations only for influential parameters. Uncertainty estimates obtained with the full and reduced models confirm the validity of model simplification guided by multivariate sensitivity analysis
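
    The contrast between elementwise and multivariate indices can be illustrated with a minimal Monte Carlo sketch. Everything in it is an assumption made for illustration: `toy_profile` is a hypothetical linear stand-in for an efficiency profile (not a machine model from the paper), the inputs are independent and uniform on [0, 1], first-order effects are estimated with a standard pick-freeze estimator, and the aggregated index is taken to be the variance-weighted average of the elementwise Sobol' indices, which is one common definition and not necessarily the one used by the authors. The polynomial chaos route mentioned in the abstract is not shown.

```python
import random


def toy_profile(x):
    """Hypothetical stand-in for an efficiency profile at three operating points."""
    x1, x2, x3 = x
    return [
        4.0 * x1 + 1.0 * x2 + 0.1 * x3,
        3.0 * x1 + 2.0 * x2 + 0.1 * x3,
        2.0 * x1 + 3.0 * x2 + 0.1 * x3,
    ]


def sensitivity_indices(model, dim, n_samples, seed=0):
    """Return (elementwise, aggregated) first-order indices.

    elementwise[i][j]: Sobol' index of parameter i for output element j.
    aggregated[i]: one index per parameter, i.e. the partial variances summed
    over all output elements divided by the summed total variances.
    """
    rng = random.Random(seed)
    # Two independent sample matrices with uniform inputs (assumed distribution).
    A = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    f_a = [model(a) for a in A]
    f_b = [model(b) for b in B]
    n_out = len(f_a[0])

    means = [sum(y[j] for y in f_a) / n_samples for j in range(n_out)]
    total_var = [
        sum((y[j] - means[j]) ** 2 for y in f_a) / (n_samples - 1)
        for j in range(n_out)
    ]

    elementwise, aggregated = [], []
    for i in range(dim):
        # A with column i taken from B ("pick-freeze" sample).
        f_ab = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Centered estimator of the partial variance Var(E[Y_j | X_i]).
        partial_var = [
            sum((f_b[n][j] - means[j]) * (f_ab[n][j] - f_a[n][j])
                for n in range(n_samples)) / n_samples
            for j in range(n_out)
        ]
        elementwise.append([pv / tv for pv, tv in zip(partial_var, total_var)])
        aggregated.append(sum(partial_var) / sum(total_var))
    return elementwise, aggregated


if __name__ == "__main__":
    per_element, per_parameter = sensitivity_indices(toy_profile, dim=3, n_samples=20000)
    for i, (row, agg) in enumerate(zip(per_element, per_parameter), start=1):
        print(f"x{i}: elementwise={[round(s, 3) for s in row]} aggregated={agg:.3f}")
```

    In this toy setting the third parameter receives an aggregated index close to zero, so it would be a candidate for fixing at its nominal value, in the spirit of the model simplification described above.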

    An overview of convergence rates for sum of squares hierarchies in polynomial optimization

    In this chapter we consider polynomial optimization problems, asking to minimize a polynomial function over a compact semialgebraic set, defined by polynomial inequalities. This models a great variety of (in general, nonlinear nonconvex) optimization problems. Various hierarchies of (lower and upper) bounds have been introduced, having the remarkable property that they converge asymptotically to the global minimum. These bounds exploit algebraic representations of positive polynomials in terms of sums of squares and can be computed using semidefinite optimization. Our focus in this chapter lies in the performance analysis of these hierarchies of bounds, namely, in how far the bounds are from the global minimum as the degrees of the sums of squares they involve tend to infinity. We present the main state-of-the-art results and offer a gentle introductory overview over the various techniques that have been recently developed to establish them, stemming from the theory of orthogonal polynomials, approximation theory, Fourier analysis, and more

    Resource allocation based on past incident patterns

    We formulate and solve two resource allocation problems motivated by a preparedness question of emergency response services. First, we consider the assignment of vehicles to stations, and, in a second step, assign crews to vehicles. In both cases, we work in a minimax framework and define the objective function for a spatial catchment area as the total risk in this area per resource unit allocated to it. The solutions are explicit and can be calculated in practice by a greedy algorithm that successively allocates a resource unit to an area having maximal relative risk, with suitable tie breaker rules. The approach is illustrated on a data set of incidents reported to the Twente Fire Brigade
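
    The greedy rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the risk values are made up, the relative risk of an area is taken to be its total risk divided by the number of units already allocated to it (infinite while an area with positive risk has no unit), and ties are broken by lowest area index, which is a hypothetical stand-in for the tie-breaker rules of the paper. It covers a single allocation step, not the two-stage vehicles-then-crews setting.

```python
import heapq
import math


def greedy_allocate(risks, units):
    """Allocate `units` indivisible resource units over areas.

    Each step gives one unit to the area whose relative risk
    (risk divided by units allocated so far) is currently largest.
    Ties go to the lowest area index (an assumed tie-breaker).
    """
    if not risks:
        return []
    alloc = [0] * len(risks)
    # Max-heap via negated keys; an area without units has infinite
    # relative risk if its risk is positive, and zero otherwise.
    heap = [(-(math.inf if r > 0 else 0.0), i) for i, r in enumerate(risks)]
    heapq.heapify(heap)
    for _ in range(units):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        heapq.heappush(heap, (-risks[i] / alloc[i], i))
    return alloc


def max_relative_risk(risks, alloc):
    """Minimax objective: the worst risk-per-unit over all areas."""
    worst = 0.0
    for r, a in zip(risks, alloc):
        if a == 0:
            worst = max(worst, math.inf if r > 0 else 0.0)
        else:
            worst = max(worst, r / a)
    return worst


if __name__ == "__main__":
    risks = [6.0, 3.0, 1.0]  # made-up total risk per catchment area
    alloc = greedy_allocate(risks, units=6)
    print(alloc, max_relative_risk(risks, alloc))
```

    A heap keeps each allocation step logarithmic in the number of areas, which matters little for a handful of stations but keeps the sketch usable for finer spatial partitions.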

    Testing and learning structured quantum Hamiltonians

    We consider the problems of testing and learning an $n$-qubit Hamiltonian $H = \sum_x \lambda_x \sigma_x$ expressed in its Pauli basis, from queries to its evolution operator $U = e^{-iHt}$. To this end, we prove the following results. 1. Testing: We give a tolerant testing protocol to decide if a Hamiltonian is $\epsilon_1$-close to $k$-local or $\epsilon_2$-far from $k$-local in the $\ell_2$ norm of the coefficients, with $O(1/(\epsilon_2 - \epsilon_1)^4)$ queries, thereby solving two open questions posed in a recent work by Bluhm, Caro and Oufkir (Bluhm, A., Caro, M.C., Oufkir, A.). We give a protocol for testing whether a Hamiltonian is $\epsilon_1$-close to being $s$-sparse or $\epsilon_2$-far from being $s$-sparse in the $\ell_2$ norm of the coefficients, with $O(s^{6}/(\epsilon_2^2 - \epsilon_1^2)^6)$ queries. 2. Learning: We give a protocol to $\epsilon$-learn unstructured Hamiltonian in the $\ell_\infty$ norm of the coefficients with $O(1/\epsilon^4)$ queries. Combining this with the non-commutative Bohnenblust-Hille inequality, we obtain an algorithm for learning $k$-local Hamiltonians in $\ell_2$ norm of the coefficients that only uses $O(\exp(k^2 + k \log(1/\epsilon)))$ queries. For Hamiltonians that are $s$-sparse in the Pauli basis, we can learn them in the $\ell_2$ norm with $O(s^2/\epsilon^4)$ queries. 3. Learning without quantum memory: The learning results stated above have no dependence on the system size $n$, but require $n$-qubit quantum memory. We give subroutines that allow us to reproduce all the above learning results without quantum memory; squaring the query complexity and paying a $(\log n)$-factor in the local case and an $n$-factor in the sparse case. 4. Testing without quantum memory: We give a new subroutine called Pauli hashing, which allows one to tolerantly test $s$-sparse Hamiltonians in the $\ell_2$ norm using $\tilde{O}(s^{14}/(\epsilon_2^2 - \epsilon_1^2)^{18})$ query complexity. A key ingredient is showing that $s$-sparse Pauli channels can be tested in a tolerant fashion as being $\epsilon_1$-close to being $s$-sparse or $\epsilon_2$-far under the diamond norm, using $\tilde{O}(s^{2}/(\epsilon_2 - \epsilon_1)^6)$ queries via Pauli hashing. In order to prove these results, we prove new structural theorems for local Hamiltonians, sparse Pauli channels and sparse Hamiltonians. We complement our learning algorithms with lower bounds that are polynomially weaker. Furthermore, our algorithms use short time evolutions and do not assume prior knowledge of the terms on which the Pauli spectrum is supported on, i.e., we do not require prior knowledge of the support of the Hamiltonian

    Generative super-resolution of turbulent flows via stochastic interpolants

    Capturing the intricate multiscale features of turbulent flows remains a fundamental challenge due to the limited resolution of experimental data and the computational cost of high-fidelity simulations. In many practical scenarios only coarse representations of the flows are feasible, leaving crucial fine-scale dynamics unresolved. This study addresses that limitation by leveraging generative models to perform super-resolution of velocity fields and reconstruct the unresolved scales from low-resolution conditionals. In particular, the recently formalized stochastic interpolants are employed to super-resolve a case study of two-dimensional turbulence. Key to our approach is the iterative application of stochastic interpolants over local patches of the flow field, that enables efficient reconstruction without the need to process the full domain simultaneously. The patch-wise strategy is shown to yield physically consistent super-resolved flow snapshots, and key statistical quantities – such as the kinetic energy spectrum – are accurately recovered. Moreover, the patch-wise approach is observed to produce super-resolutions of a quality comparable to those produced using a full field approach, and, in general, stochastic interpolants are observed to outperform contesting generative models across a range of metrics. Although only demonstrated for a 2D case study, these results highlight the potential of using stochastic interpolants to super-resolve turbulent flows

    Online matching on 3-uniform hypergraphs

    The online matching problem was introduced by Karp, Vazirani and Vazirani (STOC 1990) on bipartite graphs with vertex arrivals. It is well-known that the optimal competitive ratio is 1-1/e for both integral and fractional versions of the problem. Since then, there has been considerable effort to find optimal competitive ratios for other related settings. In this work, we go beyond the graph case and study the online matching problem on k-uniform hypergraphs. For k=3, we provide an optimal primal-dual fractional algorithm, which achieves a competitive ratio of (e-1)/(e+1)≈0.4621. As our main technical contribution, we present a carefully constructed adversarial instance, which shows that this ratio is in fact optimal. It combines ideas from known hard instances for bipartite graphs under the edge-arrival and vertex-arrival models. For k≥3, we give a simple integral algorithm which performs better than greedy when the online nodes have bounded degree. As a corollary, it achieves the optimal competitive ratio of 1/2 on 3-uniform hypergraphs when every online node has degree at most 2. This is because the special case where every online node has degree 1 is equivalent to the edge-arrival model on graphs, for which an upper bound of 1/2 is known
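
    For orientation, the greedy baseline that the abstract compares against can be sketched as follows, together with a numerical check of the two competitive ratios quoted above. The input encoding is an assumption made here: each online node arrives with a list of hyperedges, and each hyperedge is given by the offline vertices it contains. This is not the primal-dual fractional algorithm or the integral algorithm of the paper, only the simple greedy rule.

```python
import math

# Ratios quoted in the abstract, evaluated numerically.
BIPARTITE_RATIO = 1 - 1 / math.e                     # about 0.6321
HYPERGRAPH3_RATIO = (math.e - 1) / (math.e + 1)      # about 0.4621


def greedy_online_matching(arrivals):
    """Greedy online hypergraph matching (baseline sketch).

    arrivals[t] lists the hyperedges of online node t; each hyperedge is an
    iterable of the offline vertices it contains (assumed encoding). The
    node is matched irrevocably along the first hyperedge whose offline
    vertices are all still free, if there is one.
    """
    used = set()
    matching = []
    for online_node, edges in enumerate(arrivals):
        for edge in edges:
            if used.isdisjoint(edge):
                used.update(edge)
                matching.append((online_node, tuple(edge)))
                break
    return matching


if __name__ == "__main__":
    # Three online nodes on a 3-uniform hypergraph: every hyperedge consists
    # of the arriving node plus two offline vertices.
    arrivals = [
        [("a", "b")],
        [("a", "c"), ("c", "d")],
        [("b", "d")],
    ]
    print(greedy_online_matching(arrivals))
    print(round(BIPARTITE_RATIO, 4), round(HYPERGRAPH3_RATIO, 4))
```

    In the example the third online node stays unmatched because both of its offline vertices were taken by earlier irrevocable decisions, which is the kind of loss a competitive-ratio analysis quantifies.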

    Towards robust quantitative photoacoustic tomography via learned iterative methods

    Photoacoustic tomography (PAT) is a medical imaging modality that can provide high-resolution tissue images based on optical absorption. Classical reconstruction methods for quantifying the absorption coefficients rely on sufficient prior information to overcome noisy and imperfect measurements. As these methods utilize computationally expensive forward models, the computation becomes slow, limiting their potential for time-critical applications. As an alternative approach, deep learning-based reconstruction methods have been established for faster and more accurate reconstructions. However, most of these methods rely on having a large amount of training data, which is not the case in practice. In this work, we adopt the model-based learned iterative approach for use in quantitative PAT (QPAT), in which additional information from the model is iteratively provided to the updating networks, allowing better generalizability with scarce training data. We compare the performance of different learned updates based on gradient descent, Gauss-Newton, and quasi-Newton methods. The learning tasks are formulated as greedy, requiring iterate-wise optimality, as well as end-to-end, where all networks are trained jointly. The implemented methods are tested with ideal simulated data as well as against a digital twin dataset that emulates scarce training data and high modeling error

    Approximating the volume of a truncated relaxation of the independence polytope

    Answering a question of Gamarnik and Smedira [15], we give a polynomial time algorithm that approximately computes the volume of a truncation of a relaxation of the independent set polytope, improving on their quasi-polynomial time algorithm. Our algorithm is obtained by viewing the volume as an evaluation of a graph polynomial and we approximate this evaluation using Barvinok’s interpolation method

    13,776 full texts

    26,917 metadata records
    Updated in last 30 days.
    CWI's Institutional Repository is based in the Netherlands