Learning to judge: LLMs designing and applying evaluation rubrics
Large language models (LLMs) are increasingly used as evaluators for natural language generation, applying human-defined rubrics to assess system outputs. However, human rubrics are often static and misaligned with how models internally represent language quality. We introduce GER-Eval (Generating Evaluation Rubrics for Evaluation) to investigate whether LLMs can design and apply their own evaluation rubrics. We evaluate the semantic coherence and scoring reliability of LLM-defined criteria and their alignment with human criteria. LLMs reliably generate interpretable and task-aware evaluation dimensions and apply them within models, but their scoring reliability degrades in factual and knowledge-intensive settings. Closed-source models such as GPT-4o achieve higher agreement and cross-model generalization than open-weight models such as Llama. Our findings position evaluation as a learned linguistic capability of LLMs, consistent within models but fragmented across them, and call for new methods that jointly model human and LLM evaluative language to improve reliability and interpretability.
Multivariate sensitivity analysis of electric machine efficiency maps and profiles under design uncertainty
This work introduces the use of multivariate global sensitivity analysis for assessing the impact of uncertain electric machine design parameters on efficiency maps and profiles. Contrary to the common approach of applying variance-based (Sobol') sensitivity analysis elementwise, multivariate sensitivity analysis provides a single sensitivity index per parameter, thus allowing for a holistic estimation of parameter importance over the full efficiency map or profile. Its benefits are demonstrated on permanent magnet synchronous machine models of different fidelity. Computations based on Monte Carlo sampling and polynomial chaos expansions are compared in terms of computational cost. The sensitivity analysis results are subsequently used to simplify the models, by fixing non-influential parameters to their nominal values and allowing random variations only for influential parameters. Uncertainty estimates obtained with the full and reduced models confirm the validity of model simplification guided by multivariate sensitivity analysis.
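The contrast between elementwise and multivariate indices can be illustrated with a minimal sketch. It assumes one common aggregation, in which the first-order partial variances are summed over all output components and divided by the summed total variance; the abstract does not state which definition the authors use. The function names, the two-output toy model, and the pick-freeze Monte Carlo estimator are all hypothetical choices made for illustration, not the electric machine models of the work.

```python
import random


def toy_model(x):
    """Hypothetical two-output model standing in for an efficiency map."""
    x1, x2 = x
    return [x1, x1 + 0.1 * x2]


def multivariate_first_order(model, dim, n_samples, seed=0):
    """Estimate one aggregated first-order index per input parameter.

    Assumed definition: sum of partial variances over all outputs divided
    by the sum of total variances. Inputs are sampled uniformly on [0, 1].
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    fA = [model(a) for a in A]
    fB = [model(b) for b in B]
    n_out = len(fA[0])

    # Total variance, summed over output components (pooled A and B samples).
    total = 0.0
    for j in range(n_out):
        ys = [y[j] for y in fA + fB]
        mean = sum(ys) / len(ys)
        total += sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)

    indices = []
    for i in range(dim):
        # Pick-freeze: take A, but replace column i with the values from B.
        fABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        partial = 0.0
        for j in range(n_out):
            partial += sum(
                yb[j] * (yab[j] - ya[j]) for ya, yb, yab in zip(fA, fB, fABi)
            ) / n_samples
        indices.append(partial / total)
    return indices


indices = multivariate_first_order(toy_model, dim=2, n_samples=20000)
```

For this toy model the first parameter dominates both outputs, so its aggregated index should come out close to 1 and the second close to 0, which is the kind of ranking that would justify fixing the second parameter at its nominal value.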
An overview of convergence rates for sum of squares hierarchies in polynomial optimization
In this chapter we consider polynomial optimization problems, asking to minimize a polynomial function over a compact semialgebraic set, defined by polynomial inequalities. This models a great variety of (in general, nonlinear nonconvex) optimization problems. Various hierarchies of (lower and upper) bounds have been introduced, having the remarkable property that they converge asymptotically to the global minimum. These bounds exploit algebraic representations of positive polynomials in terms of sums of squares and can be computed using semidefinite optimization. Our focus in this chapter lies in the performance analysis of these hierarchies of bounds, namely, how far the bounds are from the global minimum as the degrees of the sums of squares they involve tend to infinity. We present the main state-of-the-art results and offer a gentle introductory overview of the various techniques that have been recently developed to establish them, stemming from the theory of orthogonal polynomials, approximation theory, Fourier analysis, and more.
Resource allocation based on past incident patterns
We formulate and solve two resource allocation problems motivated by a preparedness question of emergency response services. First, we consider the assignment of vehicles to stations, and, in a second step, assign crews to vehicles. In both cases, we work in a minimax framework and define the objective function for a spatial catchment area as the total risk in this area per resource unit allocated to it. The solutions are explicit and can be calculated in practice by a greedy algorithm that successively allocates a resource unit to an area with maximal relative risk, with suitable tie-breaking rules. The approach is illustrated on a data set of incidents reported to the Twente Fire Brigade.
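The greedy rule described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: every area starts with one unit so that the risk per unit is well defined, and ties go to the lowest area index. The abstract does not specify either choice, and the function name and example risks are hypothetical.

```python
def greedy_allocation(risks, units):
    """Greedily allocate identical resource units to areas.

    risks: total risk per area (hypothetical numbers).
    units: total number of resource units to distribute.
    Assumption: each area receives one unit up front.
    """
    if units < len(risks):
        raise ValueError("this sketch assumes at least one unit per area")
    counts = [1] * len(risks)
    for _ in range(units - len(risks)):
        # Relative risk = risk per allocated unit; assumed tie-breaker:
        # the lowest area index wins.
        worst = max(
            range(len(risks)),
            key=lambda i: (risks[i] / counts[i], -i),
        )
        counts[worst] += 1
    return counts


allocation = greedy_allocation([6.0, 3.0, 1.0], units=5)
```

With risks 6, 3 and 1 and five units, the sketch returns the allocation [3, 1, 1], whose largest risk per unit is 3. The alternative [2, 2, 1] attains the same maximum, which shows why a tie-breaking rule is needed to make the solution explicit.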
Testing and learning structured quantum Hamiltonians
We consider the problems of testing and learning an -qubit Hamiltonian expressed in its Pauli basis, from queries to its evolution operator . To this end, we prove the following results.
1. Testing: We give a tolerant testing protocol to decide if a Hamiltonian is -close to -local or -far from -local in the norm of the coefficients, with queries, thereby solving two open questions posed in a recent work by Bluhm, Caro and Oufkir (Bluhm, A., Caro, M.C., Oufkir, A.). We give a protocol for testing whether a Hamiltonian is -close to being -sparse or -far from being -sparse in the norm of the coefficients, with queries.
2. Learning: We give a protocol to -learn an unstructured Hamiltonian in the norm of the coefficients with queries. Combining this with the non-commutative Bohnenblust-Hille inequality, we obtain an algorithm for learning -local Hamiltonians in the norm of the coefficients that only uses queries. For Hamiltonians that are -sparse in the Pauli basis, we can learn them in the norm with queries.
3. Learning without quantum memory: The learning results stated above have no dependence on the system size , but require -qubit quantum memory. We give subroutines that allow us to reproduce all the above learning results without quantum memory, at the cost of squaring the query complexity and paying a -factor in the local case and an -factor in the sparse case.
4. Testing without quantum memory: We give a new subroutine called Pauli hashing, which allows one to tolerantly test -sparse Hamiltonians in the norm using query complexity. A key ingredient is showing that -sparse Pauli channels can be tested in a tolerant fashion as being -close to being -sparse or -far under the diamond norm, using queries via Pauli hashing.
To prove these results, we prove new structural theorems for local Hamiltonians, sparse Pauli channels and sparse Hamiltonians. We complement our learning algorithms with lower bounds that are polynomially weaker. Furthermore, our algorithms use short time evolutions and do not assume prior knowledge of the terms on which the Pauli spectrum is supported, i.e., we do not require prior knowledge of the support of the Hamiltonian.
Generative super-resolution of turbulent flows via stochastic interpolants
Capturing the intricate multiscale features of turbulent flows remains a fundamental challenge due to the limited resolution of experimental data and the computational cost of high-fidelity simulations. In many practical scenarios only coarse representations of the flows are feasible, leaving crucial fine-scale dynamics unresolved. This study addresses that limitation by leveraging generative models to perform super-resolution of velocity fields and reconstruct the unresolved scales from low-resolution conditionals. In particular, the recently formalized stochastic interpolants are employed to super-resolve a case study of two-dimensional turbulence. Key to our approach is the iterative application of stochastic interpolants over local patches of the flow field, which enables efficient reconstruction without the need to process the full domain simultaneously. The patch-wise strategy is shown to yield physically consistent super-resolved flow snapshots, and key statistical quantities, such as the kinetic energy spectrum, are accurately recovered. Moreover, the patch-wise approach is observed to produce super-resolutions of a quality comparable to those produced using a full-field approach, and, in general, stochastic interpolants are observed to outperform competing generative models across a range of metrics. Although only demonstrated for a 2D case study, these results highlight the potential of using stochastic interpolants to super-resolve turbulent flows.
Online matching on 3-uniform hypergraphs
The online matching problem was introduced by Karp, Vazirani and Vazirani (STOC 1990) on bipartite graphs with vertex arrivals. It is well-known that the optimal competitive ratio is 1-1/e for both integral and fractional versions of the problem. Since then, there has been considerable effort to find optimal competitive ratios for other related settings. In this work, we go beyond the graph case and study the online matching problem on k-uniform hypergraphs. For k=3, we provide an optimal primal-dual fractional algorithm, which achieves a competitive ratio of (e-1)/(e+1)≈0.4621. As our main technical contribution, we present a carefully constructed adversarial instance, which shows that this ratio is in fact optimal. It combines ideas from known hard instances for bipartite graphs under the edge-arrival and vertex-arrival models. For k≥3, we give a simple integral algorithm which performs better than greedy when the online nodes have bounded degree. As a corollary, it achieves the optimal competitive ratio of 1/2 on 3-uniform hypergraphs when every online node has degree at most 2. This is because the special case where every online node has degree 1 is equivalent to the edge-arrival model on graphs, for which an upper bound of 1/2 is known.
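As a quick numerical check of the constants quoted in this abstract, the short sketch below evaluates both competitive ratios. The variable names are illustrative only, and the identity (e-1)/(e+1) = tanh(1/2) is a standard fact that the abstract itself does not mention.

```python
import math

# Classical bipartite vertex-arrival ratio quoted in the abstract.
bipartite_ratio = 1 - 1 / math.e

# Fractional ratio stated for 3-uniform hypergraphs.
hypergraph_ratio = (math.e - 1) / (math.e + 1)

# Standard identity: (e - 1) / (e + 1) equals tanh(1/2).
assert math.isclose(hypergraph_ratio, math.tanh(0.5))

print(round(bipartite_ratio, 4))   # 0.6321
print(round(hypergraph_ratio, 4))  # 0.4621
```

The hypergraph ratio lies below both the bipartite ratio 1-1/e and the bound of 1/2 mentioned for the edge-arrival model on graphs.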
Towards robust quantitative photoacoustic tomography via learned iterative methods
Photoacoustic tomography (PAT) is a medical imaging modality that can provide high-resolution tissue images based on optical absorption. Classical reconstruction methods for quantifying the absorption coefficients rely on sufficient prior information to overcome noisy and imperfect measurements. As these methods utilize computationally expensive forward models, the computation becomes slow, limiting their potential for time-critical applications. As an alternative approach, deep learning-based reconstruction methods have been established for faster and more accurate reconstructions. However, most of these methods rely on a large amount of training data, which is typically unavailable in practice. In this work, we adopt the model-based learned iterative approach for use in quantitative PAT (QPAT), in which additional information from the model is iteratively provided to the updating networks, allowing better generalizability with scarce training data. We compare the performance of different learned updates based on gradient descent, Gauss-Newton, and quasi-Newton methods. The learning tasks are formulated as greedy, requiring iterate-wise optimality, as well as end-to-end, where all networks are trained jointly. The implemented methods are tested with ideal simulated data as well as against a digital twin dataset that emulates scarce training data and high modeling error.
Approximating the volume of a truncated relaxation of the independence polytope
Answering a question of Gamarnik and Smedira [15], we give a polynomial time algorithm that approximately computes the volume of a truncation of a relaxation of the independent set polytope, improving on their quasi-polynomial time algorithm. Our algorithm is obtained by viewing the volume as an evaluation of a graph polynomial, and we approximate this evaluation using Barvinok’s interpolation method.