A Sample-Efficient Black-Box Optimizer to Train Policies for Human-in-the-Loop Systems With User Preferences
