49 research outputs found
GameEval: Evaluating LLMs on Conversational Games
The rapid advancements in large language models (LLMs) have presented
challenges in evaluating those models. Existing evaluation methods are either
reference-based or preference based, which inevitably need human intervention
or introduce test bias caused by evaluator models. In this paper, we propose
GameEval, a novel approach to evaluating LLMs through goal-driven
conversational games, overcoming the limitations of previous methods. GameEval
treats LLMs as game players and assigns them distinct roles with specific goals
achieved by launching conversations of various forms, including discussion,
question answering, and voting. We design three unique games with cooperative
or adversarial objectives, accompanied by corresponding evaluation metrics, to
show how this new paradigm comprehensively evaluates model performance. Through
extensive experiments, we show that GameEval can effectively differentiate the
capabilities of various LLMs, providing a comprehensive assessment of their
integrated abilities to solve complex problems. Our public anonymous code is
available at https://github.com/GameEval/GameEval
Learning to Program with Natural Language
Large Language Models (LLMs) have shown remarkable performance in various
basic natural language tasks, which raises hope for achieving Artificial
General Intelligence. To complete a complex task, we still need a program
for the task first and then ask LLMs to follow the program to generate the
specific solution. We propose using natural language as a new programming
language to describe task procedures, making them easily understandable to both
humans and LLMs. The LLM is capable of directly generating natural language
programs, but these programs may still contain factual errors or incomplete
steps. Therefore, we further propose the Learning to Program (LP) method
to ask LLMs themselves to learn the natural language program based on the
training dataset of the complex task first and then use the learned program to
guide the inference. Our experiments on the reasoning tasks of five different
reasoning types (8 datasets) demonstrate the effectiveness of our approach.
Further, our analysis experiment shows that the learned program can be directly
used to guide another LLM to improve its performance, which reveals a new
transfer learning paradigm. Comment: Work in progress
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
Controllable video generation has gained significant attention in recent
years. However, two main limitations persist: Firstly, most existing works
focus on either text, image, or trajectory-based control, leading to an
inability to achieve fine-grained control in videos. Secondly, trajectory
control research is still in its early stages, with most experiments being
conducted on simple datasets like Human3.6M. This constraint limits the models'
capability to process open-domain images and effectively handle complex curved
trajectories. In this paper, we propose DragNUWA, an open-domain
diffusion-based video generation model. To tackle the issue of insufficient
control granularity in existing works, we simultaneously introduce text, image,
and trajectory information to provide fine-grained control over video content
from semantic, spatial, and temporal perspectives. To resolve the problem of
limited open-domain trajectory control in current research, we propose
trajectory modeling with three aspects: a Trajectory Sampler (TS) to enable
open-domain control of arbitrary trajectories, a Multiscale Fusion (MF) to
control trajectories in different granularities, and an Adaptive Training (AT)
strategy to generate consistent videos following trajectories. Our experiments
validate the effectiveness of DragNUWA, demonstrating its superior performance
in fine-grained control in video generation. The homepage link is
https://www.microsoft.com/en-us/research/project/dragnuwa/
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
Artificial Intelligence (AI) has made incredible progress recently. On the
one hand, advanced foundation models like ChatGPT can offer powerful
conversation, in-context learning and code generation abilities on a broad
range of open-domain tasks. They can also generate high-level solution outlines
for domain-specific tasks based on the common sense knowledge they have
acquired. However, they still face difficulties with some specialized tasks
because they lack enough domain-specific data during pre-training or they often
have errors in their neural network computations on those tasks that need
accurate executions. On the other hand, there are also many existing models and
systems (symbolic-based or neural-based) that can do some domain-specific tasks
very well. However, due to the different implementation or working mechanisms,
they are not easily accessible or compatible with foundation models. Therefore,
there is a clear and pressing need for a mechanism that can leverage foundation
models to propose task solution outlines and then automatically match some of
the sub-tasks in the outlines to the off-the-shelf models and systems with
special functionalities to complete them. Inspired by this, we introduce
TaskMatrix.AI as a new AI ecosystem that connects foundation models with
millions of APIs for task completion. Unlike most previous work that aimed to
improve a single AI model, TaskMatrix.AI focuses more on using existing
foundation models (as a brain-like central system) and APIs of other AI models
and systems (as sub-task solvers) to achieve diversified tasks in both digital
and physical domains. As a position paper, we will present our vision of how to
build such an ecosystem, explain each key component, and use case studies to
illustrate both the feasibility of this vision and the main challenges we need
to address next.
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements. In our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
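As a rough illustration of what calibrating OD against a serial dilution of particles can look like, the sketch below fits a single particles-per-OD factor and reports fold residuals. It is a minimal sketch under stated assumptions, not the protocol from the study: the starting particle count, the 2-fold dilution scheme, the perturbation values, the use of blank-subtracted OD, and the function names `fit_particles_per_od` and `fold_residuals` are all hypothetical.

```python
import numpy as np

def fit_particles_per_od(counts, od_net):
    """Least-squares slope through the origin for counts ~ k * od_net."""
    counts = np.asarray(counts, dtype=float)
    od_net = np.asarray(od_net, dtype=float)
    return float(np.dot(od_net, counts) / np.dot(od_net, od_net))

def fold_residuals(counts, od_net, k):
    """Fold error per point, always >= 1 (1.0 means a perfect fit)."""
    ratio = (k * np.asarray(od_net, dtype=float)) / np.asarray(counts, dtype=float)
    return np.maximum(ratio, 1.0 / ratio)

# Hypothetical 2-fold serial dilution of particles (all values are made up).
start_count = 3.0e8
counts = start_count / 2.0 ** np.arange(8)

# Synthetic blank-subtracted OD readings: ideal response times a small,
# fixed multiplicative perturbation standing in for measurement error.
true_k = 1.0e9
perturb = np.array([1.03, 0.98, 1.01, 0.99, 1.02, 0.97, 1.00, 1.04])
od_net = counts / true_k * perturb

k = fit_particles_per_od(counts, od_net)
folds = fold_residuals(counts, od_net, k)
frac_within = float(np.mean(folds < 1.2))  # share of residuals under 1.2-fold
```

With a fitted factor `k`, an estimated particle count for a new reading would be `k * od_net_reading`, valid only inside the range where the instrument responds linearly.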
Tribological Properties and Wear Mechanism of C/C Composite Applied in Finger Seal
The application of C/C composites in finger seals can effectively solve the problem of seal wear due to their excellent tribological and mechanical behaviors. However, the designable characteristics of composites, such as the density and orientation of fabric on the friction plane, have a very important influence on the tribological properties and service life of sealing materials. In order to obtain a better material design scheme for the C/C composite on the finger seal, it is necessary to conduct research on the tribological properties and wear mechanism of the C/C composite based on the working conditions of the finger seal. Therefore, a reciprocating tribo-tester was used to conduct the test by abrading the C/C composite disk with a pin made of 1045,080M46. The effects of material density, fabric orientation, load, and sliding velocity on the tribological properties and wear mechanism of the C/C composite were studied. The results show that the friction coefficient and wear rate of the composite with a perpendicular orientation (non-woven cloth perpendicular to the friction plane) were lower than those with a parallel orientation (non-woven cloth parallel to the friction plane). The higher-density material has better tribological properties than the lower-density material. The friction coefficient of the low-density material increases with the load, whereas that of the high-density material decreases gradually. The wear rate increases with the load for materials of both densities. With the increase in the sliding velocity, the friction coefficient decreases. The wear rate of the low-density material decreases significantly, whereas that of the high-density material changes little. The influence of the sliding velocity on the friction and wear properties of the C/C composite is greater than that of the load. This study provides a feasible material design idea for effectively alleviating the wear of finger seals.
Time Reverse Modeling of Damage Detection in Underwater Concrete Beams Using Piezoelectric Intelligent Modules
Underwater cracks in concrete structures are often difficult to detect due to the complexity of their service environment. With numerical and experimental analysis of concrete beams immersed in water, an active monitoring system, based on a cement-based piezoelectric intelligent module array (CPIMA), was developed to locate and quantify the underwater cracks. The system applies time reversal (TR) of the stress wave field to focus on the crack area in the concrete beam specimen. First, a piezoelectric actuator is applied to emit the initial propagating wave, which can be reflected, attenuated, and diffracted by the crack, transmitted through water filled in the crack, as well as diffracted by the coarse aggregates. To extract the damage waveforms associated with the crack and analyze the robust time-reversal invariance under the high-order multiple scattering effect, a pair of homogeneous and heterogeneous forward finite element (FE) models is established. Then, the damage waveforms are time-reversed and re-propagated in the inverse numerical model, where an optimal refocusing is achieved on the crack that behaves as an acoustic source. Finally, the damage area is obtained in the form of the stacked energy distribution of each time step. The focus results are represented by cloud images and compared with root-mean-square deviation (RMSD) values. Numerical simulation and experiments show that this method can identify and quantify underwater cracks effectively.
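The refocusing property that time reversal relies on can be illustrated with a toy one-dimensional example. This sketch is not the finite element model described above: it assumes a hypothetical linear, time-invariant channel whose impulse response `h` is made up, and it shows only focusing in time, not in space.

```python
import numpy as np

# Hypothetical impulse response of a multipath channel
# (a direct arrival followed by a few echoes; values are made up).
h = np.array([0.0, 1.0, 0.0, 0.6, -0.3, 0.0, 0.2])

# Forward step: an impulsive source is recorded after passing through the channel.
source = np.array([1.0])
recorded = np.convolve(source, h)

# Time-reversal step: flip the record in time and send it back through
# the same channel.
refocused = np.convolve(recorded[::-1], h)

# The result equals the autocorrelation of h, so the scattered arrivals
# add up coherently at a single instant.
peak_index = int(np.argmax(np.abs(refocused)))
peak_value = float(refocused[peak_index])
```

In this toy setting the peak sits at index `len(h) - 1` and equals the total energy of `h`, which is why richer scattering (more echoes) tends to sharpen the focus rather than degrade it.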
Data from: Nitrogen deposition cancels out exotic earthworm effects on plant-feeding nematode communities
The activity and spread of exotic earthworms often are spatially correlated with N deposition because both arise from human activities. Exotic earthworms, in turn, can also greatly affect soil abiotic and biotic properties, as well as related ecological processes. Previous studies showed, for example, that earthworms can counteract the detrimental effects of plant-feeding nematodes on plant growth. However, potential interactive effects of N deposition and exotic earthworms on ecosystems are poorly understood.
We explored the changes in density of plant-feeding nematodes in response to the presence of exotic earthworms, and whether these changes are altered by elevated N deposition in a two-factorial field mesocosm experiment at the Heshan National Field Research Station of Forest Ecosystem, in southern China.
Our results show that earthworm addition marginally significantly increased the density of exotic earthworms and significantly increased the mass of earthworm casts. The total density of plant-feeding nematodes was not significantly affected by exotic earthworms or N deposition. However, exotic earthworms tended to increase the density of plant-feeding nematode taxa that are less detrimental to plant growth (r-strategists), while they significantly reduced the density of more harmful plant-feeding nematodes (K-strategists). Importantly, these earthworm effects were restricted to the ambient N deposition treatment, and elevated N deposition cancelled out the earthworm effect. Although exotic earthworms and N deposition interactively altered foliar N : P ratio in the target tree species, this did not result in significant changes in shoot and root biomass in the short term.
Overall, our study indicates that N deposition can cancel out exotic earthworm-induced reductions in the density of harmful plant-feeding nematodes. These results suggest that anthropogenic N deposition can alter biotic interactions between exotic and native soil organisms with potential implications for ecosystem functioning.