ET-AL: Entropy-Targeted Active Learning for Bias Mitigation in Materials Data
The growth of materials data and data-driven informatics has drastically accelerated the
discovery and design of materials. While data-driven models have advanced significantly,
the quality of data resources has received less study despite its
huge impact on model performance. In this work, we focus on data bias arising
from uneven coverage of materials families in existing knowledge. Observing
different degrees of diversity among crystal systems in common materials databases, we
propose an information entropy-based metric for measuring this bias. To
mitigate the bias, we develop an entropy-targeted active learning (ET-AL)
framework, which guides the acquisition of new data to improve the diversity of
underrepresented crystal systems. We demonstrate the capability of ET-AL for
bias mitigation and the resulting improvement in downstream machine learning
models. This approach is broadly applicable to data-driven materials discovery,
including autonomous data acquisition and dataset trimming to reduce bias, as
well as data-driven informatics in other scientific domains. Comment: 35 pages, 13 figures, under review
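The entropy metric and entropy-targeted acquisition described above can be sketched as follows. This is a minimal illustration, not the authors' ET-AL implementation: it assumes each material is a hypothetical (crystal_system, descriptor) pair, where the descriptor is some categorical label (for example, a chemistry or prototype label), measures each system's diversity as the Shannon entropy of its descriptors, and greedily picks the candidate that most raises the entropy of the least diverse system. The function names, data layout, and greedy rule are assumptions.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (natural log) of the empirical label distribution."""
    counts = Counter(labels)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    # Sum of p * log(1/p); written this way so a single label gives +0.0.
    return sum((c / n) * math.log(n / c) for c in counts.values())

def group_by_system(records):
    """Group hypothetical (crystal_system, descriptor) records by system."""
    groups = {}
    for system, descriptor in records:
        groups.setdefault(system, []).append(descriptor)
    return groups

def entropy_by_system(records):
    """Diversity of each crystal system as the entropy of its descriptors."""
    return {s: shannon_entropy(d) for s, d in group_by_system(records).items()}

def select_next(records, candidates):
    """Greedy entropy-targeted pick (an assumption, not the paper's exact rule):
    target the least diverse crystal system, then choose the candidate in that
    system whose addition yields the highest entropy."""
    groups = group_by_system(records)
    entropies = {s: shannon_entropy(d) for s, d in groups.items()}
    target = min(entropies, key=entropies.get)
    pool = [c for c in candidates if c[0] == target]
    if not pool:
        return None  # no candidate can improve the targeted system
    return max(pool, key=lambda c: shannon_entropy(groups[target] + [c[1]]))

# Toy data: "cubic" is diverse, "hexagonal" has a single descriptor.
records = [("cubic", "A"), ("cubic", "B"), ("cubic", "C"),
           ("hexagonal", "A"), ("hexagonal", "A")]
candidates = [("hexagonal", "A"), ("hexagonal", "B"), ("cubic", "D")]
print(select_next(records, candidates))  # → ('hexagonal', 'B')
```

Adding a new descriptor to the zero-entropy hexagonal group raises its entropy, so that candidate is preferred over adding yet another cubic entry.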
Uncertainty-Aware Mixed-Variable Machine Learning for Materials Design
Data-driven design shows the promise of accelerating materials discovery but
is challenging due to the prohibitive cost of searching the vast design space
of chemistry, structure, and synthesis methods. Bayesian Optimization (BO)
employs uncertainty-aware machine learning models to select promising designs
to evaluate, hence reducing the cost. However, BO with mixed numerical and
categorical variables, which is of particular interest in materials design, has
not been well studied. In this work, we survey frequentist and Bayesian
approaches to uncertainty quantification of machine learning with mixed
variables. We then conduct a systematic comparative study of their performance
in BO using a popular representative model from each group, the random
forest-based Lolo model (frequentist) and the latent variable Gaussian process
model (Bayesian). We examine the efficacy of the two models in the optimization
of mathematical functions, as well as properties of structural and functional
materials, where we observe performance differences related to problem
dimensionality and complexity. By investigating the machine learning models'
predictive and uncertainty estimation capabilities, we provide interpretations
of the observed performance differences. Our results provide practical guidance
on choosing between frequentist and Bayesian uncertainty-aware machine learning
models for mixed-variable BO in materials design
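To make the BO selection step concrete, here is a minimal sketch of one acquisition step using expected improvement (EI), a standard acquisition function for maximization. The abstract does not say which acquisition function the study uses, so EI is an assumption. The surrogate is a hypothetical stand-in that returns a predictive mean and standard deviation for a mixed (categorical, numeric) design; in the study that role is played by the Lolo or latent variable Gaussian process model, which is where the mixed variables are actually handled.

```python
import math

def norm_pdf(z: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_best, xi=0.0):
    """EI for maximization given predictive mean mu and std sigma."""
    improvement = mu - f_best - xi
    if sigma <= 0.0:
        # No uncertainty: EI reduces to the non-negative deterministic gain.
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)

def propose(candidates, surrogate, f_best, xi=0.0):
    """Return the candidate design with the highest EI.

    candidates: iterable of mixed-variable designs, e.g. (category, value) tuples.
    surrogate: hypothetical callable mapping a design to (mean, std).
    """
    return max(candidates,
               key=lambda x: expected_improvement(*surrogate(x), f_best, xi))

# Toy usage with a hypothetical lookup-table surrogate.
predictions = {("A", 0.1): (1.0, 0.01), ("B", 0.5): (0.9, 0.5)}
choice = propose(predictions.keys(), predictions.get, f_best=1.0)
print(choice)  # → ('B', 0.5)
```

Design "B" wins despite its lower predicted mean because its larger uncertainty gives it more expected improvement; this is how uncertainty estimates let BO trade exploration against exploitation, and why the quality of those estimates matters when comparing surrogate models.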
SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling
Synthetic data has emerged as a promising source for 3D human research as it
offers low-cost access to large-scale human datasets. To advance the diversity
and annotation quality of human models, we introduce a new synthetic dataset,
SynBody, with three appealing features: 1) a clothed parametric human model
that can generate a diverse range of subjects; 2) the layered human
representation that naturally offers high-quality 3D annotations to support
multiple tasks; 3) a scalable system for producing realistic data to facilitate
real-world tasks. The dataset comprises 1.2M images with corresponding accurate
3D annotations, covering 10,000 human body models, 1,187 actions, and various
viewpoints. The dataset includes two subsets for human pose and shape
estimation as well as human neural rendering. Extensive experiments on SynBody
indicate that it substantially enhances both SMPL and SMPL-X estimation.
Furthermore, the incorporation of layered annotations offers a valuable
training resource for investigating human Neural Radiance Fields (NeRF). Comment: Accepted by ICCV 2023. Project webpage: https://synbody.github.io
A Survey on Large Language Model based Autonomous Agents
Autonomous agents have long been a prominent research topic in the academic
community. Previous research in this field often focuses on training agents
with limited knowledge within isolated environments, which diverges
significantly from human learning processes and makes it hard for the agents
to make human-like decisions. Recently, through the acquisition of vast
amounts of web knowledge, large language models (LLMs) have demonstrated
remarkable potential in achieving human-level intelligence. This has sparked an
upsurge in studies investigating autonomous agents based on LLMs. To harness
the full potential of LLMs, researchers have devised diverse agent
architectures tailored to different applications. In this paper, we present a
comprehensive survey of these studies, delivering a systematic review of the
field of autonomous agents from a holistic perspective. More specifically, our
focus lies in the construction of LLM-based agents, for which we propose a
unified framework that encompasses a majority of the previous work.
Additionally, we provide a summary of the various applications of LLM-based AI
agents in the domains of social science, natural science, and engineering.
Lastly, we discuss the commonly employed evaluation strategies for LLM-based AI
agents. Based on the previous studies, we also present several challenges and
future directions in this field. To keep track of this field and continuously
update our survey, we maintain a repository for the related references at
https://github.com/Paitesanshi/LLM-Agent-Survey. Comment: 32 pages, 3 figures
An animal model of SARS produced by infection of Macaca mulatta with SARS coronavirus.
A new SARS animal model was established by inoculating SARS coronavirus (SARS-CoV) into rhesus macaques (Macaca mulatta) through the nasal cavity. Pathological pulmonary changes were successively detected on days 5-60 after virus inoculation. All eight animals showed a transient fever 2-3 days after inoculation. Immunological, molecular biological, and pathological studies support the establishment of this SARS animal model. Firstly, SARS-CoV-specific IgGs were detected in the sera of macaques from 11 to 60 days after inoculation. Secondly, SARS-CoV RNA could be detected in pharyngeal swab samples using nested RT-PCR in all infected animals from 5 days after virus inoculation. Finally, histopathological changes of interstitial pneumonia were found in the lungs during the 60 days after viral inoculation: these changes were less marked at later time points, indicating that an active healing process together with resolution of an acute inflammatory response was taking place in these animals. This animal model should provide insight into the mechanisms of SARS-CoV-related pulmonary disease and greatly facilitate the development of vaccines and therapeutics against SARS
LncRNAs: the bridge linking RNA and colorectal cancer.
Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that are transcribed from genomic regions that do not encode proteins. While the exquisite regulation of lncRNA transcription can provide signals of malignant transformation, lncRNAs control pleiotropic cancer phenotypes through interactions with other cellular molecules, including DNA, protein, and RNA. Recent studies have demonstrated that dysregulation of lncRNAs influences proliferation, angiogenesis, metastasis, invasion, apoptosis, stemness, and genome instability in colorectal cancer (CRC), with consequent clinical implications. In this review, we explicate the roles of different lncRNAs in CRC and their potential for clinical application
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars
Synthesizing high-fidelity head avatars is a central problem for computer
vision and graphics. While head avatar synthesis algorithms have advanced
rapidly, the best ones still face great obstacles in real-world scenarios. One
of the vital causes is inadequate datasets: 1) current public datasets only
allow researchers to explore high-fidelity head avatars in one or two
task directions; 2) these datasets usually contain digital head assets with
limited data volume and a narrow distribution over different attributes. In this
paper, we present RenderMe-360, a comprehensive 4D human head dataset to drive
advances in head avatar research. It contains massive data assets, with 243+
million complete head frames, and over 800k video sequences from 500 different
identities captured by synchronized multi-view cameras at 30 FPS. It is a
large-scale digital library for head avatars with three key attributes: 1) High
Fidelity: all subjects are captured by 60 synchronized, high-resolution 2K
cameras in 360 degrees. 2) High Diversity: the collected subjects vary in age,
era, ethnicity, and culture, providing abundant materials
with distinctive styles in appearance and geometry. Moreover, each subject is
asked to perform various motions, such as expressions and head rotations, which
further extend the richness of assets. 3) Rich Annotations: we provide
annotations with different granularities: cameras' parameters, matting, scan,
2D/3D facial landmarks, FLAME fitting, and text description.
Based on the dataset, we build a comprehensive benchmark for head avatar
research, evaluating 16 state-of-the-art methods on five main tasks: novel
view synthesis, novel expression synthesis, hair rendering, hair editing, and
talking head generation. Our experiments uncover the strengths and weaknesses
of current methods. RenderMe-360 opens the door for future exploration in head
avatars. Comment: Technical Report; Project Page: 36; Github Link:
https://github.com/RenderMe-360/RenderMe-36
Genome sequence of the Ornithopus/Lupinus-nodulating Bradyrhizobium sp. strain WSM471
Bradyrhizobium sp. strain WSM471 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen (N2)-fixing root nodule formed on the annual legume Ornithopus pinnatus (Miller) Druce growing at Oyster Harbour, Albany district, Western Australia in 1982. This strain is in commercial production as an inoculant for Lupinus and Ornithopus. Here we describe the features of Bradyrhizobium sp. strain WSM471, together with genome sequence information and annotation. The 7,784,016 bp high-quality-draft genome is arranged in 1 scaffold of 2 contigs, contains 7,372 protein-coding genes and 58 RNA-only encoding genes, and is one of 20 rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Community Sequencing Program