Sequential optimization for efficient high-quality object proposal generation
We are motivated by the need for a generic object proposal generation algorithm that achieves a good balance between object detection recall, proposal localization quality and computational efficiency. We propose a novel object proposal algorithm, BING++, which inherits the computational efficiency of BING [1] but significantly improves its proposal localization quality. At a high level, we formulate the problem of object proposal generation from a novel probabilistic perspective, based on which BING++ improves localization quality by employing edges and segments to estimate object boundaries and update the proposals sequentially. We propose learning the parameters efficiently by searching for approximate solutions in a quantized parameter space to reduce complexity. We demonstrate the generalization of BING++ with the same fixed parameters across different object classes and datasets. Empirically, BING++ runs at half the speed of BING on CPU, but significantly improves the localization quality, by 18.5 and 16.7 percent on the VOC2007 and Microsoft COCO datasets, respectively. Compared with other state-of-the-art approaches, BING++ achieves comparable performance but runs significantly faster.
ATP: Adaptive Tensor Parallelism for Foundation Models
Foundation models have impressive performance and generalization capabilities
across a wide range of applications. The increasing size of the models
introduces great challenges for the training. Tensor parallelism is a critical
technique that is currently used in almost all foundation model training and
has a significant impact on overall training performance. However, current
tensor parallelism in machine learning frameworks misses optimization
opportunities in fitting various interconnection topologies. In this work, we
present ATP, an adaptive tensor parallelism framework for foundation models,
which can automatically select the optimal parallel strategy on different
interconnections. We propose column- and row-first tensor parallelism based on
2D device meshes and construct a search space. Combined with the hierarchical
communication matrix, ATP can identify the optimal strategy in the search
space. We also propose chunk-based overlapping to reduce communication
overhead. Our evaluations show ATP consistently outperforms the
state-of-the-art approaches for various model sizes and interconnects,
achieving end-to-end training performance improvements of up to 37-64% on
specific interconnects. Based on our theoretical model, the communication
overhead of ATP decreases with scaling, indicating a qualitative leap forward.
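The search over 2D device meshes described in the ATP abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not ATP's actual method: the function names, the two-level bandwidth model (`bw_intra`, `bw_inter`, `gpus_per_node`), and the way message sizes are split across mesh axes are all hypothetical. Only the ring all-reduce time estimate, 2(p-1)/p · bytes / bandwidth, is a standard formula.

```python
def ring_allreduce_time(num_bytes, group_size, bandwidth):
    """Standard ring all-reduce estimate: 2(p-1)/p * bytes / bandwidth."""
    if group_size == 1:
        return 0.0
    return 2.0 * (group_size - 1) / group_size * num_bytes / bandwidth


def enumerate_meshes(num_devices):
    """All 2D meshes (d1, d2) with d1 * d2 == num_devices."""
    return [(d1, num_devices // d1)
            for d1 in range(1, num_devices + 1)
            if num_devices % d1 == 0]


def toy_cost(mesh, num_bytes, gpus_per_node, bw_intra, bw_inter):
    """Hypothetical cost model (NOT the one in the paper).

    Assumes the second mesh axis maps to contiguous ranks, so it stays
    inside a node when d2 <= gpus_per_node, and that each axis reduces
    a message sharded by the other axis.
    """
    d1, d2 = mesh
    bw_axis2 = bw_intra if d2 <= gpus_per_node else bw_inter
    bw_axis1 = bw_intra if d1 * d2 <= gpus_per_node else bw_inter
    return (ring_allreduce_time(num_bytes / d2, d1, bw_axis1)
            + ring_allreduce_time(num_bytes / d1, d2, bw_axis2))


def best_mesh(num_devices, num_bytes, gpus_per_node, bw_intra, bw_inter):
    """Pick the mesh with the lowest toy cost from the search space."""
    return min(
        enumerate_meshes(num_devices),
        key=lambda m: toy_cost(m, num_bytes, gpus_per_node, bw_intra, bw_inter),
    )


if __name__ == "__main__":
    # Illustrative numbers only: 16 devices, 8 per node, 100 MB message.
    print(best_mesh(16, 1e8, 8, 300e9, 25e9))
```

The point of the sketch is only that the preferred mesh shape depends on the interconnect: changing the assumed bandwidths or node size can change which factorization wins.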
Tin Nanoparticles Encapsulated Carbon Nanoboxes as High-Performance Anode for Lithium-Ion Batteries
One of the crucial challenges in applying Sn as an anode for lithium-ion batteries (LIBs) is the dramatic volume change during the lithiation/delithiation process, which causes rapid capacity fading and deteriorates battery performance. To address this issue, we report the design and fabrication of Sn-encapsulated carbon nanoboxes (denoted as Sn@C) with yolk@shell architectures. In this design, the carbon shell facilitates good transport kinetics, whereas the hollow space between the Sn and the carbon shell accommodates the volume variation during repeated charge/discharge cycles. Accordingly, this composite electrode exhibits a high reversible capacity of 675 mAh g−1 at a current density of 0.8 A g−1 after 500 cycles and retains as much as 366 mAh g−1 at a higher current density of 3 A g−1 even after 930 cycles. The enhanced electrochemical performance can be ascribed to the crystal size reduction of the Sn cores and the formation of a polymeric gel-like layer outside the electrode surface after long-term cycling, resulting in improved capacity and enhanced rate performance.
Moderating New Waves of Online Hate with Chain-of-Thought Reasoning in Large Language Models
Online hate is an escalating problem that negatively impacts the lives of
Internet users, and is also subject to rapid changes due to evolving events,
resulting in new waves of online hate that pose a critical threat. Detecting
and mitigating these new waves present two key challenges: it demands
reasoning-based complex decision-making to determine the presence of hateful
content, and the limited availability of training samples hinders updating the
detection model. To address this critical issue, we present a novel framework
called HATEGUARD for effectively moderating new waves of online hate. HATEGUARD
employs a reasoning-based approach that leverages the recently introduced
chain-of-thought (CoT) prompting technique, harnessing the capabilities of
large language models (LLMs). HATEGUARD further achieves prompt-based zero-shot
detection by automatically generating and updating detection prompts with new
derogatory terms and targets in new wave samples to effectively address new
waves of online hate. To demonstrate the effectiveness of our approach, we
compile a new dataset consisting of tweets related to three recently witnessed
new waves: the 2022 Russian invasion of Ukraine, the 2021 insurrection of the
US Capitol, and the COVID-19 pandemic. Our studies reveal crucial longitudinal
patterns in these new waves concerning the evolution of events and the pressing
need for techniques to rapidly update existing moderation tools to counteract
them. Comparative evaluations against state-of-the-art tools illustrate the
superiority of our framework, showcasing a substantial 22.22% to 83.33%
improvement in detecting the three new waves of online hate. Our work
highlights the severe threat posed by the emergence of new waves of online hate
and represents a paradigm shift in addressing this threat practically.
Comment: To appear in the 45th IEEE Symposium on Security and Privacy, May
20-23, 202
Charging and discharging in thermal energy storage unit with fin-stone hybrid structure for enhancing heat transfer of phase change materials
This work proposes a fin-stone hybrid structure integrating fins (popular thermal enhancers) and natural stones (widely used sensible heat storage media) to enhance the heat transfer of phase change materials for on-site thermal energy storage applications, with the advantages of low cost, environmental friendliness, and easy accessibility. 3D numerical models of charging and discharging in shell-and-tube heat storage units with various configurations, including fins, the fin-stone hybrid structure, stones, and no heat transfer enhancement, were constructed, and their performance was evaluated and compared. Compared to fins, fin-stone hybrid structures with 20 mm-, 30 mm-, and 40 mm-sized stones shorten the charging time by 67%, 54%, and 56%, and the discharging time by 73%, 60%, and 46%, respectively. Small stones provide better heat transfer enhancement, which is attributed to their small volume, large surface area, and contact with the tube and fins. In comparison with stones, the advantage of the fin-stone hybrid structure, i.e. the shortening of phase change time, is more significant in charging than in discharging, as both heat conduction and natural convection are enhanced. Moreover, the hybrid structure exhibits satisfactory temperature stability with a 48.9 °C temperature change in charging and 37.2 °C in discharging, each lower than that of the fins, which helps stabilise the heat transfer fluid outlet temperature. The yearly supplied energy of the hybrid structure with 20 mm-sized stones is 121% and 72% more than that of fins and stones, respectively.
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
High-resolution Large Multimodal Models (LMMs) encounter the challenges of
excessive visual tokens and quadratic visual complexity. Current
high-resolution LMMs address the quadratic complexity while still generating
excessive visual tokens. However, the redundancy in visual tokens is the key
problem as it leads to more substantial compute. To mitigate this issue, we
propose ConvLLaVA, which employs ConvNeXt, a hierarchical backbone, as the
visual encoder of LMM to replace Vision Transformer (ViT). ConvLLaVA compresses
high-resolution images into information-rich visual features, effectively
preventing the generation of excessive visual tokens. To enhance the
capabilities of ConvLLaVA, we propose two critical optimizations. Since the
low-resolution pretrained ConvNeXt underperforms when directly applied on high
resolution, we update it to bridge the gap. Moreover, since ConvNeXt's original
compression ratio is inadequate for much higher resolution inputs, we train a
successive stage to further compress the visual tokens, thereby reducing
redundancy. These optimizations enable ConvLLaVA to support inputs of 1536x1536
resolution generating only 576 visual tokens, capable of handling images of
arbitrary aspect ratios. Experimental results demonstrate that our method
achieves competitive performance with state-of-the-art models on mainstream
benchmarks. The ConvLLaVA model series are publicly available at
https://github.com/alibaba/conv-llava.
Comment: 17 pages
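The token counts in the ConvLLaVA abstract can be checked with simple arithmetic. The sketch below assumes a square input and a backbone with a single overall downsampling stride; the helper name `visual_tokens` is hypothetical. The stride of 32 is the overall stride of a standard four-stage ConvNeXt, and the stride of 64 is what the abstract's figures imply (1536 / 64 = 24, and 24 × 24 = 576), consistent with one extra 2x compression stage.

```python
def visual_tokens(resolution, total_stride):
    """Number of visual tokens for a square image, one token per output cell."""
    side = resolution // total_stride
    return side * side


# Standard four-stage ConvNeXt (overall stride 32) at 1536x1536.
print(visual_tokens(1536, 32))  # 2304

# With one additional 2x stage (overall stride 64, as implied by the abstract).
print(visual_tokens(1536, 64))  # 576

# For comparison: a ViT with 14x14 patches at 336x336 also yields 576 tokens.
print(visual_tokens(336, 14))   # 576
```

Under these assumptions, the 1536x1536 input costs the language model the same number of visual tokens as a 336x336 input to a patch-14 ViT.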
Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL
Large Language Models (LLMs) play a crucial role in capturing structured
semantics to enhance language understanding, improve interpretability, and
reduce bias. Nevertheless, an ongoing controversy exists over the extent to
which LLMs can grasp structured semantics. To assess this, we propose using
Semantic Role Labeling (SRL) as a fundamental task to explore LLMs' ability to
extract structured semantics. In our assessment, we employ the prompting
approach, which leads to the creation of our few-shot SRL parser, called
PromptSRL. PromptSRL enables LLMs to map natural languages to explicit semantic
structures, which provides an interpretable window into the properties of LLMs.
We find interesting potential: LLMs can indeed capture semantic structures, and
scaling-up doesn't always mirror potential. Additionally, limitations of LLMs
are observed in C-arguments, etc. Lastly, we are surprised to discover a
significant overlap in the errors made by both LLMs and untrained humans,
accounting for almost 30% of all errors.
Comment: Accepted by ICIC 202
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
The rapid advancement of large language models (LLMs) has paved the way for
the development of highly capable autonomous agents. However, existing
multi-agent frameworks often struggle with integrating diverse capable
third-party agents due to reliance on agents defined within their own
ecosystems. They also face challenges in simulating distributed environments,
as most frameworks are limited to single-device setups. Furthermore, these
frameworks often rely on hard-coded communication pipelines, limiting their
adaptability to dynamic task requirements. Inspired by the concept of the
Internet, we propose the Internet of Agents (IoA), a novel framework that
addresses these limitations by providing a flexible and scalable platform for
LLM-based multi-agent collaboration. IoA introduces an agent integration
protocol, an instant-messaging-like architecture design, and dynamic mechanisms
for agent teaming and conversation flow control. Through extensive experiments
on general assistant tasks, embodied AI tasks, and retrieval-augmented
generation benchmarks, we demonstrate that IoA consistently outperforms
state-of-the-art baselines, showcasing its ability to facilitate effective
collaboration among heterogeneous agents. IoA represents a step towards linking
diverse agents in an Internet-like environment, where agents can seamlessly
collaborate to achieve greater intelligence and capabilities. Our codebase has
been released at https://github.com/OpenBMB/IoA.
Comment: work in progress
BING: Binarized normed gradients for objectness estimation at 300fps
Training a generic objectness measure to produce object proposals has recently become of significant interest. We observe that generic objects with well-defined closed boundaries can be detected by looking at the norm of gradients, with a suitable resizing of their corresponding image windows to a small fixed size. Based on this observation and for computational reasons, we propose to resize the window to 8 × 8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g., add, bitwise shift, etc.). To improve the localization quality of the proposals while maintaining efficiency, we propose a novel fast segmentation method and demonstrate its effectiveness for improving BING’s localization performance when used in multithresholding straddling expansion (MTSE) postprocessing. On the challenging PASCAL VOC2007 dataset, using 1000 proposals per image and an intersection-over-union threshold of 0.5, our proposal method achieves a 95.6% object detection rate and 78.6% mean average best overlap in less than 0.005 second per image.
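The 64D normed-gradient feature described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nearest-neighbour resize, the use of `np.gradient`, the clipping at 255, and the "keep the top bits" binarization are assumptions chosen to keep the example short, and all function names are hypothetical.

```python
import numpy as np


def resize_nearest(window, size=8):
    """Nearest-neighbour resize of a 2D grayscale window to size x size (assumed scheme)."""
    h, w = window.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return window[np.ix_(rows, cols)]


def normed_gradient_feature(window, size=8):
    """Resize to 8x8, then use the gradient norm as a 64D feature."""
    small = resize_nearest(window.astype(np.float64), size)
    gy, gx = np.gradient(small)  # derivatives along rows, then columns
    ng = np.minimum(np.abs(gx) + np.abs(gy), 255.0)  # L1 norm, clipped (assumption)
    return ng.reshape(-1)


def binarize_top_bits(feature, n_bits=4):
    """Keep only the top n_bits of each byte value (illustrative approximation)."""
    quantized = feature.astype(np.uint8)
    shift = 8 - n_bits
    return (quantized >> shift) << shift


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = rng.integers(0, 256, size=(40, 60))  # synthetic grayscale window
    feature = normed_gradient_feature(window)
    print(feature.shape, binarize_top_bits(feature)[:8])
```

A linear objectness score would then be a dot product between this feature and a learned 64D weight vector; dropping the low bits is what allows such a score to be approximated with a few bitwise operations.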
