68 research outputs found
Low-carbon scenario analysis on urban transport of one metropolitan in China in 2020
Purpose: This paper discusses possible ways of implementing effective energy-conservation and GHG-emission-reduction measures by providing forecasts of the mid-to-long-term city-wide carbon emission rate and an analysis of potential low-carbon transport solutions.
Design/methodology/approach: Based on the characteristics of Beijing's transport system, and on a review and applicability analysis of existing transport-energy and GHG-emission calculation models, a comprehensive carbon-emission calculation model was established. Existing data were used with regression analysis to project traffic volumes for the baseline scenario in the target year 2020 and to calculate the corresponding emissions. Four low-carbon scenarios were set in accordance with the goal of "low-carbon transportation, green trips", and the effectiveness of each was evaluated by comparing its GHG emission rate with that of the baseline scenario.
Findings: Under the current development trend in the policy environment and technical specifications, total projected GHG (CO2) emissions from Beijing's transport sector in 2020 will reach 24.69 million t CO2; private vehicles are the largest contributor among all transport modes at 15.96 million t CO2.
Practical implications: Limiting the growth of private-vehicle ownership, reducing the frequency of mid-to-long-range travel and the average trip distance, and promoting public-transit-oriented policies are all possible ways to reduce carbon emissions. The most effective practice involves a shift in public travel behavior.
Originality/value: This paper presents a method to forecast the mid-to-long-term city-wide carbon emission rate and provides some potential low-carbon transport solutions.
Peer Reviewed
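The baseline-projection step described in the methodology can be illustrated with a minimal sketch: regress historical activity data, project to the target year, apply an emission factor, and score a low-carbon scenario against the baseline. All numbers below are hypothetical placeholders, not the paper's data.

```python
import numpy as np

# Hypothetical historical activity data (illustrative placeholders):
# annual vehicle-kilometres travelled (VKT, billion km), 2005-2012.
years = np.arange(2005, 2013)
vkt = np.array([35.0, 38.2, 41.5, 44.1, 47.9, 51.2, 54.8, 58.3])

# Baseline scenario: linear regression projects activity to 2020.
slope, intercept = np.polyfit(years, vkt, 1)
vkt_2020 = slope * 2020 + intercept

# Emissions = projected activity x assumed fleet-average emission factor.
EF = 0.21  # kg CO2 per vehicle-km, illustrative
baseline_mt = vkt_2020 * 1e9 * EF / 1e9  # million tonnes CO2

# A low-carbon scenario is scored against the baseline, e.g. a modal
# shift that cuts private-vehicle activity by 20%.
scenario_mt = baseline_mt * 0.8
saving_mt = baseline_mt - scenario_mt
```

The same comparison is repeated per scenario, with each scenario expressed as a change to the projected activity or emission factors.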
Analysis on Alighting and Boarding Movement Laws in Subway Using Modified Social Force Model
This paper presents a multi-agent simulator based on the social force model that simulates each passenger's boarding and alighting behavior both in the train and on the platform seamlessly. Passengers are divided into three types: those boarding, those alighting, and those staying in the train; each type has different individual attributes and follows different walking rules. Owing to the characteristics of the subway environment and of passengers' boarding and alighting behavior, several adjustments and improvements were made to the basic social force model: (1) In some cases during boarding and alighting, the driving force toward the destination needs to be doubled and the repulsive force between two agents reduced. (2) Passengers who stay in the train move quite differently from ordinary pedestrians: they usually want to remain still unless they are in front of the door. To describe their behavior, we introduce a tangential detour force; the scope of interaction between agents is extended, and some passengers outside the visual field are also counted. (3) The repulsive force between an agent and an obstacle is divided into a frontal force and a convex-corner force, which have different spheres of influence and calculation methods. With these modifications, the agents exhibit reasonable intelligence and diversity during alighting and boarding.
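A minimal sketch of the first adjustment, using Helbing-style driving and repulsion terms; the parameter values and the `boarding_phase` switch are illustrative assumptions, not the paper's calibrated model.

```python
import numpy as np

def social_forces(pos, vel, goal, others, boarding_phase=False,
                  v0=1.3, tau=0.5, A=2.0, B=0.3):
    """Driving + inter-agent repulsion terms of a basic social force
    model, with the paper's first adjustment: during boarding/alighting
    the driving force is doubled and agent-agent repulsion is reduced."""
    # Driving (relaxation) force toward the goal.
    to_goal = goal - pos
    desired_dir = to_goal / (np.linalg.norm(to_goal) + 1e-9)
    f_drive = (v0 * desired_dir - vel) / tau
    if boarding_phase:
        f_drive *= 2.0   # doubled driving force
        A *= 0.5         # weakened repulsion between agents

    # Exponential repulsion from every other agent.
    f_rep = np.zeros(2)
    for q in others:
        d = pos - q
        dist = np.linalg.norm(d) + 1e-9
        f_rep += A * np.exp(-dist / B) * (d / dist)
    return f_drive + f_rep
```

The tangential detour force for staying passengers and the frontal/convex-corner obstacle forces would be added as further terms of the same form.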
RepViT: Revisiting Mobile CNN From ViT Perspective
Recently, lightweight Vision Transformers (ViTs) demonstrate superior
performance and lower latency compared with lightweight Convolutional Neural
Networks (CNNs) on resource-constrained mobile devices. This improvement is
usually attributed to the multi-head self-attention module, which enables the
model to learn global representations. However, the architectural disparities
between lightweight ViTs and lightweight CNNs have not been adequately
examined. In this study, we revisit the efficient design of lightweight CNNs
and emphasize their potential for mobile devices. We incrementally enhance the
mobile-friendliness of a standard lightweight CNN, specifically MobileNetV3, by
integrating the efficient architectural choices of lightweight ViTs. This ends
up with a new family of pure lightweight CNNs, namely RepViT. Extensive
experiments show that RepViT outperforms existing state-of-the-art lightweight
ViTs and exhibits favorable latency in various vision tasks. On ImageNet,
RepViT achieves over 80\% top-1 accuracy with nearly 1 ms latency on an iPhone
12, which, to the best of our knowledge, is the first time for a lightweight
model. Our largest model, RepViT-M3, obtains 81.4\% accuracy with only 1.3 ms
latency. The code and trained models are available at
\url{https://github.com/jameslahm/RepViT}.
Comment: 9 pages, 7 figures
Investigating the effectiveness of coacervates produced from conjugated and unconjugated Spirulina protein in delivering unstable oil to the intestinal phase of digestion
This study investigated the potential of complex coacervates produced using Spirulina protein concentrate (SPC) conjugated with maltodextrin (MD) and carrageenan (CG) for encapsulating and delivering sensitive oils. A wet-heating Maillard reaction was employed to conjugate SPC with MD, followed by coacervation with CG to form the conjugate-based coacervates. Additionally, a mixture of unconjugated SPC and MD was coacervated with CG to produce mixture-based coacervates. Both types of coacervates were utilised as wall materials for encapsulating canola oil. The in vitro digestion of the resulting microcapsules was assessed in the oral, gastric, and intestinal phases, focusing on physicochemical parameters such as droplet size, zeta-potential, microstructure, proteolysis, oil release, and lipolysis. The findings revealed that microcapsules prepared using both (SPC-MD mixture)-CG and (SPC-MD conjugate)-CG coacervates were remarkably stable against gastric digestion, as evidenced by the minimal production of free amino acids (15 mM). Most of the encapsulated oil (62–67%) was released during the intestinal phase due to the breakdown of the coacervates. Notably, the microcapsules produced with (SPC-MD conjugate)-CG coacervates demonstrated a lower degree of lipolysis (41.77% free fatty acid content) than those prepared with (SPC-MD mixture)-CG coacervates (53.35%). These results highlight the potential of complex coacervates produced using conjugated SPC as promising materials for the encapsulation and delivery of sensitive oils.
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
Enabling bi-directional retrieval of images and texts is important for
understanding the correspondence between vision and language. Existing methods
leverage the attention mechanism to explore such correspondence in a
fine-grained manner. However, most of them consider all semantics equally and
thus align them uniformly, regardless of their diverse complexities. In fact,
semantics are diverse (i.e. involving different kinds of semantic concepts),
and humans usually follow a latent structure to combine them into
understandable language. Existing methods may therefore struggle to capture
such sophisticated correspondences optimally. In this paper, to address
such a deficiency, we propose an Iterative Matching with Recurrent Attention
Memory (IMRAM) method, in which correspondences between images and texts are
captured with multiple steps of alignments. Specifically, we introduce an
iterative matching scheme to explore such fine-grained correspondence
progressively. A memory distillation unit is used to refine alignment knowledge
from early steps to later ones. Experiment results on three benchmark datasets,
i.e. Flickr8K, Flickr30K, and MS COCO, show that our IMRAM achieves
state-of-the-art performance, well demonstrating its effectiveness. Experiments
on a practical business advertisement dataset, named \Ads{}, further validate
the applicability of our method in practical scenarios.
Comment: 9 pages; Accepted by CVPR2020
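The iterative matching scheme can be sketched roughly as follows; the gated update is a simplified stand-in for the paper's memory distillation unit, and all shapes and names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def iterative_matching(regions, words, steps=3):
    """IMRAM-style iterative alignment: at each step, word queries
    attend over image regions, a step similarity is recorded, and a
    gated memory update refines the queries for the next round."""
    query = words                                    # (n_words, d)
    step_sims = []
    for _ in range(steps):
        attn = softmax(query @ regions.T)            # (n_words, n_regions)
        attended = attn @ regions                    # aligned region context
        # cosine similarity between each word and its attended context
        cos = (words * attended).sum(-1) / (
            np.linalg.norm(words, axis=-1)
            * np.linalg.norm(attended, axis=-1) + 1e-9)
        step_sims.append(cos.mean())
        # gated update (simplified memory distillation)
        gate = 1.0 / (1.0 + np.exp(-(query * attended).sum(-1, keepdims=True)))
        query = gate * query + (1.0 - gate) * attended
    return float(np.sum(step_sims))                  # aggregate over steps
```

Aggregating similarities across steps lets later rounds refine alignments that a single attention pass would miss.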
GRN: Gated Relation Network to Enhance Convolutional Neural Network for Named Entity Recognition
The dominant approaches for named entity recognition (NER) mostly adopt
complex recurrent neural networks (RNNs), e.g., long short-term memory (LSTM).
However, RNNs are limited by their recurrent nature in terms of computational
efficiency. In contrast, convolutional neural networks (CNN) can fully exploit
the GPU parallelism with their feedforward architectures. However, little
attention has been paid to performing NER with CNNs, mainly owing to their
difficulties in capturing the long-term context information in a sequence. In
this paper, we propose a simple but effective CNN-based network for NER, i.e.,
gated relation network (GRN), which is more capable than common CNNs in
capturing long-term context. Specifically, in GRN we firstly employ CNNs to
explore the local context features of each word. Then we model the relations
between words and use them as gates to fuse local context features into global
ones for predicting labels. Without using recurrent layers that process a
sentence in a sequential manner, our GRN allows computations to be performed in
parallel across the entire sentence. Experiments on two benchmark NER datasets
(i.e., CoNLL2003 and Ontonotes 5.0) show that, our proposed GRN can achieve
state-of-the-art performance with or without external knowledge. It also enjoys
lower time costs to train and test. We have made the code publicly available at
https://github.com/HuiChen24/NER-GRN.
Comment: This paper is accepted by AAAI 2019
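The relation-gating idea can be sketched as below; the pairwise-feature construction and the single weight matrix `Wr` are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_relation_layer(local, Wr):
    """Fuse per-word local context features (e.g. CNN outputs) into
    global ones: for each word i, gate its relation to every word j and
    average the gated features. All positions are independent, so the
    whole sentence can be processed in parallel (no recurrence)."""
    n, d = local.shape
    fused = np.zeros_like(local)
    for i in range(n):
        # pairwise features [h_i; h_j] for all j, shape (n, 2d)
        pair = np.concatenate(
            [np.repeat(local[i][None, :], n, axis=0), local], axis=-1)
        gates = sigmoid(pair @ Wr)                # (n, d) elementwise gates
        fused[i] = (gates * local).mean(axis=0)   # gated fusion -> global
    return fused
```

Because each position's fused feature depends only on the fixed local features, the outer loop is trivially parallelizable, which is the source of the speedup over recurrent layers.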
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Vision and text have been fully explored in contemporary video-text
foundational models, while other modalities such as audio and subtitles in
videos have not received sufficient attention. In this paper, we establish
connections between multi-modality video tracks, including Vision,
Audio, and Subtitle, and Text by exploring an automatically generated
large-scale omni-modality video caption dataset called VAST-27M. Specifically,
we first collect 27 million open-domain video clips and separately train a
vision and an audio captioner to generate vision and audio captions. Then, we
employ an off-the-shelf Large Language Model (LLM) to integrate the generated
captions, together with subtitles and instructional prompts into omni-modality
captions. Based on the proposed VAST-27M dataset, we train an omni-modality
video-text foundational model named VAST, which can perceive and process
vision, audio, and subtitle modalities from video, and better support various
tasks including vision-text, audio-text, and multi-modal video-text tasks
(retrieval, captioning and QA). Extensive experiments have been conducted to
demonstrate the effectiveness of our proposed VAST-27M corpus and VAST
foundation model. VAST achieves 22 new state-of-the-art results on various
cross-modality benchmarks. Code, model and dataset will be released at
https://github.com/TXH-mercury/VAST.
Comment: 23 pages, 5 figures
Enhanced Meta-Learning for Cross-lingual Named Entity Recognition with Minimal Resources
For languages with no annotated resources, transferring knowledge from
rich-resource languages is an effective solution for named entity recognition
(NER). While existing methods all transfer directly from a source-learned model
to a target language, in this paper we propose to fine-tune the learned model
with a few similar examples given a test case, which could benefit the
prediction by leveraging the structural and semantic information conveyed in
such similar examples. To this end, we present a meta-learning algorithm to
find a good model parameter initialization that could fast adapt to the given
test case and propose to construct multiple pseudo-NER tasks for meta-training
by computing sentence similarities. To further improve the model's
generalization ability across different languages, we introduce a masking
scheme and augment the loss function with an additional maximum term during
meta-training. We conduct extensive experiments on cross-lingual named entity
recognition with minimal resources over five target languages. The results show
that our approach significantly outperforms existing state-of-the-art methods
across the board.
Comment: This paper is accepted by AAAI 2020. Code is available at
https://github.com/microsoft/vert-papers/tree/master/papers/Meta-Cros
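A minimal sketch of the inner/outer-loop idea, using a Reptile-style update on toy linear-regression pseudo-tasks in place of NER models; everything here is illustrative, and the paper's masking scheme and augmented loss term are not reproduced.

```python
import numpy as np

def finetune(w, X, y, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on one pseudo-task (standing in
    for fine-tuning the NER model on a test case's similar examples)."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def meta_train(tasks, meta_lr=0.5, epochs=50):
    """Outer loop (Reptile-style): move the initialization toward each
    task-adapted solution so it can adapt quickly to a new test case."""
    w = np.zeros(tasks[0][0].shape[1])
    for _ in range(epochs):
        for X, y in tasks:
            w = w + meta_lr * (finetune(w, X, y) - w)
    return w

# Toy pseudo-tasks sharing one underlying solution, plus noise; in the
# paper, pseudo-tasks are built by grouping similar sentences.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
tasks = []
for _ in range(4):
    X = rng.normal(size=(20, 2))
    tasks.append((X, X @ w_true + 0.05 * rng.normal(size=20)))
w_init = meta_train(tasks)  # initialization that adapts fast
```

At test time, `finetune` is run once more on the examples most similar to the given test case before predicting.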
InfoEntropy Loss to Mitigate Bias of Learning Difficulties for Generative Language Models
Generative language models are usually pretrained on large text corpora by
predicting the next token (i.e., sub-word/word/phrase) given the previous ones.
Recent works have demonstrated the impressive performance of large generative
language models on downstream tasks. However, existing generative language
models generally neglect an inherent challenge in text corpora during training,
i.e., the imbalance between frequent tokens and infrequent ones. This can lead a
language model to be dominated by common and easy-to-learn tokens, thereby
overlooking the infrequent and difficult-to-learn ones. To alleviate that, we
propose an Information Entropy Loss (InfoEntropy Loss) function. During
training, it can dynamically assess the learning difficulty of a to-be-learned
token, according to the information entropy of the corresponding predicted
probability distribution over the vocabulary. Then it scales the training loss
adaptively, trying to lead the model to focus more on the difficult-to-learn
tokens. On the Pile dataset, we train generative language models at different
scales of 468M, 1.2B, and 6.7B parameters. Experiments reveal that models
incorporating the proposed InfoEntropy Loss can gain consistent performance
improvement on downstream benchmarks.
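One way such an entropy-based weighting could look, sketched on a per-token basis; the normalization and scaling below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def infoentropy_weighted_loss(logits, targets):
    """Entropy-weighted token loss: each token's cross-entropy is scaled
    by the entropy of its predicted distribution, so confidently
    predicted (easy, frequent) tokens contribute less and uncertain
    (difficult-to-learn) tokens contribute more."""
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)           # (n_tokens, vocab)
    nll = -np.log(probs[np.arange(len(targets)), targets] + 1e-12)
    entropy = -(probs * np.log(probs + 1e-12)).sum(-1)   # per-token entropy
    weights = entropy / (entropy.mean() + 1e-12)         # normalize weights
    return float((weights * nll).mean())
```

With uniform predictions every token gets weight 1 and the loss reduces to plain cross-entropy; as the model grows confident on easy tokens, their weight shrinks and hard tokens dominate the gradient.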