Effects of Donor- and Supporter-Based Campaign Networks on Crowdfunding Campaign Success
Driven by the increasing popularity of crowdfunding, academic researchers have examined the impacts of internal social capital accumulated on crowdfunding platforms and external social capital formed through online and offline friend networks on campaign success. However, no research has examined the impacts of social networks from a structural perspective. In the current research, we investigate the extent to which donor- and supporter-based campaign network centralities affect the amount of capital a fundraising campaign is able to generate. Using a panel data set collected from a donation-based crowdfunding platform, Fundly, we reveal that campaign network centralities based on strong ties (shared donors) and weak ties (shared supporters) are more important predictors of fundraising success than the number of donors a campaign has.
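As a rough illustration of the structural idea, the sketch below builds a campaign-donor bipartite graph, projects it onto campaigns so that two campaigns are linked when they share donors, and computes a simple centrality score. The toy edge list and the choice of degree centrality are illustrative assumptions, not the paper's data or exact operationalization.

```python
# Sketch: centrality of campaigns in a network where two campaigns are linked
# when they share donors (strong ties); the same construction applies to
# shared supporters (weak ties). Toy data and degree centrality are
# illustrative assumptions, not the paper's specification.
import networkx as nx
from networkx.algorithms import bipartite

# Bipartite graph: campaigns on one side, donors on the other.
campaign_donor_edges = [
    ("campaign_A", "donor_1"), ("campaign_A", "donor_2"),
    ("campaign_B", "donor_2"), ("campaign_B", "donor_3"),
    ("campaign_C", "donor_3"), ("campaign_C", "donor_4"),
]
campaigns = {c for c, _ in campaign_donor_edges}
B = nx.Graph()
B.add_nodes_from(campaigns, bipartite=0)
B.add_nodes_from({d for _, d in campaign_donor_edges}, bipartite=1)
B.add_edges_from(campaign_donor_edges)

# One-mode projection: campaigns connected if they share at least one donor,
# with edge weights equal to the number of shared donors.
campaign_net = bipartite.weighted_projected_graph(B, campaigns)

# Donor-based campaign network centrality (one candidate operationalization).
donor_centrality = nx.degree_centrality(campaign_net)
print(donor_centrality)
```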
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation
Automatic evaluations for natural language generation (NLG) conventionally
rely on token-level or embedding-level comparisons with text references. This
differs from human language processing, for which visual imagination often
improves comprehension. In this work, we propose ImaginE, an imagination-based
automatic evaluation metric for natural language generation. With the help of
StableDiffusion, a state-of-the-art text-to-image generator, we automatically
generate an image as the embodied imagination for the text snippet and compute
the imagination similarity using contextual embeddings. Experiments spanning
several text generation tasks demonstrate that adding machine-generated images
with our ImaginE shows great potential for introducing multi-modal
information into NLG evaluation and improves existing automatic metrics'
correlations with human similarity judgments in both reference-based and
reference-free evaluation scenarios.
Comment: EACL 202
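A minimal sketch of the recipe the abstract describes, assuming a Stable Diffusion checkpoint from diffusers, a CLIP image encoder from transformers, and plain cosine similarity between image embeddings; this is not the authors' released implementation.

```python
# Sketch of an imagination-based similarity: render an image for each text
# snippet with a text-to-image model, then compare CLIP image embeddings.
# Checkpoints and plain cosine similarity are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
t2i = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def imagine_embedding(text: str) -> torch.Tensor:
    """Render the text's 'imagination' and embed the resulting image."""
    image = t2i(text).images[0]
    inputs = clip_proc(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        feats = clip.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)

def imagine_score(candidate: str, reference: str) -> float:
    """Cosine similarity between the two imagined images."""
    return float(imagine_embedding(candidate) @ imagine_embedding(reference).T)

print(imagine_score("a dog chases a ball in the park",
                    "a puppy runs after a ball on the grass"))
```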
End-to-end Dense Video Captioning as Sequence Generation
Dense video captioning aims to identify the events of interest in an input
video, and generate descriptive captions for each event. Previous approaches
usually follow a two-stage generative process, which first proposes a segment
for each event, then renders a caption for each identified segment. Recent
advances in large-scale sequence generation pretraining have seen great success
in unifying task formulation for a great variety of tasks, but so far, more
complex tasks such as dense video captioning are not able to fully utilize this
powerful paradigm. In this work, we show how to model the two subtasks of dense
video captioning jointly as one sequence generation task, and simultaneously
predict the events and the corresponding descriptions. Experiments on YouCook2
and ViTT show encouraging results and indicate the feasibility of integrating
complex tasks such as end-to-end dense video captioning into large-scale
pre-trained models.
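One way to picture the single-sequence formulation is to serialize each event's start time, end time, and caption into one target string with discretized time tokens. The <t_k> token format and 100-bin quantization below are illustrative assumptions, not the paper's exact vocabulary.

```python
# Sketch: serialize dense video captioning targets as one sequence, so a
# single seq2seq model can predict event boundaries and captions jointly.
# The <t_k> time-token format and 100-bin quantization are assumptions.
def serialize_events(events, video_duration, num_bins=100):
    """events: list of (start_sec, end_sec, caption) tuples."""
    pieces = []
    for start, end, caption in sorted(events):
        start_bin = int(num_bins * start / video_duration)
        end_bin = int(num_bins * end / video_duration)
        pieces.append(f"<t_{start_bin}> <t_{end_bin}> {caption}")
    return " <sep> ".join(pieces)

target = serialize_events(
    [(3.0, 12.5, "crack two eggs into a bowl"),
     (14.0, 30.0, "whisk the eggs with milk and salt")],
    video_duration=60.0)
print(target)
# <t_5> <t_20> crack two eggs into a bowl <sep> <t_23> <t_50> whisk the eggs with milk and salt
```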
Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation
The field of text-to-image (T2I) generation has garnered significant
attention both within the research community and among everyday users. Despite
the advancements of T2I models, a common issue encountered by users is the need
for repetitive editing of input prompts in order to receive a satisfactory
image, which is time-consuming and labor-intensive. Given the demonstrated text
generation power of large-scale language models, such as GPT-k, we investigate
the potential of utilizing such models to improve the prompt editing process
for T2I generation. We conduct a series of experiments to compare the common
edits made by humans and GPT-k, evaluate the performance of GPT-k in prompting
T2I, and examine factors that may influence this process. We find that GPT-k
models focus more on inserting modifiers, while humans tend to replace words
and phrases, including changes to the subject matter. Experimental results
show that GPT-k models are more effective at adjusting modifiers than at
predicting spontaneous changes to the primary subject matter. Adopting the
edits suggested by GPT-k models may reduce the percentage of remaining edits
by 20-30%.
Comment: EMNLP 202
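A toy sketch of the prompt-editing loop, assuming a local GPT-2 checkpoint as a stand-in for the GPT-k models studied in the paper and an instruction template invented for illustration.

```python
# Sketch: ask a language model to revise a text-to-image prompt, in the spirit
# of using GPT-k as a prompt editor. GPT-2 is only a local stand-in for the
# GPT-k models, and the instruction template is an illustrative assumption.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def suggest_prompt_edit(prompt: str) -> str:
    instruction = (
        "Improve the following text-to-image prompt by adding descriptive "
        f"modifiers:\nOriginal: {prompt}\nImproved:"
    )
    out = generator(instruction, max_new_tokens=30, num_return_sequences=1)
    # Keep only the text generated after the instruction.
    return out[0]["generated_text"][len(instruction):].strip()

print(suggest_prompt_edit("a cat sitting on a chair"))
```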
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
Recent advances in text-to-image synthesis make it possible to visualize
machine imaginations for a given context. On the other hand, when generating
text, human writers are gifted at creative visualization, which enhances their
writing by forming mental images as blueprints before putting the story down
in words. Inspired by such a cognitive process, we ask the natural question of
whether we can endow machines with the same ability to utilize visual
information and construct a general picture of the context to guide text
generation. In this work, we propose iNLG that uses machine-generated images to
guide language models (LM) in open-ended text generation. The experiments and
analyses demonstrate the effectiveness of iNLG on open-ended text generation
tasks, including text completion, story generation, and concept-to-text
generation in few-shot scenarios. Both automatic metrics and human evaluations
verify that the text snippets generated by our iNLG are coherent and
informative while exhibiting only minor degeneration.
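A bare-bones sketch of the visual-prefix idea: encode a machine-generated image, project it into the language model's embedding space, and decode the continuation with the visual feature prepended. The untrained linear projection, GPT-2 backbone, and short greedy loop are illustrative assumptions, not iNLG's architecture or training recipe.

```python
# Sketch: use "imagined" visual features as a prefix to a language model
# during open-ended generation. The single (untrained) projection and greedy
# loop are assumptions; a real text-to-image model would supply `image`.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, CLIPModel, CLIPProcessor
from PIL import Image

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
project = torch.nn.Linear(clip.config.projection_dim, lm.config.n_embd)  # untrained, for illustration

def visually_guided_continuation(context: str, image: Image.Image, steps: int = 20) -> str:
    with torch.no_grad():
        img_inputs = clip_proc(images=image, return_tensors="pt")
        # Visual prefix: one pseudo-token carrying the image features.
        visual_prefix = project(clip.get_image_features(**img_inputs)).unsqueeze(1)
        ids = tok(context, return_tensors="pt").input_ids
        for _ in range(steps):  # greedy decoding with the visual prefix prepended
            token_embeds = lm.transformer.wte(ids)
            inputs_embeds = torch.cat([visual_prefix, token_embeds], dim=1)
            logits = lm(inputs_embeds=inputs_embeds).logits
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            ids = torch.cat([ids, next_id], dim=1)
    return tok.decode(ids[0], skip_special_tokens=True)

# `image` would come from a text-to-image model conditioned on `context`.
```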
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Attaining a high degree of user controllability in visual generation often
requires intricate, fine-grained inputs like layouts. However, such inputs
impose a substantial burden on users when compared to simple text inputs. To
address the issue, we study how Large Language Models (LLMs) can serve as
visual planners by generating layouts from text conditions, and thus
collaborate with visual generative models. We propose LayoutGPT, a method to
compose in-context visual demonstrations in style sheet language to enhance the
visual planning skills of LLMs. LayoutGPT can generate plausible layouts in
multiple domains, ranging from 2D images to 3D indoor scenes. LayoutGPT also
shows superior performance in converting challenging language concepts like
numerical and spatial relations to layout arrangements for faithful
text-to-image generation. When combined with a downstream image generation
model, LayoutGPT outperforms text-to-image models/systems by 20-40% and
achieves performance comparable to human users in designing visual layouts for
numerical and spatial correctness. Lastly, LayoutGPT achieves comparable
performance to supervised methods in 3D indoor scene synthesis, demonstrating
its effectiveness and potential in multiple visual domains.
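A small sketch of the style-sheet formatting idea: render layout demonstrations as CSS-like rules for an in-context prompt and parse a CSS-like completion back into bounding boxes. The property names and prompt wording are assumptions, not LayoutGPT's released templates.

```python
# Sketch: format layout demonstrations in a CSS-like style sheet so an LLM can
# be prompted to plan bounding boxes for a new caption, then parse the model's
# CSS-like completion back into boxes. Schema and wording are assumptions.
import re

def layout_to_css(caption, boxes):
    """boxes: list of (category, left, top, width, height) in pixels."""
    lines = [f"/* {caption} */"]
    for cat, left, top, w, h in boxes:
        lines.append(
            f"{cat} {{ left: {left}px; top: {top}px; width: {w}px; height: {h}px; }}"
        )
    return "\n".join(lines)

def parse_css_layout(css_text):
    pattern = (r"(\w+)\s*\{\s*left:\s*(\d+)px;\s*top:\s*(\d+)px;"
               r"\s*width:\s*(\d+)px;\s*height:\s*(\d+)px;\s*\}")
    return [(m[0],) + tuple(map(int, m[1:])) for m in re.findall(pattern, css_text)]

demo = layout_to_css("two apples on a wooden table",
                     [("apple", 40, 120, 80, 80), ("apple", 150, 118, 82, 84),
                      ("table", 0, 180, 256, 76)])
prompt = demo + "\n/* three books stacked on a desk */\n"
# `prompt` would be sent to an LLM; its completion is parsed with parse_css_layout.
print(parse_css_layout(demo))
```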
Diagnosing Vision-and-Language Navigation: What Really Matters
Vision-and-language navigation (VLN) is a multimodal task where an agent
follows natural language instructions and navigates in visual environments.
Multiple setups have been proposed, and researchers apply new model
architectures or training techniques to boost navigation performance. However,
recent studies observe a slowdown in performance improvements on both
indoor and outdoor VLN tasks, and the agents' inner mechanisms for making
navigation decisions remain unclear. To the best of our knowledge, the way the
agents perceive the multimodal input is under-studied and clearly needs
investigation. In this work, we conduct a series of diagnostic experiments to
unveil agents' focus during navigation. Results show that indoor navigation
agents refer to both object tokens and direction tokens in the instruction when
making decisions. In contrast, outdoor navigation agents heavily rely on
direction tokens and have a poor understanding of the object tokens.
Furthermore, instead of merely staring at surrounding objects, indoor
navigation agents can set their sights on objects further from the current
viewpoint. When it comes to vision-and-language alignments, many models claim
that they are able to align object tokens with certain visual targets, but we
cast doubt on the reliability of such alignments.
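A toy version of such a diagnostic is a token-masking probe: ablate direction tokens or object tokens in the instruction and compare the agent's behavior on each variant. The word lists below are small illustrative samples, and evaluate_agent is a hypothetical hook into whichever VLN agent is being diagnosed.

```python
# Sketch of a token-masking diagnostic: ablate direction or object tokens in
# an instruction and compare the agent's success on each variant. Word lists
# are illustrative samples; `evaluate_agent` is a hypothetical hook.
DIRECTION_WORDS = {"left", "right", "forward", "straight", "turn", "around", "ahead"}
OBJECT_WORDS = {"door", "table", "chair", "stairs", "kitchen", "lamp", "sofa"}

def mask_tokens(instruction: str, vocabulary: set, mask: str = "[MASK]") -> str:
    return " ".join(mask if tok.lower().strip(".,") in vocabulary else tok
                    for tok in instruction.split())

instruction = "Turn left at the kitchen door and stop near the stairs."
variants = {
    "original": instruction,
    "no_direction": mask_tokens(instruction, DIRECTION_WORDS),
    "no_object": mask_tokens(instruction, OBJECT_WORDS),
}
for name, text in variants.items():
    print(name, "->", text)
    # success = evaluate_agent(agent, text)  # hypothetical evaluation hook
```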