125 research outputs found
Inferring Social Status and Rich Club Effects in Enterprise Communication Networks
Social status, defined as the relative rank or position that an individual
holds in a social hierarchy, is known to be among the most important motivating
forces in social behaviors. In this paper, we consider the notion of status
from the perspective of a position or title held by a person in an enterprise.
We study the intersection of social status and social networks in an
enterprise, asking whether enterprise communication logs can reveal how
social interactions and individual status manifest themselves in social
networks. To that end, we use two enterprise datasets with three communication
channels --- voice call, short message, and email --- to demonstrate the
social-behavioral differences among individuals of different status. We report
several interesting findings and, based on them, develop a
model to predict social status. On the individual level, high-status
individuals are more likely to span structural holes, linking people in
parts of the enterprise network that are otherwise not well
connected to one another. On the community level, the principles of homophily,
social balance, and clique theory jointly indicate a "rich club" maintained by
high-status individuals, in the sense that this community is much more
connected, balanced, and dense. Our model can predict the social status of
individuals with 93% accuracy.
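Both effects correspond to standard network statistics. As a minimal sketch of
how one might probe them (using networkx; the example graph is a stand-in,
since the enterprise datasets themselves are not public):

```python
import networkx as nx

# Hypothetical stand-in for an enterprise communication graph: nodes are
# employees, edges are voice/SMS/email interactions.
G = nx.karate_club_graph()

# Structural holes: low Burt's constraint marks nodes that bridge otherwise
# poorly connected regions, as the abstract reports for high-status people.
constraint = nx.constraint(G)
spanners = sorted(constraint, key=constraint.get)[:5]
print("Likely structural-hole spanners:", spanners)

# Rich-club effect: phi(k) measures how densely nodes of degree > k connect
# to each other; comparing against a degree-preserving random rewiring
# (normalized=True in networkx) indicates a rich club when the ratio
# exceeds 1.
phi = nx.rich_club_coefficient(G, normalized=False)
print({k: round(v, 2) for k, v in list(phi.items())[:5]})
```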
Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration
We identify two crucial limitations in the evaluation of Parallel Context
Windows (PCW), a recent parallel-integration method that extends the maximum
context length of language models (e.g., 2048 for LLaMA) by harnessing
window-wise attention and positional embedding techniques. We first show that
a simple yet strong baseline, the weighted sum ensemble, is missing from the
evaluation of in-context few-shot classification. Moreover, on more
challenging Chain-of-Thought (CoT) reasoning (e.g., HotpotQA), PCW exhibits
unexpected deterioration in the form of question miscomprehension and false
inference. Based on our findings, we suggest that the existing PCW design may
not guarantee sufficient improvement and practicality for handling lengthy
documents in real-world applications. More community effort should be devoted
to enabling long-context understanding in language models.
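The weighted-sum-ensemble baseline amounts to running the model once per
demonstration window and combining the per-window label probabilities. A
minimal sketch under that reading (the scoring interface and weights are
illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def weighted_sum_ensemble(window_probs, weights=None):
    """Combine per-window label distributions into one prediction.

    window_probs: array of shape (n_windows, n_classes), where row i holds
    the label probabilities from prompting the model with window i alone.
    weights: optional per-window weights (uniform here; could be tuned).
    """
    probs = np.asarray(window_probs)
    if weights is None:
        weights = np.ones(len(probs)) / len(probs)  # uniform ensemble
    combined = np.average(probs, axis=0, weights=weights)
    return int(np.argmax(combined))

# Example: three windows of few-shot demonstrations, two classes.
per_window = [[0.7, 0.3], [0.6, 0.4], [0.3, 0.7]]
print(weighted_sum_ensemble(per_window))  # -> 0
```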
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Open large language models (LLMs) with strong performance on various tasks
have significantly advanced the development of the field. However, they are
far inferior to commercial models such as ChatGPT and GPT-4 when acting as
agents to tackle complex tasks in the real world. These agent tasks employ
LLMs as the central controller responsible for planning, memorization, and
tool utilization, necessitating both fine-grained prompting methods and
robust LLMs to achieve satisfactory performance. Though many prompting
methods have been proposed to complete particular agent tasks, there is a
lack of research focusing on improving the agent capabilities of LLMs
themselves without compromising
their general abilities. In this work, we present AgentTuning, a simple and
general method to enhance the agent abilities of LLMs while maintaining their
general LLM capabilities. We construct AgentInstruct, a lightweight
instruction-tuning dataset containing high-quality interaction trajectories. We
employ a hybrid instruction-tuning strategy by combining AgentInstruct with
open-source instructions from general domains. AgentTuning is used to
instruction-tune the Llama 2 series, resulting in AgentLM. Our evaluations show
that AgentTuning improves the agent capabilities of LLMs without compromising
their general abilities. AgentLM-70B is comparable to GPT-3.5-turbo on unseen agent
tasks, demonstrating generalized agent capabilities. We open source the
AgentInstruct and AgentLM-7B, 13B, and 70B models at
https://github.com/THUDM/AgentTuning, serving as open and powerful alternatives
to commercial LLMs for agent tasks.
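The hybrid strategy reduces to sampling each training example from either the
agent trajectories or a general-domain instruction pool according to a mixture
ratio. A hedged sketch of such mixing (the ratio eta and the toy pools are
illustrative, not the paper's exact configuration):

```python
import random

def hybrid_mixture(agent_data, general_data, eta=0.2, n=10000, seed=0):
    """Sample a training stream mixing agent trajectories with
    general-domain instructions.

    eta: probability of drawing an agent example; the remaining 1 - eta
    draws from the general pool, which preserves broad capabilities.
    """
    rng = random.Random(seed)
    mixed = []
    for _ in range(n):
        pool = agent_data if rng.random() < eta else general_data
        mixed.append(rng.choice(pool))
    return mixed

# Illustrative pools; real training would stream tokenized dialogues.
agent_pool = [{"source": "AgentInstruct", "id": i} for i in range(100)]
general_pool = [{"source": "general", "id": i} for i in range(1000)]
batch = hybrid_mixture(agent_pool, general_pool, eta=0.2, n=5)
print([ex["source"] for ex in batch])
```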
GraphMAE2: A Decoding-Enhanced Masked Self-Supervised Graph Learner
Graph self-supervised learning (SSL), including contrastive and generative
approaches, offers great potential to address the fundamental challenge of
label scarcity in real-world graph data. Among these techniques, masked
graph autoencoders (e.g., GraphMAE), a type of generative method, have
recently produced promising results. The idea is to use an autoencoder
architecture to reconstruct the node features (or structures) that are
randomly masked from the input. However, the
performance of masked feature reconstruction naturally relies on the
discriminability of the input features and is usually vulnerable to disturbance
in the features. In this paper, we present a masked self-supervised learning
framework GraphMAE2 with the goal of overcoming this issue. The idea is to
impose regularization on feature reconstruction for graph SSL. Specifically, we
design two strategies, multi-view random re-mask decoding and latent
representation prediction, to regularize the feature reconstruction.
Multi-view random re-mask decoding introduces randomness into the
reconstruction in the feature space, while latent representation prediction
enforces the reconstruction in the embedding space. Extensive experiments
show that GraphMAE2 can consistently generate top results on various public
datasets, including at least 2.45% improvements over state-of-the-art baselines
on ogbn-Papers100M with 111M nodes and 1.6B edges.
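A hedged PyTorch-style sketch of the re-mask decoding idea only: the decoder
sees the latent codes under a fresh random mask per view, so reconstruction
cannot overfit one corruption pattern. The linear modules, mask rates, and
mean-squared error below are illustrative stand-ins for the paper's GNN
encoder/decoder and scaled cosine error:

```python
import torch
import torch.nn.functional as F

def remask_decode_loss(encoder, decoder, x, mask, n_views=2, remask_rate=0.5):
    """Multi-view random re-mask decoding, sketched.

    x: node features (n_nodes, d); mask: bool tensor of initially masked nodes.
    """
    x_masked = x.clone()
    x_masked[mask] = 0.0                      # initial input masking
    z = encoder(x_masked)                     # latent node representations
    loss = 0.0
    for _ in range(n_views):
        remask = torch.rand(z.size(0)) < remask_rate
        z_view = z.clone()
        z_view[remask] = 0.0                  # re-mask in latent space
        x_rec = decoder(z_view)
        loss = loss + F.mse_loss(x_rec[mask], x[mask])  # rebuild originals
    return loss / n_views

# Toy usage with linear layers standing in for GNN encoder/decoder.
enc = torch.nn.Linear(16, 8)
dec = torch.nn.Linear(8, 16)
x = torch.randn(32, 16)
mask = torch.rand(32) < 0.3
print(remask_decode_loss(enc, dec, x, mask).item())
```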
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
We present ImageReward -- the first general-purpose text-to-image human
preference reward model -- to address various prevalent issues in generative
models and align them with human values and preferences. Its training is based
on our systematic annotation pipeline that covers both the rating and ranking
components, collecting a dataset of 137k expert comparisons to date. In human
evaluation, ImageReward outperforms existing scoring methods (e.g., CLIP by
38.6%), making it a promising automatic metric for evaluating and improving
text-to-image synthesis. The reward model is publicly available via the
image-reward package at https://github.com/THUDM/ImageReward.
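A short usage sketch based on the repository's documented interface (the
prompt and image paths are illustrative; weights are fetched on first load):

```python
# pip install image-reward
import ImageReward as RM

# Load the released reward model.
model = RM.load("ImageReward-v1.0")

prompt = "a painting of an ocean with clouds and birds, day time"
# Rank candidate generations against the prompt; higher reward means
# closer alignment with human preference.
ranking, rewards = model.inference_rank(prompt, ["img1.png", "img2.png"])
print(ranking, rewards)
```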
Does Negative Sampling Matter? A Review with Insights into its Theory and Applications
Negative sampling has swiftly risen to prominence as a focal point of
research, with wide-ranging applications spanning machine learning, computer
vision, natural language processing, data mining, and recommender systems. This
growing interest raises several critical questions: Does negative sampling
really matter? Is there a general framework that can incorporate all existing
negative sampling methods? In what fields is it applied? Addressing these
questions, we propose a general framework that leverages negative sampling.
Delving into the history of negative sampling, we trace its development
through five evolutionary paths. We dissect and categorize
the strategies used to select negative sample candidates, detailing global,
local, mini-batch, hop, and memory-based approaches. Our review categorizes
current negative sampling methods into five types: static, hard, GAN-based,
auxiliary-based, and in-batch methods, providing a clear structure for
understanding negative sampling. Beyond detailed categorization, we highlight
the application of negative sampling in various areas, offering insights into
its practical benefits. Finally, we briefly discuss open problems and future
directions for negative sampling.
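To make two of the surveyed selection strategies concrete, here is a minimal
sketch of static (uniform) versus hard negative sampling for an embedding
model (the dot-product scorer and toy data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_negative(n_items, positives, k=5):
    """Static strategy: draw negatives uniformly from non-positive items."""
    candidates = np.setdiff1d(np.arange(n_items), positives)
    return rng.choice(candidates, size=k, replace=False)

def hard_negative(user_vec, item_matrix, positives, k=5):
    """Hard strategy: pick the highest-scoring non-positive items, i.e.,
    the negatives the current model finds hardest to distinguish."""
    scores = item_matrix @ user_vec
    scores[positives] = -np.inf          # never sample true positives
    return np.argsort(scores)[-k:][::-1]

# Toy setup: 100 items with 16-d embeddings, one user, two known positives.
items = rng.normal(size=(100, 16))
user = rng.normal(size=16)
pos = np.array([3, 42])
print(uniform_negative(100, pos))
print(hard_negative(user, items, pos))
```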
BatchSampler: Sampling Mini-Batches for Contrastive Learning in Vision, Language, and Graphs
In-Batch contrastive learning is a state-of-the-art self-supervised method
that brings semantically-similar instances close while pushing dissimilar
instances apart within a mini-batch. Its key to success is the negative sharing
strategy, in which every instance serves as a negative for the others within
the mini-batch. Recent studies aim to improve performance by sampling hard
negatives within the current mini-batch, whose quality is bounded by the
mini-batch itself. In this work, we propose to improve contrastive learning
by sampling mini-batches from the input data. We present BatchSampler (code
available at https://github.com/THUDM/BatchSampler) to sample mini-batches of
hard-to-distinguish instances, i.e., instances that are hard and true
negatives to each other. To give each mini-batch fewer false negatives, we
construct a proximity graph over randomly selected instances. To form the
mini-batch, we leverage random walk
with restart on the proximity graph to help sample hard-to-distinguish
instances. BatchSampler is a simple and general technique that can be directly
plugged into existing contrastive learning models in vision, language, and
graphs. Extensive experiments on datasets of three modalities show that
BatchSampler can consistently improve the performance of powerful contrastive
models, as shown by significant improvements of SimCLR on ImageNet-100, SimCSE
on STS (language), and GraphCL and MVGRL on graph datasets.
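A compact sketch of the batch-formation step described above: build a k-NN
proximity graph over a random subset of instances, then run a random walk
with restart to collect a batch of mutually hard-to-distinguish examples (the
k, restart probability, and random embeddings are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_proximity_graph(emb, k=5):
    """Adjacency lists of each instance's k nearest neighbors (cosine)."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)      # no self-loops
    return np.argsort(sims, axis=1)[:, -k:]

def rwr_batch(neighbors, start, batch_size=8, restart_p=0.2, max_steps=1000):
    """Random walk with restart; visited nodes form one mini-batch."""
    batch, node = {start}, start
    for _ in range(max_steps):
        if len(batch) >= batch_size:
            break
        if rng.random() < restart_p:
            node = start                          # restart at the anchor
        else:
            node = rng.choice(neighbors[node])    # hop to a near neighbor
        batch.add(node)
    return sorted(batch)

emb = rng.normal(size=(64, 32))       # stand-in for model embeddings
graph = knn_proximity_graph(emb, k=5)
print(rwr_batch(graph, start=0))
```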
- …