D$^2$: Decentralized Training over Decentralized Data
While training a machine learning model using multiple workers, each of which
collects data from its own data sources, it is most useful when the data
collected by different workers are {\em unique} and {\em different}.
Ironically, recent analysis of decentralized parallel stochastic gradient
descent (D-PSGD) relies on the assumption that the data hosted on different
workers are {\em not too different}. In this paper, we ask the question: {\em
Can we design a decentralized parallel stochastic gradient descent algorithm
that is less sensitive to the data variance across workers?} We present
D$^2$, a novel decentralized parallel stochastic gradient descent algorithm
designed for large data variance among workers (imprecisely, "decentralized"
data). The core of D$^2$ is a variance reduction extension of
the standard D-PSGD algorithm, which improves the convergence rate from
$O\big(\sigma/\sqrt{nT} + (n\zeta^2)^{1/3}/T^{2/3}\big)$ to
$O\big(\sigma/\sqrt{nT}\big)$, where $\zeta^2$ denotes the variance among data
on different workers. As a result, D$^2$ is
robust to data variance among workers. We empirically evaluated D$^2$ on image
classification tasks where each worker has access to only the data of a limited
set of labels, and found that D$^2$ significantly outperforms D-PSGD.
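To make the setting above concrete, the following is a minimal sketch (in Python, not the authors' code) of the plain D-PSGD baseline that D$^2$ extends: each worker gossip-averages its model with its neighbors and then takes a local stochastic gradient step. The toy quadratic objective, the ring mixing matrix, and all sizes are illustrative assumptions, and D$^2$'s variance-reduction correction is omitted.

```python
# Minimal sketch (not the authors' code): plain D-PSGD on a toy quadratic
# objective, illustrating the gossip-averaging + local-gradient step that
# D^2 extends with a variance-reduction correction.
import numpy as np

n_workers, dim, steps, lr = 8, 10, 200, 0.05
rng = np.random.default_rng(0)

# Each worker's local data defines a different quadratic: f_i(x) = 0.5||x - c_i||^2.
# A large spread among the c_i plays the role of high data variance across workers.
centers = rng.normal(scale=5.0, size=(n_workers, dim))

# Symmetric, doubly stochastic mixing matrix for a ring topology.
W = np.zeros((n_workers, n_workers))
for i in range(n_workers):
    W[i, i] = 0.5
    W[i, (i - 1) % n_workers] = 0.25
    W[i, (i + 1) % n_workers] = 0.25

x = np.zeros((n_workers, dim))          # one model copy per worker
for t in range(steps):
    grads = x - centers + rng.normal(scale=0.1, size=x.shape)  # noisy local gradients
    x = W @ x - lr * grads              # gossip-average, then local SGD step

consensus = x.mean(axis=0)
print("distance to optimum:", np.linalg.norm(consensus - centers.mean(axis=0)))
```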
Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models
Large language models (LLMs) have unveiled remarkable reasoning capabilities
by exploiting chain-of-thought (CoT) prompting, which generates intermediate
reasoning chains to serve as the rationale for deriving the answer. However,
current CoT methods either simply employ general prompts such as "Let's think
step by step", or rely heavily on pre-defined task-specific demonstrations to
attain strong performance, thereby engendering an inescapable gap between
performance and generalization. To bridge this gap, we propose GeM-CoT, a
Generalizable CoT prompting mechanism in Mixed-task scenarios where the type of
input questions is unknown. GeM-CoT first categorizes the question type and
subsequently samples or constructs demonstrations from the corresponding data
pool in an automatic manner. With this technical design, GeM-CoT
simultaneously enjoys superior generalization capabilities and remarkable
performance on 10 public reasoning tasks and 23 BBH tasks.
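As a rough illustration of the mechanism described above (not the GeM-CoT implementation), the sketch below routes an incoming question to a hypothetical type-specific demonstration pool and assembles a CoT prompt from sampled demonstrations. The keyword-based router and the stubbed call_llm are placeholder assumptions.

```python
# Hypothetical sketch of type-based demonstration routing for CoT prompting.
import random

DEMO_POOLS = {
    "arithmetic": [
        "Q: If there are 3 boxes with 4 apples each, how many apples are there?\n"
        "A: Each box has 4 apples and there are 3 boxes, so 3 * 4 = 12. The answer is 12.",
    ],
    "commonsense": [
        "Q: Where would you find a pillow?\n"
        "A: Pillows are used for sleeping, and sleeping happens on a bed. The answer is a bed.",
    ],
}

def classify_question(question: str) -> str:
    """Toy stand-in for a question-type router (e.g., an embedding classifier)."""
    return "arithmetic" if any(ch.isdigit() for ch in question) else "commonsense"

def build_prompt(question: str, k: int = 1) -> str:
    """Sample demonstrations from the matching pool and append the new question."""
    pool = DEMO_POOLS[classify_question(question)]
    demos = random.sample(pool, min(k, len(pool)))
    return "\n\n".join(demos + [f"Q: {question}\nA: Let's think step by step."])

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM API call."""
    return "<model output>"

print(build_prompt("A train travels 60 miles in 2 hours. What is its speed?"))
```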
Large Language Models are Effective Table-to-Text Generators, Evaluators, and Feedback Providers
Large language models (LLMs) have shown remarkable ability on controllable
text generation. However, the potential of LLMs in generating text from
structured tables remains largely under-explored. In this paper, we study the
capabilities of LLMs for table-to-text generation tasks, particularly aiming to
investigate their performance in generating natural language statements that
can be logically entailed by a provided table. First, we investigate how LLMs
compare to state-of-the-art table-to-text fine-tuned models, and demonstrate
that LLMs can generate statements with higher faithfulness compared with
previous state-of-the-art fine-tuned models. Given this finding, we next
explore whether LLMs can serve as faithfulness-level automated evaluation
metrics. Through human evaluation, we show that evaluation metrics adopted from
LLMs correlate better with human judgments than existing
faithfulness-level metrics. Finally, we demonstrate that LLMs using
chain-of-thought prompting can generate high-fidelity natural language feedback
for other table-to-text models' generations, providing insights for future work
regarding the distillation of text generation capabilities from LLMs to smaller
models.
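For intuition on the evaluator role described above, here is an illustrative sketch (not the paper's protocol) of prompting an LLM to judge whether a statement is entailed by a table. query_llm is a placeholder for whichever completion API is used, and the toy table is invented for the example.

```python
# Illustrative sketch: LLM-as-judge prompt for table faithfulness.
def table_to_markdown(header, rows):
    """Render a small table as pipe-separated text for the prompt."""
    lines = [" | ".join(header), " | ".join("---" for _ in header)]
    lines += [" | ".join(str(c) for c in row) for row in rows]
    return "\n".join(lines)

def faithfulness_prompt(header, rows, statement):
    """Ask whether the statement is logically entailed by the table."""
    return (
        "Table:\n" + table_to_markdown(header, rows) + "\n\n"
        f"Statement: {statement}\n"
        "Question: Is the statement logically entailed by the table? "
        "Think step by step, then answer 'yes' or 'no'."
    )

def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return "yes"

header = ["Team", "Wins", "Losses"]
rows = [["Hawks", 10, 2], ["Owls", 4, 8]]
print(faithfulness_prompt(header, rows, "The Hawks won more games than the Owls."))
```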
BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
Pre-trained language models like ChatGPT have significantly improved code
generation. As these models scale up, there is an increasing need for the
output to handle more intricate tasks. Moreover, in bioinformatics, generating
functional programs poses additional notable challenges due to the amount of
domain knowledge, the need for complicated data operations, and intricate
functional dependencies between the operations. Here, we present BioCoder, a
benchmark developed to evaluate existing pre-trained models in generating
bioinformatics code. For function-level code generation, BioCoder covers
potential package dependencies, class declarations, and global variables. It
incorporates 1026 functions and 1243 methods in Python and Java from GitHub and
253 examples from the Rosalind Project. BioCoder incorporates a fuzz-testing
framework for evaluation, and we have applied it to evaluate many models
including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+,
InstructCodeT5+, and ChatGPT. Our detailed analysis of these models emphasizes
the importance of domain knowledge, pragmatic code generation, and contextual
understanding. Our dataset, benchmark, Docker images, and scripts required for
testing are all available at https://github.com/gersteinlab/biocoder
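As a rough picture of what fuzz-style functional evaluation can look like (this is not BioCoder's actual harness), the sketch below runs a candidate implementation against a reference on randomly generated DNA sequences and reports the agreement rate. The GC-content task and the input generator are assumptions made for illustration.

```python
# Minimal sketch of a fuzz-style functional check for generated code.
import random

def reference_gc_content(seq: str) -> float:
    """Ground-truth implementation used as the oracle."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def candidate_gc_content(seq: str) -> float:
    """Stand-in for a model-generated function under test."""
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)

def fuzz_compare(candidate, reference, trials: int = 1000) -> float:
    """Compare candidate and reference on random inputs; return the pass rate."""
    rng = random.Random(0)
    passed = 0
    for _ in range(trials):
        seq = "".join(rng.choice("ACGT") for _ in range(rng.randint(1, 200)))
        if abs(candidate(seq) - reference(seq)) < 1e-9:
            passed += 1
    return passed / trials

print("pass rate:", fuzz_compare(candidate_gc_content, reference_gc_content))
```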
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
Despite the remarkable capabilities of Large Language Models (LLMs) like
GPT-4, producing complex, structured tabular data remains challenging. Our
study assesses LLMs' proficiency in structuring tables and introduces a novel
fine-tuning method, cognizant of data structures, to bolster their performance.
We unveil Struc-Bench, a comprehensive benchmark featuring prominent LLMs
(GPT-NeoX-20B, GPT-3.5, GPT-4, and Vicuna), which spans text tables, HTML, and
LaTeX formats. Our proposed FormatCoT aids in crafting format-specific
instructions from the intended outputs to populate this benchmark. Addressing
the gap in task-centered evaluation, we propose two innovative metrics, P-Score
(Prompting Score) and H-Score (Heuristical Score), to more accurately gauge LLM
performance. Our experiments show that applying our structure-aware fine-tuning
to LLaMA-7B leads to substantial performance gains, outshining its LLM
counterparts across most measures. An in-depth error analysis and an
ability map across six dimensions -- coverage, formatting, reasoning,
comprehension, pragmatics, and hallucination -- highlight areas for future
enhancements and suggest forthcoming research trajectories. Our code and models
can be found at https://github.com/gersteinlab/Struc-Bench
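For a concrete, if simplistic, sense of structure-aware scoring, the sketch below computes a cell-level F1 between a generated text table and a reference. It is not the paper's P-Score or H-Score, just an illustrative stand-in.

```python
# Illustrative cell-overlap score between a generated and a reference text table.
def parse_table(text: str):
    """Parse a '|'-separated text table into rows of stripped cells."""
    return [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip()
    ]

def cell_f1(pred_text: str, ref_text: str) -> float:
    """F1 over the sets of cell values in the predicted and reference tables."""
    pred = [c for row in parse_table(pred_text) for c in row]
    ref = [c for row in parse_table(ref_text) for c in row]
    overlap = len(set(pred) & set(ref))
    if not pred or not ref or not overlap:
        return 0.0
    precision, recall = overlap / len(set(pred)), overlap / len(set(ref))
    return 2 * precision * recall / (precision + recall)

ref = "| Name | Score |\n| Ada | 9 |\n| Bob | 7 |"
pred = "| Name | Score |\n| Ada | 9 |\n| Bob | 6 |"
print("cell F1:", round(cell_f1(pred, ref), 3))
```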
Hop: Heterogeneity-Aware Decentralized Training
Recent work has shown that decentralized algorithms can deliver superior
performance over centralized ones in the context of machine learning. The two
approaches, with the main difference residing in their distinct communication
patterns, are both susceptible to performance degradation in heterogeneous
environments. Although vigorous efforts have been devoted to supporting
centralized algorithms against heterogeneity, little has been explored in
decentralized algorithms regarding this problem.
This paper proposes Hop, the first heterogeneity-aware decentralized training
protocol. Based on a unique characteristic of decentralized training that we
have identified, the iteration gap, we propose a queue-based synchronization
mechanism that can efficiently implement backup workers and bounded staleness
in the decentralized setting. To cope with deterministic slowdown, we propose
skipping iterations so that the effect of slower workers is further mitigated.
We build a prototype implementation of Hop on TensorFlow. Experimental results
on CNN and SVM workloads show significant speedup over standard decentralized
training in heterogeneous settings.
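To illustrate the bounded-staleness idea described above (a conceptual sketch, not the Hop implementation), the snippet below lets a worker start an iteration only if every neighbor's most recent queued update is no more than a fixed number of iterations behind. The per-neighbor queues and the staleness bound are illustrative assumptions.

```python
# Conceptual sketch of queue-based synchronization with bounded staleness.
from collections import deque

STALENESS_BOUND = 2  # assumed staleness bound s

class NeighborQueue:
    """Queue of (iteration, update) messages received from one neighbor."""
    def __init__(self):
        self.messages = deque()
        self.latest_iteration = -1

    def push(self, iteration: int, update):
        self.messages.append((iteration, update))
        self.latest_iteration = iteration

def can_start(my_iteration: int, neighbor_queues) -> bool:
    """Proceed only if no neighbor lags more than STALENESS_BOUND iterations."""
    return all(q.latest_iteration >= my_iteration - STALENESS_BOUND
               for q in neighbor_queues.values())

queues = {"left": NeighborQueue(), "right": NeighborQueue()}
queues["left"].push(3, None)
queues["right"].push(1, None)   # a slow neighbor
print(can_start(4, queues))     # False: the right neighbor is too stale
print(can_start(3, queues))     # True: within the staleness bound
```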