The Sword of the Halhala River
The man was climbing Nameless Hill, on the bank of the Halhala River. It wasn’t particularly high—more like a dwarf among hills, really—but even so, climbing it was hard work, and he was panting. From the corner of his eye he glimpsed drops of his sweat falling into the thick grass and immediately evaporating. Several times on his journey he had felt so discouraged that he would have liked to have given the whole thing up, but he couldn’t quite make himself do it, so he struggled on, swearing..
Universal Self-Consistency for Large Language Model Generation
Self-consistency with chain-of-thought prompting (CoT) has demonstrated
remarkable performance gains on various challenging tasks, by utilizing
multiple reasoning paths sampled from large language models (LLMs). However,
self-consistency relies on the answer extraction process to aggregate multiple
solutions, which is not applicable to free-form answers. In this work, we
propose Universal Self-Consistency (USC), which leverages LLMs themselves to
select the most consistent answer among multiple candidates. We evaluate USC on
a variety of benchmarks, including mathematical reasoning, code generation,
long-context summarization, and open-ended question answering. On open-ended
generation tasks where the original self-consistency method is not applicable,
USC effectively utilizes multiple samples and improves the performance. For
mathematical reasoning, USC matches the standard self-consistency performance
without requiring the answer formats to be similar. Finally, without access to
execution results, USC also matches the execution-based voting performance on
code generation.
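The selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `sample` and `select` are hypothetical callables standing in for LLM calls, and the prompt wording and the fallback to the first sample are assumptions.

```python
import re
from typing import Callable, List


def build_usc_prompt(question: str, responses: List[str]) -> str:
    """Concatenate all sampled responses and ask for the most consistent one."""
    numbered = "\n\n".join(
        f"Response {i}: {r}" for i, r in enumerate(responses, start=1)
    )
    # Illustrative wording only; the exact prompt is an assumption.
    return (
        f"I have generated the following responses to the question: {question}\n\n"
        f"{numbered}\n\n"
        "Select the most consistent response based on majority consensus. "
        'Answer in the form "The most consistent response is Response X".'
    )


def parse_choice(reply: str, n: int) -> int:
    """Return the 0-based index named in the reply, or 0 if none is usable."""
    match = re.search(r"Response\s+(\d+)", reply)
    if match:
        index = int(match.group(1))
        if 1 <= index <= n:
            return index - 1
    return 0  # assumed fallback: keep the first sample


def universal_self_consistency(
    question: str,
    sample: Callable[[str], str],   # hypothetical: one sampled LLM response
    select: Callable[[str], str],   # hypothetical: LLM reply to the selection prompt
    k: int = 5,
) -> str:
    """Sample k free-form responses, then let the model pick one of them."""
    responses = [sample(question) for _ in range(k)]
    reply = select(build_usc_prompt(question, responses))
    return responses[parse_choice(reply, len(responses))]
```

Because the model chooses among whole responses, no answer-extraction step is needed, which is what lets the approach apply to free-form outputs such as summaries.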
PaLM 2 Technical Report
We introduce PaLM 2, a new state-of-the-art language model that has better
multilingual and reasoning capabilities and is more compute-efficient than its
predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture
of objectives. Through extensive evaluations on English and multilingual
language, and reasoning tasks, we demonstrate that PaLM 2 has significantly
improved quality on downstream tasks across different model sizes, while
simultaneously exhibiting faster and more efficient inference compared to PaLM.
This improved efficiency enables broader deployment while also allowing the
model to respond faster, for a more natural pace of interaction. PaLM 2
demonstrates robust reasoning capabilities exemplified by large improvements
over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable
performance on a suite of responsible AI evaluations, and enables
inference-time control over toxicity without additional overhead or impact on
other capabilities. Overall, PaLM 2 achieves state-of-the-art performance
across a diverse set of tasks and capabilities.
When discussing the PaLM 2 family, it is important to distinguish between
pre-trained models (of various sizes), fine-tuned variants of these models, and
the user-facing products that use these models. In particular, user-facing
products typically include additional pre- and post-processing steps.
Additionally, the underlying models may evolve over time. Therefore, one should
not expect the performance of user-facing products to exactly match the results
reported in this report.
An Effective FPGA Solver on Probability Distribution and Preprocessing
The Boolean satisfiability (SAT) problem is a key problem in computer science theory and applications. A novel algorithm is introduced to implement a stochastic local search (SLS) hardware solver called probSAT+. The algorithm has no complex heuristics; it depends only on preprocessing, probability distributions, and centralized search. By constraining the initial assignments of the variables, the number of flipped variables is reduced while the solver searches for a solution. Moreover, the algorithm no longer uses non-continuous if-then-else decisions but depends on a single continuous function f(x,v). The flipping probability is not obtained by complex calculations but is selected from lookup tables, which effectively improves the performance of the solver. As far as we know, this is the first hardware SAT solver to adopt a probability distribution selection strategy described in a hardware description language, and it can easily be ported to any programmable logic device. The experimental results show that the probSAT+ solver generally requires fewer flips than the advanced software solver (up to 9.8 × 10^6), and the speedup is approximately 2.6 times with a single thread, which shows that probSAT+ achieves better results with fewer variable flips when a solution can be found. In addition, the success ratio of the solver in finding a solution within a suitable time is 100%.
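The idea of choosing the flip variable from a table of precomputed probabilities can be sketched in software as below. This is a rough sketch under stated assumptions, not the probSAT+ hardware design: the abstract does not give f(x,v), so the weight `cb ** -break` (the exponential, break-only form used in the probSAT family) and the constant `cb = 2.5` are assumed, and preprocessing and constrained initial assignments are omitted. Clauses are assumed to be non-empty lists of DIMACS-style literals.

```python
import random
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3


def is_sat(clause: Clause, assign: Dict[int, bool]) -> bool:
    """True if at least one literal in the clause is satisfied."""
    return any(assign[abs(lit)] == (lit > 0) for lit in clause)


def break_count(var: int, clauses: List[Clause], assign: Dict[int, bool]) -> int:
    """Number of satisfied clauses that flipping `var` would falsify."""
    count = 0
    for clause in clauses:
        true_lits = [lit for lit in clause if assign[abs(lit)] == (lit > 0)]
        if len(true_lits) == 1 and abs(true_lits[0]) == var:
            count += 1
    return count


def prob_sat(
    clauses: List[Clause],
    n_vars: int,
    max_flips: int = 100_000,
    cb: float = 2.5,  # assumed constant, not taken from the source
    seed: int = 0,
) -> Optional[Dict[int, bool]]:
    """Return a satisfying assignment, or None if max_flips is exhausted."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    # Lookup table: weight for each possible break value, computed once.
    table = [cb ** -b for b in range(len(clauses) + 1)]
    for _ in range(max_flips):
        unsat = [c for c in clauses if not is_sat(c, assign)]
        if not unsat:
            return assign
        clause = rng.choice(unsat)
        candidates = [abs(lit) for lit in clause]
        weights = [table[break_count(v, clauses, assign)] for v in candidates]
        var = rng.choices(candidates, weights=weights, k=1)[0]
        assign[var] = not assign[var]
    return None
```

Indexing `table` by the break value replaces a per-flip exponentiation with a single read, which is the software analogue of the table lookup the abstract describes.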
Effects of Craniotomy and Endoscopic Endonasal Transsphenoidal Surgery on Bodyweight in Adult-Onset Craniopharyngioma: A Single-Center Retrospective Study
Craniopharyngioma (CP) is a histologically benign tumor with high mortality and morbidity. Although surgical treatment is essential in managing CP, the best surgical approach is debated. A retrospective cohort of 117 patients with adult-onset CP (AOCP) treated between 2018 and 2020 in Beijing Tiantan Hospital was identified and examined. The effects of traditional craniotomy (TC) and endoscopic endonasal transsphenoidal surgery (EETS) on the extent of surgical resection, hypothalamic involvement (HI), postoperative endocrine function, and postoperative weight were compared in the cohort. The cohort comprised 43 males and 74 females, divided into the TC (n = 59) and EETS (n = 58) groups. The EETS group had a higher rate of gross total resection (GTR) (adjusted odds ratio (aOR) = 4.08, p = 0.029) and of improved HI (aOR = 2.58, p = 0.041) than the TC group. Worse postoperative HI was observed only in the TC group (5 patients). EETS was associated with fewer adverse hormonal outcomes, including posterior pituitary dysfunction (aOR = 0.386, p = 0.040) and hypopituitarism (aOR = 0.384, p = 0.031). Additionally, multivariate logistic regression analysis confirmed that EETS was related to fewer cases of weight gain >5% (aOR = 0.376, p = 0.034), significant weight change (aOR = 0.379, p = 0.022), and postoperative obesity (aOR = 0.259, p = 0.032). Compared to TC, EETS shows advantages in accomplishing GTR, hypothalamus protection, postoperative endocrine function preservation, and postoperative weight control. These data suggest that EETS deserves wider application in managing patients with AOCP.