51 research outputs found
M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
Managing long sequences has become an important and necessary feature for
large language models (LLMs). However, it remains an open question how to
comprehensively and systematically evaluate the long-sequence capability of
LLMs, in part because conventional, widely used benchmarks consist mainly of
short sequences. In this paper, we propose M4LE, a Multi-ability,
Multi-range, Multi-task, Multi-domain benchmark for Long-context Evaluation.
M4LE is based on a diverse NLP task pool comprising 36 NLP datasets, 11 task
types and 12 domains. To alleviate the scarcity of tasks with naturally long
sequences and incorporate multi-ability assessment, we propose an automatic
approach (requiring only negligible human annotation) to convert short-sequence
tasks into a unified long-sequence scenario in which LLMs must identify single
or multiple relevant spans in long contexts based on explicit or semantic
hints. Specifically, the scenario includes five different types of abilities:
(1) explicit single-span; (2) semantic single-span; (3) explicit multiple-span;
(4) semantic multiple-span; and (5) global context understanding. The resulting
samples in M4LE are evenly distributed across input lengths from 1k to 8k. We
conducted a systematic evaluation of 11 well-established LLMs, especially those
optimized for long-sequence inputs. Our results reveal that: 1) Current LLMs
struggle to understand long contexts, particularly when tasks require
multiple-span attention. 2) Semantic retrieval tasks are more difficult even
for competent LLMs. 3)
Models fine-tuned on longer text with position interpolation have comparable
performance to those using Neural Tangent Kernel (NTK) aware scaling methods
without fine-tuning. We make our benchmark publicly available to encourage
future research in this challenging area.
Comment: Code and data are available at https://github.com/KwanWaiChung/M4L
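The short-to-long conversion described above can be sketched roughly as follows. The sample structure, field names, and passage-tagging scheme are all hypothetical illustrations, not taken from the paper:

```python
import random

def build_long_context(samples, relevant_ids, seed=0):
    """Concatenate short-task samples into one long context in which the
    model must locate the relevant span(s). `samples` is a list of dicts
    with hypothetical "id" and "text" fields; `relevant_ids` marks which
    samples carry the answer (via an explicit or semantic hint)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    parts, gold_positions = [], []
    for i, s in enumerate(shuffled, start=1):
        parts.append(f"[Passage {i}] {s['text']}")
        if s["id"] in relevant_ids:
            gold_positions.append(i)  # where the relevant span landed
    return "\n\n".join(parts), gold_positions
```

A single-span instance would pass one relevant id; multiple-span instances would pass several, loosely mirroring abilities (1)-(4) above.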
Aligning Large Language Models with Human: A Survey
Large Language Models (LLMs) trained on extensive textual corpora have
emerged as leading solutions for a broad array of Natural Language Processing
(NLP) tasks. Despite their notable performance, these models are prone to
certain limitations, such as misunderstanding human instructions, generating
potentially biased content, or producing factually incorrect (hallucinated)
information.
Hence, aligning LLMs with human expectations has become an active area of
interest within the research community. This survey presents a comprehensive
overview of these alignment technologies, including the following aspects. (1)
Data collection: the methods for effectively collecting high-quality
instructions for LLM alignment, including the use of NLP benchmarks, human
annotations, and leveraging strong LLMs. (2) Training methodologies: a detailed
review of the prevailing training methods employed for LLM alignment. Our
exploration encompasses Supervised Fine-tuning, both Online and Offline human
preference training, along with parameter-efficient training mechanisms. (3)
Model Evaluation: the methods for evaluating the effectiveness of these
human-aligned LLMs, presenting a multifaceted approach towards their
assessment. In conclusion, we collate and distill our findings, shedding light
on several promising future research avenues in the field. This survey,
therefore, serves as a valuable resource for anyone invested in understanding
and advancing the alignment of LLMs to better suit human-oriented tasks and
expectations. An associated GitHub repository collecting the latest papers is
available at https://github.com/GaryYufei/AlignLLMHumanSurvey.
Comment: work in progress
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
The ability to follow instructions is crucial for Large Language Models
(LLMs) to handle various real-world applications. Existing benchmarks primarily
focus on evaluating pure response quality, rather than assessing whether the
response follows constraints stated in the instruction. To fill this research
gap, in this paper, we propose FollowBench, a Multi-level Fine-grained
Constraints Following Benchmark for LLMs. FollowBench comprehensively includes
five different types (i.e., Content, Situation, Style, Format, and Example) of
fine-grained constraints. To enable precise estimation of constraint following
across diverse difficulty levels, we introduce a Multi-level mechanism that
incrementally adds a single constraint to the initial instruction at each
successive level. To assess whether LLMs' outputs have satisfied every
individual constraint, we propose to prompt strong LLMs with
constraint-evolution paths to handle challenging open-ended instructions. By
evaluating ten popular closed-source and open-source LLMs on FollowBench, we
highlight the weaknesses of LLMs in instruction following and point towards
potential avenues for future work. The data and code are publicly available at
https://github.com/YJiangcm/FollowBench.
Comment: 19 pages, 9 figures, 14 tables
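The Multi-level mechanism lends itself to a simple sketch: level k is the initial instruction plus the first k constraints. The joining phrase and data shapes here are illustrative, not the benchmark's actual format:

```python
def build_levels(initial_instruction, constraints):
    """Return instructions of increasing difficulty: level k (1-indexed)
    carries the initial instruction plus the first k constraints."""
    levels = []
    current = initial_instruction
    for constraint in constraints:
        current = f"{current} In addition, {constraint}."  # one more constraint per level
        levels.append(current)
    return levels
```

Evaluating a model at every level then gives a fine-grained picture of how many simultaneous constraints it can satisfy before failing.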
Effects of Phase-Locking Deficits on Speech Recognition in Older Adults With Presbycusis
Objective: People with presbycusis (PC) often report difficulties in speech recognition, especially under noisy listening conditions. Investigating the PC-related changes in central representations of the envelope signals and temporal fine structure (TFS) signals of speech sounds is critical for understanding the mechanism underlying the PC-related deficit in speech recognition. Frequency-following responses (FFRs) to speech stimulation can be used to examine the subcortical encoding of both envelope and TFS speech signals. This study compared FFRs to speech signals between listeners with PC and those with clinically normal hearing (NH) under either quiet or noise-masking conditions.
Methods: FFRs to a 170-ms speech syllable /da/ were recorded under either a quiet or a noise-masking condition (with a signal-to-noise ratio (SNR) of 8 dB) in 14 older adults with PC and 13 age-matched adults with NH. The envelope (FFRENV) and TFS (FFRTFS) components of the FFRs were analyzed separately by adding and subtracting the opposite-polarity responses, respectively. Speech recognition in noise was evaluated in each participant.
Results: In the quiet condition, compared with the NH group, the PC group exhibited smaller F0 and H3 amplitudes and a decreased stimulus-response (S-R) correlation for FFRENV but not for FFRTFS. Both the H2 and H3 amplitudes and the S-R correlation of FFRENV decreased significantly in the noise condition compared with the quiet condition in the NH group but not in the PC group. Moreover, the degree of hearing loss was correlated with noise-induced changes in FFRTFS morphology. Furthermore, the speech-in-noise (SIN) threshold was negatively correlated with the noise-induced change in H2 (for FFRENV) and with the S-R correlation for FFRENV in the quiet condition.
Conclusion: Audibility affects the subcortical encoding of both envelope and TFS in PC patients. The impaired ability to adjust the balance between envelope and TFS encoding in the noise condition may be part of the mechanism underlying PC-related deficits in speech recognition in noise. FFRs can predict SIN perception performance.
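The envelope/TFS decomposition the Methods describe (adding and subtracting the responses recorded to opposite stimulus polarities) can be written directly; the signal shapes below are synthetic and purely for illustration:

```python
import numpy as np

def split_ffr(resp_pos, resp_neg):
    """Split FFRs recorded to opposite stimulus polarities into an
    envelope component (FFR_ENV: addition cancels the fine structure,
    which flips sign with polarity) and a temporal-fine-structure
    component (FFR_TFS: subtraction cancels the polarity-invariant
    envelope)."""
    resp_pos = np.asarray(resp_pos, dtype=float)
    resp_neg = np.asarray(resp_neg, dtype=float)
    ffr_env = (resp_pos + resp_neg) / 2.0
    ffr_tfs = (resp_pos - resp_neg) / 2.0
    return ffr_env, ffr_tfs
```

With this split in hand, measures like the F0/H2/H3 amplitudes and the stimulus-response correlation can be computed on each component separately.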
An Overview of University Town Research in Our Country
Since the end of the twentieth century, with the reform of the higher education system and the expansion of college enrollment, a great many university towns (in Chinese, "daxuecheng") have been built in our country. Research on the university town, a new form of regional pattern, has gradually attracted domestic scholarly attention from many different perspectives. This paper reviews in detail the large body of recent domestic literature on university town research and offers comments on it. Finally, it summarizes the significance of further study of university towns and presents an outlook for future research.
Accounting for bonus depreciation on intangible assets as a factor in the innovative development of the economy
(A) The outside view of our resulting mesh; (B) the outside view of the SRF mesh; (C) the cutaway view of our resulting mesh; (D) the cutaway view of the SRF mesh.
Hexahedral mesh generation via constrained quadrilateralization
Decomposing a volume into high-quality hexahedral cells is a challenging task in finite element simulations and computer graphics. Inspired by the use of a spatial twist continuum and frame fields in previous hexahedral mesh generation methods, we present a method of hexahedral mesh generation via constrained quadrilateralization that combines a spatial twist continuum and frame fields. Given a volume represented by a tetrahedral mesh, a surface quadrilateral mesh, and a frame field, we first extend a loop on the surface of the solid to a layer of hexahedral elements, then divide the solid into two smaller sub-solids by that layer, and finally handle them recursively until all of the sub-solids are empty. In our hexahedral mesh generation framework, we apply constrained quadrilateralization to extend the loop to a layer of hexahedral elements. The "divide-and-conquer" strategy used in this method is well suited to parallelization and can lead to easier, more robust implementations that are less dependent on heavy numerical libraries. The testing results show that the quality of the meshes generated by this method is similar to that of meshes produced by current state-of-the-art mesh generation methods.
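The recursive framework reads naturally as a worklist: peel off a layer, split the solid, and repeat until every sub-solid is empty. `Solid`, `extract_loop`, and `quadrilateralize` below are hypothetical stand-ins for the paper's actual geometric operations:

```python
def generate_hex_mesh(root_solid, extract_loop, quadrilateralize):
    """Divide-and-conquer sketch: repeatedly extend a surface loop into a
    layer of hexahedral elements, split the solid into two sub-solids by
    that layer, and recurse until all sub-solids are empty."""
    hex_elements = []
    pending = [root_solid]   # sub-solids are independent, hence parallelizable
    while pending:
        solid = pending.pop()
        if solid.is_empty():
            continue
        loop = extract_loop(solid)                       # loop on the solid's surface
        layer, left, right = quadrilateralize(solid, loop)
        hex_elements.extend(layer)                       # one layer of hex cells
        pending.extend((left, right))                    # handle both sub-solids
    return hex_elements
```

Because each sub-solid is processed independently, the worklist could be drained by a thread pool, which is the parallelization the abstract alludes to.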
The two sub-solids after dividing by the layer.
(A) The left sub-solid; (B) the right sub-solid.