51 research outputs found

    M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models

    Full text link
    Managing long sequences has become an important and necessary capability for large language models (LLMs). However, how to comprehensively and systematically evaluate the long-sequence capability of LLMs remains an open question, partly because conventional, widely used benchmarks consist mainly of short sequences. In this paper, we propose M4LE, a Multi-ability, Multi-range, Multi-task, Multi-domain benchmark for Long-context Evaluation. M4LE is based on a diverse NLP task pool comprising 36 NLP datasets, 11 task types and 12 domains. To alleviate the scarcity of tasks with naturally long sequences and to incorporate multiple-ability assessment, we propose an automatic approach (requiring only negligible human annotation) to convert short-sequence tasks into a unified long-sequence scenario in which LLMs must identify single or multiple relevant spans in long contexts based on explicit or semantic hints. Specifically, the scenario covers five types of abilities: (1) explicit single-span; (2) semantic single-span; (3) explicit multiple-span; (4) semantic multiple-span; and (5) global context understanding. The resulting samples in M4LE are evenly distributed across input lengths from 1k to 8k. We conducted a systematic evaluation of 11 well-established LLMs, especially those optimized for long-sequence inputs. Our results reveal that: 1) current LLMs struggle to understand long contexts, particularly when tasks require multiple-span attention; 2) semantic retrieval tasks are more difficult even for competent LLMs; and 3) models fine-tuned on longer text with position interpolation perform comparably to models using Neural Tangent Kernel (NTK)-aware scaling methods without fine-tuning. We make our benchmark publicly available to encourage future research in this challenging area.
    Comment: Code and data are available at https://github.com/KwanWaiChung/M4L
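As a rough illustration of the short-to-long conversion described above (an explicit single-span case; the function, naming, and template are our own toy sketch, not the authors' pipeline), one relevant span can be hidden among distractor passages and its position recorded as the target:

```python
import random

def build_long_context_sample(relevant, distractors, seed=0):
    """Toy sketch: turn a short task into a long-context retrieval task by
    shuffling the relevant span among distractors and recording its slot."""
    rng = random.Random(seed)
    passages = distractors + [relevant]
    rng.shuffle(passages)
    # Concatenate into one long context with explicit passage markers.
    context = "\n\n".join(f"Passage {i + 1}: {p}" for i, p in enumerate(passages))
    answer_idx = passages.index(relevant) + 1  # 1-based slot the model must find
    return context, answer_idx

context, idx = build_long_context_sample(
    "The capital of France is Paris.",
    ["Cats sleep a lot.", "Water boils at 100 C."],
)
```

Multiple-span variants would insert several relevant spans, and the semantic variants would replace the verbatim span with a paraphrased hint.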

    Aligning Large Language Models with Human: A Survey

    Full text link
    Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite their notable performance, these models are prone to certain limitations, such as misunderstanding human instructions, generating potentially biased content, or producing factually incorrect (hallucinated) information. Hence, aligning LLMs with human expectations has become an active area of interest within the research community. This survey presents a comprehensive overview of these alignment technologies, covering the following aspects. (1) Data collection: methods for effectively collecting high-quality instructions for LLM alignment, including the use of NLP benchmarks, human annotation, and leveraging strong LLMs. (2) Training methodologies: a detailed review of the prevailing training methods employed for LLM alignment. Our exploration encompasses supervised fine-tuning, both online and offline human preference training, and parameter-efficient training mechanisms. (3) Model evaluation: methods for evaluating the effectiveness of these human-aligned LLMs, presenting a multifaceted approach to their assessment. In conclusion, we collate and distill our findings, shedding light on several promising future research avenues in the field. This survey therefore serves as a valuable resource for anyone invested in understanding and advancing the alignment of LLMs to better suit human-oriented tasks and expectations. An associated GitHub repository collecting the latest papers is available at https://github.com/GaryYufei/AlignLLMHumanSurvey.
    Comment: work in progress

    FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models

    Full text link
    The ability to follow instructions is crucial for Large Language Models (LLMs) to handle various real-world applications. Existing benchmarks primarily focus on evaluating pure response quality rather than assessing whether the response follows the constraints stated in the instruction. To fill this research gap, we propose FollowBench, a Multi-level Fine-grained Constraints Following Benchmark for LLMs. FollowBench comprehensively covers five types of fine-grained constraints (i.e., Content, Situation, Style, Format, and Example). To enable precise estimation of constraint following across diverse difficulties, we introduce a Multi-level mechanism that incrementally adds a single constraint to the initial instruction at each level. To assess whether an LLM's output satisfies every individual constraint, we propose prompting strong LLMs with constraint-evolution paths to handle challenging open-ended instructions. By evaluating ten popular closed-source and open-source LLMs on FollowBench, we highlight the weaknesses of LLMs in instruction following and point towards potential avenues for future work. The data and code are publicly available at https://github.com/YJiangcm/FollowBench.
    Comment: 19 pages, 9 figures, 14 tables
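The Multi-level mechanism above (one constraint added per level) can be illustrated with a minimal sketch; the instruction, constraints, and prompt template here are hypothetical, not taken from the benchmark:

```python
def build_levels(base_instruction, constraints):
    """Toy sketch of a multi-level setup: level k carries the first k
    constraints appended to the base instruction (level 0 = unconstrained)."""
    levels = []
    for k in range(len(constraints) + 1):
        prompt = base_instruction
        if k:
            prompt += " Constraints: " + "; ".join(constraints[:k]) + "."
        levels.append(prompt)
    return levels

levels = build_levels(
    "Describe the water cycle.",
    ["use at most 50 words", "write in bullet points", "avoid the word 'rain'"],
)
```

Evaluating a model at successive levels then localizes the difficulty at which constraint following breaks down.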

    Effects of Phase-Locking Deficits on Speech Recognition in Older Adults With Presbycusis

    Get PDF
    Objective: People with presbycusis (PC) often report difficulties in speech recognition, especially under noisy listening conditions. Investigating PC-related changes in the central representations of the envelope and temporal fine structure (TFS) of speech sounds is critical for understanding the mechanism underlying the PC-related deficit in speech recognition. Frequency-following responses (FFRs) to speech stimulation can be used to examine the subcortical encoding of both envelope and TFS speech signals. This study compared FFRs to speech signals between listeners with PC and those with clinically normal hearing (NH) under quiet and noise-masking conditions. Methods: FFRs to a 170-ms speech syllable /da/ were recorded under either a quiet or a noise-masking condition (signal-to-noise ratio (SNR) of 8 dB) in 14 older adults with PC and 13 age-matched adults with NH. The envelope (FFR_ENV) and TFS (FFR_TFS) components of the FFRs were extracted separately by adding and subtracting the alternating-polarity responses, respectively. Speech recognition in noise was evaluated in each participant. Results: In the quiet condition, the PC group exhibited smaller F0 and H3 amplitudes and a lower stimulus-response (S-R) correlation for FFR_ENV, but not for FFR_TFS, than the NH group. The H2 and H3 amplitudes and the S-R correlation of FFR_ENV decreased significantly from the quiet to the noise condition in the NH group but not in the PC group. Moreover, the degree of hearing loss was correlated with noise-induced changes in FFR_TFS morphology. Furthermore, the speech-in-noise (SIN) threshold was negatively correlated with the noise-induced change in H2 (for FFR_ENV) and with the S-R correlation of FFR_ENV in the quiet condition. Conclusion: Audibility affects the subcortical encoding of both envelope and TFS in PC patients. An impaired ability to adjust the balance between envelope and TFS encoding in noise may be part of the mechanism underlying PC-related deficits in speech recognition in noise. FFRs can predict SIN perception performance.
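The envelope/TFS decomposition described in the Methods (adding and subtracting the alternating-polarity responses) can be written in a few lines; the halving factor and all variable names are our own convention, not taken from the paper:

```python
def decompose_ffr(resp_pos, resp_neg):
    """Split FFRs to opposite-polarity stimuli into two components:
    the sum (R+ + R-)/2 keeps the polarity-invariant envelope part,
    the difference (R+ - R-)/2 keeps the polarity-following TFS part."""
    env = [(a + b) / 2 for a, b in zip(resp_pos, resp_neg)]
    tfs = [(a - b) / 2 for a, b in zip(resp_pos, resp_neg)]
    return env, tfs

# Toy responses: a shared envelope plus a carrier that flips sign with polarity.
env_true = [1.0, 2.0, 1.5]
carrier = [0.5, -0.5, 0.25]
r_pos = [e + c for e, c in zip(env_true, carrier)]
r_neg = [e - c for e, c in zip(env_true, carrier)]
env, tfs = decompose_ffr(r_pos, r_neg)
```

On this toy model the sum recovers the envelope exactly and the difference recovers the carrier, which is the rationale for the add/subtract procedure.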

    An Overview of University Town Research in Our Country

    Get PDF
    Since the end of the twentieth century, with the reform of the higher education system and the expansion of college enrollment quotas, a boom in the construction of university towns (called "daxuecheng" in Chinese) has arisen in our country. As a result, multi-perspective research on the university town, a new form of regional pattern, has gradually drawn the attention of the domestic academic community. This paper reviews in detail the large body of recent domestic literature on university town research and offers comments on it. Finally, it summarizes the significance of further study of university towns and the outlook for future research.

    Accounting bonus depreciation on intangible assets as a factor of innovative development of economy

    Full text link
    (A) the outside view of our resulting mesh; (B) the outside view of the SRF mesh; (C) the cutaway view of our resulting mesh; (D) the cutaway view of the SRF mesh.

    Hexahedral mesh generation via constrained quadrilateralization

    No full text
    Decomposing a volume into high-quality hexahedral cells is a challenging task in finite element simulations and computer graphics. Inspired by the use of the spatial twist continuum and frame fields in previous hexahedral mesh generation methods, we present a method of hexahedral mesh generation via constrained quadrilateralization that combines the spatial twist continuum with frame fields. Given a volume represented by a tetrahedral mesh, a surface quadrilateral mesh and a frame field, we first extend a loop on the surface of the solid into a layer of hexahedral elements, then divide the solid into two smaller sub-solids along the layer, and finally handle them recursively until all of the sub-solids are empty. In our hexahedral mesh generation framework, we apply constrained quadrilateralization to extend the loop into a layer of hexahedral elements. The "divide-and-conquer" strategy used in this method is well suited to parallelization and can potentially lead to easier and more robust implementations that are less dependent on heavy numerical libraries. Test results show that the quality of the meshes generated by this method is similar to that of meshes produced by current state-of-the-art mesh generation methods.
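The recursive peel-and-split loop described above can be sketched as a skeleton. This is purely illustrative: the list-of-cell-ids "solid", the layer extraction, and the split are hypothetical stand-ins, with all real geometry, frame-field, and constrained-quadrilateralization logic omitted:

```python
def mesh_recursively(solid):
    """Skeleton of the divide-and-conquer strategy: peel one boundary layer,
    split the remainder into two sub-solids, recurse until both are empty.
    A 'solid' here is just a list of cell ids (toy model)."""
    if not solid:
        return []
    layer, remainder = solid[:1], solid[1:]          # stand-in for layer extraction
    mid = len(remainder) // 2
    left, right = remainder[:mid], remainder[mid:]   # stand-in for the layer-based split
    return [layer] + mesh_recursively(left) + mesh_recursively(right)

layers = mesh_recursively(list(range(7)))
```

Because the two recursive calls are independent, this structure is what makes the method naturally parallelizable.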

    A surface loop corresponding to a sheet.

    No full text
    A surface loop corresponding to a sheet.

    The two sub-solids after dividing by the layer.

    No full text
    (A) the left sub-solid; (B) the right sub-solid.