120 research outputs found

    Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks

    Full text link
The Transformer architecture has proven highly effective for Automatic Speech Recognition (ASR) tasks, becoming a foundational component for a plethora of research in the domain. Historically, many approaches have leaned on fixed-length attention windows, which become problematic for speech samples that vary in duration and complexity, leading to data over-smoothing and neglect of essential long-term connectivity. Addressing this limitation, we introduce Echo-MSA, a nimble module equipped with a variable-length attention mechanism that accommodates a range of speech sample complexities and durations. This module offers the flexibility to extract speech features across various granularities, spanning from frames and phonemes to words and discourse. The proposed design captures the variable-length nature of speech and addresses the limitations of fixed-length attention. Our evaluation leverages a parallel attention architecture complemented by a dynamic gating mechanism that amalgamates traditional attention with the output of the Echo-MSA module. Empirical evidence from our study reveals that integrating Echo-MSA into the primary model's training regime significantly improves word error rate (WER) performance, all while preserving the intrinsic stability of the original model.
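The module itself is not reproduced in the abstract; purely as an illustration of the gating idea (mixing standard attention with multi-span attention at several granularities), here is a minimal PyTorch sketch. Everything in it is an assumption: the class name, the banded masks standing in for the variable-length mechanism, and the per-token sigmoid gate.

```python
# Hypothetical sketch, loosely inspired by the Echo-MSA description: a global
# attention branch mixed with several local (banded) branches of different
# window sizes via a learned per-token gate. Not the paper's code.
import torch
import torch.nn as nn

class GatedMultiSpanAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, windows=(4, 16, 64)):
        super().__init__()
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # One local branch per window size (roughly: frame/phoneme/word scale).
        self.local_attns = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in windows
        )
        self.windows = windows
        self.gate = nn.Linear(dim, 1)   # dynamic gate between the two branches

    def _band_mask(self, n: int, w: int, device) -> torch.Tensor:
        idx = torch.arange(n, device=device)
        # True = masked out; keeps only a band of width w around each position.
        return (idx[None, :] - idx[:, None]).abs() > w

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = x.size(1)
        g, _ = self.global_attn(x, x, x)
        local = torch.zeros_like(x)
        for attn, w in zip(self.local_attns, self.windows):
            out, _ = attn(x, x, x, attn_mask=self._band_mask(n, w, x.device))
            local = local + out / len(self.windows)
        alpha = torch.sigmoid(self.gate(x))   # per-token mixing weight
        return alpha * g + (1 - alpha) * local

x = torch.randn(2, 100, 128)                  # (batch, frames, feature dim)
print(GatedMultiSpanAttention(128)(x).shape)  # torch.Size([2, 100, 128])
```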

Assessing Client Creditworthiness Using Classification Trees

    Get PDF
The credit-granting process leads to a choice between two actions: give the new applicant credit, or refuse it. In the thesis, we use the CART algorithm with 10-fold cross-validation to build a model able to predict the creditworthiness of a new applicant.
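The stated pipeline maps directly onto scikit-learn, whose DecisionTreeClassifier implements CART; a minimal sketch with placeholder data (the thesis's actual applicant features and labels are assumed):

```python
# Minimal sketch of the stated approach: CART evaluated with 10-fold
# cross-validation. X and y below are random placeholders for the thesis's
# applicant features and good/bad credit labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))        # placeholder applicant features
y = rng.integers(0, 2, size=500)     # placeholder creditworthiness labels

cart = DecisionTreeClassifier(max_depth=5, min_samples_leaf=20, random_state=0)
scores = cross_val_score(cart, X, y, cv=10, scoring="accuracy")
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```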

Informational Efficiency of Stock Markets

    Get PDF
The efficient market hypothesis is one of the possible approaches to explaining the movements of asset prices in financial markets. In this thesis, the author examines whether the main indexes of selected Asian stock markets (mainland China, Hong Kong, Japan) follow the random walk hypothesis. The basic testing period runs from 2010 to 2020, using data of daily frequency. The predictability of returns in the form of random walk models is investigated with the help of selected statistical tests. The application framework combines mathematical models of efficient markets with linear and nonlinear testing procedures. The random walk hypothesis is rejected most often for the SSE50 index when the whole data sample is used. In addition, the dynamics of the BDS independence test and variance ratio test results over time are evaluated as well. To assess the significance of the applied tests, bootstrap methods are utilized. Based on the dynamic results of the statistical tests, an investment strategy proposal is also discussed.
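Among the tests mentioned, the variance ratio test has a compact form: under a random walk, the variance of q-period returns equals q times the variance of one-period returns, so VR(q) = Var(r_t(q)) / (q Var(r_t)) should be close to 1. A minimal sketch of the basic Lo-MacKinlay statistic under the i.i.d. null (the heteroskedasticity-robust variant and the bootstrap evaluation the thesis mentions are omitted):

```python
# Minimal sketch of the Lo-MacKinlay variance ratio test under the i.i.d.
# random-walk null. Uses overlapping q-period returns; not the robust or
# bootstrapped versions.
import numpy as np

def variance_ratio(log_prices: np.ndarray, q: int):
    r = np.diff(log_prices)                    # one-period log returns
    T = len(r)
    mu = r.mean()
    var_1 = np.sum((r - mu) ** 2) / T
    # Overlapping q-period returns.
    rq = np.array([r[i:i + q].sum() for i in range(T - q + 1)])
    var_q = np.sum((rq - q * mu) ** 2) / (q * (T - q + 1))
    vr = var_q / var_1
    # Asymptotic z-statistic under the i.i.d. null.
    z = (vr - 1) / np.sqrt(2 * (2 * q - 1) * (q - 1) / (3 * q * T))
    return vr, z

prices = np.cumsum(np.random.default_rng(1).normal(size=2500))  # fake index
print(variance_ratio(prices, q=5))   # VR near 1, |z| small for a random walk
```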

    On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection

    Full text link
Detecting adversarial samples that are carefully crafted to fool the model is a critical step for security-sensitive applications. However, existing adversarial detection methods require access to sufficient training data, which raises noteworthy concerns about privacy leakage and generalizability. In this work, we validate that the adversarial samples generated by attack algorithms are strongly related to a specific vector in the high-dimensional input space. Such vectors, namely UAPs (Universal Adversarial Perturbations), can be calculated without the original training data. Based on this discovery, we propose a data-agnostic adversarial detection framework, which induces different responses to UAPs between normal and adversarial samples. Experimental results show that our method achieves competitive detection performance on various text classification tasks and maintains time consumption equivalent to normal inference. Comment: Accepted by ACL2023 (Short Paper).
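A hedged sketch of the detection idea for an embedding-space UAP: compare a classifier's output distribution on an input with and without the precomputed perturbation, and flag inputs whose response is anomalously large. The model/embed interfaces (HuggingFace-style inputs_embeds), the uap tensor, and the threshold are all assumptions, not the paper's code:

```python
# Hypothetical sketch: probe how a classifier's predictions shift when a
# precomputed UAP is added in embedding space. `model`, `embed`, and `uap`
# are assumed; the decision threshold is illustrative only.
import torch
import torch.nn.functional as F

@torch.no_grad()
def uap_response(model, embed, input_ids, uap, eps=1.0):
    e = embed(input_ids)                                   # (batch, seq, dim)
    p_clean = F.softmax(model(inputs_embeds=e).logits, dim=-1)
    p_pert = F.softmax(model(inputs_embeds=e + eps * uap).logits, dim=-1)
    # KL divergence between clean and UAP-perturbed predictions, per input.
    return (p_clean * (p_clean / p_pert).log()).sum(-1)

def is_adversarial(response: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    # Flag inputs whose UAP response crosses a validation-set threshold.
    return response > tau
```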

    DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization

    Full text link
Adversarial training is one of the best-performing methods for improving the robustness of deep language models. However, robust models come at the cost of high time consumption, as they require multi-step gradient ascents or word substitutions to obtain adversarial samples. In addition, these generated samples are deficient in grammatical quality and semantic consistency, which impairs the effectiveness of adversarial training. To address these problems, we introduce a novel, effective procedure for adversarial training that instead uses only clean data. Our procedure, distribution shift risk minimization (DSRM), estimates the adversarial loss by perturbing the probability distribution of the input data rather than their embeddings. This formulation results in a robust model that minimizes the expected global loss under adversarial attacks. Our approach requires zero adversarial samples for training and reduces time consumption by up to 70% compared to current best-performing adversarial training methods. Experiments demonstrate that DSRM considerably improves BERT's resistance to textual adversarial attacks and achieves state-of-the-art robust accuracy on various benchmarks. Comment: Accepted by ACL2023.
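The exact DSRM estimator is in the paper; in spirit, perturbing the data distribution rather than the embeddings resembles distributionally robust optimization, where the worst-case reweighting within a KL ball has the closed form w_i ∝ exp(loss_i / τ). A hedged sketch of that generic form (not necessarily the paper's estimator):

```python
# Hedged sketch of training on an adversarially reweighted batch, in the
# spirit of shifting the input distribution rather than the embeddings.
# Exponential tilting is the generic DRO form, not the exact DSRM recipe.
import torch

def dsrm_style_loss(per_sample_loss: torch.Tensor, temperature: float = 1.0):
    # Worst-case reweighting within a KL ball: w_i ∝ exp(loss_i / temperature),
    # so hard (high-loss) samples are upweighted.
    w = torch.softmax(per_sample_loss.detach() / temperature, dim=0)
    return (w * per_sample_loss).sum()

losses = torch.tensor([0.2, 1.5, 0.4, 2.3], requires_grad=True)
dsrm_style_loss(losses).backward()   # gradients concentrate on hard samples
print(losses.grad)
```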

    TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models

    Full text link
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety. However, the continual learning aspect of these aligned LLMs has been largely overlooked. Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs, owing to both their simplicity and the models' potential exposure during instruction tuning. In this paper, we introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs. TRACE consists of 8 distinct datasets spanning challenging tasks including domain-specific tasks, multilingual capabilities, code generation, and mathematical reasoning. All datasets are standardized into a unified format, allowing for effortless automatic evaluation of LLMs. Our experiments show that after training on TRACE, aligned LLMs exhibit significant declines in both general ability and instruction-following capabilities. For example, the accuracy of llama2-chat-13B on the gsm8k dataset declined precipitously from 28.8% to 2% after training on our datasets. This highlights the challenge of finding a suitable tradeoff between achieving performance on specific tasks and preserving the original prowess of LLMs. Empirical findings suggest that tasks inherently equipped with reasoning paths contribute significantly to preserving certain capabilities of LLMs against potential declines. Motivated by this, we introduce the Reasoning-augmented Continual Learning (RCL) approach. RCL integrates task-specific cues with meta-rationales, effectively reducing catastrophic forgetting in LLMs while expediting convergence on novel tasks.
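The headline evaluation reduces to a before/after comparison on held-out general-ability benchmarks; a trivial sketch, with the gsm8k numbers taken from the abstract and the second benchmark invented as a placeholder:

```python
# Illustrative sketch of the decline measurement TRACE reports: score the
# model on general-ability benchmarks before and after continual training.
# Only the gsm8k figures come from the abstract; "mmlu" is a placeholder.
def general_ability_decline(before: dict, after: dict) -> dict:
    return {task: round(before[task] - after[task], 1) for task in before}

before = {"gsm8k": 28.8, "mmlu": 54.1}   # pre-training accuracies (%)
after = {"gsm8k": 2.0, "mmlu": 47.3}     # post-TRACE accuracies (%)
print(general_ability_decline(before, after))  # {'gsm8k': 26.8, 'mmlu': 6.8}
```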

    ARA-net: an attention-aware retinal atrophy segmentation network coping with fundus images

    Get PDF
Background: Accurately detecting and segmenting areas of retinal atrophy is paramount for early medical intervention in pathological myopia (PM). However, segmenting retinal atrophic areas based on a two-dimensional (2D) fundus image poses several challenges, such as blurred boundaries, irregular shapes, and size variation. To overcome these challenges, we have proposed an attention-aware retinal atrophy segmentation network (ARA-Net) to segment retinal atrophy areas from the 2D fundus image. Methods: In particular, ARA-Net adopts a UNet-like strategy to perform the area segmentation. A skip self-attention connection (SSA) block, comprising a shortcut and a parallel polarized self-attention (PPSA) block, has been proposed to deal with the challenges of blurred boundaries and irregular shapes of the retinal atrophic region. Further, we have proposed a multi-scale feature flow (MSFF) to address the size variation. We have added the flow between the SSA connection blocks, allowing considerable semantic information to be captured for detecting retinal atrophy across various area sizes. Results: The proposed method has been validated on the Pathological Myopia (PALM) dataset. Experimental results demonstrate that our method yields a high Dice coefficient (DICE) of 84.26%, Jaccard index (JAC) of 72.80%, and F1-score of 84.57%, significantly outperforming other methods. Conclusion: Our results demonstrate that ARA-Net is an effective and efficient approach for retinal atrophic area segmentation in PM.
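As a rough illustration of the skip self-attention idea (a shortcut plus an attention-refined branch on a UNet skip connection), here is a minimal PyTorch sketch; the PPSA block is simplified to plain channel attention, and all names are invented:

```python
# Hypothetical sketch of an SSA-style skip connection: the encoder feature
# passes through a shortcut and, in parallel, an attention branch; the two
# are summed. The PPSA block is simplified to channel attention here.
import torch
import torch.nn as nn

class SkipSelfAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Simplified stand-in for the parallel polarized self-attention block.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + x * self.attn(x)   # shortcut + attention-weighted branch

feat = torch.randn(1, 64, 56, 56)         # an encoder feature map
print(SkipSelfAttention(64)(feat).shape)  # torch.Size([1, 64, 56, 56])
```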

    Binding to Na(+) /H(+) exchanger regulatory factor 2 (NHERF2) affects trafficking and function of the enteropathogenic Escherichia coli type III secretion system effectors Map, EspI and NleH.

    Get PDF
Enteropathogenic Escherichia coli (EPEC) strains are diarrhoeal pathogens that use a type III secretion system to translocate effector proteins into host cells in order to colonize and multiply in the human gut. Map, EspI and NleH1 are conserved EPEC effectors that possess a C-terminal class I PSD-95/Discs Large/ZO-1 (PDZ)-binding motif. Using a PDZ array screen we identified Na(+)/H(+) exchanger regulatory factor 2 (NHERF2), a scaffold protein involved in tethering and recycling ion channels in polarized epithelia that contains two PDZ domains, as a common target of Map, EspI and NleH1. Using recombinant proteins and co-immunoprecipitation we confirmed that NHERF2 binds each of the effectors. We generated a HeLa cell line stably expressing HA-tagged NHERF2 and found that Map, EspI and NleH1 colocalize and interact with intracellular NHERF2 via their C-terminal PDZ-binding motif. Overexpression of NHERF2 enhanced the formation and persistence of Map-induced filopodia, accelerated the trafficking of EspI to the Golgi and diminished the anti-apoptotic activity of NleH1. The binding of multiple T3SS effectors to a single scaffold protein is unique. Our data suggest that NHERF2 may act as a plasma membrane sorting site, providing a novel regulatory mechanism to control the intracellular spatial and temporal activity of effector proteins.

    Secrets of RLHF in Large Language Models Part I: PPO

    Full text link
Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Their primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include reward models to measure human preferences, Proximal Policy Optimization (PPO) to optimize policy model outputs, and process supervision to improve step-by-step reasoning capabilities. However, the challenges of reward design, environment interaction, and agent training, coupled with the huge trial-and-error cost of large language models, pose a significant barrier for AI researchers seeking to develop technical alignment and a safe landing for LLMs. The stable training of RLHF has still been a puzzle. In this first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising the PPO algorithm impact policy agent training. We identify policy constraints as the key factor for the effective implementation of the PPO algorithm. Therefore, we explore PPO-max, an advanced version of the PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLM alignment. Therefore, we are eager to release technical reports, reward models and PPO code.
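The policy-constraint finding corresponds, in generic PPO terms, to the clipped surrogate objective plus a KL penalty against the frozen SFT reference policy; a minimal sketch of that standard loss (not the specific PPO-max recipe):

```python
# Minimal sketch of the clipped PPO policy loss with a KL penalty toward the
# frozen SFT reference policy -- the generic form of the policy constraints
# the report identifies as key. Not the exact PPO-max implementation.
import torch

def ppo_policy_loss(logp_new, logp_old, logp_ref, advantages,
                    clip_eps: float = 0.2, kl_coef: float = 0.1):
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.minimum(unclipped, clipped).mean()
    # Penalize drift of the policy away from the reference (SFT) model.
    kl_penalty = (logp_new - logp_ref).mean()
    return policy_loss + kl_coef * kl_penalty

logp_new = torch.randn(8, requires_grad=True)
loss = ppo_policy_loss(logp_new, logp_new.detach(),
                       logp_new.detach() - 0.1, torch.randn(8))
loss.backward()
```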