
    Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

    In this work, we study rapid, step-wise improvements of the loss in transformers when they are confronted with multi-step decision tasks. We found that transformers struggle to learn the intermediate tasks, whereas CNNs have no such issue on the tasks we studied. When transformers do learn the intermediate task, they do so rapidly and unexpectedly, after both the training and the validation loss have saturated for hundreds of epochs. We call these rapid improvements Eureka-moments, since the transformer appears to suddenly learn a previously incomprehensible task. Similar leaps in performance have become known as Grokking. In contrast to Grokking, for Eureka-moments both the validation and the training loss saturate before rapidly improving. We trace the problem back to the Softmax function in the self-attention block of transformers and show ways to alleviate it. These fixes improve training speed: the improved models reach 95% of the baseline model's performance in just 20% of the training steps, are much more likely to learn the intermediate task, reach higher final accuracy, and are more robust to hyperparameters.
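    The abstract does not spell out how Softmax hampers optimization. One well-known property that may be relevant is that the Softmax Jacobian, dp_i/dz_j = p_i(delta_ij - p_j), shrinks toward zero as the output becomes peaked. As a result, attention logits that saturate receive very small gradients. This is offered as an assumption, not as the authors' analysis, and the paper's specific fixes are not reproduced here. The following is a minimal pure-Python sketch of the effect, and the helper names are illustrative:

```python
import math


def softmax(z: list) -> list:
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z: list) -> list:
    """Jacobian J[i][j] = dp_i/dz_j = p_i * (delta_ij - p_j)."""
    p = softmax(z)
    n = len(p)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]


def max_abs_entry(matrix: list) -> float:
    """Largest absolute entry, used as a crude measure of gradient size."""
    return max(abs(x) for row in matrix for x in row)


# The same logit pattern at growing scale gives an increasingly peaked
# (saturated) softmax output and therefore an increasingly small Jacobian.
base = [1.0, 0.0, -1.0]
for scale in (1.0, 5.0, 20.0):
    J = softmax_jacobian([scale * v for v in base])
    print(f"scale={scale:4.1f}  max |dp/dz| = {max_abs_entry(J):.2e}")
```

Scaling the same logits by 1, 5, and 20 makes the largest Jacobian entry drop from roughly 0.22 to about 7e-3 and then to about 2e-9. Gradients flowing back through a saturated Softmax are therefore tiny, which is one plausible way training can stall on a plateau.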

    Language Emptiness of Continuous-Time Parametric Timed Automata

    Parametric timed automata extend standard timed automata with the ability to use parameters in the clock guards. In general, if the parameters are real-valued, the language emptiness problem for such automata is undecidable, even for various restricted subclasses. We therefore focus on the case where parameters are assumed to be integer-valued, while time remains continuous. On the one hand, we show that the problem remains undecidable for parametric timed automata with three clocks and one parameter. On the other hand, for the case with arbitrarily many clocks where only one of these clocks is compared with (an arbitrary number of) parameters, we show that parametric language emptiness is decidable. The undecidability result tightens the bounds of a previous result that assumed six parameters, while the decidability result extends existing approaches that deal only with discrete-time semantics. To the best of our knowledge, this is the first positive result for continuous time and unbounded integer parameters, except for the rather simple case of single-clock automata.

    Minimum-Cost Reachability for Priced Timed Automata

    This paper introduces the model of linearly priced timed automata as an extension of timed automata, with prices on both transitions and locations. For this model we consider the minimum-cost reachability problem, i.e., given a linearly priced timed automaton and a target state, determine the minimum cost of executions from the initial state to the target state. This problem generalizes the minimum-time reachability problem for ordinary timed automata. We prove decidability of this problem by offering an algorithmic solution, which is based on a combination of branch-and-bound techniques and a new notion of priced regions. The latter allows symbolic representation and manipulation of reachable states together with the cost of reaching them.
    Keywords: Timed Automata, Verification, Data Structures, Algorithms, Optimization
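    The cost model can be made concrete with a small example. In a linearly priced timed automaton, an execution pays each location's price per unit of time spent there, plus each transition's price when that transition is taken. The sketch below only evaluates the cost of one given run under that model. It ignores clocks, guards, and invariants, and it does not implement the paper's branch-and-bound search over priced regions. The names `Edge` and `run_cost` and the example prices are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A transition of a hypothetical priced automaton (clock guards omitted)."""
    source: str
    target: str
    price: float  # paid once, each time the transition is taken


def run_cost(start: str, rates: dict, steps: list) -> tuple:
    """Return (final location, total cost) of a run.

    rates[loc] is the price per time unit of staying in loc; steps is a list
    of (delay, edge) pairs: wait `delay` time units in the current location,
    then take `edge`.
    """
    location, cost = start, 0.0
    for delay, edge in steps:
        if delay < 0:
            raise ValueError("delays must be non-negative")
        if edge.source != location:
            raise ValueError(f"edge leaves {edge.source}, but the run is in {location}")
        cost += rates[location] * delay  # location cost: rate times time spent
        cost += edge.price               # transition cost
        location = edge.target
    return location, cost


rates = {"l0": 2.0, "l1": 5.0, "goal": 0.0}
e1 = Edge("l0", "l1", 3.0)
e2 = Edge("l1", "goal", 1.0)

print(run_cost("l0", rates, [(1.5, e1), (0.5, e2)]))  # ('goal', 9.5)
print(run_cost("l0", rates, [(2.0, e1), (0.0, e2)]))  # ('goal', 8.0)
```

Moving the delay from the expensive location l1 (rate 5) to the cheaper location l0 (rate 2) lowers the cost from 9.5 to 8.0. Minimum-cost reachability asks for the minimum of such costs over all runs that respect the automaton's clock constraints.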

    Shifting attention in viewer- and object-based reference frames after unilateral brain injury

    The aims of the present study were to investigate the respective roles that object- and viewer-based reference frames play in reorienting visual attention, and to assess their influence after unilateral brain injury. To do so, we studied 16 right-hemisphere-injured (RHI) and 13 left-hemisphere-injured (LHI) patients. We used a cueing design that manipulates the location of cues and targets relative to a display composed of two rectangles (i.e., objects). Unlike previous studies with patients, we presented all cues at the midline rather than in the left or right visual field. Thus, in the critical conditions in which targets were presented laterally, attention was always reoriented from a midline cue. Performance was measured for lateralized target detection as a function of viewer-based (contra- and ipsilesional sides) and object-based (requiring reorienting within or between objects) reference frames. As expected, contralesional detection was slower than ipsilesional detection for the patients. More importantly, objects influenced target detection differently in the contralesional and ipsilesional fields. Contralesionally, reorienting to a target within the cued object took longer than reorienting to a target in the same location but in the uncued object. This finding is consistent with object-based neglect. Ipsilesionally, the means were in the opposite direction. Furthermore, no significant difference in object-based influences was found between the patient groups (RHI vs. LHI). These findings are discussed in the context of the reference frames used in reorienting attention for target detection.

    Probing short-term face memory in developmental prosopagnosia

    It has recently been proposed that the face recognition deficits seen in neurodevelopmental disorders may reflect impaired short-term face memory. For example, introducing a brief delay between the presentation of target and test faces seems to disproportionately impair matching or recognition performance in individuals with Autism Spectrum Disorders. The present study sought to determine whether deficits of short-term face memory contribute to the impaired face recognition seen in Developmental Prosopagnosia. To determine whether developmental prosopagnosics exhibit impaired short-term face memory, the present study used a six-alternative forced-choice match-to-sample procedure. Memory demand was manipulated by using a short or a long delay between the presentation of the target face and the six test faces. Crucially, the perceptual demands were identical in both conditions, thereby allowing the independent contribution of short-term face memory to be assessed. Prosopagnosics showed clear evidence of a category-specific impairment for face matching in both conditions: they were both slower and less accurate than matched controls. Crucially, however, the prosopagnosics showed no evidence of a disproportionate face recognition impairment in the long-interval condition. While individuals with developmental prosopagnosia may have problems with the perceptual encoding of faces, their representations appear to be stable over short durations. These results suggest that the face recognition difficulties seen in developmental prosopagnosia and autism may be qualitatively different, attributable to deficits of perceptual encoding and perceptual maintenance, respectively.