14 research outputs found

    Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks

    The evaluation of Large Language Models (LLMs) is a patchy and inconsistent landscape, and it is becoming clear that the quality of automatic evaluation metrics is not keeping up with the pace of development of generative models. We aim to improve the understanding of current models' performance by providing a preliminary and hybrid evaluation of a range of open and closed-source generative LLMs on three NLP benchmarks: text summarisation, text simplification and grammatical error correction (GEC), using both automatic and human evaluation. We also explore the potential of the recently released GPT-4 to act as an evaluator. We find that ChatGPT consistently outperforms many other popular models according to human reviewers on the majority of metrics, while scoring much more poorly on classic automatic evaluation metrics. We also find that human reviewers rate the gold reference as much worse than the best models' outputs, indicating the poor quality of many popular benchmarks. Finally, we find that GPT-4 is capable of ranking models' outputs in a way that aligns reasonably closely with human judgement despite task-specific variations, with lower alignment in the GEC task. Comment: Accepted at EMNLP 202

    A Product-Form Model for the Performance Evaluation of a Bandwidth Allocation Strategy in WSNs

    Wireless Sensor Networks (WSNs) are important examples of Collective Adaptive Systems, which consist of a set of motes that are spatially distributed in an indoor or outdoor space. Each mote monitors its surrounding conditions, such as humidity, intensity of light, temperature, and vibrations, but also collects complex information, such as images or small videos, and cooperates with the whole set of motes forming the WSN to allow the routing process. The traffic in the WSN consists of packets that contain the data harvested by the motes and can be classified according to the type of information that they carry. One pivotal problem in WSNs is the bandwidth allocation among the motes. The problem is known to be challenging due to the reduced computational capacity of the motes, their energy consumption constraints, and the fully decentralised network architecture. In this article, we study a novel algorithm to allocate the WSN bandwidth among the motes by taking into account the type of traffic they aim to send. Under the assumption of a mesh network and Poisson distributed harvested packets, we propose an analytical model for its performance evaluation that allows a designer to study the optimal configuration parameters. Although the Markov chain underlying the model is not reversible, we show it to be ρ-reversible under a certain renaming of states. By an extensive set of simulations, we show that the analytical model accurately approximates the performance of networks that do not satisfy the assumptions. The algorithm is studied with respect to the achieved throughput and fairness. We show that it provides a good approximation of the max-min fairness requirements.

    Phytopatological monitoring of Inonotus rickii

    Sviluppo di un criterio energetico con effetto della tensione media per l'analisi sperimentale della vita a fatica di provini in acciaio inossidabile

    Verification of the energy-based criterion proposed by Prof. Meneghetti, including an analysis of the mean-stress effect, for the experimental prediction of the fatigue life of components made of stainless steel (cold-drawn AISI 304L). A comparison with the classical method proposed by woehle follows

    Drammatico romanzesco. Vittorini tra autore e personaggio

    This study focuses on the literary work of Elio Vittorini. It is based on a field of research defined as the relationship between theatre and novel. In order to bring out every manifestation of this complex phenomenon of proximity and partial overlap, in the direction of the epicization of drama as much as the dramaturgisation of the novel, we referred to modal theory and to the oscillations between the two poles of writing that a long tradition, dating back to Plato, has labeled mimetic and diegetic. In this process of contamination of forms, we believe the narrative work of Elio Vittorini plays a paradigmatic role, first in the predominant presence of the dialogical dimension and, secondly, in a formal exploration that is sensitive and attentive to the most innovative international models and conducted outside a rigid genre framework. In addition, the autograph documents testify to the existence of a dramatic production that, although ephemeral and insubstantial, deeply feeds the narrative production. In particular, we analyzed this approach in the composition of Les Hommes et les Autres, inspired by the unfinished drama Atto primo which, since it does not go beyond the draft stage, reveals the theatrical models of the experimentation and helps in the interpretation of the novel. Finally, after assessing the mimetic dimension of the work, we determined its success on stage. The alleged theatricality of the novel has been identified above all in relation to contemporary theatrical experiences that are based precisely on the tension between the mimetic and diegetic modes.

    Dynamic resource allocation in fork-join queues

    Fork-join systems play a pivotal role in the analysis of distributed systems, telecommunication infrastructures, and storage systems. In this article, we consider a fork-join system consisting of K parallel servers, each of which works on one of the K tasks that form each job. The system allocates a fixed amount of computational resources among the K servers, hence determining their service speed. The goal of this article is to study the resource allocation policies among the servers. We assume that the queueing disciplines of the fork- and join-queues are First Come First Served. At each epoch, at most K tasks are in service while the others wait in the fork-queues. We propose an algorithm with a very simple implementation that allocates the computational resources in a way that aims at minimizing the join-queue lengths, and hence at reducing the expected job service time. We study its performance in saturation and under exponential service times. The model has an elegant closed-form stationary distribution. Moreover, we provide an algorithm to numerically or symbolically derive the marginal probabilities for the join-queue lengths. From these, the expressions for the expected join-queue length and the expected response time under immediate join can be derived. Finally, we compare the performance of the proposed resource allocation algorithm with that of other strategies.