RT-LM: Uncertainty-Aware Resource Management for Real-Time Inference of Language Models
Recent advancements in language models (LMs) have attracted substantial attention for their capability to generate human-like responses. Though they promise a bright future for applications such as conversational AI, these LMs face deployment challenges on various devices due to their extreme computational cost and unpredictable inference latency. Such varied inference latency, identified as a consequence of uncertainty intrinsic to the nature of language, can lead to computational inefficiency and degrade the overall performance of LMs, especially under high-traffic workloads. Unfortunately, the range of these uncertainty sources is extensive, complicating the prediction of latency and of the effects emanating from such uncertainties. To understand and mitigate the impact of uncertainty on systems that demand real-time responses, we take the first step to comprehend, quantify, and optimize these uncertainty-induced latency performance variations in LMs. Specifically, we present RT-LM, an uncertainty-aware resource management ecosystem for real-time inference of LMs. RT-LM innovatively quantifies how specific input uncertainties adversely affect latency, often by leading to increased output length. Exploiting these insights, we devise a lightweight yet effective method to dynamically correlate input text uncertainties with output length at runtime. Using this quantification as a latency heuristic, we integrate the uncertainty information into a system-level scheduler that explores several uncertainty-induced optimization opportunities, including uncertainty-aware prioritization, dynamic consolidation, and strategic CPU offloading. Quantitative experiments across five state-of-the-art LMs on two hardware platforms demonstrate that RT-LM significantly reduces average response time and improves throughput while incurring only a small runtime overhead.
Comment: Accepted by RTSS 2023
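
To make the prioritization idea concrete, here is a minimal sketch assuming a scalar input-uncertainty score in [0, 1] and a linear length predictor; the names, constants, and the shortest-predicted-job-first policy are illustrative assumptions, not RT-LM's actual heuristic or scheduler:

import heapq

def predict_output_length(uncertainty_score, base_len=32, scale=96.0):
    # Hypothetical heuristic: higher input uncertainty -> longer expected
    # output, mirroring the correlation RT-LM quantifies at runtime.
    return base_len + scale * uncertainty_score

class UncertaintyAwareQueue:
    # Shortest-predicted-job-first: serve requests whose inputs look least
    # uncertain (and hence likely produce the shortest outputs) first.
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so heap entries always compare

    def submit(self, request_id, uncertainty_score):
        priority = predict_output_length(uncertainty_score)
        heapq.heappush(self._heap, (priority, self._seq, request_id))
        self._seq += 1

    def next_request(self):
        return heapq.heappop(self._heap)[2]

q = UncertaintyAwareQueue()
q.submit("req-a", 0.9)   # high-uncertainty prompt, likely long output
q.submit("req-b", 0.2)   # low-uncertainty prompt, likely short output
print(q.next_request())  # -> req-b

In a real system the predictor would be learned online and combined with consolidation and offloading decisions.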
Contributions to distributed systems in ubiquitous environments: adaptation, context awareness and fault tolerance
Year after year, we have observed the arrival on the market of ever-smaller personal computers for ever more numerous users, from personal digital assistants and so-called connected objects to mobile phones. All these devices tend to become interchangeable in terms of memory, computing, and connectivity resources: mobile phones, for example, have become less and less specialised, increasingly general-purpose computing equipment, and now act as access portals to the sensors present in the user's immediate environment. The challenge addressed in our work is the construction of distributed systems that include these new hardware devices. The objective of my research is the design of the generic mediation paradigms underlying increasingly ubiquitous distributed applications. More specifically, the general problem addressed by my work is the definition of the role of middleware in integrating mobile devices and connected objects into distributed software architectures, which relied overwhelmingly on fixed software infrastructures when the work presented in this manuscript began.
In this manuscript, I describe my work on three topics: 1) the adaptation of distributed applications for continuity of service during disconnections, 2) the management of execution-context information to make distributed applications context-aware, and 3) mechanisms for detecting impediments in highly dynamic environments such as those built on spontaneous mobile networks.
On the first topic, we provide a generic middleware layer that manages the distributed aspects of disconnection handling, using a collaborative adaptation strategy in object-based and component-based architectures. On the second topic, we study architectural paradigms for building a generic context-management service that addresses the diversity of processing involved (fusion and aggregation, correlation, situation detection by learning, etc.), and we then address the problem of distributing context information across the different scales of the Internet of Things. Finally, on the third topic, we begin with the detection of operating modes for adaptation to disconnections, in order to distinguish, where possible, between a disconnection and a failure, and we then specify and build a partitionable group membership service. This service is strong enough to forbid the construction of partitions that do not correspond to the reality of the environment at a given moment, yet weak enough to be implementable algorithmically.
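
As a loose illustration of the disconnection-versus-failure distinction discussed above (a sketch under assumed semantics, not the mechanism from the manuscript): a node that announces its departure is classified as disconnected, while one that merely goes silent past a heartbeat timeout is only suspected of having failed. All names and the timeout value are assumptions:

import time

SUSPECT_TIMEOUT = 5.0  # seconds; assumed value

class ModeDetector:
    # A node that notified its departure is "disconnected"; a node that
    # merely went silent beyond the timeout is only "suspected-failed".
    def __init__(self):
        self.last_heartbeat = {}    # node -> last heartbeat timestamp
        self.announced_leave = set()

    def on_heartbeat(self, node):
        self.last_heartbeat[node] = time.monotonic()
        self.announced_leave.discard(node)

    def on_disconnect_notice(self, node):
        self.announced_leave.add(node)

    def classify(self, node):
        if node in self.announced_leave:
            return "disconnected"
        silent_for = time.monotonic() - self.last_heartbeat.get(node, 0.0)
        return "suspected-failed" if silent_for > SUSPECT_TIMEOUT else "alive"

d = ModeDetector()
d.on_heartbeat("n1")
print(d.classify("n1"))  # -> alive
d.on_disconnect_notice("n2")
print(d.classify("n2"))  # -> disconnected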
Operating system support for warehouse-scale computing
Modern applications are increasingly backed by large-scale data centres. Systems software in these data centre environments, however, faces substantial challenges: the lack of uniform resource abstractions makes sharing and resource management inefficient, infrastructure software lacks end-to-end access control mechanisms, and work placement ignores the effects of hardware heterogeneity and workload interference.
In this dissertation, I argue that uniform, clean-slate operating system (OS) abstractions designed to support distributed systems can make data centres more efficient and secure. I present a novel distributed operating system for data centres, focusing on two OS components: the abstractions for resource naming, management and protection, and the scheduling of work to compute resources.
First, I introduce a reference model for a decentralised, distributed data centre OS, based on pervasive distributed objects and inspired by concepts in classic 1980s distributed OSes. Translucent abstractions free users from having to understand implementation details, but enable introspection for performance optimisation. Fine-grained access control is supported by combining storable, communicable identifier capabilities and context-dependent, ephemeral handle capabilities. Finally, multi-phase I/O requests implement optimistically concurrent access to objects while supporting diverse application-level consistency policies.
Second, I present the DIOS operating system, an implementation of my model as an extension to Linux. The DIOS system call API is centred around distributed objects, globally resolvable names, and translucent references that carry context-sensitive object meta-data. I illustrate how these concepts support distributed applications, and evaluate the performance of DIOS in microbenchmarks and a data-intensive MapReduce application. I find that it offers improved, fine-grained isolation of resources, while permitting flexible sharing.
Third, I present the Firmament cluster scheduler, which generalises prior work on scheduling via minimum-cost flow optimisation. Firmament can flexibly express many scheduling policies using pluggable cost models; it makes high-quality placement decisions based on fine-grained information about tasks and resources; and it scales the flow-based scheduling approach to very large clusters. In two case studies, I show that Firmament supports policies that reduce colocation interference between tasks and that it successfully exploits flexibility in the workload to improve the energy efficiency of a heterogeneous cluster. Moreover, my evaluation shows that Firmament scales the minimum-cost flow optimisation to clusters of tens of thousands of machines while still making sub-second placement decisions.
St John's College Supplementary Emolument Fund
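
The flow-based scheduling idea that Firmament generalises can be shown in a few lines: each task injects one unit of flow that must reach a sink through a machine node, and arc costs encode placement preferences, so a minimum-cost flow yields an assignment. The sketch below uses networkx and invented costs; it is a toy instance of the technique, not Firmament's cost models:

import networkx as nx

# Toy instance of scheduling as minimum-cost flow: each task supplies one
# unit of flow that must reach the sink through exactly one machine, and
# arc costs (invented here) encode placement preferences.
G = nx.DiGraph()
for task in ("t0", "t1"):
    G.add_node(task, demand=-1)               # each task sends one unit
G.add_node("sink", demand=2)                  # sink absorbs both placements
for machine in ("m0", "m1"):
    G.add_edge(machine, "sink", capacity=1, weight=0)  # one slot per machine
G.add_edge("t0", "m0", capacity=1, weight=1)  # e.g., good data locality
G.add_edge("t0", "m1", capacity=1, weight=5)
G.add_edge("t1", "m0", capacity=1, weight=4)
G.add_edge("t1", "m1", capacity=1, weight=2)

flow = nx.min_cost_flow(G)
for task in ("t0", "t1"):
    machine = next(m for m, f in flow[task].items() if f == 1)
    print(task, "->", machine)   # t0 -> m0, t1 -> m1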
DARP
An enhanced performance model for metamorphic computer virus classification and detection
Metamorphic computer viruses employ various code mutation techniques to transform their code into new generations. These generations have similar behavior and functionality, and yet they cannot be detected by most commercial antivirus products, because those products depend on a signature database and string signature-based detection methods, which metamorphism techniques can evade. The purpose of this study is to develop a performance model for computer virus classification and detection, able to examine portable executable files and to classify and detect metamorphic computer viruses. A Hidden Markov Model applied to portable executable files was employed to classify and detect the metamorphic viruses. The proposed model, which produces common statistical patterns of a virus, was evaluated by comparing its results with previous related work and well-known commercial antivirus products. This was done by investigating metamorphic computer viruses and their features, as well as the existing classification and detection methods. Specifically, the model was applied to the binary format of portable executable files and was able to classify whether a file belonged to a virus family. The performance of the model, practically implemented and tested, was also evaluated in terms of detection rate and overall accuracy. The findings indicate that the proposed model is able to classify and detect metamorphic virus variants in portable executable file format with a high average detection rate of 99.7%. The implementation of the model is shown to be useful and applicable for antivirus programs.
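
A minimal sketch of the HMM-based detection scheme described above, using hmmlearn's CategoricalHMM (MultinomialHMM in older releases): train on opcode sequences from one family, then threshold the per-opcode log-likelihood of an unknown sample. The opcode vocabulary, sequences, and threshold are invented for illustration; real opcode streams would come from disassembled portable executable files:

import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

# Tiny opcode vocabulary; real streams would come from a disassembler.
OPCODE_VOCAB = {"mov": 0, "push": 1, "pop": 2, "call": 3, "jmp": 4}

def encode(seq):
    return np.array([[OPCODE_VOCAB[op]] for op in seq])

# Toy stand-ins for disassembled variants of one metamorphic family.
family_samples = [
    ["mov", "push", "call", "pop", "jmp"],
    ["push", "mov", "call", "pop", "mov"],
]
X = np.concatenate([encode(s) for s in family_samples])
lengths = [len(s) for s in family_samples]

# Train a discrete-observation HMM on the family's opcode sequences.
model = hmm.CategoricalHMM(n_components=2, n_iter=100, random_state=7)
model.fit(X, lengths)

def per_opcode_log_likelihood(seq):
    x = encode(seq)
    return model.score(x) / len(x)

THRESHOLD = -3.0  # assumed; in practice calibrated on benign files
suspect = ["mov", "push", "call", "pop", "jmp"]
print(per_opcode_log_likelihood(suspect) > THRESHOLD)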
Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit
The primary focus of this thesis is to make Sanskrit manuscripts more
accessible to end-users through natural language technologies. The
morphological richness, compounding, free word order, and low-resource
nature of Sanskrit pose significant challenges for developing deep learning
solutions. We identify four fundamental tasks, which are crucial for developing
a robust NLP technology for Sanskrit: word segmentation, dependency parsing,
compound type identification, and poetry analysis. The first task, Sanskrit
Word Segmentation (SWS), is a fundamental text-processing task underlying all downstream applications. However, it is challenging due to the sandhi
phenomenon that modifies characters at word boundaries. Similarly, the existing
dependency parsing approaches struggle with morphologically rich and
low-resource languages like Sanskrit. Compound type identification is also
challenging for Sanskrit due to the context-sensitive semantic relation between
components. All these challenges result in sub-optimal performance in NLP
applications like question answering and machine translation. Finally, Sanskrit
poetry has not been extensively studied in computational linguistics.
While addressing these challenges, this thesis makes various contributions:
(1) The thesis proposes linguistically-informed neural architectures for these
tasks. (2) We showcase the interpretability and multilingual extension of the
proposed systems. (3) Our proposed systems report state-of-the-art performance.
(4) Finally, we present a neural toolkit named SanskritShala, a web-based
application that provides real-time analysis of input for various NLP tasks.
Overall, this thesis contributes to making Sanskrit manuscripts more accessible
by developing robust NLP technology and releasing various resources, datasets,
and a web-based toolkit.
Comment: Ph.D. dissertation
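
To see why sandhi makes word segmentation non-trivial, consider a toy splitter that must undo a vowel fusion at the word boundary rather than simply insert spaces; the mini lexicon and the single rule (a + i fuses to e, as in "ca" + "iti" becoming "ceti") are illustrative only, not the thesis system:

# Toy illustration of sandhi-aware splitting: the surface form changes at
# the boundary, so the splitter must reverse the phonetic merge.
SANDHI_RULES = [("e", ("a", "i"))]  # surface char -> (left residue, right onset)

LEXICON = {"ca", "iti", "rama", "eva"}

def candidate_splits(text):
    # Enumerate splits of `text` into two lexicon words, optionally
    # undoing one sandhi fusion straddling the boundary.
    for i in range(1, len(text)):
        left, right = text[:i], text[i:]
        if left in LEXICON and right in LEXICON:
            yield (left, right)
        for surface, (l_res, r_on) in SANDHI_RULES:
            if text[i - 1] == surface:
                l, r = text[:i - 1] + l_res, r_on + text[i:]
                if l in LEXICON and r in LEXICON:
                    yield (l, r)

print(list(candidate_splits("ceti")))  # -> [('ca', 'iti')]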
Socio-Cognitive and Affective Computing
Social cognition focuses on how people process, store, and apply information about other people and social situations, and on the role that cognitive processes play in social interactions. The term cognitive computing, on the other hand, generally refers to new hardware and/or software that mimics the functioning of the human brain and helps to improve human decision-making; in this sense, it is a type of computing whose goal is to discover more accurate models of how the human brain/mind senses, reasons, and responds to stimuli. Socio-Cognitive Computing should be understood as a set of interdisciplinary theoretical frameworks, methodologies, methods, and hardware/software tools for modelling how the human brain mediates social interactions. In addition, Affective Computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects, a fundamental aspect of socio-cognitive neuroscience; it is an interdisciplinary field spanning computer science, electrical engineering, psychology, and cognitive science. Physiological Computing is a category of technology in which electrophysiological data recorded directly from human activity are used to interface with a computing device. This technology becomes even more relevant when computing can be integrated pervasively into everyday-life environments; thus, Socio-Cognitive and Affective Computing systems should be able to adapt their behavior according to the Physiological Computing paradigm. This book integrates proposals from researchers who use signals from the brain and/or body to infer people's intentions and psychological state in smart computing systems. The design of such systems combines knowledge and methods of ubiquitous and pervasive computing, as well as physiological data measurement and processing, with those of socio-cognitive and affective computing.
A shared-disk parallel cluster file system
Dissertation presented to obtain the degree of Doctor in Informatics at Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.
Today, clusters are the de facto cost-effective platform for both high performance computing (HPC) and IT environments. HPC and IT are quite different environments, and the differences include, among others, their choices of file systems and storage: HPC favours parallel file systems geared towards maximum I/O bandwidth, which are not fully POSIX-compliant and were devised to run on top of (fault-prone) partitioned storage; conversely, IT data centres favour both external disk arrays (to provide highly available storage) and POSIX-compliant file systems, either general-purpose or shared-disk cluster file systems (CFSs).
These specialised file systems perform very well in their target environments, provided that applications do not require certain lateral features: parallel file systems offer no file locking, and CFSs offer no high-performance writes over cluster-wide shared files. In brief, none of the above approaches solves the problem of providing high levels of reliability and performance to both worlds.
Our pCFS proposal is a contribution to changing this situation: the rationale is to take advantage of the best of both, namely the reliability of cluster file systems and the high performance of parallel file systems. We do not claim to provide the absolute best of each, but we aim at full POSIX compliance, a rich feature set, and levels of reliability and performance good enough for broad usage, e.g., traditional as well as HPC applications, support of clustered DBMS engines that may run over regular files, and video streaming. pCFS' main ideas include:
· Cooperative caching, a technique that has been used in file systems for distributed disks but, as far as we know, has never been used either in SAN-based cluster file systems or in parallel file systems. As a result, pCFS may use all infrastructures (LAN and SAN) to move data.
· Fine-grain locking, whereby processes running on distinct nodes may define non-overlapping byte-range regions in a file (instead of locking the whole file) and access them in parallel, reading and writing over those regions at the infrastructure's full speed (provided that no major metadata changes are required).
A prototype was built on top of GFS (a Red Hat shared-disk CFS): GFS' kernel code was slightly modified, and two kernel modules and a user-level daemon were added. In the prototype, fine-grain locking is fully implemented, and a cluster-wide coherent cache is maintained through the movement of data (page fragments) over the LAN.
Our benchmarks with non-overlapping writers over a single file, shared among processes running on different nodes, show that pCFS' bandwidth is 2 times greater than NFS' while being comparable to that of the Parallel Virtual File System (PVFS), both requiring about 10 times more CPU. pCFS' bandwidth also surpasses GFS' (600 times for small record sizes, e.g., 4 KB, decreasing to 2 times for large record sizes, e.g., 4 MB), at about the same CPU usage.
Lusitania, Companhia de Seguros S.A.; Programa IBM Shared University Research (SUR)
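
The fine-grain locking idea can be illustrated with plain POSIX byte-range locks, which pCFS generalises cluster-wide: writers lock disjoint regions of a shared file and proceed in parallel instead of serialising on a whole-file lock. A minimal single-node sketch (the record size and path are assumptions; local fcntl locks stand in for pCFS' cluster-wide locks):

import fcntl
import os

RECORD_SIZE = 4096  # assumed fixed record size for the example

def write_region(path, index, payload):
    # Lock only the byte range [offset, offset + RECORD_SIZE), not the
    # whole file, so writers to disjoint regions do not serialise.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        offset = index * RECORD_SIZE
        fcntl.lockf(fd, fcntl.LOCK_EX, RECORD_SIZE, offset, os.SEEK_SET)
        try:
            os.pwrite(fd, payload.ljust(RECORD_SIZE, b"\0"), offset)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, offset, os.SEEK_SET)
    finally:
        os.close(fd)

# Non-overlapping writers (one per node, in pCFS' setting) proceed in parallel:
write_region("/tmp/shared.dat", 0, b"writer-A record")
write_region("/tmp/shared.dat", 1, b"writer-B record")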
Software similarity and classification
This thesis analyses software programs in the context of their similarity to other software programs. Applications proposed and implemented include detecting malicious software and discovering security vulnerabilities.