14 research outputs found

    wav2letter++: The Fastest Open-source Speech Recognition System

    Full text link
    This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition. We also show that wav2letter++'s training times scale linearly to 64 GPUs, the highest we tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and model tuning on new datasets and tasks

    Language Modeling Is Compression

    Full text link
    It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model

    Media gateway utilizando um GPU

    Get PDF
    Mestrado em Engenharia de Computadores e Telemátic

    Transferring Pareto Frontiers across Heterogeneous Hardware Environments

    Get PDF
    Configurable software systems provide options that affect functional and non-functional system properties. Selection of options forms system configurations with corresponding properties values. Nowadays configurable software systems are ubiquitously used by software developers, system administrators, and end users due to their inherent adaptability. However, this flexibility comes at a cost. Configuration space exponentially explodes with addition of new options, making exhaustive system analysis impossible. Nevertheless, many use cases require a deep understanding of a configurable system behavior, especially if the system is redeployed across heterogeneous hardware. This leads to the following research questions: (1) Is it possible to transfer configurable software property information across a collection of heterogeneous hardware? (2) Is it possible to transfer relative configurations optimality information across heterogeneous hardware? We address the first question by proposing an approach for transferring performance prediction models of configurable systems across heterogeneous hardware. This approach builds a predictor model to approximate performance on a source hardware, and a separate transferrer model to transfer approximated performance values to a destination hardware. Experiments on three configurable software systems across 23 heterogeneous hardware environments demonstrated high accuracy (less than 10% error) for all studied combinations. We address the second question by proposing an approach for approximation and transferring of Pareto frontiers of optimal configurations (based on multiple system properties) across heterogeneous hardware. This approach builds an individual predictor and transferrer for each analyzed system property. Using trained models, we can build an approximated Pareto frontier on a source hardware and transfer this frontier to a destination hardware. Experiments on five configurable systems across 34 heterogeneous hardware demonstrated feasibility of the approach and that the accuracy of a transferred frontier mainly depends on the approximation rather than the transferring process, while being linearly proportional to a training sample size

    Cruiser and PhoTable: Exploring Tabletop User Interface Software for Digital Photograph Sharing and Story Capture

    Get PDF
    Digital photography has not only changed the nature of photography and the photographic process, but also the manner in which we share photographs and tell stories about them. Some traditional methods, such as the family photo album or passing around piles of recently developed snapshots, are lost to us without requiring the digital photos to be printed. The current, purely digital, methods of sharing do not provide the same experience as printed photographs, and they do not provide effective face-to-face social interaction around photographs, as experienced during storytelling. Research has found that people are often dissatisfied with sharing photographs in digital form. The recent emergence of the tabletop interface as a viable multi-user direct-touch interactive large horizontal display has provided the hardware that has the potential to improve our collocated activities such as digital photograph sharing. However, while some software to communicate with various tabletop hardware technologies exists, software aspects of tabletop user interfaces are still at an early stage and require careful consideration in order to provide an effective, multi-user immersive interface that arbitrates the social interaction between users, without the necessary computer-human interaction interfering with the social dialogue. This thesis presents PhoTable, a social interface allowing people to effectively share, and tell stories about, recently taken, unsorted digital photographs around an interactive tabletop. In addition, the computer-arbitrated digital interaction allows PhoTable to capture the stories told, and associate them as audio metadata to the appropriate photographs. By leveraging the tabletop interface and providing a highly usable and natural interaction we can enable users to become immersed in their social interaction, telling stories about their photographs, and allow the computer interaction to occur as a side-effect of the social interaction. Correlating the computer interaction with the corresponding audio allows PhoTable to annotate an automatically created digital photo album with audible stories, which may then be archived. These stories remain useful for future sharing -- both collocated sharing and remote (e.g. via the Internet) -- and also provide a personal memento both of the event depicted in the photograph (e.g. as a reminder) and of the enjoyable photo sharing experience at the tabletop. To provide the necessary software to realise an interface such as PhoTable, this thesis explored the development of Cruiser: an efficient, extensible and reusable software framework for developing tabletop applications. Cruiser contributes a set of programming libraries and the necessary application framework to facilitate the rapid and highly flexible development of new tabletop applications. It uses a plugin architecture that encourages code reuse, stability and easy experimentation, and leverages the dedicated computer graphics hardware and multi-core processors of modern consumer-level systems to provide a responsive and immersive interactive tabletop user interface that is agnostic to the tabletop hardware and operating platform, using efficient, native cross-platform code. Cruiser's flexibility has allowed a variety of novel interactive tabletop applications to be explored by other researchers using the framework, in addition to PhoTable. To evaluate Cruiser and PhoTable, this thesis follows recommended practices for systems evaluation. The design rationale is framed within the above scenario and vision which we explore further, and the resulting design is critically analysed based on user studies, heuristic evaluation and a reflection on how it evolved over time. The effectiveness of Cruiser was evaluated in terms of its ability to realise PhoTable, use of it by others to explore many new tabletop applications, and an analysis of performance and resource usage. Usability, learnability and effectiveness of PhoTable was assessed on three levels: careful usability evaluations of elements of the interface; informal observations of usability when Cruiser was available to the public in several exhibitions and demonstrations; and a final evaluation of PhoTable in use for storytelling, where this had the side effect of creating a digital photo album, consisting of the photographs users interacted with on the table and associated audio annotations which PhoTable automatically extracted from the interaction. We conclude that our approach to design has resulted in an effective framework for creating new tabletop interfaces. The parallel goal of exploring the potential for tabletop interaction as a new way to share digital photographs was realised in PhoTable. It is able to support the envisaged goal of an effective interface for telling stories about one's photos. As a serendipitous side-effect, PhoTable was effective in the automatic capture of the stories about individual photographs for future reminiscence and sharing. This work provides foundations for future work in creating new ways to interact at a tabletop and to the ways to capture personal stories around digital photographs for sharing and long-term preservation

    Engineering systematic musicology : methods and services for computational and empirical music research

    Get PDF
    One of the main research questions of *systematic musicology* is concerned with how people make sense of their musical environment. It is concerned with signification and meaning-formation and relates musical structures to effects of music. These fundamental aspects can be approached from many different directions. One could take a cultural perspective where music is considered a phenomenon of human expression, firmly embedded in tradition. Another approach would be a cognitive perspective, where music is considered as an acoustical signal of which perception involves categorizations linked to representations and learning. A performance perspective where music is the outcome of human interaction is also an equally valid view. To understand a phenomenon combining multiple perspectives often makes sense. The methods employed within each of these approaches turn questions into concrete musicological research projects. It is safe to say that today many of these methods draw upon digital data and tools. Some of those general methods are feature extraction from audio and movement signals, machine learning, classification and statistics. However, the problem is that, very often, the *empirical and computational methods require technical solutions* beyond the skills of researchers that typically have a humanities background. At that point, these researchers need access to specialized technical knowledge to advance their research. My PhD-work should be seen within the context of that tradition. In many respects I adopt a problem-solving attitude to problems that are posed by research in systematic musicology. This work *explores solutions that are relevant for systematic musicology*. It does this by engineering solutions for measurement problems in empirical research and developing research software which facilitates computational research. These solutions are placed in an engineering-humanities plane. The first axis of the plane contrasts *services* with *methods*. Methods *in* systematic musicology propose ways to generate new insights in music related phenomena or contribute to how research can be done. Services *for* systematic musicology, on the other hand, support or automate research tasks which allow to change the scope of research. A shift in scope allows researchers to cope with larger data sets which offers a broader view on the phenomenon. The second axis indicates how important Music Information Retrieval (MIR) techniques are in a solution. MIR-techniques are contrasted with various techniques to support empirical research. My research resulted in a total of thirteen solutions which are placed in this plane. The description of seven of these are bundled in this dissertation. Three fall into the methods category and four in the services category. For example Tarsos presents a method to compare performance practice with theoretical scales on a large scale. SyncSink is an example of a service

    SMARTCART: Sistema de asistencia en supermercados

    Full text link
    [ES] Durante los últimos años hemos podido comprobar como nuestras tareas más cotidianas se han visto superpuestas por la tecnología. El proceso del avance del ser humano pasa actualmente por la digitalización de la información y la automatización de las actividades realizadas por éste. Se busca innovar y encontrar utilidades que revolucionen nuestra manera de vivir, pero en otros casos se procede a mejorar lo que ya tenemos. Es el caso de este proyecto. Se hizo un estudio de cuál podría ser un caso como el explicado anteriormente. El resultado fue el siguiente: Digitalizar la compra en un supermercado. En este trabajo se va a desarrollar una aplicación informática que contendrá todos los productos de un supermercado, con la información necesaria de cada producto (incluyendo su localización). El sistema estará acoplado a los carros del centro que a su vez dispondrán de un sistema de sensorización aplicado en la estructura para poder llevar el cálculo de la compra. Los carros recibirán el nuevo nombre de SMARTCART. Además, para hacer más eficiente la compra, el centro dispondrá de localizadores de posición con los que se habilitará un listado de productos dividido por zonas y se informará del cajero al que acudir en caso de finalización. Por último, estos localizadores llamados ¿beacons¿ (balizas electrónicas) enviarán mensajes a los SMARTCARTS de los clientes, como pueda ser, por ejemplo, nuevas ofertas disponibles.[EN] In recent years we have been able to see how our most daily tasks have been overlapped by technology. The process of human advancement currently goes through the digitization of information and the automation of the activities carried out by it. We seek to innovate and find utilities that revolutionize our way of life, but in other cases we proceed to improve what we already have. It is the case of this project. A study was made of what could be a case like the one explained above. The result was the following: Digitize a supermarket purchase. In this work, it will be developed a mobile app which will contain all the products of a supermarket, with the necessary information for each product (including its location). The system will be coupled to the trolleys of the center, which in turn will have a sensorization system applied to the structure in order to carry out the purchase calculation. The carts will be renamed SMARTCART. In addition, to make the purchase more efficient, the center will have position locators that will enable a list of products divided by zones and the cashier that the client must coming up will be informed to in case of completion. In conclusion, these locators called "beacons" (electronic beacons) will send messages to the SMARTCARTS, such as, for example, new offers available.Payá Silla, I. (2020). SMARTCART: Sistema de asistencia en supermercados. Universitat Politècnica de València. http://hdl.handle.net/10251/151983TFG

    MediaSync: Handbook on Multimedia Synchronization

    Get PDF
    This book provides an approachable overview of the most recent advances in the fascinating field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device, and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models, highlights ongoing research efforts, like hybrid broadband broadcast (HBB) delivery and users' perception modeling (i.e., Quality of Experience or QoE), and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many advances around mediasync have been devised and deployed, this area of research is getting renewed attention to overcome remaining challenges in the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this research area, its current relevance and the multiple disciplines it involves, the availability of a reference book on mediasync becomes necessary. This book fills the gap in this context. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space, from different perspectives. Mediasync: Handbook on Multimedia Synchronization is the perfect companion for scholars and practitioners that want to acquire strong knowledge about this research area, and also approach the challenges behind ensuring the best mediated experiences, by providing the adequate synchronization between the media elements that constitute these experiences

    Cruiser and PhoTable: Exploring Tabletop User Interface Software for Digital Photograph Sharing and Story Capture

    Get PDF
    Digital photography has not only changed the nature of photography and the photographic process, but also the manner in which we share photographs and tell stories about them. Some traditional methods, such as the family photo album or passing around piles of recently developed snapshots, are lost to us without requiring the digital photos to be printed. The current, purely digital, methods of sharing do not provide the same experience as printed photographs, and they do not provide effective face-to-face social interaction around photographs, as experienced during storytelling. Research has found that people are often dissatisfied with sharing photographs in digital form. The recent emergence of the tabletop interface as a viable multi-user direct-touch interactive large horizontal display has provided the hardware that has the potential to improve our collocated activities such as digital photograph sharing. However, while some software to communicate with various tabletop hardware technologies exists, software aspects of tabletop user interfaces are still at an early stage and require careful consideration in order to provide an effective, multi-user immersive interface that arbitrates the social interaction between users, without the necessary computer-human interaction interfering with the social dialogue. This thesis presents PhoTable, a social interface allowing people to effectively share, and tell stories about, recently taken, unsorted digital photographs around an interactive tabletop. In addition, the computer-arbitrated digital interaction allows PhoTable to capture the stories told, and associate them as audio metadata to the appropriate photographs. By leveraging the tabletop interface and providing a highly usable and natural interaction we can enable users to become immersed in their social interaction, telling stories about their photographs, and allow the computer interaction to occur as a side-effect of the social interaction. Correlating the computer interaction with the corresponding audio allows PhoTable to annotate an automatically created digital photo album with audible stories, which may then be archived. These stories remain useful for future sharing -- both collocated sharing and remote (e.g. via the Internet) -- and also provide a personal memento both of the event depicted in the photograph (e.g. as a reminder) and of the enjoyable photo sharing experience at the tabletop. To provide the necessary software to realise an interface such as PhoTable, this thesis explored the development of Cruiser: an efficient, extensible and reusable software framework for developing tabletop applications. Cruiser contributes a set of programming libraries and the necessary application framework to facilitate the rapid and highly flexible development of new tabletop applications. It uses a plugin architecture that encourages code reuse, stability and easy experimentation, and leverages the dedicated computer graphics hardware and multi-core processors of modern consumer-level systems to provide a responsive and immersive interactive tabletop user interface that is agnostic to the tabletop hardware and operating platform, using efficient, native cross-platform code. Cruiser's flexibility has allowed a variety of novel interactive tabletop applications to be explored by other researchers using the framework, in addition to PhoTable. To evaluate Cruiser and PhoTable, this thesis follows recommended practices for systems evaluation. The design rationale is framed within the above scenario and vision which we explore further, and the resulting design is critically analysed based on user studies, heuristic evaluation and a reflection on how it evolved over time. The effectiveness of Cruiser was evaluated in terms of its ability to realise PhoTable, use of it by others to explore many new tabletop applications, and an analysis of performance and resource usage. Usability, learnability and effectiveness of PhoTable was assessed on three levels: careful usability evaluations of elements of the interface; informal observations of usability when Cruiser was available to the public in several exhibitions and demonstrations; and a final evaluation of PhoTable in use for storytelling, where this had the side effect of creating a digital photo album, consisting of the photographs users interacted with on the table and associated audio annotations which PhoTable automatically extracted from the interaction. We conclude that our approach to design has resulted in an effective framework for creating new tabletop interfaces. The parallel goal of exploring the potential for tabletop interaction as a new way to share digital photographs was realised in PhoTable. It is able to support the envisaged goal of an effective interface for telling stories about one's photos. As a serendipitous side-effect, PhoTable was effective in the automatic capture of the stories about individual photographs for future reminiscence and sharing. This work provides foundations for future work in creating new ways to interact at a tabletop and to the ways to capture personal stories around digital photographs for sharing and long-term preservation

    Engineering and built environment project conference 2015: book of abstracts - Toowoomba, Australia, 21-25 September 2015

    Get PDF
    Book of Abstracts of the USQ Engineering and Built Environment Conference 2015, held Toowoomba, Australia, 21-25 September 2015. These proceedings include extended abstracts of the verbal presentations that are delivered at the project conference. The work reported at the conference is the research undertaken by students in meeting the requirements of courses ENG4111/ENG4112 Research Project for undergraduate or ENG8411/ENG8412 Research Project and Dissertation for postgraduate students
    corecore