    Bytes Are All You Need: Transformers Operating Directly On File Bytes

    Modern deep learning approaches usually transform inputs into a modality-specific form. For example, the most common deep learning approach to image classification involves decoding image file bytes into an RGB tensor which is passed into a neural network. Instead, we investigate performing classification directly on file bytes, without the need for decoding files at inference time. Using file bytes as model inputs enables the development of models which can operate on multiple input modalities. Our model, ByteFormer, achieves an ImageNet Top-1 classification accuracy of 77.33% when training and testing directly on TIFF file bytes using a transformer backbone with configuration similar to DeiT-Ti (72.2% accuracy when operating on RGB images). Without modifications or hyperparameter tuning, ByteFormer achieves 95.42% classification accuracy when operating on WAV files from the Speech Commands v2 dataset (compared to state-of-the-art accuracy of 98.7%). Additionally, we demonstrate that ByteFormer has applications in privacy-preserving inference. ByteFormer is capable of performing inference on particular obfuscated input representations with no loss of accuracy. We also demonstrate ByteFormer's ability to perform inference with a hypothetical privacy-preserving camera which avoids forming full images by consistently masking 90% of pixel channels, while still achieving 71.35% accuracy on ImageNet. Our code will be made available at https://github.com/apple/ml-cvnets/tree/main/examples/byteformer
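
    The idea of feeding file bytes to a model without decoding them can be illustrated with a minimal sketch. This is not ByteFormer's actual preprocessing: the helper names `bytes_to_tokens` and `make_wav_bytes`, the optional `max_len` truncation, and the tiny synthetic WAV file are all assumptions made for illustration. The sketch only shows that every byte of a file, header included, can serve as an integer token id in the range 0 to 255.

```python
import io
import wave
from typing import List, Optional

VOCAB_SIZE = 256  # one token id per possible byte value


def bytes_to_tokens(data: bytes, max_len: Optional[int] = None) -> List[int]:
    """Hypothetical helper: map raw file bytes to token ids in [0, 255].

    No audio or image decoding happens here; header bytes are tokens too.
    """
    tokens = list(data)  # iterating over bytes yields ints
    if max_len is not None:
        tokens = tokens[:max_len]  # assumed truncation for long files
    return tokens


def make_wav_bytes(num_frames: int = 8) -> bytes:
    """Build a tiny silent mono 16-bit WAV file in memory (illustration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * num_frames)
    return buf.getvalue()


wav_bytes = make_wav_bytes()
tokens = bytes_to_tokens(wav_bytes)
print(tokens[:4])  # the ASCII codes of the "RIFF" file signature
```

    In a real pipeline the token ids would be passed to an embedding layer with `VOCAB_SIZE` entries; that step is omitted here because the listing does not describe it.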

    Peripheral hearing loss reduces the ability of children to direct selective attention during multi-talker listening

    Restoring normal hearing requires knowledge of how peripheral and central auditory processes are affected by hearing loss. Previous research has focussed primarily on peripheral changes following sensorineural hearing loss, whereas consequences for central auditory processing have received less attention. We examined the ability of hearing-impaired children to direct auditory attention to a voice of interest (based on the talker’s spatial location or gender) in the presence of a common form of background noise: the voices of competing talkers (i.e. during multi-talker, or “Cocktail Party” listening). We measured brain activity using electro-encephalography (EEG) when children prepared to direct attention to the spatial location or gender of an upcoming target talker who spoke in a mixture of three talkers. Compared to normally-hearing children, hearing-impaired children showed significantly less evidence of preparatory brain activity when required to direct spatial attention. This finding is consistent with the idea that hearing-impaired children have a reduced ability to prepare spatial attention for an upcoming talker. Moreover, preparatory brain activity was not restored when hearing-impaired children listened with their acoustic hearing aids. An implication of these findings is that steps to improve auditory attention alongside acoustic hearing aids may be required to improve the ability of hearing-impaired children to understand speech in the presence of competing talkers

    Correction of evident falsehood requires explicit negation

    The danger of receiving false information is omnipresent, and people might be highly vigilant against being influenced by falsehoods. Yet, as research on misinformation reveals, people are often biased by false information, even when they know the valid alternative. The question is why? The current research explores the relative encoding strength of two opposing alternatives involved in the correction of falsehood: the false concept and the valid concept. These encoding strengths may be critical for what people remember and how they act upon receiving false information. We compared two triggers for the correction of falsehood—a sentence consisting of clearly false information (e.g., “honey is made by butterflies”) and a sentence consisting of an explicit negation of this information (e.g., “honey is not made by butterflies”). The general pattern of results from five experiments demonstrates that the valid concept (e.g., “bees”) exhibits a weaker presence in memory than the false concept (e.g., “butterflies”) following the comprehension of evidently false information as compared to its explicit negation. Thus, the current research provides an answer to the riddle of the persistence of false information: False information is less likely to be mentally corrected if it is not explicitly negated. Even when people detect that a sentence is false, they tend to focus on the false concept rather than on the valid concept. These findings shed new light on extant research and offer fresh insights about the processing of false information and related phenomena such as the reliance on misinformation

    Alleviating the Last Mile of Encoding: The mei-friend Package for the Atom Text Editor

    MEC 2021 BEST PAPER AWARD. Though MEI is widely used in music informatics and digital musicology research, the relative lack of authoring software and the specialised nature of its community have limited the availability of high-quality MEI encodings. Translating to MEI from other encoding formats, or generating MEI via optical music recognition processes, is thus a typical component of many MEI-project workflows. However, automated translations rarely achieve results of sufficient quality, a problem well-known in the community and documented in the literature. Final correction and validation by hand is therefore a common requirement. In this paper, we present mei-friend, an extension to the Atom text editor, which aims to reduce the degree of manual labour required in this process. The tool facilitates most common MEI editing tasks including the insertion and manipulation of MEI elements, makes the encoded score visible and interactively accessible to the user, and provides quality-of-life conveniences including keyboard shortcuts for editing functions as well as intelligent navigation of the MEI hierarchy. We detail the tool’s implementation, describe its functionalities, and evaluate its responsiveness during the editing process, even when editing very large MEI files

    Visual Representations of Norwegian Language Learners in Norwegian Second Language Textbooks.

    Master's thesis in Literacy studies. This thesis presents a study of how Norwegian language learners are visually represented in three Norwegian second language (NSL) textbooks produced for adult learners of Norwegian language. Using social actor analysis and critical visual literacy, the study investigates whether images presenting Norwegian language learners portray them as potential members of a culturally diverse Norway, or as “exotic” and “other”. The study focuses on aspects of otherizing, stereotyping and power relations between groups of people expressed through visual discourse in the NSL textbooks. The purpose of the study is to examine whether there are any patterns in the visual representation of Norwegian language learners in the textbooks. Drawing on the assumptions of critical discourse analysis that social processes influence the modes and content of visual representations, the study connects these patterns to the socio-political situation in present Norway. The Theory of Recognition is drawn upon to further the analysis with regard to whether the images represent Norwegian language learners with recognition of their agency and potential for Norwegian society. The study also investigates to what degree the images of the Norwegian language learners promote or contradict the primary aims of the Norwegian language program for adults stipulated in The Introduction Law. The results of the study indicate that there is a tendency in the three NSL textbooks examined to show Norwegian language learners and representatives of Norwegian society as social, cultural and biological strangers. Notably, the strategy of otherizing is apparent in the visual images of the representatives of Norwegian society, who are portrayed as separate from the Norwegian language learners viewing the textbook. Additionally, the comparative analysis between the image corpora of Norwegian language learners versus representatives of Norwegian society reveals a tendency to portray Norwegian language learners in less powerful positions than representatives of Norwegian society. Consequently, the study shows that images in the selected NSL textbooks may be indicative of social ideologies and can potentially transfer an unintended hidden curriculum to Norwegian language learners that they hold lower social status and are separate from the Norwegian mainstream

    Economic Trends in Enterprise Search Solutions

    Enterprise search technology retrieves information within organizations. This information can be proprietary or public, and access to it may be restricted or not. Enterprise search solutions render business processes more efficient, particularly in data-intensive companies. This technology is key to increasing the competitiveness of the digital economy; thus it constitutes a strategic market for the European Union. The Enterprise Search Solution (ESS) market was worth close to one billion USD in 2008 and is expected to grow more quickly than the overall market for information and knowledge management systems. Optimistic market forecasts expect market size to exceed 1,200 million USD by the end of 2010. Other market analyses see the growth rate slowing down and stabilizing at around 10% a year in 2010. Even in the least favourable case, enterprise search remains an attractive market, particularly because of the opportunities expected to arise from the convergence of ESS and Information Systems. This report looks at the demand and supply side of ESS and provides data about the market. It presents the evolution of market dynamics over the past decade and describes the current situation. Our main thesis is that ESS is currently placed at the point where two established markets, namely web search and the management of information systems, overlap. The report offers evidence that these two markets are converging and discusses the role of the different stakeholders (providers of web search engines, enterprise resource management tools, pure enterprise search tools, etc.) in this changing context. JRC.DDG.J.4-Information Societ

    Music Encoding Conference Proceedings

    UIDB/00693/2020 UIDP/00693/2020