
    Implicit Language Model in LSTM for OCR

    Neural networks have become the technique of choice for OCR, but many aspects of how and why they deliver superior performance are still unknown. One key difference between current neural network techniques using LSTMs and the previous state-of-the-art HMM systems is that HMM systems have a strong independence assumption. In comparison, LSTMs have no explicit constraints on the amount of context that can be considered during decoding. In this paper we show that LSTMs learn an implicit language model (LM) and attempt to characterize the strength of that LM in terms of equivalent n-gram context. We show that this implicitly learned language model provides a 2.4% CER improvement on our synthetic test set when compared against a test set of random characters (i.e. not naturally occurring sequences), and that the LSTM learns to use up to 5 characters of context (roughly 88 frames in our configuration). We believe that this is the first attempt at characterizing the strength of the implicit LM in LSTM-based OCR systems.
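
    A minimal sketch (not the authors' code) of the comparison described above: compute the character error rate (CER) on a natural-text test set and on a random-character test set, and report the gap, which is the quantity the paper attributes to the implicit LM. The edit-distance helper and the toy (reference, hypothesis) pairs below are placeholders.

    def edit_distance(ref: str, hyp: str) -> int:
        """Levenshtein distance between two strings."""
        prev = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, start=1):
            curr = [i]
            for j, h in enumerate(hyp, start=1):
                curr.append(min(prev[j] + 1,              # deletion
                                curr[j - 1] + 1,          # insertion
                                prev[j - 1] + (r != h)))  # substitution
            prev = curr
        return prev[-1]

    def cer(pairs):
        """CER = total edit distance / total reference characters."""
        errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
        chars = sum(len(ref) for ref, _ in pairs)
        return errors / chars

    # Toy (ground truth, OCR output) pairs; in the paper these would come from
    # running the LSTM recognizer on natural-text and random-character lines.
    natural_pairs = [("the quick brown fox", "the quick brown fox")]
    random_pairs = [("xq zvk trw pnl", "xq zvk trw pnI")]

    # A positive gap means natural text is recognized better, i.e. the implicit LM helps.
    gap = cer(random_pairs) - cer(natural_pairs)
    print(f"CER gap attributable to the implicit LM: {gap:.3%}")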

    Translation-Enhanced Multilingual Text-to-Image Generation

    Research on text-to-image generation (TTI) still predominantly focuses on the English language due to the lack of annotated image-caption data in other languages; in the long run, this might widen inequitable access to TTI technology. In this work, we thus investigate multilingual TTI (termed mTTI) and the current potential of neural machine translation (NMT) to bootstrap mTTI systems. We provide two key contributions. 1) Relying on a multilingual multi-modal encoder, we provide a systematic empirical study of standard methods used in cross-lingual NLP when applied to mTTI: Translate Train, Translate Test, and Zero-Shot Transfer. 2) We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate the multilingual text knowledge within the mTTI framework, mitigating the language gap and thus improving mTTI performance. Our evaluations on the standard mTTI datasets COCO-CN, Multi30K Task2, and LAION-5B demonstrate the potential of translation-enhanced mTTI systems and also validate the benefits of the proposed EnsAd, which delivers consistent gains across all datasets. Further investigations of model variants, ablation studies, and qualitative analyses provide additional insights into the inner workings of the proposed mTTI approaches. Comment: ACL 2023 (Main Conference).
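
    As a hedged illustration of the "learn to weigh and consolidate" idea behind EnsAd (not the paper's implementation; module and tensor names are invented for this sketch), a tiny PyTorch adapter that scores the encodings of a caption and its translations and fuses them into a single text vector:

    import torch
    import torch.nn as nn

    class EnsembleAdapter(nn.Module):
        """Fuses encodings of a caption and its NMT translations into one vector."""
        def __init__(self, dim: int, hidden: int = 128):
            super().__init__()
            self.scorer = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                        nn.Linear(hidden, 1))

        def forward(self, encodings: torch.Tensor) -> torch.Tensor:
            # encodings: (batch, n_translations, dim) from a multilingual text encoder
            weights = torch.softmax(self.scorer(encodings), dim=1)  # (batch, n, 1)
            return (weights * encodings).sum(dim=1)                 # (batch, dim)

    # Example: fuse the source caption plus two machine translations.
    fused = EnsembleAdapter(dim=512)(torch.randn(4, 3, 512))
    print(fused.shape)  # torch.Size([4, 512])

    In such a setup the fused vector would condition the text-to-image generator in place of a single-language encoding, which is the role the abstract describes for EnsAd.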

    Collaboration in electronic resource provision in university libraries: SHEDL, a Scottish case study

    This case study examines the growth of collaboration among Scottish higher education institutions. Following a summary of the work of the Scottish Confederation of University and Research Libraries (SCURL), more detailed information is provided on collaboration in the fields of acquisition, licensing, selection, and purchasing. Some of the UK background is outlined, relating to NESLi2 in particular, in order to illuminate the options within Scotland. The origins of negotiations on electronic resource provision within Scotland are described, drawing on developments in other countries including Ireland and Scandinavia. After initial setbacks, the implementation of the Scottish Higher Education Digital Library (SHEDL) from 2007 to 2009 is detailed. Current benefits arising from SHEDL are explained, and some possible future developments are discussed.

    Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data

    Scaling up weakly supervised datasets has been shown to be highly effective in the image-text domain and has contributed to most of the recent state-of-the-art computer vision and multimodal neural networks. However, existing large-scale video-text datasets and mining techniques suffer from several limitations, such as the scarcity of aligned data, the lack of diversity in the data, and the difficulty of collecting aligned data. The currently popular video-text mining approach based on automatic speech recognition (ASR), used in HowTo100M, provides low-quality captions that often do not refer to the video content. Other mining approaches do not provide proper language descriptions (video tags) and are biased toward short clips (alt text). In this work, we show how recent advances in image captioning allow us to pre-train high-quality video models without any parallel video-text data. We pre-train several video captioning models that are based on an OPT language model and a TimeSformer visual backbone. We fine-tune these networks on several video captioning datasets. First, we demonstrate that image captioning pseudolabels work better for pre-training than the existing HowTo100M ASR captions. Second, we show that pre-training on both images and videos produces a significantly better network (+4 CIDEr on MSR-VTT) than pre-training on a single modality. Our methods are complementary to the existing pre-training or data mining approaches and can be used in a variety of settings. Given the efficacy of the pseudolabeling method, we are planning to publicly release the generated captions.
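
    A hedged sketch of the pseudolabeling recipe described above (not the authors' pipeline): sample video frames and caption them with an off-the-shelf image captioner, then use the captions as weak video-text supervision. The captioning checkpoint below is an illustrative public model, not the OPT/TimeSformer setup from the paper.

    import cv2
    from PIL import Image
    from transformers import pipeline

    # Any image captioning model works here; this checkpoint is just an example.
    captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

    def pseudo_captions(video_path: str, every_n_frames: int = 30):
        """Yield (frame_index, caption) pairs for uniformly sampled frames."""
        cap = cv2.VideoCapture(video_path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % every_n_frames == 0:
                rgb = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                yield idx, captioner(rgb)[0]["generated_text"]
            idx += 1
        cap.release()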

    Smart Sensor Demonstration Payload

    Sensors are a critical element of any monitoring, control, and evaluation process, such as those needed to support ground-based rocket engine testing. Sensor applications involve tens to thousands of sensors; their reliable performance is critical to achieving overall system goals. Many figures of merit are used to describe and evaluate sensor characteristics, such as sensitivity and linearity. In addition, sensor selection must satisfy many trade-offs among system engineering (SE) requirements to best integrate sensors into complex systems [1]. These SE trades include the familiar constraints of power, signal conditioning, cabling, reliability, and mass, and now include considerations such as spectrum allocation and interference for wireless sensors. Our group at NASA's John C. Stennis Space Center (SSC) works in the broad area of integrated systems health management (ISHM). Core ISHM technologies include smart and intelligent sensors, anomaly detection, root cause analysis, prognosis, and interfaces to operators and other system elements [2]. Sensor technologies are the base fabric that feeds data and health information to higher layers. Cost-effective operation of the complement of test stands benefits from technologies and methodologies that contribute to reductions in labor costs, improvements in efficiency, reductions in turn-around times, improved reliability, and other measures. ISHM is an active area of development at SSC because it offers the potential to achieve many of those operational goals [3-5].

    Inequality in human development across the globe

    The Human Development Index is the world's most famous indicator of the level of development of societies. A disadvantage of this index, however, is that only national values are available, whereas within many countries there is huge subnational variation in income, health, and education. Here we present the Subnational Human Development Index (SHDI), which shows within-country variation in human development and its dimension indices for over 1600 regions within 160 countries. The newly observed variation is particularly strong in low- and middle-developed countries (home to 70% of the world population) but less important in the most developed ones. While education disparities explain most of the SHDI inequality within low-developed countries, income differences are increasingly responsible for SHDI inequality within more highly developed countries. The new SHDI opens the possibility of studying global socio-economic change with unprecedented coverage and detail, increasing the ability of policy-makers to monitor the Sustainable Development Goals.
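
    For context, the standard aggregation behind this index family (and, to the best of our understanding, the one the SHDI applies at the regional level; the authors' exact normalization bounds may differ) rescales each dimension to an index and takes the geometric mean of the three dimension indices:

    \[
      I_{\text{dim}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \qquad
      \text{(S)HDI} = \bigl(I_{\text{health}} \cdot I_{\text{education}} \cdot I_{\text{income}}\bigr)^{1/3}
    \]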

    Safety and activity of varlilumab, a novel and first-in-class agonist anti-CD27 antibody, for hematologic malignancies.

    CD27, a costimulatory molecule on T cells, induces intracellular signals mediating cellular activation, proliferation, effector function, and cell survival upon binding to its ligand, CD70. Varlilumab, a novel, first-in-class, agonist immunoglobulin G1 anti-CD27 antibody, mediates antitumor immunity and direct killing of CD27+ tumor cells in animal models. This first-in-human dose-escalation and expansion study evaluated varlilumab in patients with hematologic malignancies. Primary objectives were to assess safety and the maximum tolerated and optimal biologic doses of varlilumab. Secondary objectives were to evaluate pharmacokinetics, pharmacodynamics, immunogenicity, and antitumor activity. In a 3 + 3 dose-escalation design, 30 patients with B-cell (n = 25) or T-cell (n = 5) malignancies received varlilumab (0.1, 0.3, 1, 3, or 10 mg/kg IV) as a single dose with a 28-day observation period, followed by weekly dosing (4 doses per cycle, up to 5 cycles, depending on tumor response). In an expansion cohort, 4 additional patients with Hodgkin lymphoma received varlilumab at 0.3 mg/kg every 3 weeks (4 doses per cycle, up to 5 cycles). No dose-limiting toxicities were observed. Treatment-related adverse events, generally grade 1 to 2, included fatigue, decreased appetite, anemia, diarrhea, and headache. Exposure was linear and dose-proportional across dose groups and resulted in increases in proinflammatory cytokines and soluble CD27. One patient with stage IV Hodgkin lymphoma experienced a complete response and remained in remission at >33 months with no further anticancer therapy. These data support further investigation of varlilumab for hematologic malignancies, particularly in combination approaches targeting nonredundant immune regulating pathways. This trial was registered at www.clinicaltrials.gov as #NCT01460134.

    Education Can Compensate for Society - a Bit

    In this paper I reflect on the findings of a number of loosely related research projects undertaken with colleagues over the last ten years. Their common theme is equity, in formal education and beyond it in wider family and social settings, with inequity expressed as the stratification of a variety of educational outcomes. The projects are based on a standard mixture of pre-existing records, official documents, large-scale surveys, observations, interviews, and focus groups. The numeric data were largely used to create biographical models of educational experiences, and the in-depth data were used to try to explain individual decisions and disparities at each stage of the model. Data have been collected for England and Wales, for five other countries of the European Union, and for Japan. A meta-view of these various findings suggests that national school intakes tend to be at least moderately segregated by prior attainment and socio-economic factors, and that learning outcomes as assessed by formal means, such as examinations, are heavily stratified by these same factors. There is no convincing evidence that compulsory schooling does very much to overcome the initial disparity in the resources and attainment of school intakes. On the other hand, there are indications that the nature of a national school system and the social experiences of young people in schools can begin to equalise educational outcomes as more widely envisaged, including learning to trust and willingness to help others, aspirations, and attitudes to continuing in education and training. The cost-free implications of the argument in this paper, if accepted, are that everything possible should be done to make school intakes comprehensive, and that explicit consideration, by teachers and leaders, of the applied principles of equity could reduce potentially harmful misunderstandings in educational contexts.

    AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

    In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks. In particular, we train a 20-billion-parameter multilingual seq2seq model called Alexa Teacher Model (AlexaTM 20B) and show that it achieves state-of-the-art (SOTA) performance on 1-shot summarization tasks, outperforming a much larger 540B PaLM decoder model. AlexaTM 20B also achieves SOTA in 1-shot machine translation, especially for low-resource languages, across almost all language pairs supported by the model (Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu) on the Flores-101 dataset. We also show that, in the zero-shot setting, AlexaTM 20B outperforms GPT-3 (175B) on the SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, Paws-X, and XWinograd. Overall, our results present a compelling case for seq2seq models as a powerful alternative to decoder-only models for large-scale language model (LLM) training.
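
    A hedged sketch of the 1-shot prompting setup described above, using the Hugging Face seq2seq API; the checkpoint is a small stand-in model, not AlexaTM 20B, and the prompt format is illustrative rather than the paper's exact template.

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_name = "google/flan-t5-small"  # stand-in seq2seq LM for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # One demonstration (the "1-shot") followed by the test input.
    prompt = (
        "Translate English to German.\n"
        "English: The weather is nice today. German: Das Wetter ist heute schön.\n"
        "English: Where is the train station? German:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))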