
    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    CamemBERT: a Tasty French Language Model

    Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available models have either been trained on English data or on the concatenation of data in multiple languages. This makes practical use of such models, in all languages except English, very limited. In this paper, we investigate the feasibility of training monolingual Transformer-based language models for other languages, taking French as an example and evaluating our language models on part-of-speech tagging, dependency parsing, named entity recognition and natural language inference tasks. We show that the use of web crawled data is preferable to the use of Wikipedia data. More surprisingly, we show that a relatively small web crawled dataset (4GB) leads to results that are as good as those obtained using larger datasets (130+GB). Our best performing model CamemBERT reaches or improves the state of the art in all four downstream tasks. Comment: ACL 2020 long paper. Web site: https://camembert-model.f

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-tanks and workshops, this document reports on the state of the art related to multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    FM 34-54, Technical Intelligence, 30 January 1998

    This manual defines and describes the technical intelligence mission. It names the key technical intelligence organizations involved at the national level and their interrelationships and responsibilities. The manual describes in detail the technical intelligence organizations and operations in a US command in the field. It discusses at length the responsibilities of the key staff sections in the command. It has extensive appendices explaining forms and procedures used by forces in the field. It has an excellent list of acronyms and glossary. Comment by the depositor: A clear understanding of the evolution of technical intelligence may not be needed by the intended audience of this manual, and as far as I know, no comprehensive history exists. However, supposed historical facts included in an official manual ought to be true. This manual fails in that respect. For example, this paragraph on Page 1-5 is utter nonsense: "Following the Korean War, the United States did not disband its TECHINT capability completely, as had been done at the conclusion of all previous hostilities. But neither did we maintain it at its wartime level. Three small TECHINT detachments remained in place at the Army's research and development centers. By 1962 two of the detachments merged to form the Army's Foreign Science and Technology Center. The third detachment established the Missile Intelligence Agency at Redstone Arsenal. The Surgeon General also operated a Medical Intelligence Center at Fort Detrick, MD." At the end of World War II, technical intelligence staffs remained in the offices of the heads of the seven Army Technical Services. Between then and the creation of the Army Foreign Science and Technology Center, many of those staffs were converted into special purpose intelligence agencies, as is documented by DA General Orders. The first such agency, the Signal Corps Intelligence Agency, was established at Washington, DC, according to Sec. IV, DA GO 39, 18 Aug 49, before the beginning of the Korean War. Paragraph VIII of DA GO 57, 1962, established the Army Foreign Science and Technology Center and transferred the functions, personnel, records, and equipment of the Chemical Corps, Ordnance Corps, Signal Corps, Transportation Corps, and Quartermaster intelligence agencies to it. In addition, Corps of Engineers technical intelligence activities which had been housed in the Army Map Service were transferred to it. The intelligence section in the office of the commanding general of the Army Missile Command was not recognized as an official intelligence production agency until much later in the 1960s. The Medical Information and Intelligence Agency, which was not at Ft Detrick, was not affected by the reorganization of the Army intelligence activities outlined in Department of the Army Reorganization Planning Directive 381-2, 18 May 1962, which is available in the UNL Digital Commons at: http://digitalcommons.unl.edu/usarmyresearch/169/ In fact, the intelligence organization which was formed in the Office of the Surgeon General went through a complicated series of reorganizations before it became the National Center for Medical Intelligence at Ft Detrick. Robert L Bolin, Associate Professor Emeritus, UNL Librarie

    One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis

    When learning a new skill, you take advantage of your preexisting skills and knowledge. For instance, if you are a skilled violinist, you will likely have an easier time learning to play cello. Similarly, when learning a new language you take advantage of the languages you already speak. For instance, if your native language is Norwegian and you decide to learn Dutch, the lexical overlap between these two languages will likely benefit your rate of language acquisition. This thesis deals with the intersection of learning multiple tasks and learning multiple languages in the context of Natural Language Processing (NLP), which can be defined as the study of computational processing of human language. Although these two types of learning may seem different on the surface, we will see that they share many similarities. The traditional approach in NLP is to consider a single task for a single language at a time. However, recent advances allow for broadening this approach, by considering data for multiple tasks and languages simultaneously. This is an important approach to explore further as the key to improving the reliability of NLP, especially for low-resource languages, is to take advantage of all relevant data whenever possible. In doing so, the hope is that in the long term, low-resource languages can benefit from the advances made in NLP which are currently to a large extent reserved for high-resource languages. This, in turn, may then have positive consequences for, e.g., language preservation, as speakers of minority languages will be under less pressure to use high-resource languages. In the short term, answering the specific research questions posed should be of use to NLP researchers working towards the same goal. Comment: PhD thesis, University of Groninge