    Text Segmentation Using Exponential Models

    This paper introduces a new statistical approach to partitioning text automatically into coherent segments. Our approach enlists both short-range and long-range language models to help it sniff out likely sites of topic changes in text. To aid its search, the system consults a set of simple lexical hints it has learned to associate with the presence of boundaries through inspection of a large corpus of annotated data. We also propose a new probabilistically motivated error metric for use by the natural language processing and information retrieval communities, intended to supersede precision and recall for appraising segmentation algorithms. Qualitative assessment of our algorithm as well as evaluation using this new metric demonstrate the effectiveness of our approach in two very different domains, Wall Street Journal articles and the TDT Corpus, a collection of newswire articles and broadcast news transcripts. Comment: 12 pages, LaTeX source and postscript figures for EMNLP-2 paper.

    Multimedia information technology and the annotation of video

    The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, overload of data will cause lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning.

    Access to recorded interviews: A research agenda

    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    A Mixed Data-Based Deep Neural Network to Estimate Leaf Area Index in Wheat Breeding Trials

    Remote and non-destructive estimation of leaf area index (LAI) has been a challenge in the last few decades, as the direct and indirect methods available are laborious and time-consuming. The recent emergence of high-throughput plant phenotyping platforms has increased the need to develop new phenotyping tools for better decision-making by breeders. In this paper, a novel model based on artificial intelligence algorithms and nadir-view red, green, blue (RGB) images taken from a terrestrial high-throughput phenotyping platform is presented. The model mixes numerical data collected in a wheat breeding field and visual features extracted from the images to make rapid and accurate LAI estimations. Model-based LAI estimations were validated against LAI measurements determined non-destructively using an allometric relationship obtained in this study. The model performance was also compared with LAI estimates obtained by other classical indirect methods based on bottom-up hemispherical images and gap fraction theory. Model-based LAI estimations were highly correlated with ground-truth LAI. The model performance was slightly better than that of the hemispherical image-based method, which tended to underestimate LAI. These results show the great potential of the developed model for near real-time LAI estimation, which can be further improved in the future by increasing the dataset used to train the model.