8 research outputs found

    Video Classification: A Literature Survey

    At present, a vast number of videos are available from many sources, but viewers want videos that match their interests, which has motivated work on video classification. This paper surveys the video classification literature. There are mainly three approaches by which video classification can be performed, based on features derived from three different modalities: audio, text, and visual. Classification is performed on these features, and the different approaches are then compared. The advantages and disadvantages of each approach/method are described in this paper, together with appropriate applications.
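The multimodal setup described in this survey can be sketched as a weighted late-fusion step over per-class scores from the three modalities. This is only a minimal illustration; the class names, scores, and fusion weights below are all hypothetical:

```python
import numpy as np

def fuse_modalities(audio_scores, text_scores, visual_scores,
                    weights=(0.2, 0.3, 0.5)):
    """Combine per-class scores from the three modalities by weighted averaging."""
    stacked = np.stack([audio_scores, text_scores, visual_scores])
    w = np.asarray(weights).reshape(-1, 1)
    return (stacked * w).sum(axis=0)

# Hypothetical per-class scores for the classes [sports, news, music]
audio = np.array([0.1, 0.2, 0.7])
text = np.array([0.2, 0.6, 0.2])
visual = np.array([0.7, 0.2, 0.1])

fused = fuse_modalities(audio, text, visual)
label = ["sports", "news", "music"][int(np.argmax(fused))]
```

Giving the visual modality the largest weight is a common heuristic, but in practice the weights would be tuned on validation data.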

    Viznotes – Visual Summaries for videos

    This project presents Viznotes, a method for visually summarizing TED-like videos. The Viznotes interface provides a structured yet organic summarization of a video's contents. Derived from the concepts of sketchnoting, it represents segments of the video as sketch-like images, with segment summaries and keywords arranged in a pre-defined template and certain elements showing chronology and relations. Viznotes also provides an interface for navigating the videos, and it further enables users to customize the summary into a more personal visual one. Tools such as sketching, sketch components, and screen-image representation help users leverage additional note-taking functionality.

    Automatic categorization and summarization of documentaries

    In this paper, we propose automatic categorization and summarization of documentaries using the subtitles of videos. We propose two methods for video categorization. The first performs unsupervised categorization by applying natural language processing techniques to video subtitles, using the WordNet lexical database and WordNet domains. The second has the same extraction steps but uses a learning module to categorize. Experiments with documentary videos give promising results in discovering the correct categories of videos. We also propose a video summarization method using the subtitles of videos and text summarization techniques. Significant sentences in the subtitles of a video are identified using these techniques, and a video summary is then composed by finding the video parts corresponding to these summary sentences. © 2010 The Author(s)
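The unsupervised domain-voting idea can be sketched as follows; the tiny lexicon here is a hand-coded stand-in for WordNet domains, and all word-to-domain entries are hypothetical:

```python
from collections import Counter

# Toy stand-in for a WordNet-domains lookup: word -> domain label (hypothetical)
DOMAIN_LEXICON = {
    "lion": "zoology", "habitat": "zoology", "predator": "zoology",
    "galaxy": "astronomy", "orbit": "astronomy", "telescope": "astronomy",
}

def categorize(subtitle_text):
    """Map subtitle words to domains and vote for the most frequent domain."""
    words = subtitle_text.lower().split()
    votes = Counter(DOMAIN_LEXICON[w] for w in words if w in DOMAIN_LEXICON)
    return votes.most_common(1)[0][0] if votes else "unknown"

category = categorize("The lion stalks its prey across the habitat")
```

A real implementation would look up domains via the WordNet database (e.g. through NLTK) after lemmatization and stop-word removal rather than through a fixed dictionary.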

    Political-advertisement video classification using deep learning methods

    Today’s digital world contains vast multimedia content: images, audio, and video. The availability of huge video datasets has encouraged researchers to design video classification techniques that group videos into categories of interest. One topic of interest to political scientists is the automated classification of a video advertisement as a political campaign ad or not. Recent years have seen a plethora of deep learning-based methods for image and video classification; these methods learn a feature representation from the training data along with the classification model. We investigate the effectiveness of three recent deep learning-based video classification techniques for political video advertisement classification. The best technique among the three yields an accuracy of 80%. In this thesis, we further improve the classification accuracy by combining the results of classifying text features with those of the best deep learning method we studied. Our method achieves a classification accuracy of 91%.
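The combination step described above can be sketched as a late fusion of the deep video model's score with the text-feature score for the "political ad" class. The probabilities and the weighting parameter below are illustrative, not taken from the thesis:

```python
def combine(video_prob, text_prob, alpha=0.6):
    """Weighted late fusion of the deep video model's probability and the
    text classifier's probability for the 'political ad' class.
    alpha weights the video model; both inputs are in [0, 1]."""
    return alpha * video_prob + (1 - alpha) * text_prob

# Hypothetical case: the video model is unsure, the text features are confident
p = combine(0.55, 0.95)
is_political = p >= 0.5
```

Score averaging is one simple fusion rule; stacking a small classifier on top of both scores is a common alternative.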

    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time and cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM), and supply chain procurement processes. This thesis uses a cross-disciplinary approach involving computer science and operational management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed with selected companies.

    The expert system developed in this thesis focuses on two distinct areas of research: text/object detection and text extraction. For text/object detection, the Faster R-CNN model was analysed. While this model yields outstanding results in object detection, it is limited by poor performance when image quality is low. The Generative Adversarial Network (GAN) model is proposed in response to this limitation: a generator network implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bounding boxes.

    For text extraction from the bounding boxes, a novel data extraction framework was designed, consisting of various processes: XML processing (when an existing OCR engine is used), bounding box pre-processing, text clean-up, OCR error correction, spell check, type check, pattern-based matching, and finally a learning mechanism for automating future data extraction. Whichever fields the system can extract successfully are provided in key-value format.

    The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks, and a rule-based engine is later used to extract the relevant data. While this methodology is robust, the companies surveyed were not satisfied with its accuracy and sought new, optimized solutions. To confirm the results, the engines were used to return XML-based files with the identified text and metadata. The output XML data was then fed into the new system for information extraction. This system uses the existing OCR engine and a novel, self-adaptive, learning-based OCR engine based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London that holds expertise in reducing its clients' procurement costs. This data was fed into our system to obtain a deeper level of spend classification and categorisation.

    This helped the company reduce its reliance on human effort and allowed for greater efficiency compared with performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools. The intention behind the development of this novel methodology was twofold: first, to develop and test a novel solution that does not depend on any specific OCR technology; second, to increase information extraction accuracy over that of existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. The newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimizing SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
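The pattern-based matching stage with key-value output can be sketched as a set of field regexes applied to OCR text. The field names, patterns, and sample invoice text below are all hypothetical; the real framework layers spell check, type check, and learned templates on top of this step:

```python
import re

# Hypothetical patterns for common invoice fields
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*[:\-]?\s*(\w+)", re.I),
    "date": re.compile(r"Date\s*[:\-]?\s*(\d{2}/\d{2}/\d{4})", re.I),
    "total": re.compile(r"Total\s*[:\-]?\s*\$?([\d,]+\.\d{2})", re.I),
}

def extract_fields(ocr_text):
    """Return whichever fields match, in key-value format."""
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            result[field] = match.group(1)
    return result

sample = "Invoice No: INV1042\nDate: 12/03/2021\nTotal: $1,254.00"
fields = extract_fields(sample)
```

Fields that do not match are simply absent from the result, mirroring the thesis's behaviour of emitting only successfully extracted fields.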

    Design of Hardware CNN Accelerators for Audio and Image Classification

    Ever wondered how the world was before the internet was invented? You might soon wonder how the world would survive without self-driving cars and Advanced Driver Assistance Systems (ADAS). The extensive research taking place in this rapidly evolving field is making self-driving cars more futuristic and reliable. The goal of this research is to design and develop hardware Convolutional Neural Network (CNN) accelerators for self-driving cars that can process audio and visual sensory information. The idea is to imitate a human brain that takes in audio and visual data while driving. Achieving a single die that can process both kinds of sensory information takes two different accelerators: one processes visual data from images captured by a camera, and the other processes audio information from audio recordings. The CNN algorithm is chosen to classify image and audio data: an Image CNN (ICNN) classifies images, and an Audio CNN (ACNN) classifies any sound of significance while driving. The two networks are designed from scratch and implemented in both software and hardware. The software implementation of the two networks uses the NumPy library of Python, while the hardware implementation is done in Verilog®. The ICNN is trained to classify between three classes of objects (cars, lights, and lanes), while the ACNN is trained to classify sirens from an emergency vehicle, vehicle horns, and speech.
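The core operation both accelerators implement is the convolution layer. A minimal NumPy sketch of a single-channel forward pass (using cross-correlation, as most CNN frameworks do) is shown below; the input and the edge-detecting kernel are illustrative, not taken from the thesis:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation) of a single-channel image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product of the kernel with the image patch under it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)       # toy 4x4 "image"
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])   # vertical-edge detector
feature_map = conv2d(img, edge_kernel)               # shape (3, 3)
```

A hardware accelerator replaces the two Python loops with parallel multiply-accumulate units, which is where the speedup over a software implementation comes from.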

    Integrating visual, audio and text analysis for news video

    No full text
    In this paper, we present a system developed for content-based browsing of broadcast news video by home users. Three main factors distinguish our work from similar efforts. First, we integrate image and audio analysis results in identifying news segments. Second, we use video OCR technology to detect text in frames, which provides a good source of textual information for story classification when transcripts and closed captions are not available. Finally, natural language processing (NLP) technologies are used to perform automated categorization of news stories based on the text obtained from closed captions or the video OCR process. Based on these video structure and content analysis technologies, we have developed two advanced video browsers for home users: an intelligent highlight player and an HTML-based video browser.
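The first distinguishing factor, combining image and audio cues to identify news segments, can be sketched as follows. Here a segment boundary is declared where a visual shot change coincides with a nearby audio silence; the timestamps and the tolerance are hypothetical, not the paper's actual rule:

```python
def news_segments(shot_boundaries, silence_points, tolerance=1.0):
    """Mark a news-segment boundary where a shot change (from image
    analysis) is accompanied by a nearby silence (from audio analysis).
    All times are in seconds; the tolerance is a tunable threshold."""
    return [s for s in shot_boundaries
            if any(abs(s - q) <= tolerance for q in silence_points)]

shots = [12.0, 47.5, 90.2, 130.0]   # shot changes detected in the frames
silences = [11.4, 89.8]             # pauses detected in the audio track
boundaries = news_segments(shots, silences)
```

Requiring agreement between the two modalities suppresses spurious boundaries that either cue alone would produce, which is the point of the integration.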