37,006 research outputs found

    Web Content Extraction Techniques: A survey

    As technology grows every day and the volume of research in various fields rises exponentially, the amount of information published on the World Wide Web grows in a similar fashion. Along with this rise in useful information comes an increasing amount of irrelevant content, termed 'noise', published in the form of advertisements, links, scrollers, and the like. Systems are therefore being developed for data pre-processing and cleaning in real-time applications. These systems also support analysis systems such as social network mining, web mining, and data mining, whether for real-time analysis or for special tasks such as false-advertisement detection, demand forecasting, and comment extraction from product and service reviews. For the web content extraction task, researchers have proposed many different methods, such as wrapper-based methods, DOM tree rule-based methods, and machine learning-based methods. This paper presents a comparative study of four recently proposed methods for web content extraction. Each of these methods takes the traditional DOM tree rule-based method as its base and builds on it with other tools to deliver better results.
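    The DOM tree rule-based approach the survey takes as its baseline can be sketched roughly as follows: tags commonly associated with boilerplate noise (scripts, navigation, ads) are pruned from the tree, and text from the remaining nodes is kept. This is a minimal illustrative sketch, not any of the surveyed methods; the tag list is an assumption.

    ```python
    from html.parser import HTMLParser

    # Illustrative set of "noise" tags; real systems use richer rules
    # (link density, class-name heuristics, visual block segmentation).
    NOISE_TAGS = {"script", "style", "nav", "aside", "footer", "iframe"}

    class ContentExtractor(HTMLParser):
        """Drop text inside noise subtrees; keep the rest."""
        def __init__(self):
            super().__init__()
            self._skip_depth = 0   # nesting depth of open noise tags
            self.chunks = []

        def handle_starttag(self, tag, attrs):
            if tag in NOISE_TAGS:
                self._skip_depth += 1

        def handle_endtag(self, tag):
            if tag in NOISE_TAGS and self._skip_depth:
                self._skip_depth -= 1

        def handle_data(self, data):
            if not self._skip_depth and data.strip():
                self.chunks.append(data.strip())

    def extract_text(html):
        parser = ContentExtractor()
        parser.feed(html)
        return " ".join(parser.chunks)
    ```

    Running the extractor on a page with a navigation bar and an inline script keeps only the article text, which is the pre-processing step the downstream mining systems mentioned above depend on.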

    Hybridizing metric learning and case-based reasoning for adaptable clickbait detection

    The term clickbait usually names web content that is specifically designed to maximize advertisement monetization, often at the expense of quality and accuracy. The rapid proliferation of this type of content has motivated researchers to develop automatic detection methods that can effectively block clickbait in different application domains. In this paper, we introduce a novel clickbait detection method. Our approach leverages state-of-the-art techniques from the fields of deep learning and metric learning, integrating them into the Case-Based Reasoning methodology. This gives the model the ability to learn over time, adapting to different users' criteria. Our experimental results also show that the proposed approach outperforms previous clickbait detection methods by a large margin.
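    The retrieve-and-retain loop of Case-Based Reasoning that gives the model its learn-over-time behaviour can be sketched in a few lines, assuming a learned metric has already mapped headlines to embedding vectors. The embeddings, labels, and k value below are illustrative assumptions, not the paper's actual components.

    ```python
    import math

    def euclidean(a, b):
        """Distance in the (assumed pre-learned) embedding space."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def classify(query, case_base, k=3):
        """CBR 'retrieve/reuse': majority vote over the k nearest stored cases.
        case_base is a list of (embedding, label) pairs; label 1 = clickbait."""
        nearest = sorted(case_base, key=lambda case: euclidean(query, case[0]))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if votes * 2 > k else 0

    def retain(case_base, query, true_label):
        """CBR 'retain': store a newly confirmed case, so the model adapts
        to a particular user's criteria over time."""
        case_base.append((query, true_label))
    ```

    Because new cases are simply appended to the case base, a correction from one user immediately influences that user's future classifications without retraining the embedding network — the adaptability the abstract highlights.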

    Multimodal Content Analysis for Effective Advertisements on YouTube

    The rapid advances in e-commerce and Web 2.0 technologies have greatly increased the impact of commercial advertisements on the general public. As a key enabling technology, a multitude of recommender systems exist that analyze user features and browsing patterns to recommend appealing advertisements to users. In this work, we seek to study the attributes that characterize an effective advertisement and recommend a useful set of features to aid the design and production processes of commercial advertisements. We analyze the temporal patterns in the multimedia content of advertisement videos, including auditory, visual, and textual components, and study their individual roles and synergies in the success of an advertisement. The objective of this work is then to measure the effectiveness of an advertisement, and to recommend a useful set of features to advertisement designers to make it more successful and approachable to users. Our proposed framework employs the signal processing technique of cross-modality feature learning, where data streams from different components are used to train separate neural network models and are then fused together to learn a shared representation. Subsequently, a neural network model trained on this joint feature embedding representation is utilized as a classifier to predict advertisement effectiveness. We validate our approach using subjective ratings from a dedicated user study, the sentiment strength of online viewer comments, and a viewer opinion metric of the ratio of the Likes and Views received by each advertisement from an online platform.
    Comment: 11 pages, 5 figures, ICDM 201
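    The fusion step the abstract describes — per-modality networks producing embeddings that are joined into a shared representation feeding a classifier — can be sketched at its simplest as concatenation plus a logistic head. The vectors and weights below are illustrative placeholders, not trained parameters from the paper.

    ```python
    import math

    def fuse(audio_vec, visual_vec, text_vec):
        """Join per-modality embeddings (each assumed to come from its own
        encoder network) into one joint representation by concatenation."""
        return audio_vec + visual_vec + text_vec  # list concatenation

    def predict_effectiveness(joint_vec, weights, bias=0.0):
        """Logistic classifier on the joint embedding: returns an assumed
        probability that the advertisement is effective."""
        z = sum(w * x for w, x in zip(weights, joint_vec)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    ```

    In the paper's framework the fusion is itself learned by a neural network rather than being plain concatenation, but the data flow — three modality streams in, one shared representation out, one effectiveness score — is the same.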

    Audio and video processing for automatic TV advertisement detection

    As a partner in the Centre for Digital Video Processing, the Visual Media Processing Group at Dublin City University conducts research and development in the area of digital video management. The current stage of development is demonstrated in our Web-based digital video system called Físchlár [1,2], which provides for efficient recording, analyzing, browsing and viewing of digitally captured television programmes. In order to make the browsing of programme material more efficient, users have requested the option of automatically deleting advertisement breaks. Our initial work on this task focused on locating ad-breaks by detecting the patterns of silent black frames which separate individual advertisements and/or complete ad-breaks on most commercial TV stations. However, not all TV stations use silent black frames to flag ad-breaks. We therefore decided to attempt to detect advertisements using the rate of shot cuts in the digitised TV signal. This paper describes the implementation and performance of both methods of ad-break detection.
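    The first method above — flagging runs of black frames as candidate ad-break boundaries — can be sketched with a luminance threshold and a minimum run length. Both thresholds and the flat-list frame representation are illustrative assumptions, not the values used in the paper (which also checks the audio track for silence).

    ```python
    BLACK_THRESHOLD = 16   # mean luminance (0-255) below which a frame is "black"
    MIN_RUN = 5            # consecutive black frames needed to flag a boundary

    def is_black(frame):
        """frame: flat list of 0-255 luminance samples for one video frame."""
        return sum(frame) / len(frame) < BLACK_THRESHOLD

    def find_ad_boundaries(frames):
        """Return the start index of every sufficiently long black-frame run."""
        boundaries, run_start = [], None
        for i, frame in enumerate(frames):
            if is_black(frame):
                if run_start is None:
                    run_start = i
            else:
                if run_start is not None and i - run_start >= MIN_RUN:
                    boundaries.append(run_start)
                run_start = None
        # close a run that extends to the end of the clip
        if run_start is not None and len(frames) - run_start >= MIN_RUN:
            boundaries.append(run_start)
        return boundaries
    ```

    The second method, shot-cut-rate detection, would instead compare successive frames and flag segments whose cut frequency is unusually high, which is why it still works on stations that do not insert black separator frames.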