    Machine learning based fusion algorithm to perform multimodal summarization

    Video summarization is a rapidly growing research field with applications in commercial and personal settings, driven by the massive surge in the amount of video data available today. The proposed approach uses ResNet-18, a convolutional neural network with eighteen layers, for feature extraction and, with the help of temporal interest proposals generated for the video sequences, generates a video summary. Existing methods do not address the problem of keeping the summary temporally consistent; the proposed work aims to create a temporally consistent summary. The classification and regression modules are implemented to get fixed-length inputs of the combined features. After this, the non-maximum suppression algorithm is applied to reduce redundancy and remove video segments with poor quality and low confidence scores. Video summaries are generated using the kernel temporal segmentation (KTS) algorithm, which converts a given video segment into video shots. The two standard datasets TVSum and SumMe are used to evaluate the proposed model. The F-scores obtained on the TVSum and SumMe datasets are 56.13 and 45.06, respectively.
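The non-maximum suppression step described in the abstract can be illustrated with a minimal sketch over temporal segments. This is not the authors' implementation: the segment representation `(start, end, score)`, the function names `temporal_iou` and `temporal_nms`, and the parameters `iou_threshold` and `min_score` are all assumptions made for illustration, since the abstract does not give these details.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end, confidence score) of a proposal.
Segment = Tuple[float, float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D temporal segments."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def temporal_nms(
    segments: List[Segment],
    iou_threshold: float = 0.5,  # assumed value, not taken from the paper
    min_score: float = 0.0,      # assumed value, not taken from the paper
) -> List[Segment]:
    """Greedy NMS: keep the highest-scoring segments, drop heavy overlaps."""
    # Discard low-confidence proposals, then visit the rest by descending score.
    candidates = sorted(
        (s for s in segments if s[2] >= min_score),
        key=lambda s: s[2],
        reverse=True,
    )
    kept: List[Segment] = []
    for seg in candidates:
        # Keep a segment only if it does not overlap too much with any kept one.
        if all(temporal_iou(seg, k) <= iou_threshold for k in kept):
            kept.append(seg)
    return kept


proposals = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7), (21.0, 29.0, 0.2)]
selected = temporal_nms(proposals, iou_threshold=0.5, min_score=0.3)
```

In this toy input, the second proposal overlaps heavily with the first and is suppressed, while the fourth falls below the assumed confidence cutoff, so two segments remain.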