Machine learning based fusion algorithm to perform multimodal summarization

https://doi.org/10.53730/ijhs.v6nS3.5411

Authors

  • Sunil S. Harakannanavar Department of Electronics and Communication Engineering, Nitte Meenakshi Institute of Technology, Yelahanka, Bangalore-560064, Karnataka, India
  • Vidyashree R. Kanabur Department of Electronics and Communication Engineering, National Institute of Technology, Surathkal, Mangalore, Karnataka, India
  • Veena I. Puranikmath Department of Electronics and Communication Engineering, S. G. Balekundri Institute of Technology, Shivabasav Nagar, Belagavi-590010, Karnataka, India

Keywords:

convolutional neural networks, kernel temporal segmentation, non-maximum suppression, redundancy, ResNet-18

Abstract

Video summarization is a rapidly growing research field, driven by the massive surge in available video data and its applications to a range of commercial and personal interests. The proposed approach uses ResNet-18, a convolutional neural network with eighteen layers, for feature extraction, and generates a video summary from temporal interest proposals computed over the video sequence. Existing methods do not address the problem of keeping the summary temporally consistent; the proposed work aims to create a temporally consistent summary. A classification and regression module is implemented to obtain fixed-length inputs from the combined features. A non-maximum suppression algorithm is then applied to reduce redundancy and to remove video segments with poor quality and low confidence scores. Video summaries are generated using the kernel temporal segmentation (KTS) algorithm, which partitions a given video segment into video shots. The proposed model is evaluated on two standard datasets, TVSum and SumMe, achieving F-scores of 56.13 and 45.06 respectively.
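The non-maximum suppression step described in the abstract can be sketched for 1-D temporal segments as follows. This is a minimal illustration, not the paper's implementation: the segment boundaries, scores, IoU threshold, and minimum-score cutoff below are illustrative assumptions.

```python
def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def nms_1d(segments, scores, iou_thresh=0.5, min_score=0.1):
    """Keep high-confidence, non-overlapping segment indices.

    Low-confidence proposals (score < min_score) are discarded first,
    mirroring the abstract's removal of low-confidence-score segments;
    the remaining proposals are greedily selected in score order,
    suppressing any segment that overlaps an already-kept one.
    """
    order = sorted(
        (i for i, s in enumerate(scores) if s >= min_score),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in order:
        if all(temporal_iou(segments[i], segments[j]) < iou_thresh for j in kept):
            kept.append(i)
    return kept

# Example: two overlapping proposals, one distinct one, one low-score one.
segs = [(0, 10), (2, 12), (20, 30), (21, 29)]
scores = [0.9, 0.8, 0.7, 0.05]
print(nms_1d(segs, scores))  # → [0, 2]
```

The surviving segments would then be passed to a shot-level selection stage such as KTS-based segmentation, as the abstract describes.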



Published

01-04-2022

How to Cite

Harakannanavar, S. S., Kanabur, V. R., & Puranikmath, V. I. (2022). Machine learning based fusion algorithm to perform multimodal summarization. International Journal of Health Sciences, 6(S3), 671–683. https://doi.org/10.53730/ijhs.v6nS3.5411

Section

Peer Review Articles
