Machine learning-based fusion algorithm to perform multimodal summarization
Keywords:
convolutional neural networks, kernel temporal segmentation, non-maximum suppression, redundancy, ResNet-18

Abstract
Video summarization is a rapidly growing research field that finds application in a wide range of commercial and personal settings, owing to the massive surge in the amount of video data available in the modern world. The proposed approach uses ResNet-18, an eighteen-layer convolutional neural network, to extract frame features, and generates a video summary from temporal interest proposals produced over the video sequence. Existing methods do not address the problem of keeping the summary temporally consistent; the proposed work aims to produce a temporally consistent summary. Classification and regression modules are applied to the combined features to obtain fixed-length inputs. Next, the non-maximum suppression (NMS) algorithm is applied to reduce redundancy and remove video segments with poor quality and low confidence scores. Video summaries are then generated using the kernel temporal segmentation (KTS) algorithm, which partitions a given video segment into shots. The proposed model is evaluated on two standard datasets, TVSum and SumMe, obtaining F-scores of 56.13 and 45.06 respectively.
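The non-maximum suppression step described above can be sketched as a greedy pass over scored temporal segments: keep the highest-scoring proposal, then discard any remaining proposal that overlaps it too heavily. This is a minimal illustrative sketch, not the authors' implementation; the IoU threshold of 0.5 and the (start, end, score) tuple format are assumptions for illustration.

```python
def temporal_nms(proposals, iou_threshold=0.5):
    """Greedy non-maximum suppression over 1-D temporal segments.

    proposals: list of (start, end, score) tuples.
    Returns the kept proposals, highest score first.
    """
    # Sort by confidence score, highest first.
    ordered = sorted(proposals, key=lambda p: p[2], reverse=True)
    kept = []
    for start, end, score in ordered:
        suppress = False
        for ks, ke, _ in kept:
            # Temporal intersection-over-union between the two segments.
            inter = max(0.0, min(end, ke) - max(start, ks))
            union = (end - start) + (ke - ks) - inter
            if union > 0 and inter / union > iou_threshold:
                suppress = True
                break
        if not suppress:
            kept.append((start, end, score))
    return kept

# Example: the second proposal overlaps the first (IoU ≈ 0.67) and is dropped;
# the third is disjoint and survives.
print(temporal_nms([(0, 10, 0.9), (2, 12, 0.8), (20, 30, 0.7)]))
```

In the paper's pipeline, the segments that survive this filter are the confident, non-redundant temporal proposals that KTS then maps onto shot boundaries.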
Copyright (c) 2022 International journal of health sciences

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.