An approach to temporal concept localization in videos
Keywords:
Video Understanding, NLP, Video Segmentation, Bi-LSTM, Random Forest, Concept Localization

Abstract
Localizing moments in videos is a challenging task in computer science, aimed at providing faster search for video retrieval, query processing, and behavioral analysis. The process involves stages such as video understanding, video segmentation, query processing using NLP, and localization of the concepts in the video. Although there have been many attempts at video understanding in NLP and computer vision in recent years, they fail to cover the long untrimmed videos found in real-life scenarios. We propose a deep-learning-based solution that uses a Random Forest and a Bi-LSTM to localize labels in segments, along with the times at which those labels pertain to the segments. We use the YouTube-8M dataset, provided by YouTube in a Kaggle challenge, to train our frame-based model and then classify segments using sliding windows of size 5. Our approach provides a simple and robust way to model this problem and a starting point for tackling it at scale. Further improvements to the Bi-LSTM-based models and to Random Forest models with VLAD features should lead to better results.
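As an illustration of the segment-classification setup described above, the sketch below splits frame-level features into overlapping windows of size 5, the window size used in the abstract. The function name, stride, and the 1024-dimensional feature size are our assumptions for the example, not details taken from the paper; any per-window classifier (e.g. a Bi-LSTM or Random Forest) would then be applied to each window.

```python
import numpy as np

def sliding_windows(frame_features, window_size=5, stride=1):
    """Split a (num_frames, feature_dim) array of per-frame features
    into overlapping windows of `window_size` frames.

    Returns a (num_windows, window_size, feature_dim) array plus the
    start-frame index of each window, so a predicted label can be
    mapped back to a time span in the video.
    """
    num_frames = frame_features.shape[0]
    windows, starts = [], []
    for start in range(0, num_frames - window_size + 1, stride):
        windows.append(frame_features[start:start + window_size])
        starts.append(start)
    return np.stack(windows), starts

# Toy example: 12 frames of hypothetical 1024-d frame features
# (YouTube-8M provides per-frame visual/audio embeddings of a similar shape).
features = np.random.rand(12, 1024)
wins, starts = sliding_windows(features, window_size=5)
print(wins.shape)  # (8, 5, 1024): 8 overlapping 5-frame segments
print(starts)      # [0, 1, 2, 3, 4, 5, 6, 7]
```

Each window's start index times the frame rate gives the timestamp at which a classified concept begins, which is how a frame-based model can be reused for temporal localization.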
Copyright (c) 2022 International Journal of Health Sciences