A REVIEW ON MACHINE LEARNING ALGORITHMS ON HUMAN ACTION RECOGNITION

Ankush Rai; Jagadeesh Kannan R

doi:10.22159/ajpcr.2017.v10s1.19977

Authors

Ankush Rai, School of Computer Science & Engineering, VIT University, Chennai, Tamil Nadu, India.
Jagadeesh Kannan R, School of Computer Science & Engineering, VIT University, Chennai, Tamil Nadu, India.

Keywords:

Algorithms, computer vision, human activity recognition, event detection, activity analysis, video recognition

Abstract

Human action recognition is an important area of computer vision research. Its applications include surveillance systems, patient monitoring systems, and a variety of systems that involve interactions between people and electronic devices, such as human-computer interfaces. Most of these applications require automated recognition of abnormal or anomalous action states composed of multiple simple (or atomic) actions of persons. This study gives an overview of state-of-the-art research papers on human action recognition. Public datasets designed for evaluating recognition methods are also discussed, so that the results of several methodologies can be compared on these datasets. We examine both the approaches developed for basic human actions and those for abnormal action states. These methodologies are classified taxonomically by comparing the advantages and limitations of each. Space-time volume approaches and sequential approaches, which represent actions and recognize them directly from images, are discussed. Next, hierarchical recognition approaches for abnormal action states are introduced and compared. Statistical, syntactic, and description-based methodologies for hierarchical recognition are examined.
