Activity detection and recognition are crucial tasks in many industries, including surveillance and sports analytics. In this talk, we’ll provide an in-depth exploration of human activity understanding, covering the fundamentals of activity detection and recognition and the challenges of individual and group activity analysis. We’ll use examples from the sports domain, which provides a unique test bed because it requires analyzing activities that involve multiple people and the complex interactions among them. We’ll trace the evolution of the technology from early deep learning models to large-scale architectures, focusing on recent approaches such as graph neural networks, transformer-based models, spatial and temporal attention, and vision-language models, and we’ll discuss their strengths and shortcomings. We’ll also examine the computational and deployment challenges associated with dataset scale, annotation complexity, generalization and real-time implementation constraints. We’ll conclude by outlining open challenges and future research directions in activity detection and recognition.