
    Action Recognition using High-Level Action Units

    Vision-based human action recognition is the process of labeling image sequences with action labels. In this project, a model is developed for human activity detection that uses high-level action units to represent human activity. The training phase learns the model for action units and action classifiers; the testing phase uses the learned model for action prediction. Three components are used to classify activities: a new spatial-temporal descriptor, statistics of the context-aware descriptors, and suppression of noise in the action units. Human activities are represented by a set of intermediary concepts called action units, which are automatically learned from the training data. At the low level, we present a locally weighted word context descriptor to improve the traditional interest-point-based representation; the proposed descriptor incorporates neighborhood details effectively. At the high level, we introduce GNMF-based action units to bridge the semantic gap in activity representation. Moreover, we propose a new joint l2,1-norm based sparse model for action unit selection in a discriminative manner. Extensive experiments have been carried out to validate our claims and have confirmed our intuition that the action-unit-based representation is critical for modeling complex activities from videos. DOI: 10.17762/ijritcc2321-8169.16042
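    For readers unfamiliar with the l2,1-norm regularizer mentioned above, the sketch below illustrates the row-sparse selection effect such a penalty induces: minimizing a loss plus lambda * ||W||_{2,1} drives entire rows of W (here, one row per action unit) to zero. The function names and the proximal-update routine are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: the sum of the l2 norms of the rows of W.
    Penalizing it zeroes out entire rows, so each row (one action
    unit) is either kept or discarded as a whole."""
    return np.linalg.norm(W, axis=1).sum()

def prox_l21(W, t):
    """Proximal operator of t * ||W||_{2,1}: row-wise soft threshold.
    This is the update used inside proximal-gradient solvers for
    objectives of the form loss(W) + lambda * ||W||_{2,1}."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale

# Toy usage: after shrinkage, several rows vanish entirely,
# which is the discriminative "selection" effect.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))
print("row norms before:", np.linalg.norm(W, axis=1).round(2))
print("row norms after :", np.linalg.norm(prox_l21(W, 1.0), axis=1).round(2))
```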

    Representation and recognition of human actions in video

    Automated human action recognition plays a critical role in the development of human-machine communication, aiming for a more natural interaction between artificial intelligence and human society. Recent developments in technology have permitted a shift from traditional human action recognition performed in a well-constrained laboratory environment to realistic unconstrained scenarios. This advancement has given rise to new problems and challenges still not addressed by the available methods. The aim of this thesis is therefore to study innovative approaches that address the challenging problems of human action recognition from video captured in unconstrained scenarios. To this end, novel action representations, feature selection methods, fusion strategies and classification approaches are formulated.

    More specifically, a novel interest-point-based action representation is first introduced, which seeks to describe actions as clouds of interest points accumulated at different temporal scales. The idea behind this method consists of extracting holistic features from the point clouds and explicitly and globally describing the spatial and temporal action dynamics. Since the proposed clouds-of-points representation exploits alternative and complementary information compared to conventional interest-point-based methods, a more solid representation is then obtained by fusing the two representations, adopting a Multiple Kernel Learning strategy. The validity of the proposed approach in recognising actions from a well-known benchmark dataset is demonstrated, as well as the superior performance achieved by fusing representations.

    Since the proposed method appears limited by the presence of a dynamic background and fast camera movements, a novel trajectory-based representation is formulated. Different from interest points, trajectories can simultaneously retain motion and appearance information even in noisy and crowded scenarios. Additionally, they can handle drastic camera movements and support robust region-of-interest estimation. An equally important contribution is the proposed collaborative feature selection performed to remove redundant and noisy components. In particular, a novel feature selection method based on Multi-Class Delta Latent Dirichlet Allocation (MC-DLDA) is introduced. Crucially, to enrich the final action representation, the trajectory representation is adaptively fused with a conventional interest point representation. The proposed approach is extensively validated on different datasets, and the reported performances are comparable with the best state-of-the-art. The obtained results also confirm the fundamental contribution of both collaborative feature selection and adaptive fusion.

    Finally, the problem of realistic human action classification in very ambiguous scenarios is taken into account. In these circumstances, standard feature selection methods and multi-class classifiers appear inadequate due to sparse training sets, high intra-class variation and inter-class similarity. Thus, both the feature selection and classification problems need to be redesigned. The proposed idea is to iteratively decompose the classification task into subtasks and select the optimal feature set and classifier in accordance with the subtask context. To this end, a cascaded feature selection and action classification approach is introduced. The proposed cascade aims to classify actions by exploiting as much information as possible while simplifying the multi-class classification into a cascade of binary separations. Specifically, instead of separating multiple action classes simultaneously, the overall task is automatically divided into easier binary sub-tasks. Experiments have been carried out using challenging public datasets; the obtained results demonstrate that, with identical action representation, the cascaded classifier significantly outperforms standard multi-class classifiers.
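    To make the cascade idea concrete, here is a minimal sketch that decomposes a multi-class problem into a sequence of binary one-vs-remaining separations; samples accepted at a stage exit the cascade, and the rest flow to the next stage. The class ordering heuristic and the choice of linear SVMs are assumptions for illustration, not the thesis's actual cascade or feature selection.

```python
import numpy as np
from sklearn.svm import LinearSVC

class BinaryCascade:
    """Multi-class classification as a cascade of binary sub-tasks:
    stage i separates class order[i] from all classes still in play."""

    def __init__(self, order):
        self.order = list(order)   # class labels, e.g. easiest-to-separate first
        self.stages = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        for i, cls in enumerate(self.order[:-1]):
            in_play = np.isin(y, self.order[i:])          # unresolved classes only
            clf = LinearSVC(dual=False)
            clf.fit(X[in_play], (y[in_play] == cls).astype(int))
            self.stages.append((cls, clf))
        return self

    def predict(self, X):
        X = np.asarray(X)
        pred = np.full(len(X), self.order[-1], dtype=object)  # default: last class
        undecided = np.ones(len(X), dtype=bool)
        for cls, clf in self.stages:
            hit = undecided.copy()
            hit[undecided] = clf.predict(X[undecided]) == 1   # accepted at this stage
            pred[hit] = cls
            undecided &= ~hit
            if not undecided.any():
                break
        return pred
```

    In the thesis the binary sub-tasks are derived automatically and each sub-task gets its own feature set and classifier; the sketch fixes both for brevity.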

    LALM: Long-Term Action Anticipation with Language Models

    Understanding human activity is a crucial yet intricate task in egocentric vision, a field that focuses on capturing visual perspectives from the camera wearer's viewpoint. While traditional methods heavily rely on representation learning trained on extensive video data, there exists a significant limitation: obtaining effective video representations proves challenging due to the inherent complexity and variability of human activities. Furthermore, exclusive dependence on video-based learning may constrain a model's capability to generalize across long-tail classes and out-of-distribution scenarios. In this study, we introduce a novel approach for long-term action anticipation using language models (LALM), adept at addressing the complex challenges of long-term activity understanding without the need for extensive training. Our method incorporates an action recognition model to track previous action sequences and a vision-language model to articulate relevant environmental details. By leveraging the context provided by these past events, we devise a prompting strategy for action anticipation using large language models (LLMs). Moreover, we implement Maximal Marginal Relevance for example selection to facilitate in-context learning of the LLMs. Our experimental results demonstrate that LALM surpasses state-of-the-art methods in the task of long-term action anticipation on the Ego4D benchmark. We further validate LALM on two additional benchmarks, affirming its capacity for generalization across intricate activities with different sets of taxonomies. These results are achieved without task-specific fine-tuning.
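    The Maximal Marginal Relevance criterion mentioned above has a standard greedy form: each remaining candidate is scored by its relevance to the query minus its maximal similarity to the examples already chosen. Below is a generic sketch over embedding vectors; the cosine similarity and the lambda trade-off value are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def mmr_select(query_vec, cand_vecs, k, lam=0.5):
    """Greedy Maximal Marginal Relevance: choose k candidates that
    balance relevance to the query against redundancy with the
    already-selected set.
    score(d) = lam * sim(q, d) - (1 - lam) * max_{s in S} sim(d, s)
    """
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    relevance = [cos(query_vec, c) for c in cand_vecs]
    selected, remaining = [], list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max((cos(cand_vecs[i], cand_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected   # indices of the chosen in-context examples
```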

    Contextualizing action observation in the predictive brain: Causal contributions of prefrontal and middle temporal areas

    Context facilitates the recognition of forthcoming actions by pointing to which intention is likely to drive them. This intention is thought to be estimated in a ventral pathway linking the middle temporal gyrus (MTG) with frontal regions and to further impact the implementation of sensory predictions within the action observation network (AON). Additionally, when conflicting intentions are estimated from context, the dorsolateral prefrontal cortex (DLPFC) may bias action selection. However, direct evidence for the contribution of these areas to context-embedded action representations in the AON is still lacking. Here, we used a perturb-and-measure TMS approach to disrupt neural activity, separately in MTG and DLPFC, and subsequently measured cortico-spinal excitability while participants observed actions embedded in congruent, incongruent or ambiguous contexts. Context congruency was manipulated in terms of compatibility between the observed kinematics and the action goal suggested by the ensemble of objects depicted in the environment. In the control session (vertex), we found an early facilitation and a later inhibition for kinematics embedded in congruent and incongruent contexts, respectively. MTG stimulation altered the differential modulation of the M1 response to congruent vs. incongruent contexts, suggesting this area specifies prior representations about appropriate object graspability. Interestingly, all effects were abolished after DLPFC stimulation, highlighting its critical role in broader contextual modulation of AON activity. This work was supported by grants from the European Commission (MCSA-H2020-NBUCA, grant N. 656881), the Ministero Istruzione Universita' e Ricerca (Futuro In Ricerca, FIR 2012, Prot. N. RBFR12F0BD; to C.U.), and from Istituto di Ricovero e Cura a Carattere Scientifico ‘E. Medea’ (Ricerca Corrente 2014, Ministero Italiano della Salute; to C.U.)

    Efficient Video Transformers with Spatial-Temporal Token Selection

    Video transformers have achieved impressive results on major video recognition benchmarks, which, however, come at high computational cost. In this paper, we present STTS, a token selection framework that dynamically selects a few informative tokens in both the temporal and spatial dimensions conditioned on input video samples. Specifically, we formulate token selection as a ranking problem: the importance of each token is estimated through a lightweight scorer network, and only those with top scores are used for downstream computation. In the temporal dimension, we keep the frames that are most relevant to the action categories, while in the spatial dimension we identify the most discriminative region in the feature maps, without affecting the spatial context used in a hierarchical way in most video transformers. Since the decision of token selection is non-differentiable, we employ a perturbed-maximum based differentiable Top-K operator for end-to-end training. We mainly conduct extensive experiments on Kinetics-400 with a recently introduced video transformer backbone, MViT. Our framework achieves similar results while requiring 20% less computation. We also demonstrate that our approach is generic across different transformer architectures and video datasets. Code is available at https://github.com/wangjk666/STTS. Comment: Accepted by ECCV 2022.
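    The perturbed-maximum Top-K trick referenced above (in the spirit of differentiable perturbed optimizers) can be sketched as follows: the forward pass averages hard top-k masks of noise-perturbed scores, and the backward pass estimates the Jacobian from the same noise samples. The noise scale, sample count and autograd wiring below are illustrative assumptions, not STTS's exact implementation.

```python
import torch

class PerturbedTopK(torch.autograd.Function):
    """Differentiable Top-K via perturbed maximizers: a soft selection
    mask in the forward pass, a Monte-Carlo Jacobian estimate in the
    backward pass, so gradients reach the token scorer network."""

    @staticmethod
    def forward(ctx, scores, k, n_samples=100, sigma=0.05):
        # scores: (batch, n_tokens) importance from a lightweight scorer
        noise = torch.randn(n_samples, *scores.shape, device=scores.device)
        perturbed = scores.unsqueeze(0) + sigma * noise              # (s, b, n)
        idx = perturbed.topk(k, dim=-1).indices
        hard = torch.zeros_like(perturbed).scatter_(-1, idx, 1.0)    # hard masks
        ctx.save_for_backward(hard, noise)
        ctx.sigma, ctx.n_samples = sigma, n_samples
        return hard.mean(dim=0)                                      # soft mask (b, n)

    @staticmethod
    def backward(ctx, grad_out):
        hard, noise = ctx.saved_tensors
        # d(soft mask)/d(scores) ~= E[mask x noise^T] / sigma
        grad_scores = torch.einsum('bn,sbn,sbm->bm', grad_out, hard, noise)
        return grad_scores / (ctx.n_samples * ctx.sigma), None, None, None

# Toy usage: keep 3 of 8 tokens; the mask is differentiable end to end.
scores = torch.randn(2, 8, requires_grad=True)
mask = PerturbedTopK.apply(scores, 3)
mask.sum().backward()
print(mask.detach().round(decimals=2))
```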

    Recognition as a distinguishing criterion of IS journals

    The number of journals publishing information systems (IS) research has grown dramatically over the past few decades. This has resulted in an environment where authors have a wider choice of journals in which to place articles. Electronic journals are now as readily recognised by authorities as print journals. This paper provides firm evidence in support of the assertion that the number of journals publishing IS research has increased. The paper also examines the Australian context, where the selection of a journal in which to place an article is influenced by recognition from the Department of Education, Science and Training (DEST). In Australia, obtaining DEST recognition as a recognised research journal is not an onerous task, and yet a significant number of IS journals have not done this. Publishing in a DEST recognised journal is essential for Australian researchers to contribute to their organisation's research quantum and hence research funding. Attention is drawn to an increasing number of IS journals not recognised by DEST, and consequent action is recommended.

    Minimum-risk sequence alignment for the alignment and recognition of action videos

    Temporal alignment of videos is an important requirement of tasks such as video comparison, analysis and classification. In the context of action analysis and action recognition, the main guiding element for temporal alignment is the human actions depicted in the videos. While well-established alignment algorithms such as dynamic time warping are available, they still rely heavily on basic linear cost models and heuristic parameter tuning. Inspired by the success of the hidden Markov support vector machine for pairwise alignment of protein sequences, in this thesis we present a novel framework which combines the flexibility of a pair hidden Markov model (PHMM) with the effective parameter training of the structural support vector machine (SSVM). The framework extends the scoring function of the SSVM to capture the similarity of two input frame sequences and introduces suitable feature and loss functions. During learning, we leverage these loss functions for regularised empirical risk minimisation and effective parameter selection. We have carried out extensive experiments with the proposed technique (nicknamed EHMM-SSVM) against state-of-the-art algorithms such as dynamic time warping (DTW) and generalized canonical time warping (GCTW) on pairs of human actions from four well-known datasets. The results show that the proposed model outperforms the compared algorithms by a large margin in terms of alignment accuracy.

    In the second part of this thesis we employ our alignment approach to tackle the task of human action recognition in video. This task is highly challenging due to substantial variations in motion performance, recording settings and inter-personal differences. Most current research focuses on the extraction of effective features and the design of suitable classifiers. Conversely, in this thesis we tackle the problem with a dissimilarity-based approach, where classification is performed in terms of minimum distance from templates and the distance is based on the score of our alignment model, the EHMM-SSVM. In turn, the templates are chosen by means of prototype selection techniques from the available samples of each class. Experimental results over two popular human action datasets have shown that the proposed approach achieves an accuracy higher than many existing methods and comparable to a state-of-the-art action classification algorithm.
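    Dynamic time warping, the main baseline named above, aligns two sequences by minimising the cumulative frame-to-frame cost over monotone warping paths. The sketch below is the textbook DTW recursion under a Euclidean frame cost; it illustrates the baseline only, not the EHMM-SSVM model itself, and the toy feature dimensions are arbitrary.

```python
import numpy as np

def dtw(X, Y, dist=lambda a, b: float(np.linalg.norm(a - b))):
    """Classic dynamic time warping between frame sequences X and Y.
    Returns the minimal cumulative alignment cost; the warping path
    itself can be recovered by backtracking through D."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(X[i - 1], Y[j - 1])
            D[i, j] = c + min(D[i - 1, j],      # skip a frame of X
                              D[i, j - 1],      # skip a frame of Y
                              D[i - 1, j - 1])  # match the two frames
    return D[n, m]

# Toy usage: a "video" and a temporally stretched copy align cheaply.
rng = np.random.default_rng(1)
a = rng.normal(size=(12, 2))            # 12 frames of 2-D features
b = np.vstack([a[:6], a[5:]])           # frame 6 duplicated: 13 frames
print(round(dtw(a, b), 3))
```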

    How does community context influence coalitions in the formation stage? A multiple case study based on the Community Coalition Action Theory

    Background: Community coalitions are rooted in complex and dynamic community systems. Despite recognition that environmental factors affect coalition behavior, few studies have examined how community context impacts coalition formation. Using the Community Coalition Action Theory as an organizing framework, the current study employs multiple case study methodology to examine how five domains of community context affect coalitions in the formation stage of coalition development. The domains are: history of collaboration; geography; community demographics and economic conditions; community politics and history; and community norms and values.

    Methods: Data were from 8 sites that participated in an evaluation of a healthy cities and communities initiative in California. Twenty-three focus groups were conducted with coalition members, and 76 semi-structured interviews were conducted with local coordinators and coalition leaders. Cross-site analyses were conducted to identify the ways contextual domains influenced selection of the lead agency, coalition membership, staffing and leadership, and coalition processes and structures.

    Results: History of collaboration influenced all four coalition factors examined, from lead agency selection to coalition structure. Geography influenced coalition formation largely through membership and staffing, whereas the demographic and economic makeup of the community had an impact on coalition membership, staffing, and infrastructure for coalition processes. The influence of community politics, history, norms and values was most noticeable on coalition membership.

    Conclusions: Findings contribute to an ecologic and theory-based understanding of the range of ways community context influences coalitions in their formative stage.

    Cue-centric model of the fireground incident commander's decision making process

    Pattern recognition based models propose that in highly routine situations, FireGround Incident Commanders (FGCs) make decisions using experiences of past similar incidents (Klein et al, 1986), which are stored in memory as schemas (Klein et al, 2006). Due to the nonsystematic development of the schemas that guide pattern recognition (Beach & Mitchell, 1978) and the biases attached to pattern recognition (Tversky & Kahneman, 1974), this approach is the least favorable candidate for decision making in nonroutine situations. Nonroutine situations are characterized by failure to clearly recognize relevant past episodes (Bousfield & Sedgewick, 1944), deliberate avoidance of recalling past episodes (Jacoby et al, 1989), or time constraints and ambiguity of the information available for decision making. This research proposes that in nonroutine situations, FGCs rely on a thorough search and assessment of diagnostic, relevant, and important cues. Therefore, one aim of this research is to propose a model of the FGCs' decision making process for nonroutine situations; the model is based on the use of cues rather than the pattern recognition approach. This research also aims to provide a robust and coherent definition of the FGC's decision making process and subsequently specifies its structure and underlying phases. The context of the research is the decisions made by FGCs during large fires involving at least 5 fire appliances. 20 FGCs from 2 of the UK's large fire brigades, each with at least 7 years of experience in a command position, participated in fieldwork carried out over a period of 1 year. For data collection, multiple case studies in the form of critical incident reports were obtained from the participants, and each critical incident was explored further through semi-structured interviews. For data analysis, a theoretical (deductive) thematic approach and the process reconstruction method (Nutt, 1983) were used. Results indicate that the current definition of the term 'FGC's decision making process' is incomplete. The definition proposed in this research recognizes that each process of selecting and evaluating a course of action to solve a problem (Klein et al, 1986) is preceded by a process of identifying the problem. This definition is commensurate with the widely accepted definition of the decision making process proposed by Nutt (1984). This research also found that FGCs make decisions in 2 cyclic and distinguishable phases: the 'problem recognition' phase and the 'solution generation' phase. Finally, a cue-centric model of the FGC's decision making process is proposed. The model shows that in nonroutine situations, when pattern recognition fails to guide the decision making process, FGCs develop a mental model of the situation through a thorough search and assessment of valuable cues based on their diagnosticity, importance and relevance. The mental model assists in identifying problems and selecting a course of action to solve each problem. This research addresses the need for descriptive models to clarify issues arising in the areas of training, selection, and the development of decision support systems (Klein et al, 1986).