My research focuses on action recognition and understanding its links with language.
I gave a talk, Towards an Unequivocal Representation of Actions, at the BMVA Symposium "Robotics meets Semantics: Enabling Human-Level Understanding in Robots" on 18th July. Slides.
Two co-authors and I demoed EPIC at CVPR 2018.
I presented a poster of Towards an Unequivocal Representation of Actions at the Brave New Ideas for Video Understanding workshop at CVPR 2018.
We have just released the largest egocentric dataset for action and object recognition. More info can be found here.
We have released a short-form version of Towards an Unequivocal Representation of Actions on arXiv here.
Davide Moltisanti presented the poster for the paper at ICCV 2017.
Our paper SEMBED was presented at the first EPIC workshop during ECCV 2016.
This work introduces verb-only representations for actions and interactions: the problem of describing similar motions (e.g. 'open door', 'open cupboard') and distinguishing differing ones (e.g. 'open door' vs 'open bottle') using verb-only labels. Current approaches to action recognition neglect legitimate semantic ambiguities and class overlaps between verbs, relying instead on the objects to disambiguate interactions.
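To make the idea of verb ambiguity concrete, here is a minimal sketch (my own illustration, not the paper's method) of turning several annotators' verb choices for one clip into a soft distribution over verbs rather than a single hard label; the verbs and counts are hypothetical:

```python
from collections import Counter

def verb_distribution(annotations):
    """Turn multiple annotators' verb choices for one clip
    into a soft label distribution over verbs."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {verb: n / total for verb, n in counts.items()}

# e.g. three annotators describe the same 'open door' clip:
dist = verb_distribution(["open", "pull", "open"])
# dist == {"open": 2/3, "pull": 1/3}
```

A soft distribution like this preserves the legitimate overlap between verbs instead of forcing an arbitrary single choice.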
First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Our videos depict non-scripted daily activities: we simply asked each participant to start recording every time they entered their kitchen.
Manual annotations of temporal bounds for object interactions (i.e. start and end times) are typical training input to recognition, localization and detection algorithms. For three publicly available egocentric datasets, we uncover inconsistencies in ground truth temporal bounds within and across annotators and datasets. We systematically assess the robustness of state-of-the-art approaches to changes in labeled temporal bounds, for object interaction recognition.
We present SEMBED, an approach for embedding an egocentric object interaction video in a semantic-visual graph to estimate the probability distribution over its potential semantic labels.
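The flavour of estimating a label distribution from a visual graph can be sketched as follows; this is a simplified illustration under my own assumptions (similarity-weighted voting over nearest labelled neighbours), not SEMBED's actual formulation:

```python
import numpy as np

def label_distribution(query_feat, neighbour_feats, neighbour_labels):
    """Estimate a probability distribution over semantic labels for a
    query video by similarity-weighting its labelled neighbours."""
    feats = np.asarray(neighbour_feats, dtype=float)
    dists = np.linalg.norm(feats - np.asarray(query_feat, dtype=float), axis=1)
    weights = np.exp(-dists)          # closer neighbours count more
    weights /= weights.sum()
    dist = {}
    for label, w in zip(neighbour_labels, weights):
        dist[label] = dist.get(label, 0.0) + w
    return dist

# a query whose nearest neighbour is labelled 'open':
d = label_distribution([0.0, 0.0],
                       [[0.1, 0.0], [3.0, 0.0]],
                       ["open", "close"])
```

Returning a distribution rather than a single label is what lets the approach accommodate the semantic ambiguity discussed above.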
Unmanned Aerial Vehicles (UAVs) require complex control and significant experience for piloting. While these devices continue to improve, there is, as yet, no device that affords six degrees of freedom (6-DoF) control and directional haptic feedback. We present The Cage, a 6-DoF controller for piloting an unmanned aerial vehicle (UAV).
3-Month Internship
Whilst not working on my PhD I enjoy reading, primarily Science Fiction and Fantasy. Below are a few books/series I would recommend: