Behind the Scenes

To return home, click here.

This page collects "behind the scenes" notes on a few papers that I have worked on over the years. Hopefully you will find it interesting to gain some insight into the paper creation process, and into what we thought about during each project that could not be included in the paper itself.

Disclaimer

Note that everything below comes from my fallible memory and should not be taken as the ultimate source of truth!

Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings

This paper was technically started during my time at Naver Labs Europe with Gabriela and Diane, but (as with a lot of research) very little was kept from what I worked on initially. Whilst this was always going to be a video retrieval paper in some form, the idea to use Parts-of-Speech didn't come until after we decided not to submit to CVPR 2019.

Having worked on the verb and noun clusters on EPIC-Kitchens-55, as well as a previous work all about verbs, it might have seemed natural to disentangle the representations, yet it took until winter 2018 for the idea to form. From then on, the method fell into place quite quickly (though the TensorFlow implementation was anything but quick when running...). One of my favourite results in the paper is the maximum activation neuron figure (Fig. 4). The neurons shown were (of course) chosen from among the better ones, but there were a good number of (mostly) monosemantic neurons in both embedding spaces.

The MSR-VTT results came after Gabriela's initial suggestion, and were a lot quicker to set up than I had thought thanks to Antoine's easy-to-use codebase. They gave a nice complement to the story, given how different retrieval was on EPIC compared to other video retrieval settings.

Given the horrifically slow code, one of my first tasks after starting the Post-Doc was to recreate the code in PyTorch, making it more modern and removing a lot of the information tracking used for debugging/figures (a fun lockdown task!). I still remember my relief when the first run of the MMEN showed that the results were similar to the TensorFlow version. It was the fourth time I had rewritten the codebase as the project changed, so there was some precedent, but it is always good to confirm.

Learning Visual Actions Using Multiple Verb-Only Labels

This paper was published at BMVC 2019, yet work on it started around the end of 2016/early 2017. The previous work, SEMBED, really opened Pandora's box of how to deal with an open-set verb vocabulary, and so the natural next step was to explore this.

As you may have realised, there was almost a three-year gap between beginning the work and publication. The paper was rejected many times, largely because we needed the time to understand the problem (and perhaps for closed-set classification to lose some popularity...).

Whilst the idea didn't change much throughout development (i.e. the multiple labels in hard/soft settings across the datasets and the baselines), the framing did. It was only after a research visit to Naver Labs Europe (and helpful discussions with Gabriela Csurka and Diane Larlus) that certain aspects, such as the action retrieval setting, manner/result verbs, and the cross-dataset retrieval, really came about, tying the whole paper together. Another change since the initial version of the paper was using sigmoid cross-entropy instead of an L2 loss, as suggested by a reviewer. I remember going down a bit of a rabbit hole trying to understand theoretically why this should work better for the soft-assigned labels in a continuous setting, but the reviewer knew what they were talking about and it led to a nice improvement!

There was a lot of exploration that didn't make the cut for the final paper, and some of this was included in my thesis. One such thread was adding losses to incorporate information from word2vec and WordNet, but these never helped the training and were dropped from the final paper.

This paper ended up being a bit of a resilience test (though what isn't in research?), but whenever I presented this work the feedback was always positive, which gave me the drive to see the project through to the end. Finally, a huge thanks to Dima for her continued support during the project.
