This is my postdoctoral project. I am working on self-supervised learning in vision using contrastive learning, with foveated fixations serving as a natural augmentation method.
To that end, I designed my own active foveated system using the NVIDIA DALI library.
This is a fixation output from my foveated saccadic system.
Some of the big questions here are:
Is a foveated system essential for acquiring good visual representations in our visual system?
How does it affect our perception of the world and our production and processing of higher cognitive phenomena such as language?
Previous Related Research
This research is based on the work of Ting Chen et al., who systematically study the major components of their self-supervised contrastive learning framework. The authors show that:
- composition of data augmentations plays a critical role in defining effective predictive tasks,
- introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and
- contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
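To make the contrastive objective concrete, here is a minimal NumPy sketch of the normalized temperature-scaled cross-entropy (NT-Xent) loss used in that framework. It is an illustration, not the code of this project: the function name `nt_xent_loss`, the default temperature of 0.5, and the assumption that `z1[i]` and `z2[i]` are projection-head outputs for two views (for example, two fixations) of the same image are all choices made for this example.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]).

    z1, z2: arrays of shape (N, D), assumed to be projection-head outputs
    for two views of the same N images (hypothetical inputs).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit length -> cosine similarity
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # a sample is never its own negative

    # Index of the positive partner for each of the 2N rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    # Cross-entropy with the positive partner as the target class.
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

Every other sample in the batch acts as a negative for a given pair, which is one reason larger batches tend to help this kind of objective.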
My Project Status
I am currently testing the visual representations that the self-supervised contrastive learning algorithm learns from the foveated input. Specifically, I am measuring how different variations in the augmentations of my foveated system affect the performance of a linear classifier trained on the learned representations.
The code for this project lives in the following GitHub repository.
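The linear evaluation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the evaluation code of this project: `features` stands for representations already extracted from a frozen encoder, and the helper names `train_linear_probe` and `probe_accuracy`, as well as the plain gradient-descent softmax classifier, are hypothetical choices for the example.

```python
import numpy as np

def train_linear_probe(features, labels, num_classes, lr=0.1, epochs=200):
    """Fit a softmax linear classifier on frozen features with gradient descent.

    features: (N, D) array of representations from a frozen encoder (assumed given).
    labels:   (N,) integer class labels in [0, num_classes).
    """
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                          # d(loss)/d(logits)
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(W, b, features, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    preds = (features @ W + b).argmax(axis=1)
    return float((preds == labels).mean())
```

Because only `W` and `b` are trained while the encoder stays fixed, the accuracy of the probe reflects how linearly separable the learned representations are, so comparing it across augmentation variants isolates the effect of the augmentations.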