Active Foveated Vision

In this project I approach multimodal intelligence by means of active foveated vision.

This is my postdoctoral project. I am trying to achieve self-supervised learning in vision using contrastive learning, with foveated fixations serving as a natural augmentation method.

To that end, I designed my own active foveated system using the NVIDIA DALI library.

This is a fixation output from my foveated saccadic system.

The smaller the cropped section of the image, the higher its resolution. Just as in our retina, the wider the span, the lower the resolution. The highest resolution is found in a tiny region of the retina called the fovea.
This system mimics that phenomenon, so it has to scan each image through several fixations.
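
As a rough illustration of the crop-versus-resolution trade-off described above, here is a minimal NumPy sketch. It is not the DALI pipeline used in this project: the function name `foveated_fixation`, its parameters `out_size` and `num_levels`, and the choice of block averaging for downsampling are all assumptions made for the example. Each level doubles the width of the crop around the fixation point while keeping the same output size, so wider crops end up with coarser resolution.

```python
import numpy as np


def foveated_fixation(image, center, out_size=32, num_levels=3):
    """Return concentric crops around `center`, all resized to out_size x out_size.

    Hypothetical helper for illustration only. `image` is an (H, W, C) array,
    `center` is a (row, col) fixation point inside the image, and `out_size`
    must be even. Level k covers a square of out_size * 2**k pixels, so wider
    crops are stored at proportionally lower resolution.
    """
    cy, cx = center
    max_half = out_size * 2 ** (num_levels - 1) // 2
    # Pad with edge pixels so crops near the image border stay in bounds.
    padded = np.pad(
        image, ((max_half, max_half), (max_half, max_half), (0, 0)), mode="edge"
    )
    cy, cx = cy + max_half, cx + max_half

    levels = []
    for k in range(num_levels):
        factor = 2 ** k
        half = out_size * factor // 2
        crop = padded[cy - half:cy + half, cx - half:cx + half]
        # Downsample by averaging each factor x factor block of pixels.
        crop = crop.reshape(out_size, factor, out_size, factor, -1).mean(axis=(1, 3))
        levels.append(crop)
    # Shape: (num_levels, out_size, out_size, C); level 0 is the sharpest.
    return np.stack(levels)
```

Calling this function at two different fixation points on the same image would give two views of that image, which is the sense in which fixations can act as an augmentation.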

Some of the big questions here are:

  • Is this foveated system essential for acquiring good visual representations in our visual system?

  • How does it affect our perception of the world and our production and processing of higher cognitive phenomena such as language?

This research builds on the work of Ting Chen et al., in which the authors systematically study the major components of their self-supervised framework. The authors show that:

  1. composition of data augmentations plays a critical role in defining effective predictive tasks,
  2. introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and
  3. contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
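
To make the contrastive objective more concrete, here is a minimal NumPy sketch of a normalized temperature-scaled cross-entropy loss, the kind of contrastive loss used in the framework of Chen et al. It assumes the inputs are already the outputs of the nonlinear projection mentioned in point 2. The function name `nt_xent_loss` and the default temperature are choices made for this example, not values taken from this project.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Contrastive loss over a batch of paired projections.

    z_a[i] and z_b[i] are (hypothetical) projections of two views of the same
    image, for example two fixations. All other samples in the batch act as
    negatives.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never compared with itself

    # Row i's positive is the other view of the same image.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(
        np.exp(sim - sim_max).sum(axis=1, keepdims=True)
    )
    return -log_prob[np.arange(2 * n), pos].mean()
```

Because every other sample in the batch serves as a negative, a larger batch gives each positive pair more negatives to be contrasted against, which is one intuition behind point 3.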

My Project Status

I am currently testing the visual representations generated by the self-supervised contrastive learning algorithm when it receives foveated input. Specifically, I am testing how different variations in the augmentations of my foveated system affect the performance of a linear classifier trained on the self-supervised representations learned by my system.
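
The linear evaluation described above can be sketched as follows. This is a simplified stand-in, not the project's evaluation code: it assumes the representations have already been extracted and frozen as plain feature arrays, and it fits a softmax classifier with full-batch gradient descent. The name `linear_probe_accuracy` and the hyperparameters are hypothetical.

```python
import numpy as np


def linear_probe_accuracy(train_x, train_y, test_x, test_y, num_classes,
                          lr=0.1, epochs=200):
    """Fit a linear softmax classifier on frozen features; return test accuracy.

    train_x / test_x are (N, D) feature arrays and train_y / test_y are
    integer class labels. Only the linear layer (w, b) is trained.
    """
    w = np.zeros((train_x.shape[1], num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[train_y]

    for _ in range(epochs):
        logits = train_x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy with respect to the logits.
        grad = (probs - onehot) / len(train_x)
        w -= lr * train_x.T @ grad
        b -= lr * grad.sum(axis=0)

    preds = (test_x @ w + b).argmax(axis=1)
    return (preds == test_y).mean()
```

Running this probe on features learned under different augmentation settings, with everything else held fixed, gives one accuracy figure per setting to compare.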

I am running this project in containers on Cooley nodes at the ALCF.

The code for this project lives in the following GitHub repository.

Who else is participating?