Transfer learning takes a leap with $R^2$-Tuning.
$R^2$-Tuning is a transfer learning framework aimed at video temporal grounding (VTG). Instead of adding a separate temporal model, it reuses CLIP's own features for spatio-temporal modeling. Its core is a lightweight $R^2$ Block that progressively aggregates and refines spatial features drawn from earlier CLIP layers. As a result, it reports state-of-the-art results on three VTG tasks, without relying on extra backbones.
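To make the idea of gradually combining per-layer spatial features more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names (`pool_patches`, `aggregate_layers`, `frame_scores`), the fixed `gate` blending weight, the softmax pooling, and the choice to visit the last layer first are all assumptions made for illustration, and the real block uses learned parameters rather than fixed arithmetic.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pool_patches(patches, query):
    # Collapse one frame's patch features into a single vector,
    # weighting each patch by its similarity to the text query.
    weights = softmax([dot(p, query) for p in patches])
    dim = len(query)
    return [sum(w * p[d] for w, p in zip(weights, patches)) for d in range(dim)]

def aggregate_layers(layers, query, gate=0.5):
    # layers[k][t][p] is the feature vector of patch p in frame t at layer k.
    # Assumption: walk from the last layer back to the first, blending each
    # layer's pooled features into a running per-frame state.
    state = None
    for layer in reversed(layers):
        pooled = [pool_patches(frame, query) for frame in layer]
        if state is None:
            state = pooled
        else:
            state = [
                [gate * s + (1.0 - gate) * x for s, x in zip(s_frame, p_frame)]
                for s_frame, p_frame in zip(state, pooled)
            ]
    return state

def frame_scores(state, query):
    # One relevance score per frame: similarity of the fused feature to the query.
    return [dot(frame, query) for frame in state]

# Toy input: 2 layers, 2 frames, 2 patches per frame, 2-dimensional features.
toy_layers = [
    [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]],
    [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]],
]
toy_query = [1.0, 0.0]
toy_scores = frame_scores(aggregate_layers(toy_layers, toy_query), toy_query)
```

In this toy setup the first frame's features align with the query and the second frame's do not, so the first frame receives the higher score. A grounding head would then turn such per-frame scores into moment boundaries or highlight rankings.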
Read more: arXiv