Swin Transformer from Scratch
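Compared to the standard Vision Transformer (ViT), the Swin Transformer adds the ability to capture local and hierarchical features. This matters because CNNs have been very effective at capturing local features and composing them hierarchically, while ViT has been very effective at capturing global features. Swin Transformer's hybrid approach aims to combine the best of both worlds. It computes self-attention within local windows, shifting the windows between consecutive blocks so information can flow across window boundaries. It also progressively merges neighboring patches to build a hierarchical feature map.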
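The two ideas behind locality and hierarchy can be illustrated with tensor reshapes alone. Below is a minimal NumPy sketch under a few assumptions. It works on a single (H, W, C) feature map with no batch dimension. It leaves out the attention computation, the window shift, and the LayerNorm plus linear 4C to 2C projection that follows patch merging in Swin. The function names `window_partition` and `patch_merge` are chosen for illustration. This is not a reference implementation.

```python
import numpy as np


def window_partition(x: np.ndarray, window_size: int) -> np.ndarray:
    """Split an (H, W, C) feature map into non-overlapping square windows.

    Returns shape (num_windows, window_size * window_size, C): one token
    sequence per window, which is what local self-attention would operate on.
    """
    H, W, C = x.shape
    assert H % window_size == 0 and W % window_size == 0, "H and W must be divisible by window_size"
    # (nH, ws, nW, ws, C): separate the window index from the position inside the window
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    # (nH, nW, ws, ws, C): group the pixels of each window together
    x = x.transpose(0, 2, 1, 3, 4)
    # Flatten each window into a sequence of window_size**2 tokens
    return x.reshape(-1, window_size * window_size, C)


def patch_merge(x: np.ndarray) -> np.ndarray:
    """Concatenate every 2x2 neighborhood of patches: (H, W, C) -> (H/2, W/2, 4C).

    Swin follows this with LayerNorm and a linear layer mapping 4C -> 2C;
    that learned projection is omitted in this sketch.
    """
    x0 = x[0::2, 0::2]  # top-left patch of each 2x2 block
    x1 = x[1::2, 0::2]  # bottom-left
    x2 = x[0::2, 1::2]  # top-right
    x3 = x[1::2, 1::2]  # bottom-right
    return np.concatenate([x0, x1, x2, x3], axis=-1)


if __name__ == "__main__":
    # A 56x56 grid of 96-dim patch embeddings with 7x7 windows mirrors the
    # first stage of the Swin-T configuration (224x224 input, 4x4 patches).
    feats = np.random.rand(56, 56, 96)
    print(window_partition(feats, 7).shape)  # (64, 49, 96)
    print(patch_merge(feats).shape)          # (28, 28, 384)
```

With 7x7 windows, each token attends to only 49 others instead of all 3,136 tokens in the 56x56 grid. This is what makes the attention local and keeps its cost linear in image size. Patch merging halves the spatial resolution at each stage, which produces the CNN-like pyramid of feature maps.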