At the embedding level
- 1D tokens: truncate a patch suffix along the sequence dim
- 2D tokens: truncate patch suffixes along the embed dim as well
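A minimal sketch of both embedding-level truncations, assuming tokens are laid out as (batch, num_patches, embed_dim); the shapes and names here are illustrative, not from any specific implementation:

```python
import numpy as np

# Hypothetical token tensor: (batch, num_patches, embed_dim)
tokens = np.random.randn(2, 196, 768)

# 1D elasticity: drop a suffix of patch tokens along the sequence dim
keep_patches = 128
tokens_1d = tokens[:, :keep_patches, :]          # (2, 128, 768)

# 2D elasticity: also truncate each token along the embed dim
keep_dim = 384
tokens_2d = tokens[:, :keep_patches, :keep_dim]  # (2, 128, 384)
```

Both are plain slicing, so the cost of choosing a smaller token budget at inference time is negligible; the training recipe is what makes the truncated prefixes usable.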
At the network level (embed_dim, depth, etc.)
- EA-ViT - Efficient Adaptation for Elastic Vision Transformer
- MatFormer - Nested Transformer for Elastic Inference
- HydraViT - Stacking Heads for a Scalable ViT
- Scala - Slicing Vision Transformer for Flexible Inference
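The common idea behind these network-level methods is that smaller sub-models reuse prefixes of the full model's weights. A toy sketch of a nested (MatFormer-style) FFN, where the hidden width is sliced at inference time; the function and dimensions are hypothetical:

```python
import numpy as np

# Full FFN weights; sub-models slice a prefix of the hidden dimension.
d_model, d_ff_full = 64, 256
W1 = np.random.randn(d_model, d_ff_full)
W2 = np.random.randn(d_ff_full, d_model)

def nested_ffn(x, d_ff):
    """Run the FFN using only the first d_ff hidden units."""
    h = np.maximum(x @ W1[:, :d_ff], 0.0)  # ReLU on the sliced hidden layer
    return h @ W2[:d_ff, :]               # matching slice of the output proj

x = np.random.randn(8, d_model)
out_full = nested_ffn(x, 256)  # full model
out_half = nested_ffn(x, 128)  # nested sub-model, same output shape
```

The same prefix-slicing pattern extends to attention heads (HydraViT) or whole-network width/depth (Scala, EA-ViT); training must jointly optimize the nested sub-models so the prefixes remain good on their own.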
At the network level there are also patch pruning and merging methods
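A toy token-merging step in the spirit of methods like ToMe, greedily averaging the most similar adjacent token pairs to shrink the sequence; real methods typically use bipartite matching over attention keys, so this is only an assumption-laden sketch:

```python
import numpy as np

def merge_tokens(x, r):
    """Average the r most similar adjacent token pairs.

    x: (num_tokens, embed_dim) -> (num_tokens - r, embed_dim).
    """
    xn = x / np.linalg.norm(x, axis=-1, keepdims=True)
    sim = (xn[:-1] * xn[1:]).sum(-1)       # cosine sim of neighboring tokens
    chosen, used = [], set()
    for i in np.argsort(-sim):             # most similar pairs first
        if i in used or i + 1 in used:
            continue                       # chosen pairs must not overlap
        chosen.append(i)
        used.update((i, i + 1))
        if len(chosen) == r:
            break
    merge_at = set(chosen)
    out, i = [], 0
    while i < len(x):
        if i in merge_at:
            out.append((x[i] + x[i + 1]) / 2)  # replace the pair by its mean
            i += 2
        else:
            out.append(x[i])
            i += 1
    return np.stack(out)

tokens = np.random.randn(16, 32)
merged = merge_tokens(tokens, 4)  # shape (12, 32)
```

Unlike pruning, merging keeps (an average of) the information from dropped tokens, which tends to degrade accuracy more gracefully at high reduction rates.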