vllm.model_executor.models.minimax_vl_01 ¶
MiniMaxVL01ImageEmbeddingInputs ¶
Bases: TensorSchema
Dimensions
- bn: Batch size * number of images
- ifs: Image feature size
- hs: Hidden size (must match language model backbone)
Source code in vllm/model_executor/models/minimax_vl_01.py
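As a hedged illustration only (the exact field names and values below are assumptions, not taken from the vLLM source), image embedding inputs follow the `(bn, ifs, hs)` layout described above:

```python
import torch

# Hypothetical sizes for illustration; real values come from the model config.
bn = 2     # batch size * number of images
ifs = 576  # image feature size (vision tokens per image)
hs = 4096  # hidden size, which must match the language model backbone

# Embeddings are expected to already be projected into the backbone's hidden size.
image_embeds = torch.randn(bn, ifs, hs)
print(tuple(image_embeds.shape))  # (2, 576, 4096)
```

A tensor whose last dimension does not equal the backbone hidden size would fail the `TensorSchema` validation for this input type.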
MiniMaxVL01ImagePixelInputs ¶
Bases: TensorSchema
Dimensions
- bn: Batch size * number of images
- np: Number of patches + 1
- c: Number of channels (3)
- h: Height
- w: Width
Note that num_patches may differ per batch and per image, in which case the data is passed as a list of tensors instead of a single batched tensor.