vllm.model_executor.models.minimax_vl_01

MiniMaxVL01ImageEmbeddingInputs

Bases: TensorSchema

Dimensions
  • bn: Batch size * number of images
  • ifs: Image feature size
  • hs: Hidden size (must match language model backbone)
Source code in vllm/model_executor/models/minimax_vl_01.py
class MiniMaxVL01ImageEmbeddingInputs(TensorSchema):
    """
    Dimensions:
        - bn: Batch size * number of images
        - ifs: Image feature size
        - hs: Hidden size (must match language model backbone)
    """

    type: Literal["image_embeds"] = "image_embeds"
    data: Annotated[torch.Tensor, TensorShape("bn", "ifs", "hs")]

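A minimal sketch of how an instance of this schema might be built, assuming the class accepts keyword arguments matching its annotated fields (as other vLLM multimodal input schemas do); the sizes below are illustrative placeholders, not values taken from the model.

import torch

from vllm.model_executor.models.minimax_vl_01 import (
    MiniMaxVL01ImageEmbeddingInputs)

# Hypothetical dimensions chosen for illustration only; the real values
# depend on the vision tower and the language model backbone in use.
bn = 2      # batch size * number of images
ifs = 576   # image feature size
hs = 4096   # hidden size of the language model backbone

# Pre-computed image embeddings in the layout the schema expects:
# one (ifs, hs) slab per image, stacked along the bn dimension.
image_embeds = torch.randn(bn, ifs, hs)

inputs = MiniMaxVL01ImageEmbeddingInputs(
    type="image_embeds",
    data=image_embeds,
)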
MiniMaxVL01ImagePixelInputs

Bases: TensorSchema

Dimensions
  • bn: Batch size * number of images
  • np: Number of patches + 1
  • c: Number of channels (3)
  • h: Height
  • w: Width

Note that num_patches may be different per batch and image, in which case the data is passed as a list instead of a batched tensor.

Source code in vllm/model_executor/models/minimax_vl_01.py
class MiniMaxVL01ImagePixelInputs(TensorSchema):
    """
    Dimensions:
        - bn: Batch size * number of images
        - np: Number of patches + 1
        - c: Number of channels (3)
        - h: Height
        - w: Width

    Note that `num_patches` may be different per batch and image,
    in which case the data is passed as a list instead of a batched tensor.
    """

    type: Literal["pixel_values"] = "pixel_values"
    pixel_values: Annotated[
        torch.Tensor | list[torch.Tensor],
        TensorShape("bn", "np", 3, "h", "w", dynamic_dims={"np", "h", "w"}),
    ]

    image_sizes: Annotated[torch.Tensor | None, TensorShape("bn", 2)]
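Below is a minimal sketch of pixel inputs with a per-image patch count, assuming keyword-argument construction as above; the patch counts and image sizes are illustrative assumptions, not values prescribed by the model's image processor.

import torch

from vllm.model_executor.models.minimax_vl_01 import (
    MiniMaxVL01ImagePixelInputs)

# Hypothetical patch resolution for illustration; the real value comes
# from the image processor configuration.
patch_h, patch_w = 336, 336

# Two images with different patch counts ("np" is a dynamic dim), so the
# pixel values are passed as a list rather than one batched tensor.
pixel_values = [
    torch.randn(5, 3, patch_h, patch_w),   # image 0: 4 patches + 1
    torch.randn(10, 3, patch_h, patch_w),  # image 1: 9 patches + 1
]

# Original (height, width) of each image, shape (bn, 2).
image_sizes = torch.tensor([[672, 672], [1008, 1008]])

inputs = MiniMaxVL01ImagePixelInputs(
    type="pixel_values",
    pixel_values=pixel_values,
    image_sizes=image_sizes,
)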