vllm.model_executor.models.minimax_vl_01

MiniMaxVL01ImageEmbeddingInputs

Bases: TensorSchema

Dimensions
  • bn: Batch size * number of images
  • ifs: Image feature size
  • hs: Hidden size (must match language model backbone)
Source code in vllm/model_executor/models/minimax_vl_01.py
class MiniMaxVL01ImageEmbeddingInputs(TensorSchema):
    """
    Dimensions:
        - bn: Batch size * number of images
        - ifs: Image feature size
        - hs: Hidden size (must match language model backbone)
    """

    type: Literal["image_embeds"] = "image_embeds"
    data: Annotated[torch.Tensor, TensorShape("bn", "ifs", "hs")]

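A minimal sketch of how an instance of this schema might be built, assuming the class accepts keyword arguments matching its annotated fields (as other vLLM multimodal input schemas do); the sizes below are illustrative placeholders, not values taken from the model.

import torch

from vllm.model_executor.models.minimax_vl_01 import (
    MiniMaxVL01ImageEmbeddingInputs)

# Hypothetical dimensions chosen for illustration only; the real values
# depend on the vision tower and the language model backbone in use.
bn = 2      # batch size * number of images
ifs = 576   # image feature size
hs = 4096   # hidden size of the language model backbone

# Pre-computed image embeddings in the layout the schema expects:
# one (ifs, hs) slab per image, stacked along the bn dimension.
image_embeds = torch.randn(bn, ifs, hs)

inputs = MiniMaxVL01ImageEmbeddingInputs(
    type="image_embeds",
    data=image_embeds,
)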
MiniMaxVL01ImagePixelInputs

Bases: TensorSchema

Dimensions
  • bn: Batch size * number of images
  • np: Number of patches + 1
  • c: Number of channels (3)
  • h: Height
  • w: Width

Note that num_patches may be different per batch and image, in which case the data is passed as a list instead of a batched tensor.

Source code in vllm/model_executor/models/minimax_vl_01.py
class MiniMaxVL01ImagePixelInputs(TensorSchema):
    """
    Dimensions:
        - bn: Batch size * number of images
        - np: Number of patches + 1
        - c: Number of channels (3)
        - h: Height
        - w: Width

    Note that `num_patches` may be different per batch and image,
    in which case the data is passed as a list instead of a batched tensor.
    """

    type: Literal["pixel_values"] = "pixel_values"
    pixel_values: Annotated[
        torch.Tensor | list[torch.Tensor],
        TensorShape("bn", "np", 3, "h", "w", dynamic_dims={"np", "h", "w"}),
    ]

    image_sizes: Annotated[torch.Tensor | None, TensorShape("bn", 2)]
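Below is a minimal sketch of pixel inputs with a per-image patch count, assuming keyword-argument construction as above; the patch counts and image sizes are illustrative assumptions, not values prescribed by the model's image processor.

import torch

from vllm.model_executor.models.minimax_vl_01 import (
    MiniMaxVL01ImagePixelInputs)

# Hypothetical patch resolution for illustration; the real value comes
# from the image processor configuration.
patch_h, patch_w = 336, 336

# Two images with different patch counts ("np" is a dynamic dim), so the
# pixel values are passed as a list rather than one batched tensor.
pixel_values = [
    torch.randn(5, 3, patch_h, patch_w),   # image 0: 4 patches + 1
    torch.randn(10, 3, patch_h, patch_w),  # image 1: 9 patches + 1
]

# Original (height, width) of each image, shape (bn, 2).
image_sizes = torch.tensor([[672, 672], [1008, 1008]])

inputs = MiniMaxVL01ImagePixelInputs(
    type="pixel_values",
    pixel_values=pixel_values,
    image_sizes=image_sizes,
)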