vllm.transformers_utils.processors.hunyuan_vl
split_image_into_patch_blocks
split_image_into_patch_blocks(
    pixel_values: Tensor,
    patch_size: int = 16,
    adaptor_patch_div: int = 4,
) -> Tensor
Split the input image tensor (batched input is supported) into large patches of size patch_size x patch_size, then subdivide each large patch into smaller regions of size (patch_size // adaptor_patch_div) x (patch_size // adaptor_patch_div). Each small region is extracted as a tensor of shape [3, patch_size, patch_size]. The output stacks all of these small-region tensors along its first dimension.
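
The two-level split described above can be illustrated with plain index arithmetic. The following is a minimal sketch, not vLLM code: `locate_pixel` is a hypothetical helper, and it assumes the side length of a small region is patch_size // adaptor_patch_div, as the description states (with the defaults, 16 // 4 = 4). It says nothing about the order in which the real function emits regions.

```python
def locate_pixel(y, x, patch_size=16, adaptor_patch_div=4):
    """Hypothetical helper: map pixel (y, x) to its large patch and small region.

    Assumption: a small region has side length patch_size // adaptor_patch_div,
    following the function description. This is an illustration, not vLLM code.
    """
    region_size = patch_size // adaptor_patch_div

    # Which large patch the pixel falls in, and its offset inside that patch.
    patch_row, in_patch_y = divmod(y, patch_size)
    patch_col, in_patch_x = divmod(x, patch_size)

    # Which small region of that large patch contains the pixel.
    region_row = in_patch_y // region_size
    region_col = in_patch_x // region_size
    return (patch_row, patch_col), (region_row, region_col)


print(locate_pixel(0, 0))    # ((0, 0), (0, 0))
print(locate_pixel(21, 37))  # ((1, 2), (1, 1))
```

For pixel (21, 37), row 21 lies in the second large patch row (21 // 16 = 1) at offset 5, which falls in the second small-region row (5 // 4 = 1). The column works the same way.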
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pixel_values | Tensor | Input image tensor of shape [batch_size, 3, H, W]. | required |
| patch_size | int | Size of the large patch, e.g., 16. | 16 |
| adaptor_patch_div | int | Each large patch is divided into (patch_size // adaptor_patch_div) x (patch_size // adaptor_patch_div) smaller regions. | 4 |
Returns:
| Name | Type | Description |
|---|---|---|
| patches | Tensor | A tensor of shape [N, 3, patch_size, patch_size], where N = batch_size * (H // patch_size) * (W // patch_size) * (patch_size // adaptor_patch_div)^2. Each entry along the first dimension corresponds to one small image region. |
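
As a worked example of the formula for N, the sketch below evaluates the documented return shape for a given input size. `documented_output_shape` is a hypothetical helper introduced here for illustration. It only reproduces the arithmetic stated in the Returns table and does not call `split_image_into_patch_blocks`, so treat the result as what the documentation states rather than a measured output.

```python
def documented_output_shape(batch_size, height, width,
                            patch_size=16, adaptor_patch_div=4):
    """Hypothetical helper: evaluate the return shape given in the Returns table.

    N = batch_size * (H // patch_size) * (W // patch_size)
        * (patch_size // adaptor_patch_div) ** 2
    """
    regions_per_side = patch_size // adaptor_patch_div
    n = (batch_size
         * (height // patch_size)
         * (width // patch_size)
         * regions_per_side ** 2)
    return (n, 3, patch_size, patch_size)


# Two 224 x 224 images with the default arguments:
# 2 * 14 * 14 * 4**2 = 6272
print(documented_output_shape(2, 224, 224))  # (6272, 3, 16, 16)
```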