vllm.model_executor.models.clip ¶
CLIPAttention ¶
Bases: Module
Source code in vllm/model_executor/models/clip.py
forward ¶
forward(hidden_states: Tensor)
Input shape: Batch x Time x Channel
Source code in vllm/model_executor/models/clip.py
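The forward pass preserves the input shape. Below is a minimal sketch of that shape contract only, using `torch.nn.MultiheadAttention` as a stand-in; it is not vLLM's `CLIPAttention` implementation.

```python
# Illustrative only: multi-head self-attention over a
# (batch, time, channel) tensor, mirroring the shape note above.
import torch
import torch.nn as nn

embed_dim, num_heads = 768, 12  # typical CLIP ViT-B/32 settings (assumed here)
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

hidden_states = torch.randn(2, 50, embed_dim)  # Batch x Time x Channel
out, _ = attn(hidden_states, hidden_states, hidden_states)
assert out.shape == hidden_states.shape  # output keeps the input shape
```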
CLIPEncoder ¶
Bases: Module
Transformer encoder consisting of config.num_hidden_layers self-attention layers. Each layer is a [CLIPEncoderLayer] (a minimal sketch of this stacking follows the parameter table below).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `CLIPTextConfig \| CLIPVisionConfig \| CLIPConfig` | | *required* |
Source code in vllm/model_executor/models/clip.py
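The sketch below illustrates the stacking pattern described above: one layer per `config.num_hidden_layers`, applied in sequence. The class names `SimpleEncoderLayer` and `SimpleEncoder` are hypothetical stand-ins, not vLLM's classes.

```python
# A minimal sketch of an encoder built as a stack of identical layers.
import torch
import torch.nn as nn

class SimpleEncoderLayer(nn.Module):
    """Stand-in for a single encoder layer (residual MLP block only)."""

    def __init__(self, hidden_size: int) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)
        self.mlp = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.mlp(self.norm(hidden_states))

class SimpleEncoder(nn.Module):
    """Stand-in for CLIPEncoder: num_hidden_layers layers applied in order."""

    def __init__(self, hidden_size: int, num_hidden_layers: int) -> None:
        super().__init__()
        self.layers = nn.ModuleList(
            SimpleEncoderLayer(hidden_size) for _ in range(num_hidden_layers)
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states

encoder = SimpleEncoder(hidden_size=768, num_hidden_layers=12)
x = torch.randn(2, 50, 768)  # Batch x Time x Channel
assert encoder(x).shape == x.shape
```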
CLIPImagePixelInputs ¶
Bases: TensorSchema
Dimensions
- bn: Batch size * number of images
- c: Number of channels (3)
- h: Height of each image
- w: Width of each image
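As a rough illustration of that layout, the snippet below builds a pixel-values tensor with the dimensions listed above; the 224x224 resolution and image counts are example values, not part of the schema.

```python
# Illustrative only: a pixel-values tensor shaped (bn, c, h, w),
# where bn = batch size * number of images per request.
import torch

batch_size, num_images = 2, 3
c, h, w = 3, 224, 224  # RGB channels; example resolution
pixel_values = torch.rand(batch_size * num_images, c, h, w)
assert pixel_values.shape == (6, 3, 224, 224)  # (bn, c, h, w)
```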