Given the Marlin-packed weight matrices w1_packed and w2_packed, return the MoE intermediate size N.
Source code in vllm/model_executor/layers/quantization/utils/marlin_utils.py
    def marlin_moe_intermediate_size(w1_packed: torch.Tensor, w2_packed: torch.Tensor):
        """
        Given Marlin packed weight matrices w1_packed, and w2_packed,
        return the MoE intermediate size N
        """
        marlin_tile_size = 16
        return w2_packed.size(1) * marlin_tile_size
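As a rough illustration of the arithmetic, the sketch below copies the function body so it runs without vLLM installed and calls it on dummy tensors. The tensor shapes and dtype are hypothetical values chosen only for the example; they are not taken from vLLM's packing code. The one fact relied on is what the listing shows: the result is the size of dimension 1 of w2_packed multiplied by the tile size of 16, and w1_packed is accepted but not read.

```python
import torch

MARLIN_TILE_SIZE = 16  # same constant as in the listing above


def marlin_moe_intermediate_size(w1_packed: torch.Tensor,
                                 w2_packed: torch.Tensor) -> int:
    # Local copy of the documented function, so the sketch is self-contained.
    return w2_packed.size(1) * MARLIN_TILE_SIZE


# Hypothetical shapes for illustration only (assumption, not vLLM's layout).
num_experts = 4
w1_packed = torch.zeros((num_experts, 8, 64), dtype=torch.int32)
w2_packed = torch.zeros((num_experts, 88, 32), dtype=torch.int32)

n = marlin_moe_intermediate_size(w1_packed, w2_packed)
print(n)  # 88 * 16 = 1408
```

Because only w2_packed.size(1) is used, changing the shape of w1_packed in this sketch leaves the result unchanged.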