--

The gating model and the experts are trained simultaneously on the same data to perform the same task. For every input, the gating model assigns weights (gating coefficients) to the experts, indicating which expert should handle that input; it learns this assignment during training. At test time, the input passes through the gating model, which picks the best expert to predict the output for that specific input. Finally, the MoE output can be the sum of the expert outputs multiplied by their gating coefficients. This final step is not always the same, though: you can take the MoE output to be just the output of the expert chosen by the gating model, multiplied by its gating coefficient, or you can sum Gcoeff * expertOutput over all experts. It is up to the task itself.
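To make the two combination rules concrete, here is a minimal sketch in plain Python. It assumes the gating model produces one raw score per expert and that a softmax turns those scores into coefficients; that normalization is an assumption, as are the function names and the numbers, which are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    # Turn raw gating scores into coefficients that sum to 1
    # (assumed normalization; subtracting the max keeps exp() stable).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_dense(gate_coeffs, expert_outputs):
    # Option 1: sum of Gcoeff * expertOutput over all experts.
    return sum(g * y for g, y in zip(gate_coeffs, expert_outputs))

def moe_top1(gate_coeffs, expert_outputs):
    # Option 2: only the expert with the largest coefficient contributes,
    # and its output is scaled by that coefficient.
    best = max(range(len(gate_coeffs)), key=lambda i: gate_coeffs[i])
    return gate_coeffs[best] * expert_outputs[best]

# Hypothetical gating scores and scalar expert predictions for one input.
gate_scores = [2.0, 0.5, -1.0]
expert_outputs = [1.0, 3.0, -2.0]

gate_coeffs = softmax(gate_scores)
print("gating coefficients:", gate_coeffs)
print("dense MoE output:   ", moe_dense(gate_coeffs, expert_outputs))
print("top-1 MoE output:   ", moe_top1(gate_coeffs, expert_outputs))
```

The dense version lets every expert contribute in proportion to its coefficient, while the top-1 version ignores all but the selected expert. In a real model the expert outputs would be vectors or tensors rather than scalars, but the combination logic is the same.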

--

--