Scrub through training to see how the distribution adapts to modality imbalance.
Interactive MIAM masking distribution for three modalities. Left: synthetic validation curves. Modality 1 converges rapidly to high performance, modality 2 converges more gradually, and modality 3 remains comparatively weak throughout training. Right: the resulting MIAM distribution over $p = (p_1, p_2, p_3)$, where $p_m$ is the probability of masking modality $m$, updated live. At each epoch, 5,000 samples are drawn from this distribution. Bottom: controls for the epoch and the MIAM parameters. The sharpness $\kappa$ controls how strongly the distribution concentrates toward the corners of the hypercube, while $\lambda$ sets the strength of the imbalance correction. Dominant modalities (high $s_m$, low $d_m$) are masked more frequently. This behavior is captured by the ratio $\rho_{s_m}/\rho_{d_m} = (s_m/d_m)\,/\,\mathrm{geo}(s/d)$, where $\mathrm{geo}(s/d)$ is the geometric mean of the per-modality ratios $s_m/d_m$ across all modalities. As the statistics panel shows, the larger this ratio, the more strongly modality $m$ is masked relative to a balanced strategy. Adjusting the parameters illustrates how MIAM adapts to different training dynamics.
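As a minimal sketch, the ratio reported in the statistics panel can be computed from per-modality values $s_m$ and $d_m$ as below. The function names and the example values of `s` and `d` are hypothetical (they are not taken from the figure), the sketch assumes all $s_m, d_m > 0$ so the geometric mean is defined, and it reproduces only the ratio, not the MIAM sampling procedure itself.

```python
import math


def geo_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def relative_masking_ratios(s, d):
    """Hypothetical helper: (s_m / d_m) / geo(s / d) for every modality m.

    Assumes s and d are equal-length sequences of positive numbers.
    """
    ratios = [sm / dm for sm, dm in zip(s, d)]  # per-modality s_m / d_m
    g = geo_mean(ratios)                        # geo(s / d) across modalities
    return [r / g for r in ratios]              # normalize by the geometric mean


if __name__ == "__main__":
    # Illustrative values only: modality 1 is dominant (high s, low d),
    # modality 3 is the weakest (low s, high d).
    s = [0.9, 0.7, 0.4]
    d = [0.1, 0.3, 0.6]
    rho = relative_masking_ratios(s, d)
    print([round(r, 2) for r in rho])  # prints [3.73, 0.97, 0.28]
```

Dividing by the geometric mean makes the normalized ratios multiply to 1, so a balanced setting sits at exactly 1: values above 1 mark modalities that would be masked more than under a balanced strategy, and values below 1 mark modalities that would be masked less.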