Automatically optimizing execution of unfamiliar tensor operations

At this year’s Conference on Machine Learning and Systems (MLSys), we and our colleagues presented a new auto-scheduler called DietCode, which handles dynamic-shape workloads much more efficiently than its predecessors. Where existing auto-encoders have to optimize each possible shape individually, DietCode constructs a shape-generic search space that enables it to optimize all possible shapes simultaneously.

We tested our approach on a natural-language-processing (NLP) task that could take inputs ranging in size from 1 to 128 tokens. When we use a random sampling of input sizes that reflects a plausible real-world distribution, we speed up the optimization process almost sixfold relative to the best prior auto-scheduler. That speedup increases to more than 94-fold when we consider all possible shapes.

Despite being much faster, DietCode also improves the performance of the resulting code, by up to 70% relative to prior auto-schedulers and up to 19% relative to hand-optimized code in existing tensor operation libraries. It thus promises to speed up our customers’ dynamic-shaped machine learning workloads.

Lifeboat Foundation

Safeguarding Humanity

Blog

Sep 9, 2022