Pretraining a modern large language model (LLM), often with ~100B parameters or more, typically involves thousands of ...