“CPU threading and TorchScript inference”

Figure: Inter- and intra-op parallelism with PyTorch, shown as a flowchart of the application thread pool and an inter-op thread pool. Reproduced from the PyTorch docs (source).

I estimated timings on an Apple M2 Pro with 10 CPU cores (6 “performance” and 4 “efficiency”) and 16 GB of memory.

Both my Apple M2 Pro and AWS Fargate systems use an OpenMP backend.

Multi-processing vs multi-threading

StackOverflow has a useful answer explaining the distinction between threads and processes.
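As a small illustration of that distinction, the sketch below uses only the Python standard library. It is not taken from the linked answer. Worker threads run inside the parent process and so report the parent's PID, while a child process is a separate interpreter with its own PID and its own memory.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

parent_pid = os.getpid()

# Threads live inside the parent process, so every worker sees the same PID.
with ThreadPoolExecutor(max_workers=4) as pool:
    thread_pids = set(pool.map(lambda _: os.getpid(), range(4)))

# A child process is a separate interpreter with its own PID and memory.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getpid())"],
    capture_output=True,
    text=True,
    check=True,
)
child_pid = int(result.stdout.strip())

print(f"parent PID:        {parent_pid}")
print(f"PIDs from threads: {sorted(thread_pids)}")
print(f"child process PID: {child_pid}")
```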

torch.compile

“torch.compile, the missing manual” by Edward Yang