Multi-threading for CPU inference in PyTorch
For background, see the PyTorch docs note “CPU threading and TorchScript inference”.
Figure: inter- and intra-op parallelism in PyTorch, reproduced from the PyTorch docs.
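Intra-op parallelism splits a single operator (e.g. one large matmul) across threads, while inter-op parallelism runs independent operators concurrently. PyTorch exposes both knobs directly; a minimal sketch (the thread counts here are illustrative, not recommendations):

```python
import torch

# Intra-op: threads used *inside* a single operator (e.g. one large matmul).
# Inter-op: threads used to run *independent* operators concurrently.

# set_num_interop_threads must run before any inter-op work starts,
# so call it right after importing torch.
torch.set_num_interop_threads(2)
torch.set_num_threads(6)  # e.g. one thread per "performance" core

print(torch.get_num_threads())          # -> 6
print(torch.get_num_interop_threads())  # -> 2
```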
I measured timings on an Apple M2 Pro with 10 CPU cores (6 “performance” and 4 “efficiency” cores) and 16 GB of memory.
Both my Apple M2 Pro and the AWS Fargate instances I tested use PyTorch's OpenMP backend for intra-op parallelism.
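You can confirm which backend a given build uses by printing ATen's parallelization settings:

```python
import torch

# Prints ATen's parallel configuration: the intra-op backend (OpenMP,
# native thread pool, or TBB), thread counts, and the BLAS library in use.
print(torch.__config__.parallel_info())
```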
Multi-processing vs multi-threading
StackOverflow has a useful answer explaining the distinction between threads and processes: threads share their parent process's memory, while each process gets its own address space. The sketch below shows why this matters for inference.
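A minimal sketch with a toy model (the model, batch sizes, and worker counts are placeholders): threaded workers share one copy of the weights, and because PyTorch releases the GIL inside its C++ kernels, threads can genuinely overlap during inference.

```python
import torch
from concurrent.futures import ThreadPoolExecutor

# Toy model and inputs, purely illustrative.
model = torch.nn.Linear(512, 512).eval()
batches = [torch.randn(32, 512) for _ in range(8)]

# With threaded workers, cap intra-op threads to avoid oversubscription
# (4 workers x default intra-op threads can exceed the core count).
torch.set_num_threads(1)

@torch.inference_mode()
def infer(x):
    # Threads share one address space, so all workers reuse the same
    # weights; PyTorch releases the GIL inside its C++ kernels.
    return model(x)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(infer, batches))

# A process-based variant would look the same with ProcessPoolExecutor,
# but each worker gets its own interpreter and its own copy of the model,
# and everything crossing the boundary must be picklable (on macOS/Windows
# the script also needs an `if __name__ == "__main__":` guard).
```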
torch.compile
For a deeper dive, see “torch.compile, the missing manual” by Edward Yang.
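The basic API is a one-line wrap; a minimal sketch with a toy model (requires PyTorch ≥ 2.0):

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
).eval()

# torch.compile captures the model into graphs and JIT-compiles them;
# the first call is slow (compilation), later calls reuse the result.
compiled = torch.compile(model)

x = torch.randn(32, 512)
with torch.inference_mode():
    _ = compiled(x)    # warm-up call triggers compilation
    out = compiled(x)  # runs the compiled graph
```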