Unlocking large-scale AI training networks with MRC (Multipath Reliable Connection)
OpenAI released MRC, a new networking protocol for AI training clusters, through the Open Compute Project. MRC provides multipath reliability, which improves resilience and performance at scale.
Excerpt
OpenAI introduces MRC (Multipath Reliable Connection), a new supercomputer networking protocol released via OCP to improve resilience and performance in large-scale AI training clusters.
Read at source: https://openai.com/index/mrc-supercomputer-networking