Using PyTorch on MedicineBow
This page covers the practical use of PyTorch on the MedicineBow cluster.
It is not about learning PyTorch.
This page was initially created in October 2024 and is based on stable version 2.4.1.
PyTorch is under constant development, with versions and functionality changing regularly. ARCC will endeavor to keep this page up to date, but if you find that something no longer works with a later version, please reach out to us.
Overview
PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, and originally developed by Facebook's AI Research lab (now Meta AI). It is free and open-source software released under the Modified BSD license.
It is an optimized tensor library for deep learning using GPUs and CPUs, on a single node and/or distributed across multiple nodes.
It provides “a rich ecosystem of tools, libraries, and more to support, accelerate, and explore AI development.”
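As a quick sanity check before running anything larger, the sketch below reports which PyTorch version the current Python environment sees, whether any GPUs are visible, and runs a tiny tensor operation on the selected device. The function name describe_pytorch_environment and the report keys are hypothetical, chosen only for illustration. How PyTorch is made available on MedicineBow (module, conda environment, and so on) is not assumed here, so the code returns a default report rather than failing if torch cannot be found.

```python
import importlib.util


def describe_pytorch_environment():
    """Return a small report about the PyTorch install visible to this Python.

    Hypothetical helper for illustration; not part of PyTorch or ARCC tooling.
    """
    report = {
        "installed": False,
        "version": None,
        "cuda_available": False,
        "gpu_count": 0,
        "device": "cpu",
    }

    # If PyTorch is not importable (for example, the environment has not
    # been activated), return the default report instead of raising.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["installed"] = True
    report["version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    report["device"] = "cuda" if report["cuda_available"] else "cpu"

    # Smoke test: multiply a 2x3 matrix of ones by its transpose on the
    # selected device and record the sum of the resulting 2x2 matrix.
    x = torch.ones(2, 3, device=report["device"])
    report["smoke_test_sum"] = float((x @ x.T).sum())
    return report


if __name__ == "__main__":
    for key, value in describe_pytorch_environment().items():
        print(f"{key}: {value}")
```

On a CPU-only session this should report a device of cpu and a GPU count of 0. Within a job that has been allocated GPUs, the GPU count would be expected to match the number of GPUs visible to the process.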
Multiple GPU and Distributed Examples
The following pages are based heavily on the examples provided in the Distributed Data Parallel in PyTorch - Video Tutorials.