PyTorch-Based Environments and Issues

Projects Built Upon or Using PyTorch

A growing number of open source projects, typically built around large language models (LLMs), use the PyTorch ecosystem as their underlying machine learning engine, or integrate with it directly and extend it.

For example:

Issues and Expectations

When submitting any issue, please be mindful of Submitting Useful Tickets via the Portal.

ARCC is receiving a growing number of questions and issues about these projects when they do not appear to run as expected.

Before submitting a question or issue, please bear the following in mind:

Memory Issues

Remember that GPU devices have only a finite amount of memory, just as the system memory available to CPUs is finite.

If you are running out of CUDA memory, then:

  • Check how much you’ve requested - can you request more?

  • If you’re using the full amount, then either:

    • Look at reducing the size of your data set.

    • Request a GPU device with more memory.

  • Use the nvidia-smi tool to monitor memory utilization across your allocated devices.
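As a rough illustration of the monitoring step above, the following Python sketch calls nvidia-smi and reports used and total memory per allocated device. The --query-gpu and --format options are standard nvidia-smi options. The helper names (parse_gpu_memory, query_gpu_memory) are hypothetical and chosen only for this example, and the script assumes it runs on a node where nvidia-smi is on the PATH.

```python
import shutil
import subprocess

# Standard nvidia-smi query: one CSV row per GPU, memory values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV rows into a list of dicts (memory in MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split the index off the front and the two numbers off the back,
        # so the device name in the middle is kept whole.
        index, rest = line.split(",", 1)
        name, used, total = rest.rsplit(",", 2)
        gpus.append({
            "index": int(index),
            "name": name.strip(),
            "used_mib": int(used),
            "total_mib": int(total),
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi on this node; return [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError:
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpu_memory():
        pct = 100 * gpu["used_mib"] / gpu["total_mib"]
        print(f"GPU {gpu['index']} ({gpu['name']}): "
              f"{gpu['used_mib']} / {gpu['total_mib']} MiB ({pct:.0f}%)")
```

Running this periodically while a job is active (for example from a second shell on the same node) gives a sense of how close the job is to the device's limit. From inside a PyTorch program, torch.cuda.mem_get_info() returns the free and total memory in bytes for the current device, which may be more convenient than calling an external tool.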