Goals:

  • Present the downsides of Jupyter so users know what to look out for.



Some people hate Jupyter!

shouting-development.gif


But why?

  • If you only learn to program in Jupyter notebooks, it’s possible that you’ll develop bad coding practices.

  • The notebook workflow runs somewhat counter to object-oriented programming (OOP) and can discourage OOP practices.

    • As part of that, it can discourage modularity.

  • Most Jupyter notebooks and their output are not easily reproducible. Even if no cell uses randomization, the notebook's state depends on which cells were run and how many times. If the original creator ran a cell a few times and then kept that state without rerunning the cell, the next person who runs the notebook from scratch won’t be able to reproduce the output.

    • You may always skip a certain cell when you run a notebook, while the next person to run it doesn’t skip that cell. You’ll end up with different output.

    • Cells executed in different orders can give you different output, and Jupyter lets you run cells in any order rather than strictly top to bottom.
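
The reproducibility problems above can be sketched in plain Python. This is only an illustration, not how Jupyter is implemented: each function stands in for one notebook cell, the `state` dictionary stands in for the kernel's namespace, and all names (`cell_1`, `run`, and so on) are hypothetical.

```python
# Each function stands in for one notebook cell; `state` stands in for the
# kernel's global namespace. All names here are made up for illustration.

def cell_1(state):
    state["x"] = 10               # initialise a variable

def cell_2(state):
    state["x"] = state["x"] * 2   # modify it in place

def cell_3(state):
    return state["x"] + 1         # compute the value shown as output

def run(cells):
    """Run the given cells in order against a fresh, empty kernel state."""
    state = {}
    result = None
    for cell in cells:
        result = cell(state)
    return result

top_to_bottom = run([cell_1, cell_2, cell_3])             # 21
skipped_cell_2 = run([cell_1, cell_3])                    # 11
cell_2_run_twice = run([cell_1, cell_2, cell_2, cell_3])  # 41

print(top_to_bottom, skipped_cell_2, cell_2_run_twice)
```

The same three cells give three different results depending on which cells were run and how often. Restarting the kernel and running every cell from top to bottom before sharing a notebook is one way to check that others will see the same output.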


There are great things about Jupyter

  1. Encourages well-documented, well-commented code

  2. Great visualization

  3. Provides a good mechanism for users to explain their workflow and processes to others


It wasn’t originally intended to be used on an HPC system

In many ways, HPC computations and Jupyter notebooks don’t suit each other’s strengths. Their use cases and original intentions are very different. Jupyter Notebooks can be powerful development and collaboration tools, but they often aren’t suitable for long-running, computationally intensive workflows.

You can, however, use them together, and tools are available if you want to do this:

  • IPython Parallel (ipyparallel)

  • Dask

  • Spark

In some cases the tools end up being more of a “workaround” and don’t really allow your computation to be run as one job inside the notebook. Instead what you may have is two separate jobs running simultaneously with information communicated between them.
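
The "two separate jobs" pattern can be sketched with the Python standard library alone. This is a simplified stand-in under stated assumptions, not how IPython Parallel, Dask, or Spark work internally: the worker here is an ordinary subprocess rather than a cluster job, and `WORKER_CODE` and `run_in_separate_process` are hypothetical names chosen for this example.

```python
import json
import subprocess
import sys

# Code run by the separate worker process. On a cluster this role would be
# played by a batch job or a worker; here it is a plain Python subprocess.
WORKER_CODE = """
import json, sys
numbers = json.load(sys.stdin)
json.dump({"sum_of_squares": sum(n * n for n in numbers)}, sys.stdout)
"""

def run_in_separate_process(numbers):
    """Send `numbers` to a worker process and return its decoded reply."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_CODE],
        input=json.dumps(numbers),   # data sent to the worker as JSON
        capture_output=True,
        text=True,
        check=True,                  # raise if the worker exits with an error
    )
    return json.loads(completed.stdout)

# The "notebook side": it only sends input and receives the result.
result = run_in_separate_process([1, 2, 3, 4])
print(result["sum_of_squares"])
```

The computation never runs inside the calling process. The caller serializes its input, waits, and deserializes the answer, which is roughly the relationship between a notebook and the separate compute job described above.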


Next Steps

Use the following link to provide feedback on this training: https://forms.gle/qBBwXpKeTNqSR5516 or use the QR code below.

jupyter.png
