Some people hate Jupyter!

Why?

Jupyter notebooks can encourage bad software development practices.
- They can discourage writing software that is modular and reproducible.
- Without knowing the original notebook creator’s environment, you cannot reliably reproduce their results.
- Linters (tools that check code for errors and style problems) are difficult to use in Jupyter’s disjointed cell interface.
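One way to reduce the environment problem is to record the environment alongside the results. The following is a minimal sketch using only the Python standard library; the function name `environment_snapshot` and the package names passed to it are placeholders chosen for illustration, not part of the workshop material.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Return interpreter details plus installed versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


# Placeholder package list: replace with the packages your notebook imports.
snapshot = environment_snapshot(["numpy", "pandas", "matplotlib"])
print(json.dumps(snapshot, indent=2))
```

Running this in the first cell of a notebook and saving the output with the notebook gives the next person a starting point for rebuilding a matching environment.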
The notebook format is somewhat at odds with object-oriented programming (OOP) and can discourage OOP practices.
- As part of that, it can discourage modularity.
- A class definition cannot be split across multiple cells.
Most Jupyter notebooks and their output are not easily reproducible due to hidden cell state.
- Each cell is in its current state because of the order in which the cells were run. This order is not always straightforward.
- Even if no cells use randomization, hidden state can break reproducibility: if the original creator ran cells, kept their state, and then reordered, reran, or skipped cells in later runs, the next person who runs the notebook from scratch won’t be able to reproduce the results.
- You may always skip a particular cell when you run a notebook, while the next person to run it doesn’t skip that cell. In these situations, you’ll end up with different output.
- Cells executed in different orders can give you different output, and Jupyter lets you override the linear, top-to-bottom run of cells.
- Being able to run snippets of code in arbitrary order can be unintuitive.
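The effect of hidden state can be imitated outside Jupyter. The sketch below is an illustration only: it treats three short strings as “cells” and runs them with `exec` in one shared namespace, which is a simplification of how a notebook kernel keeps variables between cells. The cell names and contents are made up for the example.

```python
# Three hypothetical notebook cells, stored as source strings.
cells = {
    "A": "values = [1, 2, 3]",
    "B": "values = [v * 10 for v in values]",
    "C": "total = sum(values)",
}


def run(order):
    """Execute the named cells in the given order in one fresh namespace."""
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return namespace["total"]


print(run(["A", "B", "C"]))       # 60: every cell run once, top to bottom
print(run(["A", "C"]))            # 6: cell B skipped
print(run(["A", "B", "B", "C"]))  # 600: cell B run twice
```

The same three cells produce three different totals, and nothing in the saved notebook records which sequence produced the output on screen.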
Version control of Jupyter notebooks can be an issue.
- There’s no easy way to determine whether a cell has been edited, or when.
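A notebook file is JSON, so one partial workaround is to compare the cell sources of two saved versions yourself. The sketch below assumes the version 4 notebook layout, where the file has a `cells` list and each cell has a `source` field stored as a string or a list of strings. The helper names and the two tiny in-memory notebooks are hypothetical; with real files you would load each version with `json.load`.

```python
import hashlib


def cell_fingerprints(notebook):
    """Return a short hash of each cell's source text, in cell order."""
    fingerprints = []
    for cell in notebook["cells"]:
        source = cell["source"]
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        fingerprints.append(digest[:12])
    return fingerprints


def changed_cells(old, new):
    """Return positions of cells whose source differs, comparing by position."""
    old_fp, new_fp = cell_fingerprints(old), cell_fingerprints(new)
    longest = max(len(old_fp), len(new_fp))
    return [
        i for i in range(longest)
        if i >= len(old_fp) or i >= len(new_fp) or old_fp[i] != new_fp[i]
    ]


# Two hypothetical versions of the same notebook, reduced to the fields used here.
old_nb = {"cells": [{"cell_type": "code", "source": ["x = 1\n"]},
                    {"cell_type": "code", "source": ["print(x)\n"]}]}
new_nb = {"cells": [{"cell_type": "code", "source": ["x = 2\n"]},
                    {"cell_type": "code", "source": ["print(x)\n"]}]}

print(changed_cells(old_nb, new_nb))  # [0]
```

This shows which cells differ between two saved versions, but not when the edit happened. It also compares by position, so an inserted cell marks every later cell as changed.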

There are great things about Jupyter

Encourages well-documented, well-commented code
Great visualization
Provides a good mechanism for users to explain their workflow and processes to others

Jupyter wasn’t originally intended for use on an HPC

In many ways, HPC computations and Jupyter notebooks don’t suit each other’s strengths. Their use cases and original intentions are very different. Jupyter notebooks can be powerful development and collaboration tools, but they often aren’t suitable for long-running, computationally intensive workflows. Classic HPC work runs as batch jobs: long-running jobs submitted through terminal access.

You can, however, use them together, and tools are available for this:

Open OnDemand - makes it easier to launch Jupyter on an HPC system with requested resources.
IPython Parallel (designed to integrate with MPI libraries)
Dask
Spark

In some cases the tools end up being more of a “workaround” and don’t really allow your computation to be run as one job inside the notebook. In these cases, you usually have classic HPC jobs spawned from a Jupyter session. These jobs run alongside the Jupyter session, and information is communicated between them.
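The “spawned job” pattern can be sketched with the Python standard library. This example assumes the cluster uses the Slurm scheduler, which the text above does not specify; the function names, the job name, the resource values, and the `python simulate.py` command are all placeholders. The script is passed to `sbatch` on standard input, which `sbatch` reads when no script file is given.

```python
import subprocess


def build_batch_script(command, job_name="notebook-job", ntasks=4, walltime="01:00:00"):
    """Return the text of a Slurm batch script that runs one command."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={walltime}",
        f"#SBATCH --output={job_name}-%j.out",  # %j expands to the job ID
        "",
        command,
        "",
    ]
    return "\n".join(lines)


def submit(script_text):
    """Send the script to sbatch on stdin and return the scheduler's reply."""
    result = subprocess.run(
        ["sbatch"], input=script_text, text=True, capture_output=True, check=True
    )
    return result.stdout.strip()


# Placeholder command: replace with the real workload.
script = build_batch_script("python simulate.py")
print(script)
# On a Slurm cluster, a notebook cell could then call: submit(script)
```

The notebook stays responsive because the heavy computation runs as a separate batch job; the notebook only submits it and later reads the output file.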


Use the following link to provide feedback on this training: https://forms.gle/qBBwXpKeTNqSR5516
