
Goals:

  • Present the downsides of Jupyter so users know what to look out for.



Not everyone loves Jupyter!

shouting-development.gif


Why?

  • Jupyter notebooks can encourage bad software development practices.

    • They can discourage writing software that is modular and reproducible.

    • Without knowing the original notebook creator’s environment, you cannot reproduce their results.

    • Linters (tools that check code for style problems and likely errors) are difficult to use in Jupyter’s disjointed cell interface.

  • The notebook format is a somewhat awkward fit for object-oriented programming (OOP) and can discourage OOP practices.

    • As part of that, it can discourage modularity.

    • A class definition cannot be split across multiple cells.

  • Most Jupyter notebooks and their output are not easily reproducible due to hidden cell states.

    • Each cell is in its current state because of the order in which the cells were run. That order is not always obvious.

    • Randomization is not the only cause. If the original creator ran cells in one order and then reordered, reran, or skipped cells while the kernel kept its state, the next person who runs the notebook from scratch won’t be able to reproduce the results.

    • You may run a notebook always skipping a cell, while the next person to run it doesn’t skip that cell. In these situations, you’ll end up with different output.

    • Cells executed in different orders can give you different output, and Jupyter lets you override the top-to-bottom order of cells.

    • Being able to run snippets of code in arbitrary order can be unintuitive.

  • Version control for Jupyter notebooks can be an issue.

    • There’s no easy way to determine whether a cell has been edited and when.
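
The hidden-state problem described above can be imitated outside of Jupyter. The sketch below is only an illustration, not how Jupyter itself is implemented: it treats each "cell" as a snippet of Python run against one shared namespace, and the cell names (load, scale, total) are made up for this example.

```python
# Each "cell" is a snippet of code run against one shared namespace,
# roughly the way a notebook kernel keeps state between cells.
# The cell names and contents are hypothetical.
cells = {
    "load": "values = [1, 2, 3]",
    "scale": "values = [v * 10 for v in values]",
    "total": "result = sum(values)",
}


def run_cells(order):
    """Run the named cells in the given order, starting from a fresh kernel."""
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return namespace["result"]


top_to_bottom = run_cells(["load", "scale", "total"])
skipped_cell = run_cells(["load", "total"])
rerun_cell = run_cells(["load", "scale", "scale", "total"])

print(top_to_bottom, skipped_cell, rerun_cell)  # 60 6 600
```

The code in the cells never changes, yet the result depends on which cells were run and how many times. Restarting the kernel and running every cell from top to bottom before sharing a notebook is one way to check that its output can be reproduced.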


There are great things about Jupyter

  1. Encourages well-documented, well-commented code

  2. Great visualization

  3. Provides a good mechanism for users to explain their workflows and processes to others


Jupyter wasn’t originally intended for use on HPC systems

In many ways, HPC computations and Jupyter notebooks don’t play to each other’s strengths. Their use cases and original intentions are very different. Jupyter notebooks can be powerful development and collaboration tools, but they often aren’t suitable for long-running, computationally intensive workflows. Classic HPC work runs as batch jobs: long-running jobs submitted through terminal access.

You can, however, use them together, and tools are available to help:

  • Open OnDemand - makes it easier to launch a Jupyter session on an HPC system with the resources you request.

  • IPython Parallel (ipyparallel), which is designed to integrate with MPI libraries

  • Dask

  • Spark

In some cases, these tools end up being more of a “workaround” and don’t really allow your computation to run as one job inside the notebook. Instead, you usually have classic HPC jobs spawned from a Jupyter session. These jobs run alongside the Jupyter session, and information is passed between them.
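
As a rough sketch of what "spawning a classic HPC job from a Jupyter session" can look like, the example below builds a batch script from Python. It assumes the cluster uses the Slurm scheduler, which this page does not specify. The script name my_analysis.py, the job name, and the resource values are placeholders, and both helper functions are made up for this illustration.

```python
import shutil
import subprocess


def build_batch_script(command, job_name="notebook-job", minutes=10, ntasks=1):
    """Return the text of a Slurm batch script (all defaults are placeholders)."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={minutes}",    # time limit in minutes
        f"#SBATCH --ntasks={ntasks}",
        command,
        "",
    ])


def submit(script_text):
    """Pipe the script to sbatch if it is installed; otherwise return None."""
    if shutil.which("sbatch") is None:
        return None
    done = subprocess.run(
        ["sbatch"], input=script_text, text=True,
        capture_output=True, check=True,
    )
    return done.stdout.strip()


# my_analysis.py is a hypothetical script name.
script = build_batch_script("srun python my_analysis.py")
print(script)
```

Calling submit(script) from a notebook cell would hand the job to the scheduler. The notebook then has to poll for output files or job status to get results back, which is the back-and-forth communication described above.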


Next Steps

Use the following link to provide feedback on this training: https://forms.gle/qBBwXpKeTNqSR5516 or use the QR code below.

jupyter.png
