
Goals:

  • Present the downsides of Jupyter so users know what to look out for.



Not everyone loves Jupyter!



Why?

  • Jupyter notebooks can encourage bad software development practices.

    • They can discourage writing modular, reproducible software.

    • Without knowing the original notebook creator’s environment, it is often difficult to reproduce their results.

    • Linters (tools that check code for likely errors and style problems) are difficult to use with Jupyter’s disjointed cell interface.

  • The cell-based workflow fits poorly with object-oriented programming (OOP) and can discourage OOP practices.

    • As part of that, it can discourage modularity.

    • A class definition cannot be split across multiple cells.

  • Most Jupyter notebooks and their output are not easily reproducible due to hidden cell state.

    • Each cell’s current state depends on the order in which the cells were run. That order is not always obvious from the notebook.

    • Even without randomization, results can be hard to reproduce: if the original creator ran cells while keeping earlier state, then reordered or skipped cells in later runs, the next person who runs the notebook from scratch won’t get the same results.

    • You may always skip a particular cell when you run a notebook, while the next person to run it doesn’t skip it. In these situations, you’ll end up with different output.

    • Cells executed in different orders can give different output, and Jupyter lets you override the top-to-bottom order of cells.

    • Being able to run snippets of code in arbitrary order can be unintuitive.

  • Version control of Jupyter notebooks can be an issue.

    • There’s no easy way to determine whether a cell has been edited and when.
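
The hidden-state problem described above can be sketched in plain Python. This is only an illustration, not how Jupyter is implemented: the three strings are hypothetical stand-ins for notebook cells, and a shared dictionary plays the role of the kernel's namespace.

```python
# Each string stands in for one notebook cell (hypothetical example cells).
# All cells share one namespace, much as notebook cells share one kernel.
cells = {
    "A": "total = 10",
    "B": "total = total * 2",
    "C": "result = total + 1",
}


def run_cells(order):
    """Execute the named cells in the given order, starting from a fresh namespace."""
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return namespace["result"]


top_to_bottom = run_cells(["A", "B", "C"])      # clean run from scratch
cell_b_twice = run_cells(["A", "B", "B", "C"])  # author re-ran cell B once
cell_b_skipped = run_cells(["A", "C"])          # author skipped cell B

print(top_to_bottom, cell_b_twice, cell_b_skipped)  # 21 41 11
```

The saved notebook looks identical in all three cases; only the execution history differs. Restarting the kernel and running every cell from top to bottom before sharing is one way to check that the stored output matches a clean run.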


We can’t tell you what to use

While we may recommend best practices, and provide reasoning, the tools you use for your research are entirely up to you.

Over the long term, this boils down to discipline and using the right tools for the right jobs.
Generally, these rules apply, but one cannot predict every use case:

  • Jupyter is probably not the right tool for sharing application code.

  • Jupyter was built for collaboration, communication, and interactivity. It is not meant for running critical code.

  • It’s convenient for showing ideas to others or experimenting with your code.

    • Once you’re done experimenting, rewrite the code as a proper application if you plan to use it in production.

    • If the code needs to run for a long time, it’s likely not a good idea to write it in or run it from a Jupyter notebook.
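
As a rough sketch of what moving from experiment to application can look like, the logic that lived in separate cells becomes named functions in a script with a single entry point. The function names and the sample input below are hypothetical, chosen only for illustration.

```python
import statistics


def load_values(text):
    """Parse whitespace-separated numbers (a stand-in for real data loading)."""
    return [float(token) for token in text.split()]


def summarize(values):
    """Compute the summary that was previously built up cell by cell."""
    return {"mean": statistics.mean(values), "stdev": statistics.pstdev(values)}


def main():
    """Run the whole workflow in one fixed order, with no leftover state."""
    values = load_values("1 2 3 4")  # hypothetical sample input
    summary = summarize(values)
    print(summary)
    return summary


if __name__ == "__main__":
    main()
```

Because every run starts from `main()`, the order of execution is fixed, and each function can be imported, tested, and linted on its own.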


Jupyter wasn’t originally intended for use on HPC systems

In many ways, HPC computations and Jupyter notebooks don’t play to each other’s strengths. Their use cases and original intentions are very different. Jupyter notebooks can be powerful development and collaboration tools, but they often aren’t suitable for long-running, computationally intensive workflows. Classic HPC work runs in batches: long-running jobs submitted through terminal access.

You can, however, use them together, and tools are available if you want to do this:

  • Open OnDemand - makes it easier to launch Jupyter on an HPC system with the resources you request.

  • IPython Parallel (ipyparallel; designed to integrate with MPI libraries)

  • Dask

  • Apache Spark

In some cases the tools end up being more of a “workaround” and don’t really allow your computation to run as one job inside the notebook. In these cases, you usually have classic HPC jobs spawned from a Jupyter session. These jobs run alongside Jupyter, and information is passed between them.
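
A minimal sketch of spawning a batch job from a Jupyter session might look like the following. It assumes the cluster uses the Slurm scheduler, which this page does not state; the script name `simulate.sh`, the job name, and the resource values are all hypothetical.

```python
import shutil
import subprocess


def build_sbatch_command(script_path, job_name, time_limit, ntasks):
    """Return the argument list for submitting script_path with Slurm's sbatch."""
    return [
        "sbatch",
        "--parsable",  # print only the job ID (plus cluster name, if any)
        f"--job-name={job_name}",
        f"--time={time_limit}",
        f"--ntasks={ntasks}",
        script_path,
    ]


def submit(command):
    """Submit the job and return its ID, or None if the scheduler command is missing."""
    if shutil.which(command[0]) is None:
        return None
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    # With --parsable the output is "jobid" or "jobid;cluster".
    return completed.stdout.strip().split(";")[0]


# Hypothetical job: assumes a Slurm cluster and a batch script named simulate.sh.
command = build_sbatch_command("simulate.sh", "notebook-sim", "02:00:00", 4)
print(" ".join(command))
```

Calling `submit(command)` from a notebook cell would hand the work to the scheduler, so the long-running computation happens in a batch job while the notebook is used only to launch it and inspect results.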


Next Steps
