Problems with Jupyter
Goals:
Provide the other side of Jupyter so users know what to look out for.
Not everyone loves Jupyter!
Why is this?
Jupyter notebooks can encourage bad software development practices.
Notebooks are counterintuitive to object-oriented programming and can discourage OOP practices.
Most Jupyter notebooks and their output are not easily reproducible.
Jupyter lacks built-in version control, and notebook files (code mixed with output and metadata) are awkward to diff and track in version control systems.
Jupyter can be slow.
We can’t tell you what to use, but we can offer some best practices.
While we may recommend best practices, and provide reasoning, the tools you use for your research are entirely up to you.
Over the long term, this boils down to discipline and using the right tools for the right jobs.
Generally, these rules will often apply, but one cannot predict every use case:
Jupyter is probably not the right tool for sharing application code.
Jupyter was built for collaboration, communication, and interactivity. It is not meant for running critical code.
It is convenient for showing ideas to others, experimenting with your code, or writing draft code.
Once you’re done experimenting, code it properly in a serious application if you plan to use the code in production.
If the code needs to run for a long time, it’s likely not a good idea to write it or run it from a Jupyter notebook.
Jupyter is not intended for asynchronous tasks.
All cells in a notebook run in the same kernel, and the kernel executes one cell at a time.
This means that if one cell is running a long task, it blocks the execution of every other cell.
This can be a major problem when you're working with data that takes a long time to process, or with real-time data that needs to be updated regularly.
In these cases, it can be much better to use a tool designed for parallel computing, as in the sketch below.
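As a rough illustration, the sketch below uses Dask (one of the tools listed in the next section; it assumes the `dask.distributed` package is installed, and `slow_square` is just a stand-in function) to push slow work onto worker processes so the notebook kernel stays responsive:

```python
# Minimal sketch, assuming the dask.distributed package is installed.
from dask.distributed import Client

client = Client()  # start a local scheduler and worker processes

def slow_square(x):
    # stand-in for an expensive computation
    return x * x

# map() returns immediately; the work runs on the workers,
# so the notebook kernel is free to execute other cells in the meantime
futures = client.map(slow_square, range(8))

results = client.gather(futures)  # block only when the results are actually needed
print(results)                    # [0, 1, 4, 9, 16, 25, 36, 49]

client.close()
```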
Jupyter wasn’t originally intended for use on an HPC
In many ways, HPC computations and Jupyter notebooks don’t suit each other’s strengths; their use cases and original intentions are very different. Jupyter notebooks can be powerful development and collaboration tools, but they often aren’t suitable for long-running, computationally intensive workflows. Classic HPC work runs as batch jobs: long-running jobs submitted and monitored through terminal access.
You can, however, use them together, and tools are available if you want to do this:
Open OnDemand - makes it easier to launch Jupyter on an HPC system with requested resources.
IPython Parallel (ipyparallel) - designed to integrate with MPI libraries.
Dask
Spark
In some cases these tools end up being more of a “workaround” and don’t really allow your computation to run as one job inside the notebook. Instead, you usually have your classic HPC jobs spawned from a Jupyter session; these jobs run simultaneously with Jupyter and information gets communicated between them. A minimal sketch of this pattern with IPython Parallel is shown below.
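For example, a hedged sketch of the pattern might look like the following. It assumes `ipyparallel` is installed and that a set of engines has already been started outside the notebook (e.g. with `ipcluster start -n 4`, or via a batch job on the cluster); `square` is just an illustrative function.

```python
# Minimal sketch: assumes ipyparallel is installed and a cluster of engines
# is already running (e.g. started with `ipcluster start -n 4`).
import ipyparallel as ipp

rc = ipp.Client()   # connect to the running engines from the notebook
view = rc[:]        # a view across all engines

def square(x):
    return x * x

# the work is farmed out to the engines, which run alongside the notebook
# session and send their results back to it
results = view.map_sync(square, range(8))
print(results)      # [0, 1, 4, 9, 16, 25, 36, 49]
```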