...
Table of Contents | ||
---|---|---|
|
...
Not everyone loves Jupyter!
...
Why is this?
If you only learn to program in Jupyter notebooks, it’s possible that you’ll develop bad coding practices.
Jupyter notebooks can encourage bad software development practices. |
|
---|---|
Counterintuitive to object oriented programming and can discourage OOP practices. |
|
Most jupyter notebooks and their output are not easily reproducible. |
|
|
You may run a notebook always skipping a cell, while the next person to run it doesn’t skip that cell. You’ll end up with different output.
Cells executed in different orders give you different output. You can override the linear run of cells in jupyter.
There are great things about Jupyter
...
Encourages well documented/commented code
...
Great visualization
...
| |
Jupyter lacks version control |
|
---|---|
Jupyter can be slowwwwwwww…. |
|
...
We can’t tell you what to use, but can offer some best practices
While we may recommend best practices, and provide reasoning, the tools you use for your research are entirely up to you.
Over the long term, this boils down to discipline and using the right tools for the right jobs.
**Generally, these rules will often apply, but one cannot predict every use case:
Jupyter is probably not the right tool to share application code**
Jupyter was built for collaboration, communication, and interactivity. It is not meant to running critical code**
It's convenient to show ideas to others, or experiment with your code, or for draft code
Once you’re done experimenting, code it properly in a serious application if you plan to use the code in production**
If the code needs to run for a long time, it’s likely not a good idea to write it or run it from a jupyter notebook**
Jupyter is not intended for asynchronous tasks**
It's designed to keep all cells in a notebook running in the same kernel.
This means that if one cell is running a long, asynchronous task, it will block the execution of other cells.
This can be a major problem when you're working with data that takes a long time to process, or when you're working with real-time data that needs to be updated regularly.
In these cases, it can be much better to use a tool designed for parallel computing.
...
Jupyter wasn’t originally intended for use on an HPC
In many ways, HPC computations and Jupyter notebooks don’t suit each other’s strengths. Their use cases and original intentions are very different. Jupyter Notebooks can be powerful development and collaboration tools, but they often aren’t suitable for long-running, computationally intensive workflows. Classic HPC runs in batches, with long running jobs through terminal access.
You can however use them together, and tools are available if you want to do this:
OpenOndemand - makes it easier to launch from HPC with requested resources.
ipython parallel (designed to integrate with MPI libraries)
dask
spark
In some cases the tools end up being more of a “workaround” and don’t really allow your computation to be run as one job inside the notebook. Instead what you may have is two separate jobs running simultaneously with information In these cases, you usually have your classic hpc jobs spawned from a jupyter session. These jobs run simultaneously with jupyter and information gets communicated between them.
...
Previous | Workshop Home Starting Jupyter with OnDemand |
Use the following link to provide feedback on this training: https://forms.gle/qBBwXpKeTNqSR5516 or use the QR code below.
...