Not everyone loves Jupyter!


...

Why is this?

Jupyter notebooks can encourage bad software development practices.

  • Notebooks can discourage writing software for modularity and reproducibility.

    • Linters (tools that check code for likely errors and style problems) are difficult to use in Jupyter’s disjointed cell interface.

  • The cell-based workflow runs counter to object-oriented programming and can discourage OOP practices.

    • As part of that, it can discourage modularity.

    • A class definition cannot be split across multiple cells.

  • Most Jupyter notebooks and their output are not easily reproducible due to hidden cell states.

    • Each cell is in its current state because of the order in which the cells were run.

    • You can’t necessarily tell what order the cells in a notebook were run in. This is not always straightforward.

    • Even if no cells use randomization, the original creator may have run cells, kept their states, and then changed the order or skipped a cell in subsequent runs, resulting in specific changes to variables. This can result in “hidden states” of a cell and its variables.

      • Skipping cells, or running them out of order, results in different cell states in individual instances of the same notebook.

      • Changes to run order are not retained once the kernel is destroyed.

      • Therefore the next person who runs the notebook from scratch can’t reproduce the results.

    • Cells executed in different orders give you different output.

    • You may run a notebook always skipping a cell, while the next person to run it doesn’t skip that cell. In these situations, you’ll end up with different output.

  • Overriding the linear run of cells is one of Jupyter’s best characteristics. It is also one of its biggest weaknesses.

    • Being able to run snippets of code in arbitrary order can be unintuitive.

  • Without knowing the original notebook creator’s environment and installed software stack, it’s difficult to reproduce their results.

  • Jupyter version control can be an issue.

    • Jupyter itself lacks version control.

    • There’s no easy way to determine whether a cell has been edited and when.
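
The effect of run order on hidden state can be sketched outside Jupyter. In this minimal illustration, each string stands in for a notebook cell and a shared dictionary stands in for the kernel's namespace. The cell names and contents are hypothetical and are not taken from any real notebook.

```python
# Each string stands in for one notebook cell; the dict passed to exec()
# plays the role of the kernel's namespace. All names are hypothetical.
CELLS = {
    "load": "values = [1, 2, 3]",
    "scale": "values = [v * 10 for v in values]",
    "report": "total = sum(values)",
}


def run_cells(order):
    """Execute the named cells in the given order in one fresh namespace."""
    namespace = {}
    for name in order:
        exec(CELLS[name], namespace)
    return namespace["total"]


top_to_bottom = run_cells(["load", "scale", "report"])
skipped_scale = run_cells(["load", "report"])
scale_run_twice = run_cells(["load", "scale", "scale", "report"])

# Same cells, three different results, depending only on run order.
print(top_to_bottom, skipped_scale, scale_run_twice)
```

The saved notebook would look identical in all three cases, because the run order lives only in the kernel's memory.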

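One possible way to reduce the environment problem is to record the software stack alongside the notebook. The sketch below uses only the Python standard library. The function name `describe_environment` is an assumption made for illustration, and tools such as `pip freeze` or `conda env export` may be a better fit depending on how the environment was built.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment():
    """Collect the interpreter version, platform, and installed packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


env = describe_environment()
# Print only the beginning; the full report can be long.
print(json.dumps(env, indent=2)[:300])
```

Saving this report next to the notebook gives the next person a starting point for rebuilding a similar environment, although it does not guarantee identical results.
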
    There are great things about Jupyter

    ...

    Encourages well documented/commented code

    ...

    Great visualization

    ...

    Jupyter can be slowwwwwwww….

    • Jupyter is an interactive tool. It must therefore load the entire notebook into memory in order to provide its interactive features.

    • If you're working with extremely large data sets or large notebooks, this can become a problem.

    • Jupyter is not designed to be used with extremely large data sets.

    ...

    We can’t tell you what to use, but can offer some best practices

    While we may recommend best practices, and provide reasoning, the tools you use for your research are entirely up to you.

    Over the long term, this boils down to discipline and using the right tools for the right jobs.
    These rules generally apply, but one cannot predict every use case:

    • Jupyter is probably not the right tool to share application code.

    • Jupyter was built for collaboration, communication, and interactivity. It is not meant for running critical code.

    • It's convenient for showing ideas to others, experimenting with your code, or writing draft code.

      • Once you’re done experimenting, code it properly in a serious application if you plan to use the code in production.

    • If the code needs to run for a long time, it’s likely not a good idea to write it in or run it from a Jupyter notebook.

    • Jupyter is not intended for asynchronous tasks.

      • It's designed to keep all cells in a notebook running in the same kernel.

      • This means that if one cell is running a long task, it will block the execution of other cells until it finishes.

      • This can be a major problem when you're working with data that takes a long time to process, or when you're working with real-time data that needs to be updated regularly.

      • In these cases, it can be much better to use a tool designed for parallel computing.
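
As a rough illustration of moving finished experiments out of a notebook, the sketch below shows notebook-style code rewritten as a small standalone script. The function name `summarize`, the placeholder data, and the file name `analysis.py` are all hypothetical. The point is only the shape: logic in an importable function, and a single entry point that always runs top to bottom.

```python
import statistics


def summarize(samples):
    """Return the mean and sample standard deviation of a list of numbers."""
    return {
        "mean": statistics.fmean(samples),
        "stdev": statistics.stdev(samples),
    }


def main():
    # Placeholder data; a real script would read its input from a file.
    samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    result = summarize(samples)
    print(f"mean={result['mean']:.2f} stdev={result['stdev']:.2f}")


if __name__ == "__main__":
    main()
```

Saved as `analysis.py`, this could be run with `python analysis.py` without a notebook kernel, and `summarize` could still be imported into a notebook for interactive exploration.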

    ...

    Jupyter wasn’t originally intended for use on an HPC

    ...


    Use the following link to provide feedback on this training: https://forms.gle/qBBwXpKeTNqSR5516 or use the QR code below.

    ...