The Jupyter Notebook is a fantastic tool for developing and interactively presenting data science projects. This post will show you how to set up Jupyter Notebooks on your system and use them for data science projects.
But first, what exactly is a “notebook”?
A notebook combines graphics, narrative prose, mathematical equations, and other rich media with code and output in a single document. To put it another way, it’s a single page where you can run code, see the results, and add explanations, formulas, and charts to make your work more transparent, repeatable, and shareable.
At firms worldwide, using Notebooks is now an essential element of the data science workflow. If you want to work with data, a Notebook will streamline your process and make it easier to communicate and share your findings.
Jupyter Notebooks are also absolutely free because they are part of the open-source Project Jupyter. The software is available separately or as part of the Anaconda data science toolset.
Although you can use Jupyter Notebooks with many programming languages, this article focuses on Python because it is the most common use case. Among R users, by contrast, RStudio tends to be the more popular choice.
Jupyter Notebooks are a helpful tool for writing and iterating on Python data analysis code. You can write a few lines of code and run them at once rather than writing and rewriting a complete program. Then, if you need to make a change, you can return to the same window, edit the code, and rerun it.
IPython is an interactive way of running Python code in the terminal using the REPL (Read-Eval-Print Loop) concept, and it is the foundation of the Jupyter Notebook. The computations are done by the IPython kernel, which communicates with the Jupyter Notebook front-end interface. This separation of kernel and front end is also what enables Jupyter Notebook to work with a variety of languages. Jupyter Notebooks add features on top of IPython, such as storing code together with its output and letting you keep Markdown notes.
Jupyter Notebook Installation
Installing Anaconda is the most straightforward approach for newcomers to get started with Jupyter Notebooks. Anaconda is the most popular Python data science distribution, and it comes pre-installed with all of the most popular libraries and tools.
NumPy, pandas, and Matplotlib are among the most popular Python libraries included with Anaconda, and the full list of 1000+ packages is extensive. As a result, Anaconda lets us start in a fully equipped data science workshop without the effort of managing several installs or worrying about dependencies or OS-specific (read: Windows-specific) installation issues.
To obtain Anaconda, follow these steps:
- Download the latest version of Anaconda from the official website; at the time of writing, it ships with Python 3.9.
- Follow the instructions on the download page and in the executable to install Anaconda.
If you’re a more advanced user who already has Python installed and prefers to handle your packages manually, use pip:
pip3 install jupyter
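After either installation route, a quick way to confirm that the packages are visible to your Python interpreter is to look them up without importing them. The snippet below is a minimal sketch; it assumes the notebook server is importable under the module name notebook, and the helper name is_installed is only illustrative.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# "notebook" is assumed to be the module behind the `jupyter notebook` command.
for name in ("notebook", "numpy", "pandas", "matplotlib"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a package is reported as missing, it was most likely installed into a different Python environment than the one running the check.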
Putting together your first Notebook
In this section, we’ll learn how to run and save notebooks and get familiar with their structure and interface. We’ll also cover some basic vocabulary that will give you a practical grasp of how to use Jupyter Notebooks on your own.
Creating a Notebook
To start a Jupyter notebook, open your terminal and navigate to the directory where you want to save your Notebook. When you run the command jupyter notebook, the application starts a local server at localhost:8888 (or another specified port).
tuts@codeunderscored$: jupyter notebook
A browser window with the Jupyter Notebook interface should open automatically; otherwise, you can open the address printed in the terminal yourself. That address includes a token, which the server uses to authenticate your browser. To stop the server and shut down the kernel, press Control-C twice in the terminal.
On Windows, you can start Jupyter by clicking the shortcut Anaconda adds to your Start menu, which will open a new tab in your default web browser, similar to the screenshot below.
This is the Notebook Dashboard, which is dedicated to keeping track of your Jupyter Notebooks. Consider it a starting point for exploring, editing, and creating notebooks. Remember that the dashboard only gives you access to the files and folders inside Jupyter’s start-up directory (i.e., the directory from which Jupyter was launched). The start-up directory can, however, be changed.
The dashboard’s URL is something like http://localhost:8888/tree when Jupyter Notebook is open in your browser. “Localhost” is not a website; it indicates that the content is being served from your own computer.
Jupyter’s Notebooks and dashboards are web apps, and Jupyter starts a local Python server to serve these apps to your web browser, effectively making it platform-independent and allowing for more direct online sharing. The crucial thing is that, while Jupyter Notebooks appear in your browser, they are hosted and run locally on your computer. Until you choose to share your notebooks, they aren’t actually on the internet.
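To make the parts of such a local address concrete, the following sketch splits one up with Python’s standard library. The URL is a made-up example of the kind of address printed in the terminal, assuming the default port and a plain-HTTP local server; the token value abc123 is purely hypothetical.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical address; a real token is a long random string.
server_url = "http://localhost:8888/tree?token=abc123"

parts = urlparse(server_url)
token = parse_qs(parts.query).get("token", [None])[0]

print(parts.hostname)  # localhost, i.e. your own computer
print(parts.port)      # 8888, the port the local server listens on
print(parts.path)      # /tree, the dashboard's file browser
print(token)           # abc123, the value used to authenticate the browser
```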
The interface of the dashboard is mainly self-explanatory.
Jupyter Notebook
You’ve now entered the Jupyter Notebook interface, where you can view all of the files in your current location. Jupyter Notebooks are distinguished by the notebook icon next to their names. If there is already a Jupyter Notebook in your current directory that you want to open, locate it in your files list and click it to open it.
How to start a new notebook
Go to New and pick Notebook – Python 3 to start a new notebook. If you have any other Jupyter Notebooks on your system that you’d like to use, click Upload and navigate to that file.
Notebooks that are currently running will have a green icon, while those that are not will have a grey icon. To see a list of all currently running notebooks, go to the Running tab.
The Notebook’s Interior
When you first open a Jupyter notebook, you’ll notice that it contains a cell.
Cells are the building blocks of notebooks, and they’re where you write your code. Click on a cell to select it, then hit SHIFT+ENTER or the play button in the toolbar above to run the code. In addition, the Cell dropdown menu offers many cell-running options, including running one cell at a time or all cells at once. The output of the cell’s code appears in the space below when you run it. To stop a section of code from running, press the stop button.
Use the plus (+) button in the toolbar to add new cells, or press SHIFT+ENTER on the last cell in the Notebook. To cut, copy, delete, or otherwise edit a cell, select it and open the Edit menu in the navigation bar.
In addition to running lines of code, you can include text-only cells that use Markdown to document and organize your notebooks. By default, a new cell is a Code cell. To make a Markdown cell, go to the Cell menu in the navigation bar, scroll down to Cell Type, and select Markdown.
You may need to restart the kernel on occasion; to do so, open the Kernel dropdown menu and choose Restart. You can shut down a kernel by clicking Shutdown, which opens a dialogue box asking you to confirm. To force a shutdown, go to the File menu and select Close and Halt, and the browser window will close on its own. Be aware that restarting or shutting down a kernel erases the variables it held in memory.
In the Help menu, you’ll find useful information such as keyboard shortcuts and links to documentation for modules like NumPy, SciPy, and Matplotlib.
The toolbar has several shortcut buttons for everyday actions. From left to right, they are: save, add a new cell, cut selected cells, copy selected cells, paste cells below, move selected cells up, move selected cells down, run, interrupt the kernel, restart the kernel, a dropdown to change the cell type, and a shortcut to open the command palette.
Jupyter Notebook files are saved automatically at regular intervals as you work. They appear in your directory as JSON files with the .ipynb extension. Notebooks can also be exported in various formats, such as HTML. Go to the File menu, scroll down to Download as, and choose the file format you want. A popup will open, asking where you want to save the new file. Once you’ve browsed to the correct directory, click Save and Checkpoint.
What is an ipynb file, and how do I open one?
The short answer is that each .ipynb file represents a single notebook. Thus, a new .ipynb file is also created when creating a new notebook.
The lengthier explanation is that each .ipynb file is a text file that uses the JSON format to represent the contents of your Notebook. Each cell and its contents, such as picture attachments converted to text strings, are listed, along with some metadata.
You can edit this yourself if you know what you’re doing by going to the Notebook’s menu bar and selecting “Edit > Edit Notebook Metadata.” By selecting “Edit” from the dashboard controls, you may also see the contents of your notebook files. However, the term “can” is crucial. In most circumstances, manually editing your notebook metadata is unnecessary.
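To see what “a text file that uses the JSON format” means in practice, here is a minimal sketch that builds a tiny notebook as a Python dictionary, serializes it the way it would be stored on disk, and reads the cells back. It assumes the version 4 notebook layout; the cell contents are illustrative, and real files carry more metadata than shown.

```python
import json

# A minimal notebook structure (assumed nbformat 4 layout, illustrative content).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # not run yet, shown as In [ ]
            "outputs": [],
            "source": ["print('Hello Codeunderscored!')"],
        },
    ],
}

# This text is what would be written to the .ipynb file.
ipynb_text = json.dumps(notebook, indent=1)

# Reading it back is plain JSON parsing.
loaded = json.loads(ipynb_text)
for cell in loaded["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"]))
```

Because the file is ordinary JSON, any tool that reads JSON can inspect a notebook, which is also why hand edits need care: a stray comma makes the file unreadable.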
Shortcuts on the Keyboard
Finally, you may have noticed that when you run your cells, their border turns blue, whereas it was green while you were editing. In a Jupyter Notebook, there is always one “active” cell, highlighted with a border whose color denotes its current mode:
- A green outline means the cell is in “edit mode.”
- A blue outline means the cell is in “command mode.”
So, what can we do with a cell when it’s in command mode? So far, we’ve seen how to run a cell with Shift + Enter (Ctrl + Enter also works), but there are plenty of other commands. The best way to use them is with keyboard shortcuts. Because they enable a quick cell-based workflow, keyboard shortcuts are a popular feature of the Jupyter environment. Many of these actions are available when the active cell is in command mode.
A list of Jupyter’s keyboard shortcuts is provided below. You don’t have to memorize all of them right away, but this list should give you a fair idea.
First, toggle between command mode and edit mode with the Esc and Enter keys, respectively.
What you can do in command mode:
- Scroll up and down your cells with the Up and Down keys.
- Press A or B to insert a new cell above or below the active cell, respectively.
- M transforms the active cell to a Markdown cell.
- Y sets the active cell to a code cell.
- D + D (D pressed twice) deletes the active cell.
- Z undoes a cell deletion.
- Hold Shift and press Up or Down to select multiple cells at once. Alternatively, Shift + Click in the margin to the left of your cells to select them. With multiple cells selected, Shift + M merges your selection.
- Ctrl + Shift + -, while you are in edit mode, splits the active cell at the cursor.
Feel free to experiment with these in your Notebook. Create a new Markdown cell when you’re ready, and we’ll learn how to format the text in our notes.
Markdown
Markdown is a lightweight, easy-to-learn markup language for formatting plain text. Its syntax corresponds closely to HTML tags, so some prior HTML knowledge is helpful, although it is not required.
Because this article was written in a Jupyter notebook, all of the narrative text and images you’ve seen so far were created with Markdown.
There are three options when attaching images:
- Make use of an online image’s URL.
- Use a local URL to a picture you’ll maintain with your Notebook, such as in the same git repository.
- Add an attachment by selecting “Edit > Insert Image”; the image will be converted to a string and stored in your notebook’s .ipynb file.
Please keep in mind that the attachment option will significantly increase the size of your .ipynb file!
There’s a lot more to Markdown, especially around hyperlinking, and you can also include plain HTML. If you find yourself pushing the limits of the basics above, refer to the official Markdown guide on John Gruber’s website.
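As a small illustration of the basics, the sketch below stores some Markdown in a Python string and prints it, so you can copy the printed text into a Markdown cell and run it. The heading, link target, image URL, and local image path are all placeholders rather than real resources.

```python
# Sample Markdown source; every URL and path here is a placeholder.
markdown_source = """\
# A level 1 heading
## A level 2 heading

Some **bold** text, some *italic* text, and a [link](https://jupyter.org).

- First list item
- Second list item

![Image from a web URL](https://example.com/chart.png)
![Image stored next to the notebook](images/chart.png)
"""

print(markdown_source)
```

The last two lines correspond to the first two image options listed above: a remote URL and a path relative to the notebook file.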
Cells
We’ll return to kernels later, but now, let’s get a handle on cells. A notebook’s body is made up of cells. The box with the green outline in the screenshot of a new notebook in the previous section is an empty cell. We’ll look at two different types of cells:
- A code cell contains code to be executed in the kernel. When the code is run, the notebook displays the output below the code cell that generated it.
- A Markdown cell contains text formatted using Markdown and displays its output in place when it is run.
The first cell in a new notebook is always a code cell.
Let’s put it to the test with the traditional print example:
Type print('Hello Codeunderscored!') into the cell, then click the Run button in the toolbar above or press Ctrl + Enter.
The cell and its output will look as follows:
print('Hello Codeunderscored!')
Hello Codeunderscored!
The cell’s output is displayed below, and the label to its left has changed from In [] to In [1].
Because the output of a code cell is part of the document, you can see it in this article. You can always tell code cells and Markdown cells apart because code cells have that label on the left and Markdown cells do not. The “In” part of the label is short for “Input,” while the number indicates when the cell was executed on the kernel; in this case, the cell was executed first.
When you rerun the cell, the label will change to In [2] because the cell was now the second to be run on the kernel. When we look into kernels more closely, it will become evident why this is so useful.
To create a new code cell beneath your first, click Insert and pick Insert Cell Below from the menu bar. Next, try the following code to see what happens. Do you notice anything different?
import time
time.sleep(5)
This cell doesn’t produce any output, but it does take five seconds to execute. Notice how Jupyter shows that the cell is currently running by changing its label to In [*]. In general, a cell’s output comes from any text explicitly printed during the cell’s execution, plus the value of the cell’s last line, whether that is a lone variable, a function call, or something else. Consider the following example:
def code_greetings(owner):
    return 'Hello, {}!'.format(owner)

code_greetings('Codeunderscored')
You’ll find yourself using this in practically every project you work on, and we’ll see more of it in the future.
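The rule that a cell shows both printed text and the value of its last line can be imitated in plain Python. The function below is a simplified sketch of that behavior, not how the IPython kernel is actually implemented; run_cell and its return format are invented for illustration.

```python
import ast
import contextlib
import io


def run_cell(source, namespace):
    """Run `source` like a simplified code cell.

    Returns (printed_text, value_of_last_expression_or_None).
    """
    tree = ast.parse(source)
    last_expr = None
    # If the final statement is a bare expression, set it aside to evaluate separately.
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last_expr = tree.body.pop().value

    printed = io.StringIO()
    value = None
    with contextlib.redirect_stdout(printed):
        # Execute everything except the trailing expression.
        exec(compile(tree, "<cell>", "exec"), namespace)
        if last_expr is not None:
            # Evaluate the trailing expression to obtain the cell's value.
            value = eval(compile(ast.Expression(last_expr), "<cell>", "eval"), namespace)
    return printed.getvalue(), value


cell = (
    "def code_greetings(owner):\n"
    "    return 'Hello, {}!'.format(owner)\n"
    "\n"
    "code_greetings('Codeunderscored')"
)
printed, value = run_cell(cell, {})
print(repr(printed))  # ''  (nothing was printed)
print(repr(value))    # 'Hello, Codeunderscored!'
```

Here nothing is printed, yet the cell still has an output because its last line is an expression with a value. A cell ending in an assignment or an import, such as the time.sleep example, has no trailing value to show.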
Kernels
Behind every notebook runs a kernel. When you run a code cell, its code is executed within the kernel, and any output is returned to the cell to be displayed. The kernel’s state persists over time and between cells; it pertains to the document as a whole rather than to individual cells. For example, if you import libraries or declare variables in one cell, they will be available in another. If your kernel ever gets stuck on a computation and you want to stop it, you can use the Interrupt option.
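The idea of state shared between cells can be mimicked with a single dictionary that plays the role of the kernel’s memory. This is only an analogy under simplifying assumptions: the two “cells” are plain strings, and the names kernel_namespace, cell_1, and cell_2 are illustrative.

```python
# One shared namespace stands in for the kernel's state.
kernel_namespace = {}

cell_1 = "import math\nradius = 2"
cell_2 = "area = math.pi * radius ** 2"

# Running the cells in order: cell_2 can use what cell_1 defined.
exec(cell_1, kernel_namespace)
exec(cell_2, kernel_namespace)
print(round(kernel_namespace["area"], 2))  # 12.57

# "Restarting the kernel" throws the state away, so cell_2 fails on its own.
kernel_namespace = {}
try:
    exec(cell_2, kernel_namespace)
except NameError as error:
    print("NameError:", error)
```

This is also why the order in which cells were run matters more than the order in which they appear on the page.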
Selecting a Kernel
You may have noticed that Jupyter lets you change the kernel, and there are various options to pick from. In fact, when you create a new notebook from the dashboard by selecting a Python version, you are choosing which kernel to use.
There are kernels for different versions of Python and for over 100 languages, including Java, C, and even Fortran. Data scientists may be particularly interested in the kernels for R and Julia, as well as imatlab and the Calysto MATLAB Kernel for MATLAB.
The SoS kernel supports multiple languages within a single notebook. Each kernel has its own installation instructions, and you’ll likely need to run some commands on your computer.
How Can you Share Your Notebooks?
When people talk about sharing their notebooks, they’re usually thinking about one of two models. Most often, individuals share the end result of their work, such as this article, which typically means sharing non-interactive, pre-rendered versions of their notebooks. Alternatively, they collaborate on notebooks with the help of version control tools such as Git or online platforms such as Google Colab.
What to do before sharing your Notebook
A shared notebook will appear exactly as it was when you exported or saved it, including the output of any code cells. Therefore, to make sure your Notebook is share-ready, there are a few steps to take before sharing:
- Select "Cell > All Output > Clear".
- Select "Kernel > Restart & Run All".
Wait for your code cells to finish executing, and check that everything ran as expected. This ensures that your notebooks don’t contain intermediate output, aren’t in a stale state, and execute in the correct order when shared.
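For a sense of what clearing output means at the file level, the sketch below removes outputs and execution counters from a notebook held as a Python dictionary. It assumes the version 4 notebook layout, in which code cells carry outputs and execution_count fields; clear_outputs is an illustrative helper and not part of Jupyter.

```python
import copy


def clear_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs and counters reset."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Illustrative notebook with one executed code cell.
executed = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 7,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
            "source": ["print('hi')"],
        }
    ],
}

cleaned = clear_outputs(executed)
print(cleaned["cells"][0]["outputs"])          # []
print(cleaned["cells"][0]["execution_count"])  # None
```

The menu commands remain the simplest route; the sketch only shows which parts of the saved file they affect.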
Getting Your Notebooks Out There
Jupyter has built-in support for exporting to HTML, PDF, and various other formats, which you can find in the “File > Download As” menu.
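The same export can also be started from a terminal with the nbconvert tool that ships with Jupyter. The sketch below only builds the command line so that it is easy to inspect; the file name analysis.ipynb and the helper build_export_command are hypothetical.

```python
def build_export_command(notebook_path, output_format="html"):
    """Build an nbconvert command line for exporting a notebook."""
    return ["jupyter", "nbconvert", "--to", output_format, notebook_path]


command = build_export_command("analysis.ipynb")  # hypothetical file name
print(" ".join(command))  # jupyter nbconvert --to html analysis.ipynb

# To run it from Python instead of typing it in a terminal:
#   import subprocess
#   subprocess.run(command, check=True)
```

Exporting to PDF this way typically needs extra software such as a LaTeX installation, so HTML is the easier format to start with.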
This feature may be all you need if you only want to share your notebooks with a small group of people. Indeed, because many university researchers are allocated public or internal webspace, and because you can export a notebook to an HTML file, Jupyter Notebooks can be a convenient way for researchers to share their findings with their peers. If sharing exported files isn’t enough for you, there are also a few widely used ways of sharing .ipynb files more directly on the web.
GitHub
With over 1.8 million public notebooks on GitHub as of early 2018, it is unquestionably the most popular independent platform for sharing Jupyter projects with the rest of the world.
On its website, GitHub has incorporated functionality for rendering .ipynb files directly in repositories and gists. If you’re unfamiliar with GitHub, it’s a code hosting platform for version control and collaboration for Git projects. To access their services, you’ll need an account. However, regular accounts are free.
Once you have a GitHub account, the most straightforward way to share a notebook on GitHub doesn’t require Git at all. Since 2008, GitHub has offered its Gist service for storing and sharing code snippets, each of which gets its own repository. To share a notebook as a Gist:
- Go to gist.github.com after logging in.
- In a text editor, open your .ipynb file, select all, and copy the JSON inside.
- In the Gist, paste the notebook JSON.
- Give your Gist a filename that ends with the .ipynb extension; otherwise, it won’t work.
- Select “Create secret gist” or “Create public gist” from the dropdown menu.
You’ll be able to share the URL of your public Gist with anyone, and others will be able to fork and clone your work.
This tutorial does not cover setting up your own Git repository and publishing it on GitHub, but GitHub has many resources to help you get started. If you’re using Git, add an exclusion to your .gitignore for the hidden .ipynb_checkpoints directories Jupyter creates, so you don’t commit checkpoint files to your repo unnecessarily.
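The exclusion itself is a single line in .gitignore. As a minimal sketch, the helper below adds that line to the text of a .gitignore file only if it is not already present; ensure_ignored is an illustrative name, and the pattern assumes Jupyter’s default checkpoint directory name.

```python
def ensure_ignored(gitignore_text, pattern=".ipynb_checkpoints/"):
    """Return .gitignore text that contains `pattern` exactly once."""
    lines = gitignore_text.splitlines()
    if pattern not in lines:
        lines.append(pattern)
    return "\n".join(lines) + "\n"


existing = "__pycache__/\n"
updated = ensure_ignored(existing)
print(updated)

# Applying the helper again leaves the text unchanged.
print(ensure_ignored(updated) == updated)  # True
```

In practice, you would read your repository’s .gitignore, pass its contents through a helper like this, and write the result back, or simply add the line by hand.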
Nbviewer
NBViewer is a popular notebook renderer on the web, having grown to render hundreds of thousands of notebooks every week by 2015. If you already have a place to put your Jupyter Notebooks online, such as GitHub or another service, NBViewer will display your Notebook and provide you with a shareable URL. It is offered at nbviewer.jupyter.org as a free service as part of Project Jupyter.
NBViewer, created before GitHub’s Jupyter Notebook integration, allows anyone to enter a URL, Gist ID, or GitHub username/repo/file, and the Notebook will be shown as a webpage. The ID of a Gist is the string of characters at the end of its URL; for example, in https://gist.github.com/username/50896401c23h675g347e89cd57e89e1wx, the string after the final forward slash is the ID. If you enter a GitHub username or username/repo, you’ll see a rudimentary file browser that lets you explore a user’s repositories and their contents.
The URL that NBViewer displays when showing a notebook is constant and based on the URL of the Notebook it is rendering. So, you can share it with anyone, and it will work as long as the original files stay available; NBViewer doesn’t cache data for very long.
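Pulling the ID out of a Gist URL is a one-line string operation. The sketch below uses the example URL from above; gist_id_from_url is an illustrative helper name.

```python
from urllib.parse import urlparse


def gist_id_from_url(url):
    """Return the last path segment of a Gist URL, which is the Gist ID."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


example = "https://gist.github.com/username/50896401c23h675g347e89cd57e89e1wx"
print(gist_id_from_url(example))  # 50896401c23h675g347e89cd57e89e1wx
```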
Conclusion
Jupyter Notebook files, as we’ve seen, are convenient. You can use your mouse to browse between dropdown menus and buttons or use keyboard shortcuts. They let you run small chunks of code at a time, save them in their current state, or restart them and return them to their original state. In addition to running code, we can use Markdown to organize our notebooks cleanly so that they are presentable to others.