Sharing and Publishing

In this workshop we cover Quarto for publishing outputs, Git for version control, and GitHub for sharing your source code.

Quarto

Quarto is a publishing system that allows creating documents, presentations, websites and dashboards that contain prose, code and code outputs. This means that such outputs can detail exactly what happened to the data, and outputs can be re-generated very quickly if, for example, the underlying dataset was updated, or if the analysis needs to change.

Installing Quarto

The simplest way to install Quarto is with Anaconda.

  1. Open an “Anaconda prompt” from the navigator
  2. Run conda install conda-forge::quarto
  3. When prompted with Continue? ([y]/n), type y and press enter
  4. Restart the Spyder kernel and type
!quarto version

If you get a number, then it’s worked!

Alternatively, follow the instructions to install Quarto on your computer. Quarto is a command line tool available for all major operating systems.

Note

We can write Quarto files in Spyder, but (at the time of writing) Spyder has no built-in Quarto integration. Other IDEs, like RStudio, make it easier to write Quarto files and run Quarto commands.

If the IPython console can’t find Quarto, try to run quarto version (without the !) in a command line interface outside Spyder (bash, macOS’s Terminal, PowerShell…). You will have to use the tool that finds Quarto to run Quarto commands later.
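If you are unsure whether Python's environment can see Quarto at all, a quick check from the console can help. The following is a minimal sketch using only the standard library; the function name find_quarto is just an illustrative choice.

```python
import shutil


def find_quarto():
    """Return the full path to the quarto executable, or None if it is not on PATH."""
    return shutil.which("quarto")


quarto_path = find_quarto()
if quarto_path is None:
    print("Quarto was not found on this PATH; try a terminal outside Spyder.")
else:
    print("Quarto found at:", quarto_path)
```

If this prints a path, commands starting with !quarto should work from the same console.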

Create a Quarto file

Let’s try to create a document based on the visualisations we created earlier. First, create a new script. Then, save it as “document.qmd” - be sure to change the extension!

Warning

When rendering individual Quarto files, paths to files (like the one you use to import some data) will be understood as relative to the file itself. This means that, to avoid confusion, it is best to save your Quarto file at the top of your project directory, so you can use the same relative paths as in the rest of your scripts.
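To illustrate the warning above, here is a small sketch of how a relative path is interpreted depending on where the Quarto file is saved. The "reports" folder is a hypothetical name used only for this example, and the helper resolve_from_document is not part of Quarto.

```python
from pathlib import Path


def resolve_from_document(qmd_path, relative_path):
    """Return the location a relative path points to when read from a .qmd file."""
    return Path(qmd_path).parent / relative_path


# Quarto file at the top of the project: same relative path as the other scripts.
print(resolve_from_document("document.qmd", "Players2024.csv").as_posix())
# Players2024.csv

# Quarto file in a subfolder (hypothetical "reports"): the path now points inside it.
print(resolve_from_document("reports/document.qmd", "Players2024.csv").as_posix())
# reports/Players2024.csv
```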

This new .qmd script is a Quarto Markdown file. Markdown is a formatting language which we’ll go through shortly.

At the top of our script, we need to include a header which contains the document settings

---
title: Reproducible Output
author: Your Name
date: 2025-01-01
---

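The language used for the header is YAML, a name that originally stood for “Yet Another Markup Language” and now stands for “YAML Ain’t Markup Language”. Each setting is written as a key: value pair between the two --- lines.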
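To make the structure of the file more concrete, the sketch below separates the header from the rest of a document using plain string handling. It is only an illustration of where the header starts and ends, not how Quarto itself reads the file, and split_front_matter is a made-up helper name.

```python
def split_front_matter(text):
    """Split a .qmd string into (header_lines, body) using the --- delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], text  # no header found
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i], "\n".join(lines[i + 1:])
    return [], text  # opening --- without a closing one


example = """---
title: Reproducible Output
author: Your Name
date: 2025-01-01
---

# Quarto document workshop
"""

header, body = split_front_matter(example)
print(header)
# ['title: Reproducible Output', 'author: Your Name', 'date: 2025-01-01']
```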

The remainder of the document uses the Markdown language, interspersed with Python chunks. Markdown works by using symbols to indicate formatting. For example, headings use hash symbols # and bold text uses double asterisks ** **:

# Markdown example

This contains **bold** and *italicised* text. You can also ~~strikethrough~~, create

## Subheading

- unordered 
- lists

and 

1. ordered
2. lists
    1. with
    2. levels

This contains bold and italicised text. You can also strikethrough, create

Subheading

  • unordered
  • lists

and

  1. ordered
  2. lists
    1. with
    2. levels

Let’s make a simple Markdown file and render it with Quarto. In your file, write a simple document like the following:

# Quarto document workshop

This is a simple Quarto markdown file, which will render **formatted text**!

Then, in your Spyder console, render the document to produce an HTML output:

!quarto render document.qmd

Troubleshooting

You may encounter a few issues which can be fixed:

If Quarto cannot find document.qmd, then either you’ve misspelt the filename, you’ve put the file in a folder, or the working directory is wrong. Once you’ve checked the spelling, work through the two cases below.

If document.qmd is in a folder

Specify the filepath, not just the filename:

!quarto render folder_name/document.qmd

If the working directory is wrong

Make sure you’re in the Spyder project for intensives. You can check this by opening Spyder’s “files” pane (above the console) to see where your working directory is.

If you need to change it, the easiest way is to open the Spyder project you created in a previous workshop.

If this doesn’t work, or you want to be somewhere else, you can either create a new project or change your working directory manually with the file button in the top right.
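You can also check the working directory from the console itself. The sketch below prints where Python is currently working and whether document.qmd is visible from there; can_see is an illustrative helper, not a Spyder or Quarto function.

```python
from pathlib import Path


def can_see(filename, folder=None):
    """Return True if filename exists in folder (default: the working directory)."""
    base = Path(folder) if folder is not None else Path.cwd()
    return (base / filename).is_file()


print("Working directory:", Path.cwd())
print("document.qmd visible from here:", can_see("document.qmd"))
```

If the second line prints False, either change the working directory or pass the full file path to the render command.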

You’re trying to import a module that Python can’t find. This is a Python problem, not a Quarto problem. Check the erroneous line, which will start with

import some_module

and check that you’ve spelt the module correctly. If it’s still broken, then you need to install the package (you don’t have it on your computer).
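Before rendering, you can ask Python whether it is able to locate a module without actually importing it. This is a minimal sketch using the standard library; the module names are the ones used in this workshop, and is_installed is an illustrative helper.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ["pandas", "seaborn", "plotly"]:
    status = "found" if is_installed(name) else "missing: install it first"
    print(f"{name}: {status}")
```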

Installing modules for Anaconda users

If you installed the Anaconda distribution to set up Spyder (and you haven’t manually changed environments - you’d know if you did), then use the following command to install the module

conda install some_module

Installing modules with pip

If you aren’t using Anaconda then you can use the pip package to install the module. You can do so with

pip install some_module

You’ve probably forgotten the exclamation mark: make sure to use it if you’re in Spyder

!quarto render document.qmd

Try running

!quarto help

If this returns a list of commands

Then something’s wrong with your render command. Double check it’s the same as here, and make sure nothing else is on the line.

If this returns the same error

Then you won’t be able to use Spyder, and will need to use a command prompt instead. See the “Not using Spyder” note below.

Something’s wrong with your Quarto installation. Check that you’ve installed it correctly and ask for help from a trainer.

You might have to install jupyter. See options for different package managers in the Quarto documentation about Python. If you’re running into this issue and using Anaconda, please ask for assistance.

Sometimes Quarto is a bit slow, particularly if you’ve got a lot to render. If you’ve waited for over one minute, and there’s a red square in the console’s top right, then ask a trainer for help. To troubleshoot, you can try the following steps:

  1. Cancel the operation (press the red square)
  2. Run !quarto help. This should be quick - if it’s not, then you might need to use a terminal; see the “Not using Spyder” note below.

Not using Spyder

If you’re using a different IDE, you might not be able to use !quarto ....

If you’re using Jupyter notebooks

Then you probably can. Try running

!quarto render document.qmd

and seeing if an error arises.

If you’re using an alternative or have encountered issues within Spyder

You’ll need to use a command prompt to run the Quarto commands. Give it a go yourself, and then ask a trainer for assistance if you’re having trouble.

  1. Open a command prompt / shell on your device.
  2. Navigate to your project folder with the command cd to change directories:
cd path/to/your/project/

You can find the path inside Spyder - it’s the address in the top right.

  3. Use the render command without the exclamation mark:
quarto render document.qmd

Once it’s finished, you should see a new file in your project folder: “document.html”. Open it in your browser.

Tip

Right click on the file and press “Open externally” within Spyder.
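If you prefer to open the output from the console, the sketch below works out the expected HTML file name and opens it in your default browser. It assumes the HTML file is written next to the .qmd file with the same name, as in this example; the helper names are illustrative.

```python
from pathlib import Path
import webbrowser


def html_output_for(qmd_file):
    """Return the path of the HTML file that sits next to a rendered .qmd file."""
    return Path(qmd_file).with_suffix(".html")


def open_if_rendered(qmd_file):
    """Open the rendered HTML in the default browser; return False if it is missing."""
    output = html_output_for(qmd_file)
    if not output.is_file():
        return False
    webbrowser.open(output.resolve().as_uri())
    return True


print(html_output_for("document.qmd"))  # document.html
```

Calling open_if_rendered("document.qmd") after a successful render should open the page; a return value of False suggests the render did not produce the file where expected.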

Including Python

We can include Python chunks for Quarto to execute before rendering our file. Returning to the .qmd file, we fence code chunks with three backticks followed by {python}:

```{python}
print("This code will be executed")
```

Placing this chunk into our file and then rendering again with !quarto render document.qmd, you should see the code block and its output in the document, like this

print("This code will be executed")
This code will be executed

Building an example report

We’ll now use Python and Markdown to develop a report with Quarto. Let’s start by bringing in some of the work we did yesterday.

We’ll start with a short introductory message, then import the packages and data:

## Import the data

Let's import the packages and data

```{python}
import pandas as pd
import seaborn as sns

df = pd.read_csv("Players2024.csv")
```

We can then follow by plotting the data:

## Plot the data

Let's generate that **swarm plot** again:

```{python}
sns.catplot(data = df, x = "positions", y = "height_cm")
```

We now have a plot!

Cell options

As the default Quarto output is an HTML file, we can include interactive visualisations too.

Let’s say we also want to let our readers know that they need to install Plotly in order to create interactive visualisations. If you want to show the corresponding code in your document but don’t want to run it, you can add the cell option #| eval: false. (And if you want to show the output but not show the underlying code, use #| echo: false.)

## Interactive plots

You will need the plotly module:

```{python}
#| eval: false
conda install plotly
```

An interactive plot:

```{python}
#| echo: false
import plotly.express as px
px.scatter(data_frame = df, x = "age", y = "height_cm")
```

And for adding a caption and alternative text to a figure:

```{python}
#| fig-cap: "Goalkeepers tend to be taller."
#| fig-alt: "A scatterplot of the relationship between height and position."
import seaborn as sns
sns.catplot(data = df, x = "positions", y = "height_cm")
```

Many more cell options exist, including captioning and formatting visualisations. Note that these options can be used at the cell level as well as globally (by modifying the front matter at the top of the document).

For example, to make sure error and warning messages are never shown:

---
title: Reproducible Output
author: Your Name
date: 2025-01-01
warning: false
error: false
---
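One way to picture how document-level and cell-level options interact is as two dictionaries, where a value set on the cell replaces the document default. The sketch below only illustrates that precedence under this assumption; it is not Quarto's implementation, and the option values are hypothetical.

```python
def effective_options(document_options, cell_options):
    """Combine document-level defaults with one cell's options; the cell wins."""
    return {**document_options, **cell_options}


# Defaults set in the front matter (hypothetical values for illustration).
document_options = {"echo": True, "warning": False, "error": False}

# Options set with "#|" lines at the top of one cell.
cell_options = {"echo": False}

print(effective_options(document_options, cell_options))
# {'echo': False, 'warning': False, 'error': False}
```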

Output formats

The default output format in Quarto is HTML, which is by far the most flexible. However, Quarto is a very versatile publishing system and can generate many different output formats, including PDF, DOCX and ODT, slide formats, Markdown suited for GitHub… and even whole blogs, books and dashboards.

Let’s try rendering a PDF by including the format: pdf option at the top of the file:

---
title: Reproducible Output
author: Your Name
date: 2025-01-01
format: pdf
---

When rendering PDFs, the first issue we might run into is the lack of a LaTeX distribution. If Quarto doesn’t detect one, it will suggest installing TinyTeX (a minimal LaTeX distribution) with:

!quarto install tinytex

Once that is installed, Quarto should render a PDF.
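The format can also be chosen at render time with the --to option of quarto render, without editing the header. The sketch below builds the corresponding commands for a few formats; render_command is an illustrative helper, and the commands are only printed, not run.

```python
def render_command(qmd_file, output_format=None):
    """Build the argument list for "quarto render", optionally forcing a format."""
    command = ["quarto", "render", qmd_file]
    if output_format is not None:
        command += ["--to", output_format]
    return command


for fmt in ["html", "pdf", "docx"]:
    # Join with spaces to get the line to type in a terminal
    # (prefix it with "!" in the Spyder console).
    print(" ".join(render_command("document.qmd", fmt)))
```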

Another issue with our example document is that an interactive HTML visualisation won’t be rendered in the PDF. You can suppress it by using the #| eval: false option:

```{python}
#| eval: false
import plotly.express as px
px.scatter(data_frame = df, x = "age", y = "height_cm")
```

Git and GitHub

Git is a version control system that allows you to record a clean history of your project, track precise authorship, and collaborate asynchronously with others. It can be used offline, from the command line or through integration into Integrated Development Environments (like RStudio or VS Code; unfortunately, Spyder does not have Git integration).

GitHub is one of many websites that allow you to host projects that are tracked with Git. But even without using Git at all, it is possible to use GitHub to share and make your project public. Many researchers use it to make their code public alongside a published paper, to increase reproducibility and transparency. It can also be useful to build and share a portfolio of your work.

Learning about the ins and outs of Git takes time, so in this section we will mainly use GitHub as a place to upload and share your code and outputs, and as a starting point for learning more about Git in the future.

GitHub

GitHub is currently the most popular place for hosting, sharing and collaborating on code. You can create an account for free, and then create a repository for your project.

  1. Create an account and log in
  2. Click on the “+” button (top right of the page)
  3. Select “New repository”
  4. Choose a name and description for your repository
  5. Tick “Add a README file” - this will be where you introduce your project
  6. Click “Create repository”

From there, you can upload your files, and edit text-based files straight from your web browser if you need to.

The README file is a markdown file that can contain the most important information about your project. It’s important to populate it as it is the first document most people see. It could contain:

  • Name and description of the project
  • How to use it (installation process if any, examples…)
  • Who is the author, who maintains it
  • How to contribute

For inspiration, see the pandas README file.
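If you would like a starting point, the sketch below generates a README skeleton in Markdown with one heading per item suggested above. The section titles and the project name are placeholders to adapt, and readme_skeleton is an illustrative helper.

```python
def readme_skeleton(project_name, description):
    """Return Markdown text with one heading per suggested README section."""
    sections = ["How to use it", "Authors and maintainers", "How to contribute"]
    lines = [f"# {project_name}", "", description, ""]
    for section in sections:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)


print(readme_skeleton("my-portfolio", "A short description of the project."))
```

You can paste the printed text into the README editor on GitHub and replace each TODO.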

To practice managing a Git repository on GitHub, try creating a personal portfolio repository where you can showcase what you have worked on and the outputs you are most proud of.

Further resources