Animated Visualizations in Python using Matplotlib

Visualizations are a great way to convey insights from data. Animated visualizations are, in suitable cases, more engaging and better at storytelling than their static counterparts. This article is a walk-through of animating cumulative confirmed COVID-19 cases in each country per day. The final result looks something like this:

This animation shows the cumulative number of confirmed COVID-19 cases from 22 Jan 2020 to 10 May 2020 for the ten countries with the most confirmed cases on each day. After this walk-through, we’ll end up with boilerplate code that can be used to animate other types of visualizations (say, histograms or pie charts) and different data sets.

Getting Data

For this visualization, I chose this dataset (covid_19_clean_complete.csv) on COVID-19 cases, which is already extracted and cleaned. It is maintained by an individual who scrapes various official sources and puts the data together in a consistent form. It should not be considered as accurate as the official sources, but for our animation it’ll suffice.

Importing necessary libraries

We’ll be using the following libraries for formatting the data and plotting.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
%matplotlib widget

%matplotlib is an IPython magic function that sets the plotting backend matplotlib uses. Animation support depends on the backend, so make sure to use one that supports animation. Usually, that means %matplotlib notebook in Jupyter Notebook and %matplotlib widget in JupyterLab.
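If you are unsure which backend is active, matplotlib can report it. The snippet below is a small sketch that only relies on matplotlib.get_backend(); the comment about ipympl reflects my understanding that %matplotlib widget depends on that separate package, so treat it as an assumption about your setup.

```python
import matplotlib

# Name of the backend matplotlib is currently configured to use.
# In a notebook, run this after the %matplotlib magic.
# Assumption: %matplotlib widget needs the separate ipympl package installed.
backend = matplotlib.get_backend()
print(backend)
```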

Structure of data

Reading the .csv file into a pandas DataFrame using pd.read_csv and printing the first five rows with head() gives us:

pd.read_csv('covid_19_clean_complete.csv',parse_dates=['Date']).head()

   Province/State  Country/Region       Lat     Long        Date  Confirmed  Deaths  Recovered
0             NaN     Afghanistan   33.0000  65.0000  2020-01-22          0       0          0
1             NaN         Albania   41.1533  20.1683  2020-01-22          0       0          0
2             NaN         Algeria   28.0339   1.6596  2020-01-22          0       0          0
3             NaN         Andorra   42.5063   1.5218  2020-01-22          0       0          0
4             NaN          Angola  -11.2027  17.8739  2020-01-22          0       0          0

Each row holds the cumulative number of Confirmed, Recovered, and Deaths on a particular Date for a given Province/State and Country/Region.
We are interested in the top 10 countries by Confirmed cases on each date. We can use pandas’ groupby and agg functions to get the total confirmed cases per day in each country, and then nlargest to keep the top 10 per date:

# Read the dataset
covid_df = pd.read_csv('covid_19_clean_complete.csv', parse_dates=['Date'])
# Total confirmed cases by date and country (sums over provinces/states)
covid_df = covid_df.groupby(['Date', 'Country/Region']).agg({'Confirmed': 'sum'})
# Get the top 10 countries based on confirmed cases per date;
# nlargest prepends the group key, so drop the duplicated Date level
covid_df = covid_df.groupby('Date')['Confirmed'].nlargest(10).reset_index(level=1, drop=True)

covid_df is now a Series with a two-level index (Date and Country/Region). Getting the first 20 entries using covid_df.head(20) gives:

Date        Country/Region
2020-01-22  China             548
            Japan               2
            Thailand            2
            South Korea         1
            Taiwan*             1
            US                  1
            Afghanistan         0
            Albania             0
            Algeria             0
            Andorra             0
2020-01-23  China             643
            Thailand            3
            Japan               2
            Vietnam             2
            Singapore           1
            South Korea         1
            Taiwan*             1
            US                  1
            Afghanistan         0
            Albania             0
Name: Confirmed, dtype: int64

This is the data we need for the visualization. With the data done, let’s move on to plotting.
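Before plotting, it may help to see the two groupby steps on a tiny table. The sketch below uses made-up numbers and placeholder country names (A, B, C), and keeps the top 2 instead of the top 10 so the result is easy to check by hand.

```python
import pandas as pd

# Made-up data: country A is split into two provinces on each date
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-22'] * 4 + ['2020-01-23'] * 4),
    'Country/Region': ['A', 'A', 'B', 'C'] * 2,
    'Confirmed': [5, 3, 2, 1, 6, 4, 9, 1],
})

# Step 1: sum the provinces so there is one row per (date, country)
totals = toy.groupby(['Date', 'Country/Region']).agg({'Confirmed': 'sum'})

# Step 2: keep the 2 largest countries per date, then drop the extra
# Date level that nlargest adds in front of the existing index
top2 = totals.groupby('Date')['Confirmed'].nlargest(2).reset_index(level=1, drop=True)
print(top2)
```

On 22 Jan, A totals 8 (5 + 3) and B totals 2, so those two are kept and C is dropped; the same happens on 23 Jan with A at 10 and B at 9.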

Visualization

As seen in the animation in the introduction, we’re planning to plot a horizontal bar chart of the top ten countries based on cumulative confirmed COVID-19 cases. First we’ll plot the data for a single day, then extend it into an animation showing the change on each successive day.

Choosing a color for a country

Since we have several countries and the bars will be changing and switching positions as the animation progresses, it makes sense to give each country a particular color to be used while plotting the bar chart. We can use matplotlib’s colormap as follows for that:

# Country is the second-level index of covid_df
countries = np.array([country[1] for country in covid_df.index])
countries = np.unique(countries)  # Countries can be repeated on different dates
cmap = plt.get_cmap('tab20')
colors = cmap(np.linspace(0, 1, len(countries)))
color_dict = dict(zip(countries, colors))

Matplotlib has several predefined colormaps, a list of which can be found here; I chose to go with tab20. After running the above code, color_dict is a Python dictionary that maps each country’s name in covid_df to a color. We will use it later when plotting the bar charts.
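One thing to keep in mind: tab20 is a qualitative colormap with 20 distinct colors, so once there are more than 20 countries, some of them will share a color. That is usually tolerable here because only ten bars are visible at a time, but two of them can end up looking alike. The sketch below uses made-up country names to show the effect.

```python
import numpy as np
import matplotlib.pyplot as plt

cmap = plt.get_cmap('tab20')
print(cmap.N)  # number of distinct colors in the colormap

# 40 hypothetical country names, more than tab20 has colors
names = [f'country_{i}' for i in range(40)]
colors = cmap(np.linspace(0, 1, len(names)))  # one RGBA row per name

# Count how many different colors were actually handed out
distinct = np.unique(colors, axis=0)
print(colors.shape, len(distinct))
```

If distinct colors matter for your data, a colormap with more entries or a hand-picked palette for the countries you care about would be the thing to try.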

Plotting a bar chart

Let’s plot a horizontal bar chart for the date 4/22/20. We’ll be using pyplot’s barh function.

plt.figure()
date = pd.to_datetime('4/22/20',format="%m/%d/%y")
xvals = covid_df.loc[date].index
data = covid_df.loc[date].values
bars = plt.barh(xvals,data,color=[ color_dict[country] for country in xvals])
plt.suptitle('Cumulative Confirmed Covid-19 Cases')
plt.title(date.strftime("%d %b %Y"))
ax = plt.gca()
ax.invert_yaxis()

The first two positional arguments of barh are the y-values (in our case, the top 10 countries, stored in xvals) and the corresponding bar widths (the number of confirmed cases). The color of each country (from color_dict) is passed as a list to the color argument. Calling invert_yaxis puts the country with the most cases at the top.

Output of the above code snippet
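To see the argument order in isolation, here is a small sketch on made-up values. It uses the object-oriented fig, ax = plt.subplots() style rather than the pyplot calls above, which is my choice for the example and not a requirement.

```python
import matplotlib.pyplot as plt

# Made-up values, already sorted from largest to smallest like nlargest output
names = ['A', 'B', 'C']
cases = [30, 20, 10]

fig, ax = plt.subplots()
bars = ax.barh(names, cases)  # first argument: y positions, second: bar widths
ax.invert_yaxis()             # the first entry ('A') moves to the top

for name, bar in zip(names, bars):
    print(name, bar.get_width())
```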

Improving our bar chart

Before moving on to animating the visualization, let’s make a few more improvements. The improvements we’ll make are: removing the borders, removing the tick marks, and showing the values beside the bars themselves rather than on the x-axis below.

# Removing borders
for spine in ax.spines.values():
    spine.set_visible(False)
# Removing Tickmarks and values in X-axis
plt.tick_params(left=False, bottom=False, labelbottom=False)
# Labelling The bars directly
for bar in bars:
    ax.text(bar.get_width(), bar.get_y() + bar.get_height()/2, '  ' + str(bar.get_width()), va='center')

After this, the chart looks like this:
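On matplotlib 3.4 or newer, the labelling loop can also be written with Axes.bar_label, which adds one text label per bar. This is an alternative sketch on made-up numbers, not what the code above uses; fmt='%d' and padding=3 are just values I picked.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.barh(['A', 'B', 'C'], [548, 2, 1])  # made-up values
ax.invert_yaxis()

# One label per bar, placed at the end of the bar; padding is in points
labels = ax.bar_label(bars, fmt='%d', padding=3)
print([label.get_text() for label in labels])
```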

Animating the plot

For animating the plot, we will use matplotlib’s FuncAnimation class. FuncAnimation animates by repeatedly calling an update function. We will write the update function so that it renders the bar chart for the day at the index given to it. We’ll be reusing the code for the single bar chart from the previous section.
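Stripped of the COVID-19 specifics, the mechanism can be sketched as follows. The values list is made up, and the names values, update and anim are only for illustration.

```python
import matplotlib.pyplot as plt
import matplotlib.animation as animation

values = [1, 3, 6, 10, 15]  # made-up cumulative counts, one per frame

fig, ax = plt.subplots()

def update(frame):
    # frame is the integer FuncAnimation passes in: 0, 1, ..., len(values) - 1
    ax.clear()
    ax.barh(['total'], [values[frame]])
    ax.set_xlim(0, max(values))
    ax.set_title(f'frame {frame}')

# Keep a reference to the animation object, otherwise it may be garbage collected
anim = animation.FuncAnimation(fig, update, frames=len(values),
                               interval=200, repeat=False)
```

Each call clears the axes and redraws everything for the given frame. That is simple rather than fast, but for a handful of bars it is fast enough.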

# Each unique date we have in covid_df
dates = np.sort(np.unique(covid_df.index.get_level_values(level=0)))
n = len(dates)

def update(curr):
    if curr == n - 1:
        # Last date we have in covid_df (frames run from 0 to n - 1)
        a.event_source.stop()
    plt.cla()
    date = dates[curr]
    xvals = covid_df.loc[date].index
    data = covid_df.loc[date].values
    bars = plt.barh(xvals,data,color=[ color_dict[country] for country in xvals])
    plt.suptitle('Cumulative Confirmed Covid-19 Cases')
    plt.title(pd.to_datetime(date).strftime("%d %b %Y"))
    ax = plt.gca()
    ax.invert_yaxis()
    # Removing borders
    for spine in ax.spines.values():
        spine.set_visible(False)
    # Removing Tickmarks and values in X-axis
    plt.tick_params(left=False, bottom=False, labelbottom=False)
    # Labelling The bars directly
    for bar in bars:
        ax.text(bar.get_width(), bar.get_y() + bar.get_height()/2, '  ' + str(bar.get_width()), va='center')
fig = plt.figure(figsize=[11,5])  # Figure size (width, height) in inches
a = animation.FuncAnimation(fig, update, interval=100, frames=n,repeat=False)

Each date in covid_df is stored in the sorted dates array. FuncAnimation repeatedly calls our update function with the frame number as its parameter (curr), which goes from 0 to n - 1. On each call, update clears the axes and draws the bar chart for dates[curr], so successive calls step through the dates. interval is the delay between calls in milliseconds. That gives our output:

We can use a.save('filename.mp4') to save the animation as a video; more on that is available here. The update function above can be treated as boilerplate for animating different types of plots: replace barh with any other suitable plotting function. View the complete code in this Jupyter Notebook or download it. It is also possible to create simple interactive visualizations with sliders and buttons using matplotlib, but I’ll save that for another post.
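A note on saving: writing an .mp4 generally relies on ffmpeg being installed, which is separate from matplotlib. If it is not available, a GIF written with PillowWriter is one alternative. The sketch below uses a throwaway animation and a hypothetical file name (demo.gif in the system temp directory); fps=5 is an arbitrary choice.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import matplotlib.animation as animation

fig, ax = plt.subplots()

def update(frame):
    # Throwaway frame: a single bar that grows by one each frame
    ax.clear()
    ax.barh(['total'], [frame + 1])
    ax.set_xlim(0, 5)

anim = animation.FuncAnimation(fig, update, frames=5, repeat=False)

# True only if matplotlib can find an ffmpeg executable
print(animation.writers.is_available('ffmpeg'))

# GIF output goes through Pillow instead of ffmpeg
out_path = os.path.join(tempfile.gettempdir(), 'demo.gif')
anim.save(out_path, writer=animation.PillowWriter(fps=5))
```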

