The aim of this blog post is to show the evolution of COVID numbers in a way that captures the global behaviour of the epidemic and lets us answer key questions like:

  • Are the current measures working?
  • Is the epidemic accelerating or slowing down?

Far from sentionalist headlines or opinions dismissing SARS-CoV-2 as just another flu, we try to extract reliable patterns from the sometimes erratic data which comes of stretched health and testing systems.

This project was started in April 2020, analysing the data for France as part of data against covid19 citizen's initiative. I aim to keep updating the data in this article every week along with the commentary of the dynamic of the epidemic.

import pandas as pd
import numpy as np
from uk_covid19 import Cov19API
import datetime
from matplotlib import pyplot as plt
from IPython.display import display, Markdown

You can find and inspect the code used for visualisation at the following address: payoto/covid19-viz.

import viz

Updates

06/01/2020

All indicators show the prevalence of each state of the disease is increasing at 5% per day (doubling time of 2 weeks).

Data access

For this analysis, the government's API for COVID-19 data is used to access data, as installed from pypi.

all_nations = [
    "areaType=nation"
]
cases_and_deaths = {
    "date": "date",
    "maille_code": "areaName",
    "cas_confirmes": "cumCasesByPublishDate",
    # prefer publish date for deaths as the reporting lag gives a false impression of improvement
    "deces":"cumDeaths28DaysByPublishDate",  
    "reanimation": "covidOccupiedMVBeds",
    "hospitalises": "hospitalCases"
}

api = Cov19API(
    filters=all_nations,
    structure={c : c for c in cases_and_deaths.values()}
)
df = api.get_dataframe()
df = df.rename(
    {v : k for k, v in cases_and_deaths.items()},
    axis="columns"
)
df = df.iloc[::-1]

Our analysis is based on my previous visualisation of French data, in that respect we get it in the same format.

df = viz.enable_time_series_plot(
    df, timein_field="date", timeseries_field_out="t"
)
df["source_nom"] = "PHE"
df["maille_nom"] = df["maille_code"]

Latest data for England

In this section we look at the processed data for England. In the first plot on the left the raw data is presented, then the data smoothed over is smoothed using a moving average over 7 days. On the right the number on the vertical axis is the daily change as a proportion of the total.

maille_active = 'England'
data = viz.oc19_data_preproc(
    df, maille_active,
    rows=["t", "deces", "reanimation", "hospitalises", "cas_confirmes"],
    no_negatives=["deces"],
)
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
data.tail()
deces reanimation hospitalises cas_confirmes reanimation_cumul hospitalises_cumul deces_jour deces_jour_jour reanimation_jour reanimation_jour_jour ... reanimation_solde_vivant_jour_jour_mma deces_jour_mma_jour deces_jour_prop reanimation_jour_prop hospitalises_jour_prop cas_confirmes_jour_prop reanimation_cumul_jour_prop hospitalises_cumul_jour_prop reanimation_solde_vivant_jour_prop deces_jour_mma_jour_prop
t
2021-01-02 65080.0 2017.0 23557.0 2239950.0 221404.0 2145654.0 349.0 -264.0 77.0 36.0 ... 27.142857 24.857143 0.007966 0.036759 0.031577 0.019283 0.008153 0.010188 0.293789 0.047947
2021-01-03 65472.0 2181.0 24957.0 2286803.0 223585.0 2170611.0 392.0 43.0 164.0 87.0 ... 36.000000 20.857143 0.008237 0.040938 0.032513 0.020214 0.008472 0.010445 0.288203 0.038675
2021-01-04 65842.0 2310.0 26626.0 2339983.0 225895.0 2197237.0 370.0 -22.0 129.0 -35.0 ... 13.142857 6.857143 0.008295 0.041373 0.033265 0.020773 0.008809 0.010721 0.277798 0.012556
2021-01-05 66626.0 2378.0 26467.0 2394923.0 228273.0 2223704.0 784.0 414.0 68.0 -61.0 ... 57.857143 60.571429 0.009106 0.039048 0.025261 0.020760 0.009124 0.010894 0.294185 0.099835
2021-01-06 67510.0 2378.0 26467.0 2450983.0 230651.0 2250171.0 884.0 100.0 0.0 -68.0 ... -23.000000 -5.000000 0.008913 0.031479 0.020262 0.021014 0.009354 0.011005 0.284513 -0.008310

5 rows × 43 columns

def plots_maille_code(maille_active='England', **kwargs):
    fra = viz.oc19_data_preproc(
        df, maille_active,
        rows=["t", "deces", "reanimation", "hospitalises", "cas_confirmes"],
        no_negatives=["deces"],
    )
    plt.close()
    # plot_field_loops(fra, "deces_ehpad", center=False, maille_active=maille_active)
    viz.plot_field_loops(fra, "hospitalises_cumul", [7], center=True, maille_active=maille_active, **kwargs)
    viz.plot_field_loops(fra, "reanimation_cumul", [7], center=True, maille_active=maille_active, **kwargs)
    viz.plot_field_loops(fra, "deces", center=True, maille_active=maille_active, **kwargs)
    # if maille_active == "FRA":
    plt.show()
    display(Markdown(
        f"For {maille_active} the number of cases can be analysed. Contrary to to other quantities"
        + " the number of cases is avareged over 14 days with a triangular window."
    ))
    viz.plot_field_loops(
        fra, "cas_confirmes", [14], center=True, maille_active=maille_active,
        win_type='triang', **kwargs
    )
    return fra

The next analysis we look at are trend lines which indicate the acceleration of the disease: the daily increase of a disease state (hospitalised, ICU, deceased or number of positive test) is plotted against their total number. Numbers above 0 indicate the SARS-CoV-2 is progressing, while numbers below indicate a reduction.

The raw data is smoothed with a rolling average of 7 days as most indicators have strong weekly periodicity.

These plots evolve by starting from the origin and go towards the top right of the graph. As the disease is brought under controll the lines curve towards the bottom until they cross the x axis and march to the left. Each wave of the disease manifests itself as a loop on the graph.

_ = plots_maille_code(maille_active='England')
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

For England the number of cases can be analysed. Contrary to to other quantities the number of cases is avareged over 14 days with a triangular window.

<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
_ = plots_maille_code(maille_active='Wales')
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

For Wales the number of cases can be analysed. Contrary to to other quantities the number of cases is avareged over 14 days with a triangular window.

<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
_ = plots_maille_code(maille_active='Scotland')
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

For Scotland the number of cases can be analysed. Contrary to to other quantities the number of cases is avareged over 14 days with a triangular window.

<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

Rough estimate of R

Using the number of cases it is possible to arrive at a rough estimate of the reproduction number of the disease the now famous: $R_t$.

The calculation is fairly simple we first calculate the number of infectious people on a given day by saying that people are contagious if they have tested positive in the last 24h or if they will test positive in the next 48h. This number of infectious people is compared to the number of people that will be infected in 5 days.

eng, _ = viz.oc19_data_preparation(
    df, "England",
    rows=["t", "deces", "reanimation", "hospitalises", "cas_confirmes"],
    no_negatives=["deces"],
)

eng = eng.loc["2020-07-15" < eng.index, :]

viz.plot_R(
    eng,
    infectious_period_around_test = [-1, 2],
    infect_to_test_base = -5,
)
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

The period up to octover shows a very high reproduction rate, this is certainly due to a ramping up of the test capacity.

Looking in more detail at the period since the start of October we see a brief dip below 1 during the second lockdown.

viz.plot_R(
    eng.loc["2020-10-15" < eng.index, :],
    infectious_period_around_test = [-1, 2],
    infect_to_test_base = -5,
)
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

At the moment these are draft indicators and the exact offset of the dates has not been calculated.