An update on the recent second (or third?) wave of COVID-19 in Indonesia
It has been a while since I last wrote about COVID-19. Today I’d like to look at Indonesia’s COVID-19 statistics, especially since the situation seems to be getting worse these days, unfortunately, and many people have been talking about the possibility that the government is intentionally undertesting to push down new cases at the cost of human lives.
The daily increase in Covid-19 cases is trending downward. This is happening alongside a significant drop in the number of tests. It is still too early to conclude that the Covid-19 wave is under control. #Humaniora #AdadiKompas @aik_arif https://t.co/eYvloMmpIC
— Harian Kompas (@hariankompas) July 21, 2021
I rely heavily on Our World in Data [1], which gives free access to COVID-19 data.
Grab the data and show the top 6 rows
import pandas as pd
import seaborn as sns # used for the plots below
import matplotlib.pyplot as plt # used for the plots below
url='https://covid.ourworldindata.org/data/owid-covid-data.csv' # save the url
df=pd.read_csv(url, parse_dates=['date']) # download from the url; parse_dates turns the date column into a datetime type
df.head(6) # show the top 6 rows
| | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index | excess_mortality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AFG | Asia | Afghanistan | 2020-02-24 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | ... | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 | NaN |
1 | AFG | Asia | Afghanistan | 2020-02-25 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 | NaN |
2 | AFG | Asia | Afghanistan | 2020-02-26 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 | NaN |
3 | AFG | Asia | Afghanistan | 2020-02-27 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 | NaN |
4 | AFG | Asia | Afghanistan | 2020-02-28 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 | NaN |
5 | AFG | Asia | Afghanistan | 2020-02-29 | 1.0 | 0.0 | 0.143 | NaN | NaN | 0.0 | ... | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 | NaN |
6 rows × 60 columns
I am not super familiar with its variables, so let’s check them out with df.columns.
df.columns # list the variable (column) names
Index(['iso_code', 'continent', 'location', 'date', 'total_cases', 'new_cases',
'new_cases_smoothed', 'total_deaths', 'new_deaths',
'new_deaths_smoothed', 'total_cases_per_million',
'new_cases_per_million', 'new_cases_smoothed_per_million',
'total_deaths_per_million', 'new_deaths_per_million',
'new_deaths_smoothed_per_million', 'reproduction_rate', 'icu_patients',
'icu_patients_per_million', 'hosp_patients',
'hosp_patients_per_million', 'weekly_icu_admissions',
'weekly_icu_admissions_per_million', 'weekly_hosp_admissions',
'weekly_hosp_admissions_per_million', 'new_tests', 'total_tests',
'total_tests_per_thousand', 'new_tests_per_thousand',
'new_tests_smoothed', 'new_tests_smoothed_per_thousand',
'positive_rate', 'tests_per_case', 'tests_units', 'total_vaccinations',
'people_vaccinated', 'people_fully_vaccinated', 'new_vaccinations',
'new_vaccinations_smoothed', 'total_vaccinations_per_hundred',
'people_vaccinated_per_hundred', 'people_fully_vaccinated_per_hundred',
'new_vaccinations_smoothed_per_million', 'stringency_index',
'population', 'population_density', 'median_age', 'aged_65_older',
'aged_70_older', 'gdp_per_capita', 'extreme_poverty',
'cardiovasc_death_rate', 'diabetes_prevalence', 'female_smokers',
'male_smokers', 'handwashing_facilities', 'hospital_beds_per_thousand',
'life_expectancy', 'human_development_index', 'excess_mortality'],
dtype='object')
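With 60 columns, it can help to pull out only the names that match a keyword. Here's a minimal sketch using pandas’ own DataFrame.filter on a small stand-in frame (the empty demo frame below is hypothetical; on the real data you would call df.filter(like="hosp") directly):

```python
import pandas as pd

# A small stand-in for the OWID frame: a few of the same column names, no rows
cols = ["new_cases", "new_cases_smoothed", "hosp_patients",
        "weekly_hosp_admissions", "new_tests", "positive_rate"]
demo = pd.DataFrame(columns=cols)

# filter(like=...) keeps the columns whose name contains the substring
print(demo.filter(like="hosp").columns.tolist())
# ['hosp_patients', 'weekly_hosp_admissions']

# regex=... allows more specific patterns, e.g. every column ending in "_smoothed"
print(demo.filter(regex="_smoothed$").columns.tolist())
# ['new_cases_smoothed']
```

Note that like= is case-sensitive and matches anywhere in the name.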
There’s a huge chunk of variable names! It must have been super hard work collecting all that data. Shout-out to Hannah Ritchie et al.
Alright, now let’s check new cases! New cases tend to be volatile, especially if the data itself has seasonality. It is quite common to see seasonality in daily data simply because of weekends. Thankfully, there’s new_cases_smoothed, which I imagine accounts for that seasonality by taking a 7-day rolling average. I only use the Indonesian data for this post.
indo=df[["iso_code","date","new_cases","new_cases_smoothed"]].query('iso_code == "IDN"').copy() # Indonesia only; .copy() avoids a SettingWithCopyWarning when we add a column later
Plot time!
sns.lineplot(data=indo,x='date',y='new_cases')
sns.lineplot(data=indo,x='date',y='new_cases_smoothed')
plt.xticks(rotation=45)
[Figure: new_cases and new_cases_smoothed for Indonesia, by date]
I tried to make my own 7-day rolling average by copying code from here:
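# trailing 7-day mean of new_cases, shifted back 3 days so each value sits at the centre of its window
indo['cases_7day_ave'] = indo.new_cases.rolling(7).mean().shift(-3)
indo.head(10) # show the first 10 rows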
| | iso_code | date | new_cases | new_cases_smoothed | cases_7day_ave |
---|---|---|---|---|---|
44074 | IDN | 2020-03-02 | 2.0 | NaN | NaN |
44075 | IDN | 2020-03-03 | 0.0 | NaN | NaN |
44076 | IDN | 2020-03-04 | 0.0 | NaN | NaN |
44077 | IDN | 2020-03-05 | 0.0 | NaN | 0.857143 |
44078 | IDN | 2020-03-06 | 2.0 | NaN | 2.428571 |
44079 | IDN | 2020-03-07 | 0.0 | 0.571 | 3.571429 |
44080 | IDN | 2020-03-08 | 2.0 | 0.857 | 4.571429 |
44081 | IDN | 2020-03-09 | 13.0 | 2.429 | 4.571429 |
44082 | IDN | 2020-03-10 | 8.0 | 3.571 | 9.285714 |
44083 | IDN | 2020-03-11 | 7.0 | 4.571 | 13.142857 |
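This confirms that new_cases_smoothed is indeed a 7-day rolling average. It is a trailing one, though: because of .shift(-3), my version is centred, so the same values show up three days earlier in cases_7day_ave (e.g. 0.857 on 2020-03-05 versus 2020-03-08 in new_cases_smoothed).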
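The three-day offset between the two columns is exactly the difference between a trailing and a centred window. A minimal sketch using the first ten Indonesian new_cases values from the table above; rolling(7, center=True) is pandas’ built-in way to centre the window, which should give the same result as rolling(7).mean().shift(-3):

```python
import pandas as pd

# First ten Indonesian new_cases values from the table above (2020-03-02 to 2020-03-11)
new_cases = pd.Series([2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 13.0, 8.0, 7.0],
                      index=pd.date_range("2020-03-02", periods=10, freq="D"))

trailing = new_cases.rolling(7).mean()               # window: day t-6 .. t
centered = new_cases.rolling(7, center=True).mean()  # window: day t-3 .. t+3
shifted = new_cases.rolling(7).mean().shift(-3)      # the version used above

print(centered.equals(shifted))                        # True: shift(-3) just centres the window
print(round(trailing.loc[pd.Timestamp("2020-03-08")], 3))  # 0.857
print(round(centered.loc[pd.Timestamp("2020-03-05")], 3))  # 0.857, same window labelled 3 days earlier
```

A trailing average only uses past data, so it is what you get on the latest day; a centred one lines up better with the actual peak but leaves the last 3 days empty.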
sns.lineplot(data=indo,x='date',y='new_cases')
sns.lineplot(data=indo,x='date',y='new_cases_smoothed')
sns.lineplot(data=indo,x='date',y='cases_7day_ave')
plt.xticks(rotation=45)
plt.legend(['new cases','new cases smoothed','my own 7-day average'])
plt.ylabel('cases')
plt.xlabel('date')
[Figure: new cases, new cases smoothed and my own 7-day average for Indonesia, by date]
A year and a half is a bit too long (dear God, has it already been a year and a half??), so let’s cut it to just 2021.
indo2=indo.query('date >= "2021-01-01"') # keep only 1 January 2021 onwards
# then plot exactly as above
sns.lineplot(data=indo2,x='date',y='new_cases')
sns.lineplot(data=indo2,x='date',y='new_cases_smoothed')
plt.xticks(rotation=45)
plt.legend(['new cases','new cases smoothed']) # only two lines are plotted here
plt.ylabel('cases')
plt.xlabel('date')
[Figure: new cases and new cases smoothed for Indonesia, 2021]
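Filtering a datetime column by date works most predictably when the cutoff is itself a date, written as a quoted ISO string inside query() or as a Timestamp in a boolean mask. A minimal sketch on a hypothetical six-day frame standing in for indo:

```python
import pandas as pd

# Hypothetical frame standing in for `indo`: six consecutive days around New Year
demo = pd.DataFrame({"date": pd.date_range("2020-12-29", periods=6, freq="D"),
                     "new_cases": range(6)})

# Inside query(), a quoted ISO date string is compared against the datetime column
only_2021 = demo.query('date >= "2021-01-01"')

# The same filter written as a boolean mask with an explicit Timestamp
same = demo[demo["date"] >= pd.Timestamp("2021-01-01")]

print(only_2021["date"].dt.strftime("%Y-%m-%d").tolist())
# ['2021-01-01', '2021-01-02', '2021-01-03']
print(only_2021.equals(same))  # True
```

Using >= rather than > keeps 1 January itself in the result.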
Cases do indeed seem to go down, even in the smoothed series. But is this because of undertesting? We can check that with our dataset too, by looking at new tests and adding the positive rate to really make sure.
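The positive rate is roughly the share of tests that come back positive (new cases divided by new tests; OWID reports it as a rolling average, so treat this as an approximation). That is why it helps here: if cases fall only because tests fall, the rate stays flat. A minimal sketch with hypothetical weekly totals, NOT real Indonesian figures:

```python
import pandas as pd

# Hypothetical weekly totals, made up for illustration
weeks = pd.DataFrame({
    "tests": [100_000, 50_000, 50_000],
    "cases": [25_000, 12_500, 10_000],
}, index=["week 1", "week 2", "week 3"])

# Positive rate = cases / tests
weeks["positive_rate"] = weeks["cases"] / weeks["tests"]
print(weeks["positive_rate"].tolist())  # [0.25, 0.25, 0.2]
```

From week 1 to week 2, cases halve but so do tests, so the positive rate does not move: that drop says nothing about the epidemic. From week 2 to week 3, tests are unchanged and cases fall, so the rate drops too.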
indo=df[["iso_code","date","new_tests","new_tests_smoothed",
"new_cases","new_cases_smoothed","positive_rate"]].query('iso_code == "IDN"')
indo2=indo.query('date >= "2021-01-01"') # 2021 onwards only
fig, axes = plt.subplots(1, 2, figsize=(18, 10))
fig.suptitle('New tests and positive rate in Indonesia')
# left panel: daily new tests, raw and smoothed
sns.lineplot(ax=axes[0],data=indo2,x='date',y='new_tests')
sns.lineplot(ax=axes[0],data=indo2,x='date',y='new_tests_smoothed')
axes[0].tick_params(axis='x', labelrotation=45)
axes[0].legend(['new tests','new tests smoothed'])
axes[0].set_ylabel('new tests')
axes[0].set_xlabel('date')
axes[0].set_title('new tests')
# right panel: positive rate
sns.lineplot(ax=axes[1],data=indo2,x='date',y='positive_rate')
axes[1].tick_params(axis='x', labelrotation=45)
axes[1].set_ylabel('positive rate (0-1)')
axes[1].set_xlabel('date')
axes[1].set_title('positive rate')
[Figure: "new tests" panel (new tests and new tests smoothed, by date) and "positive rate" panel (positive rate, by date)]
And yes, testing does indeed go down. At the same time, the positive rate seems to be trending down as well. How much this tells us depends on how testing is conducted, i.e. on who gets selected for testing and who does not. We could be more certain if we checked hospitalisations and deaths. Unfortunately, Indonesian hospitalisation numbers are absent from this dataset.
df.query('iso_code=="IDN"')[['weekly_icu_admissions','weekly_hosp_admissions']]
| | weekly_icu_admissions | weekly_hosp_admissions |
---|---|---|
44074 | NaN | NaN |
44075 | NaN | NaN |
44076 | NaN | NaN |
44077 | NaN | NaN |
44078 | NaN | NaN |
... | ... | ... |
44582 | NaN | NaN |
44583 | NaN | NaN |
44584 | NaN | NaN |
44585 | NaN | NaN |
44586 | NaN | NaN |
513 rows × 2 columns
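Scrolling through 513 rows of NaN is one way to check; asking pandas whether a column has any value at all for a country is quicker. A minimal sketch; all_missing is a hypothetical helper and the tiny frame (including the placeholder code "AAA") is made up:

```python
import numpy as np
import pandas as pd

def all_missing(df, iso_code, columns):
    """Hypothetical helper: columns with no reported value at all for one country."""
    sub = df.loc[df["iso_code"] == iso_code, columns]
    return [c for c in columns if sub[c].isna().all()]

# Made-up mini version of the OWID frame
demo = pd.DataFrame({
    "iso_code": ["IDN", "IDN", "AAA"],
    "weekly_icu_admissions": [np.nan, np.nan, 1.0],
    "weekly_hosp_admissions": [np.nan, np.nan, 2.0],
    "new_deaths": [3.0, 4.0, 5.0],
})
cols = ["weekly_icu_admissions", "weekly_hosp_admissions", "new_deaths"]
print(all_missing(demo, "IDN", cols))
# ['weekly_icu_admissions', 'weekly_hosp_admissions']
```

On the real data, the same call would be all_missing(df, "IDN", [...]) with whichever columns you want to check.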
On deaths (dear God, bless all the lost souls and those they left behind), the situation is rather gloomy.
indo=df[["iso_code","date","new_deaths","new_deaths_smoothed"]].query('iso_code == "IDN"')
indo2=indo.query('date >= "2021-01-01"') # 2021 onwards only
sns.lineplot(data=indo2,x='date',y='new_deaths')
sns.lineplot(data=indo2,x='date',y='new_deaths_smoothed')
plt.xticks(rotation=45)
plt.legend(['new deaths','new deaths, 7-day moving average'])
plt.ylabel('deaths')
plt.xlabel('date')
[Figure: new deaths and their 7-day moving average for Indonesia, 2021]
Judging from the death data, the pandemic is still far from over. Note that deaths tend to follow new cases with a lag, so they may take longer to trend down. However, if we cannot trust the test data, the death data is also hard to trust. With unreliable data, I think it is hard to react to any news, really, whether cases go up or down. It is hard to make a good case for the government either way, because people are like: low cases? Bad data! Bad testing! High cases? The government is stupid!
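One rough way to estimate how far deaths lag behind cases is to shift the case series by different numbers of days and keep the shift that correlates best with deaths. A minimal sketch on hypothetical data: a single synthetic wave where deaths are 2% of cases 14 days later, so the right answer is known in advance. estimate_lag is a made-up helper, and on real data noise and reporting delays would make the answer much fuzzier:

```python
import numpy as np
import pandas as pd

def estimate_lag(cases: pd.Series, deaths: pd.Series, max_lag: int = 30) -> int:
    """Hypothetical helper: shift (in days) of `cases` that best correlates with `deaths`."""
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        # shift cases forward so cases on day t line up with deaths on day t + lag;
        # Series.corr ignores the NaN rows created by the shift
        corr = cases.shift(lag).corr(deaths)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

# Synthetic wave (not real data): deaths are 2% of cases 14 days earlier
days = pd.date_range("2021-05-01", periods=120, freq="D")
t = np.arange(120)
cases = pd.Series(1000 + 40000 * np.exp(-((t - 60) / 15.0) ** 2), index=days)
deaths = 0.02 * cases.shift(14)

print(estimate_lag(cases, deaths))  # 14
```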
So yeah. I guess it helps if we don’t overreact to new case numbers, because they might not reveal the true state of the COVID-19 pandemic in Indonesia.
What about vaccination? Judging from all the graphs above, new cases and the positive rate shot up around June. What happened during that month? The arrival of Delta? What kinds of crowded events happened during that time? What high-mobility events took place around then? The government might have allowed high-mobility events to take place because the vaccination program had already started. So let me end this blog by posting the vaccination speed of several countries, including Indonesia.
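One way to line countries up is to take each country's most recent reported value of a vaccination column, such as people_vaccinated_per_hundred (coverage) or new_vaccinations_smoothed_per_million (pace). Reporting is patchy, so the newest row is often empty. A minimal sketch; latest_value is a hypothetical helper, and "Country A/B" with their numbers are placeholders, not real figures:

```python
import numpy as np
import pandas as pd

def latest_value(df, locations, col):
    """Hypothetical helper: last non-missing value of `col` per location, highest first."""
    sub = df[df["location"].isin(locations)].sort_values("date")
    # GroupBy.last() returns the last non-null entry, so a country whose
    # newest row is empty still gets its most recent reported value
    return sub.groupby("location")[col].last().sort_values(ascending=False)

demo = pd.DataFrame({
    "location": ["Country A", "Country A", "Country B", "Country B"],
    "date": pd.to_datetime(["2021-07-20", "2021-07-21"] * 2),
    "people_vaccinated_per_hundred": [10.0, np.nan, 30.0, 31.0],
})
print(latest_value(demo, ["Country A", "Country B"], "people_vaccinated_per_hundred"))
```

Country A's newest row is NaN, so its value falls back to 10.0 from the day before.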
1. Hannah Ritchie, Esteban Ortiz-Ospina, Diana Beltekian, Edouard Mathieu, Joe Hasell, Bobbie Macdonald, Charlie Giattino, Cameron Appel, Lucas Rodés-Guirao and Max Roser (2020). “Coronavirus Pandemic (COVID-19)”. Published online at OurWorldInData.org. Retrieved from: https://ourworldindata.org/coronavirus [Online Resource]