1. Background Information

This analysis aims to analyse the spatio-temporal patterns of monthly cumulative confirmed COVID-19 rate and death cases rate at keluruhan (sub-district) level of Daerah Khusus Ibukota Jakarta (DKI Jakarta). In other words, this analysis is done to detect which sub-districts had a relatively higher number of confirmed cases and death cases rates and how they changed over time. Additionally, the temporal interval will be at the last day of the month while the geographic will be at the sub-district.

2. Dataset

The data set used for this analysis includes:

Aspatial: Daily updates of COVID-19 statistics in DKI Jakarta at https://riwayat-file-covid-19-dki-jakarta-jakartagis.hub.arcgis.com. The cumulative data is available in the “data” worksheet.
Geospatial: Shapefile (SHP) Batas Desa Provinsi DKI Jakarta provided at
PODES 2019. There are other geospatial data in ESRI shapefile format at different geographical levels available at Indonesia Geospatial. But for the purpose of this analysis, only SHP Batas Desa Provinsi DKI Jakarta will be used.

Before we proceed with the analysis, there were some issues found when downloading the aspatial dataset which we need to take note of:

Although we are expected to download 19 files from 31 January 2020 - 31 July 2021 (19 months), the link only has data files starting from 25 March 2020 onwards.
Additionally, the link for 31st January 2021 Riwayat File Covid-19 DKI Jakarta was inaccessible.

Hence, the remedy is to:

Take data files starting from 31st March 2020 instead
For January 2021, use the data file for 30th January 2021 instead.
Altogether, instead of 19 files, there are a total of 17 files to be downloaded.

3. Install and Load R packages

This code chunk performs 3 tasks:

A list called packages will be created and will consists of all the R packages required to accomplish this hands-on exercise.
Check if R packages on package have been installed in R and if not, they will be installed.
After all the R packages have been installed, they will be loaded.

packages <- c('sf', 'tidyverse', 'tmap', 'kableExtra', 'ggpubr', 'devtools', 'gifski')
for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

devtools::install_github("gadenbuie/xaringanExtra")
library(xaringanExtra)

4. Importing Geospatial Data

4.1 Importing polygon feature data in shapefile format

Code Chunk

dkij_sf <- st_read("data/geospatial", layer="BATAS_DESA_DESEMBER_2019_DUKCAPIL_DKI_JAKARTA")

Reading layer `BATAS_DESA_DESEMBER_2019_DUKCAPIL_DKI_JAKARTA' from data source `C:\aisyahajit2018\IS415\IS415_blog\_posts\2021-09-10-take-home-exercise-1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 269 features and 161 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 106.3831 ymin: -6.370815 xmax: 106.9728 ymax: -5.184322
Geodetic CRS:  WGS 84

Glimpse

glimpse(dkij_sf)

Rows: 269
Columns: 162
$ OBJECT_ID  <dbl> 25477, 25478, 25397, 25400, 25378, 25379, 25390, ~
$ KODE_DESA  <chr> "3173031006", "3173031007", "3171031003", "317103~
$ DESA       <chr> "KEAGUNGAN", "GLODOK", "HARAPAN MULIA", "CEMPAKA ~
$ KODE       <dbl> 317303, 317303, 317103, 317103, 310101, 310101, 3~
$ PROVINSI   <chr> "DKI JAKARTA", "DKI JAKARTA", "DKI JAKARTA", "DKI~
$ KAB_KOTA   <chr> "JAKARTA BARAT", "JAKARTA BARAT", "JAKARTA PUSAT"~
$ KECAMATAN  <chr> "TAMAN SARI", "TAMAN SARI", "KEMAYORAN", "KEMAYOR~
$ DESA_KELUR <chr> "KEAGUNGAN", "GLODOK", "HARAPAN MULIA", "CEMPAKA ~
$ JUMLAH_PEN <dbl> 21609, 9069, 29085, 41913, 6947, 7059, 15793, 589~
$ JUMLAH_KK  <dbl> 7255, 3273, 9217, 13766, 2026, 2056, 5599, 1658, ~
$ LUAS_WILAY <dbl> 0.36, 0.37, 0.53, 0.97, 0.93, 0.95, 1.76, 1.14, 0~
$ KEPADATAN  <dbl> 60504, 24527, 54465, 42993, 7497, 7401, 8971, 515~
$ PERPINDAHA <dbl> 102, 25, 131, 170, 17, 26, 58, 13, 113, 178, 13, ~
$ JUMLAH_MEN <dbl> 68, 52, 104, 151, 14, 32, 36, 10, 60, 92, 5, 83, ~
$ PERUBAHAN  <dbl> 20464, 8724, 27497, 38323, 6853, 6993, 15006, 580~
$ WAJIB_KTP  <dbl> 16027, 7375, 20926, 30264, 4775, 4812, 12559, 398~
$ SILAM      <dbl> 15735, 1842, 26328, 36813, 6941, 7057, 7401, 5891~
$ KRISTEN    <dbl> 2042, 2041, 1710, 3392, 6, 0, 3696, 0, 4058, 5130~
$ KHATOLIK   <dbl> 927, 1460, 531, 1082, 0, 0, 1602, 0, 2100, 2575, ~
$ HINDU      <dbl> 15, 9, 42, 127, 0, 0, 622, 0, 25, 27, 0, 9, 115, ~
$ BUDHA      <dbl> 2888, 3716, 469, 495, 0, 2, 2462, 0, 4134, 4740, ~
$ KONGHUCU   <dbl> 2, 1, 5, 1, 0, 0, 10, 0, 9, 10, 0, 4, 1, 1, 4, 0,~
$ KEPERCAYAA <dbl> 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 2, 0, 22, 0, 3, ~
$ PRIA       <dbl> 11049, 4404, 14696, 21063, 3547, 3551, 7833, 2954~
$ WANITA     <dbl> 10560, 4665, 14389, 20850, 3400, 3508, 7960, 2937~
$ BELUM_KAWI <dbl> 10193, 4240, 14022, 20336, 3366, 3334, 7578, 2836~
$ KAWIN      <dbl> 10652, 4364, 13450, 19487, 3224, 3404, 7321, 2791~
$ CERAI_HIDU <dbl> 255, 136, 430, 523, 101, 80, 217, 44, 381, 476, 3~
$ CERAI_MATI <dbl> 509, 329, 1183, 1567, 256, 241, 677, 220, 1197, 9~
$ U0         <dbl> 1572, 438, 2232, 3092, 640, 648, 802, 585, 2220, ~
$ U5         <dbl> 1751, 545, 2515, 3657, 645, 684, 995, 588, 2687, ~
$ U10        <dbl> 1703, 524, 2461, 3501, 620, 630, 1016, 513, 2653,~
$ U15        <dbl> 1493, 521, 2318, 3486, 669, 671, 1106, 548, 2549,~
$ U20        <dbl> 1542, 543, 2113, 3098, 619, 609, 1081, 491, 2313,~
$ U25        <dbl> 1665, 628, 2170, 3024, 639, 582, 1002, 523, 2446,~
$ U30        <dbl> 1819, 691, 2363, 3188, 564, 592, 1236, 478, 2735,~
$ U35        <dbl> 1932, 782, 2595, 3662, 590, 572, 1422, 504, 3034,~
$ U40        <dbl> 1828, 675, 2371, 3507, 480, 486, 1200, 397, 2689,~
$ U45        <dbl> 1600, 607, 2250, 3391, 421, 457, 1163, 365, 2470,~
$ U50        <dbl> 1408, 619, 1779, 2696, 346, 369, 1099, 288, 2129,~
$ U55        <dbl> 1146, 602, 1379, 1909, 252, 318, 979, 235, 1843, ~
$ U60        <dbl> 836, 614, 1054, 1397, 197, 211, 880, 162, 1386, 1~
$ U65        <dbl> 587, 555, 654, 970, 122, 114, 747, 111, 958, 932,~
$ U70        <dbl> 312, 311, 411, 631, 69, 55, 488, 65, 554, 573, 38~
$ U75        <dbl> 415, 414, 420, 704, 74, 61, 577, 38, 717, 642, 37~
$ TIDAK_BELU <dbl> 3426, 1200, 4935, 7328, 1306, 1318, 2121, 973, 50~
$ BELUM_TAMA <dbl> 1964, 481, 2610, 3763, 730, 676, 1278, 732, 3241,~
$ TAMAT_SD   <dbl> 2265, 655, 2346, 2950, 1518, 2054, 1169, 1266, 44~
$ SLTP       <dbl> 3660, 1414, 3167, 5138, 906, 1357, 2236, 852, 585~
$ SLTA       <dbl> 8463, 3734, 12172, 16320, 2040, 1380, 5993, 1570,~
$ DIPLOMA_I  <dbl> 81, 23, 84, 179, 22, 15, 43, 36, 85, 83, 4, 63, 2~
$ DIPLOMA_II <dbl> 428, 273, 1121, 1718, 101, 59, 573, 97, 604, 740,~
$ DIPLOMA_IV <dbl> 1244, 1241, 2477, 4181, 314, 191, 2199, 357, 1582~
$ STRATA_II  <dbl> 74, 46, 166, 315, 10, 8, 168, 8, 63, 92, 5, 174, ~
$ STRATA_III <dbl> 4, 2, 7, 21, 0, 1, 13, 0, 3, 9, 0, 16, 8, 7, 75, ~
$ BELUM_TIDA <dbl> 3927, 1388, 5335, 8105, 1788, 1627, 2676, 1129, 5~
$ APARATUR_P <dbl> 81, 10, 513, 931, 246, 75, 156, 160, 132, 79, 23,~
$ TENAGA_PEN <dbl> 70, 43, 288, 402, 130, 93, 81, 123, 123, 73, 45, ~
$ WIRASWASTA <dbl> 8974, 3832, 10662, 14925, 788, 728, 6145, 819, 12~
$ PERTANIAN  <dbl> 1, 0, 1, 3, 2, 2, 1, 3, 2, 5, 1, 1, 0, 0, 2, 5, 2~
$ NELAYAN    <dbl> 0, 0, 2, 0, 960, 1126, 1, 761, 1, 2, 673, 0, 0, 0~
$ AGAMA_DAN  <dbl> 6, 6, 5, 40, 0, 0, 49, 2, 10, 11, 0, 54, 15, 16, ~
$ PELAJAR_MA <dbl> 4018, 1701, 6214, 9068, 1342, 1576, 3135, 1501, 6~
$ TENAGA_KES <dbl> 28, 29, 80, 142, 34, 26, 60, 11, 48, 55, 16, 68, ~
$ PENSIUNAN  <dbl> 57, 50, 276, 498, 20, 7, 59, 14, 56, 75, 2, 97, 5~
$ LAINNYA    <dbl> 4447, 2010, 5709, 7799, 1637, 1799, 3430, 1368, 7~
$ GENERATED  <chr> "30 Juni 2019", "30 Juni 2019", "30 Juni 2019", "~
$ KODE_DES_1 <chr> "3173031006", "3173031007", "3171031003", "317103~
$ BELUM_     <dbl> 3099, 1032, 4830, 7355, 1663, 1704, 2390, 1213, 5~
$ MENGUR_    <dbl> 4447, 2026, 5692, 7692, 1576, 1731, 3500, 1323, 7~
$ PELAJAR_   <dbl> 3254, 1506, 6429, 8957, 1476, 1469, 3185, 1223, 6~
$ PENSIUNA_1 <dbl> 80, 65, 322, 603, 24, 8, 70, 20, 75, 97, 2, 132, ~
$ PEGAWAI_   <dbl> 48, 5, 366, 612, 223, 72, 65, 143, 73, 48, 15, 89~
$ TENTARA    <dbl> 4, 0, 41, 57, 3, 0, 74, 1, 20, 12, 2, 11, 90, 340~
$ KEPOLISIAN <dbl> 10, 1, 16, 42, 11, 8, 2, 9, 17, 7, 3, 9, 165, 15,~
$ PERDAG_    <dbl> 31, 5, 1, 3, 6, 1, 2, 4, 3, 1, 4, 0, 1, 2, 9, 2, ~
$ PETANI     <dbl> 0, 0, 1, 2, 0, 1, 1, 0, 1, 1, 1, 2, 0, 0, 1, 2, 0~
$ PETERN_    <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ NELAYAN_1  <dbl> 1, 0, 1, 0, 914, 1071, 0, 794, 0, 1, 663, 0, 0, 0~
$ INDUSTR_   <dbl> 7, 3, 4, 3, 1, 3, 0, 0, 1, 7, 0, 0, 2, 2, 1, 3, 1~
$ KONSTR_    <dbl> 3, 0, 2, 6, 3, 8, 1, 6, 1, 5, 10, 0, 2, 5, 7, 4, ~
$ TRANSP_    <dbl> 2, 0, 7, 4, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 6, 3, 2~
$ KARYAW_    <dbl> 6735, 3034, 7347, 10185, 237, 264, 4319, 184, 940~
$ KARYAW1    <dbl> 9, 2, 74, 231, 4, 0, 16, 1, 13, 10, 1, 24, 17, 29~
$ KARYAW1_1  <dbl> 0, 0, 5, 15, 0, 0, 0, 1, 0, 1, 0, 0, 2, 4, 7, 9, ~
$ KARYAW1_12 <dbl> 23, 4, 25, 35, 141, 50, 16, 157, 6, 9, 40, 11, 11~
$ BURUH      <dbl> 515, 155, 971, 636, 63, 218, 265, 55, 1085, 652, ~
$ BURUH_     <dbl> 1, 0, 0, 0, 2, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 2, 1~
$ BURUH1     <dbl> 0, 0, 1, 0, 1, 25, 0, 2, 0, 1, 1, 0, 0, 0, 0, 0, ~
$ BURUH1_1   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ PEMBANT_   <dbl> 1, 1, 4, 1, 1, 0, 7, 0, 5, 1, 0, 6, 1, 10, 11, 9,~
$ TUKANG     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0~
$ TUKANG_1   <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ TUKANG_12  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ TUKANG__13 <dbl> 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0~
$ TUKANG__14 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ TUKANG__15 <dbl> 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0~
$ TUKANG__16 <dbl> 7, 4, 10, 14, 0, 0, 2, 0, 7, 8, 0, 8, 1, 0, 3, 2,~
$ TUKANG__17 <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0~
$ PENATA     <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0~
$ PENATA_    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ PENATA1_1  <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 2, 0, 1, 0, 0, 0~
$ MEKANIK    <dbl> 11, 1, 10, 8, 0, 0, 4, 0, 7, 8, 0, 9, 0, 15, 10, ~
$ SENIMAN_   <dbl> 4, 0, 12, 28, 0, 0, 2, 0, 3, 4, 0, 9, 6, 7, 14, 1~
$ TABIB      <dbl> 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0~
$ PARAJI_    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ PERANCA_   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 1, 0, 1~
$ PENTER_    <dbl> 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2~
$ IMAM_M     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ PENDETA    <dbl> 2, 4, 5, 33, 0, 0, 20, 0, 10, 8, 0, 30, 14, 14, 1~
$ PASTOR     <dbl> 0, 1, 0, 1, 0, 0, 8, 0, 0, 0, 0, 23, 0, 0, 0, 0, ~
$ WARTAWAN   <dbl> 7, 1, 16, 27, 0, 0, 4, 0, 8, 6, 0, 9, 5, 9, 26, 3~
$ USTADZ     <dbl> 6, 1, 1, 5, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0~
$ JURU_M     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0~
$ PROMOT     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ ANGGOTA_   <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 2, 1, 2~
$ ANGGOTA1   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0~
$ ANGGOTA1_1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ PRESIDEN   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ WAKIL_PRES <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1~
$ ANGGOTA1_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0~
$ ANGGOTA1_3 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0~
$ DUTA_B     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1~
$ GUBERNUR   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1~
$ WAKIL_GUBE <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ BUPATI     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ WAKIL_BUPA <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ WALIKOTA   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ WAKIL_WALI <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ ANGGOTA1_4 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1~
$ ANGGOTA1_5 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ DOSEN      <dbl> 3, 2, 23, 36, 1, 2, 11, 0, 3, 5, 0, 14, 6, 28, 69~
$ GURU       <dbl> 72, 40, 272, 378, 118, 72, 69, 116, 126, 71, 36, ~
$ PILOT      <dbl> 1, 0, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 1~
$ PENGACARA_ <dbl> 4, 1, 8, 22, 0, 0, 5, 0, 5, 4, 0, 4, 3, 12, 24, 2~
$ NOTARIS    <dbl> 0, 0, 3, 5, 0, 0, 4, 0, 0, 0, 0, 5, 0, 5, 10, 3, ~
$ ARSITEK    <dbl> 1, 0, 2, 3, 0, 0, 2, 0, 0, 0, 0, 4, 1, 2, 7, 3, 9~
$ AKUNTA_    <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 2~
$ KONSUL_    <dbl> 1, 0, 2, 11, 0, 0, 4, 0, 0, 0, 0, 6, 2, 3, 10, 8,~
$ DOKTER     <dbl> 16, 32, 35, 68, 0, 1, 63, 0, 27, 32, 1, 63, 48, 6~
$ BIDAN      <dbl> 3, 1, 9, 18, 12, 8, 1, 3, 3, 3, 7, 3, 10, 10, 7, ~
$ PERAWAT    <dbl> 7, 0, 25, 44, 12, 10, 3, 6, 12, 20, 6, 7, 26, 16,~
$ APOTEK_    <dbl> 0, 0, 2, 3, 1, 0, 0, 0, 1, 2, 0, 1, 2, 3, 3, 3, 3~
$ PSIKIATER  <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3, 2, 1~
$ PENYIA_    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0~
$ PENYIA1    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ PELAUT     <dbl> 0, 0, 6, 16, 1, 1, 0, 14, 2, 4, 1, 2, 4, 2, 10, 1~
$ PENELITI   <dbl> 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1~
$ SOPIR      <dbl> 65, 3, 94, 123, 0, 1, 61, 0, 76, 79, 0, 63, 44, 1~
$ PIALAN     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ PARANORMAL <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0~
$ PEDAGA_    <dbl> 379, 126, 321, 562, 11, 10, 412, 15, 202, 225, 0,~
$ PERANG_    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ KEPALA_    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ BIARAW_    <dbl> 0, 1, 0, 0, 0, 0, 22, 0, 3, 0, 0, 2, 1, 0, 4, 0, ~
$ WIRASWAST_ <dbl> 1370, 611, 1723, 3099, 131, 119, 1128, 259, 2321,~
$ LAINNYA_12 <dbl> 94, 57, 82, 122, 12, 10, 41, 6, 89, 158, 24, 37, ~
$ LUAS_DESA  <dbl> 25476, 25477, 25396, 25399, 25377, 25378, 25389, ~
$ KODE_DES_3 <chr> "3173031006", "3173031007", "3171031003", "317103~
$ DESA_KEL_1 <chr> "KEAGUNGAN", "GLODOK", "HARAPAN MULIA", "CEMPAKA ~
$ KODE_12    <dbl> 317303, 317303, 317103, 317103, 310101, 310101, 3~
$ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((106.8164 -6..., MULT~

From the results above we can see that:
- There are 269 multipolygon features and 161 fields in the sf data frame.
- WGS 84 is the Geodetic Coordinate Reference Systems

4.2 Retrieve coordinate reference system from object using st_crs of sf package

st_crs(dkij_sf)

Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

From the results above we can see that:
- The EPSG is 4326 due to the Geodetic Coordinate Reference Systems being WGS 84.
- Hence, what we need to do next is to assign the correct EPSG code to this simple feature data frame.

4.3 Assign EPSG code to a simple feature data frame

4.3.1 Assign correct EPSG code

The national Projected Coordinates Systems of Indonesia is DGN95 / Indonesia TM-3 zone 54.1.
Hence, when we search this information up in https://epsg.io/ to retrieve the EPSG code, we get 23845 as the EPSG code.
So, we can assign this EPSG code to st_crs by using st_set_crs of sf package as seen in the code chunk below:

dkij_sf23845<- st_transform(dkij_sf, crs = 23845)

4.3.2 Check CSR again

st_crs(dkij_sf23845)

Coordinate Reference System:
  User input: EPSG:23845 
  wkt:
PROJCRS["DGN95 / Indonesia TM-3 zone 54.1",
    BASEGEOGCRS["DGN95",
        DATUM["Datum Geodesi Nasional 1995",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4755]],
    CONVERSION["Indonesia TM-3 zone 54.1",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",139.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9999,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",200000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",1500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre."],
        AREA["Indonesia - onshore east of 138Â°E."],
        BBOX[-9.19,138,-1.49,141.01]],
    ID["EPSG",23845]]

4.3.3 Plot dkij_sf23845

plot(dkij_sf23845)

From the plots above, we can roughly see that there are outer islands in map. Hence, we would need to exclude them from our analysis.

5. Geospatial Data Wrangling

5.1 Exclude outer islands from DKI Jakarta

With prior research to what exactly are the outer islands of DKI Jakarta, my initial findings was that these outer islands are called Thousand Islands which can be translated to Kepulauan Seribu.
However, since this research is not enough, to help us confirm on what exactly defines the outer islands of DKI Jakarta, we can make use of tmap_mode of tmap package to look into the islands outside of DKI jakarta and check the field name.
The idea is that if there is a pattern found within these outer islands (ie. a common field that defines them), we will use that field value to exclude them from the geospatial data.
We will also be using the polygon of the KAB_KOTA field to look into this as the English translation for KAB_KOTA is Cities.

5.1.1 Check islands using tmap_mode

tmap_mode('view')
tm_shape(dkij_sf23845)+
  tm_polygons("KAB_KOTA")

Here we are switching back to plot mode since each interactive mode will consume a connection:

tmap_mode('plot')

From the interactive map above, we can see that:

From the colors legend, there are 6 unique values (Excluding missing value) for the KAB_KOTA field.
5 out of these 6 unique values begins with “JAKARTA”. To understand these fields better, the translations are:
- JAKARTA BARAT: West Jakarta
- JAKARTA PUSAT: Center Jakarta
- JAKARTA SELATAN: South Jakarta
- JAKARTA TIMUR: East Jakarta
- JAKARTA UTARA: North Jakarta
The color legend also helps us to see better the cardinal directions of these cities and whether there is any overlap. For example, JAKARTA PUSAT represented in Yellow is at the center while JAKARTA UTARA represented in Blue is at the north.
When researching on these cities further, they are actually called the 5 administrative cities (kota administrasi) which form the Special Capital Region of Jakarta, Indonesia also known as DKI Jakarta in Indonesian language.
Hence when zooming in to the only value that does not begin with “JAKARTA” which is “KEPULAUAN SERIBU”, we can see for ourselves that it is in fact this value that is referring to the outer islands of Jakarta.

5.1.2 Exclude outer islands (“KEPULAUAN SERIBU”) from DKI Jakarta using filter function

dkij_5cities <- filter(dkij_sf23845, KAB_KOTA != "KEPULAUAN SERIBU")

5.1.3 Plot dkij_5cities

plot(dkij_5cities)

plot(dkij_5cities["KAB_KOTA"])

5.2 Retain first 9 fields using basic indexing of baseR.

Code Chunk

dkij_slice<- dkij_5cities[1:9]

Head

head(dkij_slice)

Simple feature collection with 6 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -3627698 ymin: 689029.1 xmax: -3619528 ymax: 693390.8
Projected CRS: DGN95 / Indonesia TM-3 zone 54.1
  OBJECT_ID  KODE_DESA          DESA   KODE    PROVINSI      KAB_KOTA
1     25477 3173031006     KEAGUNGAN 317303 DKI JAKARTA JAKARTA BARAT
2     25478 3173031007        GLODOK 317303 DKI JAKARTA JAKARTA BARAT
3     25397 3171031003 HARAPAN MULIA 317103 DKI JAKARTA JAKARTA PUSAT
4     25400 3171031006  CEMPAKA BARU 317103 DKI JAKARTA JAKARTA PUSAT
5     25390 3171021001    PASAR BARU 317102 DKI JAKARTA JAKARTA PUSAT
6     25391 3171021002  KARANG ANYAR 317102 DKI JAKARTA JAKARTA PUSAT
    KECAMATAN    DESA_KELUR JUMLAH_PEN                       geometry
1  TAMAN SARI     KEAGUNGAN      21609 MULTIPOLYGON (((-3626874 69...
2  TAMAN SARI        GLODOK       9069 MULTIPOLYGON (((-3627130 69...
3   KEMAYORAN HARAPAN MULIA      29085 MULTIPOLYGON (((-3621251 68...
4   KEMAYORAN  CEMPAKA BARU      41913 MULTIPOLYGON (((-3620608 69...
5 SAWAH BESAR    PASAR BARU      15793 MULTIPOLYGON (((-3624097 69...
6 SAWAH BESAR  KARANG ANYAR      33383 MULTIPOLYGON (((-3624785 69...

From the results above, we can see that the remaining fields after slicing are as follows:

OBJECT_ID
KODE_DESA
DESA
KODE
PROVINSI
KAB_KOTA
KECAMATAN
DESA_KELUR
JUMLAH_PEN

6. Importing, Extracting and Wrangling Aspatial Data

During the first try of reading the excel files, there were some data issues encountered:

Duplicated column names: Some of the excel files either contained 2 ID_KEL columns or 2 Meninggal columns.
Required columns containing NA: Some of the excel files that contained 2 of the Meninggal columns, have different value where the first column consists of all NA values while the second column was without NA values.
Required column containing other values: For ID_KEL column, there are some string values (ie. “LUAR DKI JAKARTA”, “PROSES UPDATE DATA”, “BELUM DIKETAHUI”) which we need to remove in order to successfully integrate with the geospatial data later.

Hence, when reading the excel files, we should aim to resolve these data issues first before successfully integrating the monthly data into 1 dataframe.

5.1 Import and extract required columns

The following code chunk performs 6 tasks:

Read the “data” sheet of all the COVID excel files in the aspatial folder
- Although there are 2 excel sheets in each excel, the data that we require is only in the sheet named data
Remove duplicated columns except the last instance
- This is to cater to both the duplicated columns: ID_KEL and Meninggal
- The pattern noticed in the duplicated Meninggal columns is that the column with the non NA values are always in the last duplicated column. For example, if there are 2 Meninggal (Meninggal, Meninggal.1) where Meninggal comes before Meninggal.1, the non NA column is always in the second/last column Meninggal.1.
- The pattern noticed for ID_KEL is that both its duplicated columns are always the same. Hence, we can choose to retain either one of them.
Convert tibbles to dataframe

Extract the 7 required columns from COVID aspatial data:

1. ID_KEL, 
2. Nama_provinsi, 
3. nama_kota, 
4. nama_kecamatan 
5. nama_kelurahan, 
6. POSITIF (cumulative confirmed cases),
7. meninggal (cumulative death cases) from data worksheet of the daily COVID-19 data

Extract and create Month and Year columns derived from file name
Drop rows where values in ID_KEL column are not digits
- Some examples of string values found in the ID_KEL column include: “LUAR DKI JAKARTA”, “PROSES UPDATE DATA”, “BELUM DIKETAHUI”

aps_path = "data/aspatial"

filenames <- list.files(path = aps_path, pattern = "*.xlsx", full.names = T)

aspatial <- lapply(filenames, function(i){
    data <- which(readxl::excel_sheets(i) == "data") 
    xl <- readxl::read_xlsx(i, sheet=data, .name_repair = "minimal")
    non_dup <- xl[, !duplicated(colnames(xl), fromLast = TRUE)]
    df <-data.frame(non_dup)
    req_col_df <-  select(df, `ID_KEL`, `Nama_provinsi`, `nama_kota`, `nama_kecamatan`, `nama_kelurahan`, `POSITIF`, `Meninggal`)
    full_date <- gsub(".*\\([0-9]{2} (.*)\\).*", "\\1", i)
    month <- gsub('(^\\w+)\\s.+','\\1',full_date)
    year <- gsub('^.*([0-9]{4}).*','\\1', full_date)
    req_col_df$Month = toupper(month)
    req_col_df$Year = strtoi(year)
    req_col_df$MthYr <- paste(req_col_df$Month, req_col_df$Year, sep="_")
    req_col_df <- req_col_df %>% filter(grepl("[[:digit:]]", ID_KEL))

    return(req_col_df)

})

6.2 Integrate the daily data into a data frame by month using rbind.data.frame

covid_df <- do.call("rbind.data.frame", aspatial)

6.3 Save Aspatial file as RDS

Before proceeding with our analysis, it is important to save our aspatial data as an RDS file so that we can avoid:

Reading multiple excel files and performing the same set of data transformation process everytime we run the rmd file
Uploading multiple excel files onto Github
- Github has a maximum size limit of 100mb for individual files.
- Although the current files do not exceed this limit, it is good practice to still save it as RDS file so that there won’t be any complications when we eventually work with relatively large files.

covid_df_rds <- write_rds(covid_df, "data/aspatial/rds/covid_df.rds")

6.4 Read RDS Aspatial file

Code Chunk

covid_df <- read_rds("data/aspatial/rds/covid_df.rds")

Glimpse

glimpse(covid_df)

Rows: 4,539
Columns: 10
$ ID_KEL         <chr> "3172051003", "3173041007", "3175041005", "31~
$ Nama_provinsi  <chr> "DKI JAKARTA", "DKI JAKARTA", "DKI JAKARTA", ~
$ nama_kota      <chr> "JAKARTA UTARA", "JAKARTA BARAT", "JAKARTA TI~
$ nama_kecamatan <chr> "PADEMANGAN", "TAMBORA", "KRAMAT JATI", "JATI~
$ nama_kelurahan <chr> "ANCOL", "ANGKE", "BALE KAMBANG", "BALI MESTE~
$ POSITIF        <dbl> 834, 617, 755, 358, 870, 811, 1038, 1447, 898~
$ Meninggal      <dbl> 9, 8, 15, 8, 13, 10, 10, 25, 12, 22, 30, 16, ~
$ Month          <chr> "FEBRUARI", "FEBRUARI", "FEBRUARI", "FEBRUARI~
$ Year           <int> 2021, 2021, 2021, 2021, 2021, 2021, 2021, 202~
$ MthYr          <chr> "FEBRUARI_2021", "FEBRUARI_2021", "FEBRUARI_2~

We can see that there are a total of 4,539 rows in the COVID aspatial data.

7. Integrate Aspatial and Geospatial data

7.1 Combine COVID aspatial data and DKI Jakarta geospatial frame into simple feature data frame using left_join

dkij_slice will be used as the main table since the outer islands have already been excluded from it.

7.1.1 Choosing the join key

Although there are multiple fields that can be mapped like - KECAMATAN->nama_kecamatan, - DESA_KELUR->nama_kelurahan,

We will only be using KODE_DESA from dkij_slice and ID_KEL from covid_df to join since there are some discrepancies found with the other fields. Below are some examples of the discrepancies found:

In KECAMATAN, there is a value “SETIABUDI” but in nama_kecamatan, the value is “SETIA BUDI”.
In DESA_KELUR, there is a value “HALIM PERDANA KUSUMA” but in nama_kelurahan, the value is “HALIM PERDANA KUSUMAH” .

The following code chunk helps us to see how we can observe these discrepancies in the columns:

7.1.1.1 Discrepancies between KECAMATAN and nama_kecamatan

KECAMATAN <- unique(dkij_slice$KECAMATAN)
nama_kecamatan <- unique(covid_df$nama_kecamatan)

Values in KECAMATAN not found in nama_kecamatan:

KECAMATAN[!(KECAMATAN %in% nama_kecamatan)]

[1] "KALIDERES"  "PAL MERAH"  "SETIABUDI"  "PULOGADUNG" "KRAMATJATI"

Values in nama_kecamatan not found in KECAMATAN:

nama_kecamatan[!(nama_kecamatan %in% KECAMATAN)]

[1] "KRAMAT JATI"         "PULO GADUNG"         "SETIA BUDI"         
[4] "PALMERAH"            "KALI DERES"          "KEP. SERIBU UTARA"  
[7] "KEP. SERIBU SELATAN"

7.1.1.2 Discrepancies between DESA_KELUR and nama_kelurahan

DESA_KELUR <- unique(dkij_slice$DESA_KELUR)
nama_kelurahan <- unique(covid_df$nama_kelurahan)

Values in DESA_KELUR not found in nama_kelurahan:

DESA_KELUR[!(DESA_KELUR %in% nama_kelurahan)]

 [1] "KALIBARU"             "KRENDANG"            
 [3] "RAWAJATI"             "TENGAH"              
 [5] "BALEKAMBANG"          "PINANGRANTI"         
 [7] "JATIPULO"             "PALMERIAM"           
 [9] "KRAMATJATI"           "HALIM PERDANA KUSUMA"

Values in nama_kelurahan not found in DESA_KELUR:

nama_kelurahan[!(nama_kelurahan %in% DESA_KELUR)]

 [1] "BALE KAMBANG"          "HALIM PERDANA KUSUMAH"
 [3] "JATI PULO"             "KALI BARU"            
 [5] "KAMPUNG TENGAH"        "KERENDANG"            
 [7] "KRAMAT JATI"           "PAL MERIAM"           
 [9] "PINANG RANTI"          "PULAU HARAPAN"        
[11] "PULAU KELAPA"          "PULAU PANGGANG"       
[13] "PULAU PARI"            "PULAU TIDUNG"         
[15] "PULAU UNTUNG JAWA"     "RAWA JATI"            
[17] "P. HARAPAN"            "P. KELAPA"            
[19] "P. PANGGANG"           "P. PARI"              
[21] "P. TIDUNG"             "UNTUNG JAWA"

7.1.2 Join geospatial and aspatial data using selected join key

Code Chunk

dki_cov <- left_join(dkij_slice, covid_df, by = c('KODE_DESA' ='ID_KEL'))

Head

kable(head(dki_cov)) %>%
  kable_styling("striped", full_width = F) %>%
  scroll_box(width = "100%", height = "400px")

OBJECT_ID	KODE_DESA	DESA	KODE	PROVINSI	KAB_KOTA	KECAMATAN	DESA_KELUR	JUMLAH_PEN	Nama_provinsi	nama_kota	nama_kecamatan	nama_kelurahan	POSITIF	Meninggal	Month	Year	MthYr	geometry
25477	3173031006	KEAGUNGAN	317303	DKI JAKARTA	JAKARTA BARAT	TAMAN SARI	KEAGUNGAN	21609	DKI JAKARTA	JAKARTA BARAT	TAMAN SARI	KEAGUNGAN	435	3	FEBRUARI	2021	FEBRUARI_2021	MULTIPOLYGON (((-3626874 69…
25477	3173031006	KEAGUNGAN	317303	DKI JAKARTA	JAKARTA BARAT	TAMAN SARI	KEAGUNGAN	21609	DKI JAKARTA	JAKARTA BARAT	TAMAN SARI	KEAGUNGAN	3	0	APRIL	2020	APRIL_2020	MULTIPOLYGON (((-3626874 69…
25477	3173031006	KEAGUNGAN	317303	DKI JAKARTA	JAKARTA BARAT	TAMAN SARI	KEAGUNGAN	21609	DKI JAKARTA	JAKARTA BARAT	TAMAN SARI	KEAGUNGAN	540	4	APRIL	2021	APRIL_2021	MULTIPOLYGON (((-3626874 69…
25477	3173031006	KEAGUNGAN	317303	DKI JAKARTA	JAKARTA BARAT	TAMAN SARI	KEAGUNGAN	21609	DKI JAKARTA	JAKARTA BARAT	TAMAN SARI	KEAGUNGAN	349	2	JANUARI	2021	JANUARI_2021	MULTIPOLYGON (((-3626874 69…
25477	3173031006	KEAGUNGAN	317303	DKI JAKARTA	JAKARTA BARAT	TAMAN SARI	KEAGUNGAN	21609	DKI JAKARTA	JAKARTA BARAT	TAMAN SARI	KEAGUNGAN	6	0	JUNI	2020	JUNI_2020	MULTIPOLYGON (((-3626874 69…
25477	3173031006	KEAGUNGAN	317303	DKI JAKARTA	JAKARTA BARAT	TAMAN SARI	KEAGUNGAN	21609	DKI JAKARTA	JAKARTA BARAT	TAMAN SARI	KEAGUNGAN	725	7	JUNI	2021	JUNI_2021	MULTIPOLYGON (((-3626874 69…

7.2 Calculate cumulative confirmed cases rate (i.e. cases per 10000 population) and the cumulative death cases rate by month

The formula that will be used in this calculation is as such:

Cumulative confirmed cases rate = (Total Confirmed cases per month and sub-district/Population of sub-district)*10000
Cumulative death cases rate = (Total Death cases per month and sub-district/Population of sub-district)*10000

7.2.1 Check for any NA values in the df

colSums(is.na(dki_cov))

     OBJECT_ID      KODE_DESA           DESA           KODE 
             0              0              0              0 
      PROVINSI       KAB_KOTA      KECAMATAN     DESA_KELUR 
             0              0              0              0 
    JUMLAH_PEN  Nama_provinsi      nama_kota nama_kecamatan 
             0              0              0              0 
nama_kelurahan        POSITIF      Meninggal          Month 
             0              0              0              0 
          Year          MthYr       geometry 
             0              0              0

7.2.2 Set geometry to NULL first as we need to pivot our table for the calculation later

Code Chunk

dki_cov_wogeo <- st_set_geometry(dki_cov, NULL)

Glimpse

glimpse(dki_cov_wogeo)

Rows: 4,437
Columns: 18
$ OBJECT_ID      <dbl> 25477, 25477, 25477, 25477, 25477, 25477, 254~
$ KODE_DESA      <chr> "3173031006", "3173031006", "3173031006", "31~
$ DESA           <chr> "KEAGUNGAN", "KEAGUNGAN", "KEAGUNGAN", "KEAGU~
$ KODE           <dbl> 317303, 317303, 317303, 317303, 317303, 31730~
$ PROVINSI       <chr> "DKI JAKARTA", "DKI JAKARTA", "DKI JAKARTA", ~
$ KAB_KOTA       <chr> "JAKARTA BARAT", "JAKARTA BARAT", "JAKARTA BA~
$ KECAMATAN      <chr> "TAMAN SARI", "TAMAN SARI", "TAMAN SARI", "TA~
$ DESA_KELUR     <chr> "KEAGUNGAN", "KEAGUNGAN", "KEAGUNGAN", "KEAGU~
$ JUMLAH_PEN     <dbl> 21609, 21609, 21609, 21609, 21609, 21609, 216~
$ Nama_provinsi  <chr> "DKI JAKARTA", "DKI JAKARTA", "DKI JAKARTA", ~
$ nama_kota      <chr> "JAKARTA BARAT", "JAKARTA BARAT", "JAKARTA BA~
$ nama_kecamatan <chr> "TAMAN SARI", "TAMAN SARI", "TAMAN SARI", "TA~
$ nama_kelurahan <chr> "KEAGUNGAN", "KEAGUNGAN", "KEAGUNGAN", "KEAGU~
$ POSITIF        <dbl> 435, 3, 540, 349, 6, 725, 221, 144, 60, 260, ~
$ Meninggal      <dbl> 3, 0, 4, 2, 0, 7, 2, 1, 1, 2, 0, 11, 0, 3, 0,~
$ Month          <chr> "FEBRUARI", "APRIL", "APRIL", "JANUARI", "JUN~
$ Year           <int> 2021, 2020, 2021, 2021, 2020, 2021, 2020, 202~
$ MthYr          <chr> "FEBRUARI_2021", "APRIL_2020", "APRIL_2021", ~

7.2.3 Perform calculation for cumulative confirmed cases rate by month (POSITIF)

Code Chunk

cumu_cc_rate <- dki_cov_wogeo %>%
  group_by(KODE_DESA, nama_kelurahan, MthYr, JUMLAH_PEN)  %>%
  summarise(`TOTAL_CONFIRMED` = sum(`POSITIF`))  %>%
  ungroup()  %>%
  mutate(`cumu_cc_rate` = `TOTAL_CONFIRMED`/`JUMLAH_PEN`*10000)  %>%
  select(`KODE_DESA`, `nama_kelurahan`, `JUMLAH_PEN`, `MthYr`, `cumu_cc_rate`) %>%
  pivot_wider(names_from=MthYr, 
              values_from=cumu_cc_rate)

Glimpse

glimpse(cumu_cc_rate)

Rows: 261
Columns: 20
$ KODE_DESA      <chr> "3171011001", "3171011002", "3171011003", "31~
$ nama_kelurahan <chr> "GAMBIR", "CIDENG", "PETOJO UTARA", "PETOJO S~
$ JUMLAH_PEN     <dbl> 3088, 19216, 21828, 18388, 12858, 25256, 1579~
$ AGUSTUS_2020   <dbl> 210.49223, 43.19317, 28.40388, 47.85730, 22.5~
$ APRIL_2020     <dbl> 6.4766839, 5.2039967, 5.0393989, 1.6314988, 5~
$ APRIL_2021     <dbl> 2014.2487, 387.6978, 259.7581, 414.9445, 323.~
$ DESEMBER_2020  <dbl> 1036.2694, 179.0175, 114.9899, 246.9002, 159.~
$ FEBRUARI_2021  <dbl> 1632.1244, 335.6578, 227.2311, 365.4557, 259.~
$ JANUARI_2021   <dbl> 1318.0052, 261.2406, 175.0046, 300.7396, 221.~
$ JULI_2020      <dbl> 35.621762, 20.815987, 19.241341, 16.858821, 8~
$ JULI_2021      <dbl> 3808.2902, 879.9958, 637.7130, 863.0629, 726.~
$ JUNI_2020      <dbl> 29.145078, 10.928393, 13.285688, 10.876659, 6~
$ JUNI_2021      <dbl> 2726.6839, 545.8993, 380.2456, 582.4451, 496.~
$ MARET_2020     <dbl> 0.0000000, 0.5203997, 0.9162544, 0.0000000, 0~
$ MARET_2021     <dbl> 1839.3782, 370.5246, 246.9305, 388.2967, 286.~
$ MEI_2020       <dbl> 3.238342, 8.326395, 8.246289, 2.175332, 5.444~
$ MEI_2021       <dbl> 2075.7772, 409.0341, 272.5857, 450.2937, 356.~
$ NOVEMBER_2020  <dbl> 783.67876, 130.62032, 87.50229, 186.53470, 12~
$ OKTOBER_2020   <dbl> 605.56995, 101.47794, 75.13286, 155.53622, 76~
$ SEPTEMBER_2020 <dbl> 511.65803, 82.74355, 56.34964, 126.16924, 56.~

7.2.4 Perform calculation for cumulative death cases rate by month (Meninggal)

Code Chunk

cumu_death_rate <- dki_cov_wogeo %>%
  group_by(KODE_DESA, nama_kelurahan, MthYr, JUMLAH_PEN)  %>%
  summarise(`TOTAL_DEATHS` = sum(`Meninggal`))  %>%
  ungroup()  %>%
  mutate(`cumu_death_rate` = `TOTAL_DEATHS`/`JUMLAH_PEN`*10000)  %>%
  select(`KODE_DESA`, `nama_kelurahan`, `JUMLAH_PEN`, `MthYr`, `cumu_death_rate`) %>%
  pivot_wider(names_from=MthYr, 
              values_from=cumu_death_rate)

Glimpse

glimpse(cumu_death_rate)

Rows: 261
Columns: 20
$ KODE_DESA      <chr> "3171011001", "3171011002", "3171011003", "31~
$ nama_kelurahan <chr> "GAMBIR", "CIDENG", "PETOJO UTARA", "PETOJO S~
$ JUMLAH_PEN     <dbl> 3088, 19216, 21828, 18388, 12858, 25256, 1579~
$ AGUSTUS_2020   <dbl> 0.0000000, 2.6019983, 0.9162544, 3.2629976, 0~
$ APRIL_2020     <dbl> 0.0000000, 1.0407993, 0.9162544, 0.5438329, 0~
$ APRIL_2021     <dbl> 25.906736, 6.244796, 3.665017, 7.069828, 5.44~
$ DESEMBER_2020  <dbl> 6.476684, 4.163197, 1.374382, 4.894496, 1.555~
$ FEBRUARI_2021  <dbl> 19.430052, 5.203997, 2.290636, 5.982162, 3.88~
$ JANUARI_2021   <dbl> 16.191710, 5.203997, 1.374382, 4.894496, 3.11~
$ JULI_2020      <dbl> 0.0000000, 1.5611990, 0.9162544, 1.0876659, 0~
$ JULI_2021      <dbl> 42.098446, 13.530391, 9.162544, 12.508157, 6.~
$ JUNI_2020      <dbl> 0.0000000, 1.5611990, 0.9162544, 0.5438329, 0~
$ JUNI_2021      <dbl> 29.145078, 7.285595, 5.497526, 9.245160, 6.99~
$ MARET_2020     <dbl> 0.0000000, 0.0000000, 0.4581272, 0.0000000, 0~
$ MARET_2021     <dbl> 25.906736, 5.724396, 3.206890, 7.069828, 4.66~
$ MEI_2020       <dbl> 0.0000000, 1.0407993, 0.9162544, 0.5438329, 0~
$ MEI_2021       <dbl> 29.145078, 6.765196, 3.665017, 8.157494, 6.22~
$ NOVEMBER_2020  <dbl> 3.2383420, 4.1631973, 1.3743815, 4.8944964, 0~
$ OKTOBER_2020   <dbl> 3.2383420, 3.6427977, 1.3743815, 4.3506635, 0~
$ SEPTEMBER_2020 <dbl> 0.0000000, 3.1223980, 1.3743815, 4.3506635, 0~

7.2.5 Perform calculation for relative risk

Relative risk also known as standardized mortality rate (SMR) is sometimes used to compare the observed mortality rate to a national (or regional) standard. This means that we would want to compare the death rates of the individual sub districts to the national DKI death rates.

Code Chunk

rel_risk <- dki_cov_wogeo %>%
  mutate(`TOTAL_DEATHS` = sum(`Meninggal`)) %>%
  mutate(`REL_RISK` = (`TOTAL_DEATHS`*100)/ `TOTAL_DEATHS`/`JUMLAH_PEN`) %>%
  select (`KODE_DESA`, `REL_RISK`)

Glimpse

glimpse(rel_risk)

Rows: 4,437
Columns: 2
$ KODE_DESA <chr> "3173031006", "3173031006", "3173031006", "3173031~
$ REL_RISK  <dbl> 0.004627701, 0.004627701, 0.004627701, 0.004627701~

Code Chunk

total_deaths <- dki_cov_wogeo %>%
  group_by(KODE_DESA, nama_kelurahan, JUMLAH_PEN)  %>%
  summarise(`TOTAL_DEATHS` = sum(`Meninggal`))  %>%
  ungroup()  %>%
  select (`KODE_DESA`, `TOTAL_DEATHS`, `JUMLAH_PEN`, nama_kelurahan)

Glimpse

glimpse(total_deaths)

Rows: 261
Columns: 4
$ KODE_DESA      <chr> "3171011001", "3171011002", "3171011003", "31~
$ TOTAL_DEATHS   <dbl> 62, 140, 86, 146, 58, 144, 96, 200, 153, 187,~
$ JUMLAH_PEN     <dbl> 3088, 19216, 21828, 18388, 12858, 25256, 1579~
$ nama_kelurahan <chr> "GAMBIR", "CIDENG", "PETOJO UTARA", "PETOJO S~

Exploratory Data Analysis

8.1 Create a vector containing the months column names in ascending order

months_col <- c('MARET_2020','APRIL_2020','MEI_2020','JUNI_2020','JULI_2020','AGUSTUS_2020','SEPTEMBER_2020','OKTOBER_2020','NOVEMBER_2020','DESEMBER_2020','JANUARI_2021','FEBRUARI_2021','MARET_2021','APRIL_2021','MEI_2021','JUNI_2021','JULI_2021')

8.2 Cumulative confirmed cases rates

8.2.1 Time Series Box Plot - Cumulative confirmed cases rates

8.2.1.1 Create a list containing all the boxplots by months

The following code chunk will perform 2 tasks:

Create a list called cc_boxplot_list to store the boxplots
For each month_year, create a boxplot using the ggplot2 package and store it in the cc_boxplot_list created

cc_boxplot_list <- vector(mode = "list", length = length(months_col))
for (i in 1:length(months_col)) {
  box_plot <- ggplot(cumu_cc_rate, aes_string(x = months_col[[i]])) +
    geom_boxplot(color="midnight blue", fill="light sky blue") +
    labs(title = months_col[[i]]) +
    theme(plot.title = element_text(size = 10),
          axis.title = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank())
  cc_boxplot_list[[i]] <- box_plot
}

8.2.1.2 Plot boxplots to examine DKI Jakarta’s COVID-19 cases rates across months

Here we are using ggarrange function which is a useful function to arrange multiple ggplots on the same page

ggarrange(plotlist = cc_boxplot_list,
          ncol = 5,
          nrow = 4)

Here are some of the observations that we can draw from the boxplots of the cumulative confirmed cases rates above:

The distribution of DKI Jakarta’s COVID-19 cumulative confirmed cases rates across the months are right-skewed.
In all of the months from March 2020 to July 2021, there are multiple sub-districts (kelurahan) which have extremely high cumulative confirmed cases rates as compared to the other sub-districts. This is indicated by the multiple far upper outliers.
As the range of the cumulative confirmed cases rate in DKI Jakarta increases over the months (indicated by the increase in the axis), the threshold defining an outlier also increases.
From November 2020 onwards, two sub-districts are always noticeable at the far end of the upper outlier.
Starting from October 2020 onwards, none of the sub-districts had a cumulative confirmed case rate of 0.

8.2.2 Time Series Histogram - Cumulative confirmed cases rates

8.2.2.1 Create a list containing all the histogram by months

The following code chunk will perform 2 tasks:

Create a list called cc_histogram_list to store the histograms
For each month_year, create a histogram using the ggplot2 package and store it in the cc_histogram_list created

cc_histogram_list <- vector(mode = "list", length = length(months_col))
for (i in 1:length(months_col)) {
  hist_plot <- ggplot(cumu_cc_rate, aes_string(x = months_col[[i]])) +
    geom_histogram(fill = "steel blue") +
    labs(title = months_col[[i]]) +
    theme(plot.title = element_text(size = 10),
          axis.title = element_blank())
  
  cc_histogram_list[[i]] <- hist_plot
}

8.2.2.2 Plot histograms to examine DKI Jakarta’s COVID-19 cumulative confirmed cases rates across months

ggarrange(plotlist = cc_histogram_list,
          ncol = 5,
          nrow = 4)

Here are some of the observations that we can draw from the cumulative confirmed cases histogram above:

The distribution of COVID-19 cumulative confirmed cases rates across the months are right-skewed.
In March 2020, the majority of the sub-districts (keluruhan) had a cumulative confirmed cases rate of 0. This might be because the very first case finding in Indonesia was announced on 2nd March 2020 according to: Situation report #5: COVID-19 pandemic - 24th March 2020 - Indonesia.
Hence, as months passed and as cases spread across all the provinces in Indonesia, the number of sub-districts in DKI Jakarta having a cumulative confirmed case rate of 0 dropped.
Starting from October 2020, none of the sub-districts had a cumulative confirmed case rate of 0.

8.3 Cumulative death cases rates

8.3.1 Time Series Box Plot - Cumulative death cases rates

8.3.1.1 Create a list containing all the boxplots by months

The following code chunk will perform 2 tasks:

Create a list called death_boxplot_list to store the boxplots
For each month_year, create a boxplot using the ggplot2 package and store it in the death_boxplot_list created

# Create a list to store the boxplots
death_boxplot_list <- vector(mode = "list", length = length(months_col))
# For each month_year, create a boxplot using the ggplot2 package and store it in the boxplot_list created
for (i in 1:length(months_col)) {
  box_plot <- ggplot(cumu_death_rate, aes_string(x = months_col[[i]])) +
    geom_boxplot(color="firebrick", fill="light coral") +
    labs(title = months_col[[i]]) +
    theme(plot.title = element_text(size = 10),
          axis.title = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank())
  death_boxplot_list[[i]] <- box_plot
}

8.3.1.2 Plot boxplots to examine DKI Jakarta’s COVID-19 death cases rates across months

Here we are using ggarrange function which is a useful function to arrange multiple ggplots on the same page

ggarrange(plotlist = death_boxplot_list,
          ncol = 5,
          nrow = 4)

Here are some of the observations that we can draw from the boxplots of the cumulative death cases rates above:

In comparison to the cumulative confirmed cases rates boxplots, the distribution of DKI Jakarta’s COVID-19 cumulative death cases rates across the months are also right-skewed.
Similarly, in all of the months from March 2020 to July 2021, there are multiple sub-districts which have high cumulative death cases rates as compared to the other sub-districts.
As the range of the cumulative death cases rates in DKI Jakarta increases over the months, the threshold defining an outlier also increases.
Starting from February 2021, one sub-district can be seen always appearing at the far end of the upper outlier. Additionally, starting from February 2021, the minimum of the boxplot (minimum cumulative death cases rates) is no longer 0.

8.3.2 Time Series Histogram - Cumulative death cases rates

8.3.2.1 Create a list containing all the histogram by months

The following code chunk will perform 2 tasks:

Create a list called death_histogram_list to store the histograms
For each month_year, create a histogram using the ggplot2 package and store it in the death_histogram_list created

death_histogram_list <- vector(mode = "list", length = length(months_col))
for (i in 1:length(months_col)) {
  hist_plot <- ggplot(cumu_death_rate, aes_string(x = months_col[[i]])) +
    geom_histogram(fill = "firebrick") +
    labs(title = months_col[[i]]) +
    theme(plot.title = element_text(size = 10),
          axis.title = element_blank())
  
  death_histogram_list[[i]] <- hist_plot
}

8.3.2.2 Plot histograms to examine DKI Jakarta’s COVID-19 cumulative death cases rates across months

ggarrange(plotlist = death_histogram_list,
          ncol = 5,
          nrow = 4)

Here are some of the observations that we can draw from the histograms of the cumulative death cases rates above:

The distribution of COVID-19 cumulative death cases rates across the months are right-skewed.
In March 2020, the majority of the sub-districts (keluruhan) had a cumulative death cases rates of 0.
As months passed, the number of sub-districts in DKI Jakarta having a cumulative death cases rate of 0 dropped.
Starting from February 2021 onwards, none of the sub-districts had a cumulative death cases rate of 0.

9. Thematic Mapping

As seen from the previous sections, both the cumulative confirmed cases and death cases rates by month vary significantly. Hence, when we perform thematic mapping, we should carefully choose how we decide to define the intervals that are able to show significant spatio-temporal patterns.

Additionally, although it is known that the distribution for both the cumulative confirmed cases and death cases rates are right skewed, I will still be showing the difference between the maps with Manually Defined classification and Jenks classification. Equal and Quantiles classification will not be plotted for this analysis as:

Equal classification is more appropriate for variables with a uniform distribution
Quantiles classification is more well suited for linearly distributed data.

9.1 Time-series choropleth map of COVID-19 rates - Manual Classification

9.1.1 Check summary of cumulative confirmed cases rate by month (POSITIF)

summary(cumu_cc_rate)

  KODE_DESA         nama_kelurahan       JUMLAH_PEN    
 Length:261         Length:261         Min.   :  3088  
 Class :character   Class :character   1st Qu.: 26177  
 Mode  :character   Mode  :character   Median : 38845  
                                       Mean   : 42082  
                                       3rd Qu.: 52424  
                                       Max.   :167523  
  AGUSTUS_2020       APRIL_2020       APRIL_2021     
 Min.   :  2.429   Min.   : 0.000   Min.   :  90.04  
 1st Qu.: 14.134   1st Qu.: 1.393   1st Qu.: 264.21  
 Median : 19.532   Median : 2.220   Median : 315.17  
 Mean   : 25.203   Mean   : 3.370   Mean   : 342.78  
 3rd Qu.: 31.381   3rd Qu.: 4.016   3rd Qu.: 372.94  
 Max.   :210.492   Max.   :49.883   Max.   :2014.25  
 DESEMBER_2020     FEBRUARI_2021     JANUARI_2021    
 Min.   :  42.06   Min.   :  75.2   Min.   :  59.22  
 1st Qu.: 106.54   1st Qu.: 217.0   1st Qu.: 163.49  
 Median : 124.56   Median : 256.6   Median : 192.08  
 Mean   : 142.66   Mean   : 280.1   Mean   : 212.31  
 3rd Qu.: 152.73   3rd Qu.: 309.5   3rd Qu.: 229.76  
 Max.   :1036.27   Max.   :1632.1   Max.   :1318.01  
   JULI_2020         JULI_2021        JUNI_2020      
 Min.   :  1.518   Min.   : 187.3   Min.   :  0.000  
 1st Qu.:  7.477   1st Qu.: 545.1   1st Qu.:  4.179  
 Median : 10.690   Median : 658.3   Median :  6.515  
 Mean   : 14.002   Mean   : 705.3   Mean   :  8.728  
 3rd Qu.: 15.758   3rd Qu.: 779.7   3rd Qu.: 10.631  
 Max.   :112.243   Max.   :3808.3   Max.   :105.336  
   JUNI_2021        MARET_2020        MARET_2021     
 Min.   : 119.4   Min.   : 0.0000   Min.   :  83.71  
 1st Qu.: 360.0   1st Qu.: 0.0000   1st Qu.: 247.10  
 Median : 423.9   Median : 0.3264   Median : 294.83  
 Mean   : 465.3   Mean   : 0.7609   Mean   : 318.40  
 3rd Qu.: 503.4   3rd Qu.: 0.6928   3rd Qu.: 348.82  
 Max.   :2726.7   Max.   :49.8826   Max.   :1839.38  
    MEI_2020         MEI_2021       NOVEMBER_2020     OKTOBER_2020   
 Min.   : 0.000   Min.   :  91.88   Min.   : 28.85   Min.   : 13.82  
 1st Qu.: 2.646   1st Qu.: 276.21   1st Qu.: 75.33   1st Qu.: 54.17  
 Median : 4.198   Median : 333.16   Median : 88.97   Median : 64.34  
 Mean   : 5.458   Mean   : 362.22   Mean   :103.09   Mean   : 76.97  
 3rd Qu.: 6.796   3rd Qu.: 391.92   3rd Qu.:110.95   3rd Qu.: 85.28  
 Max.   :49.883   Max.   :2075.78   Max.   :783.68   Max.   :605.57  
 SEPTEMBER_2020   
 Min.   :  6.681  
 1st Qu.: 32.867  
 Median : 41.312  
 Mean   : 51.530  
 3rd Qu.: 58.183  
 Max.   :511.658

From March 2020 to July 2021, it can be seen that the cumulative confirmed cases rate ranges from 0 to about 3810.
If we were to manually define the intervals, we would get 9 classes of intervals of 450.

9.1.2 Check summary of cumulative death cases rate by month (Meninggal)

summary(cumu_death_rate)

  KODE_DESA         nama_kelurahan       JUMLAH_PEN    
 Length:261         Length:261         Min.   :  3088  
 Class :character   Class :character   1st Qu.: 26177  
 Mode  :character   Mode  :character   Median : 38845  
                                       Mean   : 42082  
                                       3rd Qu.: 52424  
                                       Max.   :167523  
  AGUSTUS_2020      APRIL_2020       APRIL_2021     DESEMBER_2020  
 Min.   :0.0000   Min.   :0.0000   Min.   : 1.679   Min.   :0.000  
 1st Qu.:0.3321   1st Qu.:0.0000   1st Qu.: 4.273   1st Qu.:1.632  
 Median :0.6899   Median :0.2074   Median : 5.706   Median :2.444  
 Mean   :0.8827   Mean   :0.3482   Mean   : 5.772   Mean   :2.583  
 3rd Qu.:1.2247   3rd Qu.:0.4854   3rd Qu.: 6.882   3rd Qu.:3.211  
 Max.   :6.1069   Max.   :6.1069   Max.   :25.907   Max.   :9.160  
 FEBRUARI_2021     JANUARI_2021      JULI_2020        JULI_2021     
 Min.   : 1.388   Min.   : 0.000   Min.   :0.0000   Min.   : 3.439  
 1st Qu.: 3.243   1st Qu.: 2.415   1st Qu.:0.2155   1st Qu.: 8.446  
 Median : 4.486   Median : 3.266   Median :0.5490   Median :10.298  
 Mean   : 4.637   Mean   : 3.506   Mean   :0.6912   Mean   :10.602  
 3rd Qu.: 5.631   3rd Qu.: 4.337   3rd Qu.:0.9707   3rd Qu.:12.508  
 Max.   :19.430   Max.   :16.192   Max.   :6.1069   Max.   :42.098  
   JUNI_2020        JUNI_2021        MARET_2020        MARET_2021    
 Min.   :0.0000   Min.   : 2.339   Min.   :0.00000   Min.   : 1.388  
 1st Qu.:0.0000   1st Qu.: 5.503   1st Qu.:0.00000   1st Qu.: 4.002  
 Median :0.4144   Median : 7.253   Median :0.00000   Median : 5.306  
 Mean   :0.5579   Mean   : 7.351   Mean   :0.08094   Mean   : 5.438  
 3rd Qu.:0.7939   3rd Qu.: 8.617   3rd Qu.:0.00000   3rd Qu.: 6.441  
 Max.   :6.1069   Max.   :29.145   Max.   :3.05344   Max.   :25.907  
    MEI_2020         MEI_2021      NOVEMBER_2020    OKTOBER_2020   
 Min.   :0.0000   Min.   : 2.064   Min.   :0.000   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.: 4.697   1st Qu.:1.166   1st Qu.:0.9442  
 Median :0.3372   Median : 6.157   Median :1.839   Median :1.4868  
 Mean   :0.4611   Mean   : 6.350   Mean   :2.022   Mean   :1.6846  
 3rd Qu.:0.6447   3rd Qu.: 7.544   3rd Qu.:2.639   3rd Qu.:2.2127  
 Max.   :6.1069   Max.   :29.145   Max.   :6.107   Max.   :6.1069  
 SEPTEMBER_2020  
 Min.   :0.0000  
 1st Qu.:0.5694  
 Median :0.9934  
 Mean   :1.2192  
 3rd Qu.:1.6270  
 Max.   :6.1069

From March 2020 to July 2021, it can be seen that the cumulative death cases rate ranges from 0 to about 42.
If we were to manually define the intervals, we would get 9 classes intervals of 5.

9.1.3 Join cumulative confirmed rates with dkij_slice sf dataframe

9.1.3.1 Cumulative confirmed cases rates

Code Chunk

cumu_cc_dkij <- left_join(dkij_slice, cumu_cc_rate, by = c('KODE_DESA' ='KODE_DESA', 'JUMLAH_PEN' = 'JUMLAH_PEN'))

Glimpse

glimpse(cumu_cc_dkij)

Rows: 261
Columns: 28
$ OBJECT_ID      <dbl> 25477, 25478, 25397, 25400, 25390, 25391, 253~
$ KODE_DESA      <chr> "3173031006", "3173031007", "3171031003", "31~
$ DESA           <chr> "KEAGUNGAN", "GLODOK", "HARAPAN MULIA", "CEMP~
$ KODE           <dbl> 317303, 317303, 317103, 317103, 317102, 31710~
$ PROVINSI       <chr> "DKI JAKARTA", "DKI JAKARTA", "DKI JAKARTA", ~
$ KAB_KOTA       <chr> "JAKARTA BARAT", "JAKARTA BARAT", "JAKARTA PU~
$ KECAMATAN      <chr> "TAMAN SARI", "TAMAN SARI", "KEMAYORAN", "KEM~
$ DESA_KELUR     <chr> "KEAGUNGAN", "GLODOK", "HARAPAN MULIA", "CEMP~
$ JUMLAH_PEN     <dbl> 21609, 9069, 29085, 41913, 15793, 33383, 3590~
$ nama_kelurahan <chr> "KEAGUNGAN", "GLODOK", "HARAPAN MULIA", "CEMP~
$ AGUSTUS_2020   <dbl> 27.766209, 8.821259, 44.352759, 44.139050, 38~
$ APRIL_2020     <dbl> 1.3883104, 2.2053148, 3.7820182, 3.1016630, 1~
$ APRIL_2021     <dbl> 249.8959, 347.3371, 322.5030, 371.7224, 464.1~
$ DESEMBER_2020  <dbl> 120.32024, 94.82854, 136.49648, 160.80930, 18~
$ FEBRUARI_2021  <dbl> 201.3050, 250.3032, 278.1502, 320.6642, 385.6~
$ JANUARI_2021   <dbl> 161.5068, 168.7066, 204.5728, 245.0314, 279.2~
$ JULI_2020      <dbl> 8.792633, 6.615944, 24.755028, 22.904588, 16.~
$ JULI_2021      <dbl> 545.1432, 853.4568, 652.9139, 694.7725, 939.6~
$ JUNI_2020      <dbl> 2.776621, 2.205315, 10.658415, 12.168062, 11.~
$ JUNI_2021      <dbl> 335.5084, 527.0702, 438.0265, 480.2806, 609.7~
$ MARET_2020     <dbl> 0.0000000, 1.1026574, 0.6876397, 0.0000000, 1~
$ MARET_2021     <dbl> 231.3851, 309.8467, 306.6873, 350.4879, 443.2~
$ MEI_2020       <dbl> 1.851081, 2.205315, 5.501117, 7.157684, 14.56~
$ MEI_2021       <dbl> 263.7790, 378.2115, 339.3502, 379.3572, 480.5~
$ NOVEMBER_2020  <dbl> 102.27220, 61.74881, 109.33471, 128.36113, 13~
$ OKTOBER_2020   <dbl> 84.68694, 47.41427, 92.48754, 103.30924, 116.~
$ SEPTEMBER_2020 <dbl> 66.63890, 31.97706, 63.60667, 63.70339, 93.71~
$ geometry       <MULTIPOLYGON [m]> MULTIPOLYGON (((-3626874 69..., ~

9.1.3.2 Cumulative death cases rates

Code Chunk

cumu_death_dkij <- left_join(dkij_slice, cumu_death_rate, by = c('KODE_DESA' ='KODE_DESA', 'JUMLAH_PEN' = 'JUMLAH_PEN' ))

Glimpse

glimpse(cumu_death_dkij)

Rows: 261
Columns: 28
$ OBJECT_ID      <dbl> 25477, 25478, 25397, 25400, 25390, 25391, 253~
$ KODE_DESA      <chr> "3173031006", "3173031007", "3171031003", "31~
$ DESA           <chr> "KEAGUNGAN", "GLODOK", "HARAPAN MULIA", "CEMP~
$ KODE           <dbl> 317303, 317303, 317103, 317103, 317102, 31710~
$ PROVINSI       <chr> "DKI JAKARTA", "DKI JAKARTA", "DKI JAKARTA", ~
$ KAB_KOTA       <chr> "JAKARTA BARAT", "JAKARTA BARAT", "JAKARTA PU~
$ KECAMATAN      <chr> "TAMAN SARI", "TAMAN SARI", "KEMAYORAN", "KEM~
$ DESA_KELUR     <chr> "KEAGUNGAN", "GLODOK", "HARAPAN MULIA", "CEMP~
$ JUMLAH_PEN     <dbl> 21609, 9069, 29085, 41913, 15793, 33383, 3590~
$ nama_kelurahan <chr> "KEAGUNGAN", "GLODOK", "HARAPAN MULIA", "CEMP~
$ AGUSTUS_2020   <dbl> 0.4627701, 1.1026574, 2.0629190, 2.1473051, 0~
$ APRIL_2020     <dbl> 0.0000000, 0.0000000, 0.6876397, 1.1929473, 0~
$ APRIL_2021     <dbl> 1.851081, 9.923917, 5.157298, 6.919094, 6.965~
$ DESEMBER_2020  <dbl> 0.9255403, 3.3079722, 3.7820182, 4.5331997, 3~
$ FEBRUARI_2021  <dbl> 1.388310, 5.513287, 4.813478, 5.726147, 4.432~
$ JANUARI_2021   <dbl> 0.9255403, 4.4106296, 4.4696579, 4.7717892, 4~
$ JULI_2020      <dbl> 0.0000000, 1.1026574, 1.7190992, 1.6701262, 0~
$ JULI_2021      <dbl> 5.090472, 14.334546, 11.689875, 13.361010, 12~
$ JUNI_2020      <dbl> 0.0000000, 1.1026574, 1.0314595, 1.6701262, 0~
$ JUNI_2021      <dbl> 3.239391, 12.129231, 7.564036, 9.782168, 9.49~
$ MARET_2020     <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0~
$ MARET_2021     <dbl> 1.388310, 6.615944, 4.813478, 6.919094, 6.331~
$ MEI_2020       <dbl> 0.0000000, 1.1026574, 0.6876397, 1.4315368, 0~
$ MEI_2021       <dbl> 2.776621, 11.026574, 5.844937, 7.634863, 8.23~
$ NOVEMBER_2020  <dbl> 0.9255403, 3.3079722, 3.0943785, 4.2946103, 2~
$ OKTOBER_2020   <dbl> 0.4627701, 2.2053148, 3.0943785, 4.0560208, 1~
$ SEPTEMBER_2020 <dbl> 0.4627701, 1.1026574, 2.7505587, 2.8630735, 0~
$ geometry       <MULTIPOLYGON [m]> MULTIPOLYGON (((-3626874 69..., ~

9.1.4 Time-series choropleth map of cumulative confirmed cases rates (Manual Classification)

9.1.4.1 Create list containing all maps by months

The following code chunk will perform 2 tasks:

Create a list called cc_map_list to store the maps
For each month_year, create a map using the tmap package and store it in the cc_map_list created

cc_map_list_manual <- vector(mode = "list", length = length(months_col))
for (i in 1:length(months_col)) {
  cmap <- tm_shape(cumu_cc_dkij) +
            tm_fill(col = months_col[[i]],
                    palette = 'Blues',
                    breaks = c(0, 450, 900, 1350, 1800, 2250, 2700, 3150, 3600, 4050)
                    
                    ) +
            tm_borders(lwd = 0.1,
                       alpha = 0.3) +
            tm_layout(panel.show = TRUE,
                      panel.labels = months_col[[i]],
                      panel.label.color = 'black',
                      panel.label.size = 0.5,
                      legend.show = TRUE,
                      legend.text.size = 0.5,
                      inner.margins = 0
                      )
  cc_map_list_manual[[i]] <- cmap
}

9.1.4.2 Arrange maps to create a time-series visualisation of choropleth maps across months

tmap_arrange(cc_map_list_manual, outer.margins=0, ncol = 3)

9.1.5 Time-series choropleth map of cumulative death cases rates (Manual Classification)

9.1.4.1 Create list containing all maps by months

The following code chunk will perform 2 tasks:

Create a list called death_map_list to store the maps
For each month_year, create a map using the tmap package and store it in the death_map_list created

death_map_list_manual <- vector(mode = "list", length = length(months_col))
for (i in 1:length(months_col)) {
  cmap <- tm_shape(cumu_death_dkij) +
            tm_fill(col = months_col[[i]],
                    palette = 'Reds',
                    breaks = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45)) +
            tm_borders(lwd = 0.1,
                       alpha = 0.3) +
            tm_layout(panel.show = TRUE,
                      panel.labels = months_col[[i]],
                      panel.label.color = 'black',
                      panel.label.size = 0.5,
                      legend.show = TRUE, 
                      legend.text.size = 0.5,
                      inner.margins = 0
                      )
  death_map_list_manual[[i]] <- cmap
}

9.1.5.2 Arrange maps to create a time-series visualisation of choropleth maps across months

tmap_arrange(death_map_list_manual, ncol = 3, outer.margins=0)

From both the cumulative confirmed and death cases maps above, some observations that can be found are:

Sub-districts in the Central Region of Jakarta had higher COVID-19 cumulative confirmed cases rate which is indicated by the darker shade of blue which can be visible from Feb 2021 onwards.
Sub-districts with higher COVID-19 cumulative cases rates also had higher cumulative death cases rates which is indicated by the darker shade of red. Again, this can be seen in the Central Region from Jan 2021 onwards.
The spatial patterns in the earlier months however can hardly be observed as this is due to the manual classification used.

Some disadvantages that are commonly known when using this manual classification are:

Intervals of 450 will have the tendency to over-generalize the data since the cumulative confirmed cases rates in the earlier months (March 2020 to November 2020) are relatively low as compared to the later months. This will also result in loss of important spatial patterns which we would want to avoid.
Similarly, for cumulative death cases rates, intervals of 5 may not be ideal as we are unable to capture important patterns of those cumulative death cases rates that are relatively low in the earlier months.
Both the cumulative confirmed cases and death cases rates are right skewed. Hence, those outliers that are observed in the previous boxplots are more likely to produce empty classes.

Therefore, since our distribution for both cumulative confirmed cases and death cases rates are right skewed, it would be better to use Natural Breaks classification to plot the choropleth maps for both cumulative confirmed cases and death rates instead.

9.2 Time-series choropleth map of COVID-19 rates (Jenks Classification)

9.2.1 Time-series choropleth map of cumulative confirmed cases rates (Jenks Classification)

9.2.1.1 Create list containing all maps by months

The following code chunk will perform 2 tasks:

Create a list called cc_map_list to store the maps
For each month_year, create a map using the tmap package and store it in the cc_map_list created

cc_map_list_jenks <- vector(mode = "list", length = length(months_col))
for (i in 1:length(months_col)) {
  cmap <- tm_shape(cumu_cc_dkij) +
            tm_fill(col = months_col[[i]],
                    palette = 'Blues',
                    style = 'jenks',
                    n = 5) +
            tm_borders(lwd = 0.1,
                       alpha = 0.3) +
            tm_layout(panel.show = TRUE,
                      panel.labels = months_col[[i]],
                      panel.label.color = 'black',
                      panel.label.size = 0.5,
                      legend.show = TRUE,
                      legend.text.size = 0.5,
                      inner.margins = 0)
  cc_map_list_jenks[[i]] <- cmap
}

9.2.1.2 Arrange maps to create a time-series visualisation of choropleth maps across months

tmap_arrange(cc_map_list_jenks, ncol = 3, outer.margins=0)

With the Jenks classification, Spatial-temporal patterns can now be observed for all months.
In March 2020 when COVID-19 outbreak first started in Indonesia, the most obvious cluster that can be seen in the sub-districts located at the South West region of Jakarta.
This began to change over the months as more clusters are formed near the central and north region of DKI Jakarta. This change can be seen in August 2020, with a higher concentration of confirmed cases in the central region, indicated by a darker shade of blue.
Additionally, from August 2020 to July 2021, there is a particular sub-district in the central region of DKI Jakarta that has the darkest shade of blue. Referring to our boxplot created previously, we can now confirm that the uppermost outlier for the boxplots from August 2020 to July 20201 is in fact this particular sub-district.

9.2.2 Time-series choropleth map of cumulative death cases rates (Jenks Classification)

9.2.2.1 Create list containing all maps by months

The following code chunk will perform 2 tasks:

Create a list called death_map_list to store the maps
For each month_year, create a map using the tmap package and store it in the death_map_list created

death_map_list_jenks <- vector(mode = "list", length = length(months_col))
for (i in 1:length(months_col)) {
  cmap <- tm_shape(cumu_death_dkij) +
            tm_fill(col = months_col[[i]],
                    palette = 'Reds',
                    style = 'jenks',
                    n = 5) +
            tm_borders(lwd = 0.1,
                       alpha = 0.3) +
            tm_layout(panel.show = TRUE,
                      panel.labels = months_col[[i]],
                      panel.label.color = 'black',
                      panel.label.size = 0.5,
                      legend.show = TRUE,
                      legend.text.size = 0.5,
                      inner.margins = 0)
  death_map_list_jenks[[i]] <- cmap
}

9.2.2.2 Arrange maps to create a time-series visualisation of choropleth maps across months

tmap_arrange(death_map_list_jenks, ncol = 3, outer.margins=0)

Similarly, for the choropleth maps of cumulative death cases rates created with the Jenks classification, spatial-temporal patterns can now be observed for all months.
In March 2020, the clusters for the cumulative death rates are at random sub-districts. However, as months passed, the most obvious cluster that had a more concentrated cumulative death rates are at the sub-districts of the central region of DKI Jakarta. This can be seen from August 2020 to November 2020 where the centre sub-districts are in a darker shade of red.
One interesting find here would be that, for the particular sub-district that had the highest cumulative confirmed cases rate previously from August 2020 to July 2021, it instead had the highest cumulative death cases rate only from January 2021 onwards.

Overall, although choropleth maps provides a quick and easy way to spot spatial patterns, the observations are not conclusive as our data contains outliers. The next section will attempt to determine if these are more significant patterns that can be derived using extreme value maps.

10. Analytical Mapping

As it is evident from the boxplots created earlier, the COVID data contains outliers. Hence, we can also make use of extreme value maps like percentile and box map to highlight these values at the lower and upper end of the scale.

10.1 Box Map

10.1.1 Functions needed to plot box maps

Get var function: For extracting the variable as vector out of the dataframe
Box breaks function: Creating break points for the box map into 6 classes to classify the data points
- “lower outlier”
- “1 - 25%”
- “25% - 50%”
- “50% - 75%”
- “> 75%”
- “upper outlier”

# Extract variable as vector out of sf dataframe
get.var <- function(vname, df) {
  v <- df[vname]
  v <- unname(v[[1]])
  return(v)
}
# To create break points for box map
boxbreaks <- function(v, mult=1.5) {
  qv <- unname(quantile(v))
  iqr <- qv[4] - qv[2]
  upfence <- qv[4] + mult * iqr
  lofence <- qv[2] - mult * iqr
  # initialize break points vector
  bb <- vector(mode="numeric",length=7)
  # logic for lower and upper fences
  if (lofence < qv[1]) { # no lower outliers
    bb[1] <- lofence
    bb[2] <- floor(qv[1])
  } else {
    bb[2] <- lofence
    bb[1] <- qv[1]
  }
  if (upfence > qv[5]) { # no upper outliers
    bb[7] <- upfence
    bb[6] <- ceiling(qv[5])
  } else {
    bb[6] <- upfence
    bb[7] <- qv[5]
  }
  bb[3:5] <- qv[2:4]
  return(bb)
}

10.1.2 Box map function to plot the box map

# Boxmap function
boxmap <- function(vnam, df, 
                   legtitle=NA,
                   mult=1.5){
  var <- get.var(vnam,df)
  bb <- boxbreaks(var)
  tm_shape(df) +
     tm_fill(vnam,title=legtitle,
             breaks=bb,
             palette="Blues",
          labels = c("lower outlier", 
                     "< 25%", 
                     "25% - 50%", 
                     "50% - 75%",
                     "> 75%", 
                     "upper outlier"))  +
  tm_borders(lwd=0.1, alpha=0.3) +
  tm_layout(legend.show = TRUE)
}

10.1.3 Test boxbreaks function

var <- get.var("JULI_2021", cumu_cc_dkij)
boxbreaks(var)

[1]  187.3189  193.3497  545.1432  658.3395  779.6722 1131.4658
[7] 3808.2902

10.1.4 Time-series box map of cumulative confirmed cases rates

cc_map_list_boxmap <- vector(mode = "list", length = length(months_col))
for (i in 1:length(months_col)) {
  cmap <- boxmap(months_col[[i]], cumu_cc_dkij)
  cc_map_list_boxmap[[i]] <- cmap
}

tmap_arrange(cc_map_list_boxmap, ncol = 3, outer.margins=0)

From the above box maps, more sub districts are in a darker blue shade. From this, we can infer that more sub districts are in the upper outlier class which is also confirming the observation that we have seen in the earlier boxplots.
Although previously it is thought that the concentration of confirmed cases are in the centre region, we can see that over the months, the southern region of Jakarta seems to be appearing more in the upper outlier class. This can be seen from January 2021 onwards. This observation cannot be seen in the choropleth maps plotted previously using jenks classification.

10.1.5 Time-series box map of cumulative death cases rates

# Extract variable as vector out of sf dataframe
get.var <- function(vname, df) {
  v <- df[vname]
  v <- unname(v[[1]])
  return(v)
}

boxmap <- function(vnam, df, 
                   legtitle=NA,
                   mult=1.5){
  var <- get.var(vnam,df)
  bb <- boxbreaks(var)
  tm_shape(df) +
     tm_fill(vnam,title=legtitle,
             breaks=bb,
             palette="Reds",
          labels = c("lower outlier", 
                     "< 25%", 
                     "25% - 50%", 
                     "50% - 75%",
                     "> 75%", 
                     "upper outlier"))  +
  tm_borders(lwd=0.1, alpha=0.3) +
  tm_layout(title = NA,
            legend.show = TRUE)
}

death_map_list_boxmap <- vector(mode = "list", length = length(months_col))
for (i in 1:length(months_col)) {
  cmap <- boxmap(months_col[[i]], cumu_death_dkij)
  death_map_list_boxmap[[i]] <- cmap
}

tmap_arrange(death_map_list_boxmap, ncol = 3, outer.margins=0)

Similarly, for the cumulative death cases rate, more sub districts are in a darker red shade as seen from the above box maps. From this, we can infer that more sub districts are in the upper outlier class which is also confirming the observation that we have seen in the earlier boxplots.
We can now see that the concentration of confirmed cases are not only in the centre region, but also spread out across the other regions.
There is also no clear indication as to which sub-district has the highest cumulative death cases rate over time as the rates seems to spread out more to all of the sub-districts in all DKI Jakarta’s region.
However, we can see that towards the north of DKI Jakarta, the sub-districts had a relatively low cumulative death cases rate compared to the other areas.
The only observation that is inaccurate would be the box map plotted for the month of March 2020 as almost all the sub-districts is classified in the upper outlier class. This might be due to the fact that most of the sub-districts having a lower death rate in March 2020 since COVID-19 outbreak only started in that month. This observation can also be seen from our boxplot and histogram created previously.
Another alternative to identify these outliers is to use is Percentile Maps which is shown in the next section.

10.2 Percentile Maps

10.2.1 get.var function for Percentile Maps

get.var <- function(vname,df) {
  v <- df[vname] %>% 
    st_set_geometry(NULL)
  v <- unname(v[[1]])
  return(v)
}

10.2.2 Time-series percentile map of cumulative confirmed cases rates

10.2.2.1 Time-series percentile map function

# Creating the percentmap function
percentmap <- function(vnam, df, legtitle=NA){
  percent <- c(0,.01, .1,.5,.9,.99,1)
  var <- get.var(vnam,df)
  bperc <- quantile (var, percent)
  tm_shape (cumu_cc_dkij) + 
    tm_polygons() + 
    tm_shape(df)+
    tm_fill(vnam, title=legtitle, breaks=bperc, palette="Blues", 
            labels = c("< 1%", "1%-10%", "10%-50%", "50%-90%", "90%-99%", ">99%"))+
     tm_borders(lwd=0.1, alpha = 0.3) +
    tm_layout(legend.show = TRUE)
    
}

10.2.2.2 Create a list containing all maps by months

cc_map_list_percentile <- vector(mode = "list", length = length(months_col))
for (i in 1:length(months_col)) {
  cmap <- percentmap(months_col[[i]], cumu_cc_dkij)
  cc_map_list_percentile[[i]] <- cmap
}

10.2.2.3 Create visualistion of percentile maps across months

tmap_arrange(cc_map_list_percentile, ncol = 3, outer.margins=0)

From the percentile maps shown above, we can see that more sub-districts are classified as 50% quantile and above classes.
In comparison to the box maps above, the observations are quite similar.
However, it becomes more evident that the sub-districts in the centre region are the outliers here.

10.2.3 Time-series percentile map of cumulative confirmed death rates

10.2.3.1 Time-series percentile map function

# Creating the percentmap function
percentmap <- function(vnam, df, legtitle=NA){
  percent <- c(0,.01, .1,.5,.9,.99,1)
  var <- get.var(vnam,df)
  bperc <- quantile (var, percent)
  tm_shape (cumu_death_dkij) + 
    tm_polygons() + 
    tm_shape(df)+
    tm_fill(vnam, title=legtitle, breaks=bperc, palette="Reds", 
            labels = c("< 1%", "1%-10%", "10%-50%", "50%-90%", "90%-99%", ">99%"))+
     tm_borders(lwd=0.1, alpha = 0.3) +
    tm_layout(legend.show = TRUE)
    
}

10.2.3.2 Create a list containing all maps by months

death_map_list_percentile <- vector(mode = "list", length = length(months_col))
for (i in 1:length(months_col)) {
  cmap <- percentmap(months_col[[i]], cumu_death_dkij)
  death_map_list_percentile[[i]] <- cmap
}

10.2.2.3 Create visualistion of percentile maps accross months

tmap_arrange(death_map_list_percentile, ncol = 3, outer.margins=0)

From the percentile maps shown above, we can see that more sub-districts are classified as 50% quantile and above classes.
In comparison to the box maps above, we can now clearly see the specific regions that are in the darker red shade and this would mean that they are the outliers that we saw in the box plots previously.
Over the months, we can also see clearly which are the outliers that below to the upper class indicated by the darkest red shade (>99% quantile).

10.3 Relative Risk map vs Total Deaths map

rel_risk_geo <- left_join(dkij_slice, rel_risk, by = c('KODE_DESA' ='KODE_DESA'))
total_deaths_geo <- left_join(dkij_slice, total_deaths, by = c('KODE_DESA' ='KODE_DESA', 'JUMLAH_PEN'='JUMLAH_PEN'))

relative_risk_map <- tm_shape(rel_risk_geo) +
  tm_fill(col = "REL_RISK", n = 5, style = "jenks", title = "Relative Risk", palette="Greens") +
   tm_layout(main.title = "Relative Risk",
             main.title.position = "center",
             main.title.size = 1.5) + 
  tm_borders(alpha = 0.3) 

total_death_map <- tm_shape(total_deaths_geo)+
  tm_fill(col="TOTAL_DEATHS", style="jenks", n = 5, title= "Total Deaths", palette="Reds") +
   tm_layout(main.title = "Total Deaths",
             main.title.position = "center",
             main.title.size = 1.5) +
  tm_borders(alpha = 0.3) 

total_pop_map <- tm_shape(total_deaths_geo)+
  tm_fill(col="JUMLAH_PEN", style="jenks", n = 5, title= "Total Population", palette="Purples") +
   tm_layout(main.title = "Total Population",
             main.title.position = "center",
             main.title.size = 1.5) +
  tm_borders(alpha = 0.3) 



tmap_arrange(relative_risk_map, total_death_map, total_pop_map, ncol=3)

From the maps above, we can see that it is not always the case that those sub-districts with a higher total deaths having a higher relative risk.
Sub-districts with higher population and deaths has generally lower relative risk. Similarly, sub-districts with lower population and deaths count has generally higher relative risk.

11. Conclusion

Some of the conclusions that we can derive from this analysis includes:

Using manual classification is not recommended whenever there is a right skewed distribution in our data, unless we know specifically which intervals are the best to use, as it shows little to no important patterns.
Using a proper data classification when plotting chloropleth maps is crucial as more significant patterns can be observed (more darker shades or blue and red) for both cumulative confirmed and death cases rate.
It is generally useful to further enhance our analysis when our data consists of outliers using percentile maps and box maps.However, different maps have the ability to show different kinds of patterns. We should ultimately choose specific ones to fit our data.
The patterns from the spatial temporal analysis of cumulative confirmed cases rate were more significant when using a box map while the patterns from the spatial temporal analysis of cumulative death cases rate were more significant when using a percentile map.
In the Relative Risk map, it shows that sub-districts with higher total deaths does not necessarily lead to a higher relative risk.
Similarly, in the chloropleth maps using Jenks classification, it also shows that sub-districts with a higher cumulative confirmed cases rates does not necessarily lead to a a higher cumulative death cases rates.

12. References

Anselin, L. (2020, September 15). Basic mapping. GeoDa on Github. Retrieved from: https://geodacenter.github.io/workbook/3a_mapping/lab3a.html
Anselin, L. (2018, August 3). Maps for rates or proportions. GeoDa on Github. Retrieved from: https://geodacenter.github.io/workbook/3b_rates/lab3b.html
Gimond, M. (2021, September 07). E mapping rates in R | Intro to GIS and spatial analysis. Main repos | mgimond.github.io. Retrieved from: https://mgimond.github.io/Spatial/mapping-rates-in-r.html#raw-rates
Lovelace, R., Nowosad, J., & Muenchow, J. (n.d.). Chapter 8 making maps with R. Welcome | Geocomputation with R. Retrieved from: https://geocompr.robinlovelace.net/adv-map.html
Situation report #5: COVID-19 pandemic - 24th March 2020 - Indonesia. (24 March 2020). ReliefWeb. Retrieved from: https://reliefweb.int/report/indonesia/situation-report-5-covid-19-pandemic-24th-march-2020
Tin Seong.K (2021, August 30) In-class Exercise 3: Analytical Mapping. Retrieved from: https://is415.netlify.app/in-class_ex/in-class_ex03/in-class_ex03#1

Take-home Exercise 1

1. Background Information

2. Dataset

3. Install and Load R packages

4. Importing Geospatial Data

4.1 Importing polygon feature data in shapefile format

Code Chunk

Glimpse

4.2 Retrieve coordinate reference system from object using st_crs of sf package

4.3 Assign EPSG code to a simple feature data frame

4.3.1 Assign correct EPSG code

4.3.2 Check CSR again

4.3.3 Plot dkij_sf23845

5. Geospatial Data Wrangling

5.1 Exclude outer islands from DKI Jakarta

5.1.1 Check islands using tmap_mode

5.1.2 Exclude outer islands (“KEPULAUAN SERIBU”) from DKI Jakarta using filter function

5.1.3 Plot dkij_5cities

5.2 Retain first 9 fields using basic indexing of baseR.

Code Chunk

Head

6. Importing, Extracting and Wrangling Aspatial Data

5.1 Import and extract required columns

6.2 Integrate the daily data into a data frame by month using rbind.data.frame

6.3 Save Aspatial file as RDS

6.4 Read RDS Aspatial file

Code Chunk

Glimpse

7. Integrate Aspatial and Geospatial data

7.1 Combine COVID aspatial data and DKI Jakarta geospatial frame into simple feature data frame using left_join

7.1.1 Choosing the join key

7.1.1.1 Discrepancies between KECAMATAN and nama_kecamatan

7.1.1.2 Discrepancies between DESA_KELUR and nama_kelurahan

7.1.2 Join geospatial and aspatial data using selected join key

Code Chunk

Head

7.2 Calculate cumulative confirmed cases rate (i.e. cases per 10000 population) and the cumulative death cases rate by month

7.2.1 Check for any NA values in the df

7.2.2 Set geometry to NULL first as we need to pivot our table for the calculation later

Code Chunk

Glimpse

7.2.3 Perform calculation for cumulative confirmed cases rate by month (POSITIF)

Code Chunk

Glimpse

7.2.4 Perform calculation for cumulative death cases rate by month (Meninggal)

Code Chunk

Glimpse

7.2.5 Perform calculation for relative risk

Code Chunk

Glimpse

Code Chunk

Glimpse

8.1 Create a vector containing the months column names in ascending order

8.2 Cumulative confirmed cases rates

8.2.1 Time Series Box Plot - Cumulative confirmed cases rates

8.2.1.1 Create a list containing all the boxplots by months

8.2.1.2 Plot boxplots to examine DKI Jakarta’s COVID-19 cases rates across months

8.2.2 Time Series Histogram - Cumulative confirmed cases rates

8.2.2.1 Create a list containing all the histogram by months

8.2.2.2 Plot histograms to examine DKI Jakarta’s COVID-19 cumulative confirmed cases rates across months

8.3 Cumulative death cases rates

8.3.1 Time Series Box Plot - Cumulative death cases rates

8.3.1.1 Create a list containing all the boxplots by months

8.3.1.2 Plot boxplots to examine DKI Jakarta’s COVID-19 death cases rates across months

8.3.2 Time Series Histogram - Cumulative death cases rates

8.3.2.1 Create a list containing all the histogram by months

8.3.2.2 Plot histograms to examine DKI Jakarta’s COVID-19 cumulative death cases rates across months

9. Thematic Mapping

9.1 Time-series choropleth map of COVID-19 rates - Manual Classification

9.1.1 Check summary of cumulative confirmed cases rate by month (POSITIF)

9.1.2 Check summary of cumulative death cases rate by month (Meninggal)

9.1.3 Join cumulative confirmed rates with dkij_slice sf dataframe

9.1.3.1 Cumulative confirmed cases rates

Code Chunk

Glimpse

9.1.3.2 Cumulative death cases rates

Code Chunk

Glimpse

9.1.4 Time-series choropleth map of cumulative confirmed cases rates (Manual Classification)

9.1.4.1 Create list containing all maps by months