Hands-on Exercise 2

In this hands-on exercise, I learn how to handle geospatial data in R by using sf package and performing data science tasks using tidyverse package.

Nor Aisyah https://www.linkedin.com/in/nor-aisyah/
08-26-2021

1. Getting Started

This code chunk performs 3 tasks: - A list called packages will be created and will consists of all the R packages required to accomplish this hands-on exercise. - Check if R packages on package have been installed in R and if not, they will be installed. - After all the R packages have been installed, they will be loaded.

packages = c('sf', 'tidyverse')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

1.1 My own notes:

  1. More about packages used:

2. Importing Geospatial Data

Data used: - MP14_SUBZONE_WEB_PL, a polygon feature layer in ESRI shapefile format, - CyclingPath, a line feature layer in ESRI shapefile format, and - PreSchool, a point feature layer in kml file format.

2.1 Importing polygon feature data in shapefile format

mpsz = st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\aisyahajit2018\IS415\IS415_blog\_posts\2021-08-26-hands-on-exercise-2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

2.2 Importing polyline feature data in shapefile form

cyclingpath = st_read(dsn = "data/geospatial", layer = "CyclingPath")
Reading layer `CyclingPath' from data source 
  `C:\aisyahajit2018\IS415\IS415_blog\_posts\2021-08-26-hands-on-exercise-2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 3336 features and 2 fields
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: 12831.45 ymin: 28347.98 xmax: 42799.89 ymax: 48948.15
Projected CRS: SVY21

2.3 Importing GIS/Line data in kml format

preschool = st_read("data/geospatial/pre-schools-location-kml.kml")
Reading layer `PRESCHOOLS_LOCATION' from data source 
  `C:\aisyahajit2018\IS415\IS415_blog\_posts\2021-08-26-hands-on-exercise-2\data\geospatial\pre-schools-location-kml.kml' 
  using driver `KML'
Simple feature collection with 1925 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6824 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84

2.4 My own notes:

  1. Difference between Polyline and Polygon :
  1. Both mpsz and cycling path have svy21 as the projected coordinates systems.

  2. Only preschool have wgs84 as the projected coordinates systems.

3. Checking the Content of A Simple Feature Data Frame

3.1 st_geometry() function

mpsz$geometry
Geometry set for 323 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
First 5 geometries:
st_geometry(mpsz)
Geometry set for 323 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
First 5 geometries:

3.2 glimpse() function

glimpse(mpsz)
Rows: 323
Columns: 16
$ OBJECTID   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15~
$ SUBZONE_NO <int> 1, 1, 3, 8, 3, 7, 9, 2, 13, 7, 12, 6, 1, 5, 1, 1,~
$ SUBZONE_N  <chr> "MARINA SOUTH", "PEARL'S HILL", "BOAT QUAY", "HEN~
$ SUBZONE_C  <chr> "MSSZ01", "OTSZ01", "SRSZ03", "BMSZ08", "BMSZ03",~
$ CA_IND     <chr> "Y", "Y", "Y", "N", "N", "N", "N", "Y", "N", "N",~
$ PLN_AREA_N <chr> "MARINA SOUTH", "OUTRAM", "SINGAPORE RIVER", "BUK~
$ PLN_AREA_C <chr> "MS", "OT", "SR", "BM", "BM", "BM", "BM", "SR", "~
$ REGION_N   <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGI~
$ REGION_C   <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "~
$ INC_CRC    <chr> "5ED7EB253F99252E", "8C7149B9EB32EEFC", "C35FEFF0~
$ FMEL_UPD_D <date> 2014-12-05, 2014-12-05, 2014-12-05, 2014-12-05, ~
$ X_ADDR     <dbl> 31595.84, 28679.06, 29654.96, 26782.83, 26201.96,~
$ Y_ADDR     <dbl> 29220.19, 29782.05, 29974.66, 29933.77, 30005.70,~
$ SHAPE_Leng <dbl> 5267.381, 3506.107, 1740.926, 3313.625, 2825.594,~
$ SHAPE_Area <dbl> 1630379.3, 559816.2, 160807.5, 595428.9, 387429.4~
$ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((31495.56 30..., MULT~

3.3 head() function

head(mpsz, n=5)  
Simple feature collection with 5 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 25867.68 ymin: 28369.47 xmax: 32362.39 ymax: 30435.54
Projected CRS: SVY21
  OBJECTID SUBZONE_NO      SUBZONE_N SUBZONE_C CA_IND      PLN_AREA_N
1        1          1   MARINA SOUTH    MSSZ01      Y    MARINA SOUTH
2        2          1   PEARL'S HILL    OTSZ01      Y          OUTRAM
3        3          3      BOAT QUAY    SRSZ03      Y SINGAPORE RIVER
4        4          8 HENDERSON HILL    BMSZ08      N     BUKIT MERAH
5        5          3        REDHILL    BMSZ03      N     BUKIT MERAH
  PLN_AREA_C       REGION_N REGION_C          INC_CRC FMEL_UPD_D
1         MS CENTRAL REGION       CR 5ED7EB253F99252E 2014-12-05
2         OT CENTRAL REGION       CR 8C7149B9EB32EEFC 2014-12-05
3         SR CENTRAL REGION       CR C35FEFF02B13E0E5 2014-12-05
4         BM CENTRAL REGION       CR 3775D82C5DDBEFBD 2014-12-05
5         BM CENTRAL REGION       CR 85D9ABEF0A40678F 2014-12-05
    X_ADDR   Y_ADDR SHAPE_Leng SHAPE_Area
1 31595.84 29220.19   5267.381  1630379.3
2 28679.06 29782.05   3506.107   559816.2
3 29654.96 29974.66   1740.926   160807.5
4 26782.83 29933.77   3313.625   595428.9
5 26201.96 30005.70   2825.594   387429.4
                        geometry
1 MULTIPOLYGON (((31495.56 30...
2 MULTIPOLYGON (((29092.28 30...
3 MULTIPOLYGON (((29932.33 29...
4 MULTIPOLYGON (((27131.28 30...
5 MULTIPOLYGON (((26451.03 30...

4.Plotting the Geospatial Data

4.1 plot() function

plot(st_geometry(mpsz))

plot(mpsz)

plot(mpsz["PLN_AREA_N"])

5. Working with Projection

Projection transformation: Project a simple feature data frame from one coordinate system to another coordinate system

5.1 Assigning EPSG code to a simple feature data frame

5.1.1 Retrieve coordinate reference system from object using st_crs function

st_crs(mpsz)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

5.1.2 Assign correct EPSG code

mpsz3414 <- st_set_crs(mpsz, 3414)

5.1.3 Check CSR again

st_crs(mpsz3414)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

5.1.4 Transform the original data from geographic coordinate system to projected coordinate system.

preschool3414 <- st_transform(preschool, crs = 3414)
st_geometry(preschool3414)
Geometry set for 1925 features 
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 11203.01 ymin: 25596.33 xmax: 45404.24 ymax: 49300.88
z_range:       zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM
First 5 geometries:

6. Importing and Converting An Aspatial Data

listings <- read_csv("data/aspatial/listings.csv")

6.1 View data using glimpse vs list

6.1.1 glimpse()

glimpse(listings) 
Rows: 4,252
Columns: 16
$ id                             <dbl> 50646, 71609, 71896, 71903, 2~
$ name                           <chr> "Pleasant Room along Bukit Ti~
$ host_id                        <dbl> 227796, 367042, 367042, 36704~
$ host_name                      <chr> "Sujatha", "Belinda", "Belind~
$ neighbourhood_group            <chr> "Central Region", "East Regio~
$ neighbourhood                  <chr> "Bukit Timah", "Tampines", "T~
$ latitude                       <dbl> 1.33432, 1.34537, 1.34754, 1.~
$ longitude                      <dbl> 103.7852, 103.9589, 103.9596,~
$ room_type                      <chr> "Private room", "Private room~
$ price                          <dbl> 80, 178, 81, 81, 52, 40, 72, ~
$ minimum_nights                 <dbl> 90, 90, 90, 90, 14, 14, 90, 8~
$ number_of_reviews              <dbl> 18, 20, 24, 48, 20, 13, 133, ~
$ last_review                    <date> 2014-07-08, 2019-12-28, 2014~
$ reviews_per_month              <dbl> 0.22, 0.28, 0.33, 0.67, 0.20,~
$ calculated_host_listings_count <dbl> 1, 4, 4, 4, 50, 50, 7, 1, 50,~
$ availability_365               <dbl> 365, 365, 365, 365, 353, 364,~

6.1.2 list()

list(listings) 
[[1]]
# A tibble: 4,252 x 16
       id name        host_id host_name neighbourhood_g~ neighbourhood
    <dbl> <chr>         <dbl> <chr>     <chr>            <chr>        
 1  50646 Pleasant R~  227796 Sujatha   Central Region   Bukit Timah  
 2  71609 Ensuite Ro~  367042 Belinda   East Region      Tampines     
 3  71896 B&B  Room ~  367042 Belinda   East Region      Tampines     
 4  71903 Room 2-nea~  367042 Belinda   East Region      Tampines     
 5 275343 Convenient~ 1439258 Joyce     Central Region   Bukit Merah  
 6 275344 15 mins to~ 1439258 Joyce     Central Region   Bukit Merah  
 7 294281 5 mins wal~ 1521514 Elizabeth Central Region   Newton       
 8 301247 Nice room ~ 1552002 Rahul     Central Region   Geylang      
 9 324945 20 Mins to~ 1439258 Joyce     Central Region   Bukit Merah  
10 330089 Accomo@ RE~ 1439258 Joyce     Central Region   Bukit Merah  
# ... with 4,242 more rows, and 10 more variables: latitude <dbl>,
#   longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>,
#   last_review <date>, reviews_per_month <dbl>,
#   calculated_host_listings_count <dbl>, availability_365 <dbl>

6.2 Create a simple feature data frame from an aspatial data frame

listings_sf <- st_as_sf(listings, 
                       coords = c("longitude", "latitude"),
                       crs=4326) %>% st_transform(crs = 3414)
glimpse(listings_sf) 
Rows: 4,252
Columns: 15
$ id                             <dbl> 50646, 71609, 71896, 71903, 2~
$ name                           <chr> "Pleasant Room along Bukit Ti~
$ host_id                        <dbl> 227796, 367042, 367042, 36704~
$ host_name                      <chr> "Sujatha", "Belinda", "Belind~
$ neighbourhood_group            <chr> "Central Region", "East Regio~
$ neighbourhood                  <chr> "Bukit Timah", "Tampines", "T~
$ room_type                      <chr> "Private room", "Private room~
$ price                          <dbl> 80, 178, 81, 81, 52, 40, 72, ~
$ minimum_nights                 <dbl> 90, 90, 90, 90, 14, 14, 90, 8~
$ number_of_reviews              <dbl> 18, 20, 24, 48, 20, 13, 133, ~
$ last_review                    <date> 2014-07-08, 2019-12-28, 2014~
$ reviews_per_month              <dbl> 0.22, 0.28, 0.33, 0.67, 0.20,~
$ calculated_host_listings_count <dbl> 1, 4, 4, 4, 50, 50, 7, 1, 50,~
$ availability_365               <dbl> 365, 365, 365, 365, 353, 364,~
$ geometry                       <POINT [m]> POINT (22646.02 35167.9~

7. Geoprocessing with sf package

Task on hand: Determine the extend of the land need to be acquired and their total area

7.1 Buffer

7.1.1 Compute the 5-meter buffers around cycling paths using st_buffer

buffer_cycling <- st_buffer(cyclingpath, dist=5, nQuadSegs = 30)

7.1.2 Calculate area of the buffers

buffer_cycling$AREA <- st_area(buffer_cycling)

7.1.3 Derive the total land involved with sum()

sum(buffer_cycling$AREA)
1642750 [m^2]

8. Point-in-polygon count

8.1 Find no. of preschools in each Planning Subzone

mpsz3414$`PreSch Count`<- lengths(st_intersects(mpsz3414, preschool3414))
summary(mpsz3414$`PreSch Count`)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    3.00    5.96    9.00   58.00 

8.2 List planning subzone with most number of pre-school using top_n()

top_n(mpsz3414, 1, `PreSch Count`)
Simple feature collection with 1 feature and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 39655.33 ymin: 35966 xmax: 42940.57 ymax: 38622.37
Projected CRS: SVY21 / Singapore TM
  OBJECTID SUBZONE_NO     SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
1      189          2 TAMPINES EAST    TMSZ02      N   TAMPINES
  PLN_AREA_C    REGION_N REGION_C          INC_CRC FMEL_UPD_D
1         TM EAST REGION       ER 21658EAAF84F4D8D 2014-12-05
    X_ADDR   Y_ADDR SHAPE_Leng SHAPE_Area
1 41122.55 37392.39   10180.62    4339824
                        geometry PreSch Count
1 MULTIPOLYGON (((42196.76 38...           58

8.3 Calculate density of pre-school by planning subzone.

8.3.1 Derive area of each planning subzone using st_area() of sf package first

mpsz3414$Area <- mpsz3414 %>% st_area()

8.3.2 Compute the density using mutate() of dplyr

mpsz3414 <- mpsz3414 %>%
  mutate(`PreSch Density` = `PreSch Count`/Area * 1000000)

9 Explorotary Data Analysis (EDA)

9.1 Histogram to showdistribution of PreSch Density

hist(mpsz3414$`PreSch Density`)

9.2 Enhance histogram using ggplot

ggplot(data=mpsz3414, 
       aes(x= as.numeric(`PreSch Density`)))+
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue") +
  labs(title = "Are pre-school even distributed in Singapore?",
       subtitle= "There are many planning sub-zones with a single pre-school, on the other hand, \nthere are two planning sub-zones with at least 20 pre-schools",
      x = "Pre-school density (per km sq)",
      y = "Frequency")

9.3 Plot a scatterplot showing the relationship between Pre-school Density and Pre-school Count.

9.3.1 Default, showing different scale for x and y axis

ggplot(data=mpsz3414, 
       aes(y = `PreSch Count`, 
           x= as.numeric(`PreSch Density`)))+
  geom_point(color="black", 
             fill="light blue") + 
  labs(title = "",
      x = "Pre-school density (per km sq)",
      y = "Pre-school count")

9.3.2 Edit xlim and ylim to show same scale for x and y axis

Done! :)