Hands-on Exercise 7C

Author

Janet Toa

Published

February 26, 2024

Modified

February 26, 2024

23 Analytical Mapping

23.1 Overview and Learning Outcomes

This hands-on exercise is based on Chapter 23 of the R for Visual Analytics book.

The learning outcomes are to use the appropriate functions of the tmap and tidyverse packages to plot analytical maps through the following steps:

Import geospatial data in rds format into the R environment.
Create cartographic quality choropleth maps using functions from the tmap package.
Create a rate map.
Create a percentile map.
Create a boxmap.

23.2 Getting Started

23.2.1 Installing and Loading Required Libraries

In this hands-on exercise, the following R packages are used:

tidyverse (i.e. readr, tidyr, dplyr) for performing data science tasks such as importing, tidying, and wrangling data;
sf for handling geospatial data; and
tmap for thematic mapping.

The code chunk below uses the p_load() function in the pacman package to check if the packages are installed. If yes, they are then loaded into the R environment. If no, they are installed, then loaded into the R environment.

pacman::p_load(tidyverse, sf, tmap)

23.2.2 Importing Data

The dataset for this hands-on exercise is imported into the R environment using the read_rds() function in the readr package and stored as the R object, NGA_wp. The data is from a prepared dataset, NGA_wp.rds, which is a polygon feature data frame providing information on water points in Nigeria at the LGA level.

NGA_wp = read_rds("data/rds/NGA_wp.rds")

23.3 Basic Choropleth Mapping

23.3.1 Visualising Distribution of Non-functional Water Point

The non-functional water points, and total functional water points are plotted side by side.

p1 = tm_shape(NGA_wp) +
  tm_fill("wp_functional",
          n = 10,
          style = "equal",
          palette = "Blues") +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Distribution of functional\nwater point by LGAs",
            legend.outside = FALSE)

p2 = tm_shape(NGA_wp) +
  tm_fill("total_wp",
          n = 10,
          style = "equal",
          palette = "Blues") +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Distribution of total\nwater point by LGAs",
            legend.outside = FALSE)

tmap_arrange(p2, p1, nrow = 1)

23.4 Choropleth Map for Rates

It is important to consider rates on top of simply counts of things, given that water points are not equally distributed in space.

23.4.1 Deriving Proportion of Functional Water Points and Non-functional Water Points

The proportion of functional water points and the proportion of non-functional water points in each LGA are derived. The mutate() function in the dplyr package is used to derive two fields, “pct_functional”, and “pct_nonfunctional”.

NGA_wp = NGA_wp %>%
  mutate(pct_functional = wp_functional/total_wp) %>%
  mutate(pct_nonfunctional = wp_nonfunctional/total_wp)

23.4.2 Plotting Map of Rate

The map of rates (for functional water points) is then plotted below.

tm_shape(NGA_wp) +
  tm_fill("pct_functional",
          n = 10,
          style = "equal",
          palette = "Blues",
          legend.hist = TRUE) +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Rate map of functional water point by LGAs",
            legend.outside = TRUE)

23.5 Extreme Value Maps

Extreme value maps are variations of common choropleth maps where the classification is designed to highlight extreme values at the lower and upper end of the scale, with the goal of identifying outliers. These maps were developed in the spirit of spatializing EDA, i.e., adding spatial features to commonly used approaches in non-spatial EDA (Anselin, 1994).

23.5.1 Percentile Map

The percentile map is a special type of a quantile map with six specific categories: 0-1%,1-10%, 10-50%,50-90%,90-99%, and 99-100%. The corresponding breakpoints can be derived by means of the base R quantile command, passing an explicit vector of cumulative probabilities as c(0,.01,.1,.5,.9,.99,1). Note that the begin and endpoint need to be included.

First, NA records are removed. Then, the customised classification method is created and the relevant values are extracted.

When variables are extracted from an sf data frame, the geometry is extracted as well. For mapping and spatial manipulation, this is the expected behavior, but many base R functions cannot deal with the geometry. Specifically, the quantile() function gives an error. As a result st_set_geomtry(NULL) function is used to drop geometry field.

NGA_wp = NGA_wp %>%
  drop_na()

percent = c(0,.01,.1,.5,.9,.99,1)

var = NGA_wp["pct_functional"] %>%
  st_set_geometry(NULL)

quantile(var[,1], percent)

       0%        1%       10%       50%       90%       99%      100% 
0.0000000 0.0000000 0.2169811 0.4791667 0.8611111 1.0000000 1.0000000

23.5.2 Why Write Functions?

Writing a function has three big advantages over using copy-and-paste:

Can give a function an evocative name that makes your code easier to understand.
As requirements change, you only need to update code in one place, instead of many.
Eliminate the chance of making incidental mistakes when copying and pasting (i.e. updating a variable name in one place, but not in another).

Source: Chapter 19: Functions of R for Data Science.

23.5.2.1 Creating the get.var Function

First, an R function is written to extract a variable (i.e. “wp_nonfunctional”) as a vector out of an sf data frame.

Note:

Arguments:

“vname”: variable name (as character, in quotes).

“df”: name of sf data frame.

Return:

v: vector with values (without a column name).

get.var = function(vname,df) {
  v = df[vname] %>% 
    st_set_geometry(NULL)
  v = unname(v[,1])
  return(v)
}

23.5.2.2 A Percentile Mapping Function

Then, a percentile mapping function is written.

percentmap = function(vnam, df, legtitle=NA, mtitle="Percentile Map"){
  percent = c(0,.01,.1,.5,.9,.99,1)
  var = get.var(vnam, df)
  bperc = quantile(var, percent)
  tm_shape(df) +
  tm_polygons() +
  tm_shape(df) +
     tm_fill(vnam,
             title=legtitle,
             breaks=bperc,
             palette="Blues",
          labels=c("< 1%", "1% - 10%", "10% - 50%", "50% - 90%", "90% - 99%", "> 99%"))  +
  tm_borders() +
  tm_layout(main.title = mtitle, 
            title.position = c("right","bottom"))
}

23.5.2.3 Testing the Percentile Mapping Function

A bare bones implementation is done below. Additional arguments involving the title, legend positioning, etc. could be passed to the function to customise various features of the map.

percentmap("total_wp", NGA_wp)

23.5.2 Box Map

In essence, a box map is an augmented quartile map, with an additional lower and upper category.

When there are lower outliers, then the starting point for the breaks is the minimum value, and the second break is the lower fence.
In contrast, when there are no lower outliers, then the starting point for the breaks will be the lower fence, and the second break is the minimum value (there will be no observations that fall in the interval between the lower fence and the minimum value).

Note: To create a box map, a custom breaks specification will be used. However, there is a complication. The break points for the box map vary depending on whether lower or upper outliers are present.

ggplot(data = NGA_wp,
       aes(x = "",
           y = wp_nonfunctional)) +
  geom_boxplot()

23.5.2.1 Creating the boxbreaks function

An R function that creates break points for a box map is written.

Arguments:
- “v”: vector with observations.
- “mult”: multiplier for IQR (default 1.5).
Return:
- “bb”: vector with 7 break points compute quartile and fences.

boxbreaks = function(v,mult=1.5) {
  qv = unname(quantile(v))
  iqr = qv[4] - qv[2]
  upfence = qv[4] + mult * iqr
  lofence = qv[2] - mult * iqr
  # initialize break points vector
  bb = vector(mode="numeric",length=7)
  # logic for lower and upper fences
  if (lofence < qv[1]) {  # no lower outliers
    bb[1] = lofence
    bb[2] = floor(qv[1])
  } else {
    bb[2] = lofence
    bb[1] = qv[1]
  }
  if (upfence > qv[5]) { # no upper outliers
    bb[7] = upfence
    bb[6] = ceiling(qv[5])
  } else {
    bb[6] = upfence
    bb[7] = qv[5]
  }
  bb[3:5] = qv[2:4]
  return(bb)
}

23.5.2.2 Testing the Boxbreaks Function

The same get.var function written at sub-section 23.5.2.1 is reused. The boxbreaks function is tested.

var = get.var("wp_nonfunctional", NGA_wp) 
boxbreaks(var)

[1] -56.5   0.0  14.0  34.0  61.0 131.5 278.0

23.5.2.2 Boxmap function

An R function to create a box map.

Arguments:
- “vnam”: variable name (as character, in quotes).
- “df”: simple features polygon layer.
- “legtitle”: legend title.
- “mtitle”: map title.
- “multi”: mutliplier for IQR.
Return: a tmap elements (i.e., plots a map).

boxmap = function(vnam, df, 
                   legtitle=NA,
                   mtitle="Box Map",
                   mult=1.5){
  var = get.var(vnam,df)
  bb = boxbreaks(var)
  tm_shape(df) +
    tm_polygons() +
  tm_shape(df) +
     tm_fill(vnam,title=legtitle,
             breaks=bb,
             palette="Blues",
          labels = c("lower outlier", 
                     "< 25%", 
                     "25% - 50%", 
                     "50% - 75%",
                     "> 75%", 
                     "upper outlier"))  +
  tm_borders() +
  tm_layout(main.title = mtitle, 
            title.position = c("left",
                               "top"))
}

tmap_mode("plot")
boxmap("wp_nonfunctional", NGA_wp)

~~~ End of Hands-on Exercise 7C ~~~