Geographic Visualisation of Ecological Data in R

Geographic Visualisation of Ecological Data in R

This short tutorial explains basic geographic data visualisation for ecologists. I assume a basic knowledge of R, however if you have never used R before you should be able to follow along and copy the below scripts.

If you do not currently have R installed you can download it here. I would also highly recommend downloading the free RStudio editor.

For an introduction to R, i suggest the excelent and free R for data science ebook by Hadley Wickham & Garrett Grolemund, as well as the interactive datacamp course free introduction to R

Plotting basic maps

Set up for analysis

The first step for our analysis to set our working directory, and install the required R packages to run the analysis

The packages we will install are:

  • tidyverse - This is a collection of handy packages for data manipulation, exploration and visualization
  • maptools - This package contains the base world map data we will use
  • devtools - This package will allow installation of packages that have not yet been released on CRAN, such as the below patchwork package, directly from github
  • patchwork - This package allows easily composition of multiplots
setwd("/Users/alexanderpiper/Dropbox/R/Geographic") ##Change this to the directory you wish to work in

#install packages from CRAN
install.packages(c("tidyverse","maptools","devtools", "gpclib"))

# install packages from github
devtools::install_github("thomasp85/patchwork")

After packages are downloaded and installed, we then load them into our workspace

library("tidyverse")
library("maptools")
library("devtools")
library("patchwork")

Plotting a basic world map

First step is to plot a basic world map. For this we will use the simple world map dataset contained within the maptools package

#Get the simple world map data
data(wrld_simpl)
wrld_simpl@data$id <- wrld_simpl@data$NAME
gpclibPermit() # Give permissions to maptools to use gpclib
## [1] TRUE
wrld <- fortify(wrld_simpl, region="id")

# Plot simple worldmap with ggplot
wrld_plot <- ggplot() + geom_map(data=wrld, map=wrld, aes(map_id=id, x=long, y=lat)) + coord_equal() 

#plot the map
plot(wrld_plot)

Changing the base map colours

Now we make some small changes to the base map colours and background to make it more visible

wrld_plot <- ggplot() + geom_map(data=wrld, map=wrld, aes(map_id=id, x=long, y=lat), fill="lightgrey", color="#7f7f7f", size=0.25) + 
  coord_equal() + 
  theme_bw() #changes the theme to basic black and white

#plot the map
plot(wrld_plot)

Thats looking much better, but what if we only have a dataset within australia? We can subsert the map using geographic coordinates.

aus_plot <- wrld_plot + coord_fixed(ylim = c(-50,0), xlim=c(110,180)) #change these for desired region 

#plot the map of australia only
plot(aus_plot)

Part one: Specimen collection locations

Now we want to plot some data on our map, for this we are plotting the collection locations of some tephritid fruit fly specimens from the Victorian Agricultural Insect Collection that i am using during my PhD

#Read in our datafile
obs.data <- read_csv("sampleloc.csv")
head(obs.data) # display the top of the file
## # A tibble: 6 x 3
##   Sample location                             Species  
##   <chr>  <chr>                                <chr>    
## 1 7186a  Kalumburu Western Australia          B. tryoni
## 2 7186b  Kalumburu Western Australia          B. tryoni
## 3 7186c  Kalumburu Western Australia          B. tryoni
## 4 7186d  Kalumburu Western Australia          B. tryoni
## 5 7186e  Kalumburu Western Australia          B. tryoni
## 6 7187a  Cobourg Peninsula Northern Territory B. tryoni

This data file contains 3 columns (or variables) and 259 entries. Many functions in R expect data to be in a long format rather than a wide format, ie: each column is a variable and each row is an entry. The variables in this dataset are: Sample ID, Collection location and Species. Your specific data file could have more or less columns depending on how many variables were recorded for your specimens and this is fine. If we look at this we can notice a problem. The collection data i have been supplied alongside these specimens contains only the location name, rather than geographic coordinates, oops!

Unfortunately R is not an encyclopedia, and will not know where to put these locations unless we can provide lattitude and longitude coordinates to match with those in our base map. For this dataset, i could manually correct this, by contacting the original specimen collectors, or doing a google maps search. However this could be very time consuming repetitive task for larger datasets.

If you already have the lattitude and longitude info for your samples already, thank whoever collected your specimens for using their gps and skip this section, otherwise to solve this issue, we are going to get the computer to do the grunt work of geocoding for us.

Geocoding

Geocoding is the process of converting locationes (i.e. “1 Ocean Avenue, Surfers Paradise QLD”) into geographic coordinates (i.e. latitude -27.993412, and longitude 153.429921), which you can then use to place markers on a map.

Online maps packages such as google maps and openstreetmaps allow you to query their API using HTML and geocode data. The package ggmaps contains a function to interface with the google maps api, however google now requires you to register and fill in credit card details to access this API

Instead, we will create a function to geocode our data using the openstreetmap api. You do not need to know the specifics of this code, but the basic idea is it searches a character string “location”, and downloads the JSON formatted data from openstreetmap, then returns a data frame that contains the query adress, the longitude and lattitude.

get_geocode <- function(location = NULL)
{
  if(suppressWarnings(is.null(location)))
    return(data.frame())
  tryCatch(
    d <- jsonlite::fromJSON( 
      gsub('\\@addr\\@', gsub('\\s+', '\\%20', location), 
           'http://nominatim.openstreetmap.org/search/@addr@?format=json&locationdetails=0&limit=1')
    ), error = function(c) return(data.frame())
  )
  if(length(d) == 0) return(data.frame())
  return(data.frame(location,lon = as.numeric(d$lon), lat = as.numeric(d$lat)))
}

Now that we have defined the get_geocode function in R, we can then apply it to every sample in in our dataset in order to retrieve the lattitude and longitude. As the input for the function is a character vector of locations, we will need to grab the location column from our dataset and put it in a new variable. To save time querying the openstreetmaps server, we will only get the unique location names from our dataset.

#Get unique locations
location <- unique(obs.data$location)

#What does this look like?
head(location)
## [1] "Kalumburu Western Australia"         
## [2] "Cobourg Peninsula Northern Territory"
## [3] "Coen Queensland"                     
## [4] "Mackay Queensland"                   
## [5] "Brisbane Queensland"                 
## [6] "Sydney New South Wales"

You can see that instead of a dataframe, we now have a character vector

We will then use lapply to apply our get_geocode function we to each of the elements in our character vector. This may take a few minutes depending on how many searches you are conducting and the speed of your internet connection.

As using the get_geocode function will return a list of dataframes (one for each location in the vector), we will collapse this list into one big dataframe using the bind_rows functionality of the dplyr package (part of the tidyverse library we loaded earlier)

#lapply the function to all elements in the location vector
coords <- lapply(location,get_geocode)

#Collapse list of dataframes into one
coords <- bind_rows(coords)

#what does this look like?
head(coords)
##                      location      lon       lat
## 1 Kalumburu Western Australia 126.6426 -14.29503
## 2             Coen Queensland 143.1997 -13.94464
## 3           Mackay Queensland 149.1865 -21.14196
## 4         Brisbane Queensland 153.0235 -27.46897
## 5      Sydney New South Wales 151.2165 -33.85482
## 6          Ellerslie Victoria 142.6832 -38.15133

You can see we now have geographic coordinates for our collection locations, however they are in a seperate data frame to our original data frame that contains other info about the samples (ie: Species name)

Therefore we will use the inner_join function of the dplyr package to join the 2 data frames together. As this relies on using a matching column between the two dataframes as a key, we need to make sure the key column is called the same thing. If it isnt, you can rename the columns in one of the data frames using the colnames() function

#join tables together
samples_latlon <- inner_join(obs.data,coords,by="location")

head(samples_latlon)
## # A tibble: 6 x 5
##   Sample location                    Species     lon   lat
##   <chr>  <chr>                       <chr>     <dbl> <dbl>
## 1 7186a  Kalumburu Western Australia B. tryoni  127. -14.3
## 2 7186b  Kalumburu Western Australia B. tryoni  127. -14.3
## 3 7186c  Kalumburu Western Australia B. tryoni  127. -14.3
## 4 7186d  Kalumburu Western Australia B. tryoni  127. -14.3
## 5 7186e  Kalumburu Western Australia B. tryoni  127. -14.3
## 6 3012e  Coen Queensland             B. tryoni  143. -13.9

We now have all the original data, as well as the lattitude and longitude information in one data frame and can go back to plotting. First we will plot on a world map to see how our function did. If there are dots in locations you didnt expect, it will probably be a result of insuficient adress information to uniquely place a location it in the world. If this is so, we can write out the data frame into a csv that can be manually curated in Excel or other programs.

# Plot simple worldmap with ggplot
gg.geocoded <- ggplot(samples_latlon) + geom_map(data=wrld, map=wrld, aes(map_id=id, x=long, y=lat), fill="lightgrey", color="#7f7f7f", size=0.25)

#Add data points
gg.geocoded <- gg.geocoded + geom_point(aes(x=lon, y=lat, color=Species), alpha=.5) + theme(legend.position = "none") + coord_equal()

plot(gg.geocoded)

#we can see some errors, this is because there was not enough unique information recorded for an automated search, therefore manual curation will be required
write.csv(samples_latlon, file="samples_latlon.csv")

Plotting point data

After some manual curation,we can read bck in the data and see how it looks

#Read in our data
cleaned.data <- read_csv("samples_latlon_curated.csv")

# Plot simple worldmap with ggplot
gg.curated <- ggplot(cleaned.data) + geom_map(data=wrld, map=wrld, aes(map_id=id, x=long, y=lat), fill="lightgrey", color="#7f7f7f", size=0.25) + geom_count(aes(x=Long, y=Lat, color=Species), alpha=.5) + 
  theme_bw() + coord_fixed(ylim = c(-90,90), xlim=c(-180,180))  

plot(gg.curated)

#Subset to australia using geographic coordinates
gg.curated <- gg.curated + coord_fixed(ylim = c(-50,0), xlim=c(110,180))  

#Add data points, colouring by species
gg.species <- gg.curated + geom_point(aes(x=Long, y=Lat, color=Species), alpha=.5) + 
  theme_bw() 

plot(gg.species)

We can also adjust the point size to change with the number of entries with the same geographic coordinates by changing geom_point to geom_count

#Add data points, colouring by species
gg.species <- gg.curated + geom_count(aes(x=Long, y=Lat, color=Species), alpha=.5) + 
  theme_bw() 

plot(gg.species)

Customising colours

Finally, we can play with the colours and aesthetics of our map. I am applying the ‘Spectral’ pallette from the colourbrewer package in R. You can see more colour brewer palettes here

#Change colours of data points

gg.col1 <- gg.species +
  scale_color_brewer(palette= "Spectral")
gg.col2 <- gg.species +
  scale_color_brewer(palette= "PrGn")
gg.col3 <- gg.species +
  scale_color_brewer(palette= "RdYlBu")

#Change colours manually
custom.palette <- c("#d73027","#f46d43","#fdae61","#fee08b","#ffffbf","#d9ef8b","#a6d96a","#66bd63","#1a9850")
gg.col4 <- gg.species +
  scale_color_manual(values=custom.palette,aesthetics="colour")

#Plot multiplot using patchwork, removing the legend from all plots for easier viewing

gg.col1 + gg.col2 + gg.col3 + gg.col4 & theme(legend.position = "none")

#alternatively, we can colour by location
gg.location <- gg.curated + geom_count(aes(x=Long, y=Lat, color=Locality), alpha=.5) + 
  theme_bw() 
plot(gg.location)

Adding labels

Ive decide that for this dataset it is best to colour by species, using the Spectral palette from colour brewer. So to finish up i will go ahead with that and add appropriate labels and titles to the plot

#Using spectral palette
gg.species <- gg.species +
  scale_color_brewer(palette= "Spectral")

#Change legend titles
gg.species <- gg.species + labs(size="No. Specimens")

#Change x and y axis titles
gg.species <- gg.species + xlab("Longitude") + ylab("Lattitude")

#give the plot a title
gg.species <- gg.species + labs(title="Collection locations of Tephritidae specimens")

Conclusion

I hope this tutorarial can help with your research. For simple code to create the entire map without tutorial comments interspaced, see below:

#Get the simple world map data
data(wrld_simpl)
wrld_simpl@data$id <- wrld_simpl@data$NAME
wrld <- fortify(wrld_simpl, region="id")

#Read in our data
cleaned.data <- read_csv("samples_latlon_curated.csv")

# Plot map of australia with ggplot
gg.finalmap <- ggplot(cleaned.data) + geom_map(data=wrld, map=wrld, aes(map_id=id, x=long, y=lat), fill="lightgrey", color="#7f7f7f", size=0.25)  + coord_fixed(ylim = c(-50,0), xlim=c(110,180))  

#Add data points
gg.finalmap <- gg.finalmap + geom_count(aes(x=Long, y=Lat, color=Species), alpha=.5) + 
  theme_bw()  +
  scale_color_brewer(palette= "Spectral") + 
  labs(size="No. Specimens") + 
  xlab("Longitude") + 
  ylab("Lattitude") + 
  labs(title="Collection locations of Tephritidae specimens")

plot(gg.finalmap)

The versions of the packages used in creating this document can be found below. To view the original rmarkdown code, use the button in the top left corner.

session info

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] patchwork_0.0.1 devtools_2.2.1  usethis_1.5.0   maptools_0.9-8 
##  [5] sp_1.3-1        forcats_0.4.0   stringr_1.4.0   dplyr_0.8.3    
##  [9] purrr_0.3.2     readr_1.3.1     tidyr_0.8.3     tibble_2.1.3   
## [13] ggplot2_3.1.1   tidyverse_1.2.1
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.1         pkgload_1.0.2      jsonlite_1.6      
##  [4] modelr_0.1.4       assertthat_0.2.1   cellranger_1.1.0  
##  [7] yaml_2.2.0         remotes_2.1.0      sessioninfo_1.1.1 
## [10] pillar_1.4.2       backports_1.1.5    lattice_0.20-38   
## [13] glue_1.3.1         digest_0.6.21      RColorBrewer_1.1-2
## [16] rvest_0.3.4        colorspace_1.4-1   htmltools_0.4.0   
## [19] plyr_1.8.4         pkgconfig_2.0.3    broom_0.5.2       
## [22] haven_2.1.0        scales_1.0.0       processx_3.3.1    
## [25] generics_0.0.2     ellipsis_0.3.0     withr_2.1.2       
## [28] lazyeval_0.2.2     cli_1.1.0          magrittr_1.5      
## [31] crayon_1.3.4       readxl_1.3.1       memoise_1.1.0     
## [34] evaluate_0.14      ps_1.3.0           fs_1.3.1          
## [37] fansi_0.4.0        nlme_3.1-140       xml2_1.2.2        
## [40] foreign_0.8-71     gpclib_1.5-5       pkgbuild_1.0.3    
## [43] tools_3.6.0        prettyunits_1.0.2  hms_0.4.2         
## [46] munsell_0.5.0      callr_3.2.0        compiler_3.6.0    
## [49] rlang_0.4.0        grid_3.6.0         rstudioapi_0.10   
## [52] labeling_0.3       rmarkdown_1.16     testthat_2.1.1    
## [55] gtable_0.3.0       R6_2.4.0           lubridate_1.7.4   
## [58] knitr_1.25         utf8_1.1.4         zeallot_0.1.0     
## [61] rprojroot_1.3-2    desc_1.2.0         stringi_1.4.3     
## [64] Rcpp_1.0.2         vctrs_0.2.0        tidyselect_0.2.5  
## [67] xfun_0.10
Avatar
Alexander M Piper
PhD Candidate

Insect pest ecology and evolution through the eye of the gene and computer screen.