Creating maps
Last updated on 2024-05-24 | Edit this page
Estimated time: 124 minutes
Overview
Questions
- How do we create maps using R?
Objectives
- Learn how to create static maps with ggplot2
- Learn how to create interactive maps with mapview
- Learn how to plot iNaturalist observations on a map
R
library(sf)
library(dplyr)
library(readr)
library(ggplot2)
library(mapview)
Geographic data
There are various file formats for geographic data. Shape files for GIS applications, KML for Google maps, geojson for web applications.
You can get boundaries for countries, states, cities, etc from various sources. I googled “Los Angeles county boundary shape” which had a link to “County Boundary | City of Los Angeles Hub - LA GeoHub” https://geohub.lacity.org/datasets/10f1e37c065347e693cf4e8ee753c09b I downloaded the shapefile for LA county.
You can also create your boundaries using GIS applications or GIS web applications.
Mapping iNaturalist data
iNaturalist data includes latitude and longitude, which means we can put the observations in a map. There are several packages to create maps. We will use ggplot and mapview packages.
First, read data from the cleaned iNaturalist observation file.
R
inat <- read_csv('data/cleaned/observations.csv')
OUTPUT
Rows: 93950 Columns: 39
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (23): observed_on_string, time_observed_at, time_zone, user_login, user...
dbl (10): id, user_id, num_identification_agreements, num_identification_di...
lgl (5): captive_cultivated, private_place_guess, private_latitude, privat...
date (1): observed_on
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
See all the column names. “latitude” and “longitude” are the column names we need.
R
names(inat)
OUTPUT
[1] "id" "observed_on_string"
[3] "observed_on" "time_observed_at"
[5] "time_zone" "user_id"
[7] "user_login" "user_name"
[9] "created_at" "updated_at"
[11] "quality_grade" "license"
[13] "url" "image_url"
[15] "sound_url" "tag_list"
[17] "description" "num_identification_agreements"
[19] "num_identification_disagreements" "captive_cultivated"
[21] "oauth_application_id" "place_guess"
[23] "latitude" "longitude"
[25] "positional_accuracy" "private_place_guess"
[27] "private_latitude" "private_longitude"
[29] "public_positional_accuracy" "geoprivacy"
[31] "taxon_geoprivacy" "coordinates_obscured"
[33] "positioning_method" "positioning_device"
[35] "species_guess" "scientific_name"
[37] "common_name" "iconic_taxon_name"
[39] "taxon_id"
We use the sf package to add geographic data to our dataframe.
st_as_sf()
from sf
package will take the
longitude and latitude and add a geometry
column that we
can use for mapping.
- We pass in longitude and latitude columns to coors argument. Must wrap longitude and latitude in quotes.
- crs is coordinate reference system.
-
remove=FALSE
will keep the cooridate columns in the dataframe
R
temp <- inat %>%
st_as_sf(coords = c("longitude", "latitude"), crs = 4326, remove=FALSE)
use names()
to get a list of all the columns. A
geometry
column was added.
R
names(temp)
OUTPUT
[1] "id" "observed_on_string"
[3] "observed_on" "time_observed_at"
[5] "time_zone" "user_id"
[7] "user_login" "user_name"
[9] "created_at" "updated_at"
[11] "quality_grade" "license"
[13] "url" "image_url"
[15] "sound_url" "tag_list"
[17] "description" "num_identification_agreements"
[19] "num_identification_disagreements" "captive_cultivated"
[21] "oauth_application_id" "place_guess"
[23] "latitude" "longitude"
[25] "positional_accuracy" "private_place_guess"
[27] "private_latitude" "private_longitude"
[29] "public_positional_accuracy" "geoprivacy"
[31] "taxon_geoprivacy" "coordinates_obscured"
[33] "positioning_method" "positioning_device"
[35] "species_guess" "scientific_name"
[37] "common_name" "iconic_taxon_name"
[39] "taxon_id" "geometry"
use select to pick which columns to use.
R
inat_map <- inat %>%
st_as_sf(coords = c("longitude", "latitude"), crs = 4326, remove=FALSE) %>%
select(id, user_login, common_name, scientific_name, observed_on, url, longitude, latitude, geometry)
static map
Use ggplot to plot the observations. geom_sf
will use
geometry
column to produce a map.
R
ggplot() +
geom_sf(data = inat_map)
There are some observations that are outside of Los Angeles. Use filter to select observations in LA.
R
inat_map <- inat_map %>%
filter(latitude < 40)
create map with ggplot.
R
ggplot() +
geom_sf(data = inat_map)
Use dim()
to show the number of rows and columns. There
are over 90K rows.
R
dim(inat_map)
OUTPUT
[1] 93948 9
interactive map
use mapview package to create interactive maps.
Since there are over 90K rows, an interactive map will be very slow. I suggest not using mapview if there are lots of rows.
To speed up the interactive map, let’s filter the list of observations. Get all observations for Western Fence Lizard.
R
inat_lizard <- inat_map %>%
filter(common_name == 'Western Fence Lizard')
Use dim to get number of rows. About 3000 rows.
R
dim(inat_lizard)
OUTPUT
[1] 2936 9
Create interactive map. You can zoom in and out. Click on observation to see the info.
R
mapview(inat_lizard)
working with other geographic files
Let’s add LA county boundaries to the map.
I downloaded the LA county boundaries from https://geohub.lacity.org/datasets/lacounty::county-boundaries/explore
use read_sf()
from sf
package to read the
shape file.
R
la_county <- read_sf('data/raw/County_Boundary/County_Boundary.shp')
la_county
OUTPUT
Simple feature collection with 7 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -118.9446 ymin: 32.79521 xmax: -117.6464 ymax: 34.8233
Geodetic CRS: WGS 84
# A tibble: 7 × 18
CITY CITY_ID CITY_TYPE CITY_NAME CITY_LABEL COLOR_CODE ABBR CITY_NO
<int> <int> <chr> <chr> <chr> <int> <chr> <int>
1 250 31 Unincorporated Unincorporat… Unincorpo… 1 UNIN 0
2 250 31 Unincorporated Unincorporat… Unincorpo… 1 UNIN 0
3 250 31 Unincorporated Unincorporat… Unincorpo… 1 UNIN 0
4 250 31 Unincorporated Unincorporat… Unincorpo… 1 UNIN 0
5 250 31 Unincorporated Unincorporat… Unincorpo… 1 UNIN 0
6 250 31 Unincorporated Unincorporat… Unincorpo… 1 UNIN 0
7 250 31 Unincorporated Unincorporat… Unincorpo… 1 UNIN 0
# ℹ 10 more variables: DESCRIPTN <chr>, URL <chr>, PHONE <chr>,
# OF_AREA_SM <int>, FEAT_TYPE <chr>, COMMENT <chr>, SUB_TYPE <int>,
# COLOR <chr>, OBJECTID <int>, geometry <MULTIPOLYGON [°]>
add LA County to maps.
R
ggplot() +
geom_sf(data = la_county) +
geom_sf(data = inat_lizard)
R
mapview(la_county) +
mapview(inat_lizard)
Exploring iNaturlist data
Lets look for all iNaturalist observations made in Exposition Park.
I downloaded the boundaries for Exposition Park using this site https://wykhuh.github.io/draw-map-boundaries/
R
expo_park <- st_read('data/raw/boundaries_expo_park_area.geojson') %>%
st_transform(4326)
OUTPUT
Reading layer `boundaries_expo_park_area' from data source
`/Users/wyk/Development/science/city_nature_challenge/NHMLA_workshop/CNC-coding-workshop/site/built/data/raw/boundaries_expo_park_area.geojson'
using driver `GeoJSON'
Simple feature collection with 1 feature and 2 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -118.2915 ymin: 34.01096 xmax: -118.2829 ymax: 34.01806
Geodetic CRS: WGS 84
plot map of Expo Park.
R
ggplot() +
geom_sf(data = expo_park)
R
mapview(expo_park)
We want to get observation inside Expo Park.
You should check if the crs for the inaturalist data and the Expo Park are the same
R
st_crs(expo_park) == st_crs(inat_map)
OUTPUT
[1] TRUE
Use st_intersection()
to get all observations that
inside of Exposition Park.
R
inat_expo <- inat_map %>% st_intersection(expo_park)
WARNING
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
Use dim to get row and column count. 93K in LA county. 1191 observation in Expo Park.
R
dim(inat_map)
OUTPUT
[1] 93948 9
R
dim(inat_expo)
OUTPUT
[1] 1191 11
Create map of all observations in Expo Park.
R
ggplot() +
geom_sf(data = expo_park) +
geom_sf(data = inat_expo)
R
mapview(expo_park) +
mapview(inat_expo)
Use alpha.regions to set opacity. use col.regions to set color.
R
mapview(expo_park, alpha.regions=0.3, col.regions="#333333") +
mapview(inat_expo)