Pebesma E.
Simple features are a standardized way of encoding spatial vector data (points, lines, polygons) in computers. The sf package implements simple features in R, and has roughly the same capacity for spatial vector data as packages sp, rgeos and rgdal. We describe the need for this package, its place in the R package ecosystem, and its potential to connect R to other computer systems. We illustrate this with examples of its use.

What are simple features?

Features can be thought of as “things” or objects that have a spatial location or extent; they may be physical objects like a building, or social conventions like a political state. Feature geometry refers to the spatial properties (location or extent) of a feature, and can be described by a point, a point set, a linestring, a set of linestrings, a polygon, a set of polygons, or a combination of these. The adjective “simple” in simple features refers to the property that linestrings and polygons are built from points connected by straight line segments. Features typically also have other properties (temporal properties, color, name, measured quantity), which are called feature attributes.

Not all spatial phenomena are easy to represent by “things or objects”: continuous phenomena such as water temperature or elevation are better represented as functions mapping from continuous or sampled space (and time) to values (Scheider et al., 2016), and are often represented by raster data rather than vector (points, lines, polygons) data.

Simple feature access (Herring, 2011) is an international standard for representing and encoding spatial data, predominantly represented by point, line and polygon geometries (ISO, 2004). It is widely used, e.g.
by spatial databases (Herring, 2010), GeoJSON (Butler et al., 2016), GeoSPARQL (Perry and Herring, 2012), and open source libraries that empower the open source geospatial software landscape including GDAL (Warmerdam, 2008), GEOS (GEOS Development Team, 2017) and liblwgeom (a PostGIS component, Obe and Hsu (2015)).

The need for a new package

Package sf (Pebesma, 2017) is an R package for reading, writing, handling and manipulating simple features in R, reimplementing the vector (points, lines, polygons) data handling functionality of packages sp (Pebesma and Bivand, 2005; Bivand et al., 2013), rgdal (Bivand et al., 2017) and rgeos (Bivand and Rundel, 2017). However, sp has some 400 direct reverse dependencies, and a few thousand indirect ones. Why was there a need to write a package with the potential to replace it?

First of all, at the time of writing sp (2003) there was no standard for simple features, and the ESRI shapefile was by far the dominant file format for exchanging vector data. The lack of a clear (open) standard for shapefiles, the omnipresence of “bad” or malformed shapefiles, and the many limitations of the ways they can represent spatial data adversely affected sp, for instance in the way it represents holes in polygons, and in a lack of discipline to register holes with their enclosing outer ring. Such ambiguities could influence plotting of data, or communication with other systems or libraries. The simple feature access standard is now widely adopted, but the sp package family has to make assumptions and do conversions to load simple feature data into R. This means that you cannot round-trip data, that is, load data into R, manipulate them, export them and get the same geometries back. With sf, this is no longer a problem.

A second reason was that external libraries heavily used by R packages for reading and writing spatial data (GDAL) and for geometrical operations (GEOS) have developed stronger support for the simple feature standard.
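The round-trip property mentioned above can be illustrated with a minimal sketch. It assumes that the sf package is installed, and the object names (outer, hole, geom and so on) are hypothetical, chosen only for this illustration. A polygon with one hole is encoded as well-known text and as well-known binary, two encodings defined by the simple feature access standard, and then decoded again.

```r
library(sf)

# Each ring is a closed sequence of x, y coordinates, one vertex per row.
outer <- rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0))
hole  <- rbind(c(2, 2), c(2, 4), c(4, 4), c(4, 2), c(2, 2))

# The hole is stored as the second ring of the polygon it belongs to,
# so its enclosing outer ring is never ambiguous.
geom <- st_sfc(st_polygon(list(outer, hole)))

# Encode as well-known text (WKT) and well-known binary (WKB).
wkt <- st_as_text(geom)
wkb <- st_as_binary(geom)

# Decode both encodings back into simple feature geometries.
from_wkt <- st_as_sfc(wkt)
from_wkb <- st_as_sfc(wkb)

# Both decoded geometries should print as the same text as the original.
print(identical(st_as_text(from_wkt), wkt))
print(identical(st_as_text(from_wkb), wkt))
```

This sketch only exercises in-memory encodings; a round trip through a file or database would involve a GDAL driver and is not shown here.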
A third reason was that the package cluster now known as the tidyverse (Wickham, 2017, 2014), which includes popular packages such as dplyr (Wickham et al., 2017) and ggplot2 (Wickham, 2016), does not work well with the spatial classes of sp:

• tidyverse packages assume objects not only behave like data.frames (which sp objects do by providing methods), but are data.frames in the sense of being a list with equally sized column vectors, which sp does not do.
• attempts to “tidy” polygon objects for plotting with ggplot2 (“fortify”) by creating data.frame objects with records for each polygon node (vertex) were neither robust nor efficient.

The R Journal Vol. XX/YY, AAAA 20ZZ ISSN 2073-4859

A simple (S3) way to store geometries in data.frame or similar objects is to put them in a geometry list-column, where each list element contains the geometry object of the corresponding record, or data.frame “row”; this works well with the tidyverse package family.
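The geometry list-column idea can be sketched as follows. This is a minimal illustration that assumes the sf package is installed; the names attrs, geom, feat and big, and the attribute values, are made up for the example.

```r
library(sf)

# Feature attributes: one record (row) per feature.
attrs <- data.frame(name = c("a", "b", "c"), value = c(1, 2, 3))

# One geometry per record: a point, a linestring and a polygon.
geom <- st_sfc(
  st_point(c(0, 0)),
  st_linestring(rbind(c(0, 0), c(1, 1))),
  st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0))))
)

# Attach the geometries to the attribute table as a list-column.
feat <- st_sf(attrs, geometry = geom)

# The result is a data.frame whose geometry column is a list with
# one element per row.
print(is.data.frame(feat))
print(is.list(feat$geometry))

# Ordinary row subsetting carries the matching geometry along.
big <- feat[feat$value > 2, ]
print(as.character(st_geometry_type(big)))
```

Because each geometry lives in the same row as its attributes, row-wise operations such as filtering or reordering do not need separate bookkeeping to keep geometries and attributes aligned.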