The General Transit Feed Specification (GTFS) data format defines a common scheme for describing transit systems, and is widely used by transit agencies around the world and consumed by many software applications. Thus, many R packages have been developed to read, write, manipulate and analyse such feeds, such as gtfs2gps, gtfsrouter, gtfstools and tidytransit.
Each of these, however, represents GTFS feeds in a slightly different way, which makes interoperability between packages harder to achieve. As a result, users who want to combine functions from different packages have a more painful experience.
gtfsio offers tools for the development of GTFS-related packages that aim to increase such interoperability. It establishes a standard for representing GTFS feeds using R data types based on Google’s Static GTFS Reference. It provides fast and flexible functions to read and write GTFS feeds while sticking to this standard. It defines a basic gtfs class which is meant to be extended by packages that depend on it. It also offers utility functions that support checking the structure of GTFS objects.
This vignette describes the basic usage of gtfsio. Please read the documentation of get_gtfs_standards() for more detail on the standards for reading and writing GTFS feeds using R data types.
Before using gtfsio, please make sure that you have it installed on your computer. You can install either the most stable version from CRAN or the development version from GitHub:
# stable version
install.packages("gtfsio")
# development version
remotes::install_github("r-transit/gtfsio")
Then attach it to the current R session:
library(gtfsio)
Throughout this demonstration we will be using a few sample files included in the package:
data_dir <- system.file("extdata", package = "gtfsio")
list.files(data_dir)
#> [1] "bad_gtfs.zip" "ggl_gtfs.zip" "locations_feed.zip"
#> [4] "nested_gtfs.zip"
ggl_gtfs.zip has been manually built from the example GTFS feed provided by Google. The sample files are licensed under the Creative Commons Attribution 4.0 License. bad_gtfs.zip is a modified version of ggl_gtfs.zip that includes some issues frequently found in GTFS data.
To read a feed, use the import_gtfs() function. It takes either a local path or a URL to a GTFS .zip file and returns a GTFS object (which is, basically, a list of data frames):
gtfs_path <- file.path(data_dir, "ggl_gtfs.zip")
gtfs_url <- "https://github.com/r-transit/gtfsio/raw/master/inst/extdata/ggl_gtfs.zip"
gtfs_from_path <- import_gtfs(gtfs_path)
names(gtfs_from_path)
#> [1] "calendar_dates" "fare_attributes" "fare_rules" "feed_info"
#> [5] "frequencies" "levels" "pathways" "routes"
#> [9] "shapes" "stop_times" "stops" "transfers"
#> [13] "translations" "trips" "agency" "attributions"
#> [17] "calendar"
gtfs_from_url <- import_gtfs(gtfs_url)
names(gtfs_from_url)
#> [1] "calendar_dates" "fare_attributes" "fare_rules" "feed_info"
#> [5] "frequencies" "levels" "pathways" "routes"
#> [9] "shapes" "stop_times" "stops" "transfers"
#> [13] "translations" "trips" "agency" "attributions"
#> [17] "calendar"
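The "list of data frames" description can be checked directly on an imported object. The snippet below is a minimal sketch that uses only base R inspection functions on the sample feed shipped with the package; the variable name gtfs_check is chosen here for illustration and is not used elsewhere in this vignette.

```r
library(gtfsio)

# Path to the sample feed bundled with gtfsio
gtfs_path <- system.file("extdata", "ggl_gtfs.zip", package = "gtfsio")
gtfs_check <- import_gtfs(gtfs_path)

# The object is a list that carries the gtfs class
is_gtfs <- inherits(gtfs_check, "gtfs")
is_plain_list <- is.list(gtfs_check)

# Each element (one per .txt file in this feed) is a data frame
all_tables <- all(vapply(gtfs_check, is.data.frame, logical(1)))

c(is_gtfs = is_gtfs, is_plain_list = is_plain_list, all_tables = all_tables)
```

Because the object is an ordinary list, individual tables can be reached with the usual $ or [[ operators, as in gtfs_check$trips.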
The function reads, by default, all .txt files contained in the .zip file. Alternatively, you can specify which files should be read with the files argument (note: without the .txt extension):
gtfs <- import_gtfs(gtfs_path, files = c("shapes", "trips"))
gtfs
#> $shapes
#> shape_id shape_pt_lat shape_pt_lon shape_pt_sequence shape_dist_traveled
#> <char> <num> <num> <int> <num>
#> 1: A_shp 37.61956 -122.4816 1 0.0000
#> 2: A_shp 37.64430 -122.4107 2 6.8310
#> 3: A_shp 37.65863 -122.3084 3 15.8765
#>
#> $trips
#> route_id service_id trip_id trip_headsign block_id
#> <char> <char> <char> <char> <char>
#> 1: A WE AWE1 Downtown 1
#> 2: A WE AWE2 Downtown 2
Similarly, you can use the fields argument to read only selected fields of a file. These arguments can be combined, offering a great deal of flexibility that can translate into very fast reading times:
gtfs <- import_gtfs(
gtfs_path,
files = c("shapes", "trips"),
fields = list(trips = c("trip_id", "route_id"))
)
gtfs
#> $shapes
#> shape_id shape_pt_lat shape_pt_lon shape_pt_sequence shape_dist_traveled
#> <char> <num> <num> <int> <num>
#> 1: A_shp 37.61956 -122.4816 1 0.0000
#> 2: A_shp 37.64430 -122.4107 2 6.8310
#> 3: A_shp 37.65863 -122.3084 3 15.8765
#>
#> $trips
#> trip_id route_id
#> <char> <char>
#> 1: AWE1 A
#> 2: AWE2 A
These fields are parsed according to the standards for reading and writing GTFS feeds in R. Undocumented files and fields (i.e. not specified in the GTFS reference) are read as character, by default. You can overrule this default with extra_spec (note that only undocumented fields should be specified in this argument). ggl_gtfs.zip contains an undocumented field in the levels.txt file, named elevation. Let’s check the effect of extra_spec:
gtfs <- import_gtfs(gtfs_path, files = "levels")
gtfs$levels
#> level_id level_index level_name elevation
#> <char> <num> <char> <char>
#> 1: L0 0 Street 0
#> 2: L1 -1 Mezzanine -6
#> 3: L2 -2 Southbound -18
#> 4: L3 -3 Northbound -24
class(gtfs$levels$elevation)
#> [1] "character"
gtfs <- import_gtfs(
gtfs_path,
files = "levels",
extra_spec = list(levels = c(elevation = "integer"))
)
gtfs$levels
#> level_id level_index level_name elevation
#> <char> <num> <char> <int>
#> 1: L0 0 Street 0
#> 2: L1 -1 Mezzanine -6
#> 3: L2 -2 Southbound -18
#> 4: L3 -3 Northbound -24
class(gtfs$levels$elevation)
#> [1] "integer"
Use export_gtfs() to write a GTFS object to disk. Please note that the function assumes that the object is formatted according to the standards for reading and writing GTFS feeds in R; if it is not, any conversions should be done before calling export_gtfs().
Objects are written as .zip feeds by default, but you can also write them as directories using the as_dir argument:
gtfs <- import_gtfs(gtfs_path)
tmpf <- tempfile(fileext = ".zip")
tmpd <- tempfile()
export_gtfs(gtfs, tmpf)
zip::zip_list(tmpf)$filename
#> [1] "calendar_dates.txt" "fare_attributes.txt" "fare_rules.txt"
#> [4] "feed_info.txt" "frequencies.txt" "levels.txt"
#> [7] "pathways.txt" "routes.txt" "shapes.txt"
#> [10] "stop_times.txt" "stops.txt" "transfers.txt"
#> [13] "translations.txt" "trips.txt" "agency.txt"
#> [16] "attributions.txt" "calendar.txt"
export_gtfs(gtfs, tmpd, as_dir = TRUE)
list.files(tmpd)
#> [1] "agency.txt" "attributions.txt" "calendar_dates.txt"
#> [4] "calendar.txt" "fare_attributes.txt" "fare_rules.txt"
#> [7] "feed_info.txt" "frequencies.txt" "levels.txt"
#> [10] "pathways.txt" "routes.txt" "shapes.txt"
#> [13] "stop_times.txt" "stops.txt" "transfers.txt"
#> [16] "translations.txt" "trips.txt"
The function defaults to writing every element inside a GTFS object as a .txt file. As with import_gtfs(), use the files argument to overrule this behaviour:
export_gtfs(gtfs, tmpf, files = c("shapes", "trips"))
zip::zip_list(tmpf)$filename
#> [1] "shapes.txt" "trips.txt"
You can also use the standard_only argument to export only files and fields specified in the GTFS reference (i.e. to leave out undocumented files/fields). In the example below, extra_gtfs contains both an undocumented file (extra_file) and an undocumented field in a regular file (levels$elevation), neither of which is written to disk when export_gtfs() is called with standard_only = TRUE:
extra_gtfs <- gtfs
extra_gtfs$extra_file <- data.frame(column = "value")
export_gtfs(extra_gtfs, tmpd, as_dir = TRUE, standard_only = TRUE)
"extra_file" %in% sub("\\.txt$", "", list.files(tmpd))
#> [1] FALSE
levels_fields <- readLines(file.path(tmpd, "levels.txt"), n = 1L)
grepl("elevation", levels_fields)
#> [1] FALSE
gtfsio also includes functions to check the structure of GTFS objects. check_file_exists() checks the existence of elements representing specific text files inside an object. It returns TRUE if the check is successful, and FALSE otherwise. assert_file_exists() invisibly returns the object if successful, and throws an error otherwise:
gtfs <- import_gtfs(gtfs_path, files = c("shapes", "trips"))
check_file_exists(gtfs, "shapes")
#> [1] TRUE
check_file_exists(gtfs, "stop_times")
#> [1] FALSE
assert_file_exists(gtfs, "shapes")
assert_file_exists(gtfs, "stop_times")
#> Error: The GTFS object is missing the following required element(s): 'stop_times'
check_field_exists() checks the existence of fields, represented by columns, inside GTFS objects. It returns TRUE if the check is successful, and FALSE otherwise. assert_field_exists() invisibly returns the object if successful, and throws an error otherwise:
gtfs <- import_gtfs(
gtfs_path,
files = "trips",
fields = list(trips = "trip_id")
)
check_field_exists(gtfs, "trips", fields = "trip_id")
#> [1] TRUE
check_field_exists(gtfs, "trips", fields = "shape_id")
#> [1] FALSE
assert_field_exists(gtfs, "trips", fields = "trip_id")
assert_field_exists(gtfs, "trips", fields = "shape_id")
#> Error: The GTFS object 'trips' element is missing the following required column(s): 'shape_id'
check_field_class() checks the classes of fields inside GTFS objects. It returns TRUE if the check is successful, and FALSE otherwise. assert_field_class() invisibly returns the object if successful, and throws an error otherwise:
gtfs <- import_gtfs(gtfs_path, files = "levels")
check_field_class(gtfs, "levels", fields = "elevation", classes = "character")
#> [1] TRUE
check_field_class(gtfs, "levels", fields = "elevation", classes = "integer")
#> [1] FALSE
assert_field_class(gtfs, "levels", fields = "elevation", classes = "character")
assert_field_class(gtfs, "levels", fields = "elevation", classes = "integer")
#> Error: The following columns in the GTFS object 'levels' element do not inherit from the required classes:
#> - 'elevation': requires integer, but inherits from character
Please note that “lower-level” checks are conducted inside each function; e.g. before checking the class of a field, the function first checks that the file and the field exist:
gtfs <- import_gtfs(gtfs_path, files = "shapes")
check_field_class(gtfs, "stop_times", fields = "stop_id", classes = "character")
#> [1] FALSE
assert_field_class(gtfs, "stop_times", fields = "stop_id", classes = "character")
#> Error: The GTFS object is missing the following required element(s): 'stop_times'
These functions are great for package interoperability. If two distinct packages represent GTFS text files using the same data structure (both gtfstools and gtfsrouter use data.tables, for example), they just need to add some basic checks before proceeding with operations on objects created by the other package.
So, if gtfsrouter requires the transfers element to perform some operations, it might as well perform them on an object created by gtfstools, as long as that object contains a transfers element. It could therefore greatly benefit from some assert_*/check_* calls before proceeding with such operations.
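As a rough illustration of this pattern, the sketch below defines a hypothetical downstream function, n_transfers(), that guards its work with assert_file_exists(). The function name and its behaviour are assumptions made for this example only; they are not taken from gtfsio, gtfsrouter or gtfstools.

```r
library(gtfsio)

# Hypothetical package function (illustration only): counts the rows of the
# transfers table of any GTFS object, whichever package created it.
n_transfers <- function(gtfs) {
  # Stop early, with an informative error, if the element is missing
  assert_file_exists(gtfs, "transfers")
  nrow(gtfs$transfers)
}

gtfs_path <- system.file("extdata", "ggl_gtfs.zip", package = "gtfsio")

# A full feed contains transfers, so the call goes through
full_feed <- import_gtfs(gtfs_path)
transfers_count <- n_transfers(full_feed)

# A partial feed does not, so the assertion raises an error, caught here
only_shapes <- import_gtfs(gtfs_path, files = "shapes")
outcome <- tryCatch(n_transfers(only_shapes), error = function(e) "error")
```

An assert_* call suits this kind of guard because it fails loudly; the matching check_* function would be the choice when the caller prefers to branch on a TRUE/FALSE result instead.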