January 17, 2022 1:39 pm

Why a Canonical GTFS Schedule Validator?

At MobilityData, we want to improve travelers’ information through public transit data that is standardized, discoverable and of good quality (and much much more!). When it comes to data quality, a Canonical GTFS Schedule Validator has been part of our core work since our founding. We released v1.0 in 2020, and v2.0 and v3.0 in 2021, unlocking tons of opportunities to increase data quality and save precious time for public transit data producers and consumers.

The General Transit Feed Specification (GTFS) is used by public transit agencies to publish their transit data, and by developers who write applications that consume the data in an interoperable way. But not all GTFS datasets are created equal: missing or inaccurate information can make a public transit rider’s experience extremely frustrating, giving them a reason to use their cars instead.

What happens when you combine “data” and “quality”?

Quality matters in order to provide a service that people trust. If a trip planning app recommends taking the bus route “3N” but the route indicator on the bus says “3 Northbound” or “3 to City Center,” the rider will question if this is the correct bus. Before they know it, the bus has left.

Quality matters in order to serve all riders equally. Transit riders with disabilities rely on accessibility information in public transport stations. If pathways aren’t modelled correctly in stations, a portion of the population cannot use the service (the GTFS-pathways extension was adopted in March 2019).

Quality matters in order to make good decisions. The TransitCenter equity dashboard is built using GTFS data and it can help transit agencies and local governments craft transportation and land use policies that make transit access more equitable.

Why do you want data of better quality?

Many entities will save money, time and resources with data of better quality. But in the end, the traveler will get better information. And this is gold!

To summarize some of the advantages:

Less time wasted for data producers and data consumers

More travelers will be able to see the service through third-party applications
More travelers will rely on the service and trust it
Travelers will have more options for their transportation, based on their needs (accessibility, whether bikes are allowed on board, how to pay for a ride, etc.)

We need a consensus on what a “valid” dataset means

GTFS is supported by a community and has a solid amendment process. But parts of it are still difficult to interpret consistently.

Public transit agencies often create datasets that are flagged as “valid” by their validation tool but as “invalid” by a trip planning app, which uses a validation tool with slightly different criteria. When this happens, either the dataset is fixed by the trip planning app (and the public transit agency often never hears about it), or the public transit agency has to investigate and demystify the criteria of the different trip planning apps, sometimes providing different versions of the same dataset.
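To make this mismatch concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two functions, the notice names and the 12-character limit are hypothetical and do not reproduce the rules of any real validation tool.

```python
# Hypothetical illustration: both rule sets are invented for this sketch
# and do not reproduce the checks of any real GTFS validator.

# A single route record, shaped like a row of routes.txt.
route = {
    "route_id": "3N",
    "route_short_name": "3 Northbound Express",
    "route_long_name": "",
}


def producer_validator(route):
    """Lenient tool: only requires that the route has some name."""
    notices = []
    if not route["route_short_name"] and not route["route_long_name"]:
        notices.append("missing_route_name")
    return notices


def consumer_validator(route):
    """Stricter tool: same check, plus an assumed limit on short-name length."""
    notices = producer_validator(route)
    if len(route["route_short_name"]) > 12:  # assumed threshold
        notices.append("route_short_name_too_long")
    return notices


producer_notices = producer_validator(route)
consumer_notices = consumer_validator(route)

# The same record passes one tool and fails the other.
print("producer tool:", "valid" if not producer_notices else producer_notices)
print("consumer tool:", "valid" if not consumer_notices else consumer_notices)
```

The record itself never changes. Only the criteria differ, which is why a single shared rule set removes this kind of disagreement.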

The GTFS Best Practices were written to reduce the “room for interpretation” gap in the GTFS specification, and the Canonical GTFS Schedule Validator is a complementary initiative with the same purpose. Having one canonical tool used by both producers and consumers will help the GTFS community collaborate more efficiently and re-allocate quality control resources.

We would love to hear from you.

🛠️ If you’re interested in contributing to the Canonical GTFS Schedule Validator, please get in touch with isabelle@mobilitydata.org, the Product Manager for High Quality Data.