Data mining in public transport
What is data mining, and what does it have to do with public transport and with mobility in general? Digitalisation means that more and more data is being collected, including in public transport. How can this data be used in future to establish patterns and correlations? What conclusions can be drawn in order to formulate new strategies for public transport?
In general, data mining is defined as the automated use of statistical and computer-assisted methods to analyse data records.1 The point of this analysis is to examine generally large amounts of data for patterns and correlations, which can ultimately lead to the formulation of solutions and strategies in a variety of areas.
A distinction must be made between data mining and big data, a similar though not identical term. The aim of big data can also be described as establishing certain patterns and correlations, although it primarily has to do with analysing pre-existing databases.
Big data is the result of information collected everywhere in business, communications and healthcare and is initially devoid of structure. Mobile networks, social media and the internet in its entirety also constantly collect and store data. Such an amount of information can no longer be handled using conventional systems and tools2, which means automated digital methods are necessary to process the huge volume of data and make it usable for other purposes.
Analytical methods such as data mining make it possible to evaluate this information and determine sales patterns, establish data profiles and examine correlations linking different areas. Businesses are among the main beneficiaries of these methods, as based on data analyses they can predict customer behaviour and how this is influenced by certain factors and can optimise their sales strategy accordingly.
There are also potentially great benefits to evaluating big data in transport. Public transport and future developments in the mobility sector in general can benefit greatly from data mining, as is illustrated below.
A key function of data analysis in data mining is being able to forecast certain trends and break down customer behaviour based on available data. In the mobility sector ticketing is becoming an increasingly important source of information.
According to a survey conducted by Freie Universität Berlin3 as part of Nahverkehrstage 2017, around 30 public transport companies in Germany currently offer digital ticketing services, and the relevant apps automatically collect passenger and transaction data. The survey examined the transport company Berliner Verkehrsbetriebe (BVG), which collects the following data, for example:
- Master data
So-called master data contains a user’s name, address, date of birth and registration date, as well as their most recent log-in and purchase.
- Transaction data
Transaction data is defined as ticket and purchase information, including ticket number and ticket price, as well as time and place of purchase.
Analysing this data produces statistical information on the customer demographic – and the target audiences for digital ticketing services – as well as statistics on sales and individual products sold.
For example, only 50 % of BVG customers live in the capital, and 58 % are aged between 25 and 40. Most customers purchase a single fare valid for the AB travel zone. Armed with these findings, the next step is to correlate individual data categories with the behaviour patterns of certain customer groups. According to the survey by FU Berlin, sales of tickets such as the Berlin WelcomeCard follow clear patterns across the days of the week.
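As a rough illustration of how transaction data could be correlated with days of the week, the following minimal sketch counts sales of one product per weekday. The sample records, the product names used as keys and the function name sales_by_weekday are assumptions made for this example; they are not taken from the FU Berlin survey or from any BVG system.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical transaction records: (product name, time of purchase in ISO format).
TRANSACTIONS = [
    ("Single AB", "2017-05-01T08:15:00"),
    ("Berlin WelcomeCard", "2017-05-05T10:30:00"),
    ("Berlin WelcomeCard", "2017-05-06T09:05:00"),
    ("Single AB", "2017-05-06T18:40:00"),
    ("Berlin WelcomeCard", "2017-05-12T11:20:00"),
]


def sales_by_weekday(transactions, product):
    """Count how often one product was bought on each weekday."""
    counts = Counter()
    for name, timestamp in transactions:
        if name == product:
            # weekday() returns 0 for Monday through 6 for Sunday.
            day_index = datetime.fromisoformat(timestamp).weekday()
            counts[WEEKDAYS[day_index]] += 1
    return counts


if __name__ == "__main__":
    for day, sold in sales_by_weekday(TRANSACTIONS, "Berlin WelcomeCard").items():
        print(day, sold)
```

With real transaction data the same count could be broken down further, for instance by age group from the master data, to see which customer groups drive a weekday pattern.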
Using a variety of methods and data mining of available information it is possible to make customer forecasts, which in turn can be used to further improve digital ticketing services.
In addition to the ticketing apps of various transport companies, there is now also an increasing focus on electronic fare management (EFM) systems in Germany. Based on check-in, check-out (CICO) information, each customer’s use of public transport is automatically registered and invoiced according to the appropriate fare rate. This ticketing system is currently being trialled by Nordhessischer Verkehrsbund NVV and Kreisverkehr Schwäbisch Hall KSH.4
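The check-in, check-out principle can be sketched in a few lines: each check-in is paired with the same customer's next check-out, and the resulting trip is priced. Everything specific here is an assumption for illustration only: the fare table FARE_BY_ZONES, the numbering of zones as integers, and the function names build_trips and invoice do not describe the systems trialled by NVV or KSH.

```python
# Hypothetical fare table: price in euro cents by number of zones travelled.
FARE_BY_ZONES = {1: 170, 2: 280, 3: 340}


def build_trips(events):
    """Pair each check-in with the same customer's next check-out.

    events: iterable of (customer_id, kind, zone, iso_timestamp), where
    kind is "in" or "out" and zone is an integer zone number.
    Check-ins without a matching check-out are left out of the result.
    """
    open_checkins = {}
    trips = []
    # ISO timestamps in one fixed format sort chronologically as strings.
    for customer, kind, zone, _timestamp in sorted(events, key=lambda e: e[3]):
        if kind == "in":
            open_checkins[customer] = zone
        elif kind == "out" and customer in open_checkins:
            start_zone = open_checkins.pop(customer)
            trips.append((customer, start_zone, zone))
    return trips


def invoice(trips):
    """Sum the fares of all trips per customer, in euro cents."""
    totals = {}
    longest = max(FARE_BY_ZONES)
    for customer, start_zone, end_zone in trips:
        zones_travelled = abs(end_zone - start_zone) + 1
        # Trips longer than the table covers are billed at the top rate.
        fare = FARE_BY_ZONES[min(zones_travelled, longest)]
        totals[customer] = totals.get(customer, 0) + fare
    return totals
```

The sketch also hints at the data quality issue that comes with such systems: a check-in without a check-out produces no trip record at all, so the resulting data set is incomplete by construction.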
According to a survey by Universität Kassel5, EFM systems provide a reliable and automatic source of information for long-term, comprehensive data mining purposes. However, the resultant data records are not representative of all public transport customers. Non-regular customers and fares that overlap with EFM customers’ purchases elsewhere mean the records produce incomplete and possibly misleading findings.
In public transport it is not just digital ticketing systems that represent data mining sources. Increasingly, passenger information systems are providing reliable data on where, when and the frequency with which customers use public transport.
On 1 January 2017 the Ministry of Transport and Digital Infrastructure (BMVI) introduced a research and development programme under the heading of “eTicketing and digitally connected users in public transport” and is currently supporting forward-looking mobility projects. In Dresden, for example, the Fraunhofer Institute is researching a software system in order to “develop flexible public transport fare rates”, while Hamburger Verkehrsbetriebe and partners are working on a “flexible ticketing platform for Hamburg (HaDif)”, i.e. a “flexible transport solution for conurbations”.6
A key issue here is the expansion and development of dynamic passenger information systems.
In order for data mining to succeed it is important for passenger information systems and public transport customers to be digitally connected. Mobile devices such as smartphones/smartwatches and corresponding apps can directly connect individual users with information systems. In contrast to fixed passenger information systems which offer passengers information but cannot collect data on its use, mobile devices generate valuable data on public transport customers’ behaviour. Thus, digitalisation in public transport benefits all parties: passengers, by providing them with flexible access to real time information, and transport companies by providing them with statistics which they can use to draw important conclusions for optimising public transport.
As an example, the Berlin-based startup MotionTag7 has developed a concept that highlights the innovative potential of networking information systems, ticketing services and user data. In January 2018 several investors agreed to support8 this project. The idea is based on a virtual ticketing service which provides customers with the appropriate public transport ticket and timetable information in real time anywhere in Germany and for the corresponding travel zones. The intention is also for transport companies to receive anonymous app usage and travel behaviour data that will enable them to improve their services. The underlying principle is that of data mining: an algorithm collects raw customer data in anonymous form and establishes relevant correlations, a technology that not only digitalises tomorrow’s transport but above all optimises it for individual customer categories.
Armed with a data analysis of this kind, what steps are actually taken to improve public transport? Data mining potentially offers great benefits for the mobility sector and a wide range of areas where it can be applied. At Nahverkehrstage 2017 Andreas Schmidt9 from Universität Kassel published an essay outlining a number of public transport areas that could be improved using collected data, as illustrated below.
Control of vehicle utilisation
A corresponding analysis of collected data can significantly improve vehicle utilisation levels. In that context so-called structural data is relevant, for example on passenger ratios of students and trainees and where they live. This type of information can be used to pinpoint peak times and in-demand routes and to establish tendencies and trends regarding population density. Such a data analysis can also be conducted by simulation. Using the data collected one can simulate future scenarios and generate forecasts of trends in public transport demand. It is thus possible to adapt transport planning appropriately and optimise vehicle utilisation accordingly.
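Pinpointing peak times from collected data can be sketched as follows. The rule used here, flagging any hour whose boardings exceed 1.5 times the mean over all observed hours, is an assumption chosen for illustration, as are the function names; it is not a method described in the essay cited above.

```python
from collections import Counter
from datetime import datetime


def boardings_per_hour(timestamps):
    """Count boardings for each hour of the day (0 to 23)."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


def peak_hours(timestamps, threshold=1.5):
    """Return the hours whose boardings exceed threshold times the mean.

    The mean is taken over the hours that have at least one boarding.
    """
    counts = boardings_per_hour(timestamps)
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(hour for hour, n in counts.items() if n > threshold * mean)
```

Hours flagged in this way could then be compared with structural data, such as school or training timetables, before vehicle deployment is adjusted.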
Better vehicle fleet management
Data mining is also a reliable and efficient instrument for managing vehicle fleets. By collecting data on malfunctions and their causes it is possible to calculate statistically the running times, outages and actual service hours of public transport vehicles. Analysing these times can also reveal correlations with vehicle features such as the number of doors on buses. The findings of such a data analysis ultimately make it possible to manage vehicle fleets in accordance with actual needs. Traffic information can also supply data: collecting and evaluating it provides advance warning of potential delays, which makes it easier to plan reserve vehicles and reliably ensure their availability.
Improving transport networks
Expanding transport networks is also a substantial part of overall transport planning. An analysis of the master data of public transport customers and their travel data – e.g. through eTicketing systems – provides information on the catchment areas of bus stops, appropriate distances between stops or the possible need for extra stops.
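One simple way to approximate catchment areas from master data is to assign each customer's home location to the nearest stop and look at the resulting distances. The sketch below assumes home and stop locations are available as latitude and longitude pairs; the 500 m walking limit and all function names are hypothetical. The haversine formula gives straight-line distances, which understate real walking distances.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def catchments(homes, stops):
    """Map each stop name to the distances of the homes nearest to it.

    homes: list of (lat, lon); stops: dict of name -> (lat, lon).
    """
    result = {name: [] for name in stops}
    for lat, lon in homes:
        name, dist = min(
            ((n, haversine_m(lat, lon, s_lat, s_lon))
             for n, (s_lat, s_lon) in stops.items()),
            key=lambda pair: pair[1],
        )
        result[name].append(dist)
    return result


def count_beyond(homes, stops, max_walk_m=500.0):
    """Count homes whose nearest stop is farther away than max_walk_m."""
    return sum(
        1
        for distances in catchments(homes, stops).values()
        for d in distances
        if d > max_walk_m
    )
```

A high count from count_beyond in one district would be a first indication that an extra stop may be worth examining there.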
Demand for certain routes also plays an important part in the planning of public transport networks. By collecting the relevant data it is possible to determine key routes and destinations and take these into account when organising route networks. An analysis of demand also provides information on the need for expanding route networks in rural areas. Thus, in 2016 Danish geographers evaluated customers’ frequentation of bus routes in order to research demand for public transport in north Jutland and singled out often-used routes.10
Digitally connecting public transport systems with one another also promises benefits. It makes it possible to respond rapidly to an increase in passenger numbers during peak periods or to unforeseen malfunctions, or to make other spontaneous adjustments to public transport operations. This requires not only reliable data collection methods but also public transport vehicles with the corresponding technical equipment.11
Digitalisation in public transport and data analysis methods including data mining as part of this process represent progress and the way forward to improving tomorrow’s transport. Nevertheless, the increasing trend towards automation in transport raises issues which Prof. Dr. Stephan Rammler, author and professor at the Institut für Transportdesign of Hochschule für Bildende Künste in Braunschweig, explains below.12
Data protection poses a challenge for big data
In the age of social media and digital connectivity data protection plays an important role, and equally so where mobility is concerned. While data mining in public transport offers many benefits, including for transport companies and customers, relevant user data such as travel behaviour, travel times and regularly frequented routes can often be correlated with information on the individual behaviour of public transport customers. It is thus imperative to define clear guidelines on how to handle personal data in the field of transport.
The Data Protection Law which came into force in May 2018 raises questions concerning the processing of big data. The new regulations make no specific mention of how to handle big data in respect of data protection and are unclear as to whether automatically collected data, as in the case of public transport, counts as personal data and whether processing thereof must be subject to these new regulations.13
Does digitalisation in public transport have a negative environmental impact?
The aim of digitally connected mobility is to make public transport processes as efficient as possible and provide maximum benefit for customers. However, as Prof Dr Rammler explains in conversation with Dr Michael Benz, a co-organiser of the German Mobility Congress14, it is necessary to take the ecological aspects of these technological developments into account. The increasing amount of technology taking over road transport also means higher demand for raw materials in order to manufacture the necessary hardware. Intensely analytical methods such as data mining require energy-consuming server centres. It can therefore be assumed that, in addition to the energy consumed by public transport vehicles themselves, the extra energy consumption resulting from the relevant technological innovations will further increase the environmental impact caused by public transport. With energy requirements constantly rising, how can one guarantee the ecologically responsible production of technical end devices and ensure that public transport takes convincing steps to protect the environment?
Mobility of the future
With public transport making such progress in technology and data analysis, the term mobility must be redefined so as to emphasise its future direction and goals. The aim of analysing and constantly developing the processes involved in transport operations using computer-assisted methods, including data mining, is naturally to perfect those processes. However, it should be noted that technologically advancing and automating these processes is aimed not at replacing the “hard infrastructures”15 but at improving them. Transport companies and structures continue to represent the physical basis of a functioning public transport network. Data analysis and other automated processes make these fundamental structures more efficient and help provide a balanced customer service, without replacing them.
1 cf. https://www.bigdata-insider.de/was-ist-data-mining-a-593421/ (as of 10 May 2018)
4 cf. https://www.uni-kassel.de/fb14bau/institute/ifv/verkehrsplanung-und-verkehrssysteme/forschung-und-dienstleistungen/forschungsprojekte/flexitarife-entwicklung-anwendung-und-wirkungsermittlung-flexibler
5 cf. Uni Kassel: Fachgebiet Verkehrsplanung und Verkehrssysteme
6 cf. http://mobilitaet21.de/eticket-deutschland/
9 Schmidt, Andreas (2017): Nutzung von Daten in der Angebots- und Betriebsplanung. In: Institut für Verkehrswesen der Universität Kassel: Nahverkehrs-Tage 2017. Digital und Disruptiv – Neue Daten und Methoden für einen kundengerechten ÖPNV. Kassel: University Press, pp. 5-24.
11 cf. Deutsches Zentrum für Luft- und Raumfahrt (ed.) (2017): Digital mobil in Deutschlands Städten.