30 June 2014

BIG “DATAxi”: Invasion or Innovation?


It is no wonder that taxis have recently been at the forefront of press headlines worldwide. They are merely another example of how the Big Data paradigm is already changing our lives. The idea is simple: if taxis have geolocation (GPS) systems and customers have geolocation on their mobile devices, can we design a new matching system that is more efficient than the previous one?

Before: until now, an after-the-fact model has been in use. A customer called a dispatch centre (the initial fact) and the service was assigned to the nearest taxi driver, subject to a series of priorities. In other words, there was no applied science behind the business model, only a service and a set of heuristic rules for assigning it.
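
As a rough illustration of that after-the-fact model, the sketch below assigns a request to the nearest free taxi. The data layout, the taxi ids and the coordinates are hypothetical, and the "series of priorities" mentioned above is reduced to a single rule (the taxi must be free), so this is a simplification rather than any real dispatch system.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def dispatch_nearest(customer, taxis):
    """Return the id of the free taxi closest to the customer, or None.

    taxis: dict mapping taxi id -> {"pos": (lat, lon), "free": bool}
    (a hypothetical layout chosen for this sketch).
    """
    free = {tid: t for tid, t in taxis.items() if t["free"]}
    if not free:
        return None
    return min(free, key=lambda tid: haversine_km(customer, free[tid]["pos"]))

# Hypothetical positions; T3 is the closest taxi but is already busy.
taxis = {
    "T1": {"pos": (40.4168, -3.7038), "free": True},
    "T2": {"pos": (40.4530, -3.6883), "free": True},
    "T3": {"pos": (40.4170, -3.7030), "free": False},
}
customer = (40.4175, -3.7025)
chosen = dispatch_nearest(customer, taxis)
```

Nothing here looks ahead: the taxi is chosen only once the call has come in, which is exactly the limitation the rest of this post is about.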

Now: more innovative solutions based on applied science are being developed. I was recently headhunted by a private equity (PE) fund to put together a project of this type. PE funds generally buy companies with growth potential, hire a (small) team capable of realizing that potential and finally sell the company on to profit from the improvement. Bottom line: the mere fact that they contacted me means they reckon there is room for innovation in the taxi industry (and they “put their money where their mouth is”). In my case, they wanted someone who could make the company switch from being technology-intensive to fully technological. In other words, they wanted the company to become an algorithm, the Amazon of taxis, all over the world. My proposal was very simple: four layers built on two data sources, the data the company already held and the new data it could acquire via its app:

  1. Improving the distribution of the fleet of taxis. The change needed was from the after-the-fact model to a probability model. Using the historical record of activity, it is possible to generate heat maps that show the areas where a service request is most likely to occur under given conditions, for instance “a rainy Wednesday afternoon in January while the local football team is playing a Champions League away match”. It is important to note that the model not only reduces waiting time for the customer but also lets drivers cut their own idle time and make better use of the time they spend in the taxi. And if you can keep your drivers happy, you keep your competitive advantage over Uber-type companies.
  2. Matching customer and driver. Thanks to the app, the customer is no longer anonymous. Once the customer is identified, her experience can be improved so that, instead of being a mere user, she becomes an “evangelist” for your brand. Word of mouth is the strongest marketing. This meant that the fleet of taxis needed to become a fleet of taxi drivers, with individual names and characteristics. Technically speaking, the key factor is granularity, and the matching challenge becomes similar to the algorithm of a dating site. Unlike on those sites, however, matching for taxis has to take place instantly and is constrained by the drivers available near the customer. The key, therefore, is to increase the probability of a successful match, that is, to increase the proportion of drivers with a favorable match among those available near the customer. To that end, the taxi drivers (individuals, no longer merely taxis) needed to be distributed around the city based on the median customer profile in each area.
  3. Exploiting the drivers’ latent knowledge. In general, the aggregate knowledge of the drivers about how to get between two points in the city can be expected to beat the routes suggested by companies such as TomTom, Google Maps, etc. Two main parameters needed to be taken into account: the fastest route and the cheapest route, given that not all customers have the same priorities (time is money, but how much money?).
  4. Smart resiliency. Suppose a taxi driver is in an influence area for her own profile (point 2) and a service is triggered (point 1) whose efficient route (point 3) takes her across areas that do not correspond to her profile. In general, she should not return by the same route. Instead, she should choose alternative routes that again cross influence areas matching her own profile. In other words, it is necessary to weigh the pros of the fleet continuing to optimize the customer experience against the cons of consuming more time and fuel.
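
To make point 1 a little more concrete, here is a minimal sketch of a probability model built from historical requests. It assumes the history is available as (context, zone) pairs, where the context is a hypothetical tuple such as weekday, time of day and weather; the zone names, the counts and the proportional allocation rule are all illustrative assumptions, not the model actually proposed to the fund.

```python
from collections import Counter, defaultdict

def demand_probabilities(history):
    """Estimate P(zone | context) from past service requests.

    history: iterable of (context, zone) pairs, one per past request.
    Returns {context: {zone: probability}}.
    """
    counts = defaultdict(Counter)
    for context, zone in history:
        counts[context][zone] += 1
    return {
        ctx: {zone: n / sum(c.values()) for zone, n in c.items()}
        for ctx, c in counts.items()
    }

def allocate_fleet(probabilities, fleet_size):
    """Split idle taxis across zones in proportion to demand probability.

    Uses the largest-remainder rule so the allocation always sums to
    fleet_size. This rule is an assumption made for the sketch.
    """
    raw = {zone: p * fleet_size for zone, p in probabilities.items()}
    alloc = {zone: int(v) for zone, v in raw.items()}
    leftover = fleet_size - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda z: raw[z] - alloc[z], reverse=True)
    for zone in by_remainder[:leftover]:
        alloc[zone] += 1
    return alloc

# Hypothetical history for one context: rainy Wednesday afternoons.
rainy_wed = ("Wed", "afternoon", "rain")
history = ([(rainy_wed, "centre")] * 6
           + [(rainy_wed, "stadium")] * 3
           + [(rainy_wed, "airport")] * 1)

probs = demand_probabilities(history)[rainy_wed]
plan = allocate_fleet(probs, fleet_size=10)
```

The per-zone probabilities are the numbers a heat map would colour, and the allocation shows how they could pre-position idle taxis before any call arrives. A real model would need far richer contexts and some smoothing for conditions that rarely occur in the data.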

In all, they first needed to switch from the current data analysis system (dispersed and probably contradictory at times) to a homogeneous system (a clear, agreed data model) that was structured and exploitable (Big Data), and finally to obtain an intelligent system that takes advantage of latent/hidden patterns (Smart Data).

If this takes place, the traditional taxi service system could become obsolete and find itself at an obvious crossroads. Either it takes the leap toward innovation and more aggressive market competition (both in terms of price and customer experience) or it seeks government protection and legislative reinforcement. Right now, we are immersed in the second option, hence the recent controversy and headlines.

It is now time to observe how the three issues I talked about in my previous post (secrecy, espionage and propaganda) evolve.

In the future: in the medium to long term, I would expect to see dynamic prices based on the circumstances of each taxi. For instance, services picked up on the return route (point 4) should be less expensive than those derived from the initial distribution (point 1). I would even expect taxi sharing to be mooted (see, for instance, Shou et al. (2013)).
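
A minimal sketch of what such a dynamic price could look like follows. The tariff figures, the `leg` labels and the 20% return discount are all hypothetical values picked for illustration; nothing here reflects an actual fare structure.

```python
def fare(distance_km, minutes, leg,
         base=2.50, per_km=1.10, per_min=0.35, return_discount=0.20):
    """Price a trip, discounting services picked up on a return route.

    leg: "initial" for a service derived from the initial distribution
         (point 1), "return" for one picked up on the way back (point 4).
    All tariff parameters are illustrative assumptions.
    """
    if leg not in ("initial", "return"):
        raise ValueError(f"unknown leg: {leg!r}")
    price = base + per_km * distance_km + per_min * minutes
    if leg == "return":
        # The taxi is travelling back anyway, so part of the cost is sunk.
        price *= 1 - return_discount
    return round(price, 2)

initial_price = fare(5, 12, "initial")
return_price = fare(5, 12, "return")
```

The same trip comes out cheaper on the return leg, which would give customers an incentive to use taxis that are already heading back towards their own influence areas.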

The greatest challenge, then, as in other Big Data projects, is the lack of human capital with all the characteristics needed to spearhead this type of project: the logic of an economist in decision making, the creativity and precision of a scientist, the independence and prototyping ability of a developer, and the motivation of an entrepreneur. And such people generally prefer to stay in the sector that uses Big Data most intensively: finance.


M. Shou, Y. Zheng and O. Wolfson (2013). T-Share: A Large-Scale Dynamic Taxi Ridesharing Service. IEEE.

Sergio Álvarez Teleña

Strategies & Data Science, BBVA (Madrid)
