Designing Uber backend
1. What is Uber?
Uber enables its customers to book drivers for taxi rides. Uber drivers use their personal cars to drive customers around. Customers and drivers communicate with each other through the Uber app on their smartphones.
2. Requirements and Goals of the System
Let’s start with building a simpler version of Uber.
There are two kinds of users in our system: 1) Drivers and 2) Customers.
- Drivers need to regularly notify the service about their current location and their availability to pick up passengers.
- Passengers get to see all the nearby available drivers.
- Customers can request a ride; nearby drivers are notified that a customer is ready to be picked up.
- Once a driver and a customer accept a ride, they can continuously see each other’s current location until the trip finishes.
- Upon reaching the destination, the driver marks the journey complete to become available for the next ride.
3. Capacity Estimation and Constraints
- Let’s assume we have 300M customers and 1M drivers, with 1M daily active customers and 500K daily active drivers.
- Let’s assume 1M daily rides.
- Let’s assume that all active drivers notify their current location every three seconds.
- Once a customer puts in a request for a ride, the system should be able to contact drivers in real time.
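The assumptions above translate into rough load figures. The following is a minimal back-of-envelope sketch; the constant and function names are illustrative, and it assumes the worst case in which every daily active driver is online at the same time and that rides are spread evenly over the day.

```python
# Back-of-envelope load estimate from the stated assumptions.
DAILY_ACTIVE_DRIVERS = 500_000      # from the estimate above
UPDATE_INTERVAL_SECONDS = 3         # each active driver reports every 3 s
DAILY_RIDES = 1_000_000             # from the estimate above
SECONDS_PER_DAY = 24 * 60 * 60


def location_updates_per_second(drivers: int, interval_s: int) -> float:
    """Upper bound: assumes all active drivers are online simultaneously."""
    return drivers / interval_s


def average_rides_per_second(rides: int) -> float:
    """Average only: assumes rides are evenly spread, so peaks will be higher."""
    return rides / SECONDS_PER_DAY


updates_qps = location_updates_per_second(DAILY_ACTIVE_DRIVERS, UPDATE_INTERVAL_SECONDS)
rides_qps = average_rides_per_second(DAILY_RIDES)

print(f"location updates/s (upper bound): {updates_qps:,.0f}")
print(f"ride requests/s (daily average):  {rides_qps:.1f}")
```

Under these assumptions, location updates (on the order of 167K writes per second) dominate ride requests (about 12 per second on average), which suggests the location-update path is the one to optimize first.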
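4. Schema
Step 1: The passenger sends a ride request and the server creates a Trip.
- The server returns the trip_id to the passenger.
- The passenger polls the server every few seconds to check whether a match has been made.
Step 2: The server finds a matching driver and writes the driver into the Trip, with the status set to waiting for the driver's response.
- At the same time, the server marks the driver as unavailable in the Driver Table and stores the corresponding trip_id there.
Step 3: The driver reports their location.
- While doing so, the driver finds a trip_id assigned to them in the Driver Table.
- The server looks up the corresponding Trip in the Trip Table and returns it to the driver.
Step 4: The driver accepts the ride request.
- The status information in the Driver Table and the Trip is updated.
- The passenger sees that the match succeeded and receives the driver's information.
Step 5: The driver rejects the ride request.
- The status information in the Driver Table and the Trip is updated, recording that this driver has rejected this trip.
- The server matches another driver, repeating Step 2.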
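The matching flow in this section can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class, field, and status names (`Dispatcher`, `rejected_by`, `TripStatus`, and so on) are hypothetical, plain dictionaries stand in for the Driver Table and Trip Table, and the "matching" simply picks the first available driver rather than the nearest one.

```python
import itertools
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, Optional, Set


class TripStatus(Enum):
    REQUESTED = "requested"                     # created, no driver assigned
    WAITING_FOR_DRIVER = "waiting_for_driver"   # matched, awaiting response
    ACCEPTED = "accepted"


@dataclass
class Trip:
    trip_id: int
    passenger_id: str
    status: TripStatus = TripStatus.REQUESTED
    driver_id: Optional[str] = None
    rejected_by: Set[str] = field(default_factory=set)


@dataclass
class Driver:
    driver_id: str
    available: bool = True
    trip_id: Optional[int] = None


class Dispatcher:
    """Hypothetical stand-in for the server plus Driver and Trip tables."""

    def __init__(self, driver_ids: Iterable[str]) -> None:
        self.drivers: Dict[str, Driver] = {d: Driver(d) for d in driver_ids}
        self.trips: Dict[int, Trip] = {}
        self._ids = itertools.count(1)

    def request_trip(self, passenger_id: str) -> int:
        # Create the Trip, try to match, and return the trip_id.
        trip = Trip(next(self._ids), passenger_id)
        self.trips[trip.trip_id] = trip
        self.match(trip.trip_id)
        return trip.trip_id

    def match(self, trip_id: int) -> Optional[str]:
        # Assign the first available driver who has not rejected this trip.
        trip = self.trips[trip_id]
        for driver in self.drivers.values():
            if driver.available and driver.driver_id not in trip.rejected_by:
                driver.available = False
                driver.trip_id = trip_id
                trip.driver_id = driver.driver_id
                trip.status = TripStatus.WAITING_FOR_DRIVER
                return driver.driver_id
        trip.driver_id = None
        trip.status = TripStatus.REQUESTED
        return None

    def poll(self, trip_id: int) -> Trip:
        # The passenger polls to see whether a match has been made.
        return self.trips[trip_id]

    def report_location(self, driver_id: str) -> Optional[Trip]:
        # A location report doubles as a check for an assigned trip.
        driver = self.drivers[driver_id]
        if driver.trip_id is None:
            return None
        return self.trips[driver.trip_id]

    def accept(self, driver_id: str) -> None:
        driver = self.drivers[driver_id]
        if driver.trip_id is None:
            raise ValueError("driver has no assigned trip")
        self.trips[driver.trip_id].status = TripStatus.ACCEPTED

    def reject(self, driver_id: str) -> None:
        # Record the rejection, free the driver, and match again.
        driver = self.drivers[driver_id]
        if driver.trip_id is None:
            raise ValueError("driver has no assigned trip")
        trip = self.trips[driver.trip_id]
        trip.rejected_by.add(driver_id)
        driver.available = True
        driver.trip_id = None
        self.match(trip.trip_id)
```

For example, with `Dispatcher(["d1", "d2"])`, a request is first offered to `d1`; if `d1` calls `reject`, the trip is re-offered to `d2`, and `d1` is never offered the same trip again.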
5. Fault Tolerance and Replication
What if a Driver Location server or Notification server dies? We would need replicas of these servers, so that if the primary dies the secondary can take over. We can also store this data in persistent storage, such as SSDs that provide fast I/O; this ensures that if both the primary and secondary servers die, we can recover the data from persistent storage.
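The recovery-from-persistent-storage idea can be illustrated with a minimal sketch. It assumes an append-only log file as the persistent store and a hypothetical `DriverLocationStore` class; replication to a secondary server is not modeled here, and a real system would also need log compaction and batched writes to keep up with the update rate.

```python
import json
import os
from typing import Dict, Tuple


class DriverLocationStore:
    """Hypothetical in-memory location store backed by an append-only log."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.locations: Dict[str, Tuple[float, float]] = {}

    def update(self, driver_id: str, lat: float, lng: float) -> None:
        # Persist first, then update memory, so a crash cannot lose an
        # update that was already acknowledged.
        record = {"driver_id": driver_id, "lat": lat, "lng": lng}
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
        self.locations[driver_id] = (lat, lng)

    @classmethod
    def recover(cls, log_path: str) -> "DriverLocationStore":
        # Rebuild the in-memory state by replaying the log in order;
        # later records overwrite earlier ones for the same driver.
        store = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    store.locations[record["driver_id"]] = (
                        record["lat"],
                        record["lng"],
                    )
        return store
```

If every in-memory copy is lost, calling `DriverLocationStore.recover(path)` on a fresh server replays the log and restores each driver's last known location.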
6. Ranking
How about if we want to rank the search results not just by proximity but also by popularity or relevance?
How can we return the top-rated drivers within a given radius? Let’s assume we keep track of the overall rating of each driver in our database and QuadTree. An aggregated number can represent this popularity in our system, e.g., how many stars a driver gets out of ten. While searching for the top 10 drivers within a given radius, we can ask each partition of the QuadTree to return its top 10 drivers by rating. The aggregator server can then determine the top 10 among all the drivers returned by the different partitions.
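The per-partition top-k followed by aggregation can be sketched as follows. This is a minimal illustration, not a full QuadTree: each partition is modeled as a plain list of `(driver_id, rating)` pairs that are assumed to have already been filtered to the search radius, and the function names are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

DriverRating = Tuple[str, float]  # (driver_id, rating out of ten)


def top_k_in_partition(drivers: Iterable[DriverRating], k: int) -> List[DriverRating]:
    """Runs on each partition: keep only its k highest-rated drivers."""
    return heapq.nlargest(k, drivers, key=lambda d: d[1])


def aggregate_top_k(
    partition_results: Iterable[List[DriverRating]], k: int
) -> List[DriverRating]:
    """Runs on the aggregator: merge partition results and keep the best k."""
    merged = [d for result in partition_results for d in result]
    return heapq.nlargest(k, merged, key=lambda d: d[1])


# Two hypothetical partitions, already filtered to the search radius.
partitions = [
    [("a", 9.1), ("b", 7.5), ("c", 8.8)],
    [("d", 9.7), ("e", 6.0)],
]
per_partition = [top_k_in_partition(p, 2) for p in partitions]
best = aggregate_top_k(per_partition, 2)
print(best)  # [('d', 9.7), ('a', 9.1)]
```

Because every global top-k driver must also be in the top k of its own partition, each partition only needs to send k candidates to the aggregator, which keeps the merge step small regardless of partition size.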
7. Advanced Issues
- How to handle clients on slow and disconnecting networks?
- What if a client gets disconnected while it is part of a ride? How will we handle billing in such a scenario?
- What if clients pull all the information instead of servers always pushing it?