System Design of a Food Delivery App Like Foodmandu / Uber Eats
A system design breakdown of a scalable food delivery app like Foodmandu or Uber Eats, covering architecture, real-time order tracking, payments, and delivery management for handling large-scale users and high traffic efficiently.
Building a food ordering app sounds deceptively approachable — an item list, a cart, a checkout button. Until you realize the cart needs to hold state across app restarts, the order status needs to update in real time across three different users simultaneously (customer, restaurant, driver), and the whole thing has to work on a 4G connection in Lalitpur during peak lunch hours. I've been through this building a delivery platform, and this post is the full system design walkthrough I wish existed when I started.
This covers the full architecture — database models, real-time order tracking, driver-matching logic, push notifications, menu management, payment integration, and the non-obvious trade-offs at each step. References are Foodmandu for the local market context and Uber Eats for the canonical large-scale benchmark. The implementation examples use Flutter (Dart), Django REST Framework, Firebase, and PostgreSQL — but the design principles apply regardless of stack.
- High-level architecture overview
- User roles and auth design
- Core database models
- Menu and catalog management
- Order lifecycle — from tap to delivery
- Real-time order tracking
- Driver matching and dispatch
- Push notifications across three actors
- Payment integration (Khalti, Stripe, COD)
- Search, filtering, and restaurant discovery
- Scalability and performance
- Lessons learned
01 —High-Level Architecture Overview
Before anything else, it helps to see the full picture. A food delivery platform isn't a single app — it's at least three apps talking to shared infrastructure, with completely different interaction patterns for each user type. The customer browses and orders. The restaurant receives and prepares. The driver picks up and delivers. All three need to stay in sync, often within seconds of each other.
The architecture that handles this without melting down looks roughly like this: a Django REST API as the central business logic layer, PostgreSQL as the primary relational database for orders, menus, and users, Firebase Firestore for real-time order state that all three apps subscribe to, Firebase RTDB for driver GPS location streaming, and FCM for push notifications. Redis sits in front of Django for caching hot menu data and rate limiting.
Django ── writes ──→ Firestore (order state, menu cache)
[ Driver App ] ── GPS stream ──→ Firebase RTDB ──→ broadcast to Customer App
PostgreSQL ← Django (source of truth for all business data)
Redis ← Django (menu cache, session, rate limiting)
Celery ← Django (async tasks: notifications, receipts, analytics)
The key insight here: Firebase handles anything that needs to be pushed in real time (order state, driver location), while Django handles anything that needs to be trusted (payments, order creation, inventory). These are different jobs, and trying to do both with one system always ends badly — either you use Firebase for payments (dangerous) or you try to push order updates from Django over WebSockets (extra infrastructure you don't need).
02 —User Roles and Auth Design
A food delivery platform has four distinct user types, each with completely different permissions and app experiences: Customers, Restaurant Owners / Staff, Delivery Drivers, and Platform Admins. One authentication system needs to serve all of them without leaking permissions across roles.
The approach that works cleanest: Django owns user accounts and issues JWTs with a role claim embedded. Firebase Auth runs in parallel with a custom token that carries the same role claim. This mirrors the GymTaar pattern for good reason — it separates business logic (Django) from real-time transport (Firebase) while keeping a single identity.
import firebase_admin.auth
from django.contrib.auth.models import AbstractBaseUser
from django.db import models

# Django model: single user table with role field
class User(AbstractBaseUser):
    email = models.EmailField(unique=True)
    phone = models.CharField(max_length=20, unique=True)
    role = models.CharField(choices=[
        ('customer', 'Customer'),
        ('restaurant', 'Restaurant'),
        ('driver', 'Driver'),
        ('admin', 'Admin'),
    ], max_length=20)
    firebase_uid = models.CharField(max_length=128, blank=True)
    fcm_token = models.TextField(blank=True)
    is_verified = models.BooleanField(default=False)
    created_at = models.DateTimeField(auto_now_add=True)

# Firebase custom token minting, called after Django login
def mint_firebase_token(user):
    return firebase_admin.auth.create_custom_token(
        user.firebase_uid,
        developer_claims={
            'role': user.role,
            'restaurantId': str(user.restaurant_id) if user.role == 'restaurant' else None,
        }
    )
Phone number authentication matters here more than email, especially for the Nepali market where Foodmandu operates. Most users don't have reliable email but everyone has a phone. OTP via SMS (we used Sparrow SMS for Nepal numbers, Twilio for international) is the primary registration path for customers and drivers. Restaurant owners get email-based accounts because they're onboarded by the platform team manually.
Driver verification is a separate layer on top of auth. A driver can have a valid account but still be in pending_verification state until their license, vehicle registration, and identity documents are reviewed. We store these as a separate DriverProfile model with a verification_status enum. The dispatch system only considers drivers where this status is approved.
03 —Core Database Models
PostgreSQL holds the source of truth for everything that money or business logic touches. The most important tables and their relationships:
restaurants (id, owner_id→users, name, slug, address, lat, lng,
    cuisine_tags, avg_rating, is_open, prep_time_minutes,
    min_order_amount, delivery_radius_km)
menu_categories (id, restaurant_id→restaurants, name, sort_order, is_active)
menu_items (id, category_id→menu_categories, name, description,
    base_price, image_url, is_available, is_veg, prep_time_minutes)
item_variants (id, item_id→menu_items, label, extra_price)
item_addons (id, item_id→menu_items, label, price, is_required)
orders (id, customer_id→users, restaurant_id→restaurants,
    driver_id→users nullable, status, subtotal, delivery_fee,
    discount, total, payment_method, payment_status,
    delivery_address_snapshot, placed_at, confirmed_at,
    picked_up_at, delivered_at)
order_items (id, order_id→orders, item_id→menu_items,
    variant_id nullable, quantity, unit_price, addons_json)
addresses (id, user_id→users, label, lat, lng, line1, landmark, is_default)
payments (id, order_id→orders, gateway, gateway_txn_id,
    amount, status, initiated_at, completed_at)
driver_profiles (id, user_id→users, vehicle_type, license_no,
    verification_status, total_deliveries, rating)
reviews (id, order_id→orders, customer_id→users,
    food_rating, delivery_rating, comment)
Why snapshot the delivery address on the order?
A user's address can change or be deleted after an order is placed. If you store only a foreign key to the address table, historical orders can end up with a null delivery address or — worse — wrong address if the user edited it. Storing a JSON snapshot of the address at order time means the order record is self-contained and auditable forever. Same logic applies to unit_price in order_items — the restaurant might change a price tomorrow, but your order from today should reflect what you actually paid.
Addons as JSON on the order item
Menu addons (extra cheese, no onion, spicy level) are stored as a JSON blob on the order item row rather than a separate join table. This is a deliberate denormalization — at order time you need a frozen snapshot of what the user chose, not a live reference to the addon catalog. If the restaurant later removes an addon from their menu, completed orders shouldn't break.
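The two snapshot ideas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the platform's actual code: the Address and Addon classes and both helper names are hypothetical, and the resulting strings stand in for what would be stored in delivery_address_snapshot and addons_json.

```python
import json
from dataclasses import asdict, dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Address:  # hypothetical stand-in for a row in `addresses`
    label: str
    line1: str
    landmark: str
    lat: float
    lng: float


@dataclass(frozen=True)
class Addon:  # hypothetical stand-in for a row in `item_addons`
    label: str
    price: Decimal


def snapshot_address(address: Address) -> str:
    """Serialize the address as it is right now; later edits cannot change it."""
    return json.dumps(asdict(address), sort_keys=True)


def snapshot_addons(addons: list[Addon]) -> str:
    """Freeze the chosen addons; prices become strings so Decimal survives JSON."""
    return json.dumps([{"label": a.label, "price": str(a.price)} for a in addons])


home = Address("Home", "Jhamsikhel Road", "Near the bakery", 27.6766, 85.3106)
address_snapshot = snapshot_address(home)
addons_snapshot = snapshot_addons([Addon("Extra cheese", Decimal("50.00"))])
```

Because the snapshot is a plain string, deleting the address row or the addon later leaves the order untouched.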
04 —Menu and Catalog Management
Menu data has an interesting read/write pattern. Writes are rare — a restaurant updates their menu maybe a few times a week. Reads are extremely frequent — every customer who opens the app fetches menu data. This asymmetry screams for aggressive caching.
The menu API endpoint returns a nested structure: restaurant info, categories, items within each category, and variants/addons per item. This is a fairly heavy query (several joins) but it's the same query every time for a given restaurant. Redis with a 5-minute TTL handles this cleanly — the restaurant owner edits via the dashboard, which invalidates the cache key, and the next customer request re-warms it.
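from django.core.cache import cache
from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver
from rest_framework.generics import RetrieveAPIView
from rest_framework.response import Response

class MenuAPIView(RetrieveAPIView):
    def retrieve(self, request, restaurant_id):
        cache_key = f'menu:{restaurant_id}'
        cached = cache.get(cache_key)
        # Compare against None so an empty (falsy) payload still counts as a hit
        if cached is not None:
            return Response(cached)
        restaurant = Restaurant.objects.prefetch_related(
            'menu_categories__menu_items__variants',
            'menu_categories__menu_items__addons',
        ).get(id=restaurant_id)
        data = RestaurantMenuSerializer(restaurant).data
        cache.set(cache_key, data, timeout=300)  # 5 min TTL
        return Response(data)

# Invalidate on any menu item change, including deletes.
# Categories, variants, and addons need the same hook.
@receiver([post_save, post_delete], sender=MenuItem)
def invalidate_menu_cache(sender, instance, **kwargs):
    cache.delete(f'menu:{instance.category.restaurant_id}')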
One thing that trips people up: item availability needs to be togglable in near-real-time. A restaurant runs out of momos at 1pm — they need to mark that item unavailable from their dashboard and have it reflect immediately for new orders. The 5-minute cache becomes a problem here. The solution is to separate item availability from the full menu cache — availability flags live in a separate short-TTL cache key (30 seconds), and the app merges both on render.
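The merge step is simple enough to sketch. The version below is written in Python for consistency with the backend examples, although in this design the same logic would run in the client; the payload shape (categories, items, id, is_available) and the function name are assumptions made for illustration.

```python
import copy


def merge_availability(menu: dict, availability: dict) -> dict:
    """Return a copy of the cached menu with fresh availability flags applied.

    `menu` is the long-TTL payload. `availability` maps item id -> bool and
    comes from the short-TTL key. Items missing from it keep the cached flag.
    """
    merged = copy.deepcopy(menu)  # never mutate the cached object itself
    for category in merged["categories"]:
        for item in category["items"]:
            item["is_available"] = availability.get(item["id"], item["is_available"])
    return merged


cached_menu = {
    "categories": [
        {
            "name": "Momo",
            "items": [
                {"id": 1, "name": "Chicken momo", "is_available": True},
                {"id": 2, "name": "Veg momo", "is_available": True},
            ],
        }
    ]
}
rendered = merge_availability(cached_menu, {1: False})
```

The full menu can then keep its 5-minute TTL while the small availability map is refreshed every 30 seconds.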
05 —Order Lifecycle — From Tap to Delivery
This is the heart of the system. An order passes through a well-defined sequence of states, and each transition has side effects — database writes, Firebase updates, push notifications, and sometimes external API calls (payment gateway). Getting this right means modelling the state machine explicitly and never letting it be implicit in if-else chains scattered across your codebase.
pending → payment_verified → confirmed → preparing
→ ready_for_pickup → driver_assigned → picked_up
→ delivered
Cancellation exits:
pending → cancelled_by_customer
confirmed → cancelled_by_restaurant
preparing → cancelled_by_restaurant (rare, with penalty logic)
driver_assigned → driver_unassigned → loops back to ready_for_pickup
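One way to make these transitions explicit is a lookup table that every state change is checked against. The sketch below is an assumption-laden illustration: the early part of the chain (pending, payment_verified, confirmed, preparing) is inferred from the status names used elsewhere in this post, and ALLOWED_TRANSITIONS and assert_can_transition are hypothetical names.

```python
class InvalidTransitionError(Exception):
    """Raised when an order is asked to move to a state it cannot reach."""


# Current status -> statuses it may move to next
ALLOWED_TRANSITIONS = {
    "pending": {"payment_verified", "cancelled_by_customer"},
    "payment_verified": {"confirmed"},
    "confirmed": {"preparing", "cancelled_by_restaurant"},
    "preparing": {"ready_for_pickup", "cancelled_by_restaurant"},
    "ready_for_pickup": {"driver_assigned"},
    "driver_assigned": {"picked_up", "driver_unassigned"},
    "driver_unassigned": {"ready_for_pickup"},
    "picked_up": {"delivered"},
}


def assert_can_transition(current: str, target: str) -> None:
    """Raise unless `target` is a legal next status for `current`."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransitionError(f"Cannot move order from {current} to {target}")
```

Terminal statuses such as delivered have no entry, so any transition out of them is rejected.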
Each transition is a Django service method, not a view and not a signal. Service methods make state transitions easy to test in isolation. The database write runs inside a transaction. The side effects (Firestore write, FCM notification, analytics event) are not part of that transaction, so they are registered with transaction.on_commit and fire only once the new status is durably saved:
from django.db import transaction
from django.utils import timezone

class OrderService:
    @staticmethod
    @transaction.atomic
    def confirm_order(order_id: str, confirmed_by_restaurant: bool = True):
        order = Order.objects.select_for_update().get(id=order_id)
        if order.status != 'payment_verified':
            raise InvalidTransitionError(f'Cannot confirm order in {order.status}')
        order.status = 'confirmed'
        order.confirmed_at = timezone.now()
        order.save(update_fields=['status', 'confirmed_at'])
        # Side effects run after commit, so a rollback never leaves
        # Firestore or the customer ahead of PostgreSQL.
        # Sync state to Firestore for real-time client updates
        transaction.on_commit(lambda: FirestoreService.update_order_state(order))
        # Push notification to customer (via Celery, async)
        transaction.on_commit(lambda: notify_order_confirmed.delay(order_id))
        return order
The select_for_update() is critical. Without it, two concurrent requests (say, a restaurant confirming and a customer cancelling at the exact same second) can both read the same state and both succeed — leaving the order in a broken in-between condition. The row-level lock prevents that.
Idempotency keys for order creation: Customers hitting "Place Order" on a slow network can trigger duplicate submissions. Always generate an idempotency key on the client (UUID) and check for it before creating the order. If the same key hits the API twice within 5 minutes, return the existing order instead of creating a duplicate. This alone saves a surprising number of support tickets.
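A minimal sketch of that check, using an in-memory dictionary where a real deployment would use a database table or Redis key. The class name, method name, and injected clock are all hypothetical, and the 300-second window mirrors the 5 minutes mentioned above.

```python
import time


class IdempotentOrderStore:
    """In-memory stand-in for a table or Redis key indexed by idempotency key."""

    def __init__(self, window_seconds: float = 300, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock
        self._seen = {}  # idempotency_key -> (created_at, order_id)

    def place_order(self, idempotency_key: str, create_order) -> tuple[str, bool]:
        """Return (order_id, created). Replays inside the window reuse the order."""
        now = self._clock()
        hit = self._seen.get(idempotency_key)
        if hit is not None and now - hit[0] < self._window:
            return hit[1], False
        order_id = create_order()
        self._seen[idempotency_key] = (now, order_id)
        return order_id, True


store = IdempotentOrderStore()
first = store.place_order("key-abc", lambda: "order-1")
replay = store.place_order("key-abc", lambda: "order-2")
```

This version is not safe under concurrent requests. With a real database, a unique constraint on the key column is what stops two simultaneous submissions from both creating an order.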
06 —Real-Time Order Tracking
This is where the customer-facing "wow" moment lives. You place an order and immediately see a live timeline: order confirmed, being prepared, driver assigned, on the way, arriving. And on a map, you watch the driver's dot move toward your pin. Making that feel smooth requires two separate real-time streams that serve different purposes.
Order status updates come from Firestore. Every time Django transitions an order, it writes the new state to a Firestore document. The customer app listens to this document with a snapshots() stream. Updates arrive in under a second. The Firestore document is lightweight — just the order ID, current status, timestamps, and driver info:
status: 'confirmed' | 'preparing' | 'ready_for_pickup' | ...
statusUpdatedAt: Timestamp
restaurantName: string
estimatedDeliveryAt: Timestamp
driver: map | null
├── id: string
├── name: string
└── phone: string
participantIds: array ← used in security rules
Driver GPS location comes from Firebase Realtime Database — not Firestore. The driver app writes GPS coordinates every 3–5 seconds while they have an active delivery. RTDB's low-latency streaming is purpose-built for this kind of continuous high-frequency write. Firestore would work but at meaningfully higher cost and slightly higher latency per update.
import 'dart:async';

import 'package:firebase_database/firebase_database.dart';
import 'package:geolocator/geolocator.dart';

// Driver app: stream GPS while delivery is active.
// Returns the subscription so the caller can cancel it when the delivery ends.
StreamSubscription<Position> startLocationStream(String orderId) {
  return Geolocator.getPositionStream(
    locationSettings: const LocationSettings(
      accuracy: LocationAccuracy.high,
      distanceFilter: 10, // only update if moved 10m
    ),
  ).listen((position) {
    FirebaseDatabase.instance
        .ref('driver_location/$orderId')
        .set({
      'lat': position.latitude,
      'lng': position.longitude,
      'bearing': position.heading,
      'updatedAt': ServerValue.timestamp,
    });
  });
}

// Customer app: listen for driver position
Stream<DriverLocation> watchDriverLocation(String orderId) {
  return FirebaseDatabase.instance
      .ref('driver_location/$orderId')
      .onValue
      .map((event) => DriverLocation.fromSnapshot(event.snapshot));
}
The distanceFilter: 10 on the GPS stream is important — it means the RTDB write only fires when the driver has moved at least 10 meters. Without this, a stationary driver sitting at a red light generates continuous writes, burning battery and bandwidth for no visual change on the map.
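On the device the geolocator plugin applies this filter for you. The rule itself is easy to express, for instance as a server-side guard against chatty clients. The sketch below uses the haversine formula; haversine_m and should_publish are hypothetical helper names, and the 10 metre default mirrors the setting above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def should_publish(last, current, min_distance_m: float = 10) -> bool:
    """`last` and `current` are (lat, lng) tuples; `last` is None before the first fix."""
    if last is None:
        return True
    return haversine_m(*last, *current) >= min_distance_m
```

A driver waiting at a red light produces fixes a metre or two apart, so should_publish returns False and no write is made.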
07 —Driver Matching and Dispatch
When an order hits ready_for_pickup, the system needs to find an available driver near the restaurant and assign them. This sounds like a simple "find nearby drivers" query, but there are several failure modes to handle: no driver available, driver accepts then goes offline, driver rejects the offer.
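Drivers publish their availability and current location to RTDB under driver_status/{driverId}. The dispatch query below runs against PostgreSQL, so it assumes those two values are also mirrored onto DriverProfile as current_status and current_location. The dispatch logic runs as a Celery task triggered when an order becomes ready: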
from django.contrib.gis.db.models.functions import Distance
from django.contrib.gis.geos import Point
from django.contrib.gis.measure import D

@app.task(bind=True, max_retries=5)
def dispatch_driver(self, order_id: str):
    order = Order.objects.get(id=order_id)
    # Point takes (x, y) = (lng, lat); SRID 4326 is plain WGS 84 lat/lng
    restaurant_point = Point(order.restaurant.lng, order.restaurant.lat, srid=4326)
    # Get online drivers within 5km, sorted by distance
    candidates = DriverProfile.objects.filter(
        verification_status='approved',
        current_status='available',
        current_location__distance_lte=(restaurant_point, D(km=5)),
    ).annotate(
        distance=Distance('current_location', restaurant_point)
    ).order_by('distance')[:5]
    if not candidates:
        # Retry in 60s: a driver may finish their current delivery
        raise self.retry(countdown=60)
    # Offer to nearest driver first (sequential, not broadcast)
    for driver in candidates:
        accepted = DriverOfferService.send_offer_and_wait(
            driver_id=driver.user_id,
            order_id=order_id,
            timeout_seconds=30,
        )
        if accepted:
            OrderService.assign_driver(order_id, driver.user_id)
            return
    # Every candidate declined or timed out: retry with a fresh driver list
    raise self.retry(countdown=30)
Sequential offers (nearest driver first, wait 30 seconds, then move to next) work better than broadcast offers (ping all nearby drivers simultaneously) at the scale Foodmandu operates. Broadcast sounds faster but leads to multiple drivers heading to the same restaurant, two of them wasting petrol when only one order exists. Sequential with timeout is slightly slower per offer but more efficient in aggregate.
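Stripped of Celery and the database, the sequential strategy reduces to a short loop. This is a framework-free sketch with hypothetical names: offer_sequentially takes driver ids ordered nearest first and any callable that blocks until the driver accepts or the timeout passes.

```python
from typing import Callable, Iterable, Optional


def offer_sequentially(
    driver_ids: Iterable[str],
    send_offer_and_wait: Callable[[str], bool],
) -> Optional[str]:
    """Offer to each driver in order and stop at the first acceptance.

    Returns the accepting driver's id, or None if everyone declined or timed out.
    Drivers after the first acceptance are never contacted.
    """
    for driver_id in driver_ids:
        if send_offer_and_wait(driver_id):
            return driver_id
    return None


contacted = []


def fake_offer(driver_id: str) -> bool:
    """Test double: records who was asked; only driver d2 accepts."""
    contacted.append(driver_id)
    return driver_id == "d2"


winner = offer_sequentially(["d1", "d2", "d3"], fake_offer)
```

The third driver is never pinged, which is the property that keeps extra drivers from heading to the same restaurant.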
Driver location in PostgreSQL uses PostGIS — the spatial extension. The distance_lte filter and Distance annotation come from django.contrib.gis. A spatial index on the driver location column keeps this query fast even with thousands of drivers. Without the index it's a full table scan — unusable at any real scale.
08 —Notifications Across Three Actors
One order creates notifications for three different people: the customer tracking their delivery, the restaurant receiving and managing the order, and the driver accepting and completing it. Each has different urgency levels and different notification types.
The notification matrix looks roughly like this: Customer gets informational updates (confirmed, preparing, on the way, delivered). Restaurant gets action-required alerts (new order in, high-priority sound, must confirm within 5 minutes). Driver gets offer alerts (delivery request, tap to accept) and operational updates (new pickup ready). Restaurant and driver notifications need to be high-priority FCM messages that wake the device — an alert notification on iOS and a PRIORITY_HIGH data message on Android.
from firebase_admin import messaging

def send_new_order_to_restaurant(order: Order):
    restaurant_fcm_token = order.restaurant.owner.fcm_token
    if not restaurant_fcm_token:
        return
    item_count = order.items.count()  # query once, reuse below
    title = '🔔 New Order!'
    body = f'{item_count} items · Rs {order.total}'
    message = messaging.Message(
        data={
            'type': 'new_order',
            'orderId': str(order.id),
            'itemCount': str(item_count),
            'total': str(order.total),
            'customerName': order.customer.full_name,
        },
        android=messaging.AndroidConfig(
            priority='high',
            notification=messaging.AndroidNotification(
                title=title,
                body=body,
                sound='order_alert',  # custom sound in res/raw/
                channel_id='new_orders',
            ),
        ),
        apns=messaging.APNSConfig(
            headers={'apns-priority': '10'},
            payload=messaging.APNSPayload(
                aps=messaging.Aps(
                    # Without an alert, iOS plays the sound but shows no banner
                    alert=messaging.ApsAlert(title=title, body=body),
                    sound='order_alert.caf',
                    badge=1,
                )
            ),
        ),
        token=restaurant_fcm_token,
    )
    messaging.send(message)
The custom notification sound for restaurants matters more than it sounds (pun intended). Restaurant staff are often away from their screen during busy service. A distinct, loud, unmissable audio alert for new orders — different from any other app on the device — is the difference between a confirmed order and a 5-minute delay that cascades into a bad delivery experience.
Token hygiene is non-negotiable. FCM tokens expire silently. Build a system to detect failed sends (the messaging/registration-token-not-registered error code, which the Python Admin SDK raises as messaging.UnregisteredError) and tombstone those tokens in your database. A restaurant that stopped receiving order notifications because their token rotated after an app update is a support nightmare. Refresh tokens on app launch and on the FCM onTokenRefresh callback.
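The tombstoning logic can be sketched without any SDK. Everything here is a stand-in: TokenUnregistered plays the role of the push service's "token not registered" failure, the tokens dictionary plays the role of the fcm_token column, and send is whatever callable actually delivers the message.

```python
class TokenUnregistered(Exception):
    """Hypothetical stand-in for the push service's 'token not registered' error."""


def send_and_prune(tokens: dict, user_id: str, send) -> bool:
    """Send to the user's stored token and tombstone it if the service rejects it.

    `tokens` maps user id -> token, where None means tombstoned.
    Returns True only when the message was handed off successfully.
    """
    token = tokens.get(user_id)
    if not token:
        return False  # nothing to send to until the app re-registers
    try:
        send(token)
    except TokenUnregistered:
        tokens[user_id] = None  # stop retrying a dead token
        return False
    return True


def fake_send(token: str) -> None:
    """Test double: the token 'tok-old' has been rotated away."""
    if token == "tok-old":
        raise TokenUnregistered()


tokens = {"restaurant-1": "tok-old", "restaurant-2": "tok-live"}
first_attempt = send_and_prune(tokens, "restaurant-1", fake_send)
```

A tombstoned token is also a useful signal for support: a restaurant with no live token can be flagged before it misses an order.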
09 —Payment Integration (Khalti, Stripe, COD)
Payment integration in Nepal means you can't just drop in Stripe and call it done. The dominant payment method for food delivery in Kathmandu is Khalti (digital wallet), followed by eSewa, then Cash on Delivery. Stripe and cards are used but secondary. Your payment layer needs to be gateway-agnostic from the start.
The cleanest architecture here is an abstract PaymentGateway protocol in Django with concrete implementations per gateway. The order creation API accepts a payment_method field and routes to the right implementation:
from abc import ABC, abstractmethod
from decimal import Decimal

import requests
from django.conf import settings

class PaymentGateway(ABC):
    @abstractmethod
    def initiate(self, order: Order) -> PaymentInitiateResponse: ...
    @abstractmethod
    def verify(self, txn_id: str, amount: Decimal) -> bool: ...

class KhaltiGateway(PaymentGateway):
    def initiate(self, order):
        # POST to Khalti's /api/v2/epayment/initiate/
        resp = requests.post('https://a.khalti.com/api/v2/epayment/initiate/', json={
            'return_url': f'{settings.BASE_URL}/payments/khalti/callback/',
            'website_url': settings.BASE_URL,
            'amount': int(order.total * 100),  # Khalti uses paisa
            'purchase_order_id': str(order.id),
            'purchase_order_name': f'Order from {order.restaurant.name}',
        }, headers={'Authorization': f'Key {settings.KHALTI_SECRET_KEY}'},
            timeout=10)  # never hang a request on a slow gateway
        resp.raise_for_status()  # surface gateway errors instead of a KeyError below
        return PaymentInitiateResponse(
            payment_url=resp.json()['payment_url'],
            pid=resp.json()['pidx'],
        )

    def verify(self, pidx: str, amount: Decimal) -> bool:
        resp = requests.post('https://a.khalti.com/api/v2/epayment/lookup/',
            json={'pidx': pidx},
            headers={'Authorization': f'Key {settings.KHALTI_SECRET_KEY}'},
            timeout=10,
        )
        resp.raise_for_status()
        data = resp.json()
        return (
            data['status'] == 'Completed'
            and data['total_amount'] == int(amount * 100)
        )
Always verify payment server-side, never client-side. The client tells you "the user paid with Khalti and here's the transaction ID." Your server verifies with Khalti's lookup API that the amount matches and the status is completed before marking the order as payment_verified. Trusting the client for payment verification is how fraud happens.
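The decision itself is a pure function of the gateway's lookup response and your own order total, which makes it easy to unit test. A minimal sketch, assuming the response carries status and total_amount fields as in the Khalti example above; to_paisa and is_payment_verified are hypothetical helper names.

```python
from decimal import ROUND_HALF_UP, Decimal


def to_paisa(amount: Decimal) -> int:
    """Convert rupees to integer paisa without going through float."""
    return int((amount * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def is_payment_verified(lookup: dict, expected_total: Decimal) -> bool:
    """Accept only a completed payment whose amount matches our own order total.

    `lookup` is the parsed response from the gateway's lookup endpoint.
    `expected_total` comes from the server's order row, never from the client.
    """
    return (
        lookup.get("status") == "Completed"
        and lookup.get("total_amount") == to_paisa(expected_total)
    )
```

Comparing against the server's own total is what defeats a tampered client that pays a small amount and then reports the transaction id for a larger order.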
10 —Search, Filtering, and Restaurant Discovery
The home screen of any food delivery app is essentially a filtered, ranked list of nearby restaurants. The discovery API needs to handle: geographic proximity (only show restaurants that deliver to the customer's location), availability (only show open restaurants), cuisine filtering, dietary filters (veg/non-veg), and search by restaurant name or dish name. And it needs to be fast — this is the first screen users see.
PostgreSQL with PostGIS handles geographic queries efficiently. The core discovery query:
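from django.contrib.gis.db.models.functions import Distance
from django.contrib.gis.geos import Point
from django.db.models import F, Q

def get_nearby_restaurants(lat, lng, cuisine=None, search=None):
    user_point = Point(lng, lat, srid=4326)
    qs = Restaurant.objects.filter(
        is_active=True,
        is_open=True,
        # D() cannot wrap an F() expression, so pass the per-row radius as a
        # plain number of metres (assumes a geodetic location field).
        location__distance_lte=(user_point, F('delivery_radius_km') * 1000),
    ).annotate(
        # Distance object; expose it as distance.km in the serializer
        distance=Distance('location', user_point)
    )
    if cuisine:
        qs = qs.filter(cuisine_tags__contains=[cuisine])
    if search:
        qs = qs.filter(
            Q(name__icontains=search) |
            Q(menu_items__name__icontains=search)
        ).distinct()
    return qs.order_by(
        'distance',         # closer first
        '-featured_score',  # paid placement second
        '-avg_rating'       # then by rating
    )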
Dish-level search (finding restaurants that serve "chicken momo") via icontains works fine at small to medium scale. Once you have hundreds of restaurants each with 50-100 menu items, you'll want to move to PostgreSQL's full-text search (SearchVector) or a dedicated search engine like Elasticsearch. The upgrade path from icontains to SearchRank is not difficult — design your serializers so the search field is swappable without changing the API contract.
11 —Scalability and Performance
A food delivery platform has violent traffic spikes — lunch (12–2pm) and dinner (7–9pm) are wildly different from the rest of the day. Your architecture needs to handle 10x normal load during these windows without the system choking or costs exploding.
Database connection pooling
Django opens a new PostgreSQL connection for each request by default (CONN_MAX_AGE is 0). A few hundred concurrent requests at peak means a few hundred connections, and PostgreSQL starts struggling around 100–200 connections (its default max_connections is 100). PgBouncer in transaction pooling mode sits between Django and PostgreSQL, multiplexing hundreds of app connections over a much smaller pool of actual database connections. Add it before your first real peak load event.
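On the Django side, pointing at PgBouncer is a settings change. The fragment below is a sketch with placeholder values: the database name, user, password, and host are assumptions, and 6432 is PgBouncer's default listen port.

```python
# settings.py fragment: name, user, password, and host are placeholders
DATABASES = {
    "default": {
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "NAME": "delivery",
        "USER": "delivery_app",
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",  # PgBouncer, not PostgreSQL directly
        "PORT": "6432",       # PgBouncer's default listen port
        "CONN_MAX_AGE": 0,    # release the connection after each request
        # Server-side cursors do not survive transaction pooling
        "DISABLE_SERVER_SIDE_CURSORS": True,
    }
}
```

DISABLE_SERVER_SIDE_CURSORS matters because in transaction pooling mode consecutive queries from one Django connection may land on different PostgreSQL backends.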
Celery for async work
Every notification, every email receipt, every analytics event, every webhook to a restaurant's POS system — these should be async. Celery with Redis as the broker keeps request handlers lean. A confirm_order API call should return in under 200ms. It can't do that if it's synchronously calling FCM, sending an SMS, and posting to Slack inside the same request.
Firestore document write rate limits
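Firestore documents have a soft limit of roughly one sustained write per second. Short bursts are tolerated, but sustained writes above that rate to a single document cause contention, higher latency, and failed writes. An order document that is written once per status transition (confirmed, preparing, ready, picked up, delivered) stays far below this. The risk comes from writing each field change separately or from several services writing to the same document at once. Keep order state updates coarse-grained: one write per state transition, not one write per field change. And never write to the same document from multiple sources simultaneously.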
Read replicas for analytics
Restaurant owners want dashboards — daily revenue, top-selling items, order volume trends. These are heavy aggregate queries that shouldn't run on your primary PostgreSQL instance. Set up a read replica and route all dashboard/reporting queries there. Django makes this easy via the using('replica') queryset modifier and a database router.
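A database router is a plain Python class with four methods, so it can be sketched without importing Django. The alias replica matches the using('replica') call mentioned above, while the analytics app label and the class name are assumptions for illustration.

```python
class ReplicaRouter:
    """Send reads for reporting apps to the replica and everything else to default."""

    reporting_apps = {"analytics"}  # assumed app label for dashboard models

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.reporting_apps:
            return "replica"
        return "default"

    def db_for_write(self, model, **hints):
        return "default"  # the replica is read-only

    def allow_relation(self, obj1, obj2, **hints):
        return True  # both aliases hold the same data

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == "default"  # schema changes reach the replica via replication
```

The router is activated by listing its dotted path in the DATABASE_ROUTERS setting. Replicas lag slightly behind the primary, which is acceptable for dashboards but not for order or payment reads.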
12 —Lessons Learned
The ones that cost real time:
Model the order state machine explicitly from day one. Every time you add "just one more" implicit state to an if-else chain, the next developer (or you, six months later) has to reverse-engineer it from the code. A proper FSM with documented transitions is documentation that enforces itself.
Never use client-side timestamps for ordering. This applies everywhere: messages, order events, GPS pings. Device clocks are unreliable. A driver's Android phone that hasn't synced NTP in three days can be minutes off. ServerValue.timestamp in RTDB and FieldValue.serverTimestamp() in Firestore exist for this reason.
Snapshot price and address data at order creation. Menu prices change. Discount codes expire. Addresses get edited or deleted. If your order is a collection of live foreign keys, it becomes historically inaccurate the moment anything upstream changes. Freeze the state at order time.
Test the driver-goes-offline scenario exhaustively. A driver accepts an order, drives toward the restaurant, and then their phone dies. Your system needs to handle unassignment gracefully — automatically re-dispatch, notify the customer of the delay, and not leave the order stuck in driver_assigned forever. We have a Celery beat task that checks for orders stuck in non-terminal states with no activity for more than 15 minutes and escalates them.
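The selection rule behind such a watchdog fits in one function. This sketch uses plain dictionaries instead of model instances; the last_activity_at field and the function name are hypothetical, and the terminal status set is taken from the state names used earlier in this post.

```python
from datetime import datetime, timedelta, timezone

TERMINAL_STATUSES = {
    "delivered",
    "cancelled_by_customer",
    "cancelled_by_restaurant",
}


def find_stuck_orders(orders, now, max_idle=timedelta(minutes=15)):
    """Return ids of non-terminal orders with no activity for longer than max_idle.

    Each order is a dict with 'id', 'status', and 'last_activity_at' (aware datetime).
    """
    return [
        o["id"]
        for o in orders
        if o["status"] not in TERMINAL_STATUSES
        and now - o["last_activity_at"] > max_idle
    ]


now = datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc)
orders = [
    {"id": "o1", "status": "driver_assigned", "last_activity_at": now - timedelta(minutes=30)},
    {"id": "o2", "status": "preparing", "last_activity_at": now - timedelta(minutes=5)},
    {"id": "o3", "status": "delivered", "last_activity_at": now - timedelta(hours=2)},
]
stuck = find_stuck_orders(orders, now)
```

A periodic task would run the equivalent database query and hand each returned id to the re-dispatch and customer notification flow.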
Redis for menu cache paid for itself in the first week. The home screen discovery query went from 400–600ms to under 30ms once menu data was cached. That alone improved the session-start experience measurably in user testing.
PostGIS is worth the setup complexity. The driver matching query that would have been a 2-second full table scan with application-level distance math runs in under 20ms with a proper spatial index. Install it from the start, not as an afterthought when you're already debugging performance under load.
Wrapping Up
A food delivery app has more moving parts than almost any other product category — real-time tracking, multi-party notifications, geographic dispatch, payment orchestration, and a state machine that has to stay consistent across slow networks and unreliable device connections. None of it is insurmountably complex, but it requires intentional architecture from the start. Bolting on real-time or payment verification as afterthoughts is where projects fall apart.
The core principles that carry through everything above: keep Django as the source of truth for business data, use Firebase only for what it's actually good at (real-time push and presence), model your state machine explicitly, snapshot mutable data at transaction time, and always use server timestamps. Everything else is implementation detail.
If you're building something similar — a hyperlocal delivery platform, a Foodmandu competitor, or any multi-sided marketplace with real-time logistics — and you want to talk architecture, I'm Nimesh, a freelance developer based in Kathmandu working with Flutter + Django stacks.
📧 regminimesh7@gmail.com | 💬 WhatsApp +977-9814062946 | 🌐 regminimesh.com.np