Neural Networks for Route Management of Driverless Car Fleets

Berlin, and the question nobody in the room could answer cleanly
The IoT Connected Smart Cars & Vehicles Forum in the autumn of 2018 was held in one of those Berlin conference halls that manage to feel both cavernous and airless. The audience skewed European automotive — Tier 1 suppliers, a few OEM research teams, mapping companies, and a surprising number of people from municipal transport authorities who had figured out, correctly, that the interesting fight was not about a single self-driving car but about what happens when you have a thousand of them sharing a city.
That was the talk I had come to give. Almost everyone else on the agenda was focused on the vehicle: perception, sensor fusion, the neural networks that turned camera and lidar frames into "that is a pedestrian." Important work, and largely solved in principle by 2018 even if far from solved in practice. I wanted to talk about the layer above the vehicle — the fleet. If you had a fleet of driverless cars, who decided where they went, how they were routed, and how they coordinated so they did not all pile onto the same optimal street and turn it into a parking lot?
I opened with a question to the room: "You have solved the car. Now solve the thousand." The nervous laughter told me I had the right audience.
The argument: routing a fleet is a learning problem, not a lookup
The instinct in 2018 was to treat routing as a solved, classical problem. Shortest-path algorithms had existed for decades. Navigation apps already did live traffic rerouting. So why bring neural networks and reinforcement learning into it at all?
My answer was that classical routing optimizes one trip against a static or slowly-changing snapshot of the world. A fleet is different in two ways. First, the vehicles are not independent — every routing decision you make changes the traffic that every other vehicle experiences. Route a hundred cars down the "fastest" street and it is no longer the fastest street. Second, demand and conditions shift continuously, and the value of a decision only reveals itself later, when you see whether the rider was picked up on time and whether the network stayed balanced.
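To make that interaction concrete, here is a minimal sketch in Python. It assumes the Bureau of Public Roads volume-delay curve, a standard textbook model of how travel time grows with load, with its conventional parameters of 0.15 and 4. The two streets, their capacities, and every function name are hypothetical illustrations, not something shown in the talk.

```python
def bpr_travel_time(free_flow_min, vehicles, capacity, alpha=0.15, beta=4.0):
    """Travel time in minutes under the BPR volume-delay function."""
    return free_flow_min * (1.0 + alpha * (vehicles / capacity) ** beta)


def greedy_assign(n_cars, routes):
    """Send every car down the route that is fastest when the road is empty."""
    best = min(routes, key=lambda r: routes[r]["free_flow_min"])
    return {r: (n_cars if r == best else 0) for r in routes}


def load_aware_assign(n_cars, routes):
    """Assign cars one at a time to whichever route is fastest given current load."""
    load = {r: 0 for r in routes}
    for _ in range(n_cars):
        best = min(
            routes,
            key=lambda r: bpr_travel_time(
                routes[r]["free_flow_min"], load[r] + 1, routes[r]["capacity"]
            ),
        )
        load[best] += 1
    return load


def mean_time(load, routes):
    """Average travel time per car for a given assignment."""
    total_cars = sum(load.values())
    total_minutes = sum(
        n * bpr_travel_time(routes[r]["free_flow_min"], n, routes[r]["capacity"])
        for r, n in load.items()
    )
    return total_minutes / total_cars


# Hypothetical streets: the main street is faster when empty.
ROUTES = {
    "main_street": {"free_flow_min": 10.0, "capacity": 40},
    "side_street": {"free_flow_min": 12.0, "capacity": 40},
}

if __name__ == "__main__":
    for name, assign in [("greedy", greedy_assign), ("load-aware", load_aware_assign)]:
        load = assign(100, ROUTES)
        print(name, load, round(mean_time(load, ROUTES), 1))
```

Under these assumed numbers, sending all hundred cars down the nominally faster street produces a much worse average trip than spreading them by current load. The load-aware rule here is still a simple heuristic, not a learned policy; it only illustrates why the "fastest" street is not a fixed fact.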
That combination — decisions that interact, and rewards that arrive with a delay — is precisely the shape of a reinforcement-learning problem. You have a state (where the vehicles are, where the demand is, how the roads are flowing), a set of actions (dispatch, reroute, reposition an idle car toward predicted demand), and a reward signal that you can only evaluate over time (rider wait, vehicle utilization, total network congestion). Deep neural networks entered the picture because the state space of a whole city is far too large to hold in a lookup table; you need a function approximator to generalize across situations you have never seen exactly before.
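As a rough sketch of that framing, the state, actions, and reward could be written down as plain Python data structures. Every field name, the zone-level granularity, and the reward weights below are assumptions chosen for illustration; a real system would carry far richer state and would tune the weights against business goals.

```python
from dataclasses import dataclass
from enum import Enum


class ActionKind(Enum):
    DISPATCH = "dispatch"      # send a vehicle to a waiting rider
    REROUTE = "reroute"        # change the path of a vehicle already moving
    REPOSITION = "reposition"  # move an idle vehicle toward predicted demand


@dataclass(frozen=True)
class FleetState:
    idle_by_zone: dict[str, int]            # where the idle vehicles are
    open_requests_by_zone: dict[str, int]   # where the demand is
    speed_ratio_by_zone: dict[str, float]   # current speed / free-flow speed


@dataclass(frozen=True)
class Action:
    kind: ActionKind
    vehicle_id: str
    target_zone: str


@dataclass(frozen=True)
class Outcome:
    mean_wait_min: float     # rider wait, observed only after the fact
    utilization: float       # fraction of fleet time with a rider on board
    congestion_index: float  # 0.0 means free flow across the network


def reward(outcome, w_wait=1.0, w_util=10.0, w_cong=5.0):
    """Scalar reward: pay for utilization, penalize waiting and congestion.

    The weights are hypothetical placeholders.
    """
    return (
        w_util * outcome.utilization
        - w_wait * outcome.mean_wait_min
        - w_cong * outcome.congestion_index
    )
```

The reward is computed from an `Outcome`, not from the state at decision time, which reflects the delay described above: the numbers that score a decision only exist once the consequences have played out.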
I was careful to be honest about the state of the art. In 2018, deep reinforcement learning had produced genuinely stunning results in games and simulation, and much shakier results in the physical, safety-critical world. So I did not stand up in Berlin and claim we could turn an RL agent loose on live traffic. The responsible architecture was layered: learn and evaluate policies in high-fidelity simulation, use supervised and self-supervised learning on real fleet logs to predict demand and travel times, and keep the online, in-the-world component conservative and bounded, with classical safety constraints wrapped around anything a learned policy proposed.
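One way to picture "classical safety constraints wrapped around anything a learned policy proposed" is a thin gate between the policy and the fleet. The sketch below is an assumption about what such a gate could look like; the proposal format, the cap of 25 vehicles, and the closed-zone list are all hypothetical.

```python
MAX_VEHICLES_PER_MOVE = 25          # hypothetical cap on one repositioning order
CLOSED_ZONES = {"harbor_tunnel"}    # hypothetical zones closed to the fleet


def within_move_cap(proposal):
    """Reject orders that move too many vehicles at once."""
    return proposal["vehicles"] <= MAX_VEHICLES_PER_MOVE


def zone_is_open(proposal):
    """Reject orders that target a closed zone."""
    return proposal["zone"] not in CLOSED_ZONES


def bounded_action(proposed, fallback, constraints):
    """Return the learned proposal only if every hard constraint accepts it.

    Otherwise return the conservative fallback action unchanged.
    """
    if all(check(proposed) for check in constraints):
        return proposed
    return fallback


if __name__ == "__main__":
    constraints = [within_move_cap, zone_is_open]
    hold = {"zone": "current", "vehicles": 0}   # fallback: leave vehicles in place
    learned = {"zone": "east", "vehicles": 40}  # what a learned policy might propose
    print(bounded_action(learned, hold, constraints))
```

The constraints are ordinary, testable functions that know nothing about the model. That separation is the point: the learned component can be retrained or replaced without touching the rules that bound it.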
Why enterprises cared: the economics live in the fleet, not the car
The point that landed hardest with the commercial people in the room was that the money in autonomous mobility was never going to be in the individual vehicle. It was in fleet utilization. A car that drives itself flawlessly but sits idle 60% of the day, or deadheads across the city empty to reach its next rider, is an economic failure no matter how good its perception stack is.
So fleet routing was where the return on investment lived, and it was a problem that got harder, not easier, as the fleet grew. A neural policy that learned to anticipate demand — to reposition idle vehicles toward the district where the evening rush was about to start, before the requests arrived — was worth more than any incremental improvement in a single car's driving. I framed it as the difference between a very good driver and a very good dispatcher. We had spent a decade building the driver. The dispatcher was the open problem.
A concrete example: repositioning before the rush
The example I walked through was a simplified evening commute. Requests in a business district collapse to near zero by seven, then a wave builds in the entertainment districts an hour later. A naive fleet waits for requests and reacts, which means empty cars scramble across town after demand has already appeared, riders wait, and half the fleet ends up in the wrong place.
A fleet trained with reinforcement learning on months of historical demand learns the pattern without being told it explicitly. It learns that idle vehicles in the emptying business district have higher long-term value if they drift toward the entertainment districts now, at a modest immediate cost, because the reward that arrives twenty minutes later is large. That is a classic delayed-reward trade-off, and it is exactly the kind of decision that a myopic shortest-path system cannot make and a learned value function can.
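The trade-off can be shown with a few lines of arithmetic. This is a minimal sketch using a standard discounted return; the reward sequences, the five-minute time step, and the discount factor of 0.95 are made-up numbers for illustration, not figures from a real fleet.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# One entry per five-minute step, so index 4 is twenty minutes out.
# Hypothetical rewards for one idle car in the emptying business district.
STAY = [0.0, 0.0, 0.0, 0.0, 2.0]            # wait in place, pick up a small fare later
REPOSITION = [-1.0, -1.0, 0.0, 0.0, 10.0]   # pay to drive empty now, larger fare later

if __name__ == "__main__":
    for gamma in (0.0, 0.95):
        stay = discounted_return(STAY, gamma)
        move = discounted_return(REPOSITION, gamma)
        choice = "reposition" if move > stay else "stay"
        print(f"gamma={gamma}: stay={stay:.2f} reposition={move:.2f} -> {choice}")
```

With a discount factor of zero the decision-maker sees only the immediate cost of driving empty and stays put, which is the myopic behavior. With a discount factor near one the later fare outweighs that cost and repositioning wins.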
But — and I spent real time on this — you cannot deploy that policy blind. I stressed interpretability and the discipline we were starting to call MLOps: you need to be able to ask the routing model why it just sent forty cars east, monitor it for the drift that comes when a new stadium or a construction project rewrites the city's demand map, and hold a tested fallback policy ready for the day the learned one behaves strangely. A fleet is a safety-critical, revenue-critical production system. The model is one component in it, and it has to be observable like every other component.
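A very small version of that monitoring loop might compare where demand is arriving now against where it arrived in the training data, and hand control to the fallback when the two diverge. The sketch below uses total variation distance between zone-level demand shares; the zone names, request counts, and the 0.2 threshold are assumptions for illustration only.

```python
def zone_shares(counts):
    """Convert request counts per zone into shares that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no requests observed; cannot compute demand shares")
    return {zone: n / total for zone, n in counts.items()}


def total_variation(baseline_counts, recent_counts):
    """Total variation distance between two demand distributions (0 to 1)."""
    p = zone_shares(baseline_counts)
    q = zone_shares(recent_counts)
    zones = set(p) | set(q)
    return 0.5 * sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in zones)


def choose_policy(baseline_counts, recent_counts, threshold=0.2):
    """Use the learned policy unless demand has drifted past the threshold."""
    drift = total_variation(baseline_counts, recent_counts)
    return "fallback" if drift > threshold else "learned"


if __name__ == "__main__":
    baseline = {"business": 500, "entertainment": 500}
    # Hypothetical week after a new stadium opens in a zone unseen in training.
    recent = {"business": 300, "entertainment": 300, "stadium": 400}
    print(total_variation(baseline, recent), choose_policy(baseline, recent))
```

A check like this says nothing about why the model sent forty cars east, so it complements interpretability tooling and does not replace it. What it does provide is a cheap, observable signal that the city the model learned from is no longer the city it is operating in.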
The bridge to now
I left Berlin convinced that the hard, valuable work in autonomous systems was coordination: many models, in production, learning continuously from real outcomes, and accountable enough that a human could interrogate any decision. The tools of 2018 made that more of an aspiration than something they could fully deliver, and I said so on stage.
Those same principles — models running in production, learning from what actually happens, and explaining the decisions they make — are what we now operationalize at StudioX as autonomous AI workers and Missions. The domain has moved well beyond fleets, but the discipline of coordinating many accountable, continuously-learning decisions is the throughline from that Berlin hall to today.