Why Routing Is Hard
Routing a fleet of vehicles to serve a set of geographically distributed demand points is deceptively simple to describe and notoriously difficult to solve well. The problem is not just about finding short paths, it is about simultaneously deciding which vehicle serves which demand, in what order and subject to how much each vehicle can carry, how long each route may take and how much the whole operation costs.
This family of problems known as the Capacitated Vehicle Routing Problem (CVRP) and its many variants, is NP-hard. As the number of demand points and vehicles grows, the space of possible assignments and orderings grows rapidly, making exhaustive search computationally infeasible at realistic scale.
PARCEL, short for Parallel Assignment and Routing with Clustered Ensemble Logic, is a heuristic framework that explicitly accepts this tradeoff. It produces a feasible solution quickly rather than guaranteeing an optimal one through a principled sequence of stages: geographic preprocessing, demand-aware seeding, greedy construction, multi-stage repair and mode-specific refinement. Every decision is traceable to a local score, a threshold, or an objective formula. Nothing is a black box.
This article walks through the framework’s design, the foundations of each stage, and the reasoning behind the choices made at every step.
The Problem, Formally
Let C = {c₁, …, cₙ} be a set of demand clusters, and F = {f₁, …, fₘ} a heterogeneous fleet of vehicles, where each vehicle f ∈ F has a capacity cap(f).
Each cluster c ∈ C carries a demand qc (the number of service units it represents) and a geographic centroid µc.
A routing oracle R maps an ordered sequence of stops ρ to two real-valued metrics:
where t(ρ) is the travel duration in seconds and ℓ(ρ) is the total distance in meters, both computed over a real road network.
The goal is to partition C into a set of bundles B = {B₁, …, Bₖ}, where each bundle Bᵢ = (Cᵢ, ρᵢ, Dᵢ, fᵢ) consists of an assigned subset of clusters Cᵢ, a stop sequence ρᵢ, a total demand Dᵢ = qc, and an assigned vehicle fᵢ.
Two constraints are hard — they cannot be violated:
Capacity:
Duration:
Everything else from route compactness, utilization efficiency, total distance to fleet size is a soft preference that influences construction and refinement but does not render a solution infeasible.
Stage 1: Geographic Preprocessing with DBSCAN
Before any routing decisions are made, raw demand points are grouped into clusters using DBSCAN (Density-Based Spatial Clustering of Applications with Noise). This stage exists for a fundamental reason: routing algorithms that operate on individual points scale poorly while those that operate on geographic groups scale gracefully. Clustering trades some fine-grained optimality for a dramatic reduction in complexity.
DBSCAN was chosen over alternatives such as k-means or agglomerative clustering for three reasons.
- First, it does not require specifying the number of clusters in advance; it discovers structure from the data itself.
- Second, it supports arbitrary cluster shapes, which matters when demand is distributed along roads or neighborhood boundaries rather than in neat circular patterns.
- Third, it naturally identifies noise points, isolated demand locations that do not belong to any dense group, which PARCEL can handle separately through its rescue mechanism.
The algorithm operates over Haversine distance, the great-circle distance between two geographic coordinates:
with R (Earth radius) = 6,371 km. Haversine distance is not the same as road-network distance, but it is fast to compute and monotonically consistent enough to rank nearby points. It is used throughout the early stages of PARCEL precisely because it is cheap, deterministic and adequate for local geometric reasoning. Road-network distances are reserved for feasibility checking and final cost evaluation where accuracy is required the most.
DBSCAN takes two parameters: ε, the neighborhood radius (in kilometers), and minPoints, the minimum number of points required to form a dense region. Increasing ε creates larger, sparser clusters; decreasing it produces smaller, denser clusters and more noise. Each resulting cluster is enriched with its total demand and its geographic centroid before being passed to the next stage.
Stage 2: Seeding — Ordering Matters More Than It Looks
Greedy algorithms are path-dependent. The order in which clusters are considered during route construction directly determines which clusters consume fleet capacity early and which are left to compete for whatever remains. A naive insertion order, say alphabetical or arbitrary can strand large or remote clusters at the end, when no feasible vehicle remains to serve them.
PARCEL’s seeding stage computes a priority score for each cluster before construction begins, so that the most difficult-to-place clusters are handled first.
Two signals are combined.
Demand pressure measures how large a cluster is relative to the largest vehicle available:
A cluster that fills an entire vehicle scores 1. Smaller clusters score proportionally. Large clusters are prioritized because they have fewer feasible vehicles and fewer feasible insertion opportunities — the more they compete, the worse they fare.
Geographic isolation measures how far a cluster is from all other clusters on average:
Remote clusters are prioritized because they are costly to attach to routes later, and because nearby routes constructed from denser neighborhoods may consume all available capacity, leaving the isolated cluster unserviceable.
These two signals are combined into a single weighted seed score:
where is isolation normalized by the maximum isolation observed in the instance, and the weights are set to wᵈ = 0.6 and wⁱ = 0.4.
The weight rationale is deliberate. Capacity is a hard constraint; a cluster that exceeds vehicle capacity cannot be placed regardless of its isolation. Demand pressure therefore receives the higher weight because a capacity violation cannot be negotiated. Isolation is still significant because a geographically stranded cluster that is not surfaced early may become permanently unserviceable once nearby vehicles fill up. The 60/40 split is a calibrated heuristic choice, not a mathematically derived optimum, and should be revisited against benchmark data.
Stage 3: Greedy Bundle Construction
With clusters ordered by seed score, construction proceeds greedily. At each step, the algorithm considers the next unassigned cluster and asks:
which open bundle, if any, should receive it?
At most K bundles may be open simultaneously. K is a width parameter that controls construction breadth/parallelism. A small K encourages tight packing; clusters must fit into few open bundles or trigger the opening of a new one. A larger K increases flexibility by keeping more bundles in competition at each step.
K is a ceiling, not a target: new bundles are opened only when existing open bundles are not acceptable.
When a new bundle is opened, the framework assigns a vehicle using a best-fit policy: the smallest available vehicle that can satisfy the cluster’s demand. This preserves larger vehicles for larger future demands, a well-established strategy in bin-packing heuristics. A largest-first policy would reduce early allocation failures but waste high-capacity vehicles.
Bundles are sealed, removed from further competition, once they reach a sufficient fill ratio θ. A bundle with fill ratio is considered full enough that adding more clusters risks creating infeasible routes. This prevents overpacking and reduces unnecessary feasibility checks against routes that are already close to their limits.
Stage 4: Local Fit Scoring
During construction, every candidate (cluster, bundle) pair is scored to select the best insertion. This local fit score must be fast, evaluating road-network routes for every pair at every step would be computationally expensive and prohibitive.
The local score combines three signals:
Route insertion cost Δr(c, B) estimates the geometric cost of inserting cluster c into bundle B using Haversine distance. It captures how much the route would need to detour to include the new cluster without calling the routing oracle.
Capacity slack measures how full the bundle would become after insertion:
The slack penalty is:
A nearly full bundle scores near zero on this term, rewarding high utilization. This incentivizes packing: clusters should fill vehicles efficiently, not scatter across many lightly loaded routes.
Centroid proximity d(µc, µB) measures the geographic distance between the cluster’s centroid and the bundle’s centroid. It penalizes long-distance attachments that would create spatially incoherent routes.
These three signals are combined:
with weights wᵣ = 0.5, wₛ = 0.3, wₚ = 0.2.
The weight ordering reflects their relative influence on route quality. Route insertion cost receives the largest share because it most directly predicts how much the final route will grow; a geometrically expensive insertion now propagates into final distance. Capacity slack is second because utilization affects how many vehicles will ultimately be needed, but geometry should dominate over packing. Centroid proximity is third because it is partly redundant with insertion cost — a geographically distant cluster will already have high insertion cost but adds a coarser signal about overall route cohesion.
Crucially, local scores are not final objectives. A locally good insertion can lead to a suboptimal solution, and a globally better arrangement may require a locally worse intermediate step. The local score answers only:
“Should this cluster enter this bundle now?”
The objective function evaluated later, asks:
“How good is the complete solution?”
Stage 5: Feasibility Checking
Before any bundle is accepted, it must pass two hard feasibility tests.
Capacity feasibility is binary and deterministic:
If demand exceeds vehicle capacity, the candidate is rejected immediately, without consulting the routing oracle.
Duration feasibility requires a road-network query:
Here, Tₘₐₓ is the maximum allowable route duration. This is a hard operational deadline — routes that run too long cannot be accepted regardless of their distance.
An important design choice: distance is not used as a feasibility criterion. The reason is that a short route through slow roads may take longer than a longer route on faster roads. Duration is the operationally meaningful constraint, it determines whether service is completed within the required window. Distance is used only in the cost objective, where it serves as a proxy for fuel and operating cost.
The road-network oracle is invoked only when capacity feasibility passes. This avoids expensive routing calls for obviously infeasible candidates.
Stage 6: Objective Functions
Once a solution is constructed, it is evaluated according to one of three mode-specific objectives.
Cost mode minimizes total route distance, used as a proxy for fuel and vehicle operating cost (to be revisioned):
The unit is kilometers. Distance is an appropriate cost proxy because fuel consumption, wear, and mileage costs are all roughly linear in distance and additive across vehicles, even vehicles running in parallel accumulate separate costs. A more precise formulation would multiply by a fuel rate rᶠ, yielding zᶠᵘᵉˡ = rᶠ · Σ ℓB / 1000, but the structure remains the same.
Fleet mode minimizes the number of active bundles:
This approximates the number of vehicles deployed. Minimizing fleet size reduces fixed vehicle costs, driver headcount, and administrative complexity.
Hybrid mode combines both:
The coefficient 3600 converts fleet count into seconds (to be revisioned), a heuristic scaling factor that makes the two terms comparable in magnitude. The hybrid value is meaningful only as a ranking score, not as a physical quantity with interpretable units.
A more complete economic objective would integrate all operational cost components:
where rᶠ is fuel cost per kilometer, rᵈ is driver wage per second, rᵥ is fixed vehicle usage cost, rₘ is maintenance cost per kilometer, and rₑ is an emissions penalty. This formulation would make cost objectives interpretable in monetary terms and allow operators to calibrate the tradeoff between operating cost and fleet utilization based on real-world rates.
Stage 7: Rescue and Recovery
Greedy construction inevitably fails for some clusters. Early allocation decisions may consume capacity in ways that leave later clusters with no feasible vehicle. Rather than immediately rejecting unplaceable clusters or rebuilding the entire solution from scratch, PARCEL applies a staged rescue process that attempts increasingly invasive local repairs.
Geographic split rescue divides a cluster between the two nearest bundle centroids. If no single bundle can absorb the cluster, perhaps two nearby bundles each absorb a portion. This is most appropriate in fleet-oriented modes where distributing demand across existing routes preserves vehicle count.
Resequencing attempts to insert the cluster into an existing bundle and rebuild its stop sequence from scratch. This is less invasive than eviction, existing clusters remain in place but it allows a fresh geometric arrangement to pass duration feasibility where the previous sequence failed.
Evict-swap sacrifices the worst-fitting cluster currently in a bundle to make room for the incoming cluster, then attempts to rehome the evicted cluster elsewhere. This is a controlled local exchange: one displacement for one placement, with a secondary placement attempt.
Failure escalation applies when all repair stages fail. The cluster is ejected and the framework attempts to open a new bundle on a fresh vehicle. If no vehicle remains available, the cluster is recorded as unserviced.
The multi-stage structure is preferable to both immediate rejection and full reconstruction. Immediate rejection wastes potentially recoverable capacity. Full reconstruction is computationally expensive and discards the work already done. Staged repair preserves the bulk of the constructed solution while allowing controlled, local modifications targeted at specific failures.
Stage 8: Refinement
After construction and rescue, refinement improves the solution without risking feasibility.
Consolidation (in fleet mode) attempts to merge pairs of bundles when the combined demand fits within a single vehicle. Merges are accepted when the resulting bundle remains both capacity- and duration-feasible, directly reducing fleet size.
Local search (in cost and hybrid modes) tries cluster relocations and bundle merges evaluated against the final objective. A relocation moves a cluster from its current bundle to a better-fitting alternative. A merge combines two bundles and accepts the result only if it improves the objective. Rather than using local fit scores, the final objective ensures that refinement is aligned with what the solution is actually being optimized for.
Refinement is deliberately separated from construction. Construction prioritizes feasibility and speed; refinement improves the result once a complete candidate exists. This coarse-to-fine structure is standard in heuristic routing, and it keeps each phase focused on a single concern.
What PARCEL Does and Does Not Claim
PARCEL is a heuristic framework, not an exact solver. It does not guarantee optimal solutions. What it does provide is:
- Feasibility by design: no solution it produces violates capacity or duration constraints.
- Explainability: every decision is traceable to a formula, a weight, or a threshold.
- Modularity: each stage: clustering, seeding, construction, rescue, refinement and routing are independent and replaceable.
- Extensibility: the framework is designed to accept richer objectives, metaheuristic search layers, and calibrated economic models as engineering investment grows.
The design acknowledges its own limitations honestly. Hard-coded weights — 0.6/0.4 for seeding, 0.5/0.3/0.2 for local scoring, 0.5/0.5 for the hybrid objective are heuristic starting points, not empirically calibrated constants. A rigorous system would derive these from benchmark experiments, sensitivity analysis, and operator feedback.
The Road Ahead
The framework’s roadmap identifies several directions for future development, in roughly increasing order of complexity.
Economic cost modeling replaces the implicit fuel-rate-of-1 with real calibrated rates for fuel, driver wages, maintenance, and emissions, producing objectives with genuine monetary interpretation.
Makespan optimization adds zₘₐₖₑₛₚₐₙ = max_B t(ρ_B), minimizing the time until the last vehicle completes its route rather than the sum across all vehicles.
Multi-objective optimization exposes the Pareto frontier between distance, fleet size, duration, and service quality, giving operators a principled way to select operating policies.
Metaheuristic search: genetic algorithms, large neighborhood search andsimulated annealing can be layered on top of the existing construction pipeline to explore broader solution neighborhoods than greedy construction allows.
Route metric caching reduces repeated road-network queries by storing computed (duration, distance) pairs for previously evaluated stop sequences, dramatically accelerating refinement and search.
Each of these extensions is compatible with PARCEL’s modular architecture. The framework is deliberately designed to evolve not to be replaced.
Summary
Good routing frameworks separate what they know from what they estimate, what is hard from what is soft, and what is fast from what is accurate. PARCEL does all three.
Duration is used for feasibility because it captures whether service is completed in time, the operationally meaningful deadline. Distance is used for cost because fuel and wear are proportional to kilometers traveled, even across vehicles running simultaneously. Haversine geometry is used for local heuristics because it is fast and monotonic. Road-network metrics are reserved for the decisions where accuracy matters most.
The seeding weights (0.6 demand, 0.4 isolation) reflect the primacy of capacity as a hard constraint. The local scoring weights (0.5 insertion, 0.3 slack, 0.2 proximity) reflect the dominance of route geometry over packing efficiency over spatial cohesion. The rescue stages reflect the value of preserving partial work over the cost of full reconstruction.
None of these choices are arbitrary. Each one is a defensible response to a specific structural problem in vehicle routing. The ability to point at any decision and explain why it was made is arguably, the framework’s most valuable contribution.
PARCEL is a proof-of-concept heuristic framework. All weights, thresholds, and parameter values referenced in this article are current defaults intended as calibration baselines, not empirically optimized constants.
