TITLE
Parallel Multi-Agent Path Planning for Coordinated Robotic Manipulation
Dylan Sun (bdsun), Jongsun Park (jongsunp)
URL
SUMMARY OF PROGRESS
We now have a working end-to-end planning pipeline for our simplified MAPF setting. The current implementation includes a configurable benchmark generator, time-indexed collision detection with vertex and edge conflicts, a space-time A* low-level planner, SVG visualization for benchmark instances and solutions, correctness tests, and a benchmark harness that runs on both local machines and PSC. We also added metrics such as runtime, total cost, total time, repair iterations, A* calls, states expanded/generated, and simple path-quality indicators like wait time and detours for the final robot.
The biggest change from our original proposal is that we are no longer using traditional exact CBS as the active baseline. We implemented it, but it was not practical even on relatively modest instances: the high-level search tree grew too quickly and exact branching was too expensive to use as the foundation for our parallel study. In response, we replaced that baseline with a greedy repair planner that repeatedly detects conflicts, replans one robot at a time, and guarantees termination with a repair cap. This gives us a working sequential baseline that is fast enough to benchmark and parallelize.
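The repair loop can be sketched as follows. This is a minimal, self-contained illustration, not our actual code: the names (`first_conflict`, `greedy_repair`, `repair_cap`) are ours for the sketch, the detector only checks vertex conflicts, and the "repair" is a toy wait-insertion standing in for a full space-time A* replan under new constraints.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

using Path = std::vector<int>;  // one cell id per timestep

struct Conflict { int robot_a; int robot_b; int t; };

// Robots are assumed to wait at their final cell after their path ends.
static int pos_at(const Path& p, int t) {
    return t < (int)p.size() ? p[t] : p.back();
}

// Earliest vertex conflict: two robots in the same cell at the same time.
// (The real detector also checks edge-swap conflicts.)
static std::optional<Conflict> first_conflict(const std::vector<Path>& paths) {
    int horizon = 0;
    for (const auto& p : paths) horizon = std::max(horizon, (int)p.size());
    for (int t = 0; t < horizon; ++t)
        for (int i = 0; i < (int)paths.size(); ++i)
            for (int j = i + 1; j < (int)paths.size(); ++j)
                if (pos_at(paths[i], t) == pos_at(paths[j], t))
                    return Conflict{i, j, t};
    return std::nullopt;
}

// Toy repair: make the robot wait one step just before the conflict.
// The actual planner instead reruns space-time A* under a new constraint.
static void repair_by_waiting(Path& p, int t) {
    int at = std::min(std::max(t - 1, 0), (int)p.size() - 1);
    p.insert(p.begin() + at, p[at]);
}

// Greedy repair loop: fix the earliest conflict by replanning one robot,
// bounded by repair_cap so termination is guaranteed.
bool greedy_repair(std::vector<Path>& paths, int repair_cap) {
    for (int iter = 0; iter < repair_cap; ++iter) {
        auto c = first_conflict(paths);
        if (!c) return true;                    // collision-free solution
        repair_by_waiting(paths[c->robot_a], c->t);
    }
    return false;                               // cap reached
}
```

The cap is what turns an otherwise potentially non-terminating heuristic into a usable baseline: the loop either returns a collision-free solution or reports failure after a bounded number of replans.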
WORK COMPLETED SO FAR
- Implemented grid-based benchmark generation with fixed seeds and stored benchmark cases for few, medium, and abundant instances.
- Implemented sequential A* in space-time with time-indexed vertex and edge constraints.
- Implemented time-step collision detection for vertex conflicts and edge swaps.
- Built a sequential greedy repair baseline that returns collision-free paths and records detailed planning statistics.
- Built benchmark and visualization infrastructure: input SVGs, solution SVGs, benchmark text output, and correctness tests.
- Implemented a first OpenMP parallel version that evaluates repair tasks generated from the first K conflicts in parallel.
- Created a separate sequential _seq benchmark path so we can compare PSC sequential and PSC parallel results fairly on the same machine.
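To make the time-step collision check concrete, here is a sketch of the two conflict predicates (illustrative helper names, assuming robots wait at their goal cell once their path ends; not our exact implementation):

```cpp
#include <vector>

using Path = std::vector<int>;  // one cell id per timestep

// Robots are assumed to wait at their last cell once their path ends.
static int cell_at(const Path& p, int t) {
    return t < (int)p.size() ? p[t] : p.back();
}

// Vertex conflict: both robots occupy the same cell at time t.
bool vertex_conflict(const Path& a, const Path& b, int t) {
    return cell_at(a, t) == cell_at(b, t);
}

// Edge-swap conflict: the robots exchange cells between t and t+1,
// i.e. they would pass through each other on the same edge.
bool edge_conflict(const Path& a, const Path& b, int t) {
    return cell_at(a, t) == cell_at(b, t + 1) &&
           cell_at(b, t) == cell_at(a, t + 1);
}
```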
CHANGES FROM THE ORIGINAL PLAN
The main change is the algorithmic baseline. Our proposal centered around sequential CBS and then multiple parallel CBS variants. In practice, exact CBS was too expensive for the benchmark sizes we want to study. Even before parallelization, exact high-level branching was already overwhelming the search on cases such as few_16. That meant CBS was not giving us a useful platform for studying parallel speedup, because the sequential algorithm itself was not robust enough for the benchmark regime we care about.
Because of that, we shifted to a greedy repair formulation. This is a real change from the proposal, and it also changes the story of the project. Instead of parallelizing an exact optimal CBS implementation, we are now studying a more practical heuristic planner that still has clear conflict-resolution structure, but is less dominated by a global high-level search tree. We think this is a reasonable adjustment for a parallel systems project: it lets us focus on exposing and measuring useful concurrency instead of spending most of the project trying to stabilize a sequential exact solver.
PRELIMINARY RESULTS
We have preliminary sequential and parallel numbers on PSC for our current top-K greedy repair implementation. A few representative sequential runs are shown below.
| Case | Workers | Runtime | Repair Iterations | A* Calls | States Expanded |
|---|---|---|---|---|---|
| few_8_seq | 1 | 10.290 ms | 1 | 10 | 9,793 |
| medium_16_seq | 1 | 3867.740 ms | 95 | 216 | 3,188,676 |
| medium_32_seq | 1 | 16757.627 ms (local) / 17395.536 ms (PSC) | 90 | 290 | 12,914,198 |
For PSC scaling on medium_32, we observed that runtime dropped from about 17.4 s at 1 thread to about 8.2 s at 16 threads, but then largely flattened: 32, 64, and 128 threads all stayed near 8.5 s. This is only about a 2.1x speedup at 16 threads and does not scale well beyond that. The key observation is that the amount of algorithmic work is identical across thread counts: the same number of repair iterations, candidate repairs, A* calls, and states generated appear in every run. That suggests the current OpenMP version is not reducing work; it is only parallelizing a limited amount of speculative evaluation.
CURRENT OPENMP PARALLEL APPROACH
Our current OpenMP version detects all conflicts in the current global solution, sorts them by earliest timestep, takes the first K conflicts, and turns each conflict into two repair tasks: one task that constrains and replans the first robot, and one task that constrains and replans the second robot. These repair tasks are evaluated in parallel with OpenMP. After all tasks finish, the main thread compares the resulting candidate solutions and commits exactly one best candidate as the next global solution.
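The round structure can be sketched as below. The structure names and the synthetic cost are illustrative (the real `evaluate_repair` reruns constrained space-time A* for one robot), but the parallel-evaluate-then-single-commit shape matches what the text describes:

```cpp
#include <limits>
#include <vector>

// Illustrative structures for the sketch; names are not our actual API.
struct RepairTask { int conflict_id; int robot; };

struct Candidate {
    std::vector<std::vector<int>> paths;  // full candidate solution
    double cost = std::numeric_limits<double>::infinity();
};

// Toy stand-in for constrained replanning so the round is runnable here;
// the real version reruns space-time A* for task.robot and costs the result.
Candidate evaluate_repair(const RepairTask& task,
                          const std::vector<std::vector<int>>& current) {
    Candidate c;
    c.paths = current;
    c.cost = 10.0 * task.conflict_id + task.robot;  // synthetic cost
    return c;
}

// One round of top-K repair: evaluate the candidate repairs in parallel,
// then the main thread commits exactly one best candidate.
Candidate top_k_round(const std::vector<RepairTask>& tasks,
                      const std::vector<std::vector<int>>& current) {
    std::vector<Candidate> results(tasks.size());

    // Each task replans one robot against a read-only snapshot of the
    // current solution, so the loop bodies are independent.
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < (int)tasks.size(); ++i)
        results[i] = evaluate_repair(tasks[i], current);

    // Sequential commit step: all but one candidate is discarded, which is
    // exactly the single-commit bottleneck discussed in this section.
    Candidate best;
    for (const auto& r : results)
        if (r.cost < best.cost) best = r;
    return best;
}
```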
This approach does create parallel work, but it is also inherently wasteful. Threads can work on repair tasks involving the same robot, and after all that parallel A* work, only one candidate is committed. In other words, the algorithm still makes only one global step of progress per round, even if many threads found locally useful repairs. That is why the current top-K OpenMP version gives only modest speedup and then quickly saturates. The comparison and synchronization overhead are part of the problem, but the larger issue is algorithmic: the parallel version exposes more speculative work, not more committed progress.
WHY THE CURRENT PARALLEL VERSION IS NOT WORKING WELL
- It evaluates many candidate repairs in parallel but commits only one candidate each round.
- Different threads can still work on repairs involving the same robot, so the work is not as diverse as it looks.
- The top-K parallel version is actually a more expensive algorithm than the earlier single-conflict greedy version, because it performs more low-level replanning even when run with 1 thread.
- There is a synchronization barrier after each repair batch, so progress is still serialized at the end of every iteration.
- The identical state counts across thread counts suggest we are not reducing search work, only overlapping some of it.
Our current conclusion is that this version is a useful stepping stone but not the final parallel algorithm we want to present.
REVISED PLAN: BATCHED MULTI-COMMIT REPAIR
The next direction we want to implement is batched multi-commit repair. Instead of taking the first K conflicts and eventually committing only one repair, we want to choose a set of conflicts that involve disjoint robots, replan those robots in parallel, and then try to commit multiple repaired paths in one round.
Our current idea is to use a greedy subset commit strategy. After a batch of repaired paths comes back, we would try to merge them into the current global solution one by one, keeping only the repaired paths that remain compatible with the already-accepted subset. This would let us keep multiple safe repairs from the same parallel round instead of discarding all but one. We expect this to make better use of high thread counts because more of the parallel work would turn into actual committed progress.
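A sketch of the greedy subset commit we have in mind. The helper names and the inline vertex/edge checks are illustrative; the real merge would reuse our existing collision detector rather than reimplement it:

```cpp
#include <algorithm>
#include <vector>

using Path = std::vector<int>;  // one cell id per timestep

// Robots are assumed to wait at their last cell once their path ends.
static int cell_at(const Path& p, int t) {
    return t < (int)p.size() ? p[t] : p.back();
}

// True if the two paths never collide: vertex and edge-swap checks at
// every timestep up to the longer horizon.
bool compatible(const Path& a, const Path& b) {
    int horizon = (int)std::max(a.size(), b.size());
    for (int t = 0; t < horizon; ++t) {
        if (cell_at(a, t) == cell_at(b, t)) return false;      // vertex
        if (cell_at(a, t) == cell_at(b, t + 1) &&
            cell_at(b, t) == cell_at(a, t + 1)) return false;  // edge swap
    }
    return true;
}

struct Repair { int robot; Path path; };

// Greedy subset commit: accept repaired paths one by one, keeping each only
// if it stays conflict-free against all unchanged paths and all repairs
// accepted so far. Returns the number of repairs committed this round.
int commit_subset(std::vector<Path>& solution,
                  const std::vector<Repair>& repairs) {
    int committed = 0;
    for (const auto& r : repairs) {
        Path backup = solution[r.robot];
        solution[r.robot] = r.path;
        bool ok = true;
        for (int other = 0; other < (int)solution.size(); ++other)
            if (other != r.robot &&
                !compatible(solution[r.robot], solution[other])) {
                ok = false;
                break;
            }
        if (ok) ++committed;
        else solution[r.robot] = backup;  // roll back an incompatible repair
    }
    return committed;
}
```

The key property is that every accepted repair is validated against the solution as it stands after earlier acceptances, so a round can commit several paths while still guaranteeing the merged result stays collision-free.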
UPDATED GOALS FOR THE POSTER SESSION
- Show the working sequential baseline and the OpenMP parallel top-K version.
- Show sequential-vs-parallel timing and speedup graphs on PSC for at least few and medium cases.
- Implement and compare the batched multi-commit repair version against the current single-commit top-K version.
- Be explicit about the change from exact CBS to greedy repair and explain why that change was necessary.
- Devise ways to parallelize low-level A* search.
POSTER SESSION PLAN
For the poster session, we plan to show:
- input and solution SVGs for representative few, medium, and possibly abundant cases,
- speedup and scalability graphs from PSC runs,
- a table or bar chart showing how many repair tasks were evaluated versus how many repairs were actually committed,
- a diagram of the current top-K algorithm and the proposed batched multi-commit replacement, and
- a short explanation of why the first OpenMP attempt plateaued and what architectural change we are making next.
If we have time, we would also like to show a small step-by-step visualization of one planning round: detect conflicts, generate repair tasks, run parallel replans, and then commit either one path or a safe subset of paths.
UPDATED SCHEDULE
We revised the remaining project schedule into short increments with named owners.
| Dates | Task | Owner(s) |
|---|---|---|
| Apr 14 - Apr 17 | Finalize milestone report, clean PSC benchmark workflow, and design disjoint-conflict batching plus greedy subset commit. | Dylan, Jongsun |
| Apr 18 - Apr 21 | Implement batched multi-commit repair. Dylan will handle conflict batching and OpenMP task structure; Jongsun will handle merge logic and subset-commit validation. | Dylan, Jongsun |
| Apr 22 - Apr 25 | Expand correctness tests, compare sequential greedy, top-K sequential, and parallel versions, and run PSC experiments on few and medium. | Dylan, Jongsun |
| Apr 26 - Apr 29 | Generate final SVGs and performance graphs, decide what to include from abundant runs, and prepare poster figures. | Dylan, Jongsun |
| Apr 30 - May 3 | Polish poster, write final report sections, and finalize the presentation story around correctness vs scalability and why the algorithm changed. | Dylan, Jongsun |
MAIN RISKS AND OPEN QUESTIONS
- How should we choose a high-quality set of disjoint conflicts to resolve together in each round?
- How aggressive can greedy subset commit be before it starts throwing away too much useful work or introducing too many new collisions?
- Can batched multi-commit repair produce substantially better scaling than top-K single-commit repair, especially past 16 threads?
- How much solution quality are we sacrificing by leaving exact CBS behind, and what is the right way to explain that tradeoff in the final report?
- Do we need additional deterministic tie-breaking in A* so that runs on different machines produce more comparable path shapes?
CONCLUSION
We have a working planner, a benchmark harness, PSC results, and an initial OpenMP implementation. The main honest takeaway from the milestone is that our original exact CBS plan was not viable for the scale we want, and our first OpenMP design exposes work without exposing enough committed progress. The next phase of the project is therefore not just to “optimize” the current code, but to change the parallel algorithm so that many threads can produce many useful updates in the same round.