Project Milestone Report

Parallel Multi-Agent Path Planning for Coordinated Robotic Manipulation

Team Members: Dylan Sun (bdsun), Jongsun Park (jongsunp)

URL

https://bdyldo.github.io/418-final-project/

SUMMARY OF PROGRESS

We now have a working end-to-end planning pipeline for our simplified MAPF setting. The current implementation includes a configurable benchmark generator, time-indexed collision detection with vertex and edge conflicts, a space-time A* low-level planner, SVG visualization for benchmark instances and solutions, correctness tests, and a benchmark harness that runs on both local machines and PSC. We also added metrics such as runtime, total cost, total time, repair iterations, A* calls, states expanded/generated, and simple path-quality indicators like wait time and detours for the final robot.
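As a concrete illustration of the time-indexed collision check, the sketch below is a simplified stand-in rather than our actual implementation (the names Path, Conflict, and detectConflicts are illustrative). It scans all robot pairs for vertex conflicts (same cell at the same timestep) and edge conflicts (two robots swapping cells between consecutive timesteps), with robots treated as waiting at their goals after arrival:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Illustrative sketch, not the project's actual code. A path is a
// sequence of cells indexed by timestep; robots that finish early are
// treated as waiting at their goal cell.
using Cell = std::pair<int, int>;
using Path = std::vector<Cell>;

// Position of a robot at time t (robots wait at their goal after arriving).
static Cell at(const Path& p, int t) {
    return t < (int)p.size() ? p[t] : p.back();
}

struct Conflict { int a, b, t; };  // robots a and b collide at timestep t

// Scan all pairs over the joint horizon for vertex conflicts (same cell,
// same timestep) and edge conflicts (swapping cells between t and t+1).
std::vector<Conflict> detectConflicts(const std::vector<Path>& paths) {
    std::vector<Conflict> out;
    int horizon = 0;
    for (const Path& p : paths) horizon = std::max<int>(horizon, p.size());
    for (int i = 0; i < (int)paths.size(); ++i)
        for (int j = i + 1; j < (int)paths.size(); ++j)
            for (int t = 0; t < horizon; ++t) {
                if (at(paths[i], t) == at(paths[j], t)) {            // vertex
                    out.push_back({i, j, t});
                } else if (t + 1 < horizon &&
                           at(paths[i], t) == at(paths[j], t + 1) &&
                           at(paths[i], t + 1) == at(paths[j], t)) { // edge
                    out.push_back({i, j, t + 1});
                }
            }
    return out;
}
```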

The biggest change from our original proposal is that we are no longer using traditional exact CBS as the active baseline. We implemented it, but it was not practical even on relatively modest instances: the high-level search tree grew too quickly and exact branching was too expensive to use as the foundation for our parallel study. In response, we replaced that baseline with a greedy repair planner that repeatedly detects conflicts, replans one robot at a time, and guarantees termination with a repair cap. This gives us a working sequential baseline that is fast enough to benchmark and parallelize.
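The control structure of this loop can be sketched roughly as follows. This is a toy 1-D model, not our code: the stand-in "replan" simply delays the conflicting robot by one wait step, whereas the real planner reruns space-time A* under a new constraint, and only vertex conflicts are checked for brevity:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Path = std::vector<int>;  // 1-D cells indexed by timestep (toy model)

static int at(const Path& p, int t) { return t < (int)p.size() ? p[t] : p.back(); }

struct Conflict { int a, b, t; };

// First vertex conflict between any pair, or {-1,-1,-1} if conflict-free.
// (Edge conflicts are omitted here to keep the sketch short.)
static Conflict firstConflict(const std::vector<Path>& paths) {
    int horizon = 0;
    for (const Path& p : paths) horizon = std::max<int>(horizon, p.size());
    for (int t = 0; t < horizon; ++t)
        for (int i = 0; i < (int)paths.size(); ++i)
            for (int j = i + 1; j < (int)paths.size(); ++j)
                if (at(paths[i], t) == at(paths[j], t)) return {i, j, t};
    return {-1, -1, -1};
}

// Greedy repair loop: while a conflict exists, replan exactly one robot,
// up to a fixed cap so the loop always terminates. The stand-in "replan"
// just delays the robot by one wait step at its start.
bool greedyRepair(std::vector<Path>& paths, int repairCap) {
    for (int iter = 0; iter < repairCap; ++iter) {
        Conflict c = firstConflict(paths);
        if (c.a < 0) return true;            // already conflict-free
        int start = paths[c.a].front();
        paths[c.a].insert(paths[c.a].begin(), start);  // wait one step
    }
    return firstConflict(paths).a < 0;       // may fail if the cap is hit
}
```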

WORK COMPLETED SO FAR

CHANGES FROM THE ORIGINAL PLAN

The main change is the algorithmic baseline. Our proposal centered on sequential CBS followed by multiple parallel CBS variants. In practice, exact CBS was too expensive for the benchmark sizes we want to study: even before parallelization, exact high-level branching was already overwhelming the search on cases such as few_16. As a result, CBS was not giving us a useful platform for studying parallel speedup, because the sequential algorithm itself was not robust enough for the benchmark regime we care about.

Because of that, we shifted to a greedy repair formulation. This is a real change from the proposal, and it also changes the story of the project. Instead of parallelizing an exact optimal CBS implementation, we are now studying a more practical heuristic planner that still has clear conflict-resolution structure, but is less dominated by a global high-level search tree. We think this is a reasonable adjustment for a parallel systems project: it lets us focus on exposing and measuring useful concurrency instead of spending most of the project trying to stabilize a sequential exact solver.

PRELIMINARY RESULTS

We have preliminary sequential and parallel numbers on PSC for our current top-K greedy repair implementation. A few representative sequential runs are shown below.

Case          | Workers | Runtime                                   | Repair Iterations | A* Calls | States Expanded
few_8_seq     | 1       | 10.290 ms                                 | 1                 | 10       | 9,793
medium_16_seq | 1       | 3867.740 ms                               | 95                | 216      | 3,188,676
medium_32_seq | 1       | 16757.627 ms (local) / 17395.536 ms (PSC) | 90                | 290      | 12,914,198

For PSC scaling on medium_32, we observed that runtime dropped from about 17.4 s at 1 thread to about 8.2 s at 16 threads, but then largely flattened: 32, 64, and 128 threads all stayed near 8.5 s. This is only about a 2.1x speedup at 16 threads and does not scale well beyond that. The key observation is that the amount of algorithmic work is identical across thread counts: the same number of repair iterations, candidate repairs, A* calls, and states generated appear in every run. That suggests the current OpenMP version is not reducing work; it is only parallelizing a limited amount of speculative evaluation.

CURRENT OPENMP PARALLEL APPROACH

Our current OpenMP version detects all conflicts in the current global solution, sorts them by earliest timestep, takes the first K conflicts, and turns each conflict into two repair tasks: one task that constrains and replans the first robot, and one task that constrains and replans the second robot. These repair tasks are evaluated in parallel with OpenMP. After all tasks finish, the main thread compares the resulting candidate solutions and commits exactly one best candidate as the next global solution.
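The round structure can be sketched as below. This is a simplified model, not our implementation: evaluateRepair is a stand-in for the full constrain-and-replan A* call and just returns the candidate's cost.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical sketch of one top-K round: the 2K repair tasks are
// evaluated speculatively in parallel, then the main thread commits
// exactly one winner.
struct RepairTask { int robot; int conflictId; };

static double evaluateRepair(const RepairTask& t) {
    // Stand-in cost model. The real version constrains and replans
    // t.robot with space-time A* and returns the repaired solution cost.
    return 10.0 * t.conflictId + t.robot;
}

// Returns the index of the single committed candidate.
int commitBestCandidate(const std::vector<RepairTask>& tasks) {
    std::vector<double> cost(tasks.size());
    #pragma omp parallel for schedule(dynamic)   // speculative evaluation
    for (int i = 0; i < (int)tasks.size(); ++i)
        cost[i] = evaluateRepair(tasks[i]);
    int best = -1;                               // sequential commit phase
    double bestCost = std::numeric_limits<double>::infinity();
    for (int i = 0; i < (int)tasks.size(); ++i)
        if (cost[i] < bestCost) { bestCost = cost[i]; best = i; }
    return best;                                 // only one task wins
}
```

Note that all 2K A* evaluations run, but only one result survives the commit phase; the others are discarded work, which is exactly the saturation problem described below.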

This approach does create parallel work, but it is also inherently wasteful. Threads can work on repair tasks involving the same robot, and after all that parallel A* work, only one candidate is committed. In other words, the algorithm still makes only one global step of progress per round, even if many threads found locally useful repairs. That is why the current top-K OpenMP version gives only modest speedup and then quickly saturates. The comparison and synchronization overhead are part of the problem, but the larger issue is algorithmic: the parallel version exposes more speculative work, not more committed progress.

WHY THE CURRENT PARALLEL VERSION IS NOT WORKING WELL

Our current conclusion is that this version is a useful stepping stone but not the final parallel algorithm we want to present: each round discards all but one of the speculative replans, so adding threads adds evaluated candidates without adding committed progress.

REVISED PLAN: BATCHED MULTI-COMMIT REPAIR

The next direction we want to implement is batched multi-commit repair. Instead of taking the first K conflicts and eventually committing only one repair, we want to choose a set of conflicts that involve disjoint robots, replan those robots in parallel, and then try to commit multiple repaired paths in one round.
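One way to select such a batch, sketched under the assumption that we scan conflicts in earliest-timestep order and claim each robot at most once per round (pickDisjointBatch is a hypothetical name, not existing code):

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

struct Conflict { int a, b, t; };  // robots a and b collide at timestep t

// Greedily pick up to maxBatch conflicts whose robots are pairwise
// disjoint, preferring earlier conflicts, so the batched replans can
// proceed independently in parallel.
std::vector<Conflict> pickDisjointBatch(std::vector<Conflict> conflicts,
                                        int maxBatch) {
    std::sort(conflicts.begin(), conflicts.end(),
              [](const Conflict& x, const Conflict& y) { return x.t < y.t; });
    std::set<int> used;                 // robots already claimed this round
    std::vector<Conflict> batch;
    for (const Conflict& c : conflicts) {
        if ((int)batch.size() == maxBatch) break;
        if (used.count(c.a) || used.count(c.b)) continue;  // robot taken
        used.insert(c.a);
        used.insert(c.b);
        batch.push_back(c);
    }
    return batch;
}
```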

Our current idea is to use a greedy subset commit strategy. After a batch of repaired paths comes back, we would try to merge them into the current global solution one by one, keeping only the repaired paths that remain compatible with the already-accepted subset. This would let us keep multiple safe repairs from the same parallel round instead of discarding all but one. We expect this to make better use of high thread counts because more of the parallel work would turn into actual committed progress.
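A minimal sketch of this subset-commit idea, again on a toy 1-D path model with vertex-only conflict checks (commitSubset and Repair are hypothetical names for code we have not yet written):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Path = std::vector<int>;  // 1-D cells indexed by timestep (toy model)

static int at(const Path& p, int t) { return t < (int)p.size() ? p[t] : p.back(); }

// Vertex-conflict check between two paths (edge conflicts omitted for brevity).
static bool conflictFree(const Path& p, const Path& q) {
    int horizon = (int)std::max(p.size(), q.size());
    for (int t = 0; t < horizon; ++t)
        if (at(p, t) == at(q, t)) return false;
    return true;
}

struct Repair { int robot; Path path; };  // a repaired path from one replan

// Greedy subset commit: merge each repaired path into the solution in
// order, keeping it only if the updated solution stays conflict-free
// against everything already accepted; otherwise roll it back.
int commitSubset(std::vector<Path>& solution, const std::vector<Repair>& batch) {
    int committed = 0;
    for (const Repair& r : batch) {
        Path old = solution[r.robot];
        solution[r.robot] = r.path;
        bool ok = true;
        for (int i = 0; i < (int)solution.size() && ok; ++i)
            if (i != r.robot) ok = conflictFree(solution[r.robot], solution[i]);
        if (ok) ++committed;
        else solution[r.robot] = old;     // incompatible repair: roll back
    }
    return committed;
}
```

Unlike the current top-K version, every compatible repair in the batch survives, so one parallel round can commit several paths instead of exactly one.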

UPDATED GOALS FOR THE POSTER SESSION

POSTER SESSION PLAN

For the poster session, we plan to show our PSC performance results (sequential versus parallel runtimes and speedup graphs), SVG visualizations of benchmark instances and solutions, and a short explanation of why we replaced exact CBS with greedy repair.

If we have time, we would also like to show a small step-by-step visualization of one planning round: detect conflicts, generate repair tasks, run parallel replans, and then commit either one path or a safe subset of paths.

UPDATED SCHEDULE

We revised the remaining project schedule into short increments with named owners.

Dates           | Task                                                                                                                                         | Owner(s)
Apr 14 - Apr 17 | Finalize milestone report, clean up the PSC benchmark workflow, and design disjoint-conflict batching plus greedy subset commit.             | Dylan, Jongsun
Apr 18 - Apr 21 | Implement batched multi-commit repair. Dylan: conflict batching and OpenMP task structure; Jongsun: merge logic and subset-commit validation. | Dylan, Jongsun
Apr 22 - Apr 25 | Expand correctness tests, compare the sequential greedy, top-K sequential, and parallel versions, and run PSC experiments on few and medium.  | Dylan, Jongsun
Apr 26 - Apr 29 | Generate final SVGs and performance graphs, select representative runs to report, and prepare poster figures.                                 | Dylan, Jongsun
Apr 30 - May 3  | Polish the poster, write the final report sections, and finalize the presentation story around correctness versus scalability and why the algorithm changed. | Dylan, Jongsun

MAIN RISKS AND OPEN QUESTIONS

The main open questions are whether batched multi-commit repair will actually scale past the 16-thread plateau we observed on medium_32, and whether real instances contain enough conflicts over disjoint robots in each round to fill a useful batch.

CONCLUSION

We have a working planner, a benchmark harness, PSC results, and an initial OpenMP implementation. The main honest takeaway from the milestone is that our original exact CBS plan was not viable for the scale we want, and our first OpenMP design exposes work without exposing enough committed progress. The next phase of the project is therefore not just to “optimize” the current code, but to change the parallel algorithm so that many threads can produce many useful updates in the same round.