TITLE
Parallel Multi-Agent Path Planning for Coordinated Robotic Manipulation
Dylan Sun (bdsun), Jongsun Park (jongsunp)
URL
SUMMARY OF PROGRESS
We now have a working end-to-end planning pipeline for our simplified MAPF setting. The current implementation includes a configurable benchmark generator, time-indexed collision detection with vertex and edge conflicts, a space-time A* low-level planner, SVG visualization for benchmark instances and solutions, correctness tests, and a benchmark harness that runs on both local machines and PSC. We also added metrics such as runtime, total cost, total time, repair iterations, A* calls, states expanded/generated, and simple path-quality indicators like wait time and detours for the final robot.
The biggest change from our original proposal is that we are no longer using traditional exact CBS as the active baseline. We implemented it, but it was not practical even on relatively modest instances: the high-level search tree grew too quickly and exact branching was too expensive to use as the foundation for our parallel study. In response, we replaced that baseline with a greedy repair planner that repeatedly detects conflicts, replans one robot at a time, and guarantees termination with a repair cap. This gives us a working sequential baseline that is fast enough to benchmark and parallelize.
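The repair loop can be sketched as follows. This is a minimal, self-contained illustration, not our actual code: the names (`first_conflict`, `greedy_repair`, `repair_cap`) are ours for the sketch, the detector only checks vertex conflicts, and the "repair" is a toy wait-insertion standing in for a full space-time A* replan under new constraints.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

using Path = std::vector<int>;  // one cell id per timestep

struct Conflict { int robot_a; int robot_b; int t; };

// Robots are assumed to wait at their final cell after their path ends.
static int pos_at(const Path& p, int t) {
    return t < (int)p.size() ? p[t] : p.back();
}

// Earliest vertex conflict: two robots in the same cell at the same time.
// (The real detector also checks edge-swap conflicts.)
static std::optional<Conflict> first_conflict(const std::vector<Path>& paths) {
    int horizon = 0;
    for (const auto& p : paths) horizon = std::max(horizon, (int)p.size());
    for (int t = 0; t < horizon; ++t)
        for (int i = 0; i < (int)paths.size(); ++i)
            for (int j = i + 1; j < (int)paths.size(); ++j)
                if (pos_at(paths[i], t) == pos_at(paths[j], t))
                    return Conflict{i, j, t};
    return std::nullopt;
}

// Toy repair: make the robot wait one step just before the conflict.
// The actual planner instead reruns space-time A* under a new constraint.
static void repair_by_waiting(Path& p, int t) {
    int at = std::min(std::max(t - 1, 0), (int)p.size() - 1);
    p.insert(p.begin() + at, p[at]);
}

// Greedy repair loop: fix the earliest conflict by replanning one robot,
// bounded by repair_cap so termination is guaranteed.
bool greedy_repair(std::vector<Path>& paths, int repair_cap) {
    for (int iter = 0; iter < repair_cap; ++iter) {
        auto c = first_conflict(paths);
        if (!c) return true;                    // collision-free solution
        repair_by_waiting(paths[c->robot_a], c->t);
    }
    return false;                               // cap reached
}
```

The cap is what turns an otherwise potentially non-terminating heuristic into a usable baseline: the loop either returns a collision-free solution or reports failure after a bounded number of replans.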
WORK COMPLETED SO FAR
- Implemented grid-based benchmark generation with fixed seeds and stored benchmark cases for few, medium, and abundant instances.
- Implemented sequential A* in space-time with time-indexed vertex and edge constraints.
- Implemented time-step collision detection for vertex conflicts and edge swaps.
- Built a sequential greedy repair baseline that returns collision-free paths and records detailed planning statistics.
- Built benchmark and visualization infrastructure: input SVGs, solution SVGs, benchmark text output, and correctness tests.
- Implemented a first OpenMP parallel version that evaluates repair tasks generated from the first K conflicts in parallel.
- Created a separate sequential _seq benchmark path so we can compare PSC sequential and PSC parallel results fairly on the same machine.
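To make the time-step collision check concrete, here is a sketch of the two conflict predicates (illustrative helper names, assuming robots wait at their goal cell once their path ends; not our exact implementation):

```cpp
#include <vector>

using Path = std::vector<int>;  // one cell id per timestep

// Robots are assumed to wait at their last cell once their path ends.
static int cell_at(const Path& p, int t) {
    return t < (int)p.size() ? p[t] : p.back();
}

// Vertex conflict: both robots occupy the same cell at time t.
bool vertex_conflict(const Path& a, const Path& b, int t) {
    return cell_at(a, t) == cell_at(b, t);
}

// Edge-swap conflict: the robots exchange cells between t and t+1,
// i.e. they would pass through each other on the same edge.
bool edge_conflict(const Path& a, const Path& b, int t) {
    return cell_at(a, t) == cell_at(b, t + 1) &&
           cell_at(b, t) == cell_at(a, t + 1);
}
```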
CHANGES FROM THE ORIGINAL PLAN
The main change is the algorithmic baseline. Our proposal centered around sequential CBS and then multiple parallel CBS variants. In practice, exact CBS was too expensive for the benchmark sizes we want to study. Even before parallelization, exact high-level branching was already overwhelming the search on cases such as few_16. That meant CBS was not giving us a useful platform for studying parallel speedup, because the sequential algorithm itself was not robust enough for the benchmark regime we care about.
Because of that, we shifted to a greedy repair formulation. This is a real change from the proposal, and it also changes the story of the project. Instead of parallelizing an exact optimal CBS implementation, we are now studying a more practical heuristic planner that still has clear conflict-resolution structure, but is less dominated by a global high-level search tree. We think this is a reasonable adjustment for a parallel systems project: it lets us focus on exposing and measuring useful concurrency instead of spending most of the project trying to stabilize a sequential exact solver.
PRELIMINARY RESULTS
We have preliminary sequential and parallel numbers on PSC for our current top-K greedy repair implementation. A few representative sequential runs are shown below.
| Case | Workers | Runtime | Repair Iterations | A* Calls | States Expanded |
|---|---|---|---|---|---|
| few_8_seq | 1 | 10.290 ms | 1 | 10 | 9,793 |
| medium_16_seq | 1 | 3867.740 ms | 95 | 216 | 3,188,676 |
| medium_32_seq | 1 | 16757.627 ms (local) / 17395.536 ms (PSC) | 90 | 290 | 12,914,198 |
For PSC scaling on medium_32, we observed that runtime dropped from about 17.4 s at 1 thread to about 8.2 s at 16 threads, but then largely flattened: 32, 64, and 128 threads all stayed near 8.5 s. This is only about a 2.1x speedup at 16 threads and does not scale well beyond that. The key observation is that the amount of algorithmic work is identical across thread counts: the same number of repair iterations, candidate repairs, A* calls, and states generated appear in every run. That suggests the current OpenMP version is not reducing work; it is only parallelizing a limited amount of speculative evaluation.
CURRENT OPENMP PARALLEL APPROACH
Our current OpenMP version detects all conflicts in the current global solution, sorts them by earliest timestep, takes the first K conflicts, and turns each conflict into two repair tasks: one task that constrains and replans the first robot, and one task that constrains and replans the second robot. These repair tasks are evaluated in parallel with OpenMP. After all tasks finish, the main thread compares the resulting candidate solutions and commits exactly one best candidate as the next global solution.
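The round structure can be sketched as below. The structure names and the synthetic cost are illustrative (the real `evaluate_repair` reruns constrained space-time A* for one robot), but the parallel-evaluate-then-single-commit shape matches what the text describes:

```cpp
#include <limits>
#include <vector>

// Illustrative structures for the sketch; names are not our actual API.
struct RepairTask { int conflict_id; int robot; };

struct Candidate {
    std::vector<std::vector<int>> paths;  // full candidate solution
    double cost = std::numeric_limits<double>::infinity();
};

// Toy stand-in for constrained replanning so the round is runnable here;
// the real version reruns space-time A* for task.robot and costs the result.
Candidate evaluate_repair(const RepairTask& task,
                          const std::vector<std::vector<int>>& current) {
    Candidate c;
    c.paths = current;
    c.cost = 10.0 * task.conflict_id + task.robot;  // synthetic cost
    return c;
}

// One round of top-K repair: evaluate the candidate repairs in parallel,
// then the main thread commits exactly one best candidate.
Candidate top_k_round(const std::vector<RepairTask>& tasks,
                      const std::vector<std::vector<int>>& current) {
    std::vector<Candidate> results(tasks.size());

    // Each task replans one robot against a read-only snapshot of the
    // current solution, so the loop bodies are independent.
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < (int)tasks.size(); ++i)
        results[i] = evaluate_repair(tasks[i], current);

    // Sequential commit step: all but one candidate is discarded, which is
    // exactly the single-commit bottleneck discussed in this section.
    Candidate best;
    for (const auto& r : results)
        if (r.cost < best.cost) best = r;
    return best;
}
```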
This approach does create parallel work, but it is also inherently wasteful. Threads can work on repair tasks involving the same robot, and after all that parallel A* work, only one candidate is committed. In other words, the algorithm still makes only one global step of progress per round, even if many threads found locally useful repairs. That is why the current top-K OpenMP version gives only modest speedup and then quickly saturates. The comparison and synchronization overhead are part of the problem, but the larger issue is algorithmic: the parallel version exposes more speculative work, not more committed progress.
WHY THE CURRENT PARALLEL VERSION IS NOT WORKING WELL
- It evaluates many candidate repairs in parallel but commits only one candidate each round.
- Different threads can still work on repairs involving the same robot, so the work is not as diverse as it looks.
- The top-K parallel version is actually a more expensive algorithm than the earlier single-conflict greedy version, because it performs more low-level replanning even when run with 1 thread.
- There is a synchronization barrier after each repair batch, so progress is still serialized at the end of every iteration.
- The identical state counts across thread counts suggest we are not reducing search work, only overlapping some of it.
Our current conclusion is that this version is a useful stepping stone but not the final parallel algorithm we want to present.
REVISED PLAN: BATCHED MULTI-COMMIT REPAIR
The next direction we want to implement is batched multi-commit repair. Instead of taking the first K conflicts and eventually committing only one repair, we want to choose a set of conflicts that involve disjoint robots, replan those robots in parallel, and then try to commit multiple repaired paths in one round.
Our current idea is to use a greedy subset commit strategy. After a batch of repaired paths comes back, we would try to merge them into the current global solution one by one, keeping only the repaired paths that remain compatible with the already-accepted subset. This would let us keep multiple safe repairs from the same parallel round instead of discarding all but one. We expect this to make better use of high thread counts because more of the parallel work would turn into actual committed progress.
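A sketch of the greedy subset commit we have in mind. The helper names and the inline vertex/edge checks are illustrative; the real merge would reuse our existing collision detector rather than reimplement it:

```cpp
#include <algorithm>
#include <vector>

using Path = std::vector<int>;  // one cell id per timestep

// Robots are assumed to wait at their last cell once their path ends.
static int cell_at(const Path& p, int t) {
    return t < (int)p.size() ? p[t] : p.back();
}

// True if the two paths never collide: vertex and edge-swap checks at
// every timestep up to the longer horizon.
bool compatible(const Path& a, const Path& b) {
    int horizon = (int)std::max(a.size(), b.size());
    for (int t = 0; t < horizon; ++t) {
        if (cell_at(a, t) == cell_at(b, t)) return false;      // vertex
        if (cell_at(a, t) == cell_at(b, t + 1) &&
            cell_at(b, t) == cell_at(a, t + 1)) return false;  // edge swap
    }
    return true;
}

struct Repair { int robot; Path path; };

// Greedy subset commit: accept repaired paths one by one, keeping each only
// if it stays conflict-free against all unchanged paths and all repairs
// accepted so far. Returns the number of repairs committed this round.
int commit_subset(std::vector<Path>& solution,
                  const std::vector<Repair>& repairs) {
    int committed = 0;
    for (const auto& r : repairs) {
        Path backup = solution[r.robot];
        solution[r.robot] = r.path;
        bool ok = true;
        for (int other = 0; other < (int)solution.size(); ++other)
            if (other != r.robot &&
                !compatible(solution[r.robot], solution[other])) {
                ok = false;
                break;
            }
        if (ok) ++committed;
        else solution[r.robot] = backup;  // roll back an incompatible repair
    }
    return committed;
}
```

The key property is that every accepted repair is validated against the solution as it stands after earlier acceptances, so a round can commit several paths while still guaranteeing the merged result stays collision-free.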
UPDATED GOALS FOR THE POSTER SESSION
- Show the working sequential baseline and the OpenMP parallel top-K version.
- Show sequential-vs-parallel timing and speedup graphs on PSC for at least few and medium cases.
- Implement and compare the batched multi-commit repair version against the current single-commit top-K version.
- Be explicit about the change from exact CBS to greedy repair and explain why that change was necessary.
- Devise ways to parallelize low-level A* search.
POSTER SESSION PLAN
For the poster session, we plan to show:
- input and solution SVGs for representative few, medium, and possibly abundant cases,
- speedup and scalability graphs from PSC runs,
- a table or bar chart showing how many repair tasks were evaluated versus how many repairs were actually committed,
- a diagram of the current top-K algorithm and the proposed batched multi-commit replacement, and
- a short explanation of why the first OpenMP attempt plateaued and what architectural change we are making next.
If we have time, we would also like to show a small step-by-step visualization of one planning round: detect conflicts, generate repair tasks, run parallel replans, and then commit either one path or a safe subset of paths.
UPDATED SCHEDULE
We revised the remaining project schedule into short increments with named owners.
| Dates | Task | Owner(s) |
|---|---|---|
| Apr 14 - Apr 17 | Finalize milestone report, clean PSC benchmark workflow, and design disjoint-conflict batching plus greedy subset commit. | Dylan, Jongsun |
| Apr 18 - Apr 21 | Implement batched multi-commit repair. Dylan will handle conflict batching and OpenMP task structure; Jongsun will handle merge logic and subset-commit validation. | Dylan, Jongsun |
| Apr 22 - Apr 25 | Expand correctness tests, compare sequential greedy, top-K sequential, and parallel versions, and run PSC experiments on few and medium. | Dylan, Jongsun |
| Apr 26 - Apr 29 | Generate final SVGs and performance graphs, decide what to include from abundant runs, and prepare poster figures. | Dylan, Jongsun |
| Apr 30 - May 3 | Polish poster, write final report sections, and finalize the presentation story around correctness vs scalability and why the algorithm changed. | Dylan, Jongsun |
MAIN RISKS AND OPEN QUESTIONS
- How should we choose a high-quality set of disjoint conflicts to resolve together in each round?
- How aggressive can greedy subset commit be before it starts throwing away too much useful work or introducing too many new collisions?
- Can batched multi-commit repair produce substantially better scaling than top-K single-commit repair, especially past 16 threads?
- How much solution quality are we sacrificing by leaving exact CBS behind, and what is the right way to explain that tradeoff in the final report?
- Do we need additional deterministic tie-breaking in A* so that runs on different machines produce more comparable path shapes?
CONCLUSION
We have a working planner, a benchmark harness, PSC results, and an initial OpenMP implementation. The main honest takeaway from the milestone is that our original exact CBS plan was not viable for the scale we want, and our first OpenMP design exposes work without exposing enough committed progress. The next phase of the project is therefore not just to “optimize” the current code, but to change the parallel algorithm so that many threads can produce many useful updates in the same round.