AP Commander trains large language models to navigate enterprise Accounts Payable workflows with the rigor a CFO would require. 24 tasks. 2 AI agents. Rewards with no shortcuts.
Every episode is a complete enterprise financial scenario — generated fresh from a seeded RNG. No static dataset. The agent must reason, not memorise.
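A minimal sketch of what seeded generation can look like, assuming one private `random.Random` instance per episode. The `Scenario` fields, the vendor names, and `generate_scenario` are hypothetical stand-ins, not the environment's actual generator.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical, heavily reduced scenario; a real one would carry far more."""
    vendor: str
    quantity: int
    unit_price_cents: int


# Illustrative vendor pool, not taken from the environment.
VENDORS = ("Acme Supplies", "Globex Hardware", "Initech Office")


def generate_scenario(seed: int) -> Scenario:
    # A private RNG keeps episodes reproducible and independent of global state.
    rng = random.Random(seed)
    return Scenario(
        vendor=rng.choice(VENDORS),
        quantity=rng.randint(1, 50),
        unit_price_cents=rng.randrange(1_000, 100_000, 5),
    )
```

The same seed always reproduces the same scenario, which keeps evaluation repeatable, while a new seed draws a new one.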
The agent receives a structured observation: vendor invoice, purchase orders, goods receipts, paid ledger, and company policy — all randomised per seed.
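One way to picture the observation is as a container with the five parts listed above. This is a sketch under assumptions: the class name `Observation`, its field names, the `to_prompt` helper, and all sample values are invented for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Observation:
    """Hypothetical container mirroring the five parts of the observation."""
    invoice: dict
    purchase_orders: list = field(default_factory=list)
    goods_receipts: list = field(default_factory=list)
    paid_ledger: list = field(default_factory=list)
    policy: dict = field(default_factory=dict)

    def to_prompt(self) -> str:
        # Sorted keys so the same observation always serialises identically.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up values, only to show the shape.
obs = Observation(
    invoice={"invoice_id": "INV-001", "total": "100.00"},
    purchase_orders=[{"po_id": "PO-001", "total": "100.00"}],
    policy={"approval_limit": "5000.00"},
)
```

Calling `obs.to_prompt()` yields a JSON string that could be placed in the model's context.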
Multi-step reasoning up to 16 steps. Intermediate actions trigger simulated workplace actors — a vendor response, a manager escalation, a compliance review.
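The step budget and the actor responses can be sketched as a small episode loop. Only the 16-step cap and the three actor types come from the text; the action names, reply strings, and the `Episode` class are hypothetical.

```python
from dataclasses import dataclass, field

MAX_STEPS = 16  # step cap stated in the text

# Hypothetical action names mapped to the simulated actor that answers them.
ACTOR_REPLIES = {
    "query_vendor": "vendor: corrected invoice attached",
    "escalate_to_manager": "manager: approved above threshold",
    "request_compliance_review": "compliance: no issues found",
}
TERMINAL_ACTIONS = {"approve", "reject", "hold"}  # also hypothetical


@dataclass
class Episode:
    history: list = field(default_factory=list)
    done: bool = False

    def step(self, action: str) -> str:
        if self.done:
            raise RuntimeError("episode already finished")
        self.history.append(action)
        if action in TERMINAL_ACTIONS:
            self.done = True
            return f"final decision: {action}"
        if len(self.history) >= MAX_STEPS:
            self.done = True  # budget exhausted without a decision
            return "step limit reached"
        # Intermediate action: a simulated workplace actor responds.
        return ACTOR_REPLIES.get(action, "no response")
```

In this sketch an episode ends either on a final decision or when the sixteenth action is spent, whichever comes first.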
Five-component partial credit: decision accuracy, amount within 1%, reason code, explanation quality, and process bonus for the correct investigative sequence.
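The five components can be combined into a single score roughly as follows. This is a sketch, not the environment's reward function: the weights are invented (the text names the components but not their values), "within 1%" is read as relative error against the true amount, explanation quality is assumed to arrive pre-scored in [0, 1], and the process bonus is modelled as an in-order subsequence check.

```python
from dataclasses import dataclass

# Hypothetical weights; only the component names come from the text.
WEIGHTS = {"decision": 0.4, "amount": 0.2, "reason": 0.2,
           "explanation": 0.1, "process": 0.1}


@dataclass
class Outcome:
    decision: str
    amount: float
    reason_code: str
    explanation_score: float  # assumed to be pre-scored in [0, 1]
    actions: tuple            # investigative actions, in order


def is_subsequence(expected, actual) -> bool:
    # True when `expected` appears in `actual` in order, gaps allowed.
    it = iter(actual)
    return all(step in it for step in expected)


def reward(pred: Outcome, truth: Outcome) -> float:
    parts = {
        "decision": float(pred.decision == truth.decision),
        "amount": float(abs(pred.amount - truth.amount) <= 0.01 * abs(truth.amount)),
        "reason": float(pred.reason_code == truth.reason_code),
        "explanation": min(max(pred.explanation_score, 0.0), 1.0),
        "process": float(is_subsequence(truth.actions, pred.actions)),
    }
    # Weighted sum, so a partly correct answer earns partial credit.
    return sum(WEIGHTS[name] * value for name, value in parts.items())
```

Because each component is scored separately, a correct decision with the wrong investigative order still earns most of the credit, while a lucky guess with no supporting reason code or explanation does not.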
A real episode from the environment — same invoice, same policy, same reward function the model trains against. This is what a well-trained agent learns to do.
| Item | Qty | Unit | Total |
|---|---|---|---|
| ThinkPad L15 Gen-4 | 12 | $385.00 | $4,620.00 |
| Freight | | | $42.00 |