Automated PDF Retrieval from Financial Accounting System — Saving 92 Hours per Year
Quick update on today's work.
I built a tool to automate the retrieval of payment voucher PDFs from a financial accounting system (GLOVIAUX).
The Problem
As part of budget execution tracking, payment voucher PDFs need to be periodically retrieved and archived.
This was an entirely manual process. The workflow for each voucher looked like this:
- Log in to the financial accounting system
- Enter a budget detail code and search
- Click a voucher number to open the detail screen
- Click "Print Voucher" → "Preview"
- Save the displayed PDF
Each item took about 2 minutes. With roughly 3,000 items per year, that added up to about 100 hours annually (3,000 × 2 minutes = 6,000 minutes) spent on this repetitive task.
The Solution
I automated the entire workflow using Python and Playwright as an RPA (Robotic Process Automation) solution.
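To give a feel for what one iteration looks like, here is a minimal sketch of the per-voucher steps written against Playwright's synchronous `page` API. It is not the tool's actual code. Every selector is a hypothetical placeholder, and it assumes that clicking "Preview" raises a download event, which may not match how the real screen delivers the PDF.

```python
from pathlib import Path

# Hypothetical selectors: the real screens will need their own.
SEL_BUDGET_CODE = "#budget-detail-code"
SEL_SEARCH = "text=Search"
SEL_PRINT_VOUCHER = "text=Print Voucher"
SEL_PREVIEW = "text=Preview"


def save_voucher_pdf(page, budget_code, voucher_no, out_dir):
    """Run the search-to-save steps for one voucher on an already logged-in page."""
    # Enter the budget detail code and search.
    page.fill(SEL_BUDGET_CODE, budget_code)
    page.click(SEL_SEARCH)

    # Open the voucher detail screen by clicking the voucher number.
    page.click(f"text={voucher_no}")

    # "Print Voucher" -> "Preview"; assumed here to trigger a download.
    page.click(SEL_PRINT_VOUCHER)
    with page.expect_download() as download_info:
        page.click(SEL_PREVIEW)

    # Save the PDF under a predictable name.
    target = Path(out_dir) / f"{voucher_no}.pdf"
    download_info.value.save_as(target)
    return target
```

Looping this function over the voucher list is what replaces the roughly 2 minutes of manual clicking per item.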
Key Design Decisions
- CDP connection to existing browser: Connects to an already-running Edge browser via CDP, bypassing the login process entirely
- Non-engineer friendly: Voucher number lists are managed in Excel, making it easy for non-technical staff to operate
- Automatic PDF extraction: Uses PyMuPDF to extract only the pages matching the target budget code from the downloaded PDFs
- Simple deployment: Distributed as an Edge launcher batch file + exe — just two clicks to run
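The first three decisions can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's real code: it assumes Edge was started with `--remote-debugging-port=9222`, that the voucher numbers sit in the first column of the first worksheet below a header row, and that the PDFs contain searchable text. The function names are my own for this example.

```python
def attach_to_running_edge(playwright, port=9222):
    """Attach to an Edge window that is already open and logged in.

    Assumes Edge was launched with --remote-debugging-port=<port>.
    `playwright` is the object yielded by sync_playwright().
    """
    browser = playwright.chromium.connect_over_cdp(f"http://127.0.0.1:{port}")
    context = browser.contexts[0]  # existing profile, session included
    page = context.pages[0] if context.pages else context.new_page()
    return browser, page


def clean_voucher_numbers(values):
    """Normalise raw cell values to strings and drop empty cells."""
    cleaned = []
    for value in values:
        if value is None:
            continue
        if isinstance(value, float) and value.is_integer():
            value = int(value)  # numeric cells may come back as floats
        text = str(value).strip()
        if text:
            cleaned.append(text)
    return cleaned


def read_voucher_numbers(xlsx_path, column=1, first_row=2):
    """Read voucher numbers from one column (assumed sheet layout)."""
    from openpyxl import load_workbook  # third-party: openpyxl

    workbook = load_workbook(xlsx_path, read_only=True)
    try:
        rows = workbook.active.iter_rows(
            min_row=first_row, min_col=column, max_col=column, values_only=True
        )
        return clean_voucher_numbers(row[0] for row in rows if row)
    finally:
        workbook.close()


def pages_containing(page_texts, budget_code):
    """Return 0-based indices of pages whose text mentions the budget code."""
    return [i for i, text in enumerate(page_texts) if budget_code in text]


def extract_matching_pages(src_pdf, dst_pdf, budget_code):
    """Write only the pages mentioning budget_code to dst_pdf."""
    import fitz  # third-party: PyMuPDF

    with fitz.open(src_pdf) as doc:
        keep = pages_containing((page.get_text() for page in doc), budget_code)
        if keep:
            doc.select(keep)  # keep only the matching pages
            doc.save(dst_pdf)
    return keep
```

In use, `attach_to_running_edge` would be called inside `with sync_playwright() as p:` (from `playwright.sync_api`). Because the browser is already logged in, the script never has to handle credentials.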
Results
| | Before | After |
|---|---|---|
| Time per item | ~2 min | ~10 sec |
| Annual processing time (3,000 items) | ~100 hours | ~8 hours |
| Time saved | | 92 hours/year |
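For anyone who wants to check the arithmetic, the figures follow directly from the per-item times and the annual volume. A quick sketch, using only the rounded numbers quoted in this post:

```python
ITEMS_PER_YEAR = 3_000


def annual_hours(seconds_per_item, items=ITEMS_PER_YEAR):
    """Convert a per-item time in seconds into hours per year."""
    return seconds_per_item * items / 3600


before = annual_hours(120)  # ~2 minutes per item
after = annual_hours(10)    # ~10 seconds per item
saved = before - after

print(f"before: {before:.1f} h, after: {after:.1f} h, saved: {saved:.1f} h")
# prints "before: 100.0 h, after: 8.3 h, saved: 91.7 h"
```

Rounded to whole hours, that is the 92 hours per year in the title.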
Tech Stack
Python, Playwright, PyMuPDF, openpyxl, PyInstaller
RPA and workflow automation are areas I'll keep exploring alongside SheetToolBox. If you have repetitive tasks you'd like to automate, feel free to reach out.