You can paste a contract into an AI tool today and get a clean table back in seconds. The demo looks like magic. Then you run it on a thousand real documents and the magic breaks, quietly, in a way you do not see until it costs you.
I learned this on a project for a boutique safari agency in South Africa. They had a decade of supplier pricing sitting in PDF and Excel folders, read by hand into a spreadsheet for every client quote. They wanted it structured and loaded into their CRM. Five hundred and eleven suppliers. Thousands of products. Tens of thousands of seasonal rates.
They had already tried a general AI tool. It produced a draft. The draft was random and impossible to check. And that word, check, is the whole story.
The extraction is the easy half
Reading a contract is what language models are good at. Point one at a rate sheet and it will pull out properties, room types, seasons, and prices. For a clean document it works on the first try.
The problem is not getting a number out. The problem is knowing whether the number is right. Across 6,731 products you cannot eyeball the output. One misread rate is not a rounding error of a few dollars. In luxury travel a per-room rate read as per-person is a 2x to 7x overcharge, and on the page it looks completely normal. That single error erases the margin on a trip or loses the booking. The founder put it plainly: a slight mistake equals very big financial loss.
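Here is a small sketch of that arithmetic. The function name, the rate, and the guest counts are made up for illustration. It assumes the multiplier comes from the number of guests sharing the room, which is one plausible reading of the 2x to 7x range and not a detail taken from the project.

```python
def quoted_total(rate: float, basis: str, guests: int, nights: int) -> float:
    """Total for one room, given how the contract expresses the rate."""
    if basis == "per_room":
        return rate * nights
    if basis == "per_person":
        return rate * guests * nights
    raise ValueError(f"unknown pricing basis: {basis!r}")


# Hypothetical contract line: 1,000 per room per night, 3 nights, 2 guests.
correct = quoted_total(1000, "per_room", guests=2, nights=3)
misread = quoted_total(1000, "per_person", guests=2, nights=3)

# The printed number is identical in both cases. Only the basis changed.
overcharge_factor = misread / correct
```

With two guests the factor is 2. With a seven-person villa it is 7. Nothing about the number on the page tells you which case you are in.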
So the real question is never whether AI can extract this. It is which of these thousands of numbers is wrong, and whether you find it before a client sees it.
Never let the model have the last word
The fix is a rule: AI reads the patterns, code makes the decisions. Keep them apart on purpose.
A language model is excellent at reading a messy layout and proposing structure. It is also confident when it is wrong. If the model output flows straight into your database, its confident mistakes flow in with it. So you put deterministic code between the model and the final value. Code decides. The model only suggests.
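A minimal sketch of that separation, assuming the model hands back a plain dict. The field names, the allowed currencies, and the `decide` function are hypothetical, not the project's actual schema. The point is only that the model's output is treated as a suggestion and a deterministic function decides whether it is accepted.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional

# Assumed vocabularies for illustration only.
ALLOWED_BASES = {"per_room", "per_person"}
ALLOWED_CURRENCIES = {"ZAR", "USD", "EUR"}


@dataclass(frozen=True)
class Decision:
    accepted: bool
    amount: Optional[Decimal]
    reasons: tuple


def decide(suggestion: dict) -> Decision:
    """Apply deterministic rules to a model suggestion; never trust it as-is."""
    reasons = []
    amount = None
    try:
        amount = Decimal(str(suggestion.get("amount")))
    except InvalidOperation:
        reasons.append("amount is not a number")
    # is_finite() is checked first so NaN never reaches the comparison.
    if amount is not None and (not amount.is_finite() or amount <= 0):
        reasons.append("amount must be a positive, finite number")
    if suggestion.get("currency") not in ALLOWED_CURRENCIES:
        reasons.append("unknown currency")
    if suggestion.get("basis") not in ALLOWED_BASES:
        reasons.append("unknown pricing basis")
    ok = not reasons
    return Decision(accepted=ok, amount=amount if ok else None, reasons=tuple(reasons))


good = decide({"amount": "1250.00", "currency": "ZAR", "basis": "per_room"})
bad = decide({"amount": "1,250", "currency": "ZAR", "basis": "per room"})
```

A rejected suggestion carries its reasons with it, so the failure goes to review instead of into the database.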
In practice that meant a few things working together. A triage pass sorts every file first and routes it to the right path, because native-text PDFs, scanned PDFs, and Excel sheets each need different handling and no two suppliers lay theirs out the same way. Real rate tables get parsed by code wherever the structure allows it, because a number you can read deterministically you should read deterministically. The hard documents go through three independent AI passes, and a value locks only when all three agree. Disagreement is not averaged away. It escalates to a stronger model. Consensus is your cheapest error detector. And a vision model reads the scanned pages the text layer cannot, so nothing falls through because a PDF was really a photo.
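The triage and consensus steps can be sketched roughly like this. Everything named here is an assumption for illustration: the route labels, the 50-character threshold for deciding a PDF has a usable text layer, and the callables standing in for model passes. The real pipeline's rules are not described in that detail.

```python
from pathlib import Path
from typing import Callable, Optional, Sequence, Tuple


def triage(path: str, text_layer_chars: int) -> str:
    """Pick a processing route from the file type and how much text the PDF exposes."""
    suffix = Path(path).suffix.lower()
    if suffix in {".xlsx", ".xls"}:
        return "excel"
    if suffix == ".pdf":
        # Assumed threshold: almost no extractable text means the PDF is a scan.
        return "pdf_text" if text_layer_chars >= 50 else "pdf_scanned"
    return "manual_review"


def consensus(values: Sequence[str]) -> Optional[str]:
    """Return the value only if every pass produced exactly the same one."""
    if values and all(v == values[0] for v in values):
        return values[0]
    return None


def resolve(
    passes: Sequence[Callable[[], str]],
    escalate: Callable[[], str],
) -> Tuple[str, str]:
    """Run the independent passes; lock on agreement, otherwise escalate."""
    values = [run() for run in passes]
    agreed = consensus(values)
    if agreed is not None:
        return agreed, "consensus"
    # Disagreement is never averaged or majority-voted here.
    return escalate(), "escalated"


# Stand-ins for three model passes and a stronger fallback model.
agree = resolve([lambda: "1250.00"] * 3, escalate=lambda: "unused")
split = resolve(
    [lambda: "1250.00", lambda: "1250.00", lambda: "2500.00"],
    escalate=lambda: "1250.00",
)
```

Note that two out of three is not enough in this sketch. The second element of the result records how the value was reached, so escalated values can be reviewed separately.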
Trace every value back to its source
Here is the step that separates a real pipeline from a clever demo. Every rate is checked back against the page it came from before it ships.
This does two things. It catches the errors that survived extraction. And it gives you something to point at when someone asks why a price is what it is. A number you can trace to a contract is a number you can defend. A number with no source trail is a liability.
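One simple form of that check, as a sketch: every rate carries the file and page it came from, and code confirms the amount is actually printed on that page. The `SourcedRate` type, the page-text lookup, and the comma-as-thousands-separator assumption are all hypothetical. This version only confirms the number exists on the page; catching a pricing-basis error would also mean comparing labels such as "per room" or "per person" near the number.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Set, Tuple

# Numbers with comma thousands separators first, then plain numbers.
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


@dataclass(frozen=True)
class SourcedRate:
    amount: Decimal
    source_file: str
    page: int  # 1-based page number in the source document


def amounts_on_page(page_text: str) -> Set[Decimal]:
    """Every number printed on the page, normalised to Decimal."""
    return {Decimal(tok.replace(",", "")) for tok in NUMBER.findall(page_text)}


def verify(rate: SourcedRate, pages: Dict[Tuple[str, int], str]) -> bool:
    """True only if the rate's amount appears on the page it claims to come from."""
    text = pages.get((rate.source_file, rate.page))
    if text is None:
        return False  # no source page means no trail, so it does not ship
    return rate.amount in amounts_on_page(text)


pages = {("lodge.pdf", 2): "Deluxe Suite  High season  12,500.00 per room per night"}
traced = verify(SourcedRate(Decimal("12500"), "lodge.pdf", 2), pages)
untraced = verify(SourcedRate(Decimal("1250"), "lodge.pdf", 2), pages)
```

The same record that passes the check is the thing you point at later: file, page, and the number as printed.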
On this project the verification layer caught 24 pricing-basis errors before go-live. Twenty-four rates that would have been quoted wrong. That is the work paying for itself before the system is even live.
Deliver it ready to use, not ready to fix
The last stage is unglamorous and it matters. The extracted data was merged into the CRM import format with currency, tax fields, and stay-length variants already set, so a trip designer picks the right rate without doing math.
Structured data that still needs interpretation is not finished. The goal is output someone uses directly, on the first try, with no second cleanup pass. If the person on the other end has to verify your verification, you stopped one step too early.
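As a rough sketch of what "ready to use" can mean in a file: one row per stay-length variant, with currency, tax treatment, and pricing basis already spelled out. The column names and the input shape are assumptions, as is the reading of "stay-length variants" as different rates by minimum nights. The actual CRM import format is not shown here.

```python
import csv
import io
from decimal import Decimal
from typing import Dict, List

# Hypothetical import columns.
FIELDS = ["supplier", "product", "season", "min_nights",
          "currency", "basis", "tax_inclusive", "rate"]


def to_import_rows(rate: dict) -> List[Dict[str, str]]:
    """Expand one verified rate into one explicit row per stay-length variant."""
    rows = []
    for min_nights, amount in sorted(rate["stay_length_rates"].items()):
        rows.append({
            "supplier": rate["supplier"],
            "product": rate["product"],
            "season": rate["season"],
            "min_nights": str(min_nights),
            "currency": rate["currency"],
            "basis": rate["basis"],
            "tax_inclusive": "true" if rate["tax_inclusive"] else "false",
            "rate": f"{amount:.2f}",
        })
    return rows


def write_csv(rows: List[Dict[str, str]]) -> str:
    """Serialise rows to CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = to_import_rows({
    "supplier": "Example Lodge", "product": "Deluxe Suite", "season": "High",
    "currency": "ZAR", "basis": "per_room", "tax_inclusive": True,
    "stay_length_rates": {3: Decimal("11800"), 1: Decimal("12500")},
})
csv_text = write_csv(rows)
```

Nothing in a row is left for the trip designer to infer. The basis and the tax treatment sit next to the number they qualify.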
The result: 511 suppliers, 6,731 products, and 8,854 seasonal rate periods, live in the CRM, every price traced to the contract it came from. You can read the full build in the supplier-contracts case study.
The lesson holds beyond contracts
Invoices, resumes, lab results, insurance forms. Any time you point AI at a pile of documents and pull structured data out, the same shape applies. Extraction is the part that demos well. Verification is the part that makes it safe to ship. Build the second half with the same care as the first, and put code, not the model, in charge of the final number.
If you have a pile of documents you have been reading by hand, here is how we work together.