Tamas Demeter
Luxury Travel / Tour Operator / Phase 1 delivery / Boutique luxury safari agency, around ten people, South Africa

511 Supplier Contracts Into a Working CRM Catalog, Not One at the Wrong Price

A boutique luxury safari agency in South Africa had a decade of supplier pricing sitting in PDF and Excel folders, read by hand into spreadsheets for every quote. I built the structured catalog their tour-operator CRM runs on. 511 suppliers, 6,731 products, and 8,854 seasonal rate periods, extracted from the source contracts and delivered into the CRM in working order, with every price traced back to the contract it came from.

Role
Solo build, architecture, extraction pipeline, QC, CRM delivery
Tools
Python, pdfplumber, Claude (multi-model), Vision OCR, CRM API
Client review

It was a pleasure working with Tamas. He is a real professional in his field. He has definitely overdelivered on this project and I truly value his work. He delivered a comprehensive AI-driven data transfer for my company. Thank you Tamas!

Ksenia M. / Founder, South Africa

The problem

01

A decade of pricing locked in PDFs

Hundreds of supplier rate contracts in mixed PDF and Excel formats, stored across Google Drive folders and read by hand into spreadsheets every time a quote went out. The agency had bought a tour-operator CRM to automate itinerary building, but it needed a clean, structured catalog of suppliers, products, and seasonal rates to run on. Getting a decade of contracts into that shape by hand was a multi-week job per supplier batch, and every manual rate entry was a chance to misprice a safari.

02

The contracts fought clean extraction

The contracts arrived as native-text PDFs, scanned PDFs, and Excel sheets, and no two suppliers laid theirs out the same way. Each supplier carried multiple properties, each property multiple room types, each room type seasonal rates, stay-length tiers, and special offers. Single contracts spanned several countries, each with its own currency and rate logic. Some rates were priced per person, some per room, some per unit for a whole villa. Read the basis wrong and a per-room rate becomes a per-person rate, a 2x to 7x overcharge that looks completely normal on the page.
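To make the basis risk concrete, here is a minimal sketch of the arithmetic. The rate, the party sizes, and the names `Basis`, `nightly_cost`, and `overcharge_factor` are illustrative assumptions, not taken from any contract or from the pipeline itself.

```python
from enum import Enum


class Basis(Enum):
    PER_PERSON = "per_person"
    PER_ROOM = "per_room"
    PER_UNIT = "per_unit"  # one price for a whole villa


def nightly_cost(rate: float, basis: Basis, guests: int, rooms: int = 1) -> float:
    """Cost of one night for the whole party under a given pricing basis."""
    if basis is Basis.PER_PERSON:
        return rate * guests
    if basis is Basis.PER_ROOM:
        return rate * rooms
    return rate  # PER_UNIT: flat price regardless of party size


def overcharge_factor(rate: float, true_basis: Basis, guests: int, rooms: int = 1) -> float:
    """How many times too high the quote is if the rate is misread as per person."""
    correct = nightly_cost(rate, true_basis, guests, rooms)
    misread = nightly_cost(rate, Basis.PER_PERSON, guests, rooms)
    return misread / correct


# Illustrative numbers only: a couple in one room, and seven guests in one villa.
couple = overcharge_factor(1000.0, Basis.PER_ROOM, guests=2, rooms=1)
villa = overcharge_factor(1000.0, Basis.PER_UNIT, guests=7)
```

Under these assumptions the factor is simply the number of guests sharing the room or unit, which is why the misread figure still looks like a plausible rate on the page.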

03

Verification, not extraction, was the hard part

The agency had already tried the obvious route, running a few contracts through general AI tools and building a draft sheet. It worked until they checked it. In the founder's words, it found some mistakes, but it was random, and very hard to check every single price of every single variation. Pulling numbers out of a PDF is the easy half. Knowing which of 6,731 numbers is wrong, before it reaches a client quote, is the half that matters.

The solution.

Architecture diagram

01

Stages 1-2: Triage and deterministic table extraction

A first pass sorts every file, single-supplier or multi-supplier, native text or scanned, large or small, and routes each contract to the right extraction path. Plain code then parses the rate tables first. Where a PDF has real tabular structure, the numbers come out without a model touching a single cell.
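A triage pass of this kind can be sketched in a few lines. This is a hedged illustration, not the pipeline's actual code: the thresholds, the route names, and the `triage` function are assumptions, and the per-page text is taken as given (it could come, for example, from pdfplumber's `extract_text()` on each page).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageResult:
    scanned: bool
    multi_supplier: bool
    large: bool
    route: str


def triage(page_texts: list[str], supplier_count: int,
           min_chars: int = 40, large_pages: int = 30) -> TriageResult:
    """Classify one contract from the text layer of each of its pages."""
    # A page counts as readable if its text layer holds a meaningful amount of text.
    readable = sum(1 for text in page_texts if len((text or "").strip()) >= min_chars)
    # Assumed rule: treat the file as scanned when fewer than half the pages are readable.
    scanned = not page_texts or readable * 2 < len(page_texts)
    return TriageResult(
        scanned=scanned,
        multi_supplier=supplier_count > 1,
        large=len(page_texts) > large_pages,
        # Scanned files skip the table parser and go to the vision path.
        route="vision" if scanned else "table_parser",
    )


# Illustrative inputs: one file with empty text layers, one with a real rate table.
scan = triage(["", "  "], supplier_count=1)
native = triage(["Season  Room type  Rate per night " * 3], supplier_count=4)
```

Keeping the routing rule in plain code means a misrouted file can be explained by reading one function rather than by re-running a model.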

02

Stage 3: Three-pass consensus

Three independent AI passes read each contract for suppliers, products, seasons, and prices. A value locks only when all three agree. Disagreements escalate to a stronger model to arbitrate. AI does the pattern-reading, code does the deciding, and the two are kept apart on purpose so the model never has the last word on a number.
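The unanimity rule can be expressed as a small piece of deterministic code. The sketch below is an assumption about shape only: `Decision`, `decide`, and the `arbitrate` callback (standing in for the stronger model) are hypothetical names, and the stub arbitrator exists only so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Decision:
    value: str
    status: str  # "locked" or "arbitrated"


def decide(field: str, readings: Sequence[str],
           arbitrate: Callable[[str, Sequence[str]], str]) -> Decision:
    """Lock a value only on unanimous agreement; otherwise escalate."""
    if len(readings) != 3:
        raise ValueError("expected exactly three independent readings")
    # Compare readings after collapsing whitespace and case differences.
    normalized = {" ".join(r.split()).casefold() for r in readings}
    if len(normalized) == 1:
        return Decision(" ".join(readings[0].split()), "locked")
    # No majority vote: any disagreement goes to the arbitrator.
    return Decision(arbitrate(field, readings), "arbitrated")


def stub_arbitrator(field: str, readings: Sequence[str]) -> str:
    # Placeholder for the stronger model; it only marks the field for review.
    return "NEEDS_REVIEW"


locked = decide("rate", ["12500", "12500 ", "12500"], stub_arbitrator)
disputed = decide("rate", ["12500", "1250", "12500"], stub_arbitrator)
```

Note that a two-out-of-three match still escalates here. That is stricter than majority voting and mirrors the rule that a value locks only when all three passes agree.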

03

Stages 4-5: Reconciliation and source verification

Code links every season to its product and every product to its supplier, resolves naming variants, and splits multi-country brands into one clean record per country. Every rate then gets checked back against the page it came from. A number that will not trace to a source line gets flagged, not shipped.
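As a rough illustration of the trace-back check, the sketch below flags any rate whose amount does not appear among the numbers on its source page. The function names, the rate keys, and the sample page text are assumptions made for the example; the real check may compare more than the bare amount.

```python
import re
from decimal import Decimal

# Matches numbers with comma thousands separators (12,500.00) or plain digits (9800).
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def numbers_on_page(page_text: str) -> list[Decimal]:
    """Every numeric token on the page, normalized to Decimal."""
    return [Decimal(tok.replace(",", "")) for tok in NUMBER.findall(page_text)]


def untraceable(rates: dict[str, Decimal], page_text: str) -> list[str]:
    """Keys of the rates whose amount does not appear on the source page."""
    found = numbers_on_page(page_text)
    return [key for key, amount in rates.items()
            if not any(amount == n for n in found)]


# Illustrative page text and extracted rates; the second rate has two digits swapped.
page = (
    "Luxury Suite   High season   12,500.00 per room per night\n"
    "Luxury Suite   Low season    9,800 per room per night"
)
extracted = {"suite/high": Decimal("12500"), "suite/low": Decimal("8900")}
flagged = untraceable(extracted, page)
```

Finding the amount on the page is a necessary condition, not a sufficient one: it catches invented or mistyped numbers, but confirming the row, season, and pricing basis needs further checks.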

04

Stage 6: Vision fallback and CRM merge

For scanned contracts with no readable text layer, a vision model reads each page directly as an image. A final step merges every supplier output into the CRM's import format: currency filled per country, tax fields set to the agency's rule, and stay-length tiers turned into pickable product variants so a designer chooses the right rate for the trip length without doing math.
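The tier-to-variant step might look something like the sketch below. The CRM's real import format is not described here, so the dictionary keys, the `StayTier` type, the product name, and the rates are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class StayTier:
    min_nights: int
    max_nights: Optional[int]  # None means open-ended, e.g. "3+ nights"
    rate: Decimal


def tier_label(tier: StayTier) -> str:
    """Human-readable label a designer can pick from a list."""
    if tier.max_nights is None:
        return f"{tier.min_nights}+ nights"
    if tier.min_nights == tier.max_nights:
        return f"{tier.min_nights} night" + ("" if tier.min_nights == 1 else "s")
    return f"{tier.min_nights}-{tier.max_nights} nights"


def tiers_to_variants(product: str, currency: str, tiers: list[StayTier]) -> list[dict]:
    """One pickable variant per stay-length tier, shortest stay first."""
    ordered = sorted(tiers, key=lambda t: t.min_nights)
    return [
        {"name": f"{product} ({tier_label(t)})", "currency": currency, "rate": str(t.rate)}
        for t in ordered
    ]


# Illustrative tiers for a made-up product.
variants = tiers_to_variants(
    "Luxury Suite",
    "ZAR",
    [StayTier(3, None, Decimal("11000")), StayTier(1, 2, Decimal("12500"))],
)
```

With the tier baked into the variant name, the designer picks "Luxury Suite (3+ nights)" and the correct rate comes with it.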

The impact

What the rigor caught before go-live

  • 24 pricing-basis errors caught and corrected at the source-verification stage. Most were the per-room-read-as-per-person kind that overcharges two to seven times, and one was a private-use lodge that would have billed at eight to twelve times its real price
  • Three suppliers built but never pushed, surfaced by a live reconciliation that compared what was built against what was actually live in the CRM rather than trusting the progress sheet
  • A 69-supplier coverage alarm narrowed to 3 real gaps by checking the source contracts: almost all of the flagged suppliers turned out to be folder-naming artifacts whose rates were already live
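The reconciliation behind the last two findings comes down to comparing two lists of names after normalizing them. The sketch below assumes both lists are already in hand (the built outputs and an export of what is live in the CRM); the supplier names and the `canonical` rule are made up for illustration.

```python
import re


def canonical(name: str) -> str:
    """Normalize a supplier name so folder-style and CRM-style spellings compare equal."""
    # Lowercase, then collapse every run of non-alphanumeric characters to one space.
    return re.sub(r"[^a-z0-9]+", " ", name.casefold()).strip()


def built_but_not_live(built: list[str], live: list[str]) -> list[str]:
    """Suppliers that exist in the build output but are missing from the live CRM."""
    live_keys = {canonical(n) for n in live}
    return sorted(n for n in built if canonical(n) not in live_keys)


# Illustrative names only: one real gap, two spelling differences.
built = ["Acacia Lodge", "Baobab_Camp", "Marula House"]
live = ["ACACIA LODGE", "Baobab Camp"]
missing = built_but_not_live(built, live)
```

Without the normalization step, "Baobab_Camp" would be reported as a gap even though its rates are live, which is the kind of false alarm a folder-naming artifact produces.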

The result

  • The agency went from a decade of pricing locked in PDF folders to a live, structured catalog their team builds itineraries from
  • 511 suppliers, 6,731 products, and 8,854 seasonal rate periods, every price traceable to the contract it came from
  • Phase 1 complete, with no need for the founder to touch the pipeline that built it
511
Suppliers live in the tour-operator CRM
6,731
Products extracted and delivered into the catalog
8,854
Seasonal rate periods structured from the source contracts
24
Pricing-basis errors caught before go-live

Have a similar problem?

Tell me what is going on and I will tell you what I would do about it. No obligation.