I am Andrew Langevin. I run a CFIA-licensed (Canadian Food Inspection Agency) mushroom production facility in Brantford, Ontario under the SFCR (Safe Food for Canadians Regulations), and a private-label division that ships under multiple client brand names from one kitchen. Every document type this scanner handles is one my own facility has to read, log, and retain. I built the scanner because I needed it. Here is what it does, where it is honest about its limits, and what happens to your documents after you upload them.
01 The job
What the scanner is actually for.
The HACCPlan AI document scanner reads a photo or PDF of a regulated food-safety document and returns the fields you would otherwise type into a log. It is not a productivity gadget. It is the data-entry layer for records you are legally required to keep — under 21 CFR Part 117 (the FDA Preventive Controls rule), under the Canadian SFCR section 47 and section 89, under FSMA 204 (the FDA Food Traceability Rule, effective January 20, 2026 for Food Traceability List items), and under whichever audit scheme your customer demands (SQF, BRCGS, FSSC 22000).
A short glossary so we are working from the same words:
- CoA is Certificate of Analysis — the per-lot lab report a supplier sends with an ingredient shipment showing micro counts, moisture, allergen tests, sometimes pesticide residue.
- SOP is Standard Operating Procedure — the written instructions for how a task gets done.
- BOL is Bill of Lading — the legal shipping document that follows a truckload from origin to destination.
- SDS is Safety Data Sheet — the 16-section chemical hazard document required under OSHA HazCom in the U.S. and WHMIS 2015 in Canada.
- OCR is Optical Character Recognition — older technology that reads pixels into letters but does not understand what the letters mean.
- LLM is Large Language Model — the newer technology behind the scanner; it reads the document the way a person does, in context.
- PCQI is Preventive Controls Qualified Individual — the FDA-recognised role responsible for the food safety plan.
The scanner uses an LLM with vision, not classic OCR, so it can read a phone photo of a creased CoA at an angle and still pull the lot code, the test date, the testing lab, the species-or-pathogen panel, and the pass-or-fail line. OCR alone would give you a wall of letters; the scanner gives you structured data ready to drop into a log.
247 CoAs
My own facility, 2024. Roughly 30 fields per document, two minutes per field. That is about 4 work-weeks a year of typing the same kinds of numbers off the same kinds of paper. The reason I built this.
11
Document types the scanner handles today. Receiving BOL, CoA, supplier audit cert, calibration cert, training cert, pest control report, SDS, water test, outbound shipping BOL, customer Purchase Order, regulatory ID. Each maps to a specific regulation.
7 days
Anthropic API log retention per their commercial terms (reduced from 30 days in September 2025). After that, deleted. Customer data is never used to train Anthropic models. Citations below.
02 The 11 documents
The eleven document types the scanner reads — and the log it fills.
Every document the scanner handles maps to a specific regulation and a specific log a food business has to keep. The point is not breadth-for-its-own-sake. The point is that these are the documents already on your desk, already required, already being typed by hand somewhere in your operation.
- 01
Receiving BOL, packing slip, commercial invoice
Fills the receiving log. Authority: 21 CFR section 117.135(c)(3) and section 117.475 (US Preventive Controls); SFCR section 47 (Canada). From January 20, 2026 onward, FSMA 204 adds Critical Tracking Event records for any item on the FDA Food Traceability List (leafy greens, shell eggs, certain cheeses, nut butters, more). Fields the scanner pulls: supplier name, supplier address, BOL number, ship date, receive date, line items with lot code and quantity and unit, total weight, carrier, trailer seal number, temperature on arrival when recorded.
- 02
Certificate of Analysis (CoA)
Fills the supplier verification log. Authority: 21 CFR section 117.410(d)(2) names CoA review as an acceptable supplier verification activity; section 117.420 requires review BEFORE the ingredient is used; section 117.475 mandates documentation. Canadian side: SFCR section 47, records held two years per section 89(2). Fields the scanner pulls: supplier, product, lot code, manufacture date, expiry, the testing lab, every micro and chemical and physical test parameter with its result and its specification limit, the analyst signature line, the issue date.
- 03
Supplier audit certificate (SQF, BRCGS, FSSC 22000, NSF, AIB)
Fills the supplier approval log. Required by every GFSI-recognised scheme and by FDA section 117.410(d)(2) where third-party audit is the chosen verification activity. Fields: scheme, certificate number, site name and address, scope of certification, issue date, expiry date, score or grade, certification body name.
- 04
Calibration certificate
Fills the calibration log. Authority: 21 CFR section 117.110(c) and section 117.135(b); ISO/IEC 17025 traceability to NIST in the US or NRC in Canada. SQF Edition 9 section 11.2.10. BRCGS version 9 section 6.3. Fields: instrument name, serial number, calibration date, next-due date, the standard used, the as-found readings, the as-left readings, calibration technician, lab accreditation number.
- 05
Training certificate
Fills the training record. Authority: 21 CFR section 117.4(b) (general food hygiene training), section 117.150 (allergen training), section 117.180 (PCQI). Canada: SFCR section 75. Fields: trainee name, course name (ServSafe, FSPCA PCQI, AllerTrain, HACCP Level 2, others), provider, issue date, expiry, certificate number, instructor signature line when present.
- 06
Pest control report
Fills the pest control log. Authority: 21 CFR section 117.35(c); SFCR section 56. SQF section 11.2.13. BRCGS section 4.14. Pest control is a consistent FDA Form 483 top-ten observation, so the records here matter at audit time. Fields: PCO company, service technician, service date, devices serviced, activity at each device, materials applied (active ingredient, EPA registration number, concentration), recommended corrective action, technician signature.
- 07
Safety Data Sheet (SDS)
Fills the chemical inventory and the hazard communication binder. Authority: OSHA 29 CFR section 1910.1200(g) HazCom in the US; WHMIS 2015 in Canada. 16-section UN GHS format. Fields: product name, manufacturer, emergency phone, hazard pictograms, signal word, hazard statements, precautionary statements, first-aid measures, storage class, expiry of the SDS itself.
- 08
Water test report
Fills the water testing log. Authority: 21 CFR section 117.37(a) and section 117.95; EPA Safe Drinking Water Act limits in the US; Health Canada guidelines in Canada. Pass thresholds: E. coli less than 1 per 100 mL, total coliforms less than 1 per 100 mL. Fields: sample location, sample date, lab, test parameters, result, regulatory limit, pass-or-fail flag.
- 09
Outbound shipping BOL
Fills the shipping log and, from January 20, 2026 onward, the outbound Critical Tracking Event under FSMA 204. Fields: shipper, consignee, ship date, route, carrier, trailer number, seal number, line items shipped with lot codes and quantities, temperature setpoint when reefer.
- 10
Customer Purchase Order
Feeds order-to-invoice workflow plus the inbound side of your customer's FSMA 204 record. Fields: customer name, PO number, ship-to address, line items with SKU and quantity and price, requested ship date, special handling notes.
- 11
Regulatory ID
Onboarding for facility identity. Fields the scanner pulls from a CFIA SFC Licence: licence number, licence holder, address, licensed activities, expiry. From an FDA Food Facility Registration: registration number, owner-operator, address, food categories. From a business registration: legal entity name, registry number, jurisdiction. Required on file at every SQF or BRCGS audit.
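The mapping from document type to destination log described in the list above can be written down as a small lookup table. This is a minimal sketch, not the HACCPlan implementation: the identifiers (`DocumentType`, `TARGET_LOG`, `targetLogFor`) and the snake_case keys are hypothetical, and only the log names come from the list itself.

```typescript
// Hypothetical identifiers; the log names are taken from the eleven items above.
type DocumentType =
  | "receiving_bol"
  | "coa"
  | "supplier_audit_cert"
  | "calibration_cert"
  | "training_cert"
  | "pest_control_report"
  | "sds"
  | "water_test"
  | "shipping_bol"
  | "customer_po"
  | "regulatory_id";

// Record<DocumentType, string> makes the compiler reject a missing or misspelled type.
const TARGET_LOG: Record<DocumentType, string> = {
  receiving_bol: "receiving log",
  coa: "supplier verification log",
  supplier_audit_cert: "supplier approval log",
  calibration_cert: "calibration log",
  training_cert: "training record",
  pest_control_report: "pest control log",
  sds: "chemical inventory and hazard communication binder",
  water_test: "water testing log",
  shipping_bol: "shipping log",
  customer_po: "order-to-invoice workflow",
  regulatory_id: "facility identity (onboarding)",
};

function targetLogFor(docType: DocumentType): string {
  return TARGET_LOG[docType];
}

console.log(targetLogFor("coa"));
```

Keying the table on a closed union type means that adding a twelfth document type without naming its log is a compile error rather than a runtime surprise.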
From my own desk
Every one of these is a document I have on my desk right now, in real life. The receiving BOL pile from this morning's truck. The CoA folder from our spent-substrate supplier. The pest control ticket from Orkin's last visit. The water test from our well. The calibration cert from the thermometer lab in Hamilton. The training cert from the last FSPCA PCQI course I sat through. If a document is on the scanner list, it is one I personally need to read, log, and retain — and so do you, if you operate under the SFCR or 21 CFR Part 117.
03 Per-field confidence
Per-field confidence scoring — why every field comes with a number.
This is the architectural decision that separates the HACCPlan scanner from most others I have looked at. Every field the scanner extracts comes back not as a flat value but as a pair: the value, plus a confidence score between 0 and 1. The model self-rates how sure it is about each field, individually.
What that looks like on screen: a CoA scan returns 30 fields. Twenty-seven of them get a green badge (confidence at or above 0.85). Two get an amber badge (between 0.6 and 0.85). One gets a red badge (below 0.6). The red one is usually the handwritten testing-lab signature, or a creased corner where the lot code is half-visible. You know exactly where to look first when you verify.
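The badge thresholds quoted above (green at or above 0.85, amber from 0.6 up to 0.85, red below 0.6) translate into a few lines of code. The sketch below assumes a `{ value, confidence }` pair per field as described in the text. The type and function names are hypothetical, and the sample values are invented for illustration.

```typescript
// Hypothetical names throughout; only the thresholds come from the text.
type Badge = "green" | "amber" | "red";

interface ScoredField {
  value: string;
  confidence: number; // model self-rating between 0 and 1
}

// Green at or above 0.85, amber from 0.6 up to (not including) 0.85, red below 0.6.
function badgeFor(confidence: number): Badge {
  if (confidence >= 0.85) return "green";
  if (confidence >= 0.6) return "amber";
  return "red";
}

// Field names sorted from least to most confident: the order to verify in.
function reviewOrder(fields: Record<string, ScoredField>): string[] {
  return Object.keys(fields).sort(
    (a, b) => fields[a].confidence - fields[b].confidence
  );
}

// Made-up sample values, for illustration only.
const sampleCoa: Record<string, ScoredField> = {
  lotCode: { value: "LOT-0001", confidence: 0.55 },
  testDate: { value: "2024-03-01", confidence: 0.97 },
  testingLab: { value: "Example Lab", confidence: 0.72 },
};

for (const fieldName of reviewOrder(sampleCoa)) {
  console.log(fieldName, badgeFor(sampleCoa[fieldName].confidence));
}
```

With the sample above, the loop lists `lotCode` (red) first, then `testingLab` (amber), then `testDate` (green), which is the "look here first" ordering the badges are meant to give.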
Three reasons this matters:
- 01
It surfaces ambiguity instead of hiding it
Most AI tools return extracted data as if it were certain. They give you a value and leave you to guess whether to trust it. A handwritten lot code on a creased carbonless duplicate from a pest control technician is genuinely shaky. The scanner says so. You verify the shaky fields and skim the clean ones, instead of treating every field with equal suspicion or, worse, equal trust.
- 02
It lets you write policy thresholds into the system, not the SOP binder
You can set a rule: any CoA field below 0.7 requires a second-person review before the lot ships to production. Any pest report field below 0.6 routes to the QA manager for sign-off. That rule lives in the data, not on a page of your manual that the new hire might not read.
- 03
It is honest about a real limitation of language models
Independent benchmarking (Viventine and the LLMStructBench paper on arXiv) shows that the raw confidence numbers from a language model are not perfectly calibrated as probabilities. GPT-4o-mini once reported 16% on a filing it extracted perfectly. Grok reported 96% on the same one. So I do not claim the number is a calibrated probability. What I claim is that within a single model on a single schema, the rank order is reliable. The 0.4 field really is shakier than the 0.97 field on the same document. That ordering is what makes the badges useful.
The point in operator terms: the scanner does not pretend to be certain when it is not. The badge says "look here first." You decide what to do.
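The policy-threshold idea from the second point above can be sketched as data plus one function. The two rules are the examples given in the text (CoA fields below 0.7 go to second-person review, pest report fields below 0.6 go to the QA manager). Everything else, including the names `REVIEW_RULES` and `reviewTasks`, is a hypothetical illustration, not the product's configuration format.

```typescript
interface ScoredField {
  value: string;
  confidence: number;
}

type DocKind = "coa" | "pest_control_report";

interface ReviewRule {
  below: number; // fields with confidence under this value are routed
  routeTo: string;
}

// The two example rules from the text, expressed as data.
const REVIEW_RULES: Record<DocKind, ReviewRule> = {
  coa: { below: 0.7, routeTo: "second-person review" },
  pest_control_report: { below: 0.6, routeTo: "QA manager sign-off" },
};

interface ReviewTask {
  fieldName: string;
  routeTo: string;
  confidence: number;
}

// Returns one task per field that falls under the threshold for its document kind.
function reviewTasks(
  kind: DocKind,
  fields: Record<string, ScoredField>
): ReviewTask[] {
  const rule = REVIEW_RULES[kind];
  const tasks: ReviewTask[] = [];
  for (const fieldName of Object.keys(fields)) {
    const confidence = fields[fieldName].confidence;
    if (confidence < rule.below) {
      tasks.push({ fieldName, routeTo: rule.routeTo, confidence });
    }
  }
  return tasks;
}
```

Because the model's numbers are reliable as a rank order rather than as calibrated probabilities, thresholds like these are best read as tunable cut-offs on that ordering, adjusted against your own document mix.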
04 Verification workflow
The AI-filled badge, the verification step, and the audit trail.
Every record written from a scan carries an ai_extracted: true flag in the database plus a snapshot of the original confidences. The UI shows an "AI-filled" badge on the record until a human ticks the verify box. Once verified, the badge changes to "verified by [user] at [timestamp]" — and the original AI extraction state is retained alongside, not overwritten.
This is the workflow that matters at audit time:
- 01
The scanner drafts. You sign.
The role of the scanner is the typing. The role of the operator (you, the PCQI, the QA tech, the manager on duty) is the review and the signature. The scanner does not make the release decision. You make the release decision. The scanner just stops you from spending four work-weeks a year copying numbers off paper before you can make it.
- 02
21 CFR Part 11 audit trail compliance
The FDA's Part 11 rule on electronic records and electronic signatures (section 11.10(e)) requires secure, computer-generated, time-stamped audit trails of operator actions, with no overwriting of prior data, retained as long as the underlying record. "AI extracted at time A with confidence X. Human verified at time B by user Y. Both states preserved." That maps directly onto what Part 11 asks for.
- 03
Mirrors FDA's own AI guardrail (Project Elsa)
The FDA's internal inspection-support AI, Project Elsa, operates under a stated rule: no enforcement action is ever based solely on AI analysis without human review. The HACCPlan scanner workflow uses the same guardrail. The regulator built it this way for a reason; matching that pattern means your records read the way the regulator's own do.
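The "both states preserved" behaviour described in this section can be pictured as an append-only event trail on each record. This is a sketch under stated assumptions: only the `ai_extracted` flag name comes from the text. The record shape, the `trail` array, and the function names are hypothetical and do not describe the actual database schema.

```typescript
// Hypothetical record shape; only the ai_extracted flag name comes from the text.
interface AuditEvent {
  kind: "ai_extracted" | "human_verified";
  at: string; // ISO 8601 timestamp
  actor: string; // scanner label or user id
  confidences?: Record<string, number>; // snapshot taken at extraction time
}

interface ScanRecord {
  ai_extracted: true;
  fields: Record<string, string>;
  trail: ReadonlyArray<AuditEvent>; // append-only: events are added, never edited
}

function createFromScan(
  fields: Record<string, string>,
  confidences: Record<string, number>,
  at: string
): ScanRecord {
  const extracted: AuditEvent = { kind: "ai_extracted", at, actor: "scanner", confidences };
  return { ai_extracted: true, fields, trail: [extracted] };
}

// Returns a new record with a verification event appended; the input is untouched.
function verifyRecord(scan: ScanRecord, user: string, at: string): ScanRecord {
  const verified: AuditEvent = { kind: "human_verified", at, actor: user };
  return { ai_extracted: true, fields: scan.fields, trail: scan.trail.concat([verified]) };
}

function badgeText(scan: ScanRecord): string {
  for (let i = scan.trail.length - 1; i >= 0; i--) {
    const entry = scan.trail[i];
    if (entry.kind === "human_verified") {
      return "verified by " + entry.actor + " at " + entry.at;
    }
  }
  return "AI-filled";
}

const drafted = createFromScan({ lotCode: "LOT-0001" }, { lotCode: 0.55 }, "2026-01-05T14:00:00Z");
const signed = verifyRecord(drafted, "qa.tech", "2026-01-05T14:12:00Z");
console.log(badgeText(drafted)); // AI-filled
console.log(badgeText(signed)); // verified by qa.tech at 2026-01-05T14:12:00Z
```

The design choice worth noting is that verification appends an event instead of flipping a flag in place, so the extraction event and its confidence snapshot remain readable after sign-off.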
What the scanner does not do
The scanner does not replace a Preventive Controls Qualified Individual. Under 21 CFR section 117.180, the PCQI is the role responsible for the food safety plan. The scanner is a productivity tool. It drafts. You — or your PCQI — review and sign. If a CoA result is out of specification, the scanner will pull the number and badge the field, but it will not decide whether to release the lot. That decision is yours.
The scanner does not replace your supplier approval program. It fills the verification log faster. It does not vet a new supplier, audit their facility, or judge whether their lab is competent. The judgment work stays with you.
The scanner does not generate a HACCP plan. A separate tool inside HACCPlan does that (free; sign up below). The scanner is for the documents that flow in and out of your operation after the plan is built.
05 How it works
What is under the hood — plainspoken.
The vision model behind the scanner is Anthropic's Claude Sonnet 4.5, called server-side via the Vercel AI SDK. The model is instructed with a system prompt tailored to each document type (a CoA prompt is different from a pest report prompt is different from an SDS prompt), then handed the image, then asked to return structured JSON that conforms to a strict schema built in Zod (a TypeScript validation library). The model cannot return free text. It can only return fields that match the schema, each one wrapped in the { value, confidence } pair.
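To make the "schema or nothing" rule concrete, here is a dependency-free sketch of the kind of check a strict schema performs on the model's reply. The text says the real schema is built in Zod. This hand-rolled validator is a stand-in so the example runs without any packages. The function names, the choice to allow a `null` value, and the rejection of unknown keys are all assumptions.

```typescript
interface ScoredField {
  value: string | null; // assumption: null when the document does not show the field
  confidence: number; // between 0 and 1 inclusive
}

function isPlainObject(x: unknown): x is Record<string, unknown> {
  return typeof x === "object" && x !== null && !Array.isArray(x);
}

// Accepts exactly { value, confidence } with no extra keys.
function isScoredField(x: unknown): x is ScoredField {
  if (!isPlainObject(x)) return false;
  if (Object.keys(x).sort().join(",") !== "confidence,value") return false;
  const value = x["value"];
  const confidence = x["confidence"];
  return (
    (value === null || typeof value === "string") &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1
  );
}

// Parses a raw model reply; throws on free text, unknown fields, or malformed pairs.
function parseExtraction(
  raw: string,
  expectedFields: string[]
): Record<string, ScoredField> {
  const parsed: unknown = JSON.parse(raw); // free text fails here
  if (!isPlainObject(parsed)) throw new Error("expected a JSON object");
  for (const key of Object.keys(parsed)) {
    if (expectedFields.indexOf(key) === -1) throw new Error("unexpected field: " + key);
  }
  const result: Record<string, ScoredField> = {};
  for (const fieldName of expectedFields) {
    const candidate = parsed[fieldName];
    if (!isScoredField(candidate)) {
      throw new Error("missing or malformed field: " + fieldName);
    }
    result[fieldName] = candidate;
  }
  return result;
}
```

A reply such as `{"lotCode":{"value":"LOT-0001","confidence":0.93}}` parses cleanly against `["lotCode"]`, while a sentence of prose, an out-of-range confidence, or an extra key is rejected before anything reaches a log.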
Two more details worth knowing:
- 01
Temperature is set to 0.1
Temperature is the model's variability knob. At 1.0 the model is more creative and less repeatable; at 0.0 it is as deterministic as the model gets. Running at 0.1 keeps extractions essentially identical across repeat scans of the same document. You can scan a CoA twice and get the same numbers.
- 02
Image cap is 6 MB
Photos larger than that resize on the client before upload. Phone cameras default to higher resolution than the model needs; the resize shaves the file without losing text legibility.
There is one more piece I want to be transparent about, because it is the unique part. During development, I run a multi-LLM review pattern on schema changes: Claude Sonnet, GPT-5, and Gemini 2.5 Pro all audit the same scanner output on the same documents, and I resolve disagreements before shipping. That pattern caught five silent data-loss bugs across one development phase alone — including one where the SDS schema was rejecting pictograms when fewer than three were present (the schema enforced a tuple of length three; the model returned a list; the list failed validation; the pictograms vanished silently). A second model spotted the inconsistency. The bug was fixed before any customer document was processed.
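The pictogram bug described above is easy to reproduce in miniature. The sketch below contrasts a validator that demands exactly three entries with one that accepts a list of any length, and shows how a swallowed validation failure turns into missing data. It uses plain TypeScript rather than Zod so that it runs standalone. The exact mechanism by which the original failure was swallowed is an assumption, and all names are hypothetical.

```typescript
type Validator = (input: unknown) => string[] | null;

function isStringArray(input: unknown): input is string[] {
  return Array.isArray(input) && input.every((item) => typeof item === "string");
}

// Buggy shape: behaves like a fixed-length tuple of three.
const exactlyThree: Validator = (input) =>
  isStringArray(input) && input.length === 3 ? input : null;

// Fixed shape: a variable-length list.
const anyLength: Validator = (input) => (isStringArray(input) ? input : null);

// Silent failure mode (assumed): a failed validation quietly becomes "no pictograms".
function pictogramsOrEmpty(input: unknown, validate: Validator): string[] {
  const checked = validate(input);
  return checked === null ? [] : checked;
}

// Louder alternative: surface the failure so it cannot pass unnoticed.
function pictogramsOrThrow(input: unknown, validate: Validator): string[] {
  const checked = validate(input);
  if (checked === null) throw new Error("pictograms failed schema validation");
  return checked;
}

// An SDS with only two pictograms, as a model might return it.
const modelOutput: unknown = ["flame", "corrosion"];

console.log(pictogramsOrEmpty(modelOutput, exactlyThree)); // empty: both pictograms are lost
console.log(pictogramsOrEmpty(modelOutput, anyLength)); // both pictograms kept
```

Two separate lessons fall out of this: the schema should match the real shape of the data (a list, not a fixed tuple), and a validation failure should be reported rather than defaulted away.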
I mention it because the part most operators worry about — "is the AI engine itself any good?" — is one I tested harder than I had to. The scanner has been audited by independent AI before any of its output ever hit a customer log.
06 Where your documents go
Data privacy — the part procurement officers ask about first.
Every QA manager I have spoken to has the same first question about an AI document tool: where do my supplier documents go? Here is the honest answer with citations a procurement officer can verify.
What happens
The image is sent server-side from HACCPlan to the Anthropic API. The model extracts the fields. The structured response comes back to HACCPlan. The extracted fields are saved in your HACCPlan database, where you can edit them, verify them, export them as CSV or PDF, or delete them. The image itself is held in your HACCPlan storage attached to the record so you have the original alongside the extraction.
What never happens
Anthropic does not use API customer data to train its models. The full text of that commitment is in their commercial terms (linked in the footnote below). API log retention is 7 days as of the September 2025 update (down from 30); after that, deleted. HACCPlan does not share documents with any other language model provider. HACCPlan does not use customer documents to refine its own prompts — that work is done on synthetic documents I generate myself.
The link in the footnotes goes to Anthropic's own privacy center page on training data, plus the September 2025 terms update. Read both. Send them to your procurement officer if you have one. The position is verifiable in primary sources, not just in my prose.
07 Accuracy
What we claim about accuracy, and what we do not.
I am not going to tell you 99% accuracy. Every vendor that puts a single accuracy number on a marketing page is lying to themselves about how variable real documents are. The honest version, based on my own testing against my own facility's document mix:
Clean
≥95%
Typed clean CoAs from major commercial labs (Eurofins, ALS, NSF, SGS). Field-level accuracy runs in the 95 to 98 percent range. The model essentially never invents a lot code. The most common miss is a non-standard parameter abbreviation in the microbiological summary — a lab using "TPC" where another uses "APC" for the same aerobic plate count, for instance.
Photo
~85%
Phone photos of paper BOLs taken at an angle in a dim receiving bay. Field-level accuracy runs 80 to 92 percent. Confidence scores drop appropriately on the fields that suffer (handwritten trailer seal numbers, faded carbon-copy lot codes). You verify the amber and red badges. The green ones are reliable.
Handwritten
65-80%
Handwritten pest control reports (technician ballpoint on carbonless duplicate). Field-level accuracy drops to 65 to 80 percent. The structured fields — technician name, service date, device counts — are reliable. The handwritten observation notes degrade fastest. The confidence layer flags exactly the fields a human would re-read. This is the honest worst case.
The test that matters
The test you should actually run is on your own documents. Pick five real documents from your own receiving pile or supplier folder. Scan them. Verify the extracted fields against the original. Decide whether the scanner is good enough on your real mix. Marketing numbers from any vendor — including me — are weaker evidence than five minutes of your own documents.
The free-tier scan allowance below is exactly so you can run that test before paying for anything.
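If you want to put a number on your own five-document test, field-level accuracy is simply matched fields divided by total fields. The sketch below is one way to tally it, assuming you have typed the correct values for each document into a plain object. The function names and the whitespace-only normalisation are assumptions, and a stricter or looser comparison may suit your documents better.

```typescript
// Trim and collapse whitespace so " Example  Lab" matches "Example Lab".
function normalizeText(text: string | undefined): string {
  return (text === undefined ? "" : text).trim().replace(/\s+/g, " ");
}

interface ScoreCount {
  matched: number;
  total: number;
}

// Compares one document's extracted fields against the values you verified by eye.
function scoreDocument(
  extracted: Record<string, string>,
  verified: Record<string, string>
): ScoreCount {
  const fieldNames = Object.keys(verified);
  let matched = 0;
  for (const fieldName of fieldNames) {
    if (normalizeText(extracted[fieldName]) === normalizeText(verified[fieldName])) {
      matched++;
    }
  }
  return { matched, total: fieldNames.length };
}

// Pools all fields across documents: matched fields over total fields.
function fieldLevelAccuracy(counts: ScoreCount[]): number {
  let matched = 0;
  let total = 0;
  for (const count of counts) {
    matched += count.matched;
    total += count.total;
  }
  return total === 0 ? 0 : matched / total;
}
```

Pooling fields across documents, rather than averaging per-document percentages, stops a short document from counting as much as a 30-field CoA.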
08 Free templates
Where to start — the free logs the scanner fills.
If you want to see the format the scanner outputs into before you ever scan anything, the same logs are available as fillable PDFs. Pull one or two, run them by hand for a week, and you will have a clear picture of which document types are the worst tax on your time. Those are the ones to scan first.
Free logs the scanner fills
Free, ungated. Fillable on a tablet or computer in any PDF viewer. Print blank and fill on a clipboard. No account needed.
09 Pricing + getting started
What it costs, and the first five documents you should scan.
The AI scanner is included in the HACCPlan Pro tier at $149 a month, with no per-scan cap on Pro. New accounts get 10 free scans on the free tier — the right number to test whether the scanner reads your real document mix without paying for anything. The co-packer and multi-tenant tier adds API access for automated pipelines and per-tenant scanning isolation.
The five documents I recommend scanning first, in order:
- 01
One CoA from your highest-volume supplier
This is the document type where the time savings are biggest and the extraction tends to be cleanest. If the CoA scan works on your real supplier's CoA format, the rest is upside.
- 02
One receiving BOL from a recent shipment
Test what the scanner does with multi-line items, lot codes, and the temperature-on-arrival field. This is the FSMA 204 hook for the Food Traceability List items.
- 03
One calibration certificate from a recent instrument check
Calibration certs are usually clean PDFs and a good test of the scanner's structured-extraction baseline. The next-due date is the field your auditor will look for.
- 04
One pest control report from your PCO's most recent visit
This is the honest worst case — handwritten on carbonless paper. Test it precisely because it is the hardest. The confidence badges will be useful here.
- 05
One supplier audit certificate (SQF, BRCGS, FSSC, NSF)
Tests the scanner's ability to pull the scheme, expiry, and score line. These are the fields the auditor will check first when they ask for your supplier approval file.
By the time you have run those five, you will know whether the scanner saves you the typing or not. That is the only test that matters.
Try the scanner with your own documents
Start free — 10 scans across any of the 11 document types
Free tier: 10 scans across receiving BOL, CoA, calibration cert, pest report, SDS, water test, training cert, supplier audit cert, shipping BOL, customer PO, regulatory ID. Pro tier ($149/mo) lifts the cap. Every scan is human-verifiable, every field is confidence-scored, every record is Part-11 audit-trail compliant.
Email required to save your scans. No credit card. No upgrade prompts during the free tier.
Footnotes
1. 21 CFR §117.420: Using approved suppliers (the review-before-use rule). ecfr.gov
2. 21 CFR §117.475: Records of supplier verification activities. ecfr.gov
3. 21 CFR Part 11: Electronic records and audit trail requirements. ecfr.gov
4. CFIA: Regulatory requirements for a Preventive Control Plan. inspection.canada.ca
5. CFIA: Incoming ingredients, materials and non-food chemicals. inspection.canada.ca
6. Anthropic: Is my data used for model training? (API customer data is not used). privacy.claude.com
7. Anthropic: September 2025 updates to consumer and commercial terms (7-day API log retention). anthropic.com
8. Civil Eats: FDA expands use of advanced AI for safety reviews and inspections (Project Elsa, December 2025). civileats.com
9. Subhajit Bhar: Confidence scoring in document extraction (third-party primer). subhajitbhar.com
Andrew Langevin · CFIA-licensed facility, Brantford ON · Published 2026-06-04 · 10 min read · Wikidata Q139112497
