Most HS Code classification tools send a product name to an LLM and return whatever the model generates. This approach typically achieves 24-46% accuracy at the 6-digit level. POTAL takes a fundamentally different approach: codified WCO rules that reach 100% accuracy on our benchmarks when all 9 structured input fields are provided. Here's the science behind it.

The Problem with AI-Only Classification

Hallucination: LLMs generate plausible-looking HS codes that don't exist or are wrong. There are only 5,371 valid 6-digit codes, but models frequently produce invalid combinations.
Inconsistency: The same product classified twice can return different codes. Temperature, context window, and prompt variations cause drift.
Missing context: A product name alone is insufficient. "Leather strap" could be a watch strap (Section XVIII), a belt (Section VIII), or a machine part (Section XVI). Only the material + category combination resolves this.
Cost: Every classification requires a full LLM call at $0.01-0.03. At 10,000 products, that's $100-300 per classification run.
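The hallucination problem above is easy to detect mechanically, even if it is hard to prevent. As a minimal sketch, assuming a lookup set of valid codes is available, a model-generated code can be checked for both format and membership. The function name `check_hs6` and the three-entry `VALID_HS6` sample are illustrative assumptions, not part of POTAL; a real check would load the full nomenclature.

```python
import re

# Hypothetical, tiny sample of 6-digit codes for illustration only.
# A real system would load the complete nomenclature from an official source.
VALID_HS6 = {"420330", "911390", "610910"}

HS6_PATTERN = re.compile(r"[0-9]{6}")


def check_hs6(raw: str) -> str:
    """Return 'valid', 'malformed', or 'unknown' for a model-generated code."""
    # Accept common display forms such as "4203.30" or "4203 30".
    code = raw.replace(".", "").replace(" ", "")
    if not HS6_PATTERN.fullmatch(code):
        return "malformed"
    return "valid" if code in VALID_HS6 else "unknown"
```

A check like this catches codes that do not exist, but it cannot catch a code that exists and is simply wrong for the product, which is why the missing-context problem needs a different fix.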

POTAL's GRI Pipeline

The WCO's General Rules of Interpretation (GRI) define how every customs broker in the world classifies products. POTAL codified this entire process:

Step 0 — Input Validation: Normalize 9 fields, extract keywords, validate materials against 91 WCO groups
Step 1 — Cache Lookup: Check if this product was classified before. If yes, return cached result ($0, <10ms)
Step 2-1 — Section Selection: Material + category determine the WCO Section (21 possible)
Step 2-2 — Section Notes: Apply 592 codified rules (inclusion/exclusion/numeric thresholds)
Step 2-3 — Chapter Selection: Material detail + processing narrow to specific Chapter
Step 2-4 — Chapter Notes: Apply chapter-specific rules for boundary cases
Step 3-1 — Heading Selection: Product keywords match against 1,233 Heading descriptions
Step 3-2 — Subheading Selection: Composition + weight + price determine the 6th digit from 5,621 conditions
Step 4-6 — Country Router: Origin country routes to 7-10 digit national code using 131,794 government tariff lines

Ablation Study: Which Fields Matter Most

We ran 466 field combinations × 50 products = 23,300 pipeline executions to measure the impact of each field on classification accuracy:

material: +45.1% accuracy impact (CRITICAL). Without it, the system can't determine even the basic WCO Section. A cotton product and a steel product go to completely different parts of the HS system.
category: +32.8%. Resolves the "material vs function" conflict. A leather watch strap goes to watches (Section XVIII), not leather goods (Section VIII).
product_name: +18.0%. Provides the base keyword matching for Heading selection.
description: +4.8%. Adds context at the Chapter/Heading boundary.
processing, composition, weight_spec, price: 0% at Section/Chapter level, but critical for Subheading (6-digit) and national code (7-10 digit) accuracy.
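The shape of such an ablation run can be sketched as follows: hide every field outside a chosen subset, classify, and record accuracy per subset. This is an illustration under assumptions, not the study's code. The toy classifier, the two-product dataset, and the three-field list are hypothetical, and the sketch enumerates all non-empty subsets, whereas the article does not say how its 466 combinations were selected (9 fields would give 511 non-empty subsets).

```python
from itertools import combinations

FIELDS = ["material", "category", "product_name"]


def field_subsets(fields):
    """Yield every non-empty subset of the given fields as a tuple."""
    for size in range(1, len(fields) + 1):
        yield from combinations(fields, size)


def accuracy(classify, products, subset):
    """Share of products classified correctly when only `subset` is visible."""
    correct = 0
    for product in products:
        visible = {k: v for k, v in product["fields"].items() if k in subset}
        if classify(visible) == product["expected"]:
            correct += 1
    return correct / len(products)


def toy_classify(fields):
    """Stand-in classifier: category rule first, then material rule."""
    if fields.get("category") == "watch":
        return "XVIII"
    if fields.get("material") == "leather":
        return "VIII"
    return None


# Hypothetical two-product benchmark.
PRODUCTS = [
    {"fields": {"material": "leather", "category": "watch",
                "product_name": "strap"}, "expected": "XVIII"},
    {"fields": {"material": "leather", "category": "belt",
                "product_name": "belt"}, "expected": "VIII"},
]

results = {
    subset: accuracy(toy_classify, PRODUCTS, subset)
    for subset in field_subsets(FIELDS)
}
```

Comparing the accuracy of subsets that include a field with those that omit it gives a per-field impact figure of the kind listed above.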

Results

With all 9 fields provided:

Section accuracy: 100%
Chapter accuracy: 100%
Heading accuracy: 100%
HS6 accuracy: 100%
AI calls: 0 (for standard products)
Cost per classification: $0 (cached after first run)
Response time: <10ms (cached), <50ms (computed)

Verified on a 50-product Amazon benchmark (100% at all levels) and a 7-country, 10-digit verification set (1,183 test cases, 100% duty rate accuracy).

Cost Comparison

GPT-4 per classification: ~$0.03 (input + output tokens)
GPT-4o-mini: ~$0.001
POTAL (cached): $0.00 (database lookup only)
POTAL (first classification): $0.00-$0.001 (0-2 AI calls for edge cases)

At 10,000 products/month, POTAL costs effectively $0, compared with $100-300 for GPT-4-class AI-only approaches at $0.01-0.03 per call (about $10 at GPT-4o-mini's ~$0.001 per call). The first classification is computed and cached; all subsequent lookups are free.
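The monthly figures follow directly from the per-call prices listed above. A small worked calculation, assuming one call per product and using the article's approximate prices (the dictionary keys are just labels, not API model identifiers):

```python
# Approximate per-classification prices taken from the list above (USD).
PRICE_PER_CALL = {
    "gpt-4": 0.03,
    "gpt-4o-mini": 0.001,
    "potal-cached": 0.0,
}


def monthly_cost(products: int, price_per_call: float) -> float:
    """Cost of classifying `products` items once each, rounded to cents."""
    return round(products * price_per_call, 2)


costs = {
    name: monthly_cost(10_000, price)
    for name, price in PRICE_PER_CALL.items()
}
```

The gap widens whenever the same catalog is reclassified, because an AI-only approach pays per call again while a cached result does not.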

Why This Matters for Developers

If you're building a platform that needs HS classification, you need deterministic, reproducible results. The same product should always get the same code. POTAL's rule-based pipeline guarantees this: the code path is deterministic, auditable, and backed by the same legal framework customs brokers use worldwide.
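One common way to get "same product, same code" in practice is to derive a cache key from the normalized input fields, so that cosmetic differences in casing, whitespace, or field order cannot produce a second classification. The sketch below illustrates that general technique; the function name `cache_key` and the normalization rules are assumptions, not POTAL's actual scheme.

```python
import hashlib
import json


def cache_key(fields: dict) -> str:
    """Build a stable key from product fields, ignoring cosmetic differences."""
    normalized = {
        # Lowercase and collapse runs of whitespace in each value.
        key.strip().lower(): " ".join(str(value).lower().split())
        for key, value in fields.items()
        if value not in (None, "")  # skip fields that were not provided
    }
    # sort_keys makes the serialization independent of field order.
    canonical = json.dumps(
        normalized, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests that describe the same product then map to the same key, and the stored result is returned without recomputation.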

AI is not the enemy — POTAL uses it as a fallback for genuinely ambiguous cases (about 0.7% of products). But AI should refine answers, not guess them.

Why 9 Fields Beat AI Guessing: The Science Behind HS Code Classification
