Why 9 Fields Beat AI Guessing: The Science Behind HS Code Classification
How POTAL achieves 100% HS Code accuracy using codified WCO rules instead of LLM guessing. Ablation study results from 23,300 pipeline runs.
Most HS Code classification tools send a product name to an LLM and return whatever the model generates. This approach typically achieves 24-46% accuracy at the 6-digit level. POTAL takes a fundamentally different approach: codified WCO rules that reached 100% accuracy on our benchmarks when all 9 structured input fields are supplied. Here's the science behind it.
The Problem with AI-Only Classification
- Hallucination: LLMs generate plausible-looking HS codes that don't exist or are wrong. There are only 5,371 valid 6-digit codes, but models frequently produce invalid combinations.
- Inconsistency: The same product classified twice can return different codes. Temperature, context window, and prompt variations cause drift.
- Missing context: A product name alone is insufficient. "Leather strap" could be a watch strap (Section XVIII), a belt (Section VIII), or a machine part (Section XVI). Only the material + category combination resolves this.
- Cost: Every classification requires a full LLM call at $0.01-0.03. At 10,000 products, that's $100-300 per classification run.
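One inexpensive guard against the hallucination failure mode is to check every model-generated code against the table of valid subheadings before accepting it. The sketch below is illustrative only: `VALID_HS6` is a hypothetical three-entry stand-in for the full table, and `check_hs6` is a hypothetical helper, not part of POTAL's API.

```python
import re

# Hypothetical stand-in for the full table of valid 6-digit subheadings.
# Sample entries: watch straps (other), belts, cotton T-shirts.
VALID_HS6 = {"911390", "420330", "610910"}


def check_hs6(code: str, valid_codes: set[str]) -> str:
    """Return 'malformed', 'unknown', or 'valid' for a candidate HS6 code."""
    digits = code.replace(".", "").strip()
    if not re.fullmatch(r"\d{6}", digits):
        return "malformed"  # not six digits at all
    return "valid" if digits in valid_codes else "unknown"


for candidate in ["9113.90", "9999.99", "91139"]:
    print(candidate, "->", check_hs6(candidate, VALID_HS6))
```

A check like this only catches codes that do not exist. A code that exists but is wrong for the product passes it, which is why the other three problems in the list still apply.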
POTAL's GRI Pipeline
The WCO's General Rules of Interpretation (GRI) define how every customs broker in the world classifies products. POTAL codified this entire process:
- Step 0 — Input Validation: Normalize 9 fields, extract keywords, validate materials against 91 WCO groups
- Step 1 — Cache Lookup: Check if this product was classified before. If yes, return cached result ($0, <10ms)
- Step 2-1 — Section Selection: Material + category determine the WCO Section (21 possible)
- Step 2-2 — Section Notes: Apply 592 codified rules (inclusion/exclusion/numeric thresholds)
- Step 2-3 — Chapter Selection: Material detail + processing narrow to specific Chapter
- Step 2-4 — Chapter Notes: Apply chapter-specific rules for boundary cases
- Step 3-1 — Heading Selection: Product keywords match against 1,233 Heading descriptions
- Step 3-2 — Subheading Selection: Composition + weight + price determine the 6th digit from 5,621 conditions
- Step 4-6 — Country Router: Origin country routes to 7-10 digit national code using 131,794 government tariff lines
Ablation Study: Which Fields Matter Most
We ran 466 field combinations × 50 products = 23,300 pipeline executions to measure the impact of each field on classification accuracy:
- material: +45.1% accuracy impact (CRITICAL). Without it, the system can't determine even the basic WCO Section. A cotton product and a steel product go to completely different parts of the HS system.
- category: +32.8%. Resolves the "material vs function" conflict. A leather watch strap goes to watches (Section XVIII), not leather goods (Section VIII).
- product_name: +18.0%. Provides the base keyword matching for Heading selection.
- description: +4.8%. Adds context at the Chapter/Heading boundary.
- processing, composition, weight_spec, price: 0% at Section/Chapter level, but critical for Subheading (6-digit) and national code (7-10 digit) accuracy.
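For readers who want to run a similar study, the sketch below shows one common way to measure a field's impact: compare accuracy with all fields visible against accuracy with that field hidden. The article does not state how its 466 combinations were selected or how impact was computed, so the leave-one-out definition, the `origin_country` field name, and the toy classifier here are all assumptions. With 9 fields there are 2^9 = 512 possible subsets in total.

```python
from itertools import combinations

# Field names follow the article; origin_country is an assumed name.
FIELDS = ["product_name", "material", "category", "description", "processing",
          "composition", "weight_spec", "price", "origin_country"]


def all_subsets(fields):
    """Yield every subset of the fields, from empty to full."""
    for r in range(len(fields) + 1):
        yield from combinations(fields, r)


def accuracy(classify, products, fields):
    """Fraction of products classified correctly when only `fields` are visible."""
    hits = 0
    for inputs, expected in products:
        visible = {k: v for k, v in inputs.items() if k in fields}
        hits += classify(visible) == expected
    return hits / len(products)


def leave_one_out_impact(classify, products, fields):
    """Accuracy lost when each field is hidden, relative to all fields visible."""
    full = accuracy(classify, products, fields)
    return {f: full - accuracy(classify, products, [g for g in fields if g != f])
            for f in fields}


def toy_classify(visible):
    """Hypothetical Section-level classifier keyed on material and category."""
    table = {("leather", "watches"): "XVIII", ("leather", "belts"): "VIII"}
    return table.get((visible.get("material"), visible.get("category")))


products = [
    ({"product_name": "strap", "material": "leather", "category": "watches"}, "XVIII"),
    ({"product_name": "belt", "material": "leather", "category": "belts"}, "VIII"),
]
impact = leave_one_out_impact(toy_classify, products, FIELDS)
print(len(list(all_subsets(FIELDS))), impact["material"], impact["price"])
```

In this toy setup, hiding `material` or `category` drops Section accuracy to zero, while hiding `price` changes nothing, which mirrors the pattern reported above at the Section level.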
Results
With all 9 fields provided:
- Section accuracy: 100%
- Chapter accuracy: 100%
- Heading accuracy: 100%
- HS6 accuracy: 100%
- AI calls: 0 (for standard products)
- Cost per classification: $0 (cached after first run)
- Response time: <10ms (cached), <50ms (computed)
These results were verified on a 50-product Amazon benchmark (100% at all levels) and on a 7-country, 10-digit verification set (1,183 test cases, 100% duty rate accuracy).
Cost Comparison
- GPT-4 per classification: ~$0.03 (input + output tokens)
- GPT-4o-mini: ~$0.001
- POTAL (cached): $0.00 (database lookup only)
- POTAL (first classification): $0.00-$0.001 (0-2 AI calls for edge cases)
At 10,000 products/month, POTAL costs effectively $0, versus $100-300 for AI-only approaches priced at $0.01-0.03 per call (or about $10 at GPT-4o-mini's ~$0.001 per call). The first classification is computed and cached; all subsequent lookups are free.
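The monthly figures follow from simple multiplication, which the short sketch below makes explicit. The helper `monthly_cost` is hypothetical. The POTAL estimate rests on an assumption: it combines the 0.7% fallback rate quoted in this article with the $0.001 upper bound per first classification, and treats every other product as free.

```python
def monthly_cost(products: int, cost_per_call: float,
                 share_needing_ai: float = 1.0) -> float:
    """Monthly spend: only the share of products that reach an LLM incurs cost."""
    return products * share_needing_ai * cost_per_call


PRODUCTS = 10_000

# AI-only: every product is one LLM call.
ai_low = monthly_cost(PRODUCTS, 0.01)
ai_high = monthly_cost(PRODUCTS, 0.03)
ai_mini = monthly_cost(PRODUCTS, 0.001)

# Rule-based with AI fallback (assumed: 0.7% of products, up to $0.001 each).
fallback = monthly_cost(PRODUCTS, 0.001, share_needing_ai=0.007)

print(f"AI-only at $0.01-0.03 per call: ${ai_low:,.2f} to ${ai_high:,.2f}")
print(f"AI-only at ~$0.001 per call:    ${ai_mini:,.2f}")
print(f"Rule-based with fallback:       ${fallback:,.2f}")
```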
Why This Matters for Developers
If you're building a platform that needs HS classification, you need deterministic, reproducible results. The same product should always get the same code. POTAL's rule-based pipeline guarantees this — the code path is deterministic, auditable, and backed by the same legal framework customs brokers use worldwide.
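Reproducibility also depends on how inputs are keyed. As a minimal sketch (not POTAL's actual scheme), a cache key can be derived by normalizing each field and hashing a canonical serialization, so that cosmetic differences in casing, spacing, or field order map to the same entry. The function name `cache_key` is hypothetical.

```python
import hashlib
import json


def cache_key(fields: dict[str, str]) -> str:
    """Stable key: lowercase, collapse whitespace, drop empty fields, sort keys."""
    normalized = {
        k: " ".join(str(v).lower().split())
        for k, v in fields.items()
        if str(v).strip()
    }
    payload = json.dumps(normalized, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = cache_key({"product_name": "Leather  Strap", "material": "LEATHER"})
b = cache_key({"material": "leather", "product_name": "leather strap"})
c = cache_key({"material": "steel", "product_name": "leather strap"})
print(a == b, a == c)
```

Any change to a field that matters for classification, such as the material, produces a different key, so a cached result is never reused for a materially different product.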
AI is not the enemy — POTAL uses it as a fallback for genuinely ambiguous cases (about 0.7% of products). But AI should refine answers, not guess them.