Real-time document validation and fraud scoring

AI Fraud Detection for Freight Documents

A production-ready fraud detection engine that validates transportation document packages, extracts structured fields, and returns fraud scores in JSON. The service combines OCR (for coordinates), LLMs (for classification, extraction, and manipulation checks), and deterministic validations. It was deployed as an independent REST API with configurable rules per client and an admin layer for prompts and policies.

  • Industry

    AI
  • Headquarters

    USA

Services we provided

  • AI/ML & Document Intelligence (OCR + LLM)
  • MVP Development
  • Microservice & API Architecture
  • Backend Engineering (Node.js/Express)
  • Data Engineering & Integrations
  • QA & Test Automation
  • DevOps & CI/CD
  • Security & Compliance
  • Pilot Rollout & Support

About the Client

A platform serving factoring companies, carriers, shippers, and brokers required automated, evidence-based document verification to reduce manual effort, expedite onboarding, and minimize payout disputes, without re-architecting their existing system.

Challenge

Manual review couldn’t keep pace with the volume and variability of PDFs and scans. The team faced forged or altered values (e.g., rates), inconsistent data across documents, and missing artifacts (such as signatures or OS&D, i.e. overage, shortage, and damage, notes). Strict latency targets (≤ 30 seconds per document) and changing scope demanded a solution that handled Rate Confirmations and proofs of delivery (PODs) first, produced structured JSON with confidence scores, and integrated into current workflows without disruption.


Solution

  • Document preprocessing (PDF Worker Service)

    We built a preprocessing layer that rotates/deskews pages, removes backgrounds, and enhances low-quality scans. By combining classical computer vision with Amazon Textract, the service improves OCR fidelity so that downstream LLMs receive cleaner text.

  • LLM-powered classification & extraction

    A multi-model flow classifies each file (Rate Confirmation, POD), then extracts payer, stops, line items, OS&D, and signatures. Each step returns structured JSON with rationales and confidence scores, making outcomes auditable.

  • Fraud analysis for values

    Rate and totals fields are cropped using Textract coordinates, enhanced, and inspected for manipulations (replaced numerals, font anomalies, inconsistent arithmetic). The engine assigns a per-field fraud likelihood and highlights suspect regions.

  • Deterministic validation layer

    Business rules check addresses, dates, prices, stop sequences, and cross-document consistency. JSON schema validation runs after every LLM step to ensure predictable outputs.

  • Observability & testing

    Full LLM tracing (requests, tokens, latencies) and load tests on real samples protect output quality and help operations spot regressions early.
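
The deterministic validation layer runs a schema check after every LLM step. A minimal sketch of that idea follows, written in Python for brevity even though the delivered backend is Node.js/Express. The field names (`doc_type`, `fields`, `confidence`, `rationale`) and the hand-rolled checks are assumptions for illustration; a production service would more likely use a full JSON Schema validator, and the real schema is not published here.

```python
import json

# Hypothetical shape of one LLM step result; the real schema is not published.
REQUIRED_FIELDS = {
    "doc_type": str,
    "fields": dict,
    "confidence": (int, float),
    "rationale": str,
}
ALLOWED_DOC_TYPES = {"rate_confirmation", "pod"}


def validate_llm_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the payload is usable."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level value must be an object"]

    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(payload[name], bool) or not isinstance(payload[name], expected):
            problems.append(f"wrong type for field: {name}")

    # Value-level checks only make sense once the structure is sound.
    if not problems:
        if payload["doc_type"] not in ALLOWED_DOC_TYPES:
            problems.append(f"unknown doc_type: {payload['doc_type']}")
        if not 0.0 <= payload["confidence"] <= 1.0:
            problems.append("confidence must be between 0 and 1")
    return problems
```

Returning a list of problems rather than raising on the first one lets the caller log every defect in a single trace entry and decide whether to retry the LLM step or route the document to manual review.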

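Two deterministic pieces of the fraud analysis for values can also be sketched: turning a Textract bounding box into a pixel crop region, and checking that line items add up to the stated total. Textract reports `BoundingBox` values (`Left`, `Top`, `Width`, `Height`) as ratios of the page dimensions; the helper names, the padding, and the one-cent tolerance below are assumptions, not the engine's actual settings.

```python
import math
from decimal import Decimal


def bbox_to_pixels(bbox: dict, page_width: int, page_height: int,
                   pad: float = 0.01) -> tuple[int, int, int, int]:
    """Convert a relative Textract-style BoundingBox into a padded pixel box.

    Returns (left, top, right, bottom), clamped to the page edges.
    """
    left = max(0.0, bbox["Left"] - pad)
    top = max(0.0, bbox["Top"] - pad)
    right = min(1.0, bbox["Left"] + bbox["Width"] + pad)
    bottom = min(1.0, bbox["Top"] + bbox["Height"] + pad)
    return (
        math.floor(left * page_width),
        math.floor(top * page_height),
        math.ceil(right * page_width),
        math.ceil(bottom * page_height),
    )


def totals_consistent(line_items: list[str], stated_total: str,
                      tolerance: str = "0.01") -> bool:
    """Check that the extracted line items sum to the stated total.

    Amounts are passed as strings and parsed with Decimal to avoid
    binary floating-point rounding on currency values.
    """
    computed = sum((Decimal(item) for item in line_items), Decimal("0"))
    return abs(computed - Decimal(stated_total)) <= Decimal(tolerance)
```

The padded crop gives the manipulation check some context around the numerals (neighboring glyphs, background texture), while the arithmetic check catches a total that was edited without updating the line items.
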
Features delivered

  • AI-based parsing & field extraction

    Semi-structured PDFs and images are converted into clean JSON enriched with concise explanations, ready for downstream automation.

  • Signature & OS&D validation (POD)

    Detects presence and readability of signatures, scans for handwritten OS&D notes, and flags missing or suspicious confirmations at delivery.

  • Rate manipulation detection

    Surfaces altered numerals, mismatched fonts/sizes, and inconsistent totals or line items; links every alert to exact on-page regions for quick review.

  • Timestamp & stop comparison

    Verifies pickup/delivery windows and stop order; checks alignment between Rate Confirmation and POD to catch impossible sequences.

  • JSON fraud scoring output

    Produces clear per-field and per-document validity/fraud scores with confidence levels, allowing systems or humans to triage instantly.

  • Scalable microservice

    Stateless by design and ready to extend with new doc types or telemetry feeds in later phases.
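
The timestamp and stop comparison can be illustrated with a small rule set. This is a sketch under assumptions: the `Stop` structure, the representation of POD events as `(location, timestamp)` pairs, and the specific rules are hypothetical, chosen to show how planned stops from a Rate Confirmation might be checked against what a POD records.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Stop:
    """A planned stop taken from the Rate Confirmation (hypothetical shape)."""
    location: str
    window_start: datetime
    window_end: datetime


def check_stops(planned: list[Stop],
                actual: list[tuple[str, datetime]]) -> list[str]:
    """Compare planned stops with POD events and return any issues found."""
    issues = []

    # Rule 1: every planned window must be well-formed.
    for i, stop in enumerate(planned, start=1):
        if stop.window_end < stop.window_start:
            issues.append(f"stop {i}: window ends before it starts")

    # Rule 2: recorded POD timestamps must not go backwards.
    for (_, earlier), (_, later) in zip(actual, actual[1:]):
        if later < earlier:
            issues.append("POD timestamps go backwards")
            break

    # Rule 3: both documents must list the same stops in the same order.
    if [s.location for s in planned] != [loc for loc, _ in actual]:
        issues.append("stop order differs between Rate Confirmation and POD")
        return issues  # positions no longer line up, so skip window checks

    # Rule 4: each recorded event should fall inside its planned window.
    for stop, (_, timestamp) in zip(planned, actual):
        if not stop.window_start <= timestamp <= stop.window_end:
            issues.append(f"{stop.location}: recorded outside the planned window")
    return issues
```

An event outside its window is not necessarily fraud (early or late arrivals happen), so in practice a rule like the last one would feed a score or a review flag rather than reject the document outright.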

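To make the JSON fraud scoring output concrete, here is one way per-field scores might roll up into a per-document result. The aggregation rule (take the worst field), the `0.7` review threshold, and all key names are assumptions for illustration; the case study does not state how the engine combines scores.

```python
import json


def build_report(doc_id: str, field_scores: dict[str, dict[str, float]],
                 review_threshold: float = 0.7) -> str:
    """Build a JSON report from per-field fraud scores.

    field_scores maps a field name to {"fraud": ..., "confidence": ...}.
    The document score is the highest per-field fraud score (an assumed rule).
    """
    if not field_scores:
        raise ValueError("at least one scored field is required")

    worst_field = max(field_scores, key=lambda name: field_scores[name]["fraud"])
    doc_score = field_scores[worst_field]["fraud"]
    report = {
        "document_id": doc_id,
        "fraud_score": doc_score,
        "needs_review": doc_score >= review_threshold,
        "driver": worst_field,  # the field that determined the document score
        "fields": field_scores,
    }
    return json.dumps(report, indent=2)
```

Taking the maximum rather than an average keeps a single heavily manipulated field from being diluted by many clean ones, and the `driver` key tells a reviewer where to look first.
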
Key results and business value

  • ~20s average processing time (≤30s SLA)

    Meets the latency target for multi-page PDFs, enabling near-real-time decisions in factoring workflows.

  • 90–95% extraction accuracy on quality docs

    Reliable automation cuts manual re-entry and reduces the need for back-and-forth communication with carriers and brokers.

  • 80–85% accuracy on POD signature checks

    Strong performance on noisy paperwork lowers dispute rates tied to delivery confirmation.

  • 60–65% PDF quality improvement

    Preprocessing boosts OCR/LLM accuracy and raises reviewer confidence in automated results.

  • Lower fraud exposure

    Early detection of manipulated values and cross-doc mismatches curbs payout disputes and chargebacks.

  • Faster onboarding & scale

    Automated verification accelerates client intake and absorbs volume growth without adding headcount.

