Natural Language Understanding

How CodeSpar interprets natural language messages and routes them to the correct command using a three-tier NLU pipeline — regex matching, Haiku classification, and Sonnet smart responses.

CodeSpar agents understand natural language in addition to explicit commands. You do not need to memorize command syntax — just describe what you want in plain language, and the agent will figure out the right action.

This page explains the full NLU pipeline in detail: how each tier works, when each model is invoked, how open questions are detected, and how costs are optimized.

The Three-Tier Pipeline

CodeSpar uses a three-tier strategy to interpret messages, optimizing for speed and cost:

Message → Regex Match → Claude Haiku NLU → Claude Sonnet Smart Response
          (fast, free)   (fast, cheap)       (slower, comprehensive)

Each tier acts as a fallback for the previous one. The pipeline short-circuits as soon as a tier produces a confident result, meaning most messages never reach Tier 3.

How It Works in Practice

When a user sends @codespar deploy staging, the pipeline processes it as follows:

  1. Tier 1 (Regex): The message matches the pattern /^deploy\s+(\S+)/i, extracting staging as the target. Confidence is 1.0. The command executes immediately — no LLM is called.

When a user sends @codespar quantos PRs temos abertos?, the pipeline processes it differently:

  1. Tier 1 (Regex): No English regex pattern matches the Portuguese text. Falls through.
  2. Tier 2 (Haiku NLU): The message is sent to Claude Haiku. It returns { intent: "prs", confidence: 0.95, arguments: { filter: "open" } }. Confidence is high but below 1.0, and the message contains ?.
  3. Tier 3 (Sonnet Smart): Because the message contains a question mark and NLU confidence is below 1.0, it is also eligible for Sonnet. However, since the NLU confidence (0.95) exceeds the classification threshold, the classified intent (prs) executes directly. Sonnet is not called.

When a user sends @codespar what's the best approach for handling rate limiting?:

  1. Tier 1 (Regex): No match.
  2. Tier 2 (Haiku NLU): Returns { intent: "smart", confidence: 0.4, arguments: {} }. Low confidence — this is a conversational question, not a command.
  3. Tier 3 (Sonnet Smart): Message contains ?, is longer than 25 characters, and NLU confidence is well below 1.0. All three open-question criteria are met. Sonnet receives the full project context and generates a comprehensive response.

Tier 1: Regex Match (instant, free)

The agent first attempts to match the message against a set of regex patterns for each command. This handles explicit commands and common variations with zero latency and zero API cost.

Examples That Match via Regex

@codespar status          → status intent (exact match)
@codespar deploy staging  → deploy intent (exact match)
@codespar review PR #42   → review intent (pattern match)
@codespar logs 20         → logs intent (pattern match)
@codespar autonomy L3     → autonomy intent (pattern match)
@codespar help            → help intent (exact match)
@codespar kill            → kill intent (exact match)
@codespar link org/repo   → link intent (pattern match)

If a regex matches with high confidence, the command executes immediately without calling any LLM. This tier handles approximately 40% of all traffic in teams that primarily use explicit command syntax.
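A minimal sketch of what such a pattern table might look like (the patterns and intent names here are illustrative, not CodeSpar's actual definitions):

```typescript
// Hypothetical pattern table mapping regexes to intents (illustrative only).
const patterns: Array<{ intent: string; regex: RegExp }> = [
  { intent: "status", regex: /^status$/i },
  { intent: "deploy", regex: /^deploy\s+(\S+)/i },
  { intent: "logs", regex: /^logs\s+(\d+)/i },
  { intent: "help", regex: /^help$/i },
];

// Tier 1: try each pattern; the first match wins with confidence 1.0.
function matchRegex(message: string): { intent: string; args: string[] } | null {
  for (const { intent, regex } of patterns) {
    const m = message.match(regex);
    if (m) return { intent, args: m.slice(1) };
  }
  return null; // no match → fall through to Tier 2 (Haiku NLU)
}
```

With this sketch, "deploy staging" matches and extracts staging as an argument, while "fazer deploy no staging" returns null and falls through to Tier 2.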

Limitations

Regex patterns are defined for English command syntax only. Non-English messages (Portuguese, Spanish, etc.) almost always fall through to Tier 2. This is by design: maintaining regex patterns for every language would be brittle, while Claude Haiku handles multilingual classification natively.


Tier 2: Claude Haiku NLU (fast, cheap)

When regex does not match, the message is sent to Claude Haiku for intent classification. The NLU model receives a structured prompt with the list of available intents and returns a JSON response:

{
  "intent": "prs",
  "confidence": 0.95,
  "arguments": { "filter": "open" }
}

Response Fields

  • intent — the matched command (e.g., status, deploy, review, prs, fix)
  • confidence — a score from 0.0 to 1.0 indicating how certain the model is
  • arguments — extracted parameters (e.g., { target: "staging" }, { filter: "open" })
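Since the classifier's reply is model-generated JSON, it should be parsed defensively. A sketch of such a parser, assuming the field names documented above (the clamping behavior is our own assumption):

```typescript
// Shape of the Tier 2 classification result described above.
interface NluResult {
  intent: string;
  confidence: number;
  arguments: Record<string, unknown>;
}

// Defensive parse of the model's JSON reply. Field names follow the
// response format above; rejecting malformed replies and clamping the
// score to [0, 1] are assumptions, not documented CodeSpar behavior.
function parseNluResponse(raw: string): NluResult {
  const data = JSON.parse(raw);
  if (typeof data.intent !== "string" || typeof data.confidence !== "number") {
    throw new Error("malformed NLU response");
  }
  return {
    intent: data.intent,
    confidence: Math.min(1, Math.max(0, data.confidence)), // clamp to [0, 1]
    arguments: data.arguments ?? {},
  };
}
```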

Confidence Threshold Behavior

The confidence score determines what happens next:

| Confidence | Behavior |
| --- | --- |
| 1.0 | Execute the classified intent immediately. No further processing. |
| ≥ 0.8 (high) | Execute the classified intent. Sonnet is not called. |
| 0.5 – 0.79 (medium) | Execute the classified intent, but if the message also triggers open-question detection (see below), Sonnet may be called instead. |
| < 0.5 (low) | Do not execute the classified intent. Route to Sonnet for a conversational response. |

The threshold between "execute classified intent" and "route to Sonnet" is not a single hard cutoff — it interacts with open-question detection criteria.
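The confidence bands can be expressed as a small routing helper (a sketch; the band boundaries come from the table above, the function and type names are ours):

```typescript
type Route = "execute" | "execute-unless-open-question" | "sonnet";

// Map a Tier 2 confidence score to the behavior described in the table above.
function routeByConfidence(confidence: number): Route {
  if (confidence >= 0.8) return "execute";                      // high: trust the intent
  if (confidence >= 0.5) return "execute-unless-open-question"; // medium: tie-break
  return "sonnet";                                              // low: conversational
}
```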

English Examples

| Natural Language Input | Detected Intent | Confidence | Arguments |
| --- | --- | --- | --- |
| "can you review the latest PR?" | review | 0.92 | { target: "latest" } |
| "deploy to staging please" | deploy | 0.98 | { target: "staging" } |
| "what's the build status?" | status | 0.96 | { scope: "build" } |
| "fix the failing auth test" | fix | 0.94 | { issue: "failing auth test" } |
| "show me recent logs" | logs | 0.91 | { count: 10 } |
| "who am I?" | whoami | 0.97 | {} |
| "link this to our API repo" | link | 0.88 | {} |
| "stop everything now" | kill | 0.93 | {} |
| "set autonomy to level 2" | autonomy | 0.96 | { level: 2 } |

Portuguese Examples

| Natural Language Input | Detected Intent | Confidence | Arguments |
| --- | --- | --- | --- |
| "quantos PRs temos abertos?" | prs | 0.95 | { filter: "open" } |
| "fazer deploy no staging" | deploy | 0.93 | { target: "staging" } |
| "revisar o PR 42" | review | 0.91 | { target: "42" } |
| "qual o status do build?" | status | 0.96 | { scope: "build" } |
| "aprovar o deploy" | approve | 0.90 | {} |
| "lista os pull requests fechados" | prs | 0.92 | { filter: "closed" } |
| "corrigir o bug de autenticação" | fix | 0.89 | { issue: "bug de autenticação" } |
| "mostra os logs recentes" | logs | 0.88 | { count: 10 } |
| "quem sou eu?" | whoami | 0.94 | {} |
| "vincular ao repositório acme/api" | link | 0.86 | { repo: "acme/api" } |

Spanish Examples

| Natural Language Input | Detected Intent | Confidence | Arguments |
| --- | --- | --- | --- |
| "cuántos PRs hay abiertos?" | prs | 0.93 | { filter: "open" } |
| "desplegar en staging" | deploy | 0.91 | { target: "staging" } |
| "revisar el PR más reciente" | review | 0.90 | { target: "latest" } |
| "cuál es el estado del build?" | status | 0.94 | { scope: "build" } |
| "arreglar el error de login" | fix | 0.88 | { issue: "error de login" } |
| "mostrar los logs" | logs | 0.87 | {} |
| "aprobar el despliegue" | approve | 0.89 | {} |
| "establecer autonomía en nivel 3" | autonomy | 0.85 | { level: 3 } |

Detailed Example: Portuguese Command Flow

When a user sends:

@codespar quantos PRs temos abertos?

The full processing flow is:

  1. Regex Tier: No English pattern matches quantos PRs temos abertos?. Falls through.
  2. Haiku NLU: Receives the message and classifies it:
    • Intent: prs (maps to the "list pull requests" command)
    • Confidence: 0.95
    • Arguments: { filter: "open" } (extracted from "abertos" = "open")
  3. Decision: Confidence is 0.95 (high). Even though the message contains ? and is longer than 25 characters (two open-question triggers), the high confidence means the classified intent is trusted. The prs command executes.
  4. Result: The agent lists open PRs from the linked repository and responds in the same language:
📋 Pull Requests Abertos (3)

#87  fix: auth timeout          @alice    2h ago
#85  feat: rate limiting        @bob      1d ago
#82  chore: update deps         @charlie  3d ago

Tier 3: Claude Sonnet Smart Response (comprehensive)

When a message is not confidently classified as a known command, it is routed to Claude Sonnet for a comprehensive, contextual response.

Open Question Detection

A message is classified as an "open question" when it meets one or more of these criteria:

  1. Contains a question mark (?) — indicates the user is asking, not commanding
  2. Length exceeds 25 characters — longer messages tend to be conversational or complex
  3. NLU confidence below 1.0 — the model is not fully certain about the intent

These criteria are evaluated together. In practice:

  • A short explicit command like status matches regex (Tier 1) and never reaches this check.
  • A message like "deploy to staging" matches regex or gets high NLU confidence, so it executes as a command even though it is longer than 25 characters.
  • A message like "What's the best approach for implementing rate limiting in our API?" fails regex, gets low NLU confidence (it is not a known command), contains ?, and is well over 25 characters. All criteria point to Sonnet.

The key insight: the open-question criteria are tie-breakers when NLU confidence is ambiguous, not absolute rules. A message with ? that also has 0.98 NLU confidence for deploy will still execute the deploy command.
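The criteria and the tie-breaker rule can be sketched together (the criteria are from this page; the function names and the 0.8 trust cutoff mirror the threshold table in Tier 2):

```typescript
// Open-question check combining the three criteria listed above.
function isOpenQuestion(message: string, nluConfidence: number): boolean {
  return (
    message.includes("?") ||  // asking, not commanding
    message.length > 25 ||    // longer messages tend to be conversational
    nluConfidence < 1.0       // intent not fully certain
  );
}

// The check is only a tie-breaker: high NLU confidence still wins.
function shouldRouteToSonnet(message: string, nluConfidence: number): boolean {
  if (nluConfidence >= 0.8) return false; // trusted intent executes directly
  return nluConfidence < 0.5 || isOpenQuestion(message, nluConfidence);
}
```

So "quantos PRs temos abertos?" at confidence 0.95 executes as a command despite containing a question mark, while a 0.4-confidence question routes to Sonnet.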

What Sonnet Receives

When routed to Tier 3, Sonnet receives full project context:

  • The user's message
  • Linked repository name, default branch, and recent activity
  • Current agent state (autonomy level, active tasks)
  • Recent conversation history (last 5 messages)
  • Available commands and their descriptions

This enables Sonnet to give project-aware answers, not generic responses.
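The context bundle might be assembled along these lines (a sketch; the field names, repo values, and command descriptions are illustrative, only the categories mirror the list above):

```typescript
// Illustrative shape of the context bundle sent to Sonnet.
interface SmartContext {
  message: string;
  repo: { name: string; defaultBranch: string; recentActivity: string[] };
  agentState: { autonomyLevel: number; activeTasks: string[] };
  history: string[];                // last 5 messages
  commands: Record<string, string>; // command name → description
}

function buildSmartContext(message: string, history: string[]): SmartContext {
  return {
    message,
    repo: { name: "acme/api", defaultBranch: "main", recentActivity: [] },
    agentState: { autonomyLevel: 2, activeTasks: [] },
    history: history.slice(-5), // cap at the 5 most recent messages
    commands: { status: "Show agent status", deploy: "Deploy a target" },
  };
}
```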

Examples That Route to Sonnet

"What's the best approach for implementing rate limiting in our API?"
"Can you explain why the last 3 builds failed?"
"How should we structure the database migration for the new user model?"
"Compare the performance of our staging vs production environments"
"Qual a melhor estratégia para migrar o banco de dados?"
"¿Cómo debería estructurar los tests de integración?"

Multilingual Support

CodeSpar's NLU pipeline supports multiple languages out of the box. Claude Haiku and Sonnet natively understand dozens of languages, so you can interact with your agent in whatever language is natural for your team.

How Each Tier Handles Languages

| Tier | Multilingual Support |
| --- | --- |
| Regex (Tier 1) | English only. Non-English messages fall through to Tier 2. |
| Haiku NLU (Tier 2) | Full multilingual support. Claude Haiku understands Portuguese, Spanish, French, German, Japanese, and many more. |
| Sonnet Smart (Tier 3) | Full multilingual support. Responds in the same language as the user's message. |

Tested Languages

| Language | Example Command | Example Open Question |
| --- | --- | --- |
| English | "deploy to staging" | "What's going on with the build?" |
| Portuguese | "fazer deploy no staging" | "Por que o último build falhou?" |
| Spanish | "desplegar en staging" | "¿Por qué falló el último build?" |
| French | "déployer sur staging" | "Pourquoi le dernier build a échoué ?" |
| German | "deploy auf staging" | "Warum ist der letzte Build fehlgeschlagen?" |

The agent responds in the same language the user writes in. If you ask in Portuguese, you get a Portuguese response.


Pipeline Flow Diagram

┌─────────────────────────────────────────────────────────┐
│                    Incoming Message                      │
└─────────────┬───────────────────────────────────────────┘


┌─────────────────────────┐
│    Tier 1: Regex Match  │──── Match found ───▶ Execute Command
│    (instant, $0)        │
└─────────────┬───────────┘
              │ No match

┌─────────────────────────┐
│   Tier 2: Haiku NLU     │──── Confidence ≥ threshold ───▶ Execute Command
│   (fast, ~$0.001)       │
└─────────────┬───────────┘
              │ Low confidence OR open question

┌─────────────────────────┐
│  Tier 3: Sonnet Smart   │──── Generate Response ───▶ Reply
│  (comprehensive, ~$0.01)│
└─────────────────────────┘

Decision Logic Pseudocode

function routeMessage(message):
  // Tier 1: Regex
  regexResult = matchRegex(message)
  if regexResult.matched:
    return executeCommand(regexResult.intent, regexResult.args)

  // Tier 2: Haiku NLU
  nluResult = classifyWithHaiku(message)

  // Check open-question criteria
  isOpenQuestion = (
    message.includes("?") OR
    message.length > 25 OR
    nluResult.confidence < 1.0
  )

  // High confidence → execute even if open-question criteria met
  if nluResult.confidence >= 0.8 AND nluResult.intent !== "smart":
    return executeCommand(nluResult.intent, nluResult.args)

  // Medium confidence, not clearly an open question → execute
  if nluResult.confidence >= 0.5 AND NOT isOpenQuestion:
    return executeCommand(nluResult.intent, nluResult.args)

  // Low confidence or open question → Sonnet
  return generateSmartResponse(message, projectContext)
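The pseudocode above translates to runnable TypeScript once the tiers are stubbed out. The stubs below are placeholders, not CodeSpar's implementation; a real classifyWithHaiku would call the NLU model:

```typescript
interface Routed { tier: 1 | 2 | 3; intent: string }

// Placeholder Tier 1: a single deploy pattern stands in for the full table.
function matchRegex(msg: string): { intent: string } | null {
  return /^deploy\s+\S+/i.test(msg) ? { intent: "deploy" } : null;
}

// Placeholder Tier 2: fakes a classification instead of calling the model.
function classifyWithHaiku(msg: string): { intent: string; confidence: number } {
  return msg.includes("?") && msg.length > 25
    ? { intent: "smart", confidence: 0.4 }
    : { intent: "status", confidence: 0.9 };
}

function routeMessage(msg: string): Routed {
  const rx = matchRegex(msg);
  if (rx) return { tier: 1, intent: rx.intent }; // regex hit: free, instant

  const nlu = classifyWithHaiku(msg);
  const openQuestion =
    msg.includes("?") || msg.length > 25 || nlu.confidence < 1.0;

  // High confidence → execute even if open-question criteria met
  if (nlu.confidence >= 0.8 && nlu.intent !== "smart") {
    return { tier: 2, intent: nlu.intent };
  }
  // Medium confidence, not clearly an open question → execute
  if (nlu.confidence >= 0.5 && !openQuestion) {
    return { tier: 2, intent: nlu.intent };
  }
  // Low confidence or open question → Sonnet
  return { tier: 3, intent: "smart" };
}
```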

Cost Optimization

The three-tier pipeline is designed to minimize API costs while maintaining high accuracy.

Cost per Tier

| Tier | Model | Cost per Request | Latency | When Invoked |
| --- | --- | --- | --- | --- |
| Regex | None (string matching) | $0 | under 1 ms | Every message (first check) |
| Haiku NLU | claude-haiku-4-5-20251001 | ~$0.001 | ~200 ms | When regex does not match (~60% of messages) |
| Sonnet Smart | claude-sonnet-4-20250514 | ~$0.01 | ~1–3 s | Open questions and low-confidence messages (~25% of messages) |

Typical Traffic Distribution

| Tier | % of Traffic | Monthly Cost (1,000 msgs/day) |
| --- | --- | --- |
| Regex | ~40% | $0 |
| Haiku NLU | ~35% | ~$10.50 |
| Sonnet Smart | ~25% | ~$75.00 |
| Total | 100% | ~$85.50 |
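The monthly figures follow directly from the per-request costs and traffic shares; a quick arithmetic check (the per-request costs are the approximations quoted above):

```typescript
// Verify the monthly-cost arithmetic from the distribution table above.
const msgsPerDay = 1000;
const days = 30;
const tiers = [
  { name: "Regex", share: 0.40, costPerRequest: 0 },
  { name: "Haiku NLU", share: 0.35, costPerRequest: 0.001 },
  { name: "Sonnet Smart", share: 0.25, costPerRequest: 0.01 },
];

const monthly = tiers.map((t) => ({
  name: t.name,
  cost: msgsPerDay * days * t.share * t.costPerRequest,
}));
const total = monthly.reduce((sum, t) => sum + t.cost, 0);
// Haiku: 1000 × 30 × 0.35 × $0.001 = $10.50
// Sonnet: 1000 × 30 × 0.25 × $0.01  = $75.00
```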

For teams that primarily use explicit commands (e.g., deploy staging, status, logs 20), the regex tier handles most traffic and costs approach zero.

Cost Optimization Tips

  1. Use explicit commands when possible. @codespar deploy staging is free (regex match). @codespar can you please deploy to the staging environment? costs ~$0.001 (Haiku NLU).
  2. Set NLU_MODEL to Haiku. This is the default and is the most cost-effective for intent classification. Do not set it to Sonnet unless you need higher accuracy.
  3. Sonnet costs scale with context. Open questions send more context to Sonnet (repo info, recent activity). Keeping your project context lean reduces per-request cost.

Configuration

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| NLU_MODEL | claude-haiku-4-5-20251001 | Model used for intent classification (Tier 2) |
| SMART_MODEL | claude-sonnet-4-20250514 | Model used for open-ended responses (Tier 3) |

You can override the NLU model to use a different provider or version:

# Use a specific Haiku version (default)
NLU_MODEL=claude-haiku-4-5-20251001
 
# Use Sonnet for NLU (more accurate but slower and more expensive)
NLU_MODEL=claude-sonnet-4-20250514

Debugging NLU

If the agent misinterprets a message, you can check what happened:

@codespar logs 5

The activity log includes NLU routing decisions:

[14:32] 📩 Message: "deploy to prod please"
        NLU: regex miss → haiku → intent:deploy (0.97)
        Action: deploy requested (target: production)

[14:33] 📩 Message: "quantos PRs temos abertos?"
        NLU: regex miss → haiku → intent:prs (0.95)
        Action: listed 3 open PRs

[14:35] 📩 Message: "what's the best way to handle auth?"
        NLU: regex miss → haiku → intent:smart (0.40)
        → routed to sonnet (open question detected)
        Action: smart response generated (auth architecture advice)

Next Steps