What is Probe?

Probe is an AI-native code intelligence engine built in Rust. It reads code as structure — not just text — using tree-sitter AST parsing and ripgrep-speed file scanning to return grounded, accurate context for humans and AI across large, real-world codebases.

It powers every workflow in the ProbeLabs platform: code search, AI assistants, code review, GitHub automation, and custom tooling. Probe runs locally, requires no cloud indexing, and works across 15 languages out of the box.

Vision

Code search tools fall into three camps: text-based (grep, ripgrep), embedding-based (vector search requiring indexing and an embedding model), and AST-aware (Probe). Most AI coding tools use the first two — grep some files and hope for the best, or build a vector index that returns approximate text chunks.

Probe takes the third path. When you search for "authentication middleware", Probe doesn't return random lines or 512-character text chunks that split a function in half — it returns the complete function, its surrounding context, and ranks results by actual relevance using BM25/TF-IDF scoring.

Why not embeddings? Embedding-based tools (like grepai, Octocode) solve vocabulary mismatch — finding "authentication" when the code says verify_credentials. But when an AI agent is the consumer, the LLM already handles this. It translates intent into precise boolean queries: "verify_credentials OR authenticate OR login OR auth_handler". Probe gives it a powerful query language purpose-built for this, and returns results in milliseconds with zero setup.
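
For illustration, here is that flow through the SDK's search function (documented in the Node.js SDK section below); the boolean query is the kind an LLM would generate from the intent "find authentication":

```javascript
import { search } from '@probelabs/probe';

// An LLM expands "find authentication" into a precise boolean query;
// Probe executes it deterministically, with no index or embedding model.
const results = await search({
  path: './src',
  query: 'verify_credentials OR authenticate OR login OR auth_handler',
  maxResults: 10
});
```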

This matters because AI agents need complete, accurate, deterministic context. Partial snippets lead to hallucinations. Stale indexes lead to wrong answers. Probe eliminates both problems at the engine level — no indexing to maintain, no embedding model to run, and every result is a complete AST block.

Architecture

Probe is a three-layer system designed for performance and flexibility:

Rust Core (Foundation)

The foundation is a high-performance Rust engine handling the search pipeline:

  • Ripgrep file scanning — ~1GB/s throughput across codebases
  • Tree-sitter AST parsing — understands code structure in 15 languages
  • BM25/TF-IDF ranking — SIMD-optimized scoring (4-8x speedup)
  • Smart extraction — returns complete functions, classes, and modules instead of partial snippets
  • Elastic query parser — boolean operators, wildcards, search hints

Node.js SDK (Orchestration)

The SDK wraps the Rust engine in a programmable layer for AI workflows:

  • ProbeAgent — multi-turn AI conversations with tool execution loops (up to 30 iterations; sketched below)
  • Multi-provider support — Anthropic, OpenAI, Google, AWS
  • Session caching — prevents duplicate code blocks across related searches
  • Token tracking — keeps results within LLM context windows
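
As a rough sketch of how these pieces fit together (assuming ProbeAgent is exported from @probelabs/probe; the constructor options here are illustrative assumptions, not the documented API):

```javascript
import { ProbeAgent } from '@probelabs/probe';

// Hypothetical sketch: option names are assumptions for illustration.
const agent = new ProbeAgent({
  path: '/project',        // folder the agent is allowed to search
  provider: 'anthropic',   // any of the supported LLM providers
  sessionId: 'review-42'   // enables session caching across related searches
});

const answer = await agent.answer('Where do we validate JWT tokens?');
```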

Interfaces

  • CLI — probe search, probe query, probe extract, probe-chat
  • MCP Server — bidirectional protocol for AI editors (Claude Code, Cursor, Windsurf)
  • Web UI — browser-based chat interface for non-engineers
  • Node.js SDK — build custom tools and pipelines programmatically

Core Commands

Probe has three core commands for different code intelligence tasks.

Search — Semantic Code Discovery

Find code patterns using natural language, boolean operators, and wildcards:

```bash
probe search "authentication middleware" ./src
probe search "error handling AND retry logic" ./
```

Use search hints to filter results:

```bash
probe search "database connection" ./ ext:py lang:python
probe search "Router" go:github.com/gin-gonic/gin  # search dependencies
```

Control output for AI context windows:

```bash
probe search "auth flow" ./ --max-tokens 12000 --format json
```

Search reference →

Query — Structural Pattern Matching

Find code by structure using tree-sitter patterns, independent of naming:

```bash
probe query "fn $NAME($$$PARAMS) -> Result<$$$TYPES>" ./src --language rust
probe query "async function $NAME($$$PARAMS) { $$$BODY }" ./src --language javascript
```

Metavariables match flexibly (see the sketch after this list):

  • $NAME — matches a single node
  • $$$BODY — matches multiple nodes
  • $_ — anonymous wildcard
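
The same structural search is available programmatically via the SDK's query function (shown in the Node.js SDK section); a minimal sketch, assuming the metavariables compose as described above:

```javascript
import { query } from '@probelabs/probe';

// $_ matches any function name; $$$ captures variable-length node lists.
const matches = await query({
  path: './src',
  pattern: 'fn $_($$$PARAMS) -> Result<$$$TYPES> { $$$BODY }',
  language: 'rust',
  format: 'json'
});
```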

Query reference →

Extract — Targeted Code Retrieval

Pull specific code blocks by line, symbol, or range:

```bash
probe extract src/auth.rs:42              # by line number
probe extract src/auth.rs#authenticate    # by symbol name
probe extract src/auth.rs:10-50           # by range
```

Extract from git diffs and clipboard:

```bash
git diff HEAD~1 | probe extract --diff
probe extract --from-clipboard
```

Built-in LLM templates for AI-assisted analysis:

```bash
probe extract src/auth.rs#authenticate --prompt engineer
```

Extract reference →

AI Chat

Interactive AI-powered code exploration from the terminal or browser:

```bash
probe-chat ./my-project
probe-chat ./my-project --web    # browser-based UI
```

Features:

  • Multi-turn conversations grounded in your actual code
  • Automatic tool selection (search, query, extract, grep, bash)
  • Code editing with --allow-edit
  • Conversation history and session continuity
  • Custom personas with --prompt

AI Editor Integration (MCP)

Probe runs as an MCP server, giving AI editors deep code intelligence:

```bash
# Add to Claude Code
claude mcp add probe -- npx -y @probelabs/probe@latest agent --mcp
```

```json
// Claude Desktop / Cursor / Windsurf config
{
  "mcpServers": {
    "probe": {
      "command": "probe",
      "args": ["mcp"]
    }
  }
}
```

Five tools exposed via MCP: search, query, extract, listFiles, searchFiles

Probe is bidirectional — it acts as both an MCP server (sharing search capabilities) and an MCP client (consuming external tools like Jira, Slack, and databases).

Transport methods: stdio (local), HTTP/SSE (remote), WebSocket (real-time)

MCP reference →

Node.js SDK

Build custom tools and pipelines on top of Probe programmatically:

```bash
npm install @probelabs/probe@latest
```

```javascript
import { search, query, extract } from '@probelabs/probe';

// Semantic search
const results = await search({
  path: '/project',
  query: 'authentication middleware',
  maxResults: 10,
  reranker: 'hybrid'
});

// AST pattern matching
const functions = await query({
  path: '/project',
  pattern: 'fn $NAME($$$PARAMS) $$$BODY',
  language: 'rust',
  format: 'json'
});

// Targeted extraction
const code = await extract({
  files: ['/project/src/main.rs:42'],
  contextLines: 5,
  format: 'markdown'
});
```

AI Framework Integration

Generate tool definitions for Vercel AI SDK and LangChain:

```javascript
import { createTools } from '@probelabs/probe';

const tools = createTools({
  sessionId: 'my-session',
  debug: true,
  maxTokens: 4000
});
```

With session-based caching, searches that reuse the same session ID never return duplicate code blocks.
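
A minimal sketch of wiring those tools into a Vercel AI SDK call (assuming createTools returns tools in the AI SDK's expected shape; the model id and maxSteps value are illustrative):

```javascript
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { createTools } from '@probelabs/probe';

// The model can now call Probe's search/query/extract tools in a loop.
const { text } = await generateText({
  model: anthropic('claude-3-5-sonnet-latest'),
  tools: createTools({ sessionId: 'my-session' }),
  maxSteps: 5, // allow a multi-step tool-execution loop
  prompt: 'How does authentication work in this repo?'
});
```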

Node.js SDK reference →

Language Support

Probe understands code structure in 15 languages through tree-sitter grammars:

| Language   | Extensions        | AST Support |
|------------|-------------------|-------------|
| Rust       | .rs               | Full        |
| JavaScript | .js, .jsx         | Full        |
| TypeScript | .ts, .tsx         | Full        |
| Python     | .py               | Full        |
| Go         | .go               | Full        |
| C          | .c, .h            | Full        |
| C++        | .cpp, .hpp, .cc   | Full        |
| Java       | .java             | Full        |
| Ruby       | .rb               | Full        |
| PHP        | .php              | Full        |
| Swift      | .swift            | Full        |
| C#         | .cs               | Full        |
| YAML       | .yaml, .yml       | Full        |
| HTML       | .html             | Full        |
| Markdown   | .md               | Full        |

Each language implementation provides:

  • Tree-sitter grammar for AST parsing
  • Parent node validation for block extraction
  • Test code detection and filtering
  • Comment association with related code

Language support reference →

Dependency Searching

Search inside your project's dependencies directly from the CLI:

```bash
# Go modules
probe search "Router" go:github.com/gin-gonic/gin

# npm packages
probe search "middleware" js:express

# Rust crates
probe search "Serialize" rust:serde
```

Probe resolves and downloads dependency source code automatically, so you can understand how libraries work without leaving your terminal.
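
If the SDK's path parameter accepts the same dependency specifiers as the CLI (an assumption; only the CLI form is shown above), the programmatic equivalent would look like:

```javascript
import { search } from '@probelabs/probe';

// Assumption: CLI-style dependency specifiers work as the search path.
const results = await search({
  path: 'go:github.com/gin-gonic/gin',
  query: 'Router'
});
```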

Output Formats

Six output formats for different consumers:

```bash
probe search "auth" ./ --format json       # for machines and AI
probe search "auth" ./ --format markdown   # for documentation
probe search "auth" ./ --format xml        # for XML-based tools
probe search "auth" ./ --format plain      # for piping
probe search "auth" ./ --format terminal   # for terminals without color
probe search "auth" ./ --format color      # for terminals with color (default)
```

JSON output structure:

```json
{
  "results": [
    {
      "file": "src/auth.rs",
      "lines": { "start": 10, "end": 25 },
      "node_type": "function",
      "code": "fn authenticate(token: &str) -> Result<User> { ... }",
      "score": 0.95,
      "rank": 1
    }
  ],
  "summary": {
    "count": 5,
    "total_bytes": 1024,
    "total_tokens": 256
  }
}
```
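
A minimal sketch of consuming that structure from Node.js by invoking the CLI and parsing its JSON output (fields as shown above; error handling omitted):

```javascript
import { execFileSync } from 'node:child_process';

// Run probe with JSON output and keep only high-scoring results.
const raw = execFileSync('probe', ['search', 'auth', './', '--format', 'json'], {
  encoding: 'utf8',
});
const { results, summary } = JSON.parse(raw);
const top = results
  .filter((r) => r.score > 0.9)
  .map((r) => `${r.file}:${r.lines.start}-${r.lines.end}`);
console.log(top.join('\n'), `(${summary.total_tokens} tokens total)`);
```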

Output formats reference →

Security and Performance

Security

  • Local-first — no cloud indexing, no embeddings, code stays on your machine
  • File access boundaries — allowedFolders restricts what the agent can access
  • Path canonicalization — prevents directory traversal attacks
  • Command filtering — pattern-based bash command restrictions per skill
  • Method filtering — MCP method whitelisting with wildcard patterns

Performance

  • ~1GB/s file scanning via ripgrep
  • SIMD-optimized ranking — 4-8x speedup on BM25/TF-IDF scoring
  • Parser pooling — tree-sitter parsers reused across queries
  • Session caching — prevents duplicate processing across related searches
  • Token-aware limits — --max-tokens keeps results within LLM context windows

Installation

```bash
# npm (recommended)
npm install -g @probelabs/probe@latest

# Docker
docker pull probelabs/probe:latest

# curl (macOS/Linux)
curl -fsSL https://raw.githubusercontent.com/probelabs/probe/main/install.sh | bash

# From source
git clone https://github.com/probelabs/probe.git
cd probe && cargo build --release
```

Verify:

```bash
probe --version
```

Installation reference →

How Probe Compares

|                  | grep/ripgrep | Embedding tools          | Knowledge graphs  | Probe                                  |
|------------------|--------------|--------------------------|-------------------|----------------------------------------|
| Setup            | None         | Minutes (indexing + API) | Heavy (Neo4j, LSP)| None                                   |
| Result unit      | Lines        | ~512-char chunks         | Graph nodes       | Complete AST blocks                    |
| Natural language | No           | Built-in                 | Limited           | LLM generates boolean queries          |
| Exact search     | Regex        | Weak                     | By name only      | SIMD-accelerated + boolean operators   |
| Deterministic    | Yes          | No (model-dependent)     | Yes               | Yes                                    |
| Token awareness  | No           | Partial                  | Limited           | Built-in (--max-tokens, session dedup) |
| External deps    | None         | Embedding API            | Database + LSP    | None                                   |
| Call graph       | No           | No                       | Yes               | Coming soon (LSP integration)          |

Embedding tools (grepai, Octocode) are best when humans search with natural language. Knowledge graphs (Stakgraph, ABCoder) are best for structural questions ("who calls this?"). Probe is best when AI agents need code context — fast, deterministic, zero-setup, and AST-aware.

Where Probe Fits

Probe is the context engine behind every ProbeLabs workflow: code search, AI assistants, code review, GitHub automation, and custom tooling.

Full Reference (GitHub)

Detailed reference docs live in the probelabs/probe repository on GitHub.


Questions? Join the Discord community or book a demo.