Search Functionality Reference
This page documents Probe search syntax and behavior.
Complete reference documentation for Probe's search capabilities, including query syntax, ranking algorithms, and advanced search techniques.
SEARCH COMMAND
probe search <QUERY> [PATH] [OPTIONS]CORE PARAMETERS
| Parameter | Description |
|---|---|
<QUERY> | Required: Search terms or expression |
[PATH] | Directory to search (defaults to current directory) |
KEY OPTIONS
| Option | Description | Default |
|---|---|---|
--files-only | List matching files without code blocks | Off |
--ignore <PATTERN> | Additional patterns to ignore | None |
--exclude-filenames, -n | Exclude filenames from matching | Off |
--reranker, -r <TYPE> | Ranking algorithm: hybrid, hybrid2, bm25, tfidf | hybrid |
--frequency, -s | Enable smart token matching | On |
--max-results <N> | Limit number of results | No limit |
--max-bytes <N> | Limit total bytes of code returned | No limit |
--max-tokens <N> | Limit total tokens | No limit |
--allow-tests | Include test files and code | Off |
--any-term | Match any search term (OR logic) | Off |
--no-merge | Keep code blocks separate | Off |
--merge-threshold <N> | Max lines between blocks to merge | 5 |
--session <ID> | Session ID for caching results | None |
--format <TYPE> | Output format: color, plain, markdown, json | color |
For complete option details, see probe search --help.
QUERY PROCESSING
Probe enhances search queries through several techniques:
TOKENIZATION
Breaks down terms into tokens:
findUserByEmail → [find, user, by, email]STEMMING
Reduces words to their root form:
implementing, implementation → implementSMART PATTERN GENERATION
- Term Boundaries: Understands where code tokens start/end
- Case Handling: Works with camelCase, snake_case, etc.
- Compound Handling: Breaks down compound terms
QUERY SYNTAX
Probe supports an Elasticsearch-like query syntax:
BASIC TERMS
probe search "authentication" # Single term
probe search "user authentication" # Multiple terms (AND logic)BOOLEAN OPERATORS
probe search "error AND handling" # Require both terms
probe search "login OR authentication" # Match either term
probe search "database NOT sqlite" # Exclude termGROUPING
probe search "(error OR exception) AND (handle OR process)"TERM MODIFIERS
probe search "+authentication login" # Required term
probe search "database -sqlite" # Excluded term
probe search "\"handle error\"" # Exact phraseFIELD SPECIFIERS
probe search "function:authenticate" # Search in function namesSEARCH HINTS
Search hints allow you to filter results by file properties. These filters are applied at the file discovery stage and removed from the query before content searching:
probe search "error AND ext:rs" # Only search in .rs files
probe search "class AND file:src/**/*.py" # Only search in Python files under src/
probe search "function AND dir:tests" # Only search in files under tests/ directory
probe search "struct AND type:rust" # Only search in Rust files
probe search "component AND lang:javascript" # Only search in JavaScript filesAvailable Search Hints:
| Hint | Description | Examples |
|---|---|---|
ext:<extension> | Filter by file extension | ext:rs, ext:py,js,ts |
file:<pattern> | Filter by file path pattern (supports globs) | file:src/**/*.rs, file:*test* |
path:<pattern> | Alias for file: | path:src/main.rs |
dir:<pattern> | Filter by directory pattern | dir:src, dir:tests |
type:<filetype> | Filter by ripgrep file type | type:rust, type:javascript |
lang:<language> | Filter by programming language | lang:rust, lang:python |
Combining Search Hints:
# Multiple filters (all must match)
probe search "config AND ext:rs AND dir:src"
# Multiple extensions (any can match)
probe search "import AND ext:js,ts,jsx,tsx"
# Complex combinations
probe search "(error OR exception) AND ext:rs AND file:src/**/*"WILDCARDS
probe search "auth*" # Matches "auth", "authentication", "authorize", etc.RANKING ALGORITHMS
Probe uses sophisticated algorithms to rank search results:
TF-IDF RANKING
Term Frequency-Inverse Document Frequency balances how often terms appear in a specific code block against how common they are across the codebase.
HOW IT WORKS
Term Frequency (TF): How often a term appears in a code block
TF(term, block) = (Number of times term appears in block) / (Total number of terms in block)Inverse Document Frequency (IDF): Measures how unique or rare a term is
IDF(term) = ln(Total number of blocks / Number of blocks containing term)TF-IDF Score: Combines these factors
TF-IDF(term, block) = TF(term, block) * IDF(term)^2
Key benefits:
- Rewards matches on rare, important terms
- Penalizes common terms that appear everywhere
- Considers term frequency within each code block
BM25 RANKING
BM25 (Best Matching 25) is an improved version of TF-IDF that addresses some of its limitations.
HOW IT WORKS
BM25(block, query) = ∑ IDF(term) * (TF(term, block) * (k1 + 1)) / (TF(term, block) + k1 * (1 - b + b * (block_length / average_block_length)))Where:
k1(1.2): Controls term frequency saturationb(0.75): Controls length normalization
Key benefits:
- Better handling of document length
- Diminishing returns for repeated terms
- More accurate for longer code blocks
- Improved handling of edge cases
HYBRID RANKING
Probe's default ranking algorithm combines multiple signals for superior results.
HOW IT WORKS
The hybrid algorithm considers:
Combined score: Weighted combination of TF-IDF and BM25
Combined = α * TF-IDF + (1-α) * BM25Position weights: Terms in function names, class names, and identifiers receive higher scores
Block metrics:
- Number of unique terms matched
- Total matches in the block
- Block type (methods score higher than comments)
File metrics:
- File match rank
- Number of unique terms in the file
- Total matches in the file
Key benefits:
- More balanced scoring across different code structures
- Better handling of both short and long code blocks
- Prioritizes meaningful code over comments or boilerplate
HYBRID2 RANKING
An enhanced version of the hybrid algorithm with improved relevance:
- Better normalization of scores across different metrics
- Enhanced weighting for structural elements
- Improved handling of term proximity
- More sophisticated position weighting
PRACTICAL EXAMPLES
FINDING ERROR HANDLING CODE
probe search "error handling try catch"This search:
- Tokenizes to: ["error", "handl", "try", "catch"]
- Matches files containing these terms
- Ranks results based on term frequency and importance
- Returns complete code blocks with error handling logic
SEARCHING FOR AUTHENTICATION FLOWS
probe search "(login OR authenticate) AND (user OR account) NOT test"This complex query:
- Finds code with either "login" or "authenticate"
- Requires either "user" or "account" to be present
- Excludes results containing "test"
- Returns ranked, complete code blocks
FINDING SPECIFIC API ENDPOINTS
probe search "function:create* api endpoint"This search:
- Targets functions starting with "create"
- Requires "api" and "endpoint" terms
- Returns complete function definitions
- Ranks results with the most relevant endpoints first
LIMITING RESULTS FOR AI INTEGRATION
probe search "database connection pool" --max-tokens 4000 --format jsonThis search:
- Finds code related to database connection pools
- Limits results to fit within 4000 tokens
- Returns JSON-formatted output suitable for AI processing
PERFORMANCE TIPS
- Be specific: More specific queries yield more relevant results
- Use field specifiers: Target specific code elements with
function:,class:, etc. - Leverage boolean operators: Combine terms with AND, OR, NOT for precision
- Control result size: Use
--max-results,--max-bytes, or--max-tokensfor large codebases - Session caching: Use
--sessionto avoid seeing the same code blocks repeatedly - Experiment with rankers: Try different ranking algorithms for different types of searches
For more information on how Probe works internally, see How Probe Works. For details on code extraction, see Code Extraction Reference.