Search Functionality Reference 
Complete reference documentation for Probe's search capabilities, including query syntax, ranking algorithms, and advanced search techniques.
SEARCH COMMAND 
probe search <QUERY> [PATH] [OPTIONS]
CORE PARAMETERS
| Parameter | Description | 
|---|---|
| <QUERY> | Required: Search terms or expression | 
| [PATH] | Directory to search (defaults to current directory) | 
KEY OPTIONS 
| Option | Description | Default | 
|---|---|---|
| --files-only | List matching files without code blocks | Off | 
| --ignore <PATTERN> | Additional patterns to ignore | None | 
| --exclude-filenames, -n | Exclude filenames from matching | Off | 
| --reranker, -r <TYPE> | Ranking algorithm: hybrid, hybrid2, bm25, tfidf | hybrid | 
| --frequency, -s | Enable smart token matching | On | 
| --max-results <N> | Limit number of results | No limit | 
| --max-bytes <N> | Limit total bytes of code returned | No limit | 
| --max-tokens <N> | Limit total tokens | No limit | 
| --allow-tests | Include test files and code | Off | 
| --any-term | Match any search term (OR logic) | Off | 
| --no-merge | Keep code blocks separate | Off | 
| --merge-threshold <N> | Max lines between blocks to merge | 5 | 
| --session <ID> | Session ID for caching results | None | 
| --format <TYPE> | Output format: color, plain, markdown, json | color | 
For complete option details, see probe search --help.
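To make the --merge-threshold row in the table above concrete, here is a minimal Python sketch of how nearby code blocks could be merged. It is an illustration, not Probe's implementation: the `merge_blocks` helper, the `(start_line, end_line)` representation, and the reading of "lines between blocks" as the count of lines strictly between two blocks are all assumptions.

```python
from typing import List, Tuple

# Hypothetical representation: (start_line, end_line), both inclusive.
Block = Tuple[int, int]


def merge_blocks(blocks: List[Block], threshold: int = 5) -> List[Block]:
    """Merge blocks separated by at most `threshold` lines (assumed semantics)."""
    merged: List[Block] = []
    for start, end in sorted(blocks):
        # Gap = number of lines strictly between the previous block and this one.
        if merged and start - merged[-1][1] - 1 <= threshold:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


print(merge_blocks([(10, 20), (24, 30), (50, 60)]))  # prints [(10, 30), (50, 60)]
```

Under this reading, passing --no-merge would correspond to skipping the merge step entirely and returning the blocks as found.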
QUERY PROCESSING 
Probe enhances search queries through several techniques:
TOKENIZATION 
Breaks down terms into tokens:
findUserByEmail → [find, user, by, email]
STEMMING
Reduces words to their root form:
implementing, implementation → implement
SMART PATTERN GENERATION
- Term Boundaries: Understands where code tokens start/end
- Case Handling: Works with camelCase, snake_case, etc.
- Compound Handling: Breaks down compound terms
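The tokenization and stemming steps described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: `tokenize` and `toy_stem` are hypothetical helpers, and the suffix stripping is a deliberately naive stand-in for a real stemmer, not the algorithm Probe uses.

```python
import re
from typing import List

# Matches acronyms (HTTP), capitalized or lowercase words (User, find), and digit runs.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def tokenize(identifier: str) -> List[str]:
    """Split camelCase, snake_case, and similar identifiers into lowercase tokens."""
    return [match.lower() for match in _TOKEN.findall(identifier)]


def toy_stem(token: str) -> str:
    """Naive suffix stripping, for illustration only (not a real stemmer)."""
    for suffix in ("ation", "ing"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


print(tokenize("findUserByEmail"))     # prints ['find', 'user', 'by', 'email']
print(tokenize("find_user_by_email"))  # prints ['find', 'user', 'by', 'email']
print(toy_stem("implementing"), toy_stem("implementation"))  # prints implement implement
```

Because both naming styles reduce to the same token list, a query for one style can match code written in the other.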
QUERY SYNTAX 
Probe supports an Elasticsearch-like query syntax:
BASIC TERMS 
probe search "authentication"  # Single term
probe search "user authentication"  # Multiple terms (AND logic)
BOOLEAN OPERATORS
probe search "error AND handling"  # Require both terms
probe search "login OR authentication"  # Match either term
probe search "database NOT sqlite"  # Exclude term
GROUPING
probe search "(error OR exception) AND (handle OR process)"
TERM MODIFIERS
probe search "+authentication login"  # Required term
probe search "database -sqlite"  # Excluded term
probe search "\"handle error\""  # Exact phrase
FIELD SPECIFIERS
probe search "function:authenticate"  # Search in function names
WILDCARDS
probe search "auth*"  # Matches "auth", "authentication", "authorize", etc.
RANKING ALGORITHMS
Probe ranks search results with one of four algorithms, selected with --reranker: tfidf, bm25, hybrid (the default), or hybrid2.
TF-IDF RANKING 
Term Frequency-Inverse Document Frequency balances how often terms appear in a specific code block against how common they are across the codebase.
HOW IT WORKS 
- Term Frequency (TF): How often a term appears in a code block: TF(term, block) = (Number of times term appears in block) / (Total number of terms in block)
- Inverse Document Frequency (IDF): How rare a term is across the codebase: IDF(term) = ln(Total number of blocks / Number of blocks containing term)
- TF-IDF Score: The product of the two: TF-IDF(term, block) = TF(term, block) * IDF(term)
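As a worked illustration, the following Python sketch computes the standard TF × IDF product over a tiny in-memory corpus. The function names and the list-of-tokens representation of a code block are assumptions made for the example; Probe's own implementation may differ.

```python
import math
from typing import List

TokenBlock = List[str]  # hypothetical: a code block as a list of tokens


def tf(term: str, block: TokenBlock) -> float:
    """Occurrences of `term` divided by the total number of terms in the block."""
    return block.count(term) / len(block) if block else 0.0


def idf(term: str, blocks: List[TokenBlock]) -> float:
    """Natural log of (total blocks / blocks containing the term)."""
    containing = sum(1 for block in blocks if term in block)
    return math.log(len(blocks) / containing) if containing else 0.0


def tf_idf(term: str, block: TokenBlock, blocks: List[TokenBlock]) -> float:
    return tf(term, block) * idf(term, blocks)


corpus = [["error", "handle", "error"], ["user", "login"], ["error", "user"]]
# "login" appears in one block of three, "user" in two, so "login" scores higher.
print(tf_idf("login", corpus[1], corpus) > tf_idf("user", corpus[1], corpus))  # prints True
```

A term that appears in every block gets an IDF of ln(1) = 0, so it contributes nothing to the score.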
Key benefits:
- Rewards matches on rare, important terms
- Penalizes common terms that appear everywhere
- Considers term frequency within each code block
BM25 RANKING 
BM25 (Best Matching 25) is an improved version of TF-IDF that addresses some of its limitations.
HOW IT WORKS 
BM25(block, query) = ∑ IDF(term) * (TF(term, block) * (k1 + 1)) / (TF(term, block) + k1 * (1 - b + b * (block_length / average_block_length)))
Where:
- k1 (1.2): Controls term frequency saturation
- b (0.75): Controls length normalization
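The formula above can be sketched in Python as follows. This is a sketch under assumptions rather than Probe's code: TF is taken as the raw occurrence count (as in classic BM25), IDF reuses the ln(total blocks / blocks containing term) form from the TF-IDF section, and the `bm25_score` name and token-list inputs are hypothetical.

```python
import math
from typing import List

TokenBlock = List[str]  # hypothetical: a code block as a list of tokens


def bm25_score(
    query_terms: List[str],
    block: TokenBlock,
    blocks: List[TokenBlock],
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """Sum the per-term BM25 contributions; assumes a non-empty corpus."""
    avg_len = sum(len(blk) for blk in blocks) / len(blocks)
    score = 0.0
    for term in query_terms:
        containing = sum(1 for blk in blocks if term in blk)
        if containing == 0:
            continue  # term absent from the corpus contributes nothing
        idf = math.log(len(blocks) / containing)
        freq = block.count(term)  # raw count (assumption)
        length_norm = 1 - b + b * (len(block) / avg_len)
        score += idf * (freq * (k1 + 1)) / (freq + k1 * length_norm)
    return score


corpus = [["error", "handling", "error"], ["user", "login", "flow", "page"], ["error", "user"]]
print(bm25_score(["login"], corpus[1], corpus) > bm25_score(["login"], corpus[0], corpus))  # prints True
```

The fraction is bounded above by k1 + 1, which is where the diminishing returns for repeated terms come from: each extra occurrence adds less than the previous one.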
Key benefits:
- Better handling of document length
- Diminishing returns for repeated terms
- More accurate for longer code blocks
- Improved handling of edge cases
HYBRID RANKING 
Probe's default ranking algorithm combines TF-IDF and BM25 scores with structural signals from the matched block and its file.
HOW IT WORKS 
The hybrid algorithm considers:
- Combined score: Weighted combination of TF-IDF and BM25: Combined = α * TF-IDF + (1 - α) * BM25
- Position weights: Terms in function names, class names, and identifiers receive higher scores
- Block metrics:
  - Number of unique terms matched
  - Total matches in the block
  - Block type (methods score higher than comments)
- File metrics:
  - File match rank
  - Number of unique terms in the file
  - Total matches in the file
Key benefits:
- More balanced scoring across different code structures
- Better handling of both short and long code blocks
- Prioritizes meaningful code over comments or boilerplate
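The weighted combination at the core of the hybrid ranker can be illustrated with a short Python sketch. It covers only the Combined = α * TF-IDF + (1 - α) * BM25 step; position weights, block metrics, and file metrics are left out. The value α = 0.5, the function names, and the sample scores are assumptions chosen for the example, not values taken from Probe.

```python
from typing import Dict, List, Tuple


def combined_score(tfidf: float, bm25: float, alpha: float = 0.5) -> float:
    """Weighted blend of two scores; alpha = 0.5 is an assumed default."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * tfidf + (1 - alpha) * bm25


def rank(scores: Dict[str, Tuple[float, float]], alpha: float = 0.5) -> List[str]:
    """Order block names by combined score, highest first."""
    return sorted(scores, key=lambda name: combined_score(*scores[name], alpha), reverse=True)


# Hypothetical (tfidf, bm25) pairs for three code blocks.
sample = {"parse_config": (0.10, 0.90), "load_user": (0.40, 0.40), "main": (0.05, 0.20)}
print(rank(sample))  # prints ['parse_config', 'load_user', 'main']
```

Setting α closer to 1 favors the TF-IDF signal, and closer to 0 favors BM25.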
HYBRID2 RANKING 
An enhanced version of the hybrid algorithm with improved relevance:
- Better normalization of scores across different metrics
- Enhanced weighting for structural elements
- Improved handling of term proximity
- More sophisticated position weighting
PRACTICAL EXAMPLES 
FINDING ERROR HANDLING CODE 
probe search "error handling try catch"
This search:
- Tokenizes and stems the query to: ["error", "handl", "try", "catch"]
- Matches files containing these terms
- Ranks results based on term frequency and importance
- Returns complete code blocks with error handling logic
SEARCHING FOR AUTHENTICATION FLOWS 
probe search "(login OR authenticate) AND (user OR account) NOT test"
This complex query:
- Finds code with either "login" or "authenticate"
- Requires either "user" or "account" to be present
- Excludes results containing "test"
- Returns ranked, complete code blocks
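The boolean logic of this query can be modeled with small composable predicates. The Python sketch below is illustrative only: it evaluates the expression against a set of tokens, ignores stemming and ranking, and uses hypothetical helper names (`term`, `any_of`, `all_of`, `negate`) that are not part of Probe.

```python
from typing import Callable, Set

Predicate = Callable[[Set[str]], bool]


def term(word: str) -> Predicate:
    return lambda tokens: word in tokens


def any_of(*preds: Predicate) -> Predicate:  # OR
    return lambda tokens: any(p(tokens) for p in preds)


def all_of(*preds: Predicate) -> Predicate:  # AND
    return lambda tokens: all(p(tokens) for p in preds)


def negate(pred: Predicate) -> Predicate:  # NOT
    return lambda tokens: not pred(tokens)


# (login OR authenticate) AND (user OR account) NOT test
query = all_of(
    any_of(term("login"), term("authenticate")),
    any_of(term("user"), term("account")),
    negate(term("test")),
)

print(query({"login", "user", "session"}))  # prints True
print(query({"login", "user", "test"}))     # prints False
```

Each parenthesized group becomes one `any_of`, and the NOT clause acts as a final filter on blocks that already satisfied both groups.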
FINDING SPECIFIC API ENDPOINTS 
probe search "function:create* api endpoint"
This search:
- Targets functions starting with "create"
- Requires "api" and "endpoint" terms
- Returns complete function definitions
- Ranks results with the most relevant endpoints first
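To show what a trailing wildcard such as create* selects, here is a small Python sketch using the standard library's `fnmatch` module. Treating the match as case-insensitive and applying it to whole function names are assumptions; `matching_functions` and the sample names are hypothetical. Note that `fnmatch` also interprets `?` and `[...]`, which the query syntax described here does not mention.

```python
from fnmatch import fnmatchcase
from typing import List


def matching_functions(pattern: str, names: List[str]) -> List[str]:
    """Return the names that match a shell-style wildcard, ignoring case (assumption)."""
    return [name for name in names if fnmatchcase(name.lower(), pattern.lower())]


names = ["createUser", "create_order", "recreateIndex", "deleteUser"]
print(matching_functions("create*", names))  # prints ['createUser', 'create_order']
```

Because the pattern has no leading wildcard, "recreateIndex" is excluded even though it contains "create".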
LIMITING RESULTS FOR AI INTEGRATION 
probe search "database connection pool" --max-tokens 4000 --format json
This search:
- Finds code related to database connection pools
- Limits results to fit within 4000 tokens
- Returns JSON-formatted output suitable for AI processing
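A script that feeds these results to an AI tool might wrap the command as in the Python sketch below. It uses only the flags shown in this reference. The helper names are hypothetical, the sketch assumes `probe` is on the PATH and that standard output contains only the JSON document, and it makes no assumption about the JSON structure, which is not described here.

```python
import json
import subprocess
from typing import Any, List, Optional


def build_search_command(
    query: str,
    path: str = ".",
    max_tokens: Optional[int] = None,
    output_format: str = "json",
) -> List[str]:
    """Assemble the argument list for `probe search`."""
    cmd = ["probe", "search", query, path]
    if max_tokens is not None:
        cmd += ["--max-tokens", str(max_tokens)]
    cmd += ["--format", output_format]
    return cmd


def run_search(cmd: List[str]) -> Any:
    """Run the command and parse stdout as JSON (assumes stdout is pure JSON)."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


cmd = build_search_command("database connection pool", max_tokens=4000)
print(cmd)
# results = run_search(cmd)  # uncomment where probe is installed
```

Passing the arguments as a list avoids shell quoting problems when the query contains spaces, quotes, or parentheses.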
PERFORMANCE TIPS 
- Be specific: More specific queries yield more relevant results
- Use field specifiers: Target specific code elements with function:, class:, etc.
- Leverage boolean operators: Combine terms with AND, OR, NOT for precision
- Control result size: Use --max-results, --max-bytes, or --max-tokens for large codebases
- Session caching: Use --session to avoid seeing the same code blocks repeatedly
- Experiment with rankers: Try different ranking algorithms for different types of searches
For more information on how Probe works internally, see How Probe Works. For details on code extraction, see Code Extraction.