# System Architecture

## Core Components

### 1. MCP Server

The Model Context Protocol server that connects to AI assistants:

- Tool Handlers: Implements 7 MCP tools for indexing/search
- File Watcher: Real-time incremental indexing
- Code Chunker: AST-based splitting for semantic units
### 2. Backend API

High-performance server handling:

- Embeddings: Generate 4096-dimensional vectors
- Search: Hybrid search + reranking
- Storage: Vector collections + sync metadata
### 3. Models

| Model | Purpose | Dimensions |
|---|---|---|
| Sharc-Embed | Code embeddings | 4096 |
| Sharc-Rerank | Relevance scoring | - |
## Data Flow

### Indexing Flow

### Search Flow
## Key Design Decisions

### Why Hybrid Search?

Dense vectors excel at semantic similarity but miss exact keyword matches. BM25 catches these:

| Query | Dense Only | Hybrid |
|---|---|---|
| "authenticate user" | Finds auth code | Same |
| "JWT validation" | Might miss | Finds exact match |
| "function getUserById" | Misses | BM25 finds it |
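One common way to merge the dense and BM25 result lists is Reciprocal Rank Fusion. The document doesn't specify SHARC's exact fusion method, so treat this as an illustrative sketch rather than the actual implementation:

```python
def rrf_fuse(dense_ranked: list[str], bm25_ranked: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of chunk IDs with Reciprocal Rank Fusion (RRF).

    Each list contributes 1 / (k + rank) per chunk; chunks ranked highly in
    either list end up near the top of the fused result.
    """
    scores: dict[str, float] = {}
    for ranked in (dense_ranked, bm25_ranked):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

For a query like "function getUserById", even if dense retrieval misses the identifier entirely, a top BM25 rank alone is enough to place the chunk near the head of the fused list.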
### Why AST-Based Chunking?

Traditional text chunking breaks code at arbitrary points. AST chunking:

- Extracts complete functions/classes
- Preserves semantic context
- Injects parent class/module information
- Includes decorator/annotation context
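Python's standard `ast` module is enough to sketch the idea for Python source. SHARC presumably uses language-specific parsers, and `chunk_module` is a hypothetical name:

```python
import ast

def chunk_module(source: str, module: str) -> list[dict]:
    """Split Python source into complete function/class chunks,
    injecting the parent module/class as a context header."""
    tree = ast.parse(source)
    chunks = []

    def emit(node, parent=None):
        header = f"# module: {module}" + (f", class: {parent}" if parent else "")
        chunks.append({
            "name": node.name,
            "parent": parent,
            # get_source_segment returns the exact source span of the node;
            # a full implementation would also pull in node.decorator_list.
            "text": header + "\n" + ast.get_source_segment(source, node),
        })

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            emit(node)
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    emit(item, parent=node.name)
    return chunks
```

Because every chunk is a whole function or method with its parent class named in the header, the embedding model sees a semantically complete unit instead of an arbitrary text window.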
### Why 4096 Dimensions?

Full embedding dimension provides:

- Maximum semantic resolution
- Better differentiation of similar code
- No information loss from truncation
### Why Incremental Sync?

Merkle-based sync enables:

- O(log n) change detection
- ~0.3s for unchanged codebases
- Only re-index modified files
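The mechanism can be sketched with a toy Merkle tree over file contents. A directory's hash is derived from its children's hashes, so identical subtrees can be skipped wholesale; all names here are illustrative, and a real implementation would also hash file names into the parent digest:

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_tree(node):
    """node is bytes (file contents) or a dict {name: node} (directory)."""
    if isinstance(node, bytes):
        return {"hash": h(node)}
    children = {name: build_tree(child) for name, child in sorted(node.items())}
    combined = "".join(c["hash"] for c in children.values()).encode()
    return {"hash": h(combined), "children": children}

def changed_paths(old, new, prefix=""):
    """Descend only into subtrees whose hashes differ."""
    if old["hash"] == new["hash"]:
        return []  # identical subtree: skip it entirely
    if "children" not in new:
        return [prefix or "."]
    out = []
    for name, child in new["children"].items():
        old_child = old.get("children", {}).get(name, {"hash": None})
        out.extend(changed_paths(old_child, child, f"{prefix}/{name}"))
    return out
```

When nothing changed, the root hashes match and `changed_paths` returns immediately, which is why an unchanged codebase syncs in fractions of a second; when one file changed, only the path from root to that leaf is traversed.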
## Performance Tiers

SHARC supports two performance tiers:

| Tier | Summary | Best for |
|---|---|---|
| Standard | Baseline tier with higher latency. | General usage and non-latency-critical workflows. |
| Performance | Lowest-latency tier. | Interactive workflows where speed is critical. |