# Directory Structure

```
├── .env.sample
├── .gemini
│   └── settings.json
├── .gitignore
├── .python-version
├── .specify
│   ├── memory
│   │   └── constitution.md
│   ├── scripts
│   │   └── bash
│   │       ├── check-implementation-prerequisites.sh
│   │       ├── check-task-prerequisites.sh
│   │       ├── common.sh
│   │       ├── create-new-feature.sh
│   │       ├── get-feature-paths.sh
│   │       ├── setup-plan.sh
│   │       └── update-agent-context.sh
│   └── templates
│       ├── agent-file-template.md
│       ├── plan-template.md
│       ├── spec-template.md
│       └── tasks-template.md
├── package.json
├── pyproject.toml
├── README.md
├── reddit-research-agent.md
├── reports
│   ├── ai-llm-weekly-trends-reddit-analysis-2025-01-20.md
│   ├── saas-solopreneur-reddit-communities.md
│   ├── top-50-active-AI-subreddits.md
│   ├── top-50-subreddits-saas-ai-builders.md
│   └── top-50-subreddits-saas-solopreneurs.md
├── server.json
├── specs
│   ├── 003-fastmcp-context-integration.md
│   ├── 003-implementation-summary.md
│   ├── 003-phase-1-context-integration.md
│   ├── 003-phase-2-progress-monitoring.md
│   ├── agent-reasoning-visibility.md
│   ├── agentic-discovery-architecture.md
│   ├── chroma-proxy-architecture.md
│   ├── deep-research-reddit-architecture.md
│   └── reddit-research-agent-spec.md
├── src
│   ├── __init__.py
│   ├── chroma_client.py
│   ├── config.py
│   ├── models.py
│   ├── resources.py
│   ├── server.py
│   └── tools
│       ├── __init__.py
│       ├── comments.py
│       ├── discover.py
│       ├── posts.py
│       └── search.py
├── tests
│   ├── test_context_integration.py
│   └── test_tools.py
└── uv.lock
```

# Files

--------------------------------------------------------------------------------
/specs/agentic-discovery-architecture.md:
--------------------------------------------------------------------------------

```markdown
# Agentic Discovery Architecture with OpenAI Agents SDK

## Overview
This document outlines the refactoring of the monolithic `discover.py` tool into a modular, agentic architecture using OpenAI's Python Agents SDK. Each agent has a single, well-defined responsibility and can hand off to other specialized agents as needed.

### Why Agentic Architecture?

The current monolithic `discover.py` file (400+ lines) combines multiple concerns:
- Query processing and analysis
- API interaction and error handling
- Scoring and ranking algorithms
- Result formatting and synthesis
- Batch operations management

This creates several problems:
1. **Testing Complexity**: Can't test scoring without API calls
2. **Limited Reusability**: Can't use validation logic elsewhere
3. **Performance Issues**: Sequential processing of batch requests
4. **Maintenance Burden**: Changes risk breaking unrelated functionality
5. **Scaling Challenges**: Adding features requires modifying core logic

The agentic approach solves these issues by decomposing functionality into specialized, autonomous agents that collaborate through well-defined interfaces.

## Architecture Principles

1. **Single Responsibility**: Each agent performs one specific task excellently
2. **Composability**: Agents can be combined in different ways for various workflows
3. **Testability**: Each agent can be tested in isolation
4. **Observability**: Full tracing of agent decision-making process
5. **Efficiency**: Smart routing and parallel execution where possible

## Directory Structure

```
reddit-research-mcp/src/
├── agents/
│   ├── __init__.py
│   ├── discovery_orchestrator.py
│   ├── query_analyzer.py
│   ├── subreddit_scorer.py
│   ├── search_executor.py
│   ├── batch_manager.py
│   ├── validator.py
│   └── synthesizer.py
├── models/
│   ├── __init__.py
│   ├── discovery_context.py
│   └── discovery_models.py
└── tools/
    └── discover_agent.py
```

## Agent Specifications

### 1. Discovery Orchestrator Agent
**File**: `agents/discovery_orchestrator.py`

**Purpose**: Routes discovery requests to the appropriate specialized agent based on query type and requirements.

**Why This Agent?**
The Discovery Orchestrator serves as the intelligent entry point that prevents inefficient processing. In the monolithic approach, every query goes through the same pipeline regardless of complexity. This agent enables:
- **Smart Routing**: Simple queries skip unnecessary analysis steps
- **Resource Optimization**: Uses appropriate agents based on query complexity
- **Error Isolation**: Failures in one path don't affect others
- **Scalability**: New discovery strategies can be added without modifying core logic

**Architectural Role**:
- **Entry Point**: First agent in every discovery workflow
- **Traffic Director**: Routes to specialized agents based on intent
- **Fallback Handler**: Manages errors and edge cases gracefully
- **Performance Optimizer**: Chooses the fastest path for each query type

**Problem Solved**:
The monolithic `discover.py` processes all queries identically, wasting resources on simple validations and lacking optimization for batch operations. The orchestrator eliminates this inefficiency.

**Key Interactions**:
- **Receives**: Raw discovery requests from the main entry point
- **Delegates To**: Query Analyzer (complex), Batch Manager (multiple), Validator (verification), Search Executor (simple)
- **Returns**: Final results from delegated agents

**Key Responsibilities**:
- Analyze incoming discovery requests
- Determine optimal discovery strategy
- Route to appropriate specialized agent
- Handle edge cases and errors gracefully

**Model**: `gpt-4o-mini` (lightweight routing decisions)

**Handoffs**:
- Query Analyzer (for complex queries)
- Batch Manager (for multiple queries)
- Validator (for direct validation)
- Search Executor (for simple searches)

**Implementation**:
```python
from agents import Agent
from agents.extensions.handoff_prompt import RECOMMENDED_PROMPT_PREFIX

from src.models.discovery_context import DiscoveryContext
# Specialized agents, each defined in its own module (specified below)
from src.agents.query_analyzer import query_analyzer
from src.agents.batch_manager import batch_manager
from src.agents.validator import validator
from src.agents.search_executor import search_executor

discovery_orchestrator = Agent[DiscoveryContext](
    name="Discovery Orchestrator",
    instructions=f"""{RECOMMENDED_PROMPT_PREFIX}
    You are a routing agent for Reddit discovery requests.

    Analyze the incoming request and determine the best path:
    - Complex queries needing analysis → Query Analyzer
    - Batch/multiple queries → Batch Manager
    - Direct subreddit validation → Validator
    - Simple searches → Search Executor

    Consider efficiency and accuracy when routing.
    """,
    model="gpt-4o-mini",
    handoffs=[query_analyzer, batch_manager, validator, search_executor]
)
```

### 2. Query Analyzer Agent
**File**: `agents/query_analyzer.py`

**Purpose**: Analyzes and enhances search queries for better results.

**Why This Agent?**
Reddit's search API is notoriously limited and literal. The Query Analyzer transforms vague or complex user queries into optimized search strategies. This agent provides:
- **Semantic Understanding**: Interprets user intent beyond literal keywords
- **Query Expansion**: Adds synonyms and related terms for comprehensive results
- **Search Strategy**: Determines the best approach (broad vs. specific search)
- **Intent Classification**: Distinguishes topic exploration from specific community search

**Architectural Role**:
- **Query Preprocessor**: Enhances queries before they hit the Reddit API
- **Intent Detector**: Classifies what the user is really looking for
- **Strategy Advisor**: Recommends search approaches to downstream agents
- **NLP Specialist**: Applies language understanding to improve results

**Problem Solved**:
The monolithic approach uses raw queries directly, leading to poor results when users use natural language or ambiguous terms. This agent bridges the gap between human expression and API requirements.

**Key Interactions**:
- **Receives From**: Discovery Orchestrator (complex queries)
- **Processes**: Raw user queries into structured search plans
- **Hands Off To**: Search Executor (with enhanced query and strategy)
- **Provides**: Keywords, expanded terms, and intent classification

**Key Responsibilities**:
- Extract keywords and intent
- Expand query with related terms
- Classify query type (topic, community, specific)
- Generate search strategies

**Tools**:
```python
# Shared imports assumed by the tool sketches throughout this spec;
# QueryExpansion and QueryIntent are assumed to live in discovery_models.
from typing import List
from agents import function_tool, RunContextWrapper

from src.models.discovery_context import DiscoveryContext
from src.models.discovery_models import QueryExpansion, QueryIntent

@function_tool
def extract_keywords(wrapper: RunContextWrapper[DiscoveryContext], text: str) -> List[str]:
    """Extract meaningful keywords from query text."""
    # Implementation from current discover.py
    ...

@function_tool
def expand_query(wrapper: RunContextWrapper[DiscoveryContext], query: str) -> QueryExpansion:
    """Expand query with synonyms and related terms."""
    # Generate variations and related terms
    ...

@function_tool
def classify_intent(wrapper: RunContextWrapper[DiscoveryContext], query: str) -> QueryIntent:
    """Classify the intent behind the query."""
    # Return: topic_search, community_search, validation, etc.
    ...
```

**Output Type**:
```python
class AnalyzedQuery(BaseModel):
    original: str
    keywords: List[str]
    expanded_terms: List[str]
    intent: QueryIntent
    suggested_strategy: str
    confidence: float
```

**Model**: `gpt-4o` (complex language understanding)

**Handoffs**: Search Executor (with enhanced query)
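
For concreteness, here is a minimal sketch of how this agent might be wired together with the SDK. The instruction text is illustrative, the tools are the stubs defined above, and the `search_executor` import assumes the module layout from the directory structure:

```python
from agents import Agent

from src.agents.search_executor import search_executor
from src.models.discovery_context import DiscoveryContext
from src.models.discovery_models import AnalyzedQuery  # assumed home of the output model

query_analyzer = Agent[DiscoveryContext](
    name="Query Analyzer",
    instructions=(
        "Analyze the incoming discovery query: extract keywords, expand the "
        "query with related terms, and classify its intent. Then hand off to "
        "the Search Executor with the enhanced query."
    ),
    model="gpt-4o",
    tools=[extract_keywords, expand_query, classify_intent],  # defined above
    output_type=AnalyzedQuery,
    handoffs=[search_executor],
)
```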

### 3. Subreddit Scorer Agent
**File**: `agents/subreddit_scorer.py`

**Purpose**: Scores and ranks subreddit relevance with detailed confidence metrics.

**Why This Agent?**
Reddit's search API returns results in arbitrary order with many false positives. The Subreddit Scorer applies sophisticated ranking algorithms to surface the most relevant communities. This agent provides:
- **Multi-Factor Scoring**: Combines name match, description relevance, and activity levels
- **False Positive Detection**: Identifies and penalizes misleading matches
- **Confidence Metrics**: Provides transparency about why results are ranked as they are
- **Activity Weighting**: Prioritizes active communities over dead ones

**Architectural Role**:
- **Quality Filter**: Ensures only relevant results reach the user
- **Ranking Engine**: Orders results by true relevance, not API defaults
- **Confidence Calculator**: Provides scoring transparency
- **Post-Processor**: Refines raw search results into useful recommendations

**Problem Solved**:
In the monolithic version, scoring logic is embedded throughout, making it hard to tune or test, and false positives (like "pythonball" for "python") pollute results. This agent centralizes and refines the scoring logic.

**Key Interactions**:
- **Receives From**: Search Executor (raw search results)
- **Processes**: Unranked subreddits into a scored, ranked list
- **Sends To**: Result Synthesizer (for final formatting)
- **Collaborates With**: Batch Manager (for scoring multiple search results)

**Key Responsibilities**:
- Calculate name match scores
- Evaluate description relevance
- Assess community activity
- Apply penalties for false positives
- Generate confidence scores

**Tools**:
```python
@function_tool
def calculate_name_match(wrapper: RunContextWrapper[DiscoveryContext],
                         subreddit_name: str, query: str) -> float:
    """Calculate how well subreddit name matches query."""
    # Implementation from current discover.py
    ...

@function_tool
def calculate_description_score(wrapper: RunContextWrapper[DiscoveryContext],
                                description: str, query: str) -> float:
    """Score based on query presence in description."""
    # Implementation from current discover.py
    ...

@function_tool
def calculate_activity_score(wrapper: RunContextWrapper[DiscoveryContext],
                             subscribers: int) -> float:
    """Score based on community size and activity."""
    # Implementation from current discover.py
    ...

@function_tool
def calculate_penalties(wrapper: RunContextWrapper[DiscoveryContext],
                        subreddit_name: str, query: str) -> float:
    """Apply penalties for likely false positives."""
    # Implementation from current discover.py
    ...
```

**Output Type**:
```python
class ScoredSubreddit(BaseModel):
    name: str
    confidence: float
    match_type: str
    score_breakdown: Dict[str, float]
    ranking: int
```

**Model**: `gpt-4o-mini` (mathematical calculations)

**Tool Use Behavior**: `stop_on_first_tool` (direct scoring results)
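
The spec lists the individual factor scores but not how they combine. One plausible blend, a sketch with illustrative weights that would need tuning against real relevance data:

```python
def combine_scores(name_match: float, description: float,
                   activity: float, penalties: float) -> float:
    """Blend per-factor scores into a single confidence value in [0, 1].

    The weights are placeholders for illustration, not tuned values.
    """
    raw = 0.45 * name_match + 0.30 * description + 0.25 * activity - penalties
    return max(0.0, min(1.0, raw))

# e.g. an exact name match with a moderately relevant, active community:
# combine_scores(1.0, 0.6, 0.7, 0.0) -> 0.805
```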

### 4. Search Executor Agent
**File**: `agents/search_executor.py`

**Purpose**: Executes Reddit API searches efficiently with error handling.

**Why This Agent?**
Direct API interaction requires careful error handling, rate-limit management, and caching. The Search Executor isolates all Reddit API complexity from the other agents. This agent provides:
- **API Abstraction**: Other agents don't need to know Reddit API details
- **Error Resilience**: Handles rate limits, timeouts, and API failures gracefully
- **Caching Layer**: Prevents redundant API calls for identical queries
- **Result Validation**: Ensures data integrity before passing results downstream

**Architectural Role**:
- **API Gateway**: Single point of contact with the Reddit API
- **Error Handler**: Manages all API-related failures and retries
- **Cache Manager**: Stores and retrieves recent search results
- **Data Validator**: Ensures results are complete and valid

**Problem Solved**:
The monolithic approach mixes API calls with business logic, making it hard to handle errors consistently or implement caching. This agent centralizes all API interaction concerns.

**Key Interactions**:
- **Receives From**: Query Analyzer (enhanced queries) or Orchestrator (simple queries)
- **Interacts With**: Reddit API via PRAW client
- **Sends To**: Subreddit Scorer (for ranking)
- **Caches**: Results in context for reuse by other agents

**Key Responsibilities**:
- Execute Reddit API search calls
- Handle API errors and rate limits
- Validate returned results
- Cache results for efficiency (see the cache-aware sketch below)

**Tools**:
```python
@function_tool
async def search_reddit(wrapper: RunContextWrapper[DiscoveryContext],
                        query: str, limit: int = 250) -> List[RawSubreddit]:
    """Execute Reddit search API call."""
    reddit = wrapper.context.reddit_client
    results = []
    for subreddit in reddit.subreddits.search(query, limit=limit):
        results.append(RawSubreddit.from_praw(subreddit))
    return results

@function_tool
def handle_api_error(wrapper: RunContextWrapper[DiscoveryContext],
                     error: Exception) -> ErrorStrategy:
    """Determine how to handle API errors."""
    # Retry logic, fallback strategies, etc.
    ...
```

**Output Type**:
```python
class SearchResults(BaseModel):
    query: str
    results: List[RawSubreddit]
    total_found: int
    api_calls: int
    cached: bool
    errors: List[str]
```

**Model**: `gpt-4o-mini` (simple execution)

**Handoffs**: Subreddit Scorer (for ranking results)
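
The caching responsibility described above isn't visible in the stub. A sketch of a cache-aware variant of `search_reddit`, assuming the simple keyed-dict cache and `cache_ttl` from the shared context defined later in this document:

```python
import time

@function_tool
async def search_reddit(wrapper: RunContextWrapper[DiscoveryContext],
                        query: str, limit: int = 250) -> List[RawSubreddit]:
    """Execute a Reddit search, reusing cached results while they are fresh."""
    ctx = wrapper.context
    key = f"search:{query}:{limit}"  # cache key scheme is an assumption
    hit = ctx.cache.get(key)
    if hit and time.time() - hit["at"] < ctx.discovery_config.cache_ttl:
        return hit["results"]  # served from cache, no API call

    results = [RawSubreddit.from_praw(sr)
               for sr in ctx.reddit_client.subreddits.search(query, limit=limit)]
    ctx.api_call_counter += 1
    ctx.cache[key] = {"at": time.time(), "results": results}
    return results
```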

### 5. Batch Discovery Manager Agent
**File**: `agents/batch_manager.py`

**Purpose**: Manages batch discovery operations for multiple queries.

**Why This Agent?**
Users often need to discover communities across multiple related topics. The Batch Manager orchestrates parallel searches efficiently. This agent provides:
- **Parallel Execution**: Runs multiple searches concurrently for speed
- **Deduplication**: Removes duplicate subreddits across different searches
- **API Optimization**: Minimizes total API calls through smart batching
- **Result Aggregation**: Combines multiple search results intelligently

**Architectural Role**:
- **Parallel Coordinator**: Manages multiple Search Executor instances
- **Resource Manager**: Optimizes API usage across batch operations
- **Result Aggregator**: Merges and deduplicates results from multiple searches
- **Performance Optimizer**: Ensures batch operations complete quickly

**Problem Solved**:
The monolithic approach processes batch queries sequentially, leading to slow performance. It also lacks sophisticated deduplication and aggregation logic for multiple searches.

**Key Interactions**:
- **Receives From**: Discovery Orchestrator (batch requests)
- **Spawns**: Multiple Search Executor agents in parallel
- **Coordinates**: Parallel execution and result collection
- **Sends To**: Result Synthesizer (aggregated results)

**Key Responsibilities**:
- Coordinate multiple search operations
- Optimize API calls through batching
- Aggregate results from multiple searches
- Manage parallel execution

**Tools**:
```python
@function_tool
async def coordinate_batch(wrapper: RunContextWrapper[DiscoveryContext],
                           queries: List[str]) -> BatchPlan:
    """Plan optimal batch execution strategy."""
    # Determine parallelization, caching opportunities
    ...

@function_tool
def merge_batch_results(wrapper: RunContextWrapper[DiscoveryContext],
                        results: List[SearchResults]) -> BatchResults:
    """Merge results from multiple searches."""
    # Deduplicate, aggregate, summarize
    ...
```

**Model**: `gpt-4o` (complex coordination)

**Handoffs**: Multiple Search Executor agents (in parallel)

**Implementation Note**: Uses dynamic handoff creation for parallel execution (one possible realization is sketched below)
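
The SDK does not fan out handoffs in parallel by itself, so one way to realize the coordination described above is a plain `asyncio` fan-out over Search Executor runs. A sketch, with names assumed from this spec:

```python
import asyncio
from typing import List

from agents import Runner

from src.agents.search_executor import search_executor
from src.models.discovery_context import DiscoveryContext
from src.models.discovery_models import SearchResults  # assumed home of this model

async def run_batch(queries: List[str], context: DiscoveryContext) -> List[SearchResults]:
    """Fan out one Search Executor run per query and gather the results."""
    runs = [
        Runner.run(
            starting_agent=search_executor,
            input=f"Search subreddits for: {q}",
            context=context,
        )
        for q in queries
    ]
    return [r.final_output for r in await asyncio.gather(*runs)]
```

Because PRAW itself is synchronous, true concurrency would also require pushing the underlying API calls onto worker threads (for example with `asyncio.to_thread`) or switching to Async PRAW.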

### 6. Subreddit Validator Agent
**File**: `agents/validator.py`

**Purpose**: Validates subreddit existence and accessibility.

**Why This Agent?**
Users often have specific subreddit names that need verification. The Validator provides quick, focused validation without the overhead of a full search. This agent provides:
- **Direct Validation**: Checks specific subreddit names efficiently
- **Access Verification**: Confirms subreddits are public and accessible
- **Alternative Suggestions**: Recommends similar communities if validation fails
- **Metadata Retrieval**: Gets detailed info about valid subreddits

**Architectural Role**:
- **Verification Specialist**: Focused solely on validation tasks
- **Fast Path**: Provides quick responses for known subreddit names
- **Fallback Provider**: Suggests alternatives when validation fails
- **Metadata Fetcher**: Retrieves comprehensive subreddit information

**Problem Solved**:
The monolithic approach treats validation as a special case of search, which is inefficient. Users waiting to verify "r/python" shouldn't trigger a full search pipeline.

**Key Interactions**:
- **Receives From**: Discovery Orchestrator (direct validation requests)
- **Validates**: Specific subreddit names via Reddit API
- **Returns**: Validation status with metadata or alternatives
- **May Trigger**: Search Executor (to find alternatives if validation fails)

**Key Responsibilities**:
- Check if subreddit exists
- Verify accessibility (not private/banned)
- Get detailed subreddit information
- Suggest alternatives if invalid

**Tools**:
```python
@function_tool
def validate_subreddit(wrapper: RunContextWrapper[DiscoveryContext],
                       subreddit_name: str) -> ValidationResult:
    """Validate if subreddit exists and is accessible."""
    # Implementation from current discover.py
    ...

@function_tool
def get_subreddit_info(wrapper: RunContextWrapper[DiscoveryContext],
                       subreddit_name: str) -> SubredditInfo:
    """Get detailed information about a subreddit."""
    # Fetch all metadata
    ...
```

**Output Type**:
```python
class ValidationResult(BaseModel):
    valid: bool
    name: str
    reason: Optional[str]
    info: Optional[SubredditInfo]
    suggestions: List[str]
```

**Model**: `gpt-4o-mini` (simple validation)
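
The existence check itself is small. A sketch of what `validate_subreddit` might do with PRAW, assuming the usual `prawcore` exception semantics (redirect for missing subreddits, 403 for private or quarantined ones):

```python
import prawcore

@function_tool
def validate_subreddit(wrapper: RunContextWrapper[DiscoveryContext],
                       subreddit_name: str) -> ValidationResult:
    """Check that a subreddit exists and is publicly accessible."""
    name = subreddit_name.removeprefix("r/")
    try:
        sub = wrapper.context.reddit_client.subreddit(name)
        sub.id  # PRAW objects are lazy; attribute access forces the fetch
        return ValidationResult(valid=True, name=sub.display_name,
                                reason=None, info=None, suggestions=[])
    except prawcore.exceptions.Redirect:
        return ValidationResult(valid=False, name=name,
                                reason="not_found", info=None, suggestions=[])
    except prawcore.exceptions.Forbidden:
        return ValidationResult(valid=False, name=name,
                                reason="private_or_banned", info=None, suggestions=[])
```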

### 7. Result Synthesizer Agent
**File**: `agents/synthesizer.py`

**Purpose**: Synthesizes and formats final discovery results.

**Why This Agent?**
Raw scored results need intelligent synthesis to be truly useful. The Result Synthesizer transforms data into actionable insights. This agent provides:
- **Intelligent Summarization**: Creates meaningful summaries from result patterns
- **Actionable Recommendations**: Suggests next steps based on results
- **Flexible Formatting**: Adapts output format to the use case
- **Insight Generation**: Identifies patterns and relationships in results

**Architectural Role**:
- **Final Processor**: Last agent before results return to the user
- **Insight Generator**: Transforms data into understanding
- **Format Adapter**: Ensures results match the expected output format
- **Recommendation Engine**: Provides actionable next steps

**Problem Solved**:
The monolithic approach mixes result formatting throughout the code, making it hard to maintain consistent output or add new insights. This agent centralizes all presentation logic.

**Key Interactions**:
- **Receives From**: Subreddit Scorer or Batch Manager (scored/aggregated results)
- **Synthesizes**: Raw data into formatted, insightful output
- **Generates**: Summaries, recommendations, and metadata
- **Returns**: Final formatted results to the orchestrator

**Key Responsibilities**:
- Format results for presentation
- Generate summaries and insights
- Create recommendations
- Add metadata and next actions

**Tools**:
```python
@function_tool
def format_results(wrapper: RunContextWrapper[DiscoveryContext],
                   results: List[ScoredSubreddit]) -> FormattedResults:
    """Format results for final output."""
    # Structure for easy consumption
    ...

@function_tool
def generate_recommendations(wrapper: RunContextWrapper[DiscoveryContext],
                             results: FormattedResults) -> List[str]:
    """Generate actionable recommendations."""
    # Next steps, additional searches, etc.
    ...
```

**Output Type**:
```python
class DiscoveryOutput(BaseModel):
    results: List[FormattedSubreddit]
    summary: DiscoverySummary
    recommendations: List[str]
    metadata: DiscoveryMetadata
```

**Model**: `gpt-4o` (synthesis and insights)

## Agent Collaboration Workflow

### Example: Complex Query Discovery

When a user searches for "machine learning communities for beginners":

1. **Discovery Orchestrator** receives the request, identifies its complexity, and routes to the Query Analyzer
2. **Query Analyzer** extracts keywords ["machine learning", "beginners", "ML", "learn"], expands the query, and identifies the intent as "topic_search"
3. **Search Executor** runs enhanced searches for each term variation
4. **Subreddit Scorer** ranks results, penalizing advanced communities and boosting beginner-friendly ones
5. **Result Synthesizer** formats the top results with recommendations for getting started

### Example: Batch Validation

When validating multiple subreddit names ["r/python", "r/datascience", "r/doesnotexist"]:

1. **Discovery Orchestrator** identifies a validation request and routes to the Batch Manager
2. **Batch Manager** spawns three parallel Validator agents
3. **Validators** check each subreddit simultaneously
4. **Result Synthesizer** aggregates the validation results and suggests alternatives for invalid entries

## Shared Models and Context

### Discovery Context
**File**: `models/discovery_context.py`

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

import praw

# Defined before DiscoveryContext so its default_factory can resolve
@dataclass
class QueryMetadata:
    original_query: str
    intent: str
    timestamp: float
    user_preferences: Dict[str, Any]

@dataclass
class DiscoveryConfig:
    include_nsfw: bool = False
    max_api_calls: int = 10
    cache_ttl: int = 300
    default_limit: int = 10

@dataclass
class DiscoveryContext:
    reddit_client: praw.Reddit
    query_metadata: Optional[QueryMetadata] = None
    discovery_config: DiscoveryConfig = field(default_factory=DiscoveryConfig)
    api_call_counter: int = 0
    cache: Dict[str, Any] = field(default_factory=dict)
```

### Discovery Models
**File**: `models/discovery_models.py`

```python
from pydantic import BaseModel
from typing import List, Dict, Optional, Literal

class QueryIntent(BaseModel):
    type: Literal["topic_search", "community_search", "validation", "batch"]
    confidence: float

class RawSubreddit(BaseModel):
    name: str
    title: str
    description: str
    subscribers: int
    over_18: bool
    created_utc: float
    url: str

    @classmethod
    def from_praw(cls, subreddit):
        """Create from PRAW subreddit object."""
        return cls(
            name=subreddit.display_name,
            title=subreddit.title,
            description=subreddit.public_description[:100],  # truncated to keep payloads small
            subscribers=subreddit.subscribers,
            over_18=subreddit.over18,
            created_utc=subreddit.created_utc,
            url=f"https://reddit.com/r/{subreddit.display_name}"
        )

class ConfidenceScore(BaseModel):
    overall: float
    name_match: float
    description_match: float
    activity_score: float
    penalties: float

class DiscoverySummary(BaseModel):
    total_found: int
    returned: int
    coverage: Literal["comprehensive", "good", "partial", "limited"]
    top_by_confidence: List[str]
    confidence_distribution: Dict[str, int]
```
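
A small usage sketch of how the shared context ties the agents together (credentials elided; any authenticated PRAW client works):

```python
import praw

from src.models.discovery_context import DiscoveryContext, DiscoveryConfig

reddit = praw.Reddit(client_id="...", client_secret="...",
                     user_agent="reddit-research-mcp")

context = DiscoveryContext(
    reddit_client=reddit,
    discovery_config=DiscoveryConfig(include_nsfw=False, default_limit=10),
)

# Every tool receives this same object via RunContextWrapper, so the cache
# and the API-call counter are shared across all agents in one discovery run.
assert context.api_call_counter == 0 and context.cache == {}
```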

## Main Entry Point

### Discover Agent Tool
**File**: `tools/discover_agent.py`

```python
from typing import List, Optional

import praw

from agents import Runner, RunConfig

from src.models.discovery_context import DiscoveryContext, DiscoveryConfig
from src.models.discovery_models import DiscoveryOutput  # defined with the synthesizer's models
from src.agents import discovery_orchestrator

async def discover_subreddits_agent(
    query: Optional[str] = None,
    queries: Optional[List[str]] = None,
    reddit: praw.Reddit = None,
    limit: int = 10,
    include_nsfw: bool = False
) -> DiscoveryOutput:
    """
    Agentic version of discover_subreddits using the OpenAI Agents SDK.

    Maintains backward compatibility with the existing interface.
    """
    # Initialize the shared context for the whole agent run
    context = DiscoveryContext(
        reddit_client=reddit,
        discovery_config=DiscoveryConfig(
            include_nsfw=include_nsfw,
            default_limit=limit
        )
    )

    # Prepare input
    if queries:
        input_text = f"Batch discovery for queries: {queries}"
    else:
        input_text = f"Discover subreddits for: {query}"

    # Run discovery through the orchestrator
    result = await Runner.run(
        starting_agent=discovery_orchestrator,
        input=input_text,
        context=context,
        max_turns=20,  # max_turns is an argument to Runner.run, not RunConfig
        run_config=RunConfig(
            workflow_name="Reddit Discovery",
            trace_metadata={"query": query or queries}
        )
    )

    return result.final_output
```

## Implementation Strategy

### Phase 1: Foundation (Week 1)
1. Set up project structure and dependencies
2. Create base models and context objects
3. Implement Search Executor and Validator agents
4. Basic integration tests

### Phase 2: Core Agents (Week 2)
1. Implement Query Analyzer with NLP tools
2. Create Subreddit Scorer with confidence metrics
3. Build Result Synthesizer
4. Add comprehensive testing

### Phase 3: Orchestration (Week 3)
1. Implement Discovery Orchestrator with routing logic
2. Create Batch Manager for parallel execution
3. Add handoff patterns and error handling
4. Integrate with the existing MCP server

### Phase 4: Optimization (Week 4)
1. Add caching layer
2. Optimize model selection per agent
3. Implement tracing and monitoring
4. Performance testing and tuning

## Benefits Over Current Implementation

1. **Modularity**: Each agent is independent and focused
2. **Scalability**: Easy to add new discovery strategies
3. **Observability**: Full tracing of the decision process
4. **Testability**: Each agent can be unit tested
5. **Flexibility**: Agents can be reused in different workflows
6. **Performance**: Parallel execution and smart caching
7. **Maintainability**: Clear separation of concerns

## Migration Path

1. **Parallel Development**: Build the new system alongside the existing one
2. **Feature Flag**: Toggle between the old and new implementations
3. **Gradual Rollout**: Test with a subset of queries first
4. **Backward Compatible**: Same interface as the current discover.py
5. **Monitoring**: Compare results between old and new

## Testing Strategy

### Unit Tests
- Each agent tested independently
- Mock Reddit client and context (see the sketch below)
- Test all tools and handoffs

### Integration Tests
- End-to-end discovery workflows
- Multiple query types
- Error scenarios

### Performance Tests
- API call optimization
- Caching effectiveness
- Parallel execution benefits
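
As a shape for the unit tier, a hedged pytest sketch that exercises the scorer without touching the network. It assumes the scorer keeps its math in plain helpers (the name `name_match_score` is illustrative) that the `@function_tool` wrappers delegate to:

```python
from unittest.mock import MagicMock

from src.models.discovery_context import DiscoveryContext
# Hypothetical plain helper behind the calculate_name_match tool
from src.agents.subreddit_scorer import name_match_score

def make_context() -> DiscoveryContext:
    # A stubbed PRAW client keeps the unit tier fully offline.
    return DiscoveryContext(reddit_client=MagicMock())

def test_exact_match_outranks_false_positive():
    assert name_match_score("python", "python") > name_match_score("pythonball", "python")

def test_context_starts_clean():
    ctx = make_context()
    assert ctx.cache == {} and ctx.api_call_counter == 0
```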

## Monitoring and Observability

1. **Tracing**: Full agent decision tree
2. **Metrics**: API calls, latency, cache hits
3. **Logging**: Structured logs per agent
4. **Debugging**: Replay agent conversations

## Future Enhancements

1. **Learning**: Agents improve from feedback
2. **Personalization**: User-specific discovery preferences
3. **Advanced NLP**: Better query understanding
4. **Community Graph**: Relationship mapping between subreddits
5. **Trend Detection**: Identify emerging communities

## Conclusion

This agentic architecture transforms the monolithic discover.py into a flexible, scalable system of specialized agents. Each agent excels at its specific task while the orchestrator ensures optimal routing and efficiency. The result is a more maintainable, testable, and powerful discovery system that can evolve with changing requirements.
```