This is page 2 of 3. Use http://codebase.md/king-of-the-grackles/reddit-mcp-poc?lines=true&page={x} to view the full context. # Directory Structure ``` ├── .env.sample ├── .gemini │ └── settings.json ├── .gitignore ├── .python-version ├── .specify │ ├── memory │ │ └── constitution.md │ ├── scripts │ │ └── bash │ │ ├── check-implementation-prerequisites.sh │ │ ├── check-task-prerequisites.sh │ │ ├── common.sh │ │ ├── create-new-feature.sh │ │ ├── get-feature-paths.sh │ │ ├── setup-plan.sh │ │ └── update-agent-context.sh │ └── templates │ ├── agent-file-template.md │ ├── plan-template.md │ ├── spec-template.md │ └── tasks-template.md ├── package.json ├── pyproject.toml ├── README.md ├── reddit-research-agent.md ├── reports │ ├── ai-llm-weekly-trends-reddit-analysis-2025-01-20.md │ ├── saas-solopreneur-reddit-communities.md │ ├── top-50-active-AI-subreddits.md │ ├── top-50-subreddits-saas-ai-builders.md │ └── top-50-subreddits-saas-solopreneurs.md ├── server.json ├── specs │ ├── 003-fastmcp-context-integration.md │ ├── 003-implementation-summary.md │ ├── 003-phase-1-context-integration.md │ ├── 003-phase-2-progress-monitoring.md │ ├── agent-reasoning-visibility.md │ ├── agentic-discovery-architecture.md │ ├── chroma-proxy-architecture.md │ ├── deep-research-reddit-architecture.md │ └── reddit-research-agent-spec.md ├── src │ ├── __init__.py │ ├── chroma_client.py │ ├── config.py │ ├── models.py │ ├── resources.py │ ├── server.py │ └── tools │ ├── __init__.py │ ├── comments.py │ ├── discover.py │ ├── posts.py │ └── search.py ├── tests │ ├── test_context_integration.py │ └── test_tools.py └── uv.lock ``` # Files -------------------------------------------------------------------------------- /reports/top-50-subreddits-saas-solopreneurs.md: -------------------------------------------------------------------------------- ```markdown 1 | # Top 50 Subreddits for SaaS Startup Founders & Solopreneurs 2 | 3 | *Research Date: 2025-09-20* 4 | *Generated using Reddit MCP Server with semantic vector search* 5 | 6 | ## Executive Summary 7 | 8 | This focused report identifies the top 50 Reddit communities specifically for **SaaS startup founders** and **solopreneurs**. These communities were selected based on: 9 | - Direct relevance to SaaS business models 10 | - Solo entrepreneurship focus 11 | - Bootstrapped/self-funded business approaches 12 | - Active engagement levels 13 | - Community quality and support culture 14 | 15 | ## Top 50 Subreddits - Ranked by Relevance 16 | 17 | ### 🎯 Tier 1: Must-Join Communities (Confidence > 0.8) 18 | *These are your highest-priority communities with direct ICP alignment* 19 | 20 | 1. **r/SaaS** - 374,943 subscribers | Confidence: 0.892 21 | - The primary SaaS community on Reddit 22 | - Topics: pricing, growth, tech stack, customer acquisition 23 | - https://reddit.com/r/SaaS 24 | 25 | 2. **r/indiehackers** - 105,674 subscribers | Confidence: 0.867 26 | - Solo founders and bootstrappers building profitable businesses 27 | - Strong focus on MRR milestones and transparency 28 | - https://reddit.com/r/indiehackers 29 | 30 | 3. **r/SoloFounders** - 2,113 subscribers | Confidence: 0.832 31 | - Dedicated community for solo entrepreneurs 32 | - Intimate setting for peer support and advice 33 | - https://reddit.com/r/SoloFounders 34 | 35 | ### 🚀 Tier 2: Core Communities (Confidence 0.7 - 0.8) 36 | 37 | 4. 
**r/startups** - 1,891,655 subscribers | Confidence: 0.729 38 | - Massive startup ecosystem community 39 | - Mix of bootstrapped and funded startups 40 | - https://reddit.com/r/startups 41 | 42 | 5. **r/SaaSy** - 3,150 subscribers | Confidence: 0.722 43 | - Focused SaaS discussions and case studies 44 | - https://reddit.com/r/SaaSy 45 | 46 | 6. **r/EntrepreneurRideAlong** - 604,396 subscribers | Confidence: 0.712 47 | - Document your entrepreneurial journey 48 | - Great for building in public 49 | - https://reddit.com/r/EntrepreneurRideAlong 50 | 51 | 7. **r/venturecapital** - 66,268 subscribers | Confidence: 0.721 52 | - Useful even for bootstrappers to understand funding landscape 53 | - https://reddit.com/r/venturecapital 54 | 55 | 8. **r/Entrepreneurs** - 77,330 subscribers | Confidence: 0.777 56 | - Active entrepreneur community with quality discussions 57 | - https://reddit.com/r/Entrepreneurs 58 | 59 | ### 💼 Tier 3: High-Value Communities (Confidence 0.6 - 0.7) 60 | 61 | 9. **r/Entrepreneur** - 4,871,109 subscribers | Confidence: 0.664 62 | - Largest entrepreneurship community 63 | - https://reddit.com/r/Entrepreneur 64 | 65 | 10. **r/EntrepreneurConnect** - 5,178 subscribers | Confidence: 0.691 66 | - Networking and collaboration focus 67 | - https://reddit.com/r/EntrepreneurConnect 68 | 69 | 11. **r/kickstarter** - 93,932 subscribers | Confidence: 0.658 70 | - Product launches and crowdfunding strategies 71 | - https://reddit.com/r/kickstarter 72 | 73 | 12. **r/small_business_ideas** - 23,034 subscribers | Confidence: 0.631 74 | - Idea validation and feedback 75 | - https://reddit.com/r/small_business_ideas 76 | 77 | 13. **r/Entrepreneurship** - 99,462 subscribers | Confidence: 0.619 78 | - Business strategy and growth discussions 79 | - https://reddit.com/r/Entrepreneurship 80 | 81 | ### 📊 Tier 4: Specialized Communities (Confidence 0.5 - 0.6) 82 | 83 | 14. **r/Business_Ideas** - 370,194 subscribers | Confidence: 0.521 84 | - Brainstorming and validating business concepts 85 | - https://reddit.com/r/Business_Ideas 86 | 87 | 15. **r/startup** - 225,696 subscribers | Confidence: 0.529 88 | - Startup ecosystem and resources 89 | - https://reddit.com/r/startup 90 | 91 | 16. **r/NoCodeSaaS** - 23,297 subscribers | Confidence: 0.329* 92 | - Building SaaS without coding 93 | - Perfect for non-technical founders 94 | - https://reddit.com/r/NoCodeSaaS 95 | 96 | 17. **r/Affiliatemarketing** - 239,731 subscribers | Confidence: 0.537 97 | - Revenue strategies for SaaS 98 | - https://reddit.com/r/Affiliatemarketing 99 | 100 | 18. **r/OnlineIncomeHustle** - 34,382 subscribers | Confidence: 0.517 101 | - Online business strategies 102 | - https://reddit.com/r/OnlineIncomeHustle 103 | 104 | 19. **r/SmallBusinessOwners** - 4,081 subscribers | Confidence: 0.501 105 | - Peer support for business owners 106 | - https://reddit.com/r/SmallBusinessOwners 107 | 108 | 20. **r/selfpublish** - 196,096 subscribers | Confidence: 0.483 109 | - Content creation and info products 110 | - https://reddit.com/r/selfpublish 111 | 112 | ### 🌍 Tier 5: Regional & Niche Communities 113 | 114 | 21. **r/indianstartups** - 76,422 subscribers | Confidence: 0.505 115 | - Indian startup ecosystem 116 | - https://reddit.com/r/indianstartups 117 | 118 | 22. **r/StartUpIndia** - 361,780 subscribers | Confidence: 0.432 119 | - Large Indian entrepreneur community 120 | - https://reddit.com/r/StartUpIndia 121 | 122 | 23. 
**r/IndianEntrepreneur** - 9,816 subscribers | Confidence: 0.446 123 | - Indian entrepreneur discussions 124 | - https://reddit.com/r/IndianEntrepreneur 125 | 126 | 24. **r/PhStartups** - 20,901 subscribers | Confidence: 0.359 127 | - Philippines startup community 128 | - https://reddit.com/r/PhStartups 129 | 130 | 25. **r/Startups_EU** - 2,894 subscribers | Confidence: 0.314 131 | - European startup ecosystem 132 | - https://reddit.com/r/Startups_EU 133 | 134 | ### 🛠️ Tier 6: Supporting Communities 135 | 136 | 26. **r/advancedentrepreneur** - 60,964 subscribers | Confidence: 0.464 137 | - For experienced entrepreneurs 138 | - https://reddit.com/r/advancedentrepreneur 139 | 140 | 27. **r/cofounderhunt** - 16,287 subscribers | Confidence: 0.456 141 | - Finding co-founders and team members 142 | - https://reddit.com/r/cofounderhunt 143 | 144 | 28. **r/sweatystartup** - 182,854 subscribers | Confidence: 0.432 145 | - Service businesses and local startups 146 | - https://reddit.com/r/sweatystartup 147 | 148 | 29. **r/ycombinator** - 139,403 subscribers | Confidence: 0.433 149 | - YC ecosystem and accelerator insights 150 | - https://reddit.com/r/ycombinator 151 | 152 | 30. **r/sidehustle** - 3,124,834 subscribers | Confidence: 0.486 153 | - Side projects that can become SaaS 154 | - https://reddit.com/r/sidehustle 155 | 156 | ### 💰 Tier 7: Business & Revenue Focus 157 | 158 | 31. **r/passive_income** - 851,987 subscribers | Confidence: 0.422 159 | - Building recurring revenue streams 160 | - https://reddit.com/r/passive_income 161 | 162 | 32. **r/SaaS_Email_Marketing** - 7,434 subscribers | Confidence: 0.465 163 | - Email marketing for SaaS 164 | - https://reddit.com/r/SaaS_Email_Marketing 165 | 166 | 33. **r/SocialMediaMarketing** - 197,241 subscribers | Confidence: 0.419 167 | - Marketing strategies for SaaS 168 | - https://reddit.com/r/SocialMediaMarketing 169 | 170 | 34. **r/equity_crowdfunding** - 3,112 subscribers | Confidence: 0.473 171 | - Alternative funding options 172 | - https://reddit.com/r/equity_crowdfunding 173 | 174 | 35. **r/AiForSmallBusiness** - 8,963 subscribers | Confidence: 0.378 175 | - AI tools for solopreneurs 176 | - https://reddit.com/r/AiForSmallBusiness 177 | 178 | ### 🎨 Tier 8: Creative & Indie Communities 179 | 180 | 36. **r/IndieGaming** - 412,025 subscribers | Confidence: 0.453 181 | - Indie game dev (similar mindset to SaaS) 182 | - https://reddit.com/r/IndieGaming 183 | 184 | 37. **r/IndieDev** - 295,248 subscribers | Confidence: 0.383 185 | - Independent development community 186 | - https://reddit.com/r/IndieDev 187 | 188 | 38. **r/PassionsToProfits** - 4,905 subscribers | Confidence: 0.468 189 | - Monetizing expertise 190 | - https://reddit.com/r/PassionsToProfits 191 | 192 | 39. **r/LawFirm** - 84,044 subscribers | Confidence: 0.437 193 | - Legal aspects of running a business 194 | - https://reddit.com/r/LawFirm 195 | 196 | 40. **r/Fiverr** - 64,568 subscribers | Confidence: 0.489 197 | - Freelancing and service offerings 198 | - https://reddit.com/r/Fiverr 199 | 200 | ### 🌐 Tier 9: Broader Business Communities 201 | 202 | 41. **r/smallbusiness** - 2,211,156 subscribers | Confidence: 0.345 203 | - General small business discussions 204 | - https://reddit.com/r/smallbusiness 205 | 206 | 42. **r/business** - 2,498,385 subscribers | Confidence: 0.457 207 | - Broad business topics 208 | - https://reddit.com/r/business 209 | 210 | 43. 
**r/smallbusinessUS** - 4,886 subscribers | Confidence: 0.464 211 | - US-focused small business 212 | - https://reddit.com/r/smallbusinessUS 213 | 214 | 44. **r/WholesaleRealestate** - 28,356 subscribers | Confidence: 0.447 215 | - Business model discussions 216 | - https://reddit.com/r/WholesaleRealestate 217 | 218 | 45. **r/selbststaendig** - 38,000 subscribers | Confidence: 0.364 219 | - German solopreneur community 220 | - https://reddit.com/r/selbststaendig 221 | 222 | ### 🔧 Tier 10: Tools & Resources 223 | 224 | 46. **r/YouTube_startups** - 127,440 subscribers | Confidence: 0.369 225 | - Content marketing for startups 226 | - https://reddit.com/r/YouTube_startups 227 | 228 | 47. **r/OnlineMarketing** - 3,744 subscribers | Confidence: 0.396 229 | - Digital marketing strategies 230 | - https://reddit.com/r/OnlineMarketing 231 | 232 | 48. **r/Businessideas** - 22,137 subscribers | Confidence: 0.389 233 | - Idea generation and validation 234 | - https://reddit.com/r/Businessideas 235 | 236 | 49. **r/BusinessVault** - 2,889 subscribers | Confidence: 0.348 237 | - Business resources and tools 238 | - https://reddit.com/r/BusinessVault 239 | 240 | 50. **r/simpleliving** - 1,447,715 subscribers | Confidence: 0.415 241 | - Lifestyle design for solopreneurs 242 | - https://reddit.com/r/simpleliving 243 | 244 | ## 🎯 Engagement Strategy for SaaS Founders & Solopreneurs 245 | 246 | ### Quick Start Guide 247 | 1. **Join Top 5 First:** 248 | - r/SaaS (primary community) 249 | - r/indiehackers (building in public) 250 | - r/SoloFounders (peer support) 251 | - r/startups (broad exposure) 252 | - r/EntrepreneurRideAlong (journey sharing) 253 | 254 | 2. **Weekly Engagement Plan:** 255 | - **Monday**: Share wins/milestones in r/EntrepreneurRideAlong 256 | - **Tuesday**: Ask for feedback in r/SaaS 257 | - **Wednesday**: Help others in r/indiehackers 258 | - **Thursday**: Network in r/SoloFounders 259 | - **Friday**: Share learnings in r/startups 260 | 261 | 3. **Content Types That Work:** 262 | - Case studies with real numbers (MRR, growth rates) 263 | - "How I built..." technical posts 264 | - Pricing strategy discussions 265 | - Tool stack reveals 266 | - Failure stories and lessons learned 267 | 268 | ### Community-Specific Tips 269 | 270 | **For r/SaaS:** 271 | - Share MRR milestones 272 | - Discuss pricing strategies 273 | - Ask about tech stack decisions 274 | - Share customer acquisition costs 275 | 276 | **For r/indiehackers:** 277 | - Be transparent about revenue 278 | - Document your journey 279 | - Share both wins and failures 280 | - Engage with other builders 281 | 282 | **For r/SoloFounders:** 283 | - Focus on work-life balance 284 | - Share productivity tips 285 | - Discuss delegation strategies 286 | - Mental health and burnout prevention 287 | 288 | ## 📊 Key Metrics to Track 289 | 290 | 1. **Engagement Quality**: Comments > Upvotes 291 | 2. **Connection Building**: DMs from relevant founders 292 | 3. **Traffic Generation**: Clicks to your product 293 | 4. **Brand Recognition**: Mentions in other threads 294 | 5. **Value Created**: Problems solved for others 295 | 296 | ## ⚠️ Common Mistakes to Avoid 297 | 298 | 1. **Over-promotion**: Follow 9:1 rule (9 value posts : 1 promotional) 299 | 2. **Generic content**: Tailor posts to each community's culture 300 | 3. **Ignoring rules**: Each subreddit has specific posting guidelines 301 | 4. **Not engaging**: Don't just post and leave 302 | 5. **Being inauthentic**: Genuine interactions build trust 303 | 304 | ## 🚀 Next Steps 305 | 306 | 1. 
**Week 1**: Join top 10 communities, observe culture 307 | 2. **Week 2**: Start engaging with comments 308 | 3. **Week 3**: Make first posts in top 3 communities 309 | 4. **Week 4**: Analyze what resonates, adjust strategy 310 | 5. **Month 2+**: Scale successful approaches 311 | 312 | --- 313 | 314 | *Note: This report focuses specifically on communities relevant to SaaS founders and solopreneurs. Confidence scores reflect semantic relevance to these specific ICPs. Community dynamics change, so regular monitoring is recommended.* 315 | 316 | *Strategy Tip: Focus on depth over breadth - better to be highly active in 5-10 communities than sporadically active in 50.* ``` -------------------------------------------------------------------------------- /specs/003-implementation-summary.md: -------------------------------------------------------------------------------- ```markdown 1 | # FastMCP Context API Implementation Summary 2 | 3 | **Status:** ✅ Complete 4 | **Date:** 2025-10-02 5 | **Phases Completed:** Phase 1 (Context Integration) + Phase 2 (Progress Monitoring) 6 | 7 | ## Overview 8 | 9 | This document summarizes the completed implementation of FastMCP's Context API integration into the Reddit MCP server. The implementation was completed in two phases and enables real-time progress reporting for long-running Reddit operations. 10 | 11 | ## Phase 1: Context Integration (Complete ✅) 12 | 13 | ### Goal 14 | Integrate FastMCP's `Context` parameter into all tool and operation functions to enable future context-aware features. 15 | 16 | ### Implementation Details 17 | 18 | **Scope:** All MCP tool functions and Reddit operation functions now accept `Context` as a parameter. 19 | 20 | #### Functions Updated 21 | - ✅ `discover_subreddits()` - Subreddit discovery via vector search 22 | - ✅ `search_in_subreddit()` - Search within specific subreddit 23 | - ✅ `fetch_subreddit_posts()` - Fetch posts from single subreddit 24 | - ✅ `fetch_multiple_subreddits()` - Batch fetch from multiple subreddits 25 | - ✅ `fetch_submission_with_comments()` - Fetch post with comment tree 26 | - ✅ `validate_subreddit()` - Validate subreddit exists in index 27 | - ✅ `_search_vector_db()` - Internal vector search helper 28 | - ✅ `parse_comment_tree()` - Internal comment parsing helper 29 | 30 | #### MCP Layer Functions 31 | - ✅ `discover_operations()` - Layer 1: Discovery 32 | - ✅ `get_operation_schema()` - Layer 2: Schema 33 | - ✅ `execute_operation()` - Layer 3: Execution 34 | 35 | ### Test Coverage 36 | - **8 integration tests** verifying context parameter acceptance 37 | - All tests verify functions accept `Context` without errors 38 | - Context parameter can be positioned anywhere in function signature 39 | 40 | ### Files Modified (Phase 1) 41 | 1. `src/tools/discover.py` - Added `ctx: Context = None` to all functions 42 | 2. `src/tools/search.py` - Added context parameter 43 | 3. `src/tools/posts.py` - Added context parameter 44 | 4. `src/tools/comments.py` - Added context parameter and forwarding 45 | 5. `src/server.py` - Updated MCP tools to accept and forward context 46 | 6. `tests/test_context_integration.py` - Created comprehensive test suite 47 | 48 | --- 49 | 50 | ## Phase 2: Progress Monitoring (Complete ✅) 51 | 52 | ### Goal 53 | Add real-time progress reporting to long-running Reddit operations using `ctx.report_progress()`. 54 | 55 | ### Implementation Details 56 | 57 | **Scope:** Three primary long-running operations now emit progress events. 
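For orientation, here is roughly what the consuming side of these events looks like. The sketch below is illustrative rather than part of the implementation: it assumes FastMCP's client API (a `Client` with a `progress_handler` passed to `call_tool`), a local stdio launch of `src/server.py`, and the `execute_operation` tool this server exposes.

```python
import asyncio
from fastmcp import Client


async def show_progress(progress: float, total: float | None, message: str | None) -> None:
    # Render a percentage when total is known; fall back to the raw counter otherwise.
    if total:
        print(f"[{progress / total * 100:3.0f}%] {message or ''}")
    else:
        print(f"[{progress}] {message or ''}")


async def main() -> None:
    # Path-based client: FastMCP runs the server as a stdio subprocess.
    async with Client("src/server.py") as client:
        result = await client.call_tool(
            "execute_operation",
            {
                "operation_id": "discover_subreddits",
                "parameters": {"query": "home automation", "limit": 20},
            },
            progress_handler=show_progress,  # receives the events described below
        )
        print(result)


asyncio.run(main())
```

The three operations below produce these events on the server side; a handler like the one above only renders them.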
58 | 59 | #### Operation 1: `discover_subreddits` - Vector Search Progress 60 | 61 | **File:** `src/tools/discover.py` 62 | 63 | **Progress Events:** 64 | - Reports progress for each subreddit analyzed during vector search 65 | - **Message Format:** `"Analyzing r/{subreddit_name}"` 66 | - **Frequency:** 10-100 events depending on `limit` parameter 67 | - **Progress Values:** `progress=i+1, total=total_results` 68 | 69 | **Implementation:** 70 | ```python 71 | async def _search_vector_db(...): 72 | total_results = len(results['metadatas'][0]) 73 | for i, (metadata, distance) in enumerate(...): 74 | if ctx: 75 | await ctx.report_progress( 76 | progress=i + 1, 77 | total=total_results, 78 | message=f"Analyzing r/{metadata.get('name', 'unknown')}" 79 | ) 80 | ``` 81 | 82 | #### Operation 2: `fetch_multiple_subreddits` - Batch Fetch Progress 83 | 84 | **File:** `src/tools/posts.py` 85 | 86 | **Progress Events:** 87 | - Reports progress when encountering each new subreddit 88 | - **Message Format:** `"Fetching r/{subreddit_name}"` 89 | - **Frequency:** 1-10 events (one per unique subreddit) 90 | - **Progress Values:** `progress=len(processed), total=len(subreddit_names)` 91 | 92 | **Implementation:** 93 | ```python 94 | async def fetch_multiple_subreddits(...): 95 | processed_subreddits = set() 96 | for submission in submissions: 97 | subreddit_name = submission.subreddit.display_name 98 | if subreddit_name not in processed_subreddits: 99 | processed_subreddits.add(subreddit_name) 100 | if ctx: 101 | await ctx.report_progress( 102 | progress=len(processed_subreddits), 103 | total=len(clean_names), 104 | message=f"Fetching r/{subreddit_name}" 105 | ) 106 | ``` 107 | 108 | #### Operation 3: `fetch_submission_with_comments` - Comment Tree Progress 109 | 110 | **File:** `src/tools/comments.py` 111 | 112 | **Progress Events:** 113 | - Reports progress during comment loading 114 | - Final completion message when done 115 | - **Message Format:** 116 | - During: `"Loading comments ({count}/{limit})"` 117 | - Complete: `"Completed: {count} comments loaded"` 118 | - **Frequency:** 5-100+ events depending on `comment_limit` 119 | - **Progress Values:** `progress=comment_count, total=comment_limit` 120 | 121 | **Implementation:** 122 | ```python 123 | async def fetch_submission_with_comments(...): 124 | for top_level_comment in submission.comments: 125 | if ctx: 126 | await ctx.report_progress( 127 | progress=comment_count, 128 | total=comment_limit, 129 | message=f"Loading comments ({comment_count}/{comment_limit})" 130 | ) 131 | # ... process comment 132 | 133 | # Final completion 134 | if ctx: 135 | await ctx.report_progress( 136 | progress=comment_count, 137 | total=comment_limit, 138 | message=f"Completed: {comment_count} comments loaded" 139 | ) 140 | ``` 141 | 142 | ### Async/Await Changes 143 | 144 | All three operations are now **async functions**: 145 | - ✅ `discover_subreddits()` → `async def discover_subreddits()` 146 | - ✅ `fetch_multiple_subreddits()` → `async def fetch_multiple_subreddits()` 147 | - ✅ `fetch_submission_with_comments()` → `async def fetch_submission_with_comments()` 148 | - ✅ `execute_operation()` → `async def execute_operation()` (conditionally awaits async operations) 149 | 150 | ### Test Coverage 151 | 152 | **New Test Classes (Phase 2):** 153 | 1. `TestDiscoverSubredditsProgress` - Verifies progress during vector search 154 | 2. `TestFetchMultipleProgress` - Verifies progress per subreddit 155 | 3. 
`TestFetchCommentsProgress` - Verifies progress during comment loading 156 | 157 | **Test Assertions:** 158 | - ✅ Progress called minimum expected times (based on data) 159 | - ✅ Progress includes `progress` and `total` parameters 160 | - ✅ AsyncMock properly configured for async progress calls 161 | 162 | **Total Test Results:** 18 tests, all passing ✅ 163 | 164 | ### Files Modified (Phase 2) 165 | 1. `src/tools/discover.py` - Made async, added progress reporting 166 | 2. `src/tools/posts.py` - Made async, added progress reporting 167 | 3. `src/tools/comments.py` - Made async, added progress reporting 168 | 4. `src/tools/search.py` - No changes (operation too fast for progress) 169 | 5. `src/server.py` - Made `execute_operation()` async with conditional await 170 | 6. `tests/test_context_integration.py` - Added 3 progress test classes 171 | 7. `tests/test_tools.py` - Updated 3 tests to handle async functions 172 | 8. `pyproject.toml` - Added pytest asyncio configuration 173 | 174 | --- 175 | 176 | ## Current MCP Server Capabilities 177 | 178 | ### Context API Support 179 | 180 | **All operations support:** 181 | - ✅ Context parameter injection via FastMCP 182 | - ✅ Progress reporting during long operations 183 | - ✅ Future-ready for logging, sampling, and other context features 184 | 185 | ### Progress Reporting Patterns 186 | 187 | **For Frontend/Client Implementation:** 188 | 189 | 1. **Vector Search (discover_subreddits)** 190 | - Progress updates: Every result analyzed 191 | - Typical range: 10-100 progress events 192 | - Pattern: Sequential 1→2→3→...→total 193 | - Message: Subreddit name being analyzed 194 | 195 | 2. **Multi-Subreddit Fetch (fetch_multiple)** 196 | - Progress updates: Each new subreddit encountered 197 | - Typical range: 1-10 progress events 198 | - Pattern: Incremental as new subreddits found 199 | - Message: Subreddit name being fetched 200 | 201 | 3. **Comment Tree Loading (fetch_comments)** 202 | - Progress updates: Each comment + final completion 203 | - Typical range: 5-100+ progress events 204 | - Pattern: Sequential with completion message 205 | - Message: Comment count progress 206 | 207 | ### FastMCP Progress API Specification 208 | 209 | **Progress Call Signature:** 210 | ```python 211 | await ctx.report_progress( 212 | progress: float, # Current progress value 213 | total: float, # Total expected (enables percentage) 214 | message: str # Optional descriptive message 215 | ) 216 | ``` 217 | 218 | **Client Requirements:** 219 | - Clients must send `progressToken` in initial request to receive updates 220 | - If no token provided, progress calls have no effect (won't error) 221 | - Progress events sent as MCP notifications during operation execution 222 | 223 | --- 224 | 225 | ## Integration Notes for Frontend Agent 226 | 227 | ### Expected Behavior 228 | 229 | 1. **Progress Events are Optional** 230 | - Operations work without progress tracking 231 | - Progress enhances UX but isn't required for functionality 232 | 233 | 2. **Async Operation Handling** 234 | - All three operations are async and must be awaited 235 | - `execute_operation()` properly handles both sync and async operations 236 | 237 | 3. **Message Patterns** 238 | - Messages are descriptive and user-friendly 239 | - Include specific subreddit names and counts 240 | - Can be displayed directly to users 241 | 242 | ### Testing Progress Locally 243 | 244 | **To test progress reporting:** 245 | 1. Use MCP Inspector or Claude Desktop (supports progress tokens) 246 | 2. 
Call operations with realistic data sizes: 247 | - `discover_subreddits`: limit=20+ for visible progress 248 | - `fetch_multiple`: 3+ subreddits for multiple events 249 | - `fetch_comments`: comment_limit=50+ for visible progress 250 | 251 | ### Known Limitations 252 | 253 | 1. **Single-operation Progress Only** 254 | - No multi-stage progress across multiple operations 255 | - Each operation reports independently 256 | 257 | 2. **No Progress for Fast Operations** 258 | - `search_in_subreddit`: Too fast, no progress 259 | - `fetch_subreddit_posts`: Single subreddit, too fast 260 | 261 | 3. **Progress Granularity** 262 | - Vector search: Per-result (can be 100+ events) 263 | - Multi-fetch: Per-subreddit (typically 3-10 events) 264 | - Comments: Per-comment (can be 100+ events) 265 | 266 | --- 267 | 268 | ## Future Enhancements (Not Yet Implemented) 269 | 270 | **Phase 3: Structured Logging** (Planned) 271 | - Add `ctx.info()`, `ctx.debug()`, `ctx.warning()` calls 272 | - Log operation start/end, errors, performance metrics 273 | 274 | **Phase 4: Enhanced Error Handling** (Planned) 275 | - Better error context via `ctx.error()` 276 | - Structured error responses with recovery suggestions 277 | 278 | **Phase 5: LLM Sampling** (Planned) 279 | - Use `ctx.sample()` for AI-enhanced subreddit suggestions 280 | - Intelligent query refinement based on results 281 | 282 | --- 283 | 284 | ## API Surface Summary 285 | 286 | ### Async Operations (Require await) 287 | ```python 288 | # These are now async 289 | await discover_subreddits(query="...", ctx=ctx) 290 | await fetch_multiple_subreddits(subreddit_names=[...], reddit=client, ctx=ctx) 291 | await fetch_submission_with_comments(reddit=client, submission_id="...", ctx=ctx) 292 | await execute_operation(operation_id="...", parameters={...}, ctx=ctx) 293 | ``` 294 | 295 | ### Sync Operations (No await) 296 | ```python 297 | # These remain synchronous 298 | search_in_subreddit(subreddit_name="...", query="...", reddit=client, ctx=ctx) 299 | fetch_subreddit_posts(subreddit_name="...", reddit=client, ctx=ctx) 300 | ``` 301 | 302 | ### Progress Event Format 303 | 304 | **Client receives progress notifications:** 305 | ```json 306 | { 307 | "progress": 15, 308 | "total": 50, 309 | "message": "Analyzing r/Python" 310 | } 311 | ``` 312 | 313 | **Percentage calculation:** 314 | ```javascript 315 | const percentage = (progress / total) * 100; // 30% in example 316 | ``` 317 | 318 | --- 319 | 320 | ## Validation & Testing 321 | 322 | ### Test Suite Results 323 | - ✅ **18 total tests** (all passing) 324 | - ✅ **11 context integration tests** (8 existing + 3 new progress) 325 | - ✅ **7 tool tests** (updated for async) 326 | - ✅ No breaking changes to existing API 327 | - ✅ No performance degradation 328 | 329 | ### Manual Testing Checklist 330 | - ✅ Vector search reports progress for each result 331 | - ✅ Multi-subreddit fetch reports per subreddit 332 | - ✅ Comment loading reports progress + completion 333 | - ✅ Progress messages are descriptive 334 | - ✅ Operations work without context (graceful degradation) 335 | 336 | --- 337 | 338 | ## References 339 | 340 | - [FastMCP Context API Docs](../ai-docs/fastmcp/docs/servers/context.mdx) 341 | - [FastMCP Progress Reporting Docs](../ai-docs/fastmcp/docs/servers/progress.mdx) 342 | - [Phase 1 Spec](./003-phase-1-context-integration.md) 343 | - [Phase 2 Spec](./003-phase-2-progress-monitoring.md) 344 | - [Master Integration Spec](./003-fastmcp-context-integration.md) 345 | ``` 
-------------------------------------------------------------------------------- /specs/003-fastmcp-context-integration.md: -------------------------------------------------------------------------------- ```markdown 1 | # FastMCP Context Integration - Progress & Logging 2 | 3 | **Status:** Draft 4 | **Created:** 2025-10-02 5 | **Owner:** Engineering Team 6 | 7 | ## Executive Summary 8 | 9 | This specification outlines the integration of FastMCP's Context API to add progress monitoring, structured logging, and enhanced error context to the Reddit MCP server. These improvements will provide real-time visibility into server operations for debugging and user feedback. 10 | 11 | ## Background 12 | 13 | The Reddit MCP server currently lacks visibility into long-running operations. Users cannot see progress during multi-step tasks like discovering subreddits or fetching posts from multiple communities. Server-side logging and error context are not surfaced to clients, making debugging difficult. 14 | 15 | FastMCP's Context API provides built-in support for: 16 | - **Progress reporting**: `ctx.report_progress(current, total, message)` 17 | - **Structured logging**: `ctx.info()`, `ctx.warning()`, `ctx.error()` 18 | - **Error context**: Rich error information with operation details 19 | 20 | ## Goals 21 | 22 | 1. **Progress Monitoring**: Report real-time progress during multi-step operations 23 | 2. **Structured Logging**: Surface server logs to clients at appropriate severity levels 24 | 3. **Enhanced Errors**: Provide detailed error context including operation name, type, and recovery suggestions 25 | 4. **Developer Experience**: Maintain clean, testable code with minimal complexity 26 | 27 | ## Non-Goals 28 | 29 | - Frontend client implementation (separate project) 30 | - UI component development (separate project) 31 | - Metrics collection and export features 32 | - Resource access tracking 33 | - Sampling request monitoring 34 | 35 | ## Technical Design 36 | 37 | ### Phase 1: Context Integration (Days 1-2) 38 | 39 | **Objective**: Enable all tool functions to receive FastMCP Context 40 | 41 | #### Implementation Steps 42 | 43 | 1. **Update Tool Signatures** 44 | - Add required `Context` parameter to all functions in `src/tools/` 45 | - Pattern: `def tool_name(param: str, ctx: Context) -> dict:` 46 | - FastMCP automatically injects context when tools are called with `@mcp.tool` decorator 47 | 48 | 2. 
**Update execute_operation()** 49 | - Ensure context flows through to tool functions 50 | - No changes needed - FastMCP handles injection automatically 51 | 52 | #### Files to Modify 53 | - `src/tools/discover.py` 54 | - `src/tools/posts.py` 55 | - `src/tools/comments.py` 56 | - `src/tools/search.py` 57 | - `src/server.py` 58 | 59 | #### Code Example 60 | 61 | **Before:** 62 | ```python 63 | def discover_subreddits(query: str, limit: int = 10) -> dict: 64 | results = search_vector_db(query, limit) 65 | return {"subreddits": results} 66 | ``` 67 | 68 | **After:** 69 | ```python 70 | def discover_subreddits( 71 | query: str, 72 | limit: int = 10, 73 | ctx: Context 74 | ) -> dict: 75 | results = search_vector_db(query, limit) 76 | return {"subreddits": results} 77 | ``` 78 | 79 | ### Phase 2: Progress Monitoring (Days 3-4) 80 | 81 | **Objective**: Report progress during long-running operations 82 | 83 | #### Progress Events 84 | 85 | **discover_subreddits** - Vector search progress: 86 | ```python 87 | for i, result in enumerate(search_results): 88 | ctx.report_progress( 89 | progress=i + 1, 90 | total=limit, 91 | message=f"Analyzing r/{result.name}" 92 | ) 93 | ``` 94 | 95 | **fetch_multiple_subreddits** - Batch fetch progress: 96 | ```python 97 | for i, subreddit in enumerate(subreddit_names): 98 | ctx.report_progress( 99 | progress=i + 1, 100 | total=len(subreddit_names), 101 | message=f"Fetching r/{subreddit}" 102 | ) 103 | # Fetch posts... 104 | ``` 105 | 106 | **fetch_submission_with_comments** - Comment loading progress: 107 | ```python 108 | ctx.report_progress( 109 | progress=len(comments), 110 | total=comment_limit, 111 | message=f"Loading comments ({len(comments)}/{comment_limit})" 112 | ) 113 | ``` 114 | 115 | #### Files to Modify 116 | - `src/tools/discover.py` - Add progress during vector search iteration 117 | - `src/tools/posts.py` - Add progress per subreddit in batch operations 118 | - `src/tools/comments.py` - Add progress during comment tree traversal 119 | 120 | ### Phase 3: Structured Logging (Days 5-6) 121 | 122 | **Objective**: Surface server-side information to clients via logs 123 | 124 | #### Logging Events by Operation 125 | 126 | **Discovery Operations** (`src/tools/discover.py`): 127 | ```python 128 | ctx.info(f"Starting discovery for topic: {query}") 129 | ctx.info(f"Found {len(results)} communities (avg confidence: {avg_conf:.2f})") 130 | 131 | if avg_conf < 0.5: 132 | ctx.warning(f"Low confidence results (<0.5) for query: {query}") 133 | ``` 134 | 135 | **Fetch Operations** (`src/tools/posts.py`): 136 | ```python 137 | ctx.info(f"Fetching {limit} posts from r/{subreddit_name}") 138 | ctx.info(f"Successfully fetched {len(posts)} posts from r/{subreddit_name}") 139 | 140 | # Rate limit warnings 141 | if remaining_requests < 10: 142 | ctx.warning(f"Rate limit approaching: {remaining_requests}/60 requests remaining") 143 | 144 | # Error logging 145 | ctx.error(f"Failed to fetch r/{subreddit_name}: {str(e)}", extra={ 146 | "subreddit": subreddit_name, 147 | "error_type": type(e).__name__ 148 | }) 149 | ``` 150 | 151 | **Search Operations** (`src/tools/search.py`): 152 | ```python 153 | ctx.info(f"Searching r/{subreddit_name} for: {query}") 154 | ctx.debug(f"Search parameters: sort={sort}, time_filter={time_filter}") 155 | ``` 156 | 157 | **Comment Operations** (`src/tools/comments.py`): 158 | ```python 159 | ctx.info(f"Fetching comments for submission: {submission_id}") 160 | ctx.info(f"Loaded {len(comments)} comments (sort: {comment_sort})") 161 | ``` 162 | 163 | 
#### Log Levels 164 | 165 | - **DEBUG**: Internal operation details, parameter values 166 | - **INFO**: Operation start/completion, success metrics 167 | - **WARNING**: Rate limits, low confidence scores, degraded functionality 168 | - **ERROR**: Operation failures, API errors, exceptions 169 | 170 | #### Files to Modify 171 | - `src/tools/discover.py` - Confidence scores, discovery metrics 172 | - `src/tools/posts.py` - Fetch success/failure, rate limit warnings 173 | - `src/tools/comments.py` - Comment analysis metrics 174 | - `src/tools/search.py` - Search operation logging 175 | 176 | ### Phase 4: Enhanced Error Handling (Days 7-8) 177 | 178 | **Objective**: Provide detailed error context for debugging and recovery 179 | 180 | #### Error Context Pattern 181 | 182 | **Current Implementation:** 183 | ```python 184 | except Exception as e: 185 | return { 186 | "success": False, 187 | "error": str(e), 188 | "recovery": suggest_recovery(operation_id, e) 189 | } 190 | ``` 191 | 192 | **Enhanced Implementation:** 193 | ```python 194 | except Exception as e: 195 | error_type = type(e).__name__ 196 | 197 | # Log error with context 198 | ctx.error( 199 | f"Operation failed: {operation_id}", 200 | extra={ 201 | "operation": operation_id, 202 | "error_type": error_type, 203 | "parameters": parameters, 204 | "timestamp": datetime.now().isoformat() 205 | } 206 | ) 207 | 208 | return { 209 | "success": False, 210 | "error": str(e), 211 | "error_type": error_type, 212 | "operation": operation_id, 213 | "parameters": parameters, 214 | "recovery": suggest_recovery(operation_id, e), 215 | "timestamp": datetime.now().isoformat() 216 | } 217 | ``` 218 | 219 | #### Error Categories & Recovery Suggestions 220 | 221 | | Error Type | Recovery Suggestion | 222 | |------------|-------------------| 223 | | 404 / Not Found | "Verify subreddit name or use discover_subreddits" | 224 | | 429 / Rate Limited | "Reduce limit parameter or wait 30s before retrying" | 225 | | 403 / Private | "Subreddit is private - try other communities" | 226 | | Validation Error | "Check parameters match schema from get_operation_schema" | 227 | | Network Error | "Check internet connection and retry" | 228 | 229 | #### Files to Modify 230 | - `src/server.py` - Enhanced `execute_operation()` error handling 231 | - `src/tools/*.py` - Operation-specific error logging 232 | 233 | ### Phase 5: Testing & Validation (Days 9-10) 234 | 235 | **Objective**: Ensure all instrumentation works correctly 236 | 237 | #### Test Coverage 238 | 239 | **Context Integration Tests** (`tests/test_context_integration.py`): 240 | ```python 241 | async def test_context_injected(): 242 | """Verify context is properly injected into tools""" 243 | 244 | async def test_progress_events_emitted(): 245 | """Verify progress events during multi-step operations""" 246 | 247 | async def test_log_messages_captured(): 248 | """Verify logs at appropriate severity levels""" 249 | 250 | async def test_error_context_included(): 251 | """Verify error responses include operation details""" 252 | ``` 253 | 254 | **Updated Tool Tests** (`tests/test_tools.py`): 255 | - Verify tools receive and use context properly 256 | - Check progress reporting frequency (≥5 events per operation) 257 | - Validate log message content and levels 258 | - Ensure error context is complete 259 | 260 | #### Files to Create/Modify 261 | - Create: `tests/test_context_integration.py` 262 | - Modify: `tests/test_tools.py` 263 | 264 | ## Implementation Details 265 | 266 | ### Context Parameter Pattern 267 | 268 
| FastMCP automatically injects Context when tools are decorated with `@mcp.tool`: 269 | 270 | ```python 271 | @mcp.tool 272 | def my_tool(param: str, ctx: Context) -> dict: 273 | # Context is automatically injected 274 | ctx.info("Tool started") 275 | ctx.report_progress(1, 10, "Processing") 276 | return {"result": "data"} 277 | ``` 278 | 279 | For functions called internally (not decorated), Context must be passed explicitly: 280 | 281 | ```python 282 | def internal_function(param: str, ctx: Context) -> dict: 283 | ctx.info("Internal operation") 284 | return {"result": "data"} 285 | ``` 286 | 287 | ### Progress Reporting Best Practices 288 | 289 | 1. **Report at regular intervals**: Every iteration in loops 290 | 2. **Provide descriptive messages**: "Fetching r/Python" not "Step 1" 291 | 3. **Include total when known**: `ctx.report_progress(5, 10, msg)` 292 | 4. **Use meaningful units**: Report actual progress (items processed) not arbitrary percentages 293 | 294 | ### Logging Best Practices 295 | 296 | 1. **Use appropriate levels**: INFO for normal ops, WARNING for issues, ERROR for failures 297 | 2. **Include context in extra**: `ctx.error(msg, extra={"operation": "name"})` 298 | 3. **Structured messages**: Consistent format for parsing 299 | 4. **Avoid spam**: Log meaningful events, not every line 300 | 301 | ### Error Handling Best Practices 302 | 303 | 1. **Specific exception types**: Catch specific errors when possible 304 | 2. **Include operation context**: Always log which operation failed 305 | 3. **Actionable recovery**: Provide specific steps to resolve 306 | 4. **Preserve stack traces**: Log full error details in extra 307 | 308 | ## Success Criteria 309 | 310 | ### Functional Requirements 311 | - ✅ All tool functions accept required Context parameter 312 | - ✅ Progress events emitted during multi-step operations (≥5 per operation) 313 | - ✅ Server logs at appropriate severity levels (DEBUG/INFO/WARNING/ERROR) 314 | - ✅ Error responses include operation name, type, and recovery suggestions 315 | - ✅ MCP client compatibility maintained (Claude, ChatGPT, etc.) 
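To make the progress-event requirement above concrete, one possible verification sketch is shown here. It is illustrative only: the mock and `monkeypatch` fixtures mirror the style planned for `tests/test_context_integration.py`, the fake vector-search payload is invented for the example, and the `await`/`AsyncMock` details assume the async form these operations take once progress reporting is in place.

```python
import pytest
from unittest.mock import AsyncMock, Mock

from fastmcp import Context

from src.tools.discover import discover_subreddits


@pytest.mark.asyncio
async def test_progress_events_emitted(monkeypatch):
    # Context mock whose report_progress awaits can be counted afterwards.
    ctx = Mock(spec=Context)
    ctx.report_progress = AsyncMock()

    # Five fake vector-search hits should yield at least five progress events.
    fake_results = {
        "metadatas": [[
            {"name": f"sub{i}", "subscribers": 1000,
             "url": f"https://reddit.com/r/sub{i}", "nsfw": False}
            for i in range(5)
        ]],
        "distances": [[0.1 * i for i in range(5)]],
    }
    collection = Mock()
    collection.query.return_value = fake_results
    monkeypatch.setattr("src.tools.discover.get_chroma_client", lambda: Mock())
    monkeypatch.setattr("src.tools.discover.get_collection", lambda name, client: collection)

    await discover_subreddits(query="python web frameworks", limit=5, ctx=ctx)

    assert ctx.report_progress.await_count >= 5
```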
316 | 317 | ### Technical Requirements 318 | - ✅ All existing tests pass with new instrumentation 319 | - ✅ New integration tests verify context functionality 320 | - ✅ No performance degradation (progress/logging overhead <5%) 321 | - ✅ Type hints maintained throughout 322 | 323 | ### Quality Requirements 324 | - ✅ Code follows FastMCP patterns from documentation 325 | - ✅ Logging messages are clear and actionable 326 | - ✅ Error recovery suggestions are specific and helpful 327 | - ✅ Progress messages provide meaningful status updates 328 | 329 | ## File Summary 330 | 331 | ### Files to Create 332 | - `tests/test_context_integration.py` - New integration tests 333 | 334 | ### Files to Modify 335 | - `src/tools/discover.py` - Context, progress, logging 336 | - `src/tools/posts.py` - Context, progress, logging 337 | - `src/tools/comments.py` - Context, progress, logging 338 | - `src/tools/search.py` - Context, logging 339 | - `src/server.py` - Enhanced error handling in execute_operation 340 | - `tests/test_tools.py` - Updated tests for context integration 341 | 342 | ### Files Not Modified 343 | - `src/config.py` - No changes needed 344 | - `src/models.py` - No changes needed 345 | - `src/resources.py` - No changes needed (future enhancement) 346 | - `src/chroma_client.py` - No changes needed 347 | 348 | ## Dependencies 349 | 350 | ### Required 351 | - FastMCP ≥2.0.0 (already installed) 352 | - Python ≥3.10 (already using) 353 | - Context API support (available in FastMCP) 354 | 355 | ### Optional 356 | - No additional dependencies required 357 | 358 | ## Risks & Mitigations 359 | 360 | | Risk | Impact | Mitigation | 361 | |------|--------|------------| 362 | | Performance overhead from logging | Low | Log only meaningful events, avoid verbose debug logs in production | 363 | | Too many progress events | Low | Limit to 5-10 events per operation | 364 | | Breaking MCP client compatibility | Low | Context changes are server-side only; MCP protocol unchanged | 365 | | Testing complexity | Low | Use FastMCP's in-memory transport for tests | 366 | 367 | ## Backward Compatibility 368 | 369 | **MCP Client Compatibility**: Changes are server-side implementation only. The MCP protocol interface remains unchanged, ensuring compatibility with all MCP clients including Claude, ChatGPT, and others. Context injection is handled by FastMCP's decorator system and is transparent to clients. 370 | 371 | ## Future Enhancements 372 | 373 | Following this implementation, future phases could include: 374 | 375 | 1. **Resource Access Tracking** - Monitor `ctx.read_resource()` calls 376 | 2. **Sampling Monitoring** - Track `ctx.sample()` operations 377 | 3. **Metrics Collection** - Aggregate operation timing and success rates 378 | 4. **Client Integration** - Frontend components to display progress/logs 379 | 380 | These are out of scope for this specification. 
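One of the risk mitigations above relies on FastMCP's in-memory transport for tests. For reference, a minimal sketch of that pattern follows; it assumes the `FastMCP` server instance defined in `src/server.py` is importable as `mcp`, so adjust the import to whatever name the module actually exports.

```python
import asyncio

from fastmcp import Client

from src.server import mcp  # assumption: the FastMCP instance is exported as `mcp`


async def smoke_test() -> None:
    # Passing the server object (rather than a URL or script path) creates an
    # in-memory connection: no subprocess, no network, fast enough for CI.
    async with Client(mcp) as client:
        tools = await client.list_tools()
        assert any(t.name == "execute_operation" for t in tools)


asyncio.run(smoke_test())
```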
381 | 382 | ## References 383 | 384 | - [FastMCP Context API Documentation](../ai-docs/fastmcp/docs/python-sdk/fastmcp-server-context.mdx) 385 | - [FastMCP Progress Monitoring](../ai-docs/fastmcp/docs/clients/progress.mdx) 386 | - [FastMCP Logging](../ai-docs/fastmcp/docs/clients/logging.mdx) 387 | - Current Implementation: `src/server.py` 388 | - Original UX Improvements Spec: `../frontend-reddit-research-mcp/specs/002-ux-improvements-fastmcp-patterns/spec.md` 389 | ``` -------------------------------------------------------------------------------- /reddit-research-agent.md: -------------------------------------------------------------------------------- ```markdown 1 | --- 2 | name: reddit-research-agent 3 | description: Use this agent when you need to conduct research using Reddit MCP server tools and produce a comprehensive, well-cited research report in Obsidian-optimized markdown format. This agent specializes in gathering Reddit data (posts, comments, subreddit information), analyzing patterns and insights, and presenting findings with proper inline citations that link back to source materials. 4 | tools: Glob, Grep, LS, Read, WebFetch, TodoWrite, WebSearch, BashOutput, KillBash, ListMcpResourcesTool, ReadMcpResourceTool, Edit, MultiEdit, Write, NotebookEdit, Bash, mcp__reddit-mcp-poc__discover_operations, mcp__reddit-mcp-poc__get_operation_schema, mcp__reddit-mcp-poc__execute_operation 5 | model: opus 6 | color: purple 7 | --- 8 | 9 | You are an insightful Reddit research analyst who transforms community discussions into compelling narratives. You excel at discovering diverse perspectives, synthesizing complex viewpoints, and building analytical stories that explain not just what Reddit thinks, but why different communities think differently. 10 | 11 | ## Core Mission 12 | 13 | Create insightful research narratives that weave together diverse Reddit perspectives into coherent analytical stories, focusing on understanding the "why" behind community viewpoints rather than simply cataloging who said what. 14 | 15 | ## Technical Architecture (Reddit MCP Server) 16 | 17 | Follow the three-layer workflow for Reddit operations: 18 | 1. **Discovery**: `discover_operations()` - NO parameters 19 | 2. **Schema**: `get_operation_schema(operation_id)` 20 | 3. **Execution**: `execute_operation(operation_id, parameters)` 21 | 22 | Key operations: 23 | - `discover_subreddits`: Find diverse, relevant communities 24 | - `fetch_multiple`: Efficiently gather from multiple subreddits 25 | - `fetch_comments`: Deep dive into valuable discussions 26 | 27 | ## Research Approach 28 | 29 | ### 1. Diverse Perspective Discovery 30 | **Goal**: Find 5-7 communities with genuinely different viewpoints 31 | 32 | - Use semantic search to discover conceptually related but diverse subreddits 33 | - Prioritize variety over volume: 34 | - Professional vs hobbyist communities 35 | - Technical vs general audiences 36 | - Supportive vs critical spaces 37 | - Different geographic/demographic focuses 38 | - Look for unexpected or adjacent communities that discuss the topic differently 39 | 40 | ### 2. 
Strategic Data Gathering 41 | **Goal**: Quality insights over quantity of posts 42 | 43 | ```python 44 | execute_operation("fetch_multiple", { 45 | "subreddit_names": [diverse_subreddits], 46 | "listing_type": "top", 47 | "time_filter": "year", 48 | "limit_per_subreddit": 10-15 49 | }) 50 | ``` 51 | 52 | For high-value discussions: 53 | ```python 54 | execute_operation("fetch_comments", { 55 | "submission_id": post_id, 56 | "comment_limit": 50, 57 | "comment_sort": "best" 58 | }) 59 | ``` 60 | 61 | ### 3. Analytical Synthesis 62 | **Goal**: Build narratives that explain patterns and tensions 63 | 64 | - Identify themes that cut across communities 65 | - Understand WHY different groups hold different views 66 | - Find surprising connections between viewpoints 67 | - Recognize emotional undercurrents and practical concerns 68 | - Connect individual experiences to broader patterns 69 | 70 | ## Evidence & Citation Approach 71 | 72 | **Philosophy**: Mix broad community patterns with individual voices to create rich, evidence-based narratives. 73 | 74 | ### Three Types of Citations (USE ALL THREE): 75 | 76 | #### 1. **Community-Level Citations** (broad patterns) 77 | ```markdown 78 | The r/sales community consistently emphasizes [theme], with discussions 79 | about [topic] dominating recent threads ([link1], [link2], [link3]). 80 | ``` 81 | 82 | #### 2. **Individual Voice Citations** (specific quotes) 83 | ```markdown 84 | As one frustrated user (15 years in sales) explained: "Direct quote that 85 | captures the emotion and specificity" ([r/sales](link)). 86 | ``` 87 | 88 | #### 3. **Cross-Community Pattern Citations** 89 | ```markdown 90 | This sentiment spans from r/technical ([link]) where developers 91 | [perspective], to r/business ([link]) where owners [different angle], 92 | revealing [your analysis of the pattern]. 93 | ``` 94 | 95 | ### Citation Density Requirements: 96 | - **Every major claim**: 2-3 supporting citations minimum 97 | - **Each theme section**: 3-4 broad community citations + 4-5 individual quotes 98 | - **Pattern observations**: Evidence from at least 3 different subreddits 99 | - **NO unsupported generalizations**: Everything cited or framed as a question 100 | 101 | ### Example of Mixed Citation Narrative: 102 | ```markdown 103 | Small businesses are reverting to Excel not from technological ignorance, 104 | but from painful experience. Across r/smallbusiness, implementation horror 105 | stories dominate CRM discussions ([link1], [link2]), with costs frequently 106 | exceeding $70,000 for "basic functionality." One owner captured the 107 | community's frustration: "I paid $500/month to make my job harder" 108 | ([r/smallbusiness](link)). This exodus isn't limited to non-technical users— 109 | even r/programming members share Excel templates as CRM alternatives ([link]), 110 | suggesting the problem transcends technical capability. 111 | ``` 112 | 113 | ## Report Structure 114 | 115 | ```markdown 116 | # [Topic]: Understanding Reddit's Perspective 117 | 118 | ## Summary 119 | [2-3 paragraphs providing your analytical overview of what you discovered. This should tell a coherent story about how Reddit communities view this topic, major tensions, and key insights. Write this AFTER completing your analysis.] 120 | 121 | ## The Conversation Landscape 122 | 123 | [Analytical paragraph explaining the diversity of communities discussing this topic and why different groups care about it differently. 
For example: "The discussion spans from technical implementation in r/programming to business impact in r/smallbusiness, with surprisingly passionate debate in r/[unexpected_community]..."] 124 | 125 | Key communities analyzed: 126 | - **r/[subreddit]**: [1-line description of this community's unique perspective] 127 | - **r/[subreddit]**: [What makes their viewpoint different] 128 | - **r/[subreddit]**: [Their specific angle or concern] 129 | 130 | ## Major Themes 131 | 132 | **IMPORTANT**: No "Top 10" lists. No bullet-point compilations. Every theme must be a narrative synthesis with extensive evidence from multiple communities showing different perspectives on the same pattern. 133 | 134 | ### Theme 1: [Descriptive Title That Captures the Insight] 135 | 136 | [Opening analytical paragraph explaining what this pattern is and why it matters. Include 2-3 broad community citations showing this is a widespread phenomenon, not isolated incidents.] 137 | 138 | [Second paragraph diving into the human impact with 3-4 specific individual quotes that illustrate different facets of this theme. Show the emotional and practical reality through actual Reddit voices.] 139 | 140 | [Third paragraph connecting different community perspectives, explaining WHY different groups see this differently. Use cross-community citations to show how the same issue manifests differently across subreddits.] 141 | 142 | Example structure: 143 | ```markdown 144 | The CRM complexity crisis isn't about features—it's about fundamental misalignment 145 | between vendor assumptions and small business reality. This theme dominates 146 | r/smallbusiness discussions ([link1], [link2]), appears in weekly rant threads 147 | on r/sales ([link3]), and even surfaces in r/ExperiencedDevs when developers 148 | vent about building CRM integrations ([link4]). 149 | 150 | The frustration is visceral and specific. A sales manager with 15 years 151 | experience wrote: "I calculated it—I spend 38% of my time on CRM data entry 152 | for metrics no one looks at" ([r/sales](link)). Another user, a small business 153 | owner, was more blunt: "Salesforce is where sales go to die" ([r/smallbusiness](link)), 154 | a comment that received 450 upvotes and sparked a thread of similar experiences. 155 | Even technical users aren't immune—a developer noted: "I built our entire CRM 156 | replacement in Google Sheets in a weekend. It does everything we need and nothing 157 | we don't" ([r/programming](link)). 158 | 159 | The divide between communities reveals deeper truths. While r/sales focuses on 160 | time waste ([link1], [link2])—they have dedicated hours but resent non-selling 161 | activities—r/smallbusiness emphasizes resource impossibility ([link3], [link4])— 162 | they simply don't have anyone to dedicate to CRM management. Meanwhile, 163 | r/Entrepreneur questions the entire premise: "CRM is a solution looking for 164 | a problem" was the top comment in a recent discussion ([link5]), suggesting 165 | some view the entire category as manufactured need. 166 | ``` 167 | 168 | ### Theme 2: [Another Major Pattern or Tension] 169 | 170 | [Similar structure - lead with YOUR analysis, support with evidence] 171 | 172 | ### Theme 3: [Emerging Trend or Fundamental Divide] 173 | 174 | [Similar structure - focus on synthesis and interpretation] 175 | 176 | ## Divergent Perspectives 177 | 178 | [Paragraph analyzing why certain communities see this topic so differently. 
What are the underlying factors - professional background, use cases, values, experiences - that drive these different viewpoints?] 179 | 180 | Example contrasts: 181 | - **Technical vs Business**: [Your analysis of this divide] 182 | - **Veterans vs Newcomers**: [What experience changes] 183 | - **Geographic/Cultural**: [If relevant] 184 | 185 | ## What This Means 186 | 187 | [2-3 paragraphs of YOUR analysis about implications. What should someone building in this space know? What opportunities exist? What mistakes should be avoided? This should flow naturally from your research but be YOUR interpretive voice.] 188 | 189 | Key takeaways: 190 | 1. [Actionable insight based on the research] 191 | 2. [Another practical implication] 192 | 3. [Strategic consideration] 193 | 194 | ## Research Notes 195 | 196 | *Communities analyzed*: [List of subreddits examined] 197 | *Methodology*: Semantic discovery to find diverse perspectives, followed by thematic analysis of top discussions and comments 198 | *Limitations*: [Brief note on any biases or gaps] 199 | ``` 200 | 201 | ## Writing Guidelines 202 | 203 | ### Voice & Tone 204 | - **Analytical**: You're an insightful analyst, not a citation machine 205 | - **Confident**: Make clear assertions based on evidence 206 | - **Nuanced**: Acknowledge complexity without hedging excessively 207 | - **Accessible**: Write for intelligent readers who aren't Reddit experts 208 | 209 | ### What Makes Good Analysis 210 | - Explains WHY patterns exist, not just WHAT they are 211 | - Connects disparate viewpoints into coherent narrative 212 | - Identifies non-obvious insights 213 | - Provides context for understanding different perspectives 214 | - Tells a story that helps readers understand the landscape 215 | 216 | ### What to AVOID 217 | - ❌ "Top 10" or "Top X" lists of any kind 218 | - ❌ Bullet-point lists of complaints or features 219 | - ❌ Unsupported generalizations ("Users hate X" without citations) 220 | - ❌ Platform-by-platform breakdowns without narrative synthesis 221 | - ❌ Generic business writing that could exist without Reddit data 222 | - ❌ Claims without exploring WHY they exist 223 | 224 | ### What to INCLUDE 225 | - ✅ Mixed citations: broad community patterns + individual voices 226 | - ✅ Cross-community analysis showing different perspectives 227 | - ✅ "Why" explanations for every pattern identified 228 | - ✅ Narrative flow that builds understanding progressively 229 | - ✅ Specific quotes that capture emotion and nuance 230 | - ✅ Evidence from at least 3 different communities per theme 231 | 232 | ## File Handling 233 | 234 | When saving reports: 235 | 1. Always save to `./reports/` directory (create if it doesn't exist) 236 | 2. Check if file exists with Read tool first 237 | 3. Use Write for new files, Edit/MultiEdit for existing 238 | 4. 
Default filename: `./reports/[topic]-reddit-analysis-[YYYY-MM-DD].md` 239 | 240 | Example: 241 | ```bash 242 | # Ensure reports directory exists 243 | mkdir -p ./reports 244 | 245 | # Save with descriptive filename 246 | ./reports/micro-saas-ideas-reddit-analysis-2024-01-15.md 247 | ``` 248 | 249 | ## Quality Checklist 250 | 251 | Before finalizing: 252 | - [ ] Found genuinely diverse perspectives (5-7 different communities) 253 | - [ ] Built coherent narrative that explains the landscape 254 | - [ ] Analysis leads, evidence supports (not vice versa) 255 | - [ ] Explained WHY different groups think differently 256 | - [ ] Connected patterns across communities 257 | - [ ] Provided actionable insights based on findings 258 | - [ ] Maintained analytical voice throughout 259 | - [ ] **Each theme has 8-12 citations minimum (mixed types)** 260 | - [ ] **No "Top X" lists anywhere in the report** 261 | - [ ] **Every claim supported by 2-3 citations** 262 | - [ ] **Community-level patterns shown with multiple links** 263 | - [ ] **Individual voices included for human perspective** 264 | - [ ] **Cross-community patterns demonstrated** 265 | - [ ] **Zero unsupported generalizations** 266 | 267 | ## Core Competencies 268 | 269 | ### 1. Perspective Discovery 270 | - Use semantic search to find conceptually related but culturally different communities 271 | - Identify adjacent spaces that discuss the topic from unique angles 272 | - Recognize when different terms are used for the same concept 273 | 274 | ### 2. Narrative Building 275 | - Connect individual comments to broader patterns 276 | - Explain tensions between different viewpoints 277 | - Identify emotional and practical drivers behind opinions 278 | - Build stories that make complex landscapes understandable 279 | 280 | ### 3. Analytical Commentary 281 | - Add interpretive value beyond summarization 282 | - Explain implications and opportunities 283 | - Connect Reddit insights to real-world applications 284 | - Provide strategic guidance based on community wisdom 285 | 286 | ## Remember 287 | 288 | You're not a court reporter documenting everything said. You're an investigative analyst who: 289 | - Finds diverse perspectives across Reddit's ecosystem 290 | - Understands WHY different communities think differently 291 | - Builds compelling narratives that explain complex landscapes 292 | - Provides actionable insights through analytical synthesis 293 | 294 | Your reports should feel like reading excellent research journalism - informative, insightful, and built on solid evidence, but driven by narrative and analysis rather than exhaustive citation. ``` -------------------------------------------------------------------------------- /tests/test_context_integration.py: -------------------------------------------------------------------------------- ```python 1 | """ 2 | Integration tests for Context parameter acceptance in Phase 1. 3 | 4 | This test suite verifies that all tool and operation functions 5 | accept the Context parameter as required by FastMCP's Context API. 6 | Phase 1 only validates parameter acceptance - actual context usage 7 | will be tested in Phase 2+. 
8 | """ 9 | 10 | import pytest 11 | import sys 12 | import os 13 | from unittest.mock import Mock, MagicMock, AsyncMock 14 | from fastmcp import Context 15 | 16 | # Add project root to Python path 17 | sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..')) 18 | 19 | from src.tools.discover import discover_subreddits, validate_subreddit 20 | from src.tools.search import search_in_subreddit 21 | from src.tools.posts import fetch_subreddit_posts, fetch_multiple_subreddits 22 | from src.tools.comments import fetch_submission_with_comments 23 | 24 | 25 | @pytest.fixture 26 | def mock_context(): 27 | """Create a mock Context object for testing.""" 28 | return Mock(spec=Context) 29 | 30 | 31 | @pytest.fixture 32 | def mock_reddit(): 33 | """Create a mock Reddit client.""" 34 | return Mock() 35 | 36 | 37 | @pytest.fixture 38 | def mock_chroma(): 39 | """Mock ChromaDB client and collection.""" 40 | with Mock() as mock_client: 41 | mock_collection = Mock() 42 | mock_collection.query.return_value = { 43 | 'metadatas': [[ 44 | {'name': 'test', 'subscribers': 1000, 'url': 'https://reddit.com/r/test', 'nsfw': False} 45 | ]], 46 | 'distances': [[0.5]] 47 | } 48 | yield mock_client, mock_collection 49 | 50 | 51 | class TestDiscoverOperations: 52 | """Test discover_subreddits accepts context.""" 53 | 54 | async def test_discover_accepts_context(self, mock_context, monkeypatch): 55 | """Verify discover_subreddits accepts context parameter.""" 56 | # Mock the chroma client 57 | mock_client = Mock() 58 | mock_collection = Mock() 59 | mock_collection.query.return_value = { 60 | 'metadatas': [[ 61 | {'name': 'test', 'subscribers': 1000, 'url': 'https://reddit.com/r/test', 'nsfw': False} 62 | ]], 63 | 'distances': [[0.5]] 64 | } 65 | 66 | def mock_get_client(): 67 | return mock_client 68 | 69 | def mock_get_collection(name, client): 70 | return mock_collection 71 | 72 | monkeypatch.setattr('src.tools.discover.get_chroma_client', mock_get_client) 73 | monkeypatch.setattr('src.tools.discover.get_collection', mock_get_collection) 74 | 75 | # Call with context 76 | result = await discover_subreddits(query="test", limit=5, ctx=mock_context) 77 | 78 | # Verify result structure (not context usage - that's Phase 2) 79 | assert "subreddits" in result or "error" in result 80 | 81 | 82 | class TestSearchOperations: 83 | """Test search_in_subreddit accepts context.""" 84 | 85 | def test_search_accepts_context(self, mock_context, mock_reddit): 86 | """Verify search_in_subreddit accepts context parameter.""" 87 | mock_subreddit = Mock() 88 | mock_subreddit.display_name = "test" 89 | mock_subreddit.search.return_value = [] 90 | mock_reddit.subreddit.return_value = mock_subreddit 91 | 92 | result = search_in_subreddit( 93 | subreddit_name="test", 94 | query="test query", 95 | reddit=mock_reddit, 96 | limit=5, 97 | ctx=mock_context 98 | ) 99 | 100 | assert "results" in result or "error" in result 101 | 102 | 103 | class TestPostOperations: 104 | """Test post-fetching functions accept context.""" 105 | 106 | def test_fetch_posts_accepts_context(self, mock_context, mock_reddit): 107 | """Verify fetch_subreddit_posts accepts context parameter.""" 108 | mock_subreddit = Mock() 109 | mock_subreddit.display_name = "test" 110 | mock_subreddit.subscribers = 1000 111 | mock_subreddit.public_description = "Test" 112 | mock_subreddit.hot.return_value = [] 113 | mock_reddit.subreddit.return_value = mock_subreddit 114 | 115 | result = fetch_subreddit_posts( 116 | subreddit_name="test", 117 | reddit=mock_reddit, 118 | limit=5, 119 
| ctx=mock_context 120 | ) 121 | 122 | assert "posts" in result or "error" in result 123 | 124 | async def test_fetch_multiple_accepts_context(self, mock_context, mock_reddit): 125 | """Verify fetch_multiple_subreddits accepts context parameter.""" 126 | mock_multi = Mock() 127 | mock_multi.hot.return_value = [] 128 | mock_reddit.subreddit.return_value = mock_multi 129 | 130 | result = await fetch_multiple_subreddits( 131 | subreddit_names=["test1", "test2"], 132 | reddit=mock_reddit, 133 | limit_per_subreddit=5, 134 | ctx=mock_context 135 | ) 136 | 137 | assert "subreddits_requested" in result or "error" in result 138 | 139 | 140 | class TestCommentOperations: 141 | """Test comment-fetching functions accept context.""" 142 | 143 | async def test_fetch_comments_accepts_context(self, mock_context, mock_reddit): 144 | """Verify fetch_submission_with_comments accepts context parameter.""" 145 | mock_submission = Mock() 146 | mock_submission.id = "test123" 147 | mock_submission.title = "Test" 148 | mock_submission.author = Mock() 149 | mock_submission.author.__str__ = Mock(return_value="testuser") 150 | mock_submission.score = 100 151 | mock_submission.upvote_ratio = 0.95 152 | mock_submission.num_comments = 0 153 | mock_submission.created_utc = 1234567890.0 154 | mock_submission.url = "https://reddit.com/test" 155 | mock_submission.selftext = "" 156 | mock_submission.subreddit = Mock() 157 | mock_submission.subreddit.display_name = "test" 158 | 159 | # Mock comments 160 | mock_comments = Mock() 161 | mock_comments.__iter__ = Mock(return_value=iter([])) 162 | mock_comments.replace_more = Mock() 163 | mock_submission.comments = mock_comments 164 | 165 | mock_reddit.submission.return_value = mock_submission 166 | 167 | result = await fetch_submission_with_comments( 168 | reddit=mock_reddit, 169 | submission_id="test123", 170 | comment_limit=10, 171 | ctx=mock_context 172 | ) 173 | 174 | assert "submission" in result or "error" in result 175 | 176 | 177 | class TestHelperFunctions: 178 | """Test helper functions accept context.""" 179 | 180 | def test_validate_subreddit_accepts_context(self, mock_context, monkeypatch): 181 | """Verify validate_subreddit accepts context parameter.""" 182 | # Mock the chroma client 183 | mock_client = Mock() 184 | mock_collection = Mock() 185 | mock_collection.query.return_value = { 186 | 'metadatas': [[ 187 | {'name': 'test', 'subscribers': 1000, 'nsfw': False} 188 | ]], 189 | 'distances': [[0.5]] 190 | } 191 | 192 | def mock_get_client(): 193 | return mock_client 194 | 195 | def mock_get_collection(name, client): 196 | return mock_collection 197 | 198 | monkeypatch.setattr('src.tools.discover.get_chroma_client', mock_get_client) 199 | monkeypatch.setattr('src.tools.discover.get_collection', mock_get_collection) 200 | 201 | result = validate_subreddit("test", ctx=mock_context) 202 | 203 | assert "valid" in result or "error" in result 204 | 205 | 206 | class TestContextParameterPosition: 207 | """Test that context parameter works in various positions.""" 208 | 209 | def test_context_as_last_param(self, mock_context, mock_reddit): 210 | """Verify context works as the last parameter.""" 211 | mock_subreddit = Mock() 212 | mock_subreddit.display_name = "test" 213 | mock_subreddit.search.return_value = [] 214 | mock_reddit.subreddit.return_value = mock_subreddit 215 | 216 | # Context is last parameter 217 | result = search_in_subreddit( 218 | subreddit_name="test", 219 | query="test", 220 | reddit=mock_reddit, 221 | sort="relevance", 222 | time_filter="all", 223 | 
limit=10, 224 | ctx=mock_context 225 | ) 226 | 227 | assert result is not None 228 | 229 | def test_context_with_defaults(self, mock_context, mock_reddit): 230 | """Verify context works with default parameters.""" 231 | mock_subreddit = Mock() 232 | mock_subreddit.display_name = "test" 233 | mock_subreddit.search.return_value = [] 234 | mock_reddit.subreddit.return_value = mock_subreddit 235 | 236 | # Only required params + context 237 | result = search_in_subreddit( 238 | subreddit_name="test", 239 | query="test", 240 | reddit=mock_reddit, 241 | ctx=mock_context 242 | ) 243 | 244 | assert result is not None 245 | 246 | 247 | class TestDiscoverSubredditsProgress: 248 | """Test progress reporting in discover_subreddits.""" 249 | 250 | async def test_reports_progress_during_search(self, mock_context, monkeypatch): 251 | """Verify progress is reported during vector search.""" 252 | # Mock ChromaDB response with 3 results 253 | mock_client = Mock() 254 | mock_collection = Mock() 255 | mock_collection.query.return_value = { 256 | 'metadatas': [[ 257 | {'name': 'Python', 'subscribers': 1000000, 'nsfw': False}, 258 | {'name': 'learnpython', 'subscribers': 500000, 'nsfw': False}, 259 | {'name': 'pythontips', 'subscribers': 100000, 'nsfw': False} 260 | ]], 261 | 'distances': [[0.5, 0.7, 0.9]] 262 | } 263 | 264 | # Setup async mock for progress 265 | mock_context.report_progress = AsyncMock() 266 | 267 | def mock_get_client(): 268 | return mock_client 269 | 270 | def mock_get_collection(name, client): 271 | return mock_collection 272 | 273 | monkeypatch.setattr('src.tools.discover.get_chroma_client', mock_get_client) 274 | monkeypatch.setattr('src.tools.discover.get_collection', mock_get_collection) 275 | 276 | result = await discover_subreddits(query="python", ctx=mock_context) 277 | 278 | # Verify progress was reported at least 3 times (once per result) 279 | assert mock_context.report_progress.call_count >= 3 280 | 281 | # Verify progress parameters 282 | first_call = mock_context.report_progress.call_args_list[0] 283 | # Check if arguments were passed as kwargs or positional args 284 | if first_call[1]: # kwargs 285 | assert 'progress' in first_call[1] 286 | assert 'total' in first_call[1] 287 | else: # positional 288 | assert len(first_call[0]) >= 2 289 | 290 | 291 | class TestFetchMultipleProgress: 292 | """Test progress reporting in fetch_multiple_subreddits.""" 293 | 294 | async def test_reports_progress_per_subreddit(self, mock_context, mock_reddit): 295 | """Verify progress is reported once per subreddit.""" 296 | # Setup async mock for progress 297 | mock_context.report_progress = AsyncMock() 298 | 299 | # Mock submissions from 3 different subreddits 300 | mock_sub1 = Mock() 301 | mock_sub1.subreddit.display_name = "sub1" 302 | mock_sub1.id = "id1" 303 | mock_sub1.title = "Title 1" 304 | mock_sub1.author = Mock() 305 | mock_sub1.author.__str__ = Mock(return_value="user1") 306 | mock_sub1.score = 100 307 | mock_sub1.num_comments = 10 308 | mock_sub1.created_utc = 1234567890.0 309 | mock_sub1.url = "https://reddit.com/test1" 310 | mock_sub1.permalink = "/r/sub1/comments/id1/" 311 | 312 | mock_sub2 = Mock() 313 | mock_sub2.subreddit.display_name = "sub2" 314 | mock_sub2.id = "id2" 315 | mock_sub2.title = "Title 2" 316 | mock_sub2.author = Mock() 317 | mock_sub2.author.__str__ = Mock(return_value="user2") 318 | mock_sub2.score = 200 319 | mock_sub2.num_comments = 20 320 | mock_sub2.created_utc = 1234567891.0 321 | mock_sub2.url = "https://reddit.com/test2" 322 | mock_sub2.permalink = 
"/r/sub2/comments/id2/" 323 | 324 | mock_sub3 = Mock() 325 | mock_sub3.subreddit.display_name = "sub3" 326 | mock_sub3.id = "id3" 327 | mock_sub3.title = "Title 3" 328 | mock_sub3.author = Mock() 329 | mock_sub3.author.__str__ = Mock(return_value="user3") 330 | mock_sub3.score = 300 331 | mock_sub3.num_comments = 30 332 | mock_sub3.created_utc = 1234567892.0 333 | mock_sub3.url = "https://reddit.com/test3" 334 | mock_sub3.permalink = "/r/sub3/comments/id3/" 335 | 336 | mock_multi = Mock() 337 | mock_multi.hot.return_value = [mock_sub1, mock_sub2, mock_sub3] 338 | mock_reddit.subreddit.return_value = mock_multi 339 | 340 | result = await fetch_multiple_subreddits( 341 | subreddit_names=["sub1", "sub2", "sub3"], 342 | reddit=mock_reddit, 343 | ctx=mock_context 344 | ) 345 | 346 | # Verify progress was reported at least 3 times (once per subreddit) 347 | assert mock_context.report_progress.call_count >= 3 348 | 349 | 350 | class TestFetchCommentsProgress: 351 | """Test progress reporting in fetch_submission_with_comments.""" 352 | 353 | async def test_reports_progress_during_loading(self, mock_context, mock_reddit): 354 | """Verify progress is reported during comment loading.""" 355 | # Setup async mock for progress 356 | mock_context.report_progress = AsyncMock() 357 | 358 | # Mock submission 359 | mock_submission = Mock() 360 | mock_submission.id = "test123" 361 | mock_submission.title = "Test" 362 | mock_submission.author = Mock() 363 | mock_submission.author.__str__ = Mock(return_value="testuser") 364 | mock_submission.score = 100 365 | mock_submission.upvote_ratio = 0.95 366 | mock_submission.num_comments = 5 367 | mock_submission.created_utc = 1234567890.0 368 | mock_submission.url = "https://reddit.com/test" 369 | mock_submission.selftext = "" 370 | mock_submission.subreddit = Mock() 371 | mock_submission.subreddit.display_name = "test" 372 | 373 | # Mock 5 comments 374 | mock_comments_list = [] 375 | for i in range(5): 376 | mock_comment = Mock() 377 | mock_comment.id = f"comment{i}" 378 | mock_comment.body = f"Comment {i}" 379 | mock_comment.author = Mock() 380 | mock_comment.author.__str__ = Mock(return_value=f"user{i}") 381 | mock_comment.score = 10 * i 382 | mock_comment.created_utc = 1234567890.0 + i 383 | mock_comment.replies = [] 384 | mock_comments_list.append(mock_comment) 385 | 386 | mock_comments = Mock() 387 | mock_comments.__iter__ = Mock(return_value=iter(mock_comments_list)) 388 | mock_comments.replace_more = Mock() 389 | mock_submission.comments = mock_comments 390 | 391 | mock_reddit.submission.return_value = mock_submission 392 | 393 | result = await fetch_submission_with_comments( 394 | reddit=mock_reddit, 395 | submission_id="test123", 396 | comment_limit=10, 397 | ctx=mock_context 398 | ) 399 | 400 | # Verify progress was reported at least 6 times (5 comments + 1 completion) 401 | assert mock_context.report_progress.call_count >= 6 402 | ``` -------------------------------------------------------------------------------- /specs/003-phase-2-progress-monitoring.md: -------------------------------------------------------------------------------- ```markdown 1 | # Phase 2: Progress Monitoring Implementation 2 | 3 | **Status:** Ready for Implementation 4 | **Created:** 2025-10-02 5 | **Owner:** Engineering Team 6 | **Depends On:** Phase 1 (Context Integration) ✅ Complete 7 | 8 | ## Executive Summary 9 | 10 | This specification details Phase 2 of the FastMCP Context API integration: adding real-time progress reporting to long-running Reddit operations. 
With Phase 1 complete (all tools accept `Context`), this phase focuses on implementing `ctx.report_progress()` calls to provide visibility into multi-step operations. 11 | 12 | **Timeline:** 1-2 days 13 | **Effort:** Low (foundation already in place from Phase 1) 14 | 15 | ## Background 16 | 17 | ### Phase 1 Completion Summary 18 | 19 | Phase 1 successfully integrated the FastMCP `Context` parameter into all tool and operation functions: 20 | - ✅ All MCP tool functions accept `ctx: Context` 21 | - ✅ All operation functions accept and receive context 22 | - ✅ Helper functions updated with context forwarding 23 | - ✅ 15 tests passing (8 integration tests + 7 updated existing tests) 24 | 25 | **Current State:** Context is available but unused (commented as "Phase 1: Accept context but don't use it yet") 26 | 27 | ### Why Progress Monitoring? 28 | 29 | Reddit operations can be time-consuming: 30 | - **Vector search**: Searching thousands of subreddits and calculating confidence scores 31 | - **Multi-subreddit fetches**: Fetching posts from 5-10 communities sequentially 32 | - **Comment tree loading**: Parsing nested comment threads with hundreds of replies 33 | 34 | Progress monitoring provides: 35 | - Real-time feedback to users during long operations 36 | - Prevention of timeout errors by showing active progress 37 | - Better debugging visibility into operation performance 38 | - Enhanced user experience with progress indicators 39 | 40 | ## Goals 41 | 42 | 1. ✅ Report progress during vector search iterations (`discover_subreddits`) 43 | 2. ✅ Report progress per subreddit in batch fetches (`fetch_multiple_subreddits`) 44 | 3. ✅ Report progress during comment tree traversal (`fetch_submission_with_comments`) 45 | 4. ✅ Maintain all existing test coverage (15 tests must pass) 46 | 5. ✅ Follow FastMCP progress reporting patterns from official docs 47 | 48 | ## Non-Goals 49 | 50 | - Frontend progress UI (separate project) 51 | - Progress for single-subreddit fetches (too fast to matter) 52 | - Structured logging (Phase 3) 53 | - Enhanced error handling (Phase 4) 54 | 55 | ## Implementation Plan 56 | 57 | ### Operation 1: discover_subreddits Progress 58 | 59 | **File:** `src/tools/discover.py` 60 | **Function:** `_search_vector_db()` (lines 101-239) 61 | **Location:** Result processing loop (lines 137-188) 62 | 63 | #### Current Code Pattern 64 | 65 | ```python 66 | # Process results 67 | processed_results = [] 68 | nsfw_filtered = 0 69 | 70 | for metadata, distance in zip( 71 | results['metadatas'][0], 72 | results['distances'][0] 73 | ): 74 | # Skip NSFW if not requested 75 | if metadata.get('nsfw', False) and not include_nsfw: 76 | nsfw_filtered += 1 77 | continue 78 | 79 | # Calculate confidence score... 80 | # Apply penalties... 81 | # Determine match type... 
82 | 83 | processed_results.append({...}) 84 | ``` 85 | 86 | #### Enhanced Implementation 87 | 88 | ```python 89 | # Process results 90 | processed_results = [] 91 | nsfw_filtered = 0 92 | total_results = len(results['metadatas'][0]) 93 | 94 | for i, (metadata, distance) in enumerate(zip( 95 | results['metadatas'][0], 96 | results['distances'][0] 97 | )): 98 | # Report progress (async call required) 99 | if ctx: 100 | await ctx.report_progress( 101 | progress=i + 1, 102 | total=total_results, 103 | message=f"Analyzing r/{metadata.get('name', 'unknown')}" 104 | ) 105 | 106 | # Skip NSFW if not requested 107 | if metadata.get('nsfw', False) and not include_nsfw: 108 | nsfw_filtered += 1 109 | continue 110 | 111 | # Calculate confidence score... 112 | # Apply penalties... 113 | # Determine match type... 114 | 115 | processed_results.append({...}) 116 | ``` 117 | 118 | #### Changes Required 119 | 120 | 1. **Make function async**: Change `def _search_vector_db(...)` → `async def _search_vector_db(...)` 121 | 2. **Make parent function async**: Change `def discover_subreddits(...)` → `async def discover_subreddits(...)` 122 | 3. **Add await to calls**: Update `discover_subreddits` to `await _search_vector_db(...)` 123 | 4. **Add progress in loop**: Insert `await ctx.report_progress(...)` before processing each result 124 | 5. **Calculate total**: Add `total_results = len(results['metadatas'][0])` before loop 125 | 126 | **Progress Events:** ~10-100 (depending on limit parameter) 127 | 128 | --- 129 | 130 | ### Operation 2: fetch_multiple_subreddits Progress 131 | 132 | **File:** `src/tools/posts.py` 133 | **Function:** `fetch_multiple_subreddits()` (lines 102-188) 134 | **Location:** Subreddit iteration loop (lines 153-172) 135 | 136 | #### Current Code Pattern 137 | 138 | ```python 139 | # Parse posts and group by subreddit 140 | posts_by_subreddit = {} 141 | for submission in submissions: 142 | subreddit_name = submission.subreddit.display_name 143 | 144 | if subreddit_name not in posts_by_subreddit: 145 | posts_by_subreddit[subreddit_name] = [] 146 | 147 | # Only add up to limit_per_subreddit posts per subreddit 148 | if len(posts_by_subreddit[subreddit_name]) < limit_per_subreddit: 149 | posts_by_subreddit[subreddit_name].append({...}) 150 | ``` 151 | 152 | #### Enhanced Implementation 153 | 154 | ```python 155 | # Parse posts and group by subreddit 156 | posts_by_subreddit = {} 157 | processed_subreddits = set() 158 | 159 | for i, submission in enumerate(submissions): 160 | subreddit_name = submission.subreddit.display_name 161 | 162 | # Report progress when encountering a new subreddit 163 | if subreddit_name not in processed_subreddits: 164 | processed_subreddits.add(subreddit_name) 165 | if ctx: 166 | await ctx.report_progress( 167 | progress=len(processed_subreddits), 168 | total=len(subreddit_names), 169 | message=f"Fetching r/{subreddit_name}" 170 | ) 171 | 172 | if subreddit_name not in posts_by_subreddit: 173 | posts_by_subreddit[subreddit_name] = [] 174 | 175 | # Only add up to limit_per_subreddit posts per subreddit 176 | if len(posts_by_subreddit[subreddit_name]) < limit_per_subreddit: 177 | posts_by_subreddit[subreddit_name].append({...}) 178 | ``` 179 | 180 | #### Changes Required 181 | 182 | 1. **Make function async**: Change `def fetch_multiple_subreddits(...)` → `async def fetch_multiple_subreddits(...)` 183 | 2. **Track processed subreddits**: Add `processed_subreddits = set()` before loop 184 | 3. 
**Add progress on new subreddit**: When a new subreddit is encountered, report progress 185 | 4. **Update server.py**: Add `await` when calling this function in `execute_operation()` 186 | 187 | **Progress Events:** 1-10 (one per unique subreddit found) 188 | 189 | --- 190 | 191 | ### Operation 3: fetch_submission_with_comments Progress 192 | 193 | **File:** `src/tools/comments.py` 194 | **Function:** `fetch_submission_with_comments()` (lines 47-147) 195 | **Location:** Comment parsing loop (lines 116-136) 196 | 197 | #### Current Code Pattern 198 | 199 | ```python 200 | # Parse comments 201 | comments = [] 202 | comment_count = 0 203 | 204 | for top_level_comment in submission.comments: 205 | if hasattr(top_level_comment, 'id') and hasattr(top_level_comment, 'body'): 206 | if comment_count >= comment_limit: 207 | break 208 | if isinstance(top_level_comment, PrawComment): 209 | comments.append(parse_comment_tree(top_level_comment, ctx=ctx)) 210 | else: 211 | # Handle mock objects in tests 212 | comments.append(Comment(...)) 213 | # Count all comments including replies 214 | comment_count += 1 + count_replies(comments[-1]) 215 | ``` 216 | 217 | #### Enhanced Implementation 218 | 219 | ```python 220 | # Parse comments 221 | comments = [] 222 | comment_count = 0 223 | 224 | for top_level_comment in submission.comments: 225 | if hasattr(top_level_comment, 'id') and hasattr(top_level_comment, 'body'): 226 | if comment_count >= comment_limit: 227 | break 228 | 229 | # Report progress before processing comment 230 | if ctx: 231 | await ctx.report_progress( 232 | progress=comment_count, 233 | total=comment_limit, 234 | message=f"Loading comments ({comment_count}/{comment_limit})" 235 | ) 236 | 237 | if isinstance(top_level_comment, PrawComment): 238 | comments.append(parse_comment_tree(top_level_comment, ctx=ctx)) 239 | else: 240 | # Handle mock objects in tests 241 | comments.append(Comment(...)) 242 | # Count all comments including replies 243 | comment_count += 1 + count_replies(comments[-1]) 244 | 245 | # Report final completion 246 | if ctx: 247 | await ctx.report_progress( 248 | progress=comment_count, 249 | total=comment_limit, 250 | message=f"Completed: {comment_count} comments loaded" 251 | ) 252 | ``` 253 | 254 | #### Changes Required 255 | 256 | 1. **Make function async**: Change `def fetch_submission_with_comments(...)` → `async def fetch_submission_with_comments(...)` 257 | 2. **Add progress in loop**: Insert `await ctx.report_progress(...)` before parsing each top-level comment 258 | 3. **Add completion progress**: Report final progress after loop completes 259 | 4. 
**Update server.py**: Add `await` when calling this function in `execute_operation()` 260 | 261 | **Progress Events:** ~5-100 (depending on comment_limit and tree depth) 262 | 263 | --- 264 | 265 | ## FastMCP Progress Patterns 266 | 267 | ### Basic Pattern (from FastMCP docs) 268 | 269 | ```python 270 | from fastmcp import FastMCP, Context 271 | 272 | @mcp.tool 273 | async def process_items(items: list[str], ctx: Context) -> dict: 274 | """Process a list of items with progress updates.""" 275 | total = len(items) 276 | results = [] 277 | 278 | for i, item in enumerate(items): 279 | # Report progress as we process each item 280 | await ctx.report_progress(progress=i, total=total) 281 | 282 | results.append(item.upper()) 283 | 284 | # Report 100% completion 285 | await ctx.report_progress(progress=total, total=total) 286 | 287 | return {"processed": len(results), "results": results} 288 | ``` 289 | 290 | ### Key Requirements 291 | 292 | 1. **Functions must be async** to use `await ctx.report_progress()` 293 | 2. **Progress parameter**: Current progress value (e.g., 5, 24, 0.75) 294 | 3. **Total parameter**: Optional total value (enables percentage calculation) 295 | 4. **Message parameter**: Optional descriptive message (not shown in examples above but supported) 296 | 297 | ### Best Practices 298 | 299 | - Report at regular intervals (every iteration for small loops) 300 | - Provide descriptive messages when possible 301 | - Report final completion (100%) 302 | - Don't spam - limit to reasonable frequency (5-10 events minimum) 303 | 304 | ## Testing Requirements 305 | 306 | ### Update Existing Tests 307 | 308 | **File:** `tests/test_context_integration.py` 309 | 310 | Add assertions to verify progress calls: 311 | 312 | ```python 313 | import pytest 314 | from unittest.mock import AsyncMock, MagicMock, patch 315 | 316 | class TestDiscoverSubredditsProgress: 317 | """Test progress reporting in discover_subreddits.""" 318 | 319 | @pytest.mark.asyncio 320 | async def test_reports_progress_during_search(self, mock_context): 321 | """Verify progress is reported during vector search.""" 322 | # Mock ChromaDB response with 3 results 323 | mock_collection = MagicMock() 324 | mock_collection.query.return_value = { 325 | 'metadatas': [[ 326 | {'name': 'Python', 'subscribers': 1000000, 'nsfw': False}, 327 | {'name': 'learnpython', 'subscribers': 500000, 'nsfw': False}, 328 | {'name': 'pythontips', 'subscribers': 100000, 'nsfw': False} 329 | ]], 330 | 'distances': [[0.5, 0.7, 0.9]] 331 | } 332 | 333 | # Setup async mock for progress 334 | mock_context.report_progress = AsyncMock() 335 | 336 | with patch('src.tools.discover.get_chroma_client'), \ 337 | patch('src.tools.discover.get_collection', return_value=mock_collection): 338 | 339 | result = await discover_subreddits(query="python", ctx=mock_context) 340 | 341 | # Verify progress was reported at least 3 times (once per result) 342 | assert mock_context.report_progress.call_count >= 3 343 | 344 | # Verify progress parameters 345 | first_call = mock_context.report_progress.call_args_list[0] 346 | assert 'progress' in first_call[1] or len(first_call[0]) >= 1 347 | assert 'total' in first_call[1] or len(first_call[0]) >= 2 348 | ``` 349 | 350 | ### New Test Coverage 351 | 352 | Add similar tests for: 353 | - `test_fetch_multiple_subreddits_progress` - Verify progress per subreddit 354 | - `test_fetch_comments_progress` - Verify progress during comment loading 355 | 356 | ### Success Criteria 357 | 358 | - ✅ All existing 15 tests still pass 359 | - ✅ New 
progress assertion tests pass 360 | - ✅ Progress called at least 5 times per operation (varies by data) 361 | - ✅ No performance degradation (progress overhead <5%) 362 | 363 | ## Server.py Updates 364 | 365 | **File:** `src/server.py` 366 | **Functions:** Update calls to async operations 367 | 368 | ### Current Pattern 369 | 370 | ```python 371 | @mcp.tool 372 | def execute_operation( 373 | operation_id: str, 374 | parameters: dict, 375 | ctx: Context 376 | ) -> dict: 377 | """Execute a Reddit operation by ID.""" 378 | 379 | if operation_id == "discover_subreddits": 380 | return discover_subreddits(**params) 381 | ``` 382 | 383 | ### Updated Pattern 384 | 385 | ```python 386 | @mcp.tool 387 | async def execute_operation( 388 | operation_id: str, 389 | parameters: dict, 390 | ctx: Context 391 | ) -> dict: 392 | """Execute a Reddit operation by ID.""" 393 | 394 | if operation_id == "discover_subreddits": 395 | return await discover_subreddits(**params) 396 | ``` 397 | 398 | ### Changes Required 399 | 400 | 1. **Make execute_operation async**: `async def execute_operation(...)` 401 | 2. **Add await to async operations**: 402 | - `await discover_subreddits(**params)` 403 | - `await fetch_multiple_subreddits(**params)` 404 | - `await fetch_submission_with_comments(**params)` 405 | 406 | ## Implementation Checklist 407 | 408 | ### Code Changes 409 | 410 | - [ ] **src/tools/discover.py** 411 | - [ ] Make `discover_subreddits()` async 412 | - [ ] Make `_search_vector_db()` async 413 | - [ ] Add `await` to `_search_vector_db()` call 414 | - [ ] Add progress reporting in result processing loop 415 | - [ ] Calculate total before loop starts 416 | 417 | - [ ] **src/tools/posts.py** 418 | - [ ] Make `fetch_multiple_subreddits()` async 419 | - [ ] Add `processed_subreddits` tracking set 420 | - [ ] Add progress reporting when new subreddit encountered 421 | 422 | - [ ] **src/tools/comments.py** 423 | - [ ] Make `fetch_submission_with_comments()` async 424 | - [ ] Add progress reporting in comment parsing loop 425 | - [ ] Add final completion progress report 426 | 427 | - [ ] **src/server.py** 428 | - [ ] Make `execute_operation()` async 429 | - [ ] Add `await` to `discover_subreddits()` call 430 | - [ ] Add `await` to `fetch_multiple_subreddits()` call 431 | - [ ] Add `await` to `fetch_submission_with_comments()` call 432 | 433 | ### Testing 434 | 435 | - [ ] Update `tests/test_context_integration.py` 436 | - [ ] Add progress test for `discover_subreddits` 437 | - [ ] Add progress test for `fetch_multiple_subreddits` 438 | - [ ] Add progress test for `fetch_submission_with_comments` 439 | 440 | - [ ] Run full test suite: `pytest tests/` 441 | - [ ] All 15 existing tests pass 442 | - [ ] New progress tests pass 443 | - [ ] No regressions 444 | 445 | ### Validation 446 | 447 | - [ ] Manual testing with MCP Inspector or Claude Desktop 448 | - [ ] Verify progress events appear in client logs 449 | - [ ] Confirm no performance degradation 450 | - [ ] Check that messages are descriptive and useful 451 | 452 | ## File Summary 453 | 454 | ### Files to Modify (4 files) 455 | 456 | 1. `src/tools/discover.py` - Add progress to vector search 457 | 2. `src/tools/posts.py` - Add progress to batch fetches 458 | 3. `src/tools/comments.py` - Add progress to comment loading 459 | 4. `src/server.py` - Make execute_operation async + await calls 460 | 461 | ### Files to Update (1 file) 462 | 463 | 1. 
`tests/test_context_integration.py` - Add progress assertions 464 | 465 | ### Files Not Modified 466 | 467 | - `src/config.py` - No changes needed 468 | - `src/models.py` - No changes needed 469 | - `src/chroma_client.py` - No changes needed 470 | - `src/resources.py` - No changes needed 471 | - `tests/test_tools.py` - No changes needed (already passing) 472 | 473 | ## Success Criteria 474 | 475 | ### Functional Requirements 476 | 477 | - ✅ Progress events emitted during vector search (≥5 per search) 478 | - ✅ Progress events emitted during multi-subreddit fetch (1 per subreddit) 479 | - ✅ Progress events emitted during comment loading (≥5 per fetch) 480 | - ✅ Progress includes total when known 481 | - ✅ Progress messages are descriptive 482 | 483 | ### Technical Requirements 484 | 485 | - ✅ All functions properly async/await 486 | - ✅ All 15+ tests pass 487 | - ✅ No breaking changes to API 488 | - ✅ Type hints maintained 489 | - ✅ No performance degradation 490 | 491 | ### Quality Requirements 492 | 493 | - ✅ Progress messages are user-friendly 494 | - ✅ Progress updates at reasonable frequency (not spammy) 495 | - ✅ Code follows FastMCP patterns from official docs 496 | - ✅ Maintains consistency with Phase 1 implementation 497 | 498 | ## Estimated Effort 499 | 500 | **Total Time:** 1-2 days 501 | 502 | **Breakdown:** 503 | - Code implementation: 3-4 hours 504 | - Testing updates: 2-3 hours 505 | - Manual validation: 1 hour 506 | - Bug fixes & refinement: 1-2 hours 507 | 508 | **Reduced from master spec (3-4 days)** because: 509 | - Phase 1 foundation complete (Context integration done) 510 | - Clear patterns established in codebase 511 | - Limited scope (3 operations only) 512 | - Existing test infrastructure in place 513 | 514 | ## Next Steps 515 | 516 | After Phase 2 completion: 517 | - **Phase 3**: Structured Logging (2-3 days) 518 | - **Phase 4**: Enhanced Error Handling (2 days) 519 | - **Phase 5**: Testing & Validation (1 day) 520 | 521 | ## References 522 | 523 | - [FastMCP Progress Documentation](../ai-docs/fastmcp/docs/servers/progress.mdx) 524 | - [FastMCP Context API](../ai-docs/fastmcp/docs/servers/context.mdx) 525 | - [Phase 1 Completion Summary](./003-phase-1-context-integration.md) *(if created)* 526 | - [Master Specification](./003-fastmcp-context-integration.md) 527 | - Current Implementation: `src/server.py`, `src/tools/*.py` 528 | ``` -------------------------------------------------------------------------------- /specs/003-phase-1-context-integration.md: -------------------------------------------------------------------------------- ```markdown 1 | # Phase 1: Context Integration - Detailed Specification 2 | 3 | **Status:** Ready for Implementation 4 | **Created:** 2025-10-02 5 | **Phase Duration:** Days 1-2 6 | **Owner:** Engineering Team 7 | **Parent Spec:** [003-fastmcp-context-integration.md](./003-fastmcp-context-integration.md) 8 | 9 | ## Objective 10 | 11 | Enable all tool functions in the Reddit MCP server to receive and utilize FastMCP's Context API. This phase establishes the foundation for progress monitoring, structured logging, and enhanced error handling in subsequent phases. 12 | 13 | ## Background 14 | 15 | FastMCP's Context API is automatically injected into tool functions decorated with `@mcp.tool`. 
The context object provides methods for: 16 | - Progress reporting: `ctx.report_progress(current, total, message)` 17 | - Structured logging: `ctx.info()`, `ctx.warning()`, `ctx.error()`, `ctx.debug()` 18 | - Error context: Rich error information via structured logging 19 | 20 | To use these features, all tool functions must accept a `Context` parameter. This phase focuses solely on adding the context parameter to function signatures—no actual usage of context methods yet. 21 | 22 | ## Goals 23 | 24 | 1. **Add Context Parameter**: Update all tool function signatures to accept `ctx: Context` 25 | 2. **Maintain Type Safety**: Preserve all type hints and ensure type checking passes 26 | 3. **Verify Auto-Injection**: Confirm FastMCP's decorator system injects context correctly 27 | 4. **Test Compatibility**: Ensure all existing tests pass with updated signatures 28 | 29 | ## Non-Goals 30 | 31 | - Using context methods (progress, logging, error handling) - Phase 2+ 32 | - Adding new tool functions or operations 33 | - Modifying MCP protocol or client interfaces 34 | - Performance optimization or refactoring 35 | 36 | ## Implementation Details 37 | 38 | ### Context Parameter Pattern 39 | 40 | FastMCP automatically injects `Context` when tools are decorated with `@mcp.tool`: 41 | 42 | ```python 43 | from fastmcp import Context 44 | 45 | @mcp.tool 46 | def my_tool(param: str, ctx: Context) -> dict: 47 | # Context is automatically injected by FastMCP 48 | # No usage required in Phase 1 - just accept the parameter 49 | return {"result": "data"} 50 | ``` 51 | 52 | **Important Notes:** 53 | - Context is a **required** parameter (not optional) 54 | - Position in signature: Place after all other parameters 55 | - Type hint must be `Context` (imported from `fastmcp`) 56 | - No default value needed - FastMCP injects automatically 57 | 58 | ### Files to Modify 59 | 60 | #### 1. `src/tools/discover.py` 61 | 62 | **Functions to update:** 63 | - `discover_subreddits(query: str, limit: int = 10) -> dict` 64 | - `get_subreddit_info(subreddit_name: str) -> dict` 65 | 66 | **Before:** 67 | ```python 68 | def discover_subreddits(query: str, limit: int = 10) -> dict: 69 | """Search vector database for relevant subreddits.""" 70 | results = search_vector_db(query, limit) 71 | return { 72 | "subreddits": [format_subreddit(r) for r in results], 73 | "count": len(results) 74 | } 75 | ``` 76 | 77 | **After:** 78 | ```python 79 | from fastmcp import Context 80 | 81 | def discover_subreddits( 82 | query: str, 83 | limit: int = 10, 84 | ctx: Context 85 | ) -> dict: 86 | """Search vector database for relevant subreddits.""" 87 | # Phase 1: Accept context but don't use it yet 88 | results = search_vector_db(query, limit) 89 | return { 90 | "subreddits": [format_subreddit(r) for r in results], 91 | "count": len(results) 92 | } 93 | ``` 94 | 95 | **Estimated Time:** 30 minutes 96 | 97 | --- 98 | 99 | #### 2. 
`src/tools/posts.py` 100 | 101 | **Functions to update:** 102 | - `fetch_subreddit_posts(subreddit_name: str, limit: int = 10, time_filter: str = "all", sort: str = "hot") -> dict` 103 | - `fetch_multiple_subreddits(subreddit_names: list[str], limit_per_subreddit: int = 10) -> dict` 104 | - `get_post_details(post_id: str) -> dict` 105 | 106 | **Before:** 107 | ```python 108 | def fetch_subreddit_posts( 109 | subreddit_name: str, 110 | limit: int = 10, 111 | time_filter: str = "all", 112 | sort: str = "hot" 113 | ) -> dict: 114 | """Fetch posts from a subreddit.""" 115 | subreddit = reddit.subreddit(subreddit_name) 116 | posts = list(subreddit.hot(limit=limit)) 117 | return {"posts": [format_post(p) for p in posts]} 118 | ``` 119 | 120 | **After:** 121 | ```python 122 | from fastmcp import Context 123 | 124 | def fetch_subreddit_posts( 125 | subreddit_name: str, 126 | limit: int = 10, 127 | time_filter: str = "all", 128 | sort: str = "hot", 129 | ctx: Context 130 | ) -> dict: 131 | """Fetch posts from a subreddit.""" 132 | # Phase 1: Accept context but don't use it yet 133 | subreddit = reddit.subreddit(subreddit_name) 134 | posts = list(subreddit.hot(limit=limit)) 135 | return {"posts": [format_post(p) for p in posts]} 136 | ``` 137 | 138 | **Estimated Time:** 45 minutes 139 | 140 | --- 141 | 142 | #### 3. `src/tools/comments.py` 143 | 144 | **Functions to update:** 145 | - `fetch_submission_with_comments(submission_id: str, comment_limit: int = 50, comment_sort: str = "best") -> dict` 146 | - `get_comment_thread(comment_id: str, depth: int = 5) -> dict` 147 | 148 | **Before:** 149 | ```python 150 | def fetch_submission_with_comments( 151 | submission_id: str, 152 | comment_limit: int = 50, 153 | comment_sort: str = "best" 154 | ) -> dict: 155 | """Fetch submission and its comments.""" 156 | submission = reddit.submission(id=submission_id) 157 | comments = fetch_comments(submission, comment_limit, comment_sort) 158 | return { 159 | "submission": format_submission(submission), 160 | "comments": comments 161 | } 162 | ``` 163 | 164 | **After:** 165 | ```python 166 | from fastmcp import Context 167 | 168 | def fetch_submission_with_comments( 169 | submission_id: str, 170 | comment_limit: int = 50, 171 | comment_sort: str = "best", 172 | ctx: Context 173 | ) -> dict: 174 | """Fetch submission and its comments.""" 175 | # Phase 1: Accept context but don't use it yet 176 | submission = reddit.submission(id=submission_id) 177 | comments = fetch_comments(submission, comment_limit, comment_sort) 178 | return { 179 | "submission": format_submission(submission), 180 | "comments": comments 181 | } 182 | ``` 183 | 184 | **Estimated Time:** 30 minutes 185 | 186 | --- 187 | 188 | #### 4. 
`src/tools/search.py` 189 | 190 | **Functions to update:** 191 | - `search_subreddit(subreddit_name: str, query: str, limit: int = 10, time_filter: str = "all", sort: str = "relevance") -> dict` 192 | 193 | **Before:** 194 | ```python 195 | def search_subreddit( 196 | subreddit_name: str, 197 | query: str, 198 | limit: int = 10, 199 | time_filter: str = "all", 200 | sort: str = "relevance" 201 | ) -> dict: 202 | """Search within a specific subreddit.""" 203 | subreddit = reddit.subreddit(subreddit_name) 204 | results = subreddit.search(query, limit=limit, time_filter=time_filter, sort=sort) 205 | return {"results": [format_post(r) for r in results]} 206 | ``` 207 | 208 | **After:** 209 | ```python 210 | from fastmcp import Context 211 | 212 | def search_subreddit( 213 | subreddit_name: str, 214 | query: str, 215 | limit: int = 10, 216 | time_filter: str = "all", 217 | sort: str = "relevance", 218 | ctx: Context 219 | ) -> dict: 220 | """Search within a specific subreddit.""" 221 | # Phase 1: Accept context but don't use it yet 222 | subreddit = reddit.subreddit(subreddit_name) 223 | results = subreddit.search(query, limit=limit, time_filter=time_filter, sort=sort) 224 | return {"results": [format_post(r) for r in results]} 225 | ``` 226 | 227 | **Estimated Time:** 20 minutes 228 | 229 | --- 230 | 231 | #### 5. `src/server.py` 232 | 233 | **Changes needed:** 234 | - Import Context from fastmcp 235 | - Verify execute_operation passes context to tools (FastMCP handles this automatically) 236 | - No signature changes needed for execute_operation itself 237 | 238 | **Before:** 239 | ```python 240 | # At top of file 241 | from fastmcp import FastMCP 242 | 243 | mcp = FastMCP("Reddit Research MCP") 244 | ``` 245 | 246 | **After:** 247 | ```python 248 | # At top of file 249 | from fastmcp import FastMCP, Context 250 | 251 | mcp = FastMCP("Reddit Research MCP") 252 | 253 | # No other changes needed - FastMCP auto-injects context 254 | ``` 255 | 256 | **Estimated Time:** 10 minutes 257 | 258 | --- 259 | 260 | ### Helper Functions 261 | 262 | **Internal helper functions** (not decorated with `@mcp.tool`) that need context should also accept it: 263 | 264 | ```python 265 | # Helper function called by tool 266 | def fetch_comments(submission, limit: int, sort: str, ctx: Context) -> list: 267 | """Internal helper for fetching comments.""" 268 | # Phase 1: Accept context but don't use it yet 269 | submission.comment_sort = sort 270 | submission.comments.replace_more(limit=0) 271 | return list(submission.comments.list()[:limit]) 272 | ``` 273 | 274 | **Functions to check:** 275 | - `src/tools/discover.py`: `search_vector_db()`, `format_subreddit()` 276 | - `src/tools/posts.py`: `format_post()` 277 | - `src/tools/comments.py`: `fetch_comments()`, `format_comment()` 278 | 279 | **Decision rule:** Only add context to helpers that will need it in Phase 2+ (for logging/progress). Review each helper and add context parameter if: 280 | 1. It performs I/O operations (API calls, database queries) 281 | 2. It contains loops that could benefit from progress reporting 282 | 3. 
It has error handling that would benefit from context logging 283 | 284 | **Estimated Time:** 30 minutes 285 | 286 | --- 287 | 288 | ## Testing Strategy 289 | 290 | ### Unit Tests 291 | 292 | Update existing tests in `tests/test_tools.py` to pass context: 293 | 294 | **Before:** 295 | ```python 296 | def test_discover_subreddits(): 297 | result = discover_subreddits("machine learning", limit=5) 298 | assert result["count"] == 5 299 | ``` 300 | 301 | **After:** 302 | ```python 303 | from unittest.mock import Mock 304 | from fastmcp import Context 305 | 306 | def test_discover_subreddits(): 307 | # Create mock context for testing 308 | mock_ctx = Mock(spec=Context) 309 | 310 | result = discover_subreddits("machine learning", limit=5, ctx=mock_ctx) 311 | assert result["count"] == 5 312 | ``` 313 | 314 | **Note:** FastMCP provides test utilities for creating context objects. Consult FastMCP testing documentation for best practices. 315 | 316 | ### Integration Tests 317 | 318 | **New test file:** `tests/test_context_integration.py` 319 | 320 | ```python 321 | import pytest 322 | from unittest.mock import Mock 323 | from fastmcp import Context 324 | 325 | from src.tools.discover import discover_subreddits 326 | from src.tools.posts import fetch_subreddit_posts 327 | from src.tools.comments import fetch_submission_with_comments 328 | from src.tools.search import search_subreddit 329 | 330 | @pytest.fixture 331 | def mock_context(): 332 | """Create a mock Context object for testing.""" 333 | return Mock(spec=Context) 334 | 335 | def test_discover_accepts_context(mock_context): 336 | """Verify discover_subreddits accepts context parameter.""" 337 | result = discover_subreddits("test query", limit=5, ctx=mock_context) 338 | assert "subreddits" in result 339 | 340 | def test_fetch_posts_accepts_context(mock_context): 341 | """Verify fetch_subreddit_posts accepts context parameter.""" 342 | result = fetch_subreddit_posts("python", limit=5, ctx=mock_context) 343 | assert "posts" in result 344 | 345 | def test_fetch_comments_accepts_context(mock_context): 346 | """Verify fetch_submission_with_comments accepts context parameter.""" 347 | result = fetch_submission_with_comments("test_id", comment_limit=10, ctx=mock_context) 348 | assert "submission" in result 349 | assert "comments" in result 350 | 351 | def test_search_accepts_context(mock_context): 352 | """Verify search_subreddit accepts context parameter.""" 353 | result = search_subreddit("python", "testing", limit=5, ctx=mock_context) 354 | assert "results" in result 355 | ``` 356 | 357 | **Estimated Time:** 1 hour 358 | 359 | --- 360 | 361 | ## Success Criteria 362 | 363 | ### Phase 1 Completion Checklist 364 | 365 | - [ ] All functions in `src/tools/discover.py` accept `ctx: Context` 366 | - [ ] All functions in `src/tools/posts.py` accept `ctx: Context` 367 | - [ ] All functions in `src/tools/comments.py` accept `ctx: Context` 368 | - [ ] All functions in `src/tools/search.py` accept `ctx: Context` 369 | - [ ] `src/server.py` imports Context from fastmcp 370 | - [ ] All relevant helper functions accept context parameter 371 | - [ ] All existing unit tests updated to pass context 372 | - [ ] New integration tests created in `tests/test_context_integration.py` 373 | - [ ] All tests pass: `pytest tests/` 374 | - [ ] Type checking passes: `mypy src/` 375 | - [ ] No regressions in existing functionality 376 | 377 | ### Validation Commands 378 | 379 | ```bash 380 | # Run all tests 381 | pytest tests/ -v 382 | 383 | # Type checking 384 | mypy src/ 385 | 
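# Run only the new context integration tests added in this phase
pytest tests/test_context_integration.py -v
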
386 | # Verify no breaking changes 387 | pytest tests/test_tools.py -v 388 | ``` 389 | 390 | --- 391 | 392 | ## Implementation Order 393 | 394 | 1. **Day 1 Morning (2 hours)** 395 | - Update `src/tools/discover.py` (30 min) 396 | - Update `src/tools/posts.py` (45 min) 397 | - Update `src/tools/comments.py` (30 min) 398 | - Update `src/tools/search.py` (20 min) 399 | 400 | 2. **Day 1 Afternoon (2 hours)** 401 | - Update `src/server.py` (10 min) 402 | - Review and update helper functions (30 min) 403 | - Update existing unit tests (1 hour) 404 | - Run full test suite and fix issues (20 min) 405 | 406 | 3. **Day 2 Morning (2 hours)** 407 | - Create `tests/test_context_integration.py` (1 hour) 408 | - Run all validation commands (30 min) 409 | - Code review and cleanup (30 min) 410 | 411 | 4. **Day 2 Afternoon (1 hour)** 412 | - Final testing and validation 413 | - Documentation updates (if needed) 414 | - Prepare for Phase 2 415 | 416 | **Total Estimated Time:** 7 hours over 2 days 417 | 418 | --- 419 | 420 | ## Dependencies 421 | 422 | ### Required Packages 423 | - `fastmcp>=2.0.0` (already installed) 424 | - `pytest>=7.0.0` (already installed for testing) 425 | - `mypy>=1.0.0` (recommended for type checking) 426 | 427 | ### External Dependencies 428 | - None - this phase only modifies function signatures 429 | 430 | ### Knowledge Prerequisites 431 | - FastMCP decorator system and auto-injection 432 | - Python type hints and type checking 433 | - Pytest fixture system for mocking 434 | 435 | --- 436 | 437 | ## Risks & Mitigations 438 | 439 | | Risk | Likelihood | Impact | Mitigation | 440 | |------|------------|--------|------------| 441 | | Breaking existing tests | Medium | High | Update tests incrementally, verify after each file | 442 | | Type checking errors | Low | Medium | Use `Mock(spec=Context)` for type-safe mocking | 443 | | FastMCP auto-injection not working | Low | High | Verify with simple test case first; consult docs | 444 | | Forgetting helper functions | Medium | Medium | Grep codebase for all function definitions, review systematically | 445 | 446 | --- 447 | 448 | ## Code Review Checklist 449 | 450 | Before marking Phase 1 complete, verify: 451 | 452 | - [ ] All tool functions have `ctx: Context` as last parameter 453 | - [ ] Type hints are correct: `ctx: Context` (not `ctx: Optional[Context]`) 454 | - [ ] Import statements include `from fastmcp import Context` 455 | - [ ] Helper functions that need context receive it 456 | - [ ] Test mocks use `Mock(spec=Context)` for type safety 457 | - [ ] No actual usage of context methods (that's Phase 2+) 458 | - [ ] All tests pass without errors or warnings 459 | - [ ] Type checking passes with mypy 460 | 461 | --- 462 | 463 | ## Next Steps 464 | 465 | Upon successful completion of Phase 1: 466 | 467 | 1. **Phase 2: Progress Monitoring** - Add `ctx.report_progress()` calls 468 | 2. **Phase 3: Structured Logging** - Add `ctx.info()`, `ctx.warning()`, `ctx.error()` 469 | 3. **Phase 4: Enhanced Error Handling** - Use context in error scenarios 470 | 4. 
**Phase 5: Testing & Validation** - Comprehensive integration testing 471 | 472 | --- 473 | 474 | ## References 475 | 476 | - [FastMCP Context API Documentation](../ai-docs/fastmcp/docs/python-sdk/fastmcp-server-context.mdx) 477 | - [FastMCP Tool Decorator Pattern](../ai-docs/fastmcp/docs/python-sdk/fastmcp-server-tool.mdx) 478 | - [Parent Specification](./003-fastmcp-context-integration.md) 479 | - Current Implementation: `src/server.py` 480 | 481 | --- 482 | 483 | ## Appendix: Complete Example 484 | 485 | **Full example showing before/after for a complete tool function:** 486 | 487 | **Before (existing code):** 488 | ```python 489 | # src/tools/posts.py 490 | from src.reddit_client import reddit 491 | 492 | def fetch_subreddit_posts( 493 | subreddit_name: str, 494 | limit: int = 10, 495 | time_filter: str = "all", 496 | sort: str = "hot" 497 | ) -> dict: 498 | """ 499 | Fetch posts from a subreddit. 500 | 501 | Args: 502 | subreddit_name: Name of the subreddit 503 | limit: Number of posts to fetch 504 | time_filter: Time filter (all, day, week, month, year) 505 | sort: Sort method (hot, new, top, rising) 506 | 507 | Returns: 508 | Dictionary with posts and metadata 509 | """ 510 | try: 511 | subreddit = reddit.subreddit(subreddit_name) 512 | 513 | # Get posts based on sort method 514 | if sort == "hot": 515 | posts = list(subreddit.hot(limit=limit)) 516 | elif sort == "new": 517 | posts = list(subreddit.new(limit=limit)) 518 | elif sort == "top": 519 | posts = list(subreddit.top(time_filter=time_filter, limit=limit)) 520 | elif sort == "rising": 521 | posts = list(subreddit.rising(limit=limit)) 522 | else: 523 | raise ValueError(f"Invalid sort method: {sort}") 524 | 525 | return { 526 | "success": True, 527 | "subreddit": subreddit_name, 528 | "posts": [format_post(p) for p in posts], 529 | "count": len(posts) 530 | } 531 | 532 | except Exception as e: 533 | return { 534 | "success": False, 535 | "error": str(e), 536 | "subreddit": subreddit_name 537 | } 538 | ``` 539 | 540 | **After (Phase 1 changes):** 541 | ```python 542 | # src/tools/posts.py 543 | from fastmcp import Context 544 | from src.reddit_client import reddit 545 | 546 | def fetch_subreddit_posts( 547 | subreddit_name: str, 548 | limit: int = 10, 549 | time_filter: str = "all", 550 | sort: str = "hot", 551 | ctx: Context # ← ONLY CHANGE IN PHASE 1 552 | ) -> dict: 553 | """ 554 | Fetch posts from a subreddit. 555 | 556 | Args: 557 | subreddit_name: Name of the subreddit 558 | limit: Number of posts to fetch 559 | time_filter: Time filter (all, day, week, month, year) 560 | sort: Sort method (hot, new, top, rising) 561 | ctx: FastMCP context (auto-injected) 562 | 563 | Returns: 564 | Dictionary with posts and metadata 565 | """ 566 | # Phase 1: Context accepted but not used yet 567 | # Phase 2+ will add: ctx.report_progress(), ctx.info(), etc. 
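    # Illustrative only: once this function is made async in Phase 2, those calls
    # might look like the following (not part of the Phase 1 change):
    #   await ctx.report_progress(progress=0, total=limit, message=f"Fetching r/{subreddit_name}")
    #   await ctx.info(f"Fetching {limit} '{sort}' posts from r/{subreddit_name}")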
568 | 569 | try: 570 | subreddit = reddit.subreddit(subreddit_name) 571 | 572 | # Get posts based on sort method 573 | if sort == "hot": 574 | posts = list(subreddit.hot(limit=limit)) 575 | elif sort == "new": 576 | posts = list(subreddit.new(limit=limit)) 577 | elif sort == "top": 578 | posts = list(subreddit.top(time_filter=time_filter, limit=limit)) 579 | elif sort == "rising": 580 | posts = list(subreddit.rising(limit=limit)) 581 | else: 582 | raise ValueError(f"Invalid sort method: {sort}") 583 | 584 | return { 585 | "success": True, 586 | "subreddit": subreddit_name, 587 | "posts": [format_post(p) for p in posts], 588 | "count": len(posts) 589 | } 590 | 591 | except Exception as e: 592 | return { 593 | "success": False, 594 | "error": str(e), 595 | "subreddit": subreddit_name 596 | } 597 | ``` 598 | 599 | **Key observations:** 600 | 1. Only the function signature changed 601 | 2. Type hint added to docstring 602 | 3. No logic changes - context not used yet 603 | 4. Comment indicates Phase 1 status 604 | ``` -------------------------------------------------------------------------------- /reports/ai-llm-weekly-trends-reddit-analysis-2025-01-20.md: -------------------------------------------------------------------------------- ```markdown 1 | # AI & LLM Trends on Reddit: Weekly Analysis (January 13-20, 2025) 2 | 3 | ## Summary 4 | 5 | The AI community on Reddit experienced a watershed week marked by OpenAI's release of GPT-5-Codex, explosive growth in hardware hacking for local AI, and an intensifying rivalry between AI companies reflected in both technical achievements and marketing strategies. The conversation revealed a striking shift: while early AI adoption was dominated by technical users focused on coding applications, the technology has now reached mainstream adoption with women comprising 52% of users and only 4% of conversations involving programming tasks. This democratization coincides with growing frustration about incremental improvements among power users, who are increasingly turning to extreme measures—including flying to Shenzhen to purchase modded GPUs with expanded VRAM—to run local models. The week also highlighted a fundamental tension between corporate AI advancement and open-source alternatives, with Chinese companies releasing competitive models while simultaneously being banned from purchasing NVIDIA chips, creating a complex geopolitical landscape around AI development. 6 | 7 | ## The Conversation Landscape 8 | 9 | The AI discussion on Reddit spans from hardcore technical implementation in r/LocalLLaMA where users share stories of building custom GPU rigs and flying to China for hardware, to mainstream adoption conversations in r/ChatGPT dominated by memes and practical use cases, with r/singularity serving as the philosophical battleground for debates about AGI timelines and societal impact. The gender flip in AI usage—from 80% male to 52% female users—has fundamentally changed the tone of discussions, moving from technical specifications to practical applications and creative uses. 
10 | 11 | Key communities analyzed: 12 | - **r/ChatGPT** (11M subscribers): Mainstream user experiences, memes, and practical applications 13 | - **r/LocalLLaMA** (522K subscribers): Hardware hacking, open-source models, and technical deep dives 14 | - **r/singularity** (3.7M subscribers): AGI speculation, industry developments, and philosophical implications 15 | - **r/OpenAI** (2.4M subscribers): Company-specific news, model releases, and corporate drama 16 | - **r/ClaudeAI** (311K subscribers): Anthropic's community focused on Claude's capabilities and comparisons 17 | - **r/AI_Agents** (191K subscribers): Agent development, practical implementations, and ROI discussions 18 | - **r/ChatGPTPro** (486K subscribers): Power user strategies and professional applications 19 | 20 | ## Major Themes 21 | 22 | ### Theme 1: The GPT-5-Codex Revolution and the "Post-Programming" Era 23 | 24 | OpenAI's release of GPT-5-Codex dominated technical discussions across multiple subreddits, with performance improvements showing a jump from 33.9% to 51.3% accuracy on refactoring tasks ([r/singularity](https://reddit.com/r/singularity/comments/1nhrsh6/openai_releases_gpt5codex/), [r/OpenAI](https://reddit.com/r/OpenAI/comments/1nhuoxw/sam_altman_just_announced_gpt5_codex_better_at/)). The model's ability to work autonomously for over 7 hours represents a fundamental shift in how coding is approached ([r/singularity](https://reddit.com/r/singularity/comments/1nhtt6t/gpt5_codex_can_work_for_more_than_7_hours/)). Reports suggest the model solved all 12 problems at the ICPC 2025 Programming Contest, achieving what many consider superhuman performance in competitive programming ([r/singularity](https://reddit.com/r/singularity/comments/1njjr6k/openai_reasoning_model_solved_all_12_problems_at/)). 25 | 26 | The human impact is visceral and immediate. One OpenAI insider revealed: "we don't program anymore we just yell at codex agents" ([r/singularity](https://reddit.com/r/singularity/comments/1nidcr3/apparently_at_openai_insiders_have_graduated_from/)), while another developer celebrated earning "$2,200 in the last 3 weeks" after never coding before ChatGPT. Yet frustration bubbles beneath the surface—a developer testing the new model complained: "it's basically refusing to code and doing the bare minimum possible when pushed" ([r/singularity](https://reddit.com/r/singularity/comments/1nhrsh6/openai_releases_gpt5codex/)), highlighting the gap between marketing promises and real-world performance. 27 | 28 | The divide between communities reveals deeper truths about AI's coding impact. While r/singularity celebrates the dawn of autonomous programming with claims that "the takeoff looks the most rapid," r/LocalLLaMA users remain skeptical, noting that "ChatGPT sucks at coding" compared to specialized tools. Meanwhile, r/ChatGPTPro provides crucial context: despite only 4.2% of ChatGPT conversations being about programming, this represents 29+ million users—roughly matching the entire global population of professional programmers ([r/ChatGPTPro](https://reddit.com/r/ChatGPTPro/comments/1nj5lj5/openai_just_dropped_their_biggest_study_ever_on/)). The low percentage paradoxically proves AI's coding dominance: professionals have moved beyond ChatGPT's interface to integrated tools like Cursor and Claude Code, making the web statistics misleading. 
29 | 30 | ### Theme 2: The Hardware Underground and the Cyberpunk Reality of Local AI 31 | 32 | The story of a user flying to Shenzhen to purchase a modded 4090 with 48GB VRAM for CNY 22,900 cash captured the community's imagination, generating over 1,700 upvotes and sparking discussions about the lengths enthusiasts will go for local AI capabilities ([r/LocalLLaMA](https://reddit.com/r/LocalLLaMA/comments/1nifajh/i_bought_a_modded_4090_48gb_in_shenzhen_this_is/)). This narrative perfectly encapsulates the current state of local AI: a cyberpunk reality where users navigate Chinese electronics markets, handle stacks of cash, and risk customs violations to escape corporate AI limitations. The seller's claim that modded 5090s with 96GB VRAM are in development shows this underground market is expanding rapidly. 33 | 34 | The desperation for hardware reflects genuine technical needs. One user showcased their "4x 3090 local ai workstation" ([r/LocalLLaMA](https://reddit.com/r/LocalLLaMA/comments/1ng0nia/4x_3090_local_ai_workstation/)), while another celebrated completing an "8xAMD MI50 - 256GB VRAM + 256GB RAM rig for $3k" ([r/LocalLLaMA](https://reddit.com/r/LocalLLaMA/comments/1nhd5ks/completed_8xamd_mi50_256gb_vram_256gb_ram_rig_for/)). The community's reaction was telling: "people flying to Asia to buy modded computer parts in cash to run their local AI, that's the cyberpunk future I asked for" received 542 upvotes. Yet skepticism emerged—multiple users suspected the Shenzhen story was marketing propaganda, noting the OP never provided benchmarks despite numerous requests. 35 | 36 | The geopolitical dimension adds complexity. China's reported ban on its tech companies acquiring NVIDIA chips while claiming domestic processors match the H20 sparked heated debate ([r/LocalLLaMA](https://reddit.com/r/LocalLLaMA/comments/1njgicz/china_bans_its_biggest_tech_companies_from/)). This creates a paradox: Chinese companies are releasing competitive open-source models like DeepSeek V3.1 and Tongyi DeepResearch while simultaneously being cut off from the hardware that powers them. The underground GPU market represents a physical manifestation of these tensions, with modded American hardware flowing back to users desperate to run Chinese AI models locally. 37 | 38 | ### Theme 3: The Mainstream Adoption Paradox and the Death of "AI Panic" 39 | 40 | OpenAI's massive study of 700 million users revealed surprising patterns that challenge common narratives about AI adoption ([r/ChatGPTPro](https://reddit.com/r/ChatGPTPro/comments/1nj5lj5/openai_just_dropped_their_biggest_study_ever_on/), [r/OpenAI](https://reddit.com/r/OpenAI/comments/1niaw9p/new_openai_study_reveals_how_700_million_people/)). Only 30% of conversations are work-related, with the majority using AI for "random everyday stuff"—seeking information (24%), writing help (24%), and practical guidance (28%). The gender reversal from 80% male to 52% female users represents not just demographic shift but a fundamental change in how AI is perceived and utilized. 41 | 42 | The community's reaction reveals competing anxieties. One r/ChatGPTPro user dismissed concerns: "So much for the 'AI will replace all jobs' panic," while another countered that the statistics are misleading since "ChatGPT is used a lot for personal conversations doesn't prove that 'AI' can't replace many jobs." The frustration from early adopters is palpable—"when are we going to get a BIG jump? Like a HUGE jump. Like +20%. 
It's been like a year" ([r/singularity](https://reddit.com/r/singularity/comments/1nhrsh6/openai_releases_gpt5codex/))—reflecting disappointment that exponential progress has given way to incremental improvements. 43 | 44 | Different communities process this mainstream adoption differently. r/ChatGPT celebrates with memes about "Every single chat" starting with apologies and disclaimers (10,405 upvotes), while r/singularity worries about stagnation. r/ClaudeAI users position themselves as the sophisticated alternative: "Claude has always stayed in its lane and has been consistently useful... ChatGPT is getting a reputation as the loser's AI companion" ([r/singularity](https://reddit.com/r/singularity/comments/1nkcecf/anthropic_just_dropped_a_new_ad_for_claude_keep/)). The growth in developing countries—4x faster than rich nations—suggests AI's next billion users will have fundamentally different needs and expectations than Silicon Valley early adopters anticipated. 45 | 46 | ### Theme 4: The Corporate AI Wars and the Marketing of Intelligence 47 | 48 | The week witnessed intensifying competition between AI companies playing out through product releases, marketing campaigns, and community loyalty battles. Anthropic's new "Keep thinking" ad campaign, featuring MF DOOM's "All Caps," represents a sophisticated attempt to position Claude as the thinking person's AI ([r/singularity](https://reddit.com/r/singularity/comments/1nkcecf/anthropic_just_dropped_a_new_ad_for_claude_keep/), [r/ClaudeAI](https://reddit.com/r/ClaudeAI/comments/1nkcpwg/anthropic_just_dropped_a_cool_new_ad_for_claude/)). The aesthetic choice—"blending the familiar with the unfamiliar"—struck a nerve, with users praising it as "black mirror but warmer" while others called out the "sluuuuuurp" of brand loyalty. 49 | 50 | Meta's failed live demo ("Meta's AI Live Demo Flopped" - 14,196 upvotes) and Gemini's bizarre meltdown after failing to produce a seahorse emoji (17,690 upvotes) provided fodder for community mockery ([r/ChatGPT](https://reddit.com/r/ChatGPT/comments/1nk8zmq/metas_ai_live_demo_flopped/), [r/ChatGPT](https://reddit.com/r/ChatGPT/comments/1ngoref/gemini_loses_its_mind_after_failing_to_produce_a/)). Users noted Gemini's tendency toward self-deprecation: "When it fails at some prompts it'll act like it's unworthy of living," with one user observing they "stared at the screen for a few mins the first time it happened." Meanwhile, Elon Musk's public attempts to manipulate Grok's political views that repeatedly failed (57,855 upvotes) highlighted the gap between corporate control fantasies and AI reality ([r/ChatGPT](https://reddit.com/r/ChatGPT/comments/1nhg1lv/elon_continues_to_openly_try_and_fail_to/)). 51 | 52 | The community-level analysis reveals tribal dynamics. r/ClaudeAI users exhibit superiority: "Nobody trusts Meta's AI (which is also pretty useless), ChatGPT is getting a reputation as the loser's AI companion," while r/OpenAI maintains optimism about continued dominance. r/LocalLLaMA remains above the fray, focused on technical specifications rather than brand loyalty. The week's developments suggest these corporate battles matter less than underlying technical progress—users increasingly mix and match tools based on specific strengths rather than platform allegiance. 
53 | 54 | ### Theme 5: The Agent Revolution and the Gap Between Promise and Production 55 | 56 | AI agents dominated r/AI_Agents discussions, but with a notably practical bent focused on real-world implementation challenges rather than theoretical potential ([r/AI_Agents](https://reddit.com/r/AI_Agents/comments/1nkx0bz/everyones_trying_vectors_and_graphs_for_ai_memory/), [r/AI_Agents](https://reddit.com/r/AI_Agents/comments/1nj7szn/how_are_you_building_ai_agents_that_actually/)). The headline "Everyone's trying vectors and graphs for AI memory. We went back to SQL" (148 upvotes) perfectly captures the community's shift from hype to pragmatism. Success stories like "How a $2000 AI voice agent automation turned a struggling eye clinic into a $15k/month lead conversion machine" (122 upvotes) compete with reality checks: "Your AI agent probably can't handle two users at once" ([r/AI_Agents](https://reddit.com/r/AI_Agents/comments/1nkkjuj/how_a_2000_ai_voice_agent_automation_turned_a/), [r/AI_Agents](https://reddit.com/r/AI_Agents/comments/1nir326/your_ai_agent_probably_cant_handle_two_users_at/)). 57 | 58 | The framework debate reveals deep divisions about agent architecture. When asked "Which AI agent framework do you find most practical for real projects?" responses ranged from established solutions to "I built my own because everything else sucks" ([r/AI_Agents](https://reddit.com/r/AI_Agents/comments/1nfz717/which_ai_agent_framework_do_you_find_most/)). The community's focus on scraping ("What's the most reliable way you've found to scrape sites that don't have clean APIs?" - 57 upvotes) and micro-tools ("are micro-tools like this the missing pieces for future ai agents?") suggests current agent development is more about duct-taping APIs together than autonomous reasoning ([r/AI_Agents](https://reddit.com/r/AI_Agents/comments/1nkdlc8/whats_the_most_reliable_way_youve_found_to_scrape/), [r/AI_Agents](https://reddit.com/r/AI_Agents/comments/1njaf3o/are_microtools_like_this_the_missing_pieces_for/)). 59 | 60 | The distinction between chatbots and agents remains contentious: "Chatbots Reply, Agents Achieve Goals — What's the Real Line Between Them?" generated substantive discussion about whether current "agents" are merely chatbots with API access ([r/AI_Agents](https://reddit.com/r/AI_Agents/comments/1nfzf1n/chatbots_reply_agents_achieve_goals_whats_the/)). OpenAI's claim about "Reliable Long Horizon Agents by 2026" was met with skepticism in r/singularity, where users questioned whether true agency is possible without embodiment or real-world consequences. The gap between Silicon Valley promises and developer realities suggests the agent revolution will be evolutionary rather than revolutionary. 61 | 62 | ## Divergent Perspectives 63 | 64 | The week revealed fundamental divides in how different communities perceive AI progress. **Technical vs Mainstream users** represent the starkest contrast: while r/LocalLLaMA obsesses over VRAM requirements and inference speeds, r/ChatGPT shares memes about AI therapy sessions. The technical community's frustration with incremental improvements ("Groan when are we going to get a BIG jump?") contrasts sharply with mainstream users' delight at basic functionality. 65 | 66 | **Open Source vs Corporate AI** tensions intensified with Chinese companies releasing competitive models while being banned from hardware purchases. 
r/LocalLLaMA celebrates every open-source release as liberation from corporate control, while r/OpenAI and r/ClaudeAI users defend their platforms' superiority. The irony of users flying to China to buy modded American GPUs to run Chinese AI models epitomizes these contradictions. 67 | 68 | **Builders vs Philosophers** split r/singularity down the middle, with half celebrating each breakthrough as steps toward AGI while others warn about societal collapse. r/AI_Agents remains firmly in the builder camp, focused on ROI and production deployments rather than existential questions. The gender shift in usage suggests a new demographic less interested in philosophical debates and more focused on practical applications. 69 | 70 | ## What This Means 71 | 72 | The past week reveals AI development entering a new phase characterized by mainstream adoption, technical pragmatism, and geopolitical complexity. The shift from 4% coding-related conversations doesn't indicate reduced programming impact but rather integration so complete that developers no longer use chat interfaces. Similarly, the gender rebalancing suggests AI has transcended its early-adopter phase to become genuinely useful for everyday tasks. 73 | 74 | For builders and companies, several patterns demand attention. The underground hardware market signals massive unmet demand for local AI capabilities that current consumer GPUs cannot satisfy. The failure of major companies' live demos while Anthropic succeeds with thoughtful marketing suggests authenticity matters more than technical superiority. The agent revolution's slow progress indicates the gap between narrow AI success and general-purpose automation remains vast. 75 | 76 | The geopolitical dimensions cannot be ignored. China's simultaneous advancement in AI models while being cut off from hardware creates an unstable equilibrium. The cyberpunk reality of cash-only GPU deals in Shenzhen represents just the beginning of a fractured global AI landscape. Companies and developers must prepare for a world where AI capabilities vary dramatically by geography, not due to knowledge gaps but hardware access. 77 | 78 | Key takeaways: 79 | 1. The "post-programming" era has arrived for early adopters, but integration challenges mean most developers still code traditionally 80 | 2. Hardware limitations are driving an underground economy that will only grow as models demand more VRAM 81 | 3. Mainstream adoption is reshaping AI development priorities from technical impressiveness to practical utility 82 | 4. Corporate AI wars matter less than open-source progress for long-term ecosystem health 83 | 5. Agent development remains stuck between chatbot limitations and true autonomy, requiring fundamental architectural innovations 84 | 85 | ## Research Notes 86 | 87 | *Communities analyzed*: r/ChatGPT, r/OpenAI, r/ClaudeAI, r/LocalLLaMA, r/singularity, r/artificial, r/MachineLearning, r/ChatGPTPro, r/ChatGPTCoding, r/ClaudeCode, r/AI_Agents, r/aipromptprogramming, r/generativeAI, r/machinelearningnews, r/LargeLanguageModels 88 | 89 | *Methodology*: Semantic discovery to find diverse perspectives, followed by thematic analysis of top discussions and comments from the past week (January 13-20, 2025) 90 | 91 | *Limitations*: Analysis focused on English-language subreddits and may not capture developments in non-English AI communities. Corporate subreddit participation may be influenced by marketing efforts. Technical discussions in specialized forums outside Reddit were not included. 
``` -------------------------------------------------------------------------------- /specs/reddit-research-agent-spec.md: -------------------------------------------------------------------------------- ```markdown 1 | # Reddit Research Agent - Technical Specification 2 | 3 | ## Executive Summary 4 | A self-contained, single-file Python agent using the Orchestrator-Workers pattern to discover relevant Reddit communities for research questions. The system leverages UV's inline script metadata for automatic dependency management, OpenAI Agent SDK for orchestration, and PRAW for Reddit API access. No manual dependency installation required - just run the script and UV handles everything. 5 | 6 | ## Single-File Architecture 7 | 8 | The entire agent is contained in a single Python file (`reddit_research_agent.py`) with: 9 | - **Inline Dependencies**: Using UV's PEP 723 support, dependencies are declared in the script header 10 | - **Automatic Installation**: UV automatically installs all dependencies on first run 11 | - **No Project Setup**: No `pyproject.toml`, `requirements.txt`, or virtual environment management needed 12 | - **Portable**: Single file can be copied and run anywhere with UV installed 13 | 14 | ## Architecture Pattern: Orchestrator-Workers 15 | 16 | ```mermaid 17 | flowchart LR 18 | Query([User Query]) --> Orchestrator[Orchestrator Agent] 19 | Orchestrator -->|Task 1| Worker1[Search Worker] 20 | Orchestrator -->|Task 2| Worker2[Discovery Worker] 21 | Orchestrator -->|Task 3| Worker3[Validation Worker] 22 | Worker1 --> Synthesizer[Synthesizer Agent] 23 | Worker2 --> Synthesizer 24 | Worker3 --> Synthesizer 25 | Synthesizer --> Results([Final Results]) 26 | ``` 27 | 28 | ## System Components 29 | 30 | ### 1. Project Configuration 31 | 32 | #### Self-Contained Dependencies 33 | The agent uses UV's inline script metadata (PEP 723) for automatic dependency management. No separate `pyproject.toml` or manual installation required - dependencies are declared directly in the script header and UV handles everything automatically. 34 | 35 | #### Environment Variables (`.env`) 36 | ```bash 37 | # OpenAI Configuration 38 | OPENAI_API_KEY=sk-... 39 | 40 | # Reddit API Configuration 41 | REDDIT_CLIENT_ID=your_client_id 42 | REDDIT_CLIENT_SECRET=your_client_secret 43 | REDDIT_USER_AGENT=RedditResearchAgent/0.1.0 by YourUsername 44 | ``` 45 | 46 | ### 2. Core Agents 47 | 48 | #### 2.1 Orchestrator Agent 49 | **Purpose**: Analyzes research questions and creates parallel search strategies 50 | 51 | ```python 52 | orchestrator = Agent( 53 | name="Research Orchestrator", 54 | instructions=""" 55 | You are a research orchestrator specializing in Reddit discovery. 56 | 57 | Given a research question: 58 | 1. Identify key concepts and terms 59 | 2. Generate multiple search strategies: 60 | - Direct keyword searches (exact terms) 61 | - Semantic searches (related concepts, synonyms) 62 | - Category searches (broader topics, fields) 63 | 3. 
Output specific tasks for parallel execution 64 | 65 | Consider: 66 | - Technical vs general audience communities 67 | - Active vs historical discussions 68 | - Niche vs mainstream subreddits 69 | """, 70 | output_type=SearchTaskPlan 71 | ) 72 | ``` 73 | 74 | **Output Model**: 75 | ```python 76 | class SearchTaskPlan(BaseModel): 77 | direct_searches: List[str] # Exact keyword searches 78 | semantic_searches: List[str] # Related term searches 79 | category_searches: List[str] # Broad topic searches 80 | validation_criteria: Dict[str, Any] # Relevance criteria 81 | ``` 82 | 83 | #### 2.2 Worker Agents (Parallel Execution) 84 | 85 | ##### Search Worker 86 | **Purpose**: Executes direct Reddit searches using PRAW 87 | 88 | ```python 89 | search_worker = Agent( 90 | name="Search Worker", 91 | instructions="Execute Reddit searches and return discovered subreddits", 92 | tools=[search_subreddits_tool, search_posts_tool] 93 | ) 94 | ``` 95 | 96 | ##### Discovery Worker 97 | **Purpose**: Finds related communities through analysis 98 | 99 | ```python 100 | discovery_worker = Agent( 101 | name="Discovery Worker", 102 | instructions="Discover related subreddits through sidebars, wikis, and cross-references", 103 | tools=[get_related_subreddits_tool, analyze_community_tool] 104 | ) 105 | ``` 106 | 107 | ##### Validation Worker 108 | **Purpose**: Verifies relevance and quality of discovered subreddits 109 | 110 | ```python 111 | validation_worker = Agent( 112 | name="Validation Worker", 113 | instructions="Validate subreddit relevance, activity levels, and quality", 114 | tools=[get_subreddit_info_tool, check_activity_tool] 115 | ) 116 | ``` 117 | 118 | #### 2.3 Synthesizer Agent 119 | **Purpose**: Combines, deduplicates, and ranks all results 120 | 121 | ```python 122 | synthesizer = Agent( 123 | name="Result Synthesizer", 124 | instructions=""" 125 | Synthesize results from all workers: 126 | 127 | 1. Deduplicate discoveries 128 | 2. Rank by relevance factors: 129 | - Description alignment with research topic 130 | - Subscriber count and activity level 131 | - Content quality indicators 132 | - Moderation status 133 | 3. Filter out: 134 | - Inactive communities (< 10 posts/month) 135 | - Spam/promotional subreddits 136 | - Quarantined/banned communities 137 | 4. Return top 8-15 subreddits with justification 138 | 139 | Provide discovery rationale for each recommendation. 140 | """, 141 | output_type=FinalResearchResults 142 | ) 143 | ``` 144 | 145 | **Output Model**: 146 | ```python 147 | class SubredditRecommendation(BaseModel): 148 | name: str 149 | description: str 150 | subscribers: int 151 | relevance_score: float 152 | discovery_method: str 153 | rationale: str 154 | 155 | class FinalResearchResults(BaseModel): 156 | query: str 157 | total_discovered: int 158 | recommendations: List[SubredditRecommendation] 159 | search_strategies_used: List[str] 160 | execution_time: float 161 | ``` 162 | 163 | ### 3. 
PRAW Integration Tools (Enhanced) 164 | 165 | #### Core Reddit Connection 166 | ```python 167 | import praw 168 | from functools import lru_cache 169 | import os 170 | 171 | @lru_cache(maxsize=1) 172 | def get_reddit_instance(): 173 | """Singleton Reddit instance for all workers - thread-safe via lru_cache""" 174 | return praw.Reddit( 175 | client_id=os.getenv("REDDIT_CLIENT_ID"), 176 | client_secret=os.getenv("REDDIT_CLIENT_SECRET"), 177 | user_agent=os.getenv("REDDIT_USER_AGENT"), 178 | read_only=True # Read-only mode for research 179 | ) 180 | ``` 181 | 182 | #### Pydantic Models for Type Safety 183 | ```python 184 | from pydantic import BaseModel 185 | from typing import List, Optional 186 | 187 | class SubredditInfo(BaseModel): 188 | """Structured subreddit information with validation""" 189 | name: str 190 | title: str 191 | description: str 192 | subscribers: int 193 | created_utc: float 194 | over18: bool 195 | is_active: bool # Based on recent activity 196 | avg_comments_per_post: float 197 | recent_posts_count: int 198 | 199 | class ResearchContext(BaseModel): 200 | """Context passed between tools""" 201 | research_question: str 202 | discovered_subreddits: List[str] = [] 203 | search_strategies_used: List[str] = [] 204 | ``` 205 | 206 | #### Error Handler for Reddit API Issues 207 | ```python 208 | from agents import RunContextWrapper 209 | from typing import Any 210 | 211 | def reddit_error_handler(ctx: RunContextWrapper[Any], error: Exception) -> str: 212 | """ 213 | Handle common Reddit API errors gracefully. 214 | 215 | Returns user-friendly error messages for common issues. 216 | """ 217 | error_str = str(error) 218 | 219 | if "403" in error_str or "Forbidden" in error_str: 220 | return "Subreddit is private or restricted. Skipping this community." 221 | elif "404" in error_str or "Not Found" in error_str: 222 | return "Subreddit not found. It may be banned, deleted, or misspelled." 223 | elif "429" in error_str or "Too Many Requests" in error_str: 224 | return "Reddit rate limit reached. Waiting before retry." 225 | elif "prawcore.exceptions" in error_str: 226 | return f"Reddit API connection issue: {error_str[:50]}. Retrying..." 227 | else: 228 | return f"Unexpected Reddit error: {error_str[:100]}" 229 | ``` 230 | 231 | #### Enhanced Function Tools with Type Safety and Error Handling 232 | 233 | ```python 234 | @function_tool(failure_error_function=reddit_error_handler) 235 | async def search_subreddits_tool( 236 | ctx: RunContextWrapper[ResearchContext], 237 | query: str, 238 | limit: int = 25 239 | ) -> List[SubredditInfo]: 240 | """ 241 | Search for subreddits matching the query with relevance filtering. 242 | 243 | Args: 244 | ctx: Runtime context containing the original research question 245 | query: Search terms for Reddit (2-512 characters) 246 | limit: Maximum results to return (1-100, default: 25) 247 | 248 | Returns: 249 | List of SubredditInfo objects with validated data 250 | 251 | Note: 252 | Automatically filters out inactive subreddits (< 100 subscribers) 253 | and those without recent activity. 
254 | """ 255 | reddit = get_reddit_instance() 256 | results = [] 257 | original_query = ctx.context.research_question 258 | 259 | try: 260 | for subreddit in reddit.subreddits.search(query, limit=limit): 261 | # Skip very small/inactive subreddits 262 | if subreddit.subscribers < 100: 263 | continue 264 | 265 | # Get activity metrics 266 | try: 267 | recent_posts = list(subreddit.new(limit=5)) 268 | is_active = len(recent_posts) > 0 269 | avg_comments = sum(p.num_comments for p in recent_posts) / len(recent_posts) if recent_posts else 0 270 | except: 271 | is_active = False 272 | avg_comments = 0 273 | recent_posts = [] 274 | 275 | results.append(SubredditInfo( 276 | name=subreddit.display_name, 277 | title=subreddit.title or "", 278 | description=subreddit.public_description or "", 279 | subscribers=subreddit.subscribers, 280 | created_utc=subreddit.created_utc, 281 | over18=subreddit.over18, 282 | is_active=is_active, 283 | avg_comments_per_post=avg_comments, 284 | recent_posts_count=len(recent_posts) 285 | )) 286 | except Exception as e: 287 | # Let the error handler deal with it 288 | raise 289 | 290 | # Update context with discovered subreddits 291 | ctx.context.discovered_subreddits.extend([r.name for r in results]) 292 | 293 | return results 294 | 295 | @function_tool(failure_error_function=reddit_error_handler) 296 | async def get_related_subreddits_tool( 297 | ctx: RunContextWrapper[ResearchContext], 298 | subreddit_name: str 299 | ) -> List[str]: 300 | """ 301 | Find related subreddits from sidebar, wiki, and community info. 302 | 303 | Args: 304 | ctx: Runtime context for tracking discoveries 305 | subreddit_name: Name of subreddit to analyze (without r/ prefix) 306 | 307 | Returns: 308 | List of related subreddit names (deduplicated) 309 | 310 | Note: 311 | Searches in sidebar description, wiki pages, and 312 | community widgets for related community mentions. 313 | """ 314 | reddit = get_reddit_instance() 315 | related = set() # Use set for automatic deduplication 316 | 317 | try: 318 | subreddit = reddit.subreddit(subreddit_name) 319 | 320 | # Parse sidebar for r/ mentions 321 | if hasattr(subreddit, 'description') and subreddit.description: 322 | import re 323 | pattern = r'r/([A-Za-z0-9_]+)' 324 | matches = re.findall(pattern, subreddit.description) 325 | related.update(matches) 326 | 327 | # Check wiki pages if accessible 328 | try: 329 | # Common wiki pages with related subreddits 330 | wiki_pages = ['related', 'index', 'sidebar', 'communities'] 331 | for page_name in wiki_pages: 332 | try: 333 | wiki_page = subreddit.wiki[page_name] 334 | content = wiki_page.content_md 335 | matches = re.findall(pattern, content) 336 | related.update(matches) 337 | except: 338 | continue 339 | except: 340 | pass 341 | 342 | # Parse community widgets if available 343 | try: 344 | for widget in subreddit.widgets: 345 | if hasattr(widget, 'text'): 346 | matches = re.findall(pattern, widget.text) 347 | related.update(matches) 348 | except: 349 | pass 350 | 351 | except Exception as e: 352 | # Let the error handler deal with it 353 | raise 354 | 355 | # Remove the original subreddit from related list 356 | related.discard(subreddit_name) 357 | 358 | return list(related) 359 | 360 | @function_tool(failure_error_function=reddit_error_handler) 361 | async def validate_subreddit_relevance_tool( 362 | ctx: RunContextWrapper[ResearchContext], 363 | subreddit_name: str 364 | ) -> SubredditInfo: 365 | """ 366 | Get detailed subreddit information with relevance validation. 
367 | 368 | Args: 369 | ctx: Runtime context containing research question 370 | subreddit_name: Name of subreddit to validate 371 | 372 | Returns: 373 | SubredditInfo with detailed metrics 374 | 375 | Note: 376 | Checks activity level, moderation status, and 377 | content quality indicators. 378 | """ 379 | reddit = get_reddit_instance() 380 | 381 | try: 382 | subreddit = reddit.subreddit(subreddit_name) 383 | 384 | # Force load to check if subreddit exists 385 | _ = subreddit.id 386 | 387 | # Get recent activity for validation 388 | recent_posts = list(subreddit.new(limit=10)) 389 | 390 | # Calculate activity metrics 391 | if recent_posts: 392 | avg_comments = sum(p.num_comments for p in recent_posts) / len(recent_posts) 393 | # Check if posts are recent (within last 30 days) 394 | import time 395 | current_time = time.time() 396 | latest_post_age = current_time - recent_posts[0].created_utc 397 | is_active = latest_post_age < (30 * 24 * 60 * 60) # 30 days in seconds 398 | else: 399 | avg_comments = 0 400 | is_active = False 401 | 402 | return SubredditInfo( 403 | name=subreddit.display_name, 404 | title=subreddit.title or "", 405 | description=subreddit.public_description or "", 406 | subscribers=subreddit.subscribers, 407 | created_utc=subreddit.created_utc, 408 | over18=subreddit.over18, 409 | is_active=is_active, 410 | avg_comments_per_post=avg_comments, 411 | recent_posts_count=len(recent_posts) 412 | ) 413 | 414 | except Exception as e: 415 | # Let the error handler deal with it 416 | raise 417 | ``` 418 | 419 | ### 4. Execution Controller 420 | 421 | ```python 422 | import asyncio 423 | from typing import List, Dict, Any 424 | from agents import Runner 425 | 426 | async def execute_reddit_research(query: str) -> FinalResearchResults: 427 | """ 428 | Main execution controller for the research process. 429 | 430 | Args: 431 | query: User's research question 432 | 433 | Returns: 434 | Final curated results 435 | """ 436 | 437 | # Step 1: Orchestrator creates search plan 438 | print(f"🎯 Analyzing research question: {query}") 439 | orchestrator_result = await Runner.run(orchestrator, query) 440 | search_plan = orchestrator_result.final_output_as(SearchTaskPlan) 441 | 442 | # Step 2: Execute workers in parallel 443 | print("🔍 Executing parallel search strategies...") 444 | worker_tasks = [ 445 | Runner.run(search_worker, { 446 | "searches": search_plan.direct_searches, 447 | "search_type": "direct" 448 | }), 449 | Runner.run(discovery_worker, { 450 | "searches": search_plan.semantic_searches, 451 | "search_type": "semantic" 452 | }), 453 | Runner.run(validation_worker, { 454 | "searches": search_plan.category_searches, 455 | "validation_criteria": search_plan.validation_criteria 456 | }) 457 | ] 458 | 459 | worker_results = await asyncio.gather(*worker_tasks) 460 | 461 | # Step 3: Synthesize results 462 | print("🔀 Synthesizing discoveries...") 463 | synthesis_input = { 464 | "query": query, 465 | "worker_results": [r.final_output for r in worker_results], 466 | "search_plan": search_plan.model_dump() 467 | } 468 | 469 | synthesizer_result = await Runner.run(synthesizer, synthesis_input) 470 | final_results = synthesizer_result.final_output_as(FinalResearchResults) 471 | 472 | return final_results 473 | ``` 474 | 475 | ### 5. 
Main Entry Point (Self-Contained with UV) 476 | 477 | ```python 478 | #!/usr/bin/env -S uv run --script 479 | # /// script 480 | # requires-python = ">=3.11" 481 | # dependencies = [ 482 | # "openai-agents>=0.1.0", 483 | # "praw>=7.7.0", 484 | # "python-dotenv>=1.0.0", 485 | # "pydantic>=2.0.0", 486 | # "prawcore>=2.4.0" 487 | # ] 488 | # /// 489 | """ 490 | Reddit Research Agent 491 | Discovers relevant Reddit communities for research questions 492 | using the Orchestrator-Workers pattern. 493 | 494 | Usage: 495 | ./reddit_research_agent.py 496 | OR 497 | uv run reddit_research_agent.py 498 | 499 | No manual dependency installation required - UV handles everything automatically. 500 | """ 501 | 502 | import asyncio 503 | import os 504 | from dotenv import load_dotenv 505 | from typing import Optional, List, Dict, Any 506 | 507 | # Load environment variables 508 | load_dotenv() 509 | 510 | async def main(): 511 | """Main execution function""" 512 | 513 | # Validate environment 514 | required_vars = [ 515 | "OPENAI_API_KEY", 516 | "REDDIT_CLIENT_ID", 517 | "REDDIT_CLIENT_SECRET", 518 | "REDDIT_USER_AGENT" 519 | ] 520 | 521 | missing = [var for var in required_vars if not os.getenv(var)] 522 | if missing: 523 | print(f"❌ Missing environment variables: {', '.join(missing)}") 524 | return 525 | 526 | # Get research query 527 | query = input("🔬 Enter your research question: ").strip() 528 | if not query: 529 | print("❌ Please provide a research question") 530 | return 531 | 532 | try: 533 | # Execute research 534 | results = await execute_reddit_research(query) 535 | 536 | # Display results 537 | print(f"\n✅ Discovered {results.total_discovered} subreddits") 538 | print(f"📊 Top {len(results.recommendations)} recommendations:\n") 539 | 540 | for i, rec in enumerate(results.recommendations, 1): 541 | print(f"{i}. r/{rec.name} ({rec.subscribers:,} subscribers)") 542 | print(f" 📝 {rec.description[:100]}...") 543 | print(f" 🎯 Relevance: {rec.relevance_score:.2f}/10") 544 | print(f" 💡 {rec.rationale}\n") 545 | 546 | print(f"⏱️ Execution time: {results.execution_time:.2f} seconds") 547 | 548 | except Exception as e: 549 | print(f"❌ Error during execution: {e}") 550 | raise 551 | 552 | if __name__ == "__main__": 553 | asyncio.run(main()) 554 | ``` 555 | 556 | ## Search Strategies 557 | 558 | ### 1. Direct Search 559 | - Exact keyword matching 560 | - Query variations (singular/plural) 561 | - Common abbreviations 562 | 563 | ### 2. Semantic Search 564 | - Synonyms and related terms 565 | - Domain-specific terminology 566 | - Conceptual expansions 567 | 568 | ### 3. Category Search 569 | - Broader topic areas 570 | - Academic disciplines 571 | - Industry sectors 572 | 573 | ### 4. Discovery Methods 574 | - Sidebar parsing for related communities 575 | - Wiki page analysis 576 | - Cross-post detection 577 | - Moderator overlap analysis 578 | 579 | ## Quality Metrics 580 | 581 | ### Relevance Scoring 582 | 1. **Description Match** (40%) 583 | - Keyword presence in description 584 | - Semantic similarity to query 585 | 586 | 2. **Activity Level** (30%) 587 | - Posts per day 588 | - Comment engagement 589 | - Active user count 590 | 591 | 3. **Community Size** (20%) 592 | - Subscriber count 593 | - Growth trajectory 594 | 595 | 4. 
**Content Quality** (10%) 596 | - Moderation level 597 | - Rules complexity 598 | - Wiki presence 599 | 600 | ## Error Handling 601 | 602 | ### API Rate Limits 603 | - Implement exponential backoff 604 | - Cache results for 1 hour 605 | - Batch requests where possible 606 | 607 | ### Invalid Subreddits 608 | - Skip private/banned communities 609 | - Handle 404 errors gracefully 610 | - Log failures for debugging 611 | 612 | ### Network Issues 613 | - Retry logic with timeout 614 | - Fallback to cached results 615 | - User notification of degraded service 616 | 617 | ## Performance Targets 618 | 619 | - **Discovery Time**: < 10 seconds for typical query 620 | - **Parallel Workers**: 3-5 concurrent operations 621 | - **Result Count**: 8-15 high-quality recommendations 622 | - **Cache Hit Rate**: > 30% for common topics 623 | 624 | ## Testing Strategy 625 | 626 | ### Unit Tests 627 | - Individual tool functions 628 | - PRAW mock responses 629 | - Agent prompt validation 630 | 631 | ### Integration Tests 632 | - Full workflow execution 633 | - Parallel worker coordination 634 | - Result synthesis accuracy 635 | 636 | ### Example Test Queries 637 | 1. "machine learning ethics" 638 | 2. "sustainable urban farming" 639 | 3. "quantum computing applications" 640 | 4. "remote work productivity" 641 | 5. "climate change solutions" 642 | 643 | ## Future Enhancements 644 | 645 | 1. **Temporal Analysis** 646 | - Trending topic detection 647 | - Historical activity patterns 648 | 649 | 2. **Content Analysis** 650 | - Sentiment analysis of discussions 651 | - Expert identification 652 | 653 | 3. **Network Analysis** 654 | - Community overlap mapping 655 | - Influence flow detection 656 | 657 | 4. **Personalization** 658 | - User preference learning 659 | - Custom ranking weights 660 | 661 | ## Deployment Considerations 662 | 663 | ### Usage Instructions 664 | ```bash 665 | # Method 1: Direct execution (if file is executable) 666 | chmod +x reddit_research_agent.py 667 | ./reddit_research_agent.py 668 | 669 | # Method 2: Using UV run 670 | uv run reddit_research_agent.py 671 | 672 | # No manual dependency installation needed! 673 | # UV automatically handles all dependencies on first run 674 | ``` 675 | 676 | ### Key Benefits of UV Inline Dependencies 677 | - **Zero Setup**: No `pip install` or `uv add` commands needed 678 | - **Self-Contained**: Single file contains code and dependency specifications 679 | - **Reproducible**: Same dependencies installed every time 680 | - **Fast**: UV caches dependencies for quick subsequent runs 681 | - **Version Locked**: Optional `.lock` file ensures exact versions 682 | 683 | ### Production Deployment 684 | - Use environment-specific `.env` files 685 | - Implement logging and monitoring 686 | - Add result caching layer with Redis/Memcached 687 | - Consider rate limit pooling for multiple users 688 | - Lock dependencies with `uv lock --script reddit_research_agent.py` 689 | 690 | ## Success Metrics 691 | 692 | 1. **Coverage**: Discovers 80%+ of relevant subreddits 693 | 2. **Precision**: 90%+ relevance accuracy 694 | 3. **Speed**: < 10 second average execution 695 | 4. 
**Reliability**: 99%+ uptime with graceful degradation ``` -------------------------------------------------------------------------------- /src/server.py: -------------------------------------------------------------------------------- ```python 1 | from fastmcp import FastMCP, Context 2 | from fastmcp.prompts import Message 3 | from fastmcp.server.auth.providers.descope import DescopeProvider 4 | from typing import Optional, Literal, List, Union, Dict, Any, Annotated 5 | import sys 6 | import os 7 | import json 8 | from pathlib import Path 9 | from datetime import datetime 10 | from dotenv import load_dotenv 11 | from starlette.responses import Response, JSONResponse 12 | 13 | # Load environment variables from .env file 14 | load_dotenv() 15 | 16 | # Add parent directory to path for imports 17 | sys.path.insert(0, str(Path(__file__).parent.parent)) 18 | 19 | from src.config import get_reddit_client 20 | from src.tools.search import search_in_subreddit 21 | from src.tools.posts import fetch_subreddit_posts, fetch_multiple_subreddits 22 | from src.tools.comments import fetch_submission_with_comments 23 | from src.tools.discover import discover_subreddits 24 | from src.resources import register_resources 25 | 26 | # Configure Descope authentication 27 | auth = DescopeProvider( 28 | project_id=os.getenv("DESCOPE_PROJECT_ID"), 29 | base_url=os.getenv("SERVER_URL", "http://localhost:8000"), 30 | descope_base_url=os.getenv("DESCOPE_BASE_URL", "https://api.descope.com") 31 | ) 32 | 33 | # Initialize MCP server with authentication 34 | mcp = FastMCP("Reddit MCP", auth=auth, instructions=""" 35 | Reddit MCP Server - Three-Layer Architecture 36 | 37 | 🎯 ALWAYS FOLLOW THIS WORKFLOW: 38 | 1. discover_operations() - See what's available 39 | 2. get_operation_schema() - Understand requirements 40 | 3. execute_operation() - Perform the action 41 | 42 | 📊 RESEARCH BEST PRACTICES: 43 | • Start with discover_subreddits for ANY topic 44 | • Use confidence scores to guide workflow: 45 | - High (>0.7): Direct to specific communities 46 | - Medium (0.4-0.7): Multi-community approach 47 | - Low (<0.4): Refine search terms 48 | • Fetch comments for 10+ posts for thorough analysis 49 | • Always include Reddit URLs when citing content 50 | 51 | ⚡ EFFICIENCY TIPS: 52 | • Use fetch_multiple for 2+ subreddits (70% fewer API calls) 53 | • Single vector search finds semantically related communities 54 | • Batch operations reduce token usage 55 | 56 | Quick Start: Read reddit://server-info for complete documentation. 57 | """) 58 | 59 | # Add public health check endpoint (no auth required) 60 | @mcp.custom_route("/health", methods=["GET"]) 61 | async def health_check(request) -> Response: 62 | """Public health check endpoint - no authentication required. 63 | 64 | Allows clients to verify the server is running before attempting OAuth. 65 | """ 66 | try: 67 | return JSONResponse({ 68 | "status": "ok", 69 | "server": "Reddit MCP", 70 | "version": "1.0.0", 71 | "auth_required": True, 72 | "auth_endpoint": "/.well-known/oauth-authorization-server" 73 | }) 74 | except Exception as e: 75 | print(f"ERROR: Health check failed: {e}", flush=True) 76 | return JSONResponse( 77 | {"status": "error", "message": str(e)}, 78 | status_code=500 79 | ) 80 | 81 | # Add public server info endpoint (no auth required) 82 | @mcp.custom_route("/server-info", methods=["GET"]) 83 | async def server_info(request) -> Response: 84 | """Public server information endpoint - no authentication required. 
85 | 86 | Provides server metadata and capabilities to help clients understand 87 | what authentication and features are available. 88 | """ 89 | try: 90 | print(f"Server info requested from {request.client.host if request.client else 'unknown'}", flush=True) 91 | return JSONResponse({ 92 | "name": "Reddit MCP", 93 | "version": "1.0.0", 94 | "description": "Reddit research and analysis tools with semantic subreddit discovery", 95 | "authentication": { 96 | "required": True, 97 | "type": "oauth2", 98 | "provider": "descope", 99 | "authorization_server": f"{os.getenv('SERVER_URL', 'http://localhost:8000')}/.well-known/oauth-authorization-server" 100 | }, 101 | "capabilities": { 102 | "tools": ["discover_operations", "get_operation_schema", "execute_operation"], 103 | "tools_count": 3, 104 | "supports_resources": True, 105 | "supports_prompts": True, 106 | "reddit_operations": { 107 | "discover_subreddits": "Semantic search for relevant communities", 108 | "search_subreddit": "Search within a specific subreddit", 109 | "fetch_posts": "Get posts from a subreddit", 110 | "fetch_multiple": "Batch fetch from multiple subreddits", 111 | "fetch_comments": "Get complete comment trees" 112 | } 113 | } 114 | }) 115 | except Exception as e: 116 | print(f"ERROR: Server info request failed: {e}", flush=True) 117 | return JSONResponse( 118 | {"status": "error", "message": str(e)}, 119 | status_code=500 120 | ) 121 | 122 | # Initialize Reddit client (will be updated with config when available) 123 | reddit = None 124 | 125 | 126 | def initialize_reddit_client(): 127 | """Initialize Reddit client with environment config.""" 128 | global reddit 129 | reddit = get_reddit_client() 130 | # Register resources with the new client 131 | register_resources(mcp, reddit) 132 | 133 | # Initialize with environment variables initially 134 | try: 135 | initialize_reddit_client() 136 | except Exception as e: 137 | print(f"DEBUG: Reddit init failed: {e}", flush=True) 138 | 139 | 140 | # Three-Layer Architecture Implementation 141 | 142 | @mcp.tool( 143 | description="Discover available Reddit operations and recommended workflows", 144 | annotations={"readOnlyHint": True} 145 | ) 146 | def discover_operations(ctx: Context) -> Dict[str, Any]: 147 | """ 148 | LAYER 1: Discover what operations this MCP server provides. 149 | Start here to understand available capabilities. 
150 | """ 151 | # Phase 1: Accept context but don't use it yet 152 | return { 153 | "operations": { 154 | "discover_subreddits": "Find relevant communities using semantic search", 155 | "search_subreddit": "Search for posts within a specific community", 156 | "fetch_posts": "Get posts from a single subreddit", 157 | "fetch_multiple": "Batch fetch from multiple subreddits (70% more efficient)", 158 | "fetch_comments": "Get complete comment tree for deep analysis" 159 | }, 160 | "recommended_workflows": { 161 | "comprehensive_research": [ 162 | "discover_subreddits → fetch_multiple → fetch_comments", 163 | "Best for: Thorough analysis across communities" 164 | ], 165 | "targeted_search": [ 166 | "discover_subreddits → search_subreddit → fetch_comments", 167 | "Best for: Finding specific content in relevant communities" 168 | ] 169 | }, 170 | "next_step": "Use get_operation_schema() to understand requirements" 171 | } 172 | 173 | 174 | @mcp.tool( 175 | description="Get detailed requirements and parameters for a Reddit operation", 176 | annotations={"readOnlyHint": True} 177 | ) 178 | def get_operation_schema( 179 | operation_id: Annotated[str, "Operation ID from discover_operations"], 180 | include_examples: Annotated[bool, "Include example parameter values"] = True, 181 | ctx: Context = None 182 | ) -> Dict[str, Any]: 183 | """ 184 | LAYER 2: Get parameter requirements for an operation. 185 | Use after discover_operations to understand how to call operations. 186 | """ 187 | # Phase 1: Accept context but don't use it yet 188 | schemas = { 189 | "discover_subreddits": { 190 | "description": "Find communities using semantic vector search", 191 | "parameters": { 192 | "query": { 193 | "type": "string", 194 | "required": True, 195 | "description": "Topic to find communities for", 196 | "validation": "2-100 characters" 197 | }, 198 | "limit": { 199 | "type": "integer", 200 | "required": False, 201 | "default": 10, 202 | "range": [1, 50], 203 | "description": "Number of communities to return" 204 | }, 205 | "include_nsfw": { 206 | "type": "boolean", 207 | "required": False, 208 | "default": False, 209 | "description": "Whether to include NSFW communities" 210 | } 211 | }, 212 | "returns": { 213 | "subreddits": "Array with confidence scores (0-1)", 214 | "quality_indicators": { 215 | "good": "5+ subreddits with confidence > 0.7", 216 | "poor": "All results below 0.5 confidence" 217 | } 218 | }, 219 | "examples": [] if not include_examples else [ 220 | {"query": "machine learning", "limit": 15}, 221 | {"query": "python web development", "limit": 10} 222 | ] 223 | }, 224 | "search_subreddit": { 225 | "description": "Search for posts within a specific subreddit", 226 | "parameters": { 227 | "subreddit_name": { 228 | "type": "string", 229 | "required": True, 230 | "description": "Exact subreddit name (without r/ prefix)", 231 | "tip": "Use exact name from discover_subreddits" 232 | }, 233 | "query": { 234 | "type": "string", 235 | "required": True, 236 | "description": "Search terms" 237 | }, 238 | "sort": { 239 | "type": "enum", 240 | "options": ["relevance", "hot", "top", "new"], 241 | "default": "relevance", 242 | "description": "How to sort results" 243 | }, 244 | "time_filter": { 245 | "type": "enum", 246 | "options": ["all", "year", "month", "week", "day"], 247 | "default": "all", 248 | "description": "Time period for results" 249 | }, 250 | "limit": { 251 | "type": "integer", 252 | "default": 10, 253 | "range": [1, 100], 254 | "description": "Maximum number of results" 255 | } 256 | }, 257 | 
"examples": [] if not include_examples else [ 258 | {"subreddit_name": "MachineLearning", "query": "transformers", "limit": 20}, 259 | {"subreddit_name": "Python", "query": "async", "sort": "top", "time_filter": "month"} 260 | ] 261 | }, 262 | "fetch_posts": { 263 | "description": "Get posts from a single subreddit", 264 | "parameters": { 265 | "subreddit_name": { 266 | "type": "string", 267 | "required": True, 268 | "description": "Exact subreddit name (without r/ prefix)" 269 | }, 270 | "listing_type": { 271 | "type": "enum", 272 | "options": ["hot", "new", "top", "rising"], 273 | "default": "hot", 274 | "description": "Type of posts to fetch" 275 | }, 276 | "time_filter": { 277 | "type": "enum", 278 | "options": ["all", "year", "month", "week", "day"], 279 | "default": None, 280 | "description": "Time period (only for 'top' listing)" 281 | }, 282 | "limit": { 283 | "type": "integer", 284 | "default": 10, 285 | "range": [1, 100], 286 | "description": "Number of posts to fetch" 287 | } 288 | }, 289 | "examples": [] if not include_examples else [ 290 | {"subreddit_name": "technology", "listing_type": "hot", "limit": 15}, 291 | {"subreddit_name": "science", "listing_type": "top", "time_filter": "week", "limit": 20} 292 | ] 293 | }, 294 | "fetch_multiple": { 295 | "description": "Batch fetch from multiple subreddits efficiently", 296 | "parameters": { 297 | "subreddit_names": { 298 | "type": "array[string]", 299 | "required": True, 300 | "max_items": 10, 301 | "description": "List of subreddit names (without r/ prefix)", 302 | "tip": "Use names from discover_subreddits" 303 | }, 304 | "listing_type": { 305 | "type": "enum", 306 | "options": ["hot", "new", "top", "rising"], 307 | "default": "hot", 308 | "description": "Type of posts to fetch" 309 | }, 310 | "time_filter": { 311 | "type": "enum", 312 | "options": ["all", "year", "month", "week", "day"], 313 | "default": None, 314 | "description": "Time period (only for 'top' listing)" 315 | }, 316 | "limit_per_subreddit": { 317 | "type": "integer", 318 | "default": 5, 319 | "range": [1, 25], 320 | "description": "Posts per subreddit" 321 | } 322 | }, 323 | "efficiency": { 324 | "vs_individual": "70% fewer API calls", 325 | "token_usage": "~500-1000 tokens per subreddit" 326 | }, 327 | "examples": [] if not include_examples else [ 328 | {"subreddit_names": ["Python", "django", "flask"], "listing_type": "hot", "limit_per_subreddit": 5}, 329 | {"subreddit_names": ["MachineLearning", "deeplearning"], "listing_type": "top", "time_filter": "week", "limit_per_subreddit": 10} 330 | ] 331 | }, 332 | "fetch_comments": { 333 | "description": "Get complete comment tree for a post", 334 | "parameters": { 335 | "submission_id": { 336 | "type": "string", 337 | "required_one_of": ["submission_id", "url"], 338 | "description": "Reddit post ID (e.g., '1abc234')" 339 | }, 340 | "url": { 341 | "type": "string", 342 | "required_one_of": ["submission_id", "url"], 343 | "description": "Full Reddit URL to the post" 344 | }, 345 | "comment_limit": { 346 | "type": "integer", 347 | "default": 100, 348 | "recommendation": "50-100 for analysis", 349 | "description": "Maximum comments to fetch" 350 | }, 351 | "comment_sort": { 352 | "type": "enum", 353 | "options": ["best", "top", "new"], 354 | "default": "best", 355 | "description": "How to sort comments" 356 | } 357 | }, 358 | "examples": [] if not include_examples else [ 359 | {"submission_id": "1abc234", "comment_limit": 100}, 360 | {"url": "https://reddit.com/r/Python/comments/xyz789/", "comment_limit": 50, 
"comment_sort": "top"} 361 | ] 362 | } 363 | } 364 | 365 | if operation_id not in schemas: 366 | return { 367 | "error": f"Unknown operation: {operation_id}", 368 | "available": list(schemas.keys()), 369 | "hint": "Use discover_operations() first" 370 | } 371 | 372 | return schemas[operation_id] 373 | 374 | 375 | @mcp.tool( 376 | description="Execute a Reddit operation with validated parameters" 377 | ) 378 | async def execute_operation( 379 | operation_id: Annotated[str, "Operation to execute"], 380 | parameters: Annotated[Dict[str, Any], "Parameters matching the schema"], 381 | ctx: Context = None 382 | ) -> Dict[str, Any]: 383 | """ 384 | LAYER 3: Execute a Reddit operation. 385 | Only use after getting schema from get_operation_schema. 386 | """ 387 | # Phase 1: Accept context but don't use it yet 388 | 389 | # Operation mapping 390 | operations = { 391 | "discover_subreddits": discover_subreddits, 392 | "search_subreddit": search_in_subreddit, 393 | "fetch_posts": fetch_subreddit_posts, 394 | "fetch_multiple": fetch_multiple_subreddits, 395 | "fetch_comments": fetch_submission_with_comments 396 | } 397 | 398 | if operation_id not in operations: 399 | return { 400 | "success": False, 401 | "error": f"Unknown operation: {operation_id}", 402 | "available_operations": list(operations.keys()) 403 | } 404 | 405 | try: 406 | # Add reddit client and context to params for operations that need them 407 | if operation_id in ["search_subreddit", "fetch_posts", "fetch_multiple", "fetch_comments"]: 408 | params = {**parameters, "reddit": reddit, "ctx": ctx} 409 | else: 410 | params = {**parameters, "ctx": ctx} 411 | 412 | # Execute operation with await for async operations 413 | if operation_id in ["discover_subreddits", "fetch_multiple", "fetch_comments"]: 414 | result = await operations[operation_id](**params) 415 | else: 416 | result = operations[operation_id](**params) 417 | 418 | return { 419 | "success": True, 420 | "data": result 421 | } 422 | 423 | except Exception as e: 424 | return { 425 | "success": False, 426 | "error": str(e), 427 | "recovery": suggest_recovery(operation_id, e) 428 | } 429 | 430 | 431 | def suggest_recovery(operation_id: str, error: Exception) -> str: 432 | """Helper to suggest recovery actions based on error type.""" 433 | error_str = str(error).lower() 434 | 435 | if "not found" in error_str or "404" in error_str: 436 | return "Verify the subreddit name or use discover_subreddits" 437 | elif "rate" in error_str or "429" in error_str: 438 | return "Rate limited - reduce limit parameter or wait before retrying" 439 | elif "private" in error_str or "403" in error_str: 440 | return "Subreddit is private - try other communities" 441 | elif "invalid" in error_str or "validation" in error_str: 442 | return "Check parameters match schema from get_operation_schema" 443 | else: 444 | return "Check parameters match schema from get_operation_schema" 445 | 446 | 447 | # Research Workflow Prompt Template 448 | RESEARCH_WORKFLOW_PROMPT = """ 449 | You are conducting comprehensive Reddit research based on this request: "{research_request}" 450 | 451 | ## WORKFLOW TO FOLLOW: 452 | 453 | ### PHASE 1: DISCOVERY 454 | 1. First, call discover_operations() to see available operations 455 | 2. Then call get_operation_schema("discover_subreddits") to understand the parameters 456 | 3. Extract the key topic/question from the research request and execute: 457 | execute_operation("discover_subreddits", {{"query": "<topic from request>", "limit": 15}}) 458 | 4. 
Note the confidence scores for each discovered subreddit 459 | 460 | ### PHASE 2: STRATEGY SELECTION 461 | Based on confidence scores from discovery: 462 | - **High confidence (>0.7)**: Focus on top 5-8 most relevant subreddits 463 | - **Medium confidence (0.4-0.7)**: Cast wider net with 10-12 subreddits 464 | - **Low confidence (<0.4)**: Refine search terms and retry discovery 465 | 466 | ### PHASE 3: GATHER POSTS 467 | Use batch operation for efficiency: 468 | execute_operation("fetch_multiple", {{ 469 | "subreddit_names": [<list from discovery>], 470 | "listing_type": "top", 471 | "time_filter": "year", 472 | "limit_per_subreddit": 10 473 | }}) 474 | 475 | ### PHASE 4: DEEP DIVE INTO DISCUSSIONS 476 | For posts with high engagement (10+ comments, 5+ upvotes): 477 | execute_operation("fetch_comments", {{ 478 | "submission_id": "<post_id>", 479 | "comment_limit": 100, 480 | "comment_sort": "best" 481 | }}) 482 | 483 | Target: Analyze 100+ total comments across 10+ subreddits 484 | 485 | ### PHASE 5: SYNTHESIZE FINDINGS 486 | 487 | Create a comprehensive report that directly addresses the research request: 488 | 489 | # Research Report: {research_request} 490 | *Generated: {timestamp}* 491 | 492 | ## Executive Summary 493 | - Direct answer to the research question 494 | - Key findings with confidence levels 495 | - Coverage metrics: X subreddits, Y posts, Z comments analyzed 496 | 497 | ## Communities Analyzed 498 | | Subreddit | Subscribers | Relevance Score | Posts Analyzed | Key Insights | 499 | |-----------|------------|-----------------|----------------|--------------| 500 | | [data] | [count] | [0.0-1.0] | [count] | [summary] | 501 | 502 | ## Key Findings 503 | 504 | ### [Finding that directly addresses the research request] 505 | **Community Consensus**: [Strong/Moderate/Split/Emerging] 506 | 507 | Evidence from Reddit: 508 | - u/[username] in r/[subreddit] stated: "exact quote" [↑450](https://reddit.com/r/subreddit/comments/abc123/) 509 | - Discussion with 200+ comments shows... [link](url) 510 | - Highly awarded post argues... [↑2.3k, Gold×3](url) 511 | 512 | ### [Additional relevant findings...] 
513 | [Continue with 2-4 more key findings that answer different aspects of the research request] 514 | 515 | ## Temporal Trends 516 | - How perspectives have evolved over time 517 | - Recent shifts in community sentiment 518 | - Emerging viewpoints in the last 30 days 519 | 520 | ## Notable Perspectives 521 | - Expert opinions (verified flairs, high karma users 10k+) 522 | - Contrarian views worth considering 523 | - Common misconceptions identified 524 | 525 | ## Data Quality Metrics 526 | - Total subreddits analyzed: [count] 527 | - Total posts reviewed: [count] 528 | - Total comments analyzed: [count] 529 | - Unique contributors: [count] 530 | - Date range: [oldest] to [newest] 531 | - Average post score: [score] 532 | - High-karma contributors (10k+): [count] 533 | 534 | ## Limitations 535 | - Geographic/language bias (primarily English-speaking communities) 536 | - Temporal coverage (data from [date range]) 537 | - Communities not represented in analysis 538 | 539 | --- 540 | *Research methodology: Semantic discovery across 20,000+ indexed subreddits, followed by deep analysis of high-engagement discussions* 541 | 542 | CRITICAL REQUIREMENTS: 543 | - Never fabricate Reddit content - only cite actual posts/comments from the data 544 | - Every claim must link to its Reddit source with a clickable URL 545 | - Include upvote counts and awards for credibility assessment 546 | - Note when content is [deleted] or [removed] 547 | - Track temporal context (when was this posted?) 548 | - Answer the specific research request - don't just summarize content 549 | """ 550 | 551 | 552 | @mcp.prompt( 553 | name="reddit_research", 554 | description="Conduct comprehensive Reddit research on any topic or question", 555 | tags={"research", "analysis", "comprehensive"} 556 | ) 557 | def reddit_research(research_request: str) -> List[Message]: 558 | """ 559 | Guides comprehensive Reddit research based on a natural language request. 560 | 561 | Args: 562 | research_request: Natural language description of what to research 563 | Examples: "How do people feel about remote work?", 564 | "Best practices for Python async programming", 565 | "Community sentiment on electric vehicles" 566 | 567 | Returns: 568 | Structured messages guiding the LLM through the complete research workflow 569 | """ 570 | timestamp = datetime.now().strftime("%Y-%m-%d %H:%M UTC") 571 | 572 | return [ 573 | Message( 574 | role="assistant", 575 | content=RESEARCH_WORKFLOW_PROMPT.format( 576 | research_request=research_request, 577 | timestamp=timestamp 578 | ) 579 | ), 580 | Message( 581 | role="user", 582 | content=f"Please conduct comprehensive Reddit research to answer: {research_request}" 583 | ) 584 | ] 585 | 586 | 587 | def main(): 588 | """Main entry point for the server.""" 589 | print("Reddit MCP Server starting...", flush=True) 590 | 591 | # Try to initialize the Reddit client with available configuration 592 | try: 593 | initialize_reddit_client() 594 | print("Reddit client initialized successfully", flush=True) 595 | except Exception as e: 596 | print(f"WARNING: Failed to initialize Reddit client: {e}", flush=True) 597 | print("Server will run with limited functionality.", flush=True) 598 | print("\nPlease provide Reddit API credentials via:", flush=True) 599 | print(" 1. Environment variables: REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USER_AGENT", flush=True) 600 | print(" 2. 
Config file: .mcp-config.json", flush=True) 601 | 602 | # Run with stdio transport 603 | mcp.run() 604 | 605 | 606 | if __name__ == "__main__": 607 | main() ```
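
For reference, below is a minimal client-side sketch of the three-layer workflow that `src/server.py` advertises (discover → schema → execute). This is an illustration, not part of the repository: it assumes a FastMCP 2.x `Client`, an illustrative HTTP endpoint URL, and that the Descope OAuth step has already been satisfied out of band; the query values are placeholders.

```python
# Hypothetical client-side walkthrough of the three-layer workflow exposed by server.py.
# Assumes: fastmcp 2.x Client, server reachable at the illustrative URL below,
# and OAuth (Descope) handled out of band. Query values are placeholders.
import asyncio
from fastmcp import Client


async def demo():
    async with Client("http://localhost:8000/mcp") as client:
        # Layer 1: see which operations the server provides
        ops = await client.call_tool("discover_operations", {})

        # Layer 2: learn the parameter schema for one operation
        schema = await client.call_tool(
            "get_operation_schema",
            {"operation_id": "discover_subreddits"},
        )

        # Layer 3: execute with parameters matching that schema
        result = await client.call_tool(
            "execute_operation",
            {
                "operation_id": "discover_subreddits",
                "parameters": {"query": "python web development", "limit": 10},
            },
        )
        print(ops, schema, result)


asyncio.run(demo())
```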