Technical Feasibility: AI Question Generation
Context: Building AI-powered technical assessment platform to compete with HackerRank/CodeSignal by generating unique coding questions on-demand.
Goal: Evaluate technical feasibility of generating high-quality, unique coding questions using AI to prevent memorization and cheating.
Executive Summary
Feasibility Verdict: ✅ HIGHLY FEASIBLE with current AI technology (GPT-4, Claude Opus, etc.)
Key Findings:
- Question Generation: Modern LLMs (GPT-4, Claude 3.5) can generate valid coding problems at 80-90% quality
- Uniqueness: Can generate virtually infinite variations, eliminating memorization
- Difficulty Calibration: Achievable with prompt engineering + feedback loops
- Cost: Affordable ($0.01-0.10 per question generated)
- Main Challenge: Quality validation requires human review initially, with automated checks taking over as the system matures
- Timeline: MVP in 3-6 months, production-ready in 6-12 months
Recommendation: Proceed with phased approach - start with curated AI-generated library, evolve to real-time generation.
Problem Statement
Current Limitations of Existing Platforms
HackerRank/CodeSignal Issues:
1. **Question Memorization**
   - 1,000-5,000 static questions
   - Candidates practice the same problems
   - Solutions leaked online (LeetCode, GitHub)
   - Cheating via memorization
2. **Stale Question Bank**
   - New questions are expensive and time-consuming to create
   - Questions rarely updated
   - Easy to find answers online
3. **Limited Customization**
   - Companies can't easily test specific technologies
   - Generic problems don't match real job requirements
   - Hard to create company-specific assessments
AI Solution Advantages
Dynamic Question Generation:
- Generate unique questions for each candidate
- Infinite variations of similar problems
- Impossible to memorize all questions
- Harder to share/cheat
Customization:
- Generate questions for any programming language
- Target specific concepts/technologies
- Match company tech stack
- Adjust difficulty on-the-fly
Cost & Speed:
- Generate questions in seconds vs hours/days
- No dedicated question writers needed (human review is still required at first)
- Scale infinitely without hiring
Technical Approach
1. Question Generation Pipeline
Architecture:
User Input (Topic, Difficulty, Language)
↓
Prompt Engineering Layer (System Prompts + Context)
↓
LLM API (GPT-4, Claude 3.5, or fine-tuned model)
↓
Question Generation (Problem + Test Cases + Solution)
↓
Validation Layer (Syntax Check, Test Case Execution)
↓
Quality Scoring (Automated Metrics + Human Review)
↓
Question Bank (Store if quality > threshold)
↓
Serve to Candidate
2. Prompt Engineering Strategy
Multi-Shot Prompting:
System Prompt:
"You are an expert coding interview question creator. Generate original,
high-quality coding problems suitable for technical interviews. Each question
must include:
1. Clear problem statement
2. Input/output examples
3. Constraints
4. 5-10 test cases (including edge cases)
5. Reference solution in [LANGUAGE]
6. Time/space complexity analysis
Follow these principles:
- Original problems (not variations of common problems)
- Clear, unambiguous descriptions
- Realistic constraints
- Solvable within 30-45 minutes
- Single optimal approach preferred
"
User Prompt:
"Generate a [DIFFICULTY] coding problem about [TOPIC] in [LANGUAGE].
Difficulty: [Easy/Medium/Hard]
Topic: [e.g., 'binary trees', 'dynamic programming', 'graph algorithms']
Language: [e.g., Python, Java, C++]
Constraints:
- Time limit: 45 minutes
- Target role: [e.g., 'Mid-level Software Engineer']
"
Few-Shot Examples:
Provide 3-5 high-quality example questions in the prompt to guide the model.
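A minimal sketch of the generation call using the Anthropic Python SDK; the model string, prompt text, and token limit are illustrative choices, not fixed values (the same shape works with the OpenAI SDK):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are an expert coding interview question creator. Generate original, "
    "high-quality coding problems with a problem statement, examples, "
    "constraints, 5-10 test cases, a reference solution, and complexity analysis."
)

def generate_question(topic: str, difficulty: str, language: str) -> str:
    """Ask the model for one question; returns the raw generated text."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # pin a version (see Risk 6: Model Drift)
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": (
                f"Generate a {difficulty} coding problem about {topic} "
                f"in {language}. Time limit: 45 minutes."
            ),
        }],
    )
    return response.content[0].text

# Example: generate_question("binary trees", "Medium", "Python")
```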
3. Question Components
Required Elements:
1. **Problem Statement**
   - Clear description
   - Real-world context (when applicable)
   - Input format specification
   - Output format specification
2. **Examples**
   - 2-3 worked examples
   - Edge cases demonstrated
   - Clear explanation of logic
3. **Constraints**
   - Input size limits (e.g., 1 ≤ n ≤ 10^5)
   - Time complexity expectations
   - Space complexity expectations
   - Language-specific constraints
4. **Test Cases**
   - Basic cases (2-3)
   - Edge cases (2-3)
   - Large inputs (1-2)
   - Hidden test cases (3-5 for anti-cheating)
5. **Reference Solution**
   - Optimal solution in target language
   - Time/space complexity analysis
   - Alternative approaches (optional)
6. **Metadata**
   - Difficulty (Easy/Medium/Hard)
   - Topics (tags: arrays, sorting, DP, etc.)
   - Estimated time
   - Success rate (tracked over time)
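To make the schema concrete, here is a minimal sketch of how these elements might be modeled; the field names are illustrative, not a prescribed format:

```python
from dataclasses import dataclass, field

@dataclass
class TestCase:
    input_data: str
    expected_output: str
    hidden: bool = False  # hidden cases run only on submission

@dataclass
class Question:
    title: str
    statement: str             # problem description + I/O format
    examples: list[str]        # 2-3 worked examples
    constraints: list[str]     # e.g., "1 <= n <= 10^5"
    test_cases: list[TestCase]
    reference_solution: str    # source code in the target language
    complexity: str            # e.g., "O(n log n) time, O(n) space"
    difficulty: str            # Easy / Medium / Hard
    topics: list[str] = field(default_factory=list)
    estimated_minutes: int = 45
```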
4. Difficulty Calibration
Approach: Multi-Dimensional Scoring
Factors Determining Difficulty:
1. **Algorithmic Complexity**
   - Easy: O(n) or O(n log n), single concept
   - Medium: O(n²), 2-3 concepts combined
   - Hard: O(n³) or complex DP, multiple concepts
2. **Problem Size**
   - Easy: <30 lines of code
   - Medium: 30-60 lines
   - Hard: >60 lines
3. **Concept Layering**
   - Easy: 1 concept (e.g., just sorting)
   - Medium: 2-3 concepts (sorting + binary search)
   - Hard: 3+ concepts (graph + DP + optimization)
4. **Edge Case Handling**
   - Easy: 1-2 edge cases
   - Medium: 3-4 edge cases
   - Hard: 5+ edge cases, tricky scenarios
Calibration Process:
1. Generate question with target difficulty
2. Run through test candidates (or simulate)
3. Measure:
   - Completion rate
   - Average time to solve
   - Test case pass rate
4. Adjust difficulty label if needed
5. Store calibrated question
Automated Difficulty Estimation:
Use ML model trained on existing questions to predict difficulty based on:
- Code complexity (cyclomatic complexity)
- Number of test cases
- Constraint ranges
- Topic combinations
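A minimal sketch of such an estimator, assuming scikit-learn and a hand-labeled set of existing questions; the feature vectors and training rows are illustrative only:

```python
from sklearn.ensemble import RandomForestClassifier

# Features per question: [cyclomatic complexity of reference solution,
#                          number of test cases, log10(max input size),
#                          number of topic tags]
X_train = [
    [3, 6, 3, 1],    # labeled Easy
    [7, 9, 5, 2],    # labeled Medium
    [12, 12, 6, 3],  # labeled Hard
]
y_train = ["Easy", "Medium", "Hard"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Estimate the difficulty of a newly generated question from its features
predicted = model.predict([[8, 10, 5, 2]])
print(predicted[0])
```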
5. Quality Validation
Automated Checks (Must Pass):
1. **Syntax Validation**
   - Problem statement is complete
   - Code syntax is valid
   - Test cases are well-formed
2. **Solution Verification**
   - Reference solution passes all test cases
   - Solution runs within time/space constraints
   - No runtime errors
3. **Test Case Quality**
   - Test cases cover edge cases
   - At least 1 large-input test
   - No duplicate test cases
4. **Uniqueness Check**
   - Embedding similarity to existing questions <0.8
   - Not a direct copy of known problems
   - Variation is meaningful
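A minimal sketch of the embedding-based uniqueness check, assuming embeddings are already computed (e.g., via an embedding API) and compared in memory; at scale a vector DB such as Pinecone or Weaviate would handle the search:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_unique(candidate: np.ndarray, existing: list[np.ndarray],
              threshold: float = 0.8) -> bool:
    """Reject the candidate if it is too similar to any stored question."""
    return all(cosine_similarity(candidate, e) < threshold for e in existing)
```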
Quality Scoring (0-100):
def quality_score(syntax_valid, solution_correctness,
                  test_case_coverage, clarity, uniqueness):
    # Each factor is normalized to [0, 1]; weights sum to 100.
    return (syntax_valid * 20
            + solution_correctness * 30
            + test_case_coverage * 20
            + clarity * 15
            + uniqueness * 15)

# Only store the question if quality_score(...) > 70
Human Review (Initially):
- Review first 100-200 questions manually
- Flag issues, refine prompts
- Build training dataset for automated scoring
- Gradually reduce human review %
6. Anti-Cheating Mechanisms
Problem Variation Techniques:
1. **Parameter Randomization**
   - Same problem structure, different numbers
   - Example: "Find sum of array" → randomize array size and values (see the sketch after this list)
2. **Context Switching**
   - Same algorithm, different story
   - Example: Graph traversal as "city routes" or "social network"
3. **Constraint Tweaking**
   - Adjust input size, time limits
   - Changes the optimal solution approach
4. **Test Case Uniqueness**
   - Each candidate gets different test cases
   - Same problem, different validation
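A minimal sketch of parameter randomization, assuming a per-candidate seed so each candidate receives a distinct but reproducible problem instance:

```python
import random

def randomize_array_problem(candidate_id: str, base_seed: int = 2024) -> dict:
    """Produce a per-candidate variant of a 'find the sum of an array' problem."""
    rng = random.Random(f"{base_seed}:{candidate_id}")    # deterministic per candidate
    n = rng.randint(50, 200)                              # randomized array size
    values = [rng.randint(-1000, 1000) for _ in range(n)] # randomized values
    return {
        "input": values,
        "expected": sum(values),  # ground truth computed from the instance itself
    }
```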
Example - Same Problem, 3 Variations:
Variation 1: "Find longest increasing subsequence in array of stock prices"
Variation 2: "Find longest chain of friends where each friend is older than previous"
Variation 3: "Find longest sequence of building heights that increase"
All three variations require the same algorithm (longest increasing subsequence, LIS), but the surrounding context differs.
Hidden Test Cases:
- Candidate sees 2-3 sample test cases
- 5-10 hidden test cases run on submission
- Prevents hard-coding solutions
- Must solve algorithmically
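A minimal sketch of scoring a submission against visible and hidden cases; the data shapes here are illustrative assumptions, not a fixed format:

```python
def evaluate_submission(outputs: dict[str, str], test_cases: list[dict]) -> dict:
    """Score a run; `outputs` maps test-case id -> stdout from the sandbox."""
    report = {"visible": [], "hidden_passed": 0, "hidden_total": 0}
    for tc in test_cases:
        passed = outputs.get(tc["id"], "").strip() == tc["expected"].strip()
        if tc.get("hidden"):
            # hidden cases count toward the score but are never shown
            report["hidden_total"] += 1
            report["hidden_passed"] += int(passed)
        else:
            report["visible"].append({"id": tc["id"], "passed": passed})
    return report
```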
7. Real-Time vs Pre-Generated
Two Approaches:
A. Pre-Generated Library (Recommended for MVP)
Process:
- Generate 10,000-50,000 questions offline
- Validate and score all questions
- Store in database with metadata
- Serve random question per candidate
Pros:
- Quality controlled upfront
- Fast serving (no generation latency)
- Lower cost (batch generation)
- Predictable performance
Cons:
- Still finite set (though very large)
- Need periodic regeneration
- Storage costs
B. Real-Time Generation (Future State)
Process:
- Candidate starts assessment
- Generate question on-demand
- Quick validation (automated only)
- Serve to candidate
- Store for future use if quality good
Pros:
- Truly infinite variations
- Always fresh questions
- No memorization possible
Cons:
- Generation latency (3-10 seconds)
- Higher API costs
- Quality variance risk
- Requires robust automated validation
Recommended Hybrid:
- Start with pre-generated library (10K questions)
- Generate new variations in real time for repeat candidates
- Continuously expand library with validated real-time generations
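A minimal sketch of this hybrid serving logic; `generate_fn` is a hypothetical real-time generator callable, not a fixed API:

```python
import random

def serve_question(library: list[dict], generate_fn=None) -> dict:
    """Serve a question, preferring real-time generation when available."""
    if generate_fn is not None:
        try:
            question = generate_fn()      # real-time generation attempt
            if question is not None:
                library.append(question)  # validated output expands the library
                return question
        except Exception:
            pass                          # any failure falls back to the library
    return random.choice(library)         # pre-generated fallback
```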
Technology Stack
1. LLM Options
Option A: OpenAI GPT-4 / GPT-4 Turbo
Pros:
- Excellent code generation quality
- Good at following complex instructions
- 128K context window
- Relatively fast (2-5 sec response)
Cons:
- Expensive ($0.01-0.03 per request)
- Rate limits (10K RPM on paid tier)
- Privacy concerns (data sent to OpenAI)
Cost: ~$0.02 per question generated
Option B: Anthropic Claude 3.5 Sonnet / Opus
Pros:
- Excellent reasoning and code quality
- 200K context window
- Better at following complex constraints
- More nuanced responses
Cons:
- Similar pricing to GPT-4
- Rate limits
Cost: ~$0.015-0.03 per question (Sonnet cheaper)
Option C: Open-Source Models (Llama 3, Mixtral, CodeLlama)
Pros:
- Self-hosted (data privacy)
- No per-request cost after infrastructure
- Customizable via fine-tuning
- No rate limits
Cons:
- Lower quality than GPT-4/Claude initially
- Requires GPU infrastructure ($500-2000/month)
- Fine-tuning effort needed
- Slower iteration
Cost: $0.001-0.005 per question (after infrastructure amortized)
Option D: Fine-Tuned Model
Process:
- Start with GPT-4/Claude to generate 10K questions
- Human review and curate best 1K-2K questions
- Fine-tune Llama 3 70B on curated dataset
- Use fine-tuned model for production
Pros:
- Best quality-to-cost ratio long-term
- Full control over model
- Data privacy
- Can optimize for specific question types
Cons:
- Requires upfront investment (GPU, training time)
- Need quality training data (1K+ examples)
- Maintenance overhead
Cost: $2K-5K upfront, then $0.001-0.003 per question
Recommendation: Start with Claude 3.5 Sonnet (best quality/cost), migrate to fine-tuned Llama 3 after 10K+ curated questions.
2. Infrastructure
Question Generation Service:
┌─────────────────────────────────────────┐
│ Question Generation API │
├─────────────────────────────────────────┤
│ - FastAPI / Express.js │
│ - LLM API integration (Claude/GPT-4) │
│ - Prompt management │
│ - Validation pipeline │
│ - Quality scoring │
└─────────────────────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ Code Execution Sandbox │
├─────────────────────────────────────────┤
│ - Docker containers │
│ - Judge0 / Piston API │
│ - Solution verification │
│ - Test case execution │
└─────────────────────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ Question Database │
├─────────────────────────────────────────┤
│ - PostgreSQL (metadata) │
│ - MongoDB (question content) │
│ - Vector DB (embeddings for similarity)│
│ - Redis (caching) │
└─────────────────────────────────────────┘
Stack Recommendations:
- Backend: Python (FastAPI) or Node.js (Express)
- LLM API: Claude API or OpenAI API
- Code Execution: Judge0 API (self-hosted) or Piston API
- Database: PostgreSQL + MongoDB
- Vector DB: Pinecone or Weaviate (for similarity search)
- Cache: Redis
- Hosting: AWS / GCP / Fly.io
- Monitoring: Sentry, DataDog
3. Code Execution Environment
Requirements:
- Secure sandboxing (prevent malicious code)
- Multiple language support (Python, Java, C++, JS, Go, etc.)
- Fast execution (1-5 seconds per test case)
- Resource limits (CPU, memory, time)
Options:
A. Judge0 (Recommended)
- Open-source online judge
- 60+ languages supported
- Docker-based sandboxing
- Self-hostable or API
- $0.01-0.05 per execution (API) or $100-300/month (self-hosted)
B. Piston API
- Lightweight code execution engine
- 45+ languages
- Simple API
- Free tier: 1000 executions/day
- Paid: $20-100/month
C. AWS Lambda + Firecracker
- Build custom execution environment
- Full control
- Higher development cost
- ~$0.20 per 1M executions
Recommendation: Start with Piston API (cheap, simple), migrate to self-hosted Judge0 at scale.
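A minimal sketch of solution verification via Piston's public API; the endpoint, payload shape, and runtime version follow Piston's published docs at the time of writing and should be verified against the current API:

```python
import requests

PISTON_URL = "https://emkc.org/api/v2/piston/execute"  # public instance

def run_solution(source: str, stdin: str, language: str = "python",
                 version: str = "3.10.0") -> str:
    """Execute code against one test input in the sandbox; return stdout."""
    payload = {
        "language": language,
        "version": version,
        "files": [{"name": "main.py", "content": source}],
        "stdin": stdin,
    }
    resp = requests.post(PISTON_URL, json=payload, timeout=15)
    resp.raise_for_status()
    return resp.json()["run"]["stdout"]

def verify(reference: str, inputs: list[str], expected: list[str]) -> bool:
    """The reference solution must reproduce every expected output."""
    return all(run_solution(reference, i).strip() == e.strip()
               for i, e in zip(inputs, expected))
```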
Implementation Phases
Phase 1: MVP (3 months)
Goal: Prove concept with 1000 high-quality AI-generated questions
Tasks:
1. **Prompt Engineering** (2 weeks)
   - Develop question generation prompts
   - Test with GPT-4 / Claude
   - Iterate on quality
2. **Generation Pipeline** (3 weeks)
   - Build API for question generation
   - Integrate Claude API
   - Automated syntax validation
3. **Code Execution** (3 weeks)
   - Integrate Piston API
   - Solution verification system
   - Test case execution
4. **Quality Control** (2 weeks)
   - Human review workflow
   - Quality scoring system
   - Feedback loop to improve prompts
5. **Question Library** (2 weeks)
   - Generate 1000 questions (Easy: 400, Medium: 400, Hard: 200)
   - Manual review and curation
   - Store in database
6. **Basic Assessment UI** (2 weeks)
   - Simple test interface
   - Code editor integration
   - Submit and evaluate
Deliverable: 1000 curated AI-generated questions, working assessment platform
Cost: $2K-5K (mostly LLM API costs)
Phase 2: Scaling (3 months)
Goal: 10,000 questions, automated quality validation
Tasks:
1. **Automated Quality Scoring** (4 weeks)
   - Train ML model on the curated 1K questions
   - Automated difficulty prediction
   - Reduce human review to 10%
2. **Generate 10K Questions** (4 weeks)
   - Batch generation with automated validation
   - Store in database
   - Keep only questions with quality score >70
3. **Anti-Cheating Features** (3 weeks)
   - Parameter randomization
   - Hidden test cases
   - Question variation system
4. **Analytics Dashboard** (2 weeks)
   - Question performance metrics
   - Candidate success rates
   - Difficulty calibration feedback
5. **Testing & Iteration** (3 weeks)
   - Beta test with real candidates
   - Gather feedback
   - Refine questions based on data
Deliverable: 10K validated questions, automated quality system, production-ready platform
Cost: $10K-20K (LLM API + infrastructure)
Phase 3: Real-Time Generation (3-6 months)
Goal: Real-time question generation, infinite variations
Tasks:
1. **Real-Time Generation API** (4 weeks)
   - Low-latency generation (3-5 sec)
   - Caching layer
   - Fallback to library if generation fails
2. **Fine-Tuned Model** (8 weeks)
   - Curate the 2K best questions from the library
   - Fine-tune Llama 3 70B
   - Deploy on GPU infrastructure
   - Compare quality against Claude
3. **Advanced Anti-Cheating** (4 weeks)
   - Session-specific questions
   - Real-time variation generation
   - Question versioning
4. **Performance Optimization** (3 weeks)
   - Reduce generation latency
   - Optimize code execution
   - Scale infrastructure
Deliverable: Real-time question generation, fine-tuned model, fully scaled platform
Cost: $20K-40K (GPU infrastructure + training)
Cost Analysis
Per-Question Generation Costs
| Model | Cost per Question | Quality | Speed |
|---|---|---|---|
| GPT-4 | $0.02-0.03 | Excellent | 3-5 sec |
| Claude Opus | $0.03-0.05 | Excellent | 3-5 sec |
| Claude Sonnet | $0.015-0.02 | Very Good | 2-4 sec |
| Llama 3 70B (self-hosted) | $0.001-0.003 | Good-Very Good | 5-10 sec |
| Fine-tuned Llama 3 | $0.002-0.005 | Very Good-Excellent | 5-10 sec |
Infrastructure Costs (Monthly)
| Service | MVP | Scale (10K questions) | Production (Real-time) |
|---|---|---|---|
| LLM API | $100-200 | $500-1000 | $2K-5K |
| Code Execution | $20-50 | $100-300 | $500-1K |
| Database | $25-50 | $100-200 | $300-500 |
| Hosting | $50-100 | $200-400 | $500-1K |
| GPU (fine-tuned) | $0 | $0 | $1K-2K |
| Total | $200-400 | $900-1.9K | $4.3K-9.5K |
Total Investment by Phase
| Phase | Duration | Dev Cost | Infra Cost | Total |
|---|---|---|---|---|
| Phase 1 (MVP) | 3 months | $15K-25K | $600-1.2K | $15.6K-26.2K |
| Phase 2 (Scale) | 3 months | $20K-35K | $2.7K-5.7K | $22.7K-40.7K |
| Phase 3 (Real-time) | 3-6 months | $30K-50K | $12.9K-57K | $42.9K-107K |
| Total (12 months) | 12 months | $65K-110K | $16.2K-63.9K | $81.2K-173.9K |
Note: Dev costs assume 1 full-time engineer at $5K-9K/month fully-loaded.
ROI Analysis
Assumptions:
- Platform serves 100 companies
- Average $500/month per company
- Monthly revenue: $50K
Break-even:
- Phase 1: Month 1 (ROI positive immediately)
- Phase 2: Month 3-4
- Phase 3: Month 6-8
Unit Economics:
- Cost per assessment: $0.50-2.00 (including question generation, code execution)
- Revenue per assessment: $10-50 (depending on pricing tier)
- Gross margin: 80-96%
Quality Assurance
Validation Metrics
Track These Metrics:
1. **Generation Success Rate**
   - % of generations that pass automated checks
   - Target: >90%
2. **Quality Score Distribution**
   - Average quality score of generated questions
   - Target: >75/100
3. **Candidate Success Rate**
   - % of candidates who pass questions
   - By difficulty: Easy >70%, Medium 30-50%, Hard <20%
4. **Time to Solve**
   - Average time per difficulty
   - Target: Easy <20 min, Medium 30-40 min, Hard 45-60 min
5. **Test Case Coverage**
   - % of edge cases covered
   - Target: >90%
6. **Uniqueness Score**
   - Similarity to existing questions (embedding distance)
   - Target: <0.75 cosine similarity
A/B Testing
Compare AI-Generated vs Human-Written:
- Run parallel tests with both question types
- Measure candidate performance, satisfaction
- Adjust prompts based on results
Metrics to Compare:
- Candidate NPS (Net Promoter Score)
- Completion rate
- Time to solve
- Quality feedback
Risk Analysis & Mitigation
Risk 1: Low-Quality Questions ⚠️ HIGH IMPACT
Risk: AI generates unclear, incorrect, or trivial questions
Probability: Medium (30-40% without validation)
Mitigation:
- Multi-layer validation (syntax → execution → quality score → human review)
- Start with human review on all questions
- Build training dataset from good examples
- Automated quality scoring based on historical data
- Continuous feedback loop
Fallback: Maintain library of human-written questions as backup
Risk 2: Difficulty Calibration ⚠️ MEDIUM IMPACT
Risk: Questions labeled "Medium" are actually "Hard" or vice versa
Probability: Medium-High (40-50%)
Mitigation:
- Test questions on sample candidates before production
- Track success rates and adjust labels
- Use ML model to predict difficulty from question features
- Provide difficulty range (e.g., "Medium-Hard")
Fallback: Let candidates rate difficulty, adjust labels based on feedback
Risk 3: Cheating via AI Assistants ⚠️ MEDIUM IMPACT
Risk: Candidates use ChatGPT/Claude to solve questions
Probability: High (60%+, but not unique to AI-generated questions)
Mitigation:
- Proctoring (webcam, screen recording)
- Time pressure (harder to copy-paste to ChatGPT)
- Unique question variations (AI assistant gets different version)
- Code style analysis (detect AI-written code patterns)
Fallback: Focus on real-time live interviews after initial screening
Risk 4: API Costs Spiral ⚠️ LOW-MEDIUM IMPACT
Risk: LLM API costs exceed budget
Probability: Low (10-20% with proper planning)
Mitigation:
- Set budget caps on API keys
- Monitor usage closely
- Batch generation (cheaper)
- Migrate to self-hosted model at scale
- Cache generated questions
Fallback: Use pre-generated library exclusively, regenerate monthly
Risk 5: Generation Latency ⚠️ LOW IMPACT
Risk: Real-time generation too slow (10+ seconds)
Probability: Low (10-15% with GPT-4/Claude)
Mitigation:
- Use faster models (Claude Sonnet, GPT-4 Turbo)
- Pre-generate questions during off-peak
- Hybrid approach (serve from library, generate in background)
- Optimize prompts for speed
Fallback: Revert to pre-generated library only
Risk 6: Model Drift ⚠️ LOW IMPACT
Risk: LLM provider changes model, quality degrades
Probability: Medium (30%, happens every 6-12 months)
Mitigation:
- Pin specific model versions in API calls
- Test new models before migration
- Have backup LLM provider (Claude + GPT-4)
- Self-hosted option as long-term solution
Fallback: Switch to alternative provider or self-hosted model
Success Criteria
MVP Success (Phase 1)
- ✅ Generate 1000 high-quality questions (quality score >70)
- ✅ 80%+ of questions solvable by target candidates
- ✅ <5% of questions require human fixing
- ✅ Candidate feedback score >4/5
Scale Success (Phase 2)
- ✅ 10,000 questions generated with <10% human review
- ✅ Automated quality scoring 90%+ accurate
- ✅ Difficulty calibration within ±1 level
- ✅ Anti-cheating mechanisms effective (<10% cheating rate)
Production Success (Phase 3)
- ✅ Real-time generation in <5 seconds
- ✅ Fine-tuned model matches or beats Claude quality
- ✅ Cost per question <$0.005
- ✅ Platform serves 100+ companies
- ✅ Candidate NPS >40
Competitive Analysis
CodeSignal AI Approach
What They Do:
- AI Interviewer (conversational AI for screening)
- AI Tutoring (Cosmo)
- AI Proctoring
- NOT real-time question generation (still static library)
Opportunity: You can differentiate with AI question generation (they don't have this)
HackerRank AI Approach
What They Do:
- AI-powered analytics
- Some AI-assisted question creation (internal)
- NOT candidate-facing AI generation
Opportunity: First-mover advantage in AI-generated assessments
Market Gap
Current State:
- All major players use static question libraries
- Some variation (parameter tweaking) but not true generation
- Question memorization still possible
Your Advantage:
- True AI generation = infinite questions
- Impossible to memorize
- First-to-market with this approach
- Unique selling proposition
Recommendations
Immediate Next Steps (Weeks 1-4)
1. **Prototype Generation** (Week 1)
   - Set up Claude API account
   - Write initial prompts
   - Generate 50 test questions
   - Manual quality review
2. **Validation System** (Week 2)
   - Integrate Piston API
   - Build solution verification
   - Test on generated questions
3. **Iteration** (Week 3)
   - Refine prompts based on quality
   - Test different difficulty levels
   - Experiment with topics
4. **Decision Point** (Week 4)
   - If quality >70%: Proceed to Phase 1 (MVP)
   - If quality 50-70%: More iteration needed
   - If quality <50%: Reconsider approach or try a different model
Phased Approach (Recommended)
Phase 1 (Months 1-3): Pre-Generated Library
- Generate 1K-10K questions offline
- Human review and curate
- Build assessment platform
- Launch MVP with static (but AI-generated) library
Phase 2 (Months 4-6): Automated Quality
- Reduce human review to 10%
- Scale to 10K+ questions
- Implement anti-cheating
- Beta test with real companies
Phase 3 (Months 7-12): Real-Time + Fine-Tuning
- Real-time generation for power users
- Fine-tune custom model
- Infinite variations
- Full production scale
Technology Choices
For MVP:
- ✅ Claude 3.5 Sonnet (best quality/cost balance)
- ✅ Piston API (code execution)
- ✅ PostgreSQL + MongoDB (database)
- ✅ FastAPI or Express.js (backend)
For Scale:
- ✅ Fine-tuned Llama 3 70B (cost optimization)
- ✅ Self-hosted Judge0 (scale code execution)
- ✅ Vector DB for similarity search
Conclusion
Technical Feasibility: ✅ HIGHLY FEASIBLE
Key Strengths:
- Technology Maturity - LLMs (GPT-4, Claude) proven for code generation
- Cost Effective - $0.01-0.03 per question, decreasing to $0.001-0.005 with scale
- Quality - 80-90% quality achievable with good prompts
- Differentiation - No major competitor doing this yet
- Scalability - Can generate millions of questions
Key Challenges:
- Initial Quality Control - Needs human review initially (100-200 hours)
- Difficulty Calibration - Requires testing and feedback loops
- Anti-Cheating - Need robust proctoring (but not unique to AI questions)
- Model Dependency - Reliant on external APIs initially
Risk Level: LOW-MEDIUM
Recommendation: ✅ PROCEED with phased approach
- Start with pre-generated library (de-risk quality)
- Prove market fit before investing in real-time generation
- Migrate to fine-tuned model at scale for cost optimization
- Strong competitive moat if executed well
Estimated Timeline: 6-12 months to production-ready platform with 10K+ AI-generated questions.
Estimated Investment: $80K-170K (including 1 FTE engineer for 12 months + infrastructure)
ROI: Break-even in 3-6 months at 100 company customers paying $500/month average.
Related Research
- CodeSignal Analysis - AI features competitive analysis
- HackerRank Analysis - Market leader static library
- Technical Hiring Market - $8-10B TAM
- AI Coding Test Platform - Product concept
Appendix: Example Generated Question
Generated by Claude 3.5 Sonnet with custom prompt:
**Title:** Efficient Cache with TTL
**Difficulty:** Medium
**Topics:** Hash Maps, Data Structures, Time Complexity
**Problem Statement:**
Design and implement a cache system that stores key-value pairs with
a time-to-live (TTL) mechanism. The cache should automatically expire
entries after their TTL has passed.
Implement the following operations:
1. `put(key, value, ttl)` - Store a key-value pair with TTL in seconds
2. `get(key)` - Retrieve value if not expired, return -1 if expired or not found
3. `delete(key)` - Remove a key-value pair from cache
4. `cleanup()` - Remove all expired entries
**Constraints:**
- 1 ≤ key, value ≤ 10^6
- 1 ≤ ttl ≤ 10^4
- At most 1000 operations
- get/put operations should be O(1) average case
- cleanup should be O(n) where n is number of entries
**Example:**
Input:
cache = Cache()
cache.put(1, 100, 5) # key=1, value=100, ttl=5 seconds
cache.get(1) # returns 100 (within TTL)
wait(6 seconds)
cache.get(1) # returns -1 (expired)
**Test Cases:**
[Generated automatically with edge cases]
**Reference Solution:**
[Python solution with time/space complexity analysis]
**Time Complexity:** O(1) for get/put, O(n) for cleanup
**Space Complexity:** O(n) where n is number of entries
Quality Assessment: ✅ High quality, clear, solvable, original
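For illustration, a minimal sketch of one possible reference solution to this problem (the actual generated solution is elided above); it uses lazy expiry on reads so get/put stay O(1) average:

```python
import time

class Cache:
    """Key-value cache with per-entry TTL.

    get/put are O(1) average case; cleanup is O(n) in the number of entries.
    """

    def __init__(self):
        self._store = {}  # key -> (value, expiry_timestamp)

    def put(self, key, value, ttl):
        self._store[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return -1
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # lazy expiry on read
            return -1
        return value

    def delete(self, key):
        self._store.pop(key, None)

    def cleanup(self):
        now = time.time()
        expired = [k for k, (_, exp) in self._store.items() if now >= exp]
        for k in expired:
            del self._store[k]
```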