Pricing / Costs
What would be the monthly cost of hosting Llama 3B (3 billion parameters), or a similar multimodal model, and using its exposed APIs, initially for development purposes?
Questions
- What is the expected scale of API usage (requests per month)?
- What are the primary use cases for the Llama 3B model?
- Are there specific latency or performance requirements?
- What is the target region for hosting the model?
- What type of data will the model interact with (text, images, etc.)?
- Is fine-tuning required, or will pre-trained weights suffice?
- Do you need multi-modal support from day one, or will it scale later?
- Do you need private hosting, or is a managed cloud solution acceptable? If cloud hosting is acceptable, are there specific cloud provider requirements?
- What level of uptime and SLA is expected?
- Will the model be available 24x7, or should resources be shut down at night?
- Is cost optimization a priority over high availability initially?
- What are the preferred technology stack and integration requirements?
- How critical is API rate limiting or user access control?
- Will additional services like analytics or monitoring be required?
- Do you foresee a need for scaling storage or additional GPUs?
- Are there specific compliance or security requirements (e.g., GDPR)?
Costs
Assumptions
- Model Size:
  - Llama 3B parameters: ~6GB (weights at FP16, 2 bytes per parameter).
  - Additional memory for activations and headroom: ~10GB.
  - Total GPU memory needed: ~16GB.
- Compute Requirements:
  - GPU Type: NVIDIA A10G, A100 (40GB), or equivalent.
  - vCPUs: 4.
  - RAM: 32GB.
- Usage:
  - Active usage: 8 hours/day (development phase).
  - Idle usage: 16 hours/day with reduced compute.
  - 30 days/month.
- Storage:
  - Model checkpoint storage: 100GB (including versioned checkpoints).
  - Persistent SSD for API hosting: 50GB.
- Networking:
  - 2TB egress bandwidth.
- Other Costs:
  - API Gateway: $50/month for limited development use.
  - Monitoring: $30/month.
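The sizing assumptions above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming 2 bytes per parameter (FP16), which is what the ~6GB weight figure implies; the function and constant names are illustrative and not part of any tool.

```python
# Sketch of the sizing assumptions; all names here are hypothetical.
PARAMS = 3e9            # 3 billion parameters
BYTES_PER_PARAM = 2     # assumption: FP16 weights (2 bytes each)
OVERHEAD_GB = 10        # activations and headroom, from the assumptions list


def gpu_memory_gb(params=PARAMS, bytes_per_param=BYTES_PER_PARAM,
                  overhead_gb=OVERHEAD_GB):
    """Return the estimated GPU memory in GB (decimal, 1 GB = 1e9 bytes)."""
    weights_gb = params * bytes_per_param / 1e9
    return weights_gb + overhead_gb


DAYS_PER_MONTH = 30
ACTIVE_HOURS = 8 * DAYS_PER_MONTH    # GPU instance running
IDLE_HOURS = 16 * DAYS_PER_MONTH     # reduced (CPU-only) compute

print(f"GPU memory: {gpu_memory_gb():.0f} GB")
print(f"Active hours/month: {ACTIVE_HOURS}, idle hours/month: {IDLE_HOURS}")
```

Under these assumptions the month splits into 240 active hours and 480 idle hours, which are the multipliers for the hourly rates in Step 1.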
Step 1: Compute Costs
AWS, GCP, and Azure GPU Instance Pricing:
- AWS: p3.2xlarge (1 NVIDIA V100 GPU) = ~$3.06/hour.
- GCP: A100-1 GPU (A100 40GB) = ~$2.91/hour.
- Azure: Standard_NC6s_v3 (1 NVIDIA V100 GPU) = ~$2.60/hour.
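To turn these hourly rates into a monthly figure, multiply by the active hours from the assumptions (8 hours/day for 30 days). A minimal sketch, using the approximate rates listed above; the dictionary and function names are illustrative.

```python
# Approximate on-demand GPU rates (USD/hour) as listed in this section.
GPU_RATES = {"AWS": 3.06, "GCP": 2.91, "Azure": 2.60}

ACTIVE_HOURS = 8 * 30  # 8 hours/day, 30 days/month


def monthly_gpu_cost(rate_per_hour, hours=ACTIVE_HOURS):
    """Monthly cost in USD for a GPU instance billed only while active."""
    return round(rate_per_hour * hours, 2)


for provider, rate in GPU_RATES.items():
    print(f"{provider}: ${monthly_gpu_cost(rate):.2f}/month")
# AWS: $734.40/month
# GCP: $698.40/month
# Azure: $624.00/month
```

These figures assume the GPU instance is fully stopped outside the 8 active hours; an instance left running around the clock would cost three times as much.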
Idle Instance Costs (CPU only):
- AWS: m5.xlarge = ~$0.192/hour.
- GCP: n1-standard-4 = ~$0.152/hour.
- Azure: Standard_D4_v3 = ~$0.152/hour.
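The idle-period cost follows the same pattern, using the 16 idle hours/day from the assumptions. A minimal sketch with the approximate CPU-only rates listed above; names are illustrative.

```python
# Approximate CPU-only instance rates (USD/hour) as listed in this section.
IDLE_RATES = {"AWS": 0.192, "GCP": 0.152, "Azure": 0.152}

IDLE_HOURS = 16 * 30  # 16 hours/day, 30 days/month


def monthly_idle_cost(rate_per_hour, hours=IDLE_HOURS):
    """Monthly cost in USD for the CPU-only instance used during idle hours."""
    return round(rate_per_hour * hours, 2)


for provider, rate in IDLE_RATES.items():
    print(f"{provider}: ${monthly_idle_cost(rate):.2f}/month")
# AWS: $92.16/month
# GCP: $72.96/month
# Azure: $72.96/month
```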
Step 2: Storage Costs
- AWS S3/GP2, GCP Persistent Disk, Azure Premium SSD: ~$0.10/GB/month.
- Total storage: 150GB = $15/month.
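The storage total is the sum of the two volumes from the assumptions at the flat ~$0.10/GB/month rate. A minimal sketch; the flat rate is this section's approximation across providers, not a quoted price, and the names are illustrative.

```python
# Storage volumes (GB) from the assumptions list.
STORAGE_GB = {
    "model_checkpoints": 100,  # including versioned checkpoints
    "api_host_ssd": 50,        # persistent SSD for API hosting
}

RATE_PER_GB_MONTH = 0.10  # approximate flat rate used in this section


def monthly_storage_cost(volumes_gb, rate=RATE_PER_GB_MONTH):
    """Monthly storage cost in USD for the given volumes."""
    return round(sum(volumes_gb.values()) * rate, 2)


total_gb = sum(STORAGE_GB.values())
print(f"{total_gb} GB -> ${monthly_storage_cost(STORAGE_GB):.2f}/month")
# 150 GB -> $15.00/month
```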