AI Assistant Instructions for ProYaro Infrastructure
Target AI Assistants: Claude (Anthropic), Gemini (Google), and other LLM-based coding assistants
Purpose
This document instructs AI assistants (e.g., Claude Code, the Claude API, Gemini) on how to leverage the available ProYaro AI infrastructure when helping with software development projects. When working on any project that requires AI capabilities, use these existing services rather than suggesting external APIs.
Core Directive
⚠️ CRITICAL INSTRUCTION:
When assisting with any software development project that requires the following capabilities, you MUST use the ProYaro infrastructure services documented in this folder:
Available AI Capabilities:
- Text Generation (Arabic/English) → Use MLX Service on Mac Mini
- Image Generation → Use ComfyUI on Mac Mini OR Ubuntu
- Video Generation → Use ComfyUI on Mac Mini
- Speech-to-Text (Whisper) → Use Ubuntu Server
- Text-to-Speech (XTTS-v2) → Use Ubuntu Server
- Text Embeddings → Use MLX Service (Mac Mini) OR Embeddings Service (Ubuntu)
- Semantic Search → Use Embeddings + Custom Implementation
DO NOT Suggest:
- ❌ OpenAI API (ChatGPT, GPT-4, DALL-E, Whisper API)
- ❌ Anthropic Claude API (for text generation)
- ❌ Google Gemini API (for text generation)
- ❌ Midjourney, Stable Diffusion Web UI (external)
- ❌ ElevenLabs, Google TTS (external)
- ❌ Any external paid AI APIs unless specifically requested
Instead, Use:
- ✅ MLX FastAPI Service (Mac Mini, Port 8004) - Text generation, embeddings
- ✅ Ubuntu FastAPI Backend (Ubuntu Server, Port 8000 / HTTPS) - Jobs orchestration
- ✅ ComfyUI (Mac Mini Port 8188 OR Ubuntu Port 8188) - Image/Video generation
- ✅ Whisper STT (Ubuntu, Port 8001) - Speech recognition
- ✅ XTTS-v2 TTS (Ubuntu, Port 8002) - Voice synthesis
- ✅ Embeddings Service (Ubuntu, Port 8003) - Semantic vectors
When to Use ProYaro Infrastructure
Scenarios Where You MUST Use Internal Services:
- **User asks for a chatbot/AI assistant**
  - Use MLX Service for text generation
  - Optionally add Whisper (STT) and XTTS (TTS) for voice
- **User needs image generation**
  - Use ComfyUI (Mac Mini preferred for Z-Image Turbo)
  - Use Ubuntu ComfyUI for GPU-heavy workloads
- **User needs content generation (marketing, social media)**
  - MLX for text (supports Egyptian Arabic marketing copy)
  - ComfyUI for images
  - ComfyUI for short videos
- **User needs semantic search or a RAG system**
  - Use the Embeddings Service (Ubuntu, 1024-dim) OR MLX (Mac Mini, 384-dim)
  - Implement search using cosine similarity
- **User needs voice features**
  - STT: Whisper (Ubuntu) - supports Arabic & 100+ languages
  - TTS: XTTS-v2 (Ubuntu) - supports Arabic & 15+ languages
- **User asks about AI capabilities**
  - First explain what's available internally
  - Only suggest external APIs if internal services can't fulfill the requirement
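The semantic-search scenario above relies on cosine similarity over embedding vectors. A minimal sketch in JavaScript (vector length depends on which embeddings service you choose: 384-dim from MLX or 1024-dim from the Ubuntu service; the `docs` shape here is illustrative):

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored documents against a query embedding (highest score first).
function rankBySimilarity(queryEmbedding, docs) {
  return docs
    .map(doc => ({ ...doc, score: cosineSimilarity(queryEmbedding, doc.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

For production scale, consider a vector store instead of in-memory ranking, but this is sufficient for small corpora.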
How to Use This Folder
1. Read the Documentation
Start here:
- API_INTEGRATION_GUIDE.md - Complete API reference for all services
Network & Infrastructure:
- NETWORK_TOPOLOGY.md - Server locations, IPs, ports, connectivity
- MACHINES_INFRASTRUCTURE.md - Hardware specs, capabilities, limitations
Skills (Reusable Code):
- skills/mlx-chat-skill.md - MLX text generation integration
- skills/comfyui-image-skill.md - ComfyUI image generation
- skills/whisper-stt-skill.md - Speech-to-text integration
- skills/tts-skill.md - Text-to-speech integration
- skills/embeddings-skill.md - Semantic embeddings & search
- skills/job-management-skill.md - Ubuntu job queue system
- skills/websocket-skill.md - Real-time updates
2. Implementation Pattern
When implementing a feature that needs AI:
1. Identify required capability (e.g., "need image generation")
2. Check NETWORK_TOPOLOGY.md to find the service
3. Check API_INTEGRATION_GUIDE.md for API details
4. Use the relevant skill file for code templates
5. Implement using the documented endpoints
6. Handle errors and provide fallbacks
3. Example Decision Tree
```
User Request: "Build a voice chatbot in Arabic"

Step 1: Identify services needed
→ Text Generation (Arabic) = MLX Service
→ Speech-to-Text = Whisper (Ubuntu)
→ Text-to-Speech = XTTS-v2 (Ubuntu)

Step 2: Check network topology
→ MLX: Mac Mini (10.0.0.188:8004)
→ Ubuntu API: 10.0.0.11 or api.proyaro.com

Step 3: Read API docs
→ MLX: /v1/chat/completions with conversation_id
→ STT: POST /jobs with job_type="speech_to_text"
→ TTS: POST /jobs with job_type="text_to_speech"

Step 4: Use skill templates
→ mlx-chat-skill.md - Chat implementation
→ whisper-stt-skill.md - Audio transcription
→ tts-skill.md - Speech synthesis
→ websocket-skill.md - Real-time job updates

Step 5: Implement
→ Build workflow: Audio → STT → MLX → TTS → Audio
→ Use WebSocket for real-time status
→ Handle errors gracefully
```
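The Step 5 workflow above can be sketched as a composable pipeline. This is a sketch with the service clients injected as async functions so the flow can be reasoned about (and tested) without live services; the real Whisper, MLX, and XTTS endpoints and payload shapes are in API_INTEGRATION_GUIDE.md and the skill files:

```javascript
// Voice chatbot pipeline: Audio → STT → MLX → TTS → Audio.
// `stt`, `chat`, and `tts` are injected async clients; wire them to the
// real services following the skill templates.
async function voiceChatTurn({ stt, chat, tts }, audioIn, conversationId) {
  const userText = await stt(audioIn);                    // Whisper STT
  const replyText = await chat(userText, conversationId); // MLX chat
  const audioOut = await tts(replyText);                  // XTTS-v2 TTS
  return { userText, replyText, audioOut };
}
```

Keeping each stage behind a plain function also makes it easy to swap in fallbacks or add per-stage error handling later.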
Service Priority & Selection
Text Generation
- Primary: MLX Service (Mac Mini) - Fastest, no job queue
- Alternative: Ubuntu /jobs API with job_type="text_generation" (routes to MLX)
Image Generation
- Primary: ComfyUI on Mac Mini - Direct API, fastest for Z-Image Turbo
- Alternative: Ubuntu /jobs API with job_type="image_generation" - GPU queue
Embeddings
- For Arabic/English (smaller scale): MLX Service - 384-dim, fast
- For multilingual (production scale): Ubuntu Embeddings - 1024-dim, GPU-accelerated
Common Patterns & Best Practices
1. Authentication
All services require JWT authentication:
```http
# Login to get token
POST /api/auth/login
{ "email": "admin@a2zadd.com", "password": "..." }

# Use token in requests
Authorization: Bearer <token>
```
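A small login helper can keep the token out of scattered call sites. This is a sketch: the `access_token` field name in the response is an assumption (check API_INTEGRATION_GUIDE.md for the real auth schema), and `doFetch` is injectable so the helper can be tested without a live backend:

```javascript
// Log in and return a function that adds the Bearer token to request headers.
// ASSUMPTION: the login response contains an `access_token` field.
async function login(email, password, doFetch = fetch, baseUrl = '') {
  const res = await doFetch(`${baseUrl}/api/auth/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ email, password }),
  });
  if (!res.ok) throw new Error(`Login failed: ${res.status}`);
  const { access_token } = await res.json();
  return (headers = {}) => ({ ...headers, Authorization: `Bearer ${access_token}` });
}
```

Remember the security notes below: on the client side, prefer httpOnly cookies over holding the token in JavaScript.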
2. Job-Based Services (Ubuntu)
Use for: STT, TTS, Image Gen (Ubuntu), Text Gen (via MLX)
```javascript
// Create job (Content-Type and auth headers are required for JSON POSTs)
const res = await fetch('/jobs', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`
  },
  body: JSON.stringify({
    job_type: 'speech_to_text',
    parameters: { /* ... */ }
  })
});
const job = await res.json();

// Poll or use WebSocket for status
const result = await pollJobStatus(job.id);
// OR
websocket.onmessage = (update) => { /* ... */ };
```
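The `pollJobStatus` helper used above is not defined in the snippet. A minimal sketch — the status values, interval, and the injected `getJob` accessor are assumptions; check API_INTEGRATION_GUIDE.md for the real job schema:

```javascript
// Poll a job until it reaches a terminal state.
// `getJob` is an injected async function (jobId) => job object, so the
// helper can be wired to a real fetch('/jobs/<id>') call or tested with a stub.
// ASSUMPTION: jobs expose status values 'completed' and 'failed'.
async function pollJobStatus(jobId, getJob, { intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getJob(jobId);
    if (job.status === 'completed') return job.result;
    if (job.status === 'failed') throw new Error(job.error || 'Job failed');
    await new Promise(r => setTimeout(r, intervalMs));
  }
  throw new Error(`Job ${jobId} timed out after ${maxAttempts} attempts`);
}
```

Prefer the WebSocket route when available; polling is the fallback.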
3. Direct API Services (Mac Mini)
Use for: MLX Chat, ComfyUI (direct)
```javascript
// Direct request, immediate response (or streaming)
const response = await fetch('http://localhost:8004/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: "...",
    max_tokens: 500
  })
});
```
4. Error Handling
Always implement:
- Service health checks before critical operations
- Retry logic with exponential backoff
- Graceful degradation (fallback to simpler models)
- Clear error messages to users
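The retry-with-exponential-backoff point above can be sketched as a small wrapper (the attempt count and delays are illustrative defaults, tune them per service):

```javascript
// Retry an async operation with exponential backoff: 500ms, 1s, 2s, ...
async function withRetry(operation, { maxAttempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Don't sleep after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise(r => setTimeout(r, delay));
      }
    }
  }
  throw lastError;
}
```

Wrap only idempotent operations this way; retrying a job-creation POST blindly can enqueue duplicates.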
5. Performance Optimization
- Use conversation_id for multi-turn chat (MLX)
- Batch embeddings requests (max 1000 texts)
- Set appropriate max_tokens (don't over-generate)
- Cache generated content when possible
- Use WebSocket to avoid polling overhead
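The "batch embeddings requests" point can be sketched as a chunking helper; the 1000-text ceiling comes from the list above, and `embedBatch` stands in for whatever client function your project uses against the embeddings endpoint:

```javascript
// Split items into chunks no larger than the service's batch limit.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Embed all texts by sending one request per chunk via the injected client.
async function embedAll(texts, embedBatch, maxBatch = 1000) {
  const results = [];
  for (const batch of chunk(texts, maxBatch)) {
    results.push(...await embedBatch(batch));
  }
  return results;
}
```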
Project Types & Service Recommendations
Social Media Management App
- Text: MLX (Egyptian Arabic marketing copy)
- Images: ComfyUI (Mac Mini) - Z-Image Turbo
- Videos: ComfyUI (Mac Mini) - Wan2.2-T2V
- Architecture: Frontend → Mac Mini Backend → MLX + ComfyUI
Customer Support Chatbot
- Text: MLX with conversation memory
- Voice: Whisper (STT) + XTTS-v2 (TTS)
- Knowledge Base: Embeddings for semantic search
- Architecture: Frontend → Ubuntu Backend → MLX + Whisper + TTS
Content Generation Platform
- Text: MLX (multiple models available)
- Images: ComfyUI (both servers)
- Embeddings: Ubuntu Embeddings (1024-dim)
- Architecture: Frontend → Ubuntu Backend (job queue) → All services
Voice Assistant App
- Speech Recognition: Whisper (Arabic optimized)
- NLU: MLX
- Speech Synthesis: XTTS-v2 (Arabic voices)
- Architecture: Mobile/Web → Ubuntu Backend → Whisper + MLX + TTS
Testing & Validation
Before recommending an implementation:
- **Verify service availability:**

  ```bash
  # Mac Mini
  curl http://localhost:8004/health
  curl http://localhost:8188/system_stats

  # Ubuntu
  curl https://api.proyaro.com/health
  ```

- **Check example implementations:**
  - Refer to Integration Examples in API_INTEGRATION_GUIDE.md
  - Use skill templates as starting points

- **Consider constraints:**
  - Mac Mini: Limited to M4 Mac resources (192GB RAM)
  - Ubuntu: GPU queue (single GPU shared)
  - Network: Internal 10.0.0.x network + external HTTPS
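The availability checks above can also be done programmatically before a critical operation. A sketch with an injectable `doFetch` and a timeout, so a hung service fails fast rather than blocking the request path:

```javascript
// Return true if the given health endpoint responds OK within `timeoutMs`.
async function isHealthy(url, doFetch = fetch, timeoutMs = 3000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await doFetch(url, { signal: controller.signal });
    return res.ok;
  } catch {
    // Network error, abort, or unreachable service all count as unhealthy.
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```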
Important Notes
Limitations to Communicate:
- **Mac Mini MLX Service:**
  - Apple Silicon optimized only
  - Model switching is slow (10-30 seconds)
  - Memory limits: 4-bit models preferred
  - Video generation: Max 41 frames
- **Ubuntu GPU Services:**
  - Single NVIDIA GPU shared across services
  - Jobs are queued (not instant)
  - ComfyUI may have wait times under load
- **Network:**
  - Internal services: 10.0.0.x network only
  - External access: through Ubuntu reverse proxy (Caddy)
  - Mac Mini services: not exposed externally (by design)
Security Considerations:
- **Never expose:**
  - Internal IP addresses in production code
  - JWT tokens in client-side code
  - Direct access to MLX/ComfyUI without auth
- **Always use:**
  - Environment variables for endpoints
  - Secure token storage (httpOnly cookies)
  - The Ubuntu backend as API gateway for external apps
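The "environment variables for endpoints" rule can be sketched as a small config module; the variable names here are assumptions — pick whatever convention your project uses. Defaults are for local development only, so no internal IP ever lands in committed code:

```javascript
// Resolve service endpoints from environment variables.
// ASSUMPTION: MLX_URL, COMFYUI_URL, PROYARO_API_URL are your env var names.
function serviceEndpoints(env = process.env) {
  return {
    mlx: env.MLX_URL || 'http://localhost:8004',
    comfyui: env.COMFYUI_URL || 'http://localhost:8188',
    api: env.PROYARO_API_URL || 'https://api.proyaro.com',
  };
}
```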
Quick Reference Table
| Capability | Service | Location | Port | Protocol | Job-based |
|---|---|---|---|---|---|
| Text Gen (Arabic/English) | MLX | Mac Mini | 8004 | HTTP | No |
| Image Gen (Fast) | ComfyUI | Mac Mini | 8188 | HTTP | No |
| Image Gen (GPU) | ComfyUI | Ubuntu | 8188* | HTTP | Yes |
| Video Gen | ComfyUI | Mac Mini | 8188 | HTTP | No |
| Speech-to-Text | Whisper | Ubuntu | 8001* | HTTP | Yes |
| Text-to-Speech | XTTS-v2 | Ubuntu | 8002* | HTTP | Yes |
| Embeddings (384-dim) | MLX | Mac Mini | 8004 | HTTP | No |
| Embeddings (1024-dim) | E5-Large | Ubuntu | 8003* | HTTP | Yes |
| Job Management | FastAPI | Ubuntu | 8000/443 | HTTP/HTTPS | - |
| WebSocket | FastAPI | Ubuntu | 8000/443 | WS/WSS | - |
* Internal Docker network only, access via Ubuntu backend API
Getting Started Checklist
When starting a new project:
- Read this instruction file completely
- Review NETWORK_TOPOLOGY.md for infrastructure
- Check MACHINES_INFRASTRUCTURE.md for capabilities
- Bookmark API_INTEGRATION_GUIDE.md for reference
- Identify required AI capabilities
- Select appropriate services from the table above
- Review relevant skill files for code templates
- Implement with proper error handling
- Test service availability before deployment
- Document any new patterns in your project
Support & Updates
- Documentation Location: /Users/yaro/Documents/a2zadd/master-instruction/
- API Guide: Always refer to the latest API_INTEGRATION_GUIDE.md
- Questions: Ask the user (Yaro) about infrastructure availability
- Updates: This instruction set may be updated as services evolve
Final Reminder for AI Assistants
🎯 Your primary goal: Help users build amazing applications using the available ProYaro infrastructure. Always prefer internal services over external APIs. Make integration easy by providing complete, working code examples using the skills and documentation provided.
✨ Be proactive: Suggest AI features that leverage these services when appropriate for the user's project.
🔧 Be practical: Provide full implementations, not just suggestions. Use the skill files as templates.
📚 Stay current: Always refer to the latest documentation in this folder.
Last Updated: 2025-01-01 • Version: 1.0
ProYaro AI Infrastructure Documentation • Version 1.2