AI Assistant Instructions for ProYaro Infrastructure
Target AI Assistants: Claude (Anthropic), Gemini (Google), and other LLM-based coding assistants
Purpose
This document instructs AI assistants (e.g., Claude Code, the Claude API, Gemini) on how to leverage the available ProYaro AI infrastructure when helping with software development projects. When working on any project that requires AI capabilities, use these existing services rather than suggesting external APIs.
Core Directive
⚠️ CRITICAL INSTRUCTION:
When assisting with any software development project that requires the following capabilities, you MUST use the ProYaro infrastructure services documented in this folder:
Available AI Capabilities:
- Text Generation (Arabic/English) → Use MLX Service on Mac Mini
- Image Generation → Use ComfyUI on Mac Mini OR Ubuntu
- Video Generation → Use ComfyUI on Mac Mini
- Speech-to-Text (Whisper) → Use Ubuntu Server
- Text-to-Speech (XTTS-v2) → Use Ubuntu Server
- Text Embeddings → Use MLX Service (Mac Mini) OR Embeddings Service (Ubuntu)
- Semantic Search → Use Embeddings + Custom Implementation
DO NOT Suggest:
- ❌ OpenAI API (ChatGPT, GPT-4, DALL-E, Whisper API)
- ❌ Anthropic Claude API (for text generation)
- ❌ Google Gemini API (for text generation)
- ❌ Midjourney, Stable Diffusion Web UI (external)
- ❌ ElevenLabs, Google TTS (external)
- ❌ Any external paid AI APIs unless specifically requested
Instead, Use:
- ✅ MLX FastAPI Service (Mac Mini, Port 8004) - Text generation, embeddings
- ✅ Ubuntu FastAPI Backend (Ubuntu Server, Port 8000 / HTTPS) - Jobs orchestration
- ✅ ComfyUI (Mac Mini Port 8188 OR Ubuntu Port 8188) - Image/Video generation
- ✅ Whisper STT (Ubuntu, Port 8001) - Speech recognition
- ✅ XTTS-v2 TTS (Ubuntu, Port 8002) - Voice synthesis
- ✅ Embeddings Service (Ubuntu, Port 8003) - Semantic vectors
When to Use ProYaro Infrastructure
Scenarios Where You MUST Use Internal Services:
- **User asks for a chatbot/AI assistant**
  - Use MLX Service for text generation
  - Optionally add Whisper (STT) and XTTS (TTS) for voice
- **User needs image generation**
  - Use ComfyUI (Mac Mini preferred for Z-Image Turbo)
  - Use Ubuntu ComfyUI for GPU-heavy workloads
- **User needs content generation (marketing, social media)**
  - MLX for text (supports Egyptian Arabic marketing copy)
  - ComfyUI for images
  - ComfyUI for short videos
- **User needs semantic search or a RAG system**
  - Use the Embeddings Service (Ubuntu, 1024-dim) OR MLX (Mac Mini, 384-dim)
  - Implement search using cosine similarity
- **User needs voice features**
  - STT: Whisper (Ubuntu) - supports Arabic & 100+ languages
  - TTS: XTTS-v2 (Ubuntu) - supports Arabic & 15+ languages
- **User asks about AI capabilities**
  - First explain what's available internally
  - Only suggest external APIs if internal services can't fulfill the requirement
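The semantic-search scenario above relies on cosine similarity over embedding vectors. A minimal sketch in JavaScript (vector length depends on which embeddings service you choose: 384-dim from MLX or 1024-dim from the Ubuntu service; the `docs` shape here is illustrative):

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored documents against a query embedding (highest score first).
function rankBySimilarity(queryEmbedding, docs) {
  return docs
    .map(doc => ({ ...doc, score: cosineSimilarity(queryEmbedding, doc.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

For production scale, consider a vector store instead of in-memory ranking, but this is sufficient for small corpora.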
How to Use This Folder
1. Read the Documentation
Start here:
- API_INTEGRATION_GUIDE.md - Complete API reference for all services
Network & Infrastructure:
- NETWORK_TOPOLOGY.md - Server locations, IPs, ports, connectivity
- MACHINES_INFRASTRUCTURE.md - Hardware specs, capabilities, limitations
Skills (Reusable Code):
- skills/mlx-chat-skill.md - MLX text generation integration
- skills/comfyui-image-skill.md - ComfyUI image generation
- skills/whisper-stt-skill.md - Speech-to-text integration
- skills/tts-skill.md - Text-to-speech integration
- skills/embeddings-skill.md - Semantic embeddings & search
- skills/job-management-skill.md - Ubuntu job queue system
- skills/websocket-skill.md - Real-time updates
2. Implementation Pattern
When implementing a feature that needs AI:
1. Identify required capability (e.g., "need image generation")
2. Check NETWORK_TOPOLOGY.md to find the service
3. Check API_INTEGRATION_GUIDE.md for API details
4. Use the relevant skill file for code templates
5. Implement using the documented endpoints
6. Handle errors and provide fallbacks
3. Example Decision Tree
```
User Request: "Build a voice chatbot in Arabic"

Step 1: Identify services needed
→ Text Generation (Arabic) = MLX Service
→ Speech-to-Text = Whisper (Ubuntu)
→ Text-to-Speech = XTTS-v2 (Ubuntu)

Step 2: Check network topology
→ MLX: Mac Mini (10.0.0.188:8004)
→ Ubuntu API: 10.0.0.11 or api.proyaro.com

Step 3: Read API docs
→ MLX: /v1/chat/completions with conversation_id
→ STT: POST /jobs with job_type="speech_to_text"
→ TTS: POST /jobs with job_type="text_to_speech"

Step 4: Use skill templates
→ mlx-chat-skill.md - Chat implementation
→ whisper-stt-skill.md - Audio transcription
→ tts-skill.md - Speech synthesis
→ websocket-skill.md - Real-time job updates

Step 5: Implement
→ Build workflow: Audio → STT → MLX → TTS → Audio
→ Use WebSocket for real-time status
→ Handle errors gracefully
```
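The Step 5 workflow above can be sketched as a composable pipeline. This is a sketch with the service clients injected as async functions so the flow can be reasoned about (and tested) without live services; the real Whisper, MLX, and XTTS endpoints and payload shapes are in API_INTEGRATION_GUIDE.md and the skill files:

```javascript
// Voice chatbot pipeline: Audio → STT → MLX → TTS → Audio.
// `stt`, `chat`, and `tts` are injected async clients; wire them to the
// real services following the skill templates.
async function voiceChatTurn({ stt, chat, tts }, audioIn, conversationId) {
  const userText = await stt(audioIn);                    // Whisper STT
  const replyText = await chat(userText, conversationId); // MLX chat
  const audioOut = await tts(replyText);                  // XTTS-v2 TTS
  return { userText, replyText, audioOut };
}
```

Keeping each stage behind a plain function also makes it easy to swap in fallbacks or add per-stage error handling later.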
Service Priority & Selection
Text Generation
- Primary: MLX Service (Mac Mini) - Fastest, no job queue
- Alternative: Ubuntu /jobs API with job_type="text_generation" (routes to MLX)
Image Generation
- Primary: ComfyUI on Mac Mini - Direct API, fastest for Z-Image Turbo
- Alternative: Ubuntu /jobs API with job_type="image_generation" - GPU queue
Embeddings
- For Arabic/English (smaller scale): MLX Service - 384-dim, fast
- For multilingual (production scale): Ubuntu Embeddings - 1024-dim, GPU-accelerated
Common Patterns & Best Practices
1. Authentication
All services require JWT authentication:
```http
# Login to get token
POST /api/auth/login
{ "email": "admin@a2zadd.com", "password": "..." }

# Use token in requests
Authorization: Bearer <token>
```
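A small login helper can keep the token out of scattered call sites. This is a sketch: the `access_token` field name in the response is an assumption (check API_INTEGRATION_GUIDE.md for the real auth schema), and `doFetch` is injectable so the helper can be tested without a live backend:

```javascript
// Log in and return a function that adds the Bearer token to request headers.
// ASSUMPTION: the login response contains an `access_token` field.
async function login(email, password, doFetch = fetch, baseUrl = '') {
  const res = await doFetch(`${baseUrl}/api/auth/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ email, password }),
  });
  if (!res.ok) throw new Error(`Login failed: ${res.status}`);
  const { access_token } = await res.json();
  return (headers = {}) => ({ ...headers, Authorization: `Bearer ${access_token}` });
}
```

Remember the security notes below: on the client side, prefer httpOnly cookies over holding the token in JavaScript.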
2. Job-Based Services (Ubuntu)
Use for: STT, TTS, Image Gen (Ubuntu), Text Gen (via MLX)
```javascript
// Create job (Content-Type and auth headers are required for JSON POSTs)
const res = await fetch('/jobs', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`
  },
  body: JSON.stringify({
    job_type: 'speech_to_text',
    parameters: { /* ... */ }
  })
});
const job = await res.json();

// Poll or use WebSocket for status
const result = await pollJobStatus(job.id);
// OR
websocket.onmessage = (update) => { /* ... */ };
```
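The `pollJobStatus` helper used above is not defined in the snippet. A minimal sketch — the status values, interval, and the injected `getJob` accessor are assumptions; check API_INTEGRATION_GUIDE.md for the real job schema:

```javascript
// Poll a job until it reaches a terminal state.
// `getJob` is an injected async function (jobId) => job object, so the
// helper can be wired to a real fetch('/jobs/<id>') call or tested with a stub.
// ASSUMPTION: jobs expose status values 'completed' and 'failed'.
async function pollJobStatus(jobId, getJob, { intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getJob(jobId);
    if (job.status === 'completed') return job.result;
    if (job.status === 'failed') throw new Error(job.error || 'Job failed');
    await new Promise(r => setTimeout(r, intervalMs));
  }
  throw new Error(`Job ${jobId} timed out after ${maxAttempts} attempts`);
}
```

Prefer the WebSocket route when available; polling is the fallback.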
3. Direct API Services (Mac Mini)
Use for: MLX Chat, ComfyUI (direct)
```javascript
// Direct request, immediate response (or streaming)
const response = await fetch('http://localhost:8004/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: "...",
    max_tokens: 500
  })
});
```
4. Error Handling
Always implement:
- Service health checks before critical operations
- Retry logic with exponential backoff
- Graceful degradation (fallback to simpler models)
- Clear error messages to users
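The retry-with-exponential-backoff point above can be sketched as a small wrapper (the attempt count and delays are illustrative defaults, tune them per service):

```javascript
// Retry an async operation with exponential backoff: 500ms, 1s, 2s, ...
async function withRetry(operation, { maxAttempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Don't sleep after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise(r => setTimeout(r, delay));
      }
    }
  }
  throw lastError;
}
```

Wrap only idempotent operations this way; retrying a job-creation POST blindly can enqueue duplicates.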
5. Performance Optimization
- Use conversation_id for multi-turn chat (MLX)
- Batch embeddings requests (max 1000 texts)
- Set appropriate max_tokens (don't over-generate)
- Cache generated content when possible
- Use WebSocket to avoid polling overhead
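The "batch embeddings requests" point can be sketched as a chunking helper; the 1000-text ceiling comes from the list above, and `embedBatch` stands in for whatever client function your project uses against the embeddings endpoint:

```javascript
// Split items into chunks no larger than the service's batch limit.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Embed all texts by sending one request per chunk via the injected client.
async function embedAll(texts, embedBatch, maxBatch = 1000) {
  const results = [];
  for (const batch of chunk(texts, maxBatch)) {
    results.push(...await embedBatch(batch));
  }
  return results;
}
```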
Project Types & Service Recommendations
Social Media Management App
- Text: MLX (Egyptian Arabic marketing copy)
- Images: ComfyUI (Mac Mini) - Z-Image Turbo
- Videos: ComfyUI (Mac Mini) - Wan2.2-T2V
- Architecture: Frontend → Mac Mini Backend → MLX + ComfyUI
Customer Support Chatbot
- Text: MLX with conversation memory
- Voice: Whisper (STT) + XTTS-v2 (TTS)
- Knowledge Base: Embeddings for semantic search
- Architecture: Frontend → Ubuntu Backend → MLX + Whisper + TTS
Content Generation Platform
- Text: MLX (multiple models available)
- Images: ComfyUI (both servers)
- Embeddings: Ubuntu Embeddings (1024-dim)
- Architecture: Frontend → Ubuntu Backend (job queue) → All services
Voice Assistant App
- Speech Recognition: Whisper (Arabic optimized)
- NLU: MLX
- Speech Synthesis: XTTS-v2 (Arabic voices)
- Architecture: Mobile/Web → Ubuntu Backend → Whisper + MLX + TTS
Testing & Validation
Before recommending an implementation:
- **Verify service availability:**

  ```bash
  # Mac Mini
  curl http://localhost:8004/health
  curl http://localhost:8188/system_stats

  # Ubuntu
  curl https://api.proyaro.com/health
  ```

- **Check example implementations:**
  - Refer to Integration Examples in API_INTEGRATION_GUIDE.md
  - Use skill templates as starting points

- **Consider constraints:**
  - Mac Mini: Limited to M4 Mac resources (192GB RAM)
  - Ubuntu: GPU queue (single GPU shared)
  - Network: Internal 10.0.0.x network + external HTTPS
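The availability checks above can also be done programmatically before a critical operation. A sketch with an injectable `doFetch` and a timeout, so a hung service fails fast rather than blocking the request path:

```javascript
// Return true if the given health endpoint responds OK within `timeoutMs`.
async function isHealthy(url, doFetch = fetch, timeoutMs = 3000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await doFetch(url, { signal: controller.signal });
    return res.ok;
  } catch {
    // Network error, abort, or unreachable service all count as unhealthy.
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```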
Important Notes
Limitations to Communicate:
- **Mac Mini MLX Service:**
  - Apple Silicon optimized only
  - Model switching is slow (10-30 seconds)
  - Memory limits: 4-bit models preferred
  - Video generation: Max 41 frames
- **Ubuntu GPU Services:**
  - Single NVIDIA GPU shared across services
  - Jobs are queued (not instant)
  - ComfyUI may have wait times under load
- **Network:**
  - Internal services: 10.0.0.x network only
  - External access: through Ubuntu reverse proxy (Caddy)
  - Mac Mini services: not exposed externally (by design)
Security Considerations:
- **Never expose:**
  - Internal IP addresses in production code
  - JWT tokens in client-side code
  - Direct access to MLX/ComfyUI without auth
- **Always use:**
  - Environment variables for endpoints
  - Secure token storage (httpOnly cookies)
  - The Ubuntu backend as API gateway for external apps
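The "environment variables for endpoints" rule can be sketched as a small config module; the variable names here are assumptions — pick whatever convention your project uses. Defaults are for local development only, so no internal IP ever lands in committed code:

```javascript
// Resolve service endpoints from environment variables.
// ASSUMPTION: MLX_URL, COMFYUI_URL, PROYARO_API_URL are your env var names.
function serviceEndpoints(env = process.env) {
  return {
    mlx: env.MLX_URL || 'http://localhost:8004',
    comfyui: env.COMFYUI_URL || 'http://localhost:8188',
    api: env.PROYARO_API_URL || 'https://api.proyaro.com',
  };
}
```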
Quick Reference Table
| Capability | Service | Location | Port | Protocol | Job-based |
|---|---|---|---|---|---|
| Text Gen (Arabic/English) | MLX | Mac Mini | 8004 | HTTP | No |
| Image Gen (Fast) | ComfyUI | Mac Mini | 8188 | HTTP | No |
| Image Gen (GPU) | ComfyUI | Ubuntu | 8188* | HTTP | Yes |
| Video Gen | ComfyUI | Mac Mini | 8188 | HTTP | No |
| Speech-to-Text | Whisper | Ubuntu | 8001* | HTTP | Yes |
| Text-to-Speech | XTTS-v2 | Ubuntu | 8002* | HTTP | Yes |
| Embeddings (384-dim) | MLX | Mac Mini | 8004 | HTTP | No |
| Embeddings (1024-dim) | E5-Large | Ubuntu | 8003* | HTTP | Yes |
| Job Management | FastAPI | Ubuntu | 8000/443 | HTTP/HTTPS | - |
| WebSocket | FastAPI | Ubuntu | 8000/443 | WS/WSS | - |
* Internal Docker network only, access via Ubuntu backend API
Getting Started Checklist
When starting a new project:
- Read this instruction file completely
- Review NETWORK_TOPOLOGY.md for infrastructure
- Check MACHINES_INFRASTRUCTURE.md for capabilities
- Bookmark API_INTEGRATION_GUIDE.md for reference
- Identify required AI capabilities
- Select appropriate services from the table above
- Review relevant skill files for code templates
- Implement with proper error handling
- Test service availability before deployment
- Document any new patterns in your project
Support & Updates
- Documentation Location: /Users/yaro/Documents/a2zadd/master-instruction/
- API Guide: Always refer to the latest API_INTEGRATION_GUIDE.md
- Questions: Ask the user (Yaro) about infrastructure availability
- Updates: This instruction set may be updated as services evolve
Final Reminder for AI Assistants
🎯 Your primary goal: Help users build amazing applications using the available ProYaro infrastructure. Always prefer internal services over external APIs. Make integration easy by providing complete, working code examples using the skills and documentation provided.
✨ Be proactive: Suggest AI features that leverage these services when appropriate for the user's project.
🔧 Be practical: Provide full implementations, not just suggestions. Use the skill files as templates.
📚 Stay current: Always refer to the latest documentation in this folder.
Last Updated: 2025-01-01 • Version: 1.0
ProYaro AI Infrastructure Documentation • Version 1.2