Embeddings & Semantic Search Integration Skill

Overview

Two embedding services are available, each suited to different use cases.

| Service           | Dimensions | Best For                 | Speed           |
|-------------------|------------|--------------------------|-----------------|
| MLX (Mac)         | 384        | Small-scale, dev         | Very fast       |
| E5-Large (Ubuntu) | 1024       | Production, multilingual | GPU-accelerated |

MLX Embeddings (Mac Mini)

class MLXEmbeddings {
  private baseURL: string;

  constructor(baseURL = 'http://localhost:8004') {
    this.baseURL = baseURL;
  }

  async embed(text: string): Promise<number[]> {
    const response = await fetch(`${this.baseURL}/v1/embeddings`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    });
    const data = await response.json();
    return data.embedding; // 384-dimensional vector
  }

  cosineSimilarity(a: number[], b: number[]): number {
    const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
    const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
    const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
    return dotProduct / (magA * magB);
  }
}

// Usage
const mlx = new MLXEmbeddings();
const emb1 = await mlx.embed('منتجات العناية بالبشرة'); // "skincare products"
const emb2 = await mlx.embed('كريمات الوجه'); // "face creams"
const similarity = mlx.cosineSimilarity(emb1, emb2);
console.log('Similarity:', similarity); // -1.0 to 1.0; closer to 1 = more similar
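`cosineSimilarity` above recomputes both magnitudes on every call. If you compare one embedding against many, it is cheaper to normalize each vector once so that every later comparison reduces to a plain dot product. A minimal sketch (the helper names `normalize` and `dot` are illustrative, not part of either service's API):

```typescript
// Pre-normalize vectors to unit length once; cosine similarity of
// unit vectors is just their dot product.
function normalize(v: number[]): number[] {
  const mag = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return mag === 0 ? v : v.map(x => x / mag);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

const a = normalize([3, 4]); // unit length
const b = normalize([6, 8]); // same direction as a
console.log(dot(a, b));      // ≈ 1 (identical direction)
```

This is the same trick the E5 service enables with `normalize: true`: normalized embeddings can be compared with a bare dot product.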

Ubuntu Embeddings (GPU, 1024-dim)

import { JobClient } from './job-management-skill';

class E5Embeddings {
  private jobClient: JobClient;

  constructor(token: string) {
    this.jobClient = new JobClient('https://api.proyaro.com', token);
  }

  async embed(
    texts: string[],
    instruction?: 'query:' | 'passage:'
  ): Promise<number[][]> {
    const job = await this.jobClient.createJob({
      job_type: 'embedding',
      parameters: {
        texts,
        normalize: true,
        instruction, // 'query:' or 'passage:'; omitted for general similarity
      },
    });

    const result = await this.jobClient.waitForJob(job.id);
    return result.result_data.embeddings; // 1024-dimensional vectors
  }
}

// Usage
const e5 = new E5Embeddings('your-token');

// For documents (to be indexed)
const docEmbeddings = await e5.embed(
  ['Document 1', 'Document 2'],
  'passage:'
);

// For search queries
const queryEmbedding = await e5.embed(
  ['search query'],
  'query:'
);
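For large corpora, a single job payload with every document may be impractical. One option is to split the input into fixed-size batches and submit one job per batch. A sketch (the batch size of 64 is an assumption for illustration, not a documented API limit):

```typescript
// Split a large document list into fixed-size batches before
// submitting embedding jobs. Batch size 64 is an assumed, tunable value.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const docs = Array.from({ length: 150 }, (_, i) => `Document ${i}`);
const batches = chunk(docs, 64);
console.log(batches.length); // 3 batches: 64 + 64 + 22
```

Each batch can then be passed to `e5.embed(batch, 'passage:')` and the resulting embedding arrays concatenated in order.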

Semantic Search Implementation

class SemanticSearch {
  private documents: string[] = [];
  private embeddings: number[][] = [];
  private embeddingService: E5Embeddings;

  constructor(token: string) {
    this.embeddingService = new E5Embeddings(token);
  }

  async indexDocuments(docs: string[]) {
    this.documents = docs;
    this.embeddings = await this.embeddingService.embed(docs, 'passage:');
  }

  async search(query: string, topK = 5): Promise<Array<{ doc: string; score: number }>> {
    const [queryEmb] = await this.embeddingService.embed([query], 'query:');

    const scores = this.embeddings.map((docEmb, i) => ({
      doc: this.documents[i],
      score: this.cosineSimilarity(queryEmb, docEmb),
    }));

    return scores.sort((a, b) => b.score - a.score).slice(0, topK);
  }

  private cosineSimilarity(a: number[], b: number[]): number {
    // Embeddings are requested with normalize: true, so the dot product
    // equals cosine similarity (no division by magnitudes needed).
    return a.reduce((sum, val, i) => sum + val * b[i], 0);
  }
}

// Usage
const search = new SemanticSearch('your-token');

await search.indexDocuments([
  'كريم مرطب للبشرة الجافة', // "moisturizing cream for dry skin"
  'شامبو للشعر التالف',      // "shampoo for damaged hair"
  'واقي شمس SPF 50',         // "SPF 50 sunscreen"
]);

const results = await search.search('منتج للبشرة الجافة'); // "product for dry skin"
results.forEach(r => console.log(`${r.score.toFixed(3)}: ${r.doc}`));
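`search` above sorts every score on every query, which is fine for small corpora but O(n log n) per query. A running top-K scan is one lighter alternative; at 100k+ documents you would move to an approximate-nearest-neighbor index instead. A sketch (the `topK` helper is illustrative, not part of the class above):

```typescript
// Keep only the K best scores while scanning, instead of sorting all
// of them. O(n·k) with this naive insert; an ANN index is the real
// answer at large scale.
function topK(scores: number[], k: number): Array<{ index: number; score: number }> {
  const best: Array<{ index: number; score: number }> = [];
  scores.forEach((score, index) => {
    best.push({ index, score });
    best.sort((a, b) => b.score - a.score);
    if (best.length > k) best.pop(); // drop the current worst
  });
  return best;
}

console.log(topK([0.2, 0.9, 0.5, 0.7], 2));
// → [{ index: 1, score: 0.9 }, { index: 3, score: 0.7 }]
```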

Best Practices

Choose MLX when:

  • Development/testing
  • Small-scale applications
  • Speed is critical
  • Simple similarity matching

Choose E5-Large when:

  • Production applications
  • Multilingual content
  • Higher accuracy needed
  • Large-scale search (100k+ docs)

Instructions:

  • Use query: for search queries
  • Use passage: for documents to index
  • Omit for general similarity
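E5-family models consume the instruction as a literal text prefix prepended to each input before tokenization. Whether the job service applies the prefix server-side from the `instruction` parameter is an assumption here; this sketch shows the equivalent client-side transformation (`withInstruction` is an illustrative helper, not part of the API):

```typescript
// Prepend an E5-style instruction prefix to each text.
type Instruction = 'query:' | 'passage:';

function withInstruction(texts: string[], instruction?: Instruction): string[] {
  if (!instruction) return texts; // omit for general similarity
  return texts.map(t => `${instruction} ${t}`);
}

console.log(withInstruction(['dry skin cream'], 'passage:'));
// → ['passage: dry skin cream']
```

Mixing prefixes matters: a `query:`-embedded text compared against `passage:`-embedded documents is the intended asymmetric-retrieval setup; embedding both sides with the same prefix (or none) suits symmetric similarity.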

Version: 1.0

ProYaro AI Infrastructure Documentation • Version 1.2