# Embeddings & Semantic Search Integration Skill

## Overview
Two embedding services are available, suited to different use cases.
| Service | Dimensions | Best For | Speed |
|---|---|---|---|
| MLX (Mac) | 384 | Small-scale, dev | Very fast |
| E5-Large (Ubuntu) | 1024 | Production, multilingual | GPU-accelerated |
## MLX Embeddings (Mac Mini)
```typescript
class MLXEmbeddings {
  private baseURL: string;

  constructor(baseURL = 'http://localhost:8004') {
    this.baseURL = baseURL;
  }

  async embed(text: string): Promise<number[]> {
    const response = await fetch(`${this.baseURL}/v1/embeddings`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    });
    const data = await response.json();
    return data.embedding; // 384-dimensional vector
  }

  cosineSimilarity(a: number[], b: number[]): number {
    const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
    const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
    const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
    return dotProduct / (magA * magB);
  }
}

// Usage
const mlx = new MLXEmbeddings();
const emb1 = await mlx.embed('منتجات العناية بالبشرة'); // 'skin care products'
const emb2 = await mlx.embed('كريمات الوجه'); // 'face creams'
const similarity = mlx.cosineSimilarity(emb1, emb2);
console.log('Similarity:', similarity); // 0.0 to 1.0
```
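When vectors are unit-normalized once at index time, `cosineSimilarity` reduces to a plain dot product, which is cheaper to compute per comparison. A minimal sketch; `normalize` and `dot` are illustrative helpers, not part of the `MLXEmbeddings` API:

```typescript
// Normalize a vector to unit length (one-time cost per embedding).
function normalize(v: number[]): number[] {
  const mag = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map(x => x / mag);
}

// Dot product; for unit vectors this equals cosine similarity.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

const a = normalize([3, 4]);
const b = normalize([4, 3]);
console.log(dot(a, b)); // ≈ 0.96, same as full cosine similarity
```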
## Ubuntu Embeddings (GPU, 1024-dim)
```typescript
import { JobClient } from './job-management-skill';

class E5Embeddings {
  private jobClient: JobClient;

  constructor(token: string) {
    this.jobClient = new JobClient('https://api.proyaro.com', token);
  }

  async embed(
    texts: string[],
    instruction?: 'query:' | 'passage:'
  ): Promise<number[][]> {
    const job = await this.jobClient.createJob({
      job_type: 'embedding',
      parameters: {
        texts,
        normalize: true,
        instruction,
      },
    });
    const result = await this.jobClient.waitForJob(job.id);
    return result.result_data.embeddings; // 1024-dimensional vectors
  }
}

// Usage
const e5 = new E5Embeddings('your-token');

// For documents (to be indexed)
const docEmbeddings = await e5.embed(
  ['Document 1', 'Document 2'],
  'passage:'
);

// For search queries
const queryEmbedding = await e5.embed(
  ['search query'],
  'query:'
);
```
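If the jobs API caps request payload size (an assumption — check the actual limit for your deployment), large corpora can be embedded in sequential batches. `chunk` is a generic helper and `embedAll` a hypothetical wrapper, neither part of the documented API; the batch size of 64 is an assumed default:

```typescript
// Split an array into fixed-size chunks; side-effect free.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical wrapper: embed a large corpus batch by batch. Takes the
// embed function as a parameter so it works with any embedding backend.
async function embedAll(
  embed: (batch: string[]) => Promise<number[][]>,
  texts: string[],
  batchSize = 64 // assumed limit, not a documented value
): Promise<number[][]> {
  const results: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    results.push(...await embed(batch));
  }
  return results;
}
```

Usage would look like `await embedAll(batch => e5.embed(batch, 'passage:'), docs)`.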
## Semantic Search Implementation
```typescript
class SemanticSearch {
  private documents: string[] = [];
  private embeddings: number[][] = [];
  private embeddingService: E5Embeddings;

  constructor(token: string) {
    this.embeddingService = new E5Embeddings(token);
  }

  async indexDocuments(docs: string[]) {
    this.documents = docs;
    this.embeddings = await this.embeddingService.embed(docs, 'passage:');
  }

  async search(query: string, topK = 5): Promise<Array<{ doc: string; score: number }>> {
    const [queryEmb] = await this.embeddingService.embed([query], 'query:');
    const scores = this.embeddings.map((docEmb, i) => ({
      doc: this.documents[i],
      score: this.cosineSimilarity(queryEmb, docEmb),
    }));
    return scores.sort((a, b) => b.score - a.score).slice(0, topK);
  }

  // The E5 embeddings are requested with normalize: true, so they are unit
  // vectors and the dot product alone equals cosine similarity — no need
  // to divide by magnitudes here.
  private cosineSimilarity(a: number[], b: number[]): number {
    return a.reduce((sum, val, i) => sum + val * b[i], 0);
  }
}

// Usage
const search = new SemanticSearch('your-token');
await search.indexDocuments([
  'كريم مرطب للبشرة الجافة', // 'moisturizing cream for dry skin'
  'شامبو للشعر التالف', // 'shampoo for damaged hair'
  'واقي شمس SPF 50', // 'sunscreen SPF 50'
]);
const results = await search.search('منتج للبشرة الجافة'); // 'a product for dry skin'
results.forEach(r => console.log(`${r.score.toFixed(3)}: ${r.doc}`));
```
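Raw top-K results can still include weak matches when the corpus has no good answer. A post-filter on a minimum score keeps only meaningful hits; the `0.7` default here is an assumed starting point to tune per corpus, not a documented threshold:

```typescript
interface SearchResult {
  doc: string;
  score: number;
}

// Drop results below a minimum similarity. The default cutoff of 0.7 is
// an arbitrary starting value — tune it against your own data.
function filterByScore(results: SearchResult[], minScore = 0.7): SearchResult[] {
  return results.filter(r => r.score >= minScore);
}

const hits: SearchResult[] = [
  { doc: 'moisturizing cream', score: 0.91 },
  { doc: 'damaged-hair shampoo', score: 0.42 },
];
console.log(filterByScore(hits)); // keeps only the 0.91 hit
```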
## Best Practices
**Choose MLX when:**

- Development/testing
- Small-scale applications
- Speed is critical
- Simple similarity matching

**Choose E5-Large when:**

- Production applications
- Multilingual content
- Higher accuracy needed
- Large-scale search (100k+ docs)
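The checklist above can be folded into a small selection helper. A sketch only — the 100k cutoff mirrors the guideline, and routing all production or multilingual workloads to E5-Large is an assumption drawn from the table, not a measured rule:

```typescript
type EmbeddingService = 'mlx' | 'e5-large';

// Pick a service from the criteria in the checklist. Production and
// multilingual workloads always route to E5-Large; otherwise MLX covers
// small corpora and E5-Large covers large-scale search.
function chooseService(opts: {
  docCount: number;
  multilingual?: boolean;
  production?: boolean;
}): EmbeddingService {
  if (opts.production || opts.multilingual || opts.docCount >= 100_000) {
    return 'e5-large';
  }
  return 'mlx';
}
```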
**Instructions:**

- Use `query:` for search queries
- Use `passage:` for documents to index
- Omit for general similarity
Version: 1.0
ProYaro AI Infrastructure Documentation • Version 1.2