Core Concepts
Understanding the key concepts in Agentary JS will help you build powerful AI applications with flexible deployment options.
Architecture Overview
┌─────────────────────────────────────────────────────────┐
│                    Your Application                     │
├─────────────────────────────────────────────────────────┤
│                  Agentary JS Session                    │
│  ┌───────────────────────────────────────────────────┐  │
│  │            Inference Provider Manager             │  │
│  │  ┌──────────────┐      ┌──────────────┐           │  │
│  │  │   Device     │      │    Cloud     │           │  │
│  │  │   Provider   │      │   Provider   │           │  │
│  │  │ (WebGPU/WASM)│      │ (HTTP Proxy) │           │  │
│  │  └──────────────┘      └──────────────┘           │  │
│  └───────────────────────────────────────────────────┘  │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │                  Workflow Engine                  │  │
│  │              (for agentic workflows)              │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
          │                     │
          ▼                     ▼
    WebGPU/WASM          Cloud LLM APIs
    ONNX Models        (OpenAI, Anthropic)
Sessions
A Session is the main interface for interacting with language models. It manages:
- Provider initialization and configuration
- Model inference (device or cloud)
- Resource cleanup and connection management
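A sketch of that lifecycle, pairing creation with cleanup once the session is no longer needed. The destroy() call is an assumption here, not a confirmed API; check the API Reference for the actual cleanup method:
import { createSession } from 'agentary-js';

const session = await createSession({
  models: [{ runtime: 'transformers-js', model: 'onnx-community/Qwen3-0.6B-ONNX', quantization: 'q4', engine: 'webgpu' }]
});

try {
  const response = await session.createResponse('onnx-community/Qwen3-0.6B-ONNX', {
    messages: [{ role: 'user', content: 'Hello!' }]
  });
  console.log(response.type);
} finally {
  // Hypothetical cleanup method: releases workers and open connections
  await session.destroy?.();
}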
Types of Sessions
Basic Session
import { createSession } from 'agentary-js';
const session = await createSession({
  models: [
    { runtime: 'transformers-js', model: 'onnx-community/Qwen3-0.6B-ONNX', quantization: 'q4', engine: 'webgpu' },
    { runtime: 'anthropic', model: 'claude-3-5-sonnet-20241022', proxyUrl: 'https://api.example.com/anthropic', modelProvider: 'anthropic' }
  ]
});
Use for simple text generation and tool calling.
Agent Session
import { createAgentSession } from 'agentary-js';
const agent = await createAgentSession({ /* config */ });
Use for multi-step workflows with memory and state management.
Providers
Agentary JS uses a provider architecture that supports both on-device and cloud-based inference.
Provider Types
Device Provider
Runs models locally in the browser using WebGPU or WebAssembly:
{
  runtime: 'transformers-js',
  model: 'onnx-community/Qwen3-0.6B-ONNX',
  quantization: 'q4',   // q4, q8, fp16, fp32
  engine: 'webgpu',     // webgpu, wasm, auto
  hfToken: 'hf_...'     // Optional: for private models
}
Benefits:
- Complete privacy (data never leaves the browser)
- Zero server costs
- Offline capability
- Low latency after initial load
Considerations:
- Requires model download (typically 100–500 MB)
- Limited to smaller models
- Browser compatibility varies
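Because support varies, you can probe for WebGPU before registering a device model. The sketch below uses the standard navigator.gpu check and falls back to WASM; passing engine: 'auto' should make a similar choice for you:
import { createSession } from 'agentary-js';

// Prefer WebGPU when the browser exposes it; otherwise fall back to WASM
const engine = typeof navigator !== 'undefined' && 'gpu' in navigator ? 'webgpu' : 'wasm';

const session = await createSession({
  models: [{
    runtime: 'transformers-js',
    model: 'onnx-community/Qwen3-0.6B-ONNX',
    quantization: 'q4',
    engine
  }]
});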
Cloud Provider
Uses cloud LLM APIs via a secure proxy:
{
  runtime: 'anthropic',
  model: 'claude-3-5-sonnet-20241022',
  proxyUrl: 'https://your-backend.com/api/anthropic',
  modelProvider: 'anthropic',   // 'anthropic' or 'openai'
  timeout: 30000,
  maxRetries: 3
}
Benefits:
- Access to powerful models (GPT-4, Claude)
- No browser limitations
- Instant startup
- Regular model updates
Considerations:
- Requires a backend proxy (a minimal sketch follows this list)
- API costs per request
- Network latency
- Requires internet connection
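"Requires a backend proxy" means a small server route that holds your API key and forwards requests to the provider. Here is a minimal sketch using Express; the route path matches the config above, but the forwarded body shape is an assumption, so adapt it to how Agentary JS actually calls your proxy:
import express from 'express';

const app = express();
app.use(express.json());

// Proxy route: keeps the API key on the server and forwards the
// request body to Anthropic's Messages API (body shape assumed)
app.post('/api/anthropic', async (req, res) => {
  const upstream = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'x-api-key': process.env.ANTHROPIC_API_KEY,
      'anthropic-version': '2023-06-01'
    },
    body: JSON.stringify(req.body)
  });
  res.status(upstream.status).json(await upstream.json());
});

app.listen(3000);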
Multi-Provider Setup
You can register multiple providers and choose which to use for each request:
const session = await createSession({
  models: [
    // Fast on-device model for quick tasks
    {
      runtime: 'transformers-js',
      model: 'onnx-community/Qwen3-0.6B-ONNX',
      quantization: 'q4',
      engine: 'webgpu'
    },
    // Powerful cloud model for complex tasks
    {
      runtime: 'anthropic',
      model: 'claude-3-5-sonnet-20241022',
      proxyUrl: 'https://api.example.com/anthropic',
      modelProvider: 'anthropic'
    }
  ]
});

// Use the device model for a quick response
const quickResponse = await session.createResponse('onnx-community/Qwen3-0.6B-ONNX', {
  messages: [{ role: 'user', content: 'Quick question' }]
});

// Use the cloud model for a complex task
const complexResponse = await session.createResponse('claude-3-5-sonnet-20241022', {
  messages: [{ role: 'user', content: 'Complex analysis' }]
});
Messages
Messages follow the standard chat format:
const messages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Hello!' },
  { role: 'assistant', content: 'Hi! How can I help?' },
  { role: 'user', content: 'Tell me a joke.' }
];
Roles:
- system: Instructions for the model’s behavior
- user: Input from the user
- assistant: The model’s previous responses
Response Types
Agentary JS supports both streaming and non-streaming responses, depending on the provider and configuration.
Streaming Response
Process tokens as they’re generated:
const response = await session.createResponse(modelId, { messages });
if (response.type === 'streaming') {
  for await (const chunk of response.stream) {
    console.log(chunk.token);
  }
}
Benefits:
- Better UX (show progress)
- Faster time to first token
- Cancellable generation (see the sketch after this list)
- Lower perceived latency
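Cancellation can be as simple as exiting the token loop: breaking out of a for await loop calls the stream iterator's return() method, which gives the provider a chance to stop generating (assuming it implements that cleanup). In this sketch, cancelButton and render are hypothetical UI helpers:
let cancelled = false;
cancelButton.addEventListener('click', () => { cancelled = true; });

const response = await session.createResponse(modelId, { messages });
if (response.type === 'streaming') {
  for await (const chunk of response.stream) {
    if (cancelled) break; // exiting the loop triggers the stream's cleanup
    render(chunk.token);  // hypothetical UI helper
  }
}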
Non-Streaming Response
Get the complete response at once:
const response = await session.createResponse(modelId, { messages });
if (response.type === 'complete') {
  console.log(response.content);
  console.log('Tokens used:', response.usage);
}
Benefits:
- Simpler to handle
- Complete token usage stats
- Tool call information (see the sketch after this list)
- Finish reason available
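A sketch of reading those extras from a complete response. The toolCalls and finishReason property names are assumptions, not confirmed API; check the API Reference for the exact response shape:
const response = await session.createResponse(modelId, { messages, tools });
if (response.type === 'complete') {
  console.log(response.content);
  // Assumed field names: consult the API Reference for the real shape
  if (response.toolCalls?.length) {
    console.log('Requested tool calls:', response.toolCalls);
  }
  console.log('Finish reason:', response.finishReason);
}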
Device Provider Internals
Device providers use Web Workers to avoid blocking the main thread:
// The main thread keeps running while the model generates in a worker
const generationPromise = (async () => {
  const response = await session.createResponse(deviceModelId, { messages });
  if (response.type === 'streaming') {
    for await (const chunk of response.stream) {
      updateUI(chunk.token);
    }
  }
})();

// The UI remains responsive in the meantime
updateProgressBar();
Model Loading
Device providers handle model loading and initialization automatically:
// Models are validated and loaded during session creation
const session = await createSession({
  models: [{
    runtime: 'transformers-js',
    model: 'onnx-community/Qwen3-0.6B-ONNX',
    quantization: 'q4',
    engine: 'webgpu'
  }]
});

// Monitor loading progress
session.on('worker:init:progress', (event) => {
  console.log(`Loading ${event.modelName}: ${event.progress}%`);
});

session.on('worker:init:complete', (event) => {
  console.log(`${event.modelName} ready in ${event.duration}ms`);
});
See the Model Support documentation for supported models.
Memory Management
Agentary JS provides built-in memory management for long conversations. Configure memory when calling runWorkflow():
const agent = await createAgentSession({
  models: [
    {
      runtime: 'transformers-js',
      model: 'onnx-community/Qwen3-0.6B-ONNX',
      quantization: 'q4',
      engine: 'webgpu'
    }
  ]
});

const workflow = {
  id: 'my-workflow',
  steps: [ /* ... */ ],
  tools: [ /* ... */ ]
};

const memoryConfig = {
  maxTokens: 2048,              // Maximum context size
  compressionThreshold: 0.8,    // Compress at 80% capacity
  memoryCompressorConfig: {
    name: 'sliding-window'      // or 'summarization'
  }
};

for await (const result of agent.runWorkflow(prompt, workflow, memoryConfig)) {
  // Process workflow results
}
Lifecycle Events
Subscribe to events for debugging and monitoring:
session.on('generation:start', (event) => {
  console.log('Started generating...');
});

session.on('generation:token', (event) => {
  console.log(event.token);
});

session.on('generation:complete', (event) => {
  console.log(`Generated ${event.totalTokens} tokens`);
});
Tools (Function Calling)
Define tools that the model can call:
const tools = [{
  definition: {
    name: 'get_weather',
    description: 'Get current weather',
    parameters: {
      type: 'object',
      properties: {
        city: {
          type: 'string',
          description: 'City name'
        }
      },
      required: ['city']
    }
  },
  implementation: async ({ city }) => {
    // Your implementation
    const data = { temp: 72, condition: 'sunny' };
    return JSON.stringify(data);
  }
}];

const response = await session.createResponse(modelId, {
  messages,
  tools
});

if (response.type === 'streaming') {
  for await (const chunk of response.stream) {
    console.log(chunk.token);
  }
}
Next Steps
- Set up a Cloud Provider
- Learn about Tool Calling
- Build Agentic Workflows
- Explore API Reference