Core Concepts

Understanding the key concepts in Agentary JS will help you build powerful AI applications with flexible deployment options.

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│              Your Application                           │
├─────────────────────────────────────────────────────────┤
│            Agentary JS Session                          │
│  ┌───────────────────────────────────────────────────┐  │
│  │        Inference Provider Manager                 │  │
│  │  ┌──────────────┐         ┌──────────────┐        │  │
│  │  │ Device       │         │ Cloud        │        │  │
│  │  │ Provider     │         │ Provider     │        │  │
│  │  │ (WebGPU/WASM)│         │ (HTTP Proxy) │        │  │
│  │  └──────────────┘         └──────────────┘        │  │
│  └───────────────────────────────────────────────────┘  │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │      Workflow Engine                              │  │
│  │   (for agentic workflows)                         │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
         │                              │
         ▼                              ▼
    WebGPU/WASM                   Cloud LLM APIs
    ONNX Models                   (OpenAI, Anthropic)

Sessions

A Session is the main interface for interacting with language models. It manages:

  • Provider initialization and configuration
  • Model inference (device or cloud)
  • Resource cleanup and connection management (sketched below)
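
A typical lifecycle wraps usage in try/finally so resources are released even when an error occurs. A minimal sketch — the destroy() method name here is an assumption, so check the API reference for the exact cleanup call:

import { createSession } from 'agentary-js';

const session = await createSession({ models: [ /* ... */ ] });
try {
  // ... generate responses ...
} finally {
  await session.destroy(); // hypothetical cleanup call: frees workers and model memory
}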

Types of Sessions

Basic Session

import { createSession } from 'agentary-js';
const session = await createSession({
  models: [
    { runtime: 'transformers-js', model: 'onnx-community/Qwen3-0.6B-ONNX', quantization: 'q4', engine: 'webgpu' },
    { runtime: 'anthropic', model: 'claude-3-5-sonnet-20241022', proxyUrl: 'https://api.example.com/anthropic', modelProvider: 'anthropic' }
  ]
});

Use for simple text generation and tool calling.

Agent Session

import { createAgentSession } from 'agentary-js';
const agent = await createAgentSession({ /* config */ });

Use for multi-step workflows with memory and state management.
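
A minimal sketch of driving a workflow, based on the runWorkflow usage shown under Memory Management below (step and tool contents are elided; the memory argument is omitted here on the assumption it has defaults):

import { createAgentSession } from 'agentary-js';

const agent = await createAgentSession({
  models: [{
    runtime: 'transformers-js',
    model: 'onnx-community/Qwen3-0.6B-ONNX',
    quantization: 'q4',
    engine: 'webgpu'
  }]
});

const workflow = {
  id: 'my-workflow',
  steps: [ /* ... */ ],
  tools: [ /* ... */ ]
};

// Results stream back step by step as the workflow executes
for await (const result of agent.runWorkflow('Research and summarize X', workflow)) {
  console.log(result);
}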

Providers

Agentary JS uses a provider architecture that supports both on-device and cloud-based inference.

Provider Types

Device Provider

Runs models locally in the browser using WebGPU or WebAssembly:

{
  runtime: 'transformers-js',
  model: 'onnx-community/Qwen3-0.6B-ONNX',
  quantization: 'q4',  // q4, q8, fp16, fp32
  engine: 'webgpu',    // webgpu, wasm, auto
  hfToken: '...'       // Optional: Hugging Face token for private models
}

Benefits:

  • Complete privacy (data never leaves the browser)
  • Zero server costs
  • Offline capability
  • Low latency after initial load

Considerations:

  • Requires model download (100–500 MB)
  • Limited to smaller models
  • Browser compatibility varies (see the feature-detection sketch below)
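
WebGPU availability can be feature-detected with the standard navigator.gpu check before configuring a model. Passing engine: 'auto' presumably performs a similar check internally, but doing it yourself makes the fallback explicit. A sketch:

// Fall back to WebAssembly when the browser has no WebGPU support
const engine = 'gpu' in navigator ? 'webgpu' : 'wasm';

const session = await createSession({
  models: [{
    runtime: 'transformers-js',
    model: 'onnx-community/Qwen3-0.6B-ONNX',
    quantization: 'q4',
    engine
  }]
});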

Cloud Provider

Uses cloud LLM APIs via a secure proxy:

{
  runtime: 'anthropic',
  model: 'claude-3-5-sonnet-20241022',
  proxyUrl: 'https://your-backend.com/api/anthropic',
  modelProvider: 'anthropic',  // 'anthropic' or 'openai'
  timeout: 30000,
  maxRetries: 3
}

Benefits:

  • Access to powerful models (GPT-4, Claude)
  • No browser limitations
  • Instant startup
  • Regular model updates

Considerations:

  • Requires a backend proxy (see the sketch below)
  • API costs per request
  • Network latency
  • Requires internet connection
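
The proxy's job is to attach your API key on the server so it never reaches the browser. A minimal sketch assuming an Express backend and a non-streaming passthrough; the route Agentary JS calls behind proxyUrl is an assumption here, so adapt the path to your setup:

// server.js — forwards browser requests to Anthropic, adding the key server-side
import express from 'express';

const app = express();
app.use(express.json());

app.post('/api/anthropic', async (req, res) => {
  const upstream = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'x-api-key': process.env.ANTHROPIC_API_KEY, // never ship this to the client
      'anthropic-version': '2023-06-01'
    },
    body: JSON.stringify(req.body)
  });
  res.status(upstream.status).send(await upstream.text());
});

app.listen(3000);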

Multi-Provider Setup

You can register multiple providers and choose which to use for each request:

const session = await createSession({
  models: [
    // Fast on-device model for quick tasks
    {
      runtime: 'transformers-js',
      model: 'onnx-community/Qwen3-0.6B-ONNX',
      quantization: 'q4',
      engine: 'webgpu'
    },
    // Powerful cloud model for complex tasks
    {
      runtime: 'anthropic',
      model: 'claude-3-5-sonnet-20241022',
      proxyUrl: 'https://api.example.com/anthropic',
      modelProvider: 'anthropic'
    }
  ]
});
 
// Use device model for quick response
const quickResponse = await session.createResponse('onnx-community/Qwen3-0.6B-ONNX', {
  messages: [{ role: 'user', content: 'Quick question' }]
});
 
// Use cloud model for complex task
const complexResponse = await session.createResponse('claude-3-5-sonnet-20241022', {
  messages: [{ role: 'user', content: 'Complex analysis' }]
});

Messages

Messages follow the standard chat format:

const messages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Hello!' },
  { role: 'assistant', content: 'Hi! How can I help?' },
  { role: 'user', content: 'Tell me a joke.' }
];

Roles:

  • system: Instructions for the model’s behavior
  • user: Input from the user
  • assistant: Model’s previous responses

Response Types

Agentary JS supports both streaming and non-streaming responses, depending on the provider and configuration.

Streaming Response

Process tokens as they’re generated:

const response = await session.createResponse(modelId, { messages });
 
if (response.type === 'streaming') {
  for await (const chunk of response.stream) {
    console.log(chunk.token);
  }
}

Benefits:

  • Better UX (show progress)
  • Faster time to first token
  • Cancellable generation (one approach is shown below)
  • Lower perceived latency
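
One way to cancel is to stop consuming the stream: breaking out of a for await loop invokes the iterator's return(), which gives the provider a chance to stop generating (whether the underlying run is actually aborted depends on the provider). A sketch with a hypothetical UI hook:

const response = await session.createResponse(modelId, { messages });

let cancelled = false;
cancelButton.addEventListener('click', () => { cancelled = true; }); // hypothetical button

if (response.type === 'streaming') {
  for await (const chunk of response.stream) {
    if (cancelled) break; // early exit triggers the stream's cleanup
    renderToken(chunk.token); // hypothetical render helper
  }
}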

Non-Streaming Response

Get the complete response at once:

const response = await session.createResponse(modelId, { messages });
 
if (response.type === 'complete') {
  console.log(response.content);
  console.log('Tokens used:', response.usage);
}

Benefits:

  • Simpler to handle
  • Complete token usage stats
  • Tool call information
  • Finish reason available
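
If a call site may receive either shape, a small helper can normalize the two cases; this sketch is based solely on the two examples above:

// Resolve either response shape to its full text
async function responseText(response) {
  if (response.type === 'streaming') {
    let text = '';
    for await (const chunk of response.stream) {
      text += chunk.token;
    }
    return text;
  }
  return response.content; // 'complete' response
}

const text = await responseText(await session.createResponse(modelId, { messages }));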

Device Provider Internals

Device providers use Web Workers to avoid blocking the main thread:

// Main thread continues running while model generates
const generationPromise = (async () => {
  const response = await session.createResponse(deviceModelId, { messages });
  if (response.type === 'streaming') {
    for await (const chunk of response.stream) {
      updateUI(chunk.token);
    }
  }
})();
 
// UI remains responsive
updateProgressBar();

Model Loading

Device providers handle model loading and initialization automatically:

// Models are validated and loaded during session creation
const session = await createSession({
  models: [{
    runtime: 'transformers-js',
    model: 'onnx-community/Qwen3-0.6B-ONNX',
    quantization: 'q4',
    engine: 'webgpu'
  }]
});
 
// Monitor loading progress
session.on('worker:init:progress', (event) => {
  console.log(`Loading ${event.modelName}: ${event.progress}%`);
});
 
session.on('worker:init:complete', (event) => {
  console.log(`${event.modelName} ready in ${event.duration}ms`);
});

See the Model Support documentation for supported models.

Memory Management

Agentary JS provides built-in memory management for long conversations. Configure memory when calling runWorkflow():

const agent = await createAgentSession({
  models: [
    { 
      runtime: 'transformers-js',
      model: 'onnx-community/Qwen3-0.6B-ONNX',
      quantization: 'q4',
      engine: 'webgpu'
    }
  ]
});
 
const workflow = {
  id: 'my-workflow',
  steps: [ /* ... */ ],
  tools: [ /* ... */ ]
};
 
const memoryConfig = {
  maxTokens: 2048,              // Maximum context size
  compressionThreshold: 0.8,    // Compress at 80% capacity
  memoryCompressorConfig: {
    name: 'sliding-window'      // or 'summarization'
  }
};
 
for await (const result of agent.runWorkflow(prompt, workflow, memoryConfig)) {
  // Process workflow results
}

Lifecycle Events

Subscribe to events for debugging and monitoring:

session.on('generation:start', (event) => {
  console.log('Started generating...');
});
 
session.on('generation:token', (event) => {
  console.log(event.token);
});
 
session.on('generation:complete', (event) => {
  console.log(`Generated ${event.totalTokens} tokens`);
});

Tools (Function Calling)

Define tools that the model can call:

const tools = [{
  definition: {
    name: 'get_weather',
    description: 'Get current weather',
    parameters: {
      type: 'object',
      properties: {
        city: { 
          type: 'string',
          description: 'City name'
        }
      },
      required: ['city']
    }
  },
  implementation: async ({ city }) => {
    // Your implementation
    const data = { temp: 72, condition: 'sunny' };
    return JSON.stringify(data);
  }
}];
 
const response = await session.createResponse(modelId, {
  messages,
  tools
});
 
if (response.type === 'streaming') {
  for await (const chunk of response.stream) {
    console.log(chunk.token);
  }
}

Next Steps