# Quick Start
This guide will walk you through creating your first Agentary JS application.
## Basic Text Generation
Create a simple chat session:
```javascript
import { createSession } from 'agentary-js';

// Create a session with a quantized model
const session = await createSession({
  models: {
    chat: {
      name: 'onnx-community/gemma-3-270m-it-ONNX',
      quantization: 'q4'
    }
  },
  engine: 'webgpu' // or 'wasm'
});

// Generate text with streaming
for await (const chunk of session.createResponse({
  messages: [{ role: 'user', content: 'Hello, how are you today?' }]
})) {
  if (chunk.isFirst && chunk.ttfbMs) {
    console.log(`Time to first byte: ${chunk.ttfbMs}ms`);
  }
  if (!chunk.isLast) {
    process.stdout.write(chunk.token);
  }
}

// Clean up resources
await session.dispose();
```

## Understanding the Code
### 1. Create a Session
```javascript
const session = await createSession({
  models: {
    chat: {
      name: 'onnx-community/gemma-3-270m-it-ONNX',
      quantization: 'q4'
    }
  },
  engine: 'webgpu'
});
```

The session initializes the model and manages the Web Worker used for inference.
Key options:
- `models.chat`: The model to use for chat completions
- `quantization`: Quantization level (`q4`, `q8`, `fp16`, `fp32`)
- `engine`: Inference engine (`webgpu`, `wasm`, `auto`)
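The engine can also be chosen at runtime via feature detection. As a minimal sketch (`pickEngine` is a hypothetical helper, not part of Agentary JS):

```javascript
// Hypothetical helper: prefer WebGPU when the browser exposes it,
// otherwise fall back to the universally supported WASM backend.
function pickEngine(hasWebGpu) {
  return hasWebGpu ? 'webgpu' : 'wasm';
}

// In a browser, you would pass the result of feature detection:
// const engine = pickEngine('gpu' in navigator);
```

Passing `engine: 'auto'` (covered below) achieves a similar effect without custom code.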
### 2. Generate Responses
```javascript
for await (const chunk of session.createResponse({
  messages: [{ role: 'user', content: 'Hello!' }]
})) {
  process.stdout.write(chunk.token);
}
```

The `createResponse` method returns an async iterable that yields tokens as they are generated.
Chunk properties:
- `token`: The generated text token
- `tokenId`: Numeric token ID
- `isFirst`: `true` for the first token
- `isLast`: `true` for the final token
- `ttfbMs`: Time to first byte (only on the first chunk)
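A common pattern is to accumulate streamed chunks into the full response text while recording time to first byte. A minimal sketch, where the chunk shape follows the properties above and the mock generator stands in for `session.createResponse(...)` (it is illustrative, not the real API):

```javascript
// Collect streamed chunks into the full response text,
// skipping the final sentinel chunk and capturing TTFB.
async function collectResponse(stream) {
  let text = '';
  let ttfbMs;
  for await (const chunk of stream) {
    if (chunk.isFirst) ttfbMs = chunk.ttfbMs;
    if (!chunk.isLast) text += chunk.token;
  }
  return { text, ttfbMs };
}

// Mock stream standing in for session.createResponse(...)
async function* mockStream() {
  yield { token: 'Hello', isFirst: true, isLast: false, ttfbMs: 42 };
  yield { token: ' world', isFirst: false, isLast: false };
  yield { token: '', isFirst: false, isLast: true };
}
```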
### 3. Clean Up
```javascript
await session.dispose();
```

Always dispose of sessions to free up memory and terminate workers.
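To guarantee cleanup even when generation throws, wrap usage in `try`/`finally`. A sketch with a minimal session-like shape inferred from the `dispose()` call above (`withSession` is a hypothetical helper, not an Agentary JS export):

```javascript
// Run work against a session-like resource, always disposing it,
// even if the work function throws.
async function withSession(session, work) {
  try {
    return await work(session);
  } finally {
    await session.dispose();
  }
}
```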
## Configuration Options
### Engine Selection
```javascript
engine: 'auto'   // Automatically selects the best available engine
engine: 'webgpu' // Force WebGPU (fastest, but limited browser support)
engine: 'wasm'   // Force WebAssembly (universal compatibility)
```

### Model Quantization
Quantization reduces model size and improves performance:
| Level | Size | Speed | Quality |
|---|---|---|---|
| `q4` | Smallest | Fastest | Good |
| `q8` | Small | Fast | Better |
| `fp16` | Medium | Moderate | Great |
| `fp32` | Large | Slow | Best |
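The size differences follow directly from bits per weight: `q4` stores roughly 4 bits per parameter, `q8` 8 bits, `fp16` 16 bits, and `fp32` 32 bits. A rough back-of-the-envelope estimator (actual downloads also include metadata and any layers left unquantized, so real sizes will differ):

```javascript
// Approximate bytes per parameter for each quantization level.
const BYTES_PER_PARAM = {
  q4: 0.5, // 4 bits
  q8: 1,   // 8 bits
  fp16: 2, // 16 bits
  fp32: 4, // 32 bits
};

// Approximate weight size in mebibytes for a given parameter count.
function approxSizeMB(paramCount, quantization) {
  return (paramCount * BYTES_PER_PARAM[quantization]) / (1024 * 1024);
}

// A 270M-parameter model at q4 works out to roughly ~129 MB of weights.
```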
### Generation Parameters
Control the output with generation parameters:
```javascript
for await (const chunk of session.createResponse({
  messages: [{ role: 'user', content: 'Write a poem' }],
  temperature: 0.7,        // Randomness (0.0-2.0)
  max_new_tokens: 200,     // Maximum tokens to generate
  top_p: 0.9,              // Nucleus sampling
  top_k: 50,               // Top-k sampling
  repetition_penalty: 1.1  // Penalty for repetition
})) {
  process.stdout.write(chunk.token);
}
```

## Next Steps
- Learn about Core Concepts
- Explore Tool Calling
- Build Agentic Workflows