Web context
The @vertana/context-web package provides context sources that fetch and extract content from web pages. This is useful when translating documents that reference external articles or resources.
Installation
deno add jsr:@vertana/context-web
npm add @vertana/context-web
pnpm add @vertana/context-web
yarn add @vertana/context-web
bun add @vertana/context-web

Overview
This package provides three main context sources:
- fetchWebPage: A passive context source that fetches a single URL on demand. The LLM can call this tool when it needs additional context.
- searchWeb: A passive context source that performs a web search and returns a list of results (title, URL, snippet).
- fetchLinkedPages: A required context source factory that extracts links from the source text and fetches their content before translation begins.
fetchWebPage and fetchLinkedPages use Mozilla's Readability algorithm to extract the main article content from web pages, filtering out navigation, ads, and other noise.
TIP
Prefer the passive sources (fetchWebPage, searchWeb) for most use cases. The translator only invokes them when it actually needs the information, which keeps the prompt focused on the source text. The required fetchLinkedPages helper is convenient for short, trusted link sets, but pulling many large pages into required context can cause the translator to echo a fetched page back as the translation; see the warning below before using it on large or untrusted documents.
fetchWebPage
A passive context source that the LLM can invoke when it needs to fetch a specific URL.
import { translate } from "@vertana/facade";
import { fetchWebPage } from "@vertana/context-web";

const text = `
This article discusses the concept explained at https://example.com/guide.
`;

const result = await translate(model, "ko", text, {
  contextSources: [fetchWebPage],
});

When the LLM encounters a reference it wants to understand better, it can call the fetch-web-page tool with the URL to retrieve the page content. If the fetched page may be large, configure the passive source with caps or summarization:
import { translate } from "@vertana/facade";
import { fetchWebPage } from "@vertana/context-web";

const text = `
This article discusses the concept explained at https://example.com/guide.
`;

const result = await translate(model, "ko", text, {
  contextSources: [
    fetchWebPage({
      maxCharsPerPage: 2000,
      maxTotalChars: 2500,
      summarize: { model: summarizerModel, maxChars: 800 },
    }),
  ],
});

fetchLinkedPages
A factory function that creates a required context source. It extracts URLs from the source text and fetches their content before translation begins. By default it fetches up to ten links (configurable via maxLinks).
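The extract-then-limit step described above can be pictured with a small standalone sketch. This is not the package's implementation: pickLinks is a hypothetical helper, the regular expression only handles plain text, and the de-duplication is an assumption about sensible behavior rather than something the package documents.

```typescript
// Hypothetical helper for illustration; not part of @vertana/context-web.
function pickLinks(text: string, maxLinks = 10): string[] {
  // Match http(s) URLs up to the next whitespace or common closing delimiter.
  const matches = text.match(/https?:\/\/[^\s<>"')\]]+/g) ?? [];
  const seen = new Set<string>();
  const picked: string[] = [];
  for (const raw of matches) {
    // Drop trailing sentence punctuation such as "." or ",".
    const url = raw.replace(/[.,;:!?]+$/, "");
    // Assumption: fetching the same URL twice is never useful.
    if (seen.has(url)) continue;
    seen.add(url);
    picked.push(url);
    // Stop as soon as the cap is reached, mirroring the idea behind maxLinks.
    if (picked.length >= maxLinks) break;
  }
  return picked;
}

const firstLinkOnly = pickLinks(
  "Check out https://example.com/article, then https://example.com/reference.",
  1,
);
// => ["https://example.com/article"]
```

A low cap like this is one way to keep required context small when a document contains many links.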
WARNING
Pulling many large pages into required context can confuse the translator: when the combined reference material is much larger than the source text, and especially when it is in the target language, the model may echo a fetched page back instead of translating the actual input. For large or untrusted link sets, prefer the passive fetchWebPage source so the translator only fetches what it actually needs.
import { translate } from "@vertana/facade";
import { fetchLinkedPages } from "@vertana/context-web";

const text = `
Check out https://example.com/article for background.
Also see https://example.com/reference for more details.
`;

const result = await translate(model, "ko", text, {
  contextSources: [
    fetchLinkedPages({
      text,
      mediaType: "text/plain",
      maxCharsPerPage: 2000,
      maxTotalChars: 6000,
    }),
  ],
});

Options
- text: The source text to extract links from.
- mediaType: The media type of the text ("text/plain", "text/markdown", or "text/html"). This affects how links are extracted.
- maxLinks: Maximum number of links to fetch. Defaults to 10.
- timeout: Timeout for each fetch request in milliseconds. Defaults to 10000.
- maxCharsPerPage: Maximum number of characters to keep from each fetched page body before formatting it as context.
- maxTotalChars: Maximum number of characters to keep from the combined formatted context across all fetched pages.
- summarize: Summarization settings for each fetched page. Pass { model: summarizerModel, maxChars?: number }; bare true is not supported because the helper cannot infer which model to use.
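To see how a per-page cap and a total cap might interact, consider the following minimal sketch. It is an illustration under stated assumptions, not the package's code: applyCaps and CapOptions are hypothetical names, characters are counted as UTF-16 code units via String.length, and the formatting overhead that the real maxTotalChars accounts for is ignored.

```typescript
// Hypothetical types and helper for illustration; not the package's implementation.
interface CapOptions {
  maxCharsPerPage?: number;
  maxTotalChars?: number;
}

function applyCaps(pages: string[], options: CapOptions = {}): string[] {
  const { maxCharsPerPage = Infinity, maxTotalChars = Infinity } = options;
  const kept: string[] = [];
  let remaining = maxTotalChars;
  for (const page of pages) {
    // Once the shared budget is spent, later pages are dropped entirely.
    if (remaining <= 0) break;
    // Trim to the per-page cap or the remaining budget, whichever is smaller.
    const trimmed = page.slice(0, Math.min(maxCharsPerPage, remaining));
    kept.push(trimmed);
    remaining -= trimmed.length;
  }
  return kept;
}

// Two 3000-character pages with a 2000 per-page cap and a 2500 total cap.
const capped = applyCaps(["a".repeat(3000), "b".repeat(3000)], {
  maxCharsPerPage: 2000,
  maxTotalChars: 2500,
});
// capped[0].length is 2000; capped[1].length is 500.
```

Under this model the per-page cap bounds any single page, while the total cap bounds how much reference material reaches the prompt overall, which is the quantity that matters for the echo problem described in the warning above.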
import { translate } from "@vertana/facade";
import { fetchLinkedPages } from "@vertana/context-web";

const text = `
Check out https://example.com/article for background.
Also see https://example.com/reference for more details.
`;

const result = await translate(model, "ko", text, {
  contextSources: [
    fetchLinkedPages({
      text,
      mediaType: "text/plain",
      summarize: { model: summarizerModel, maxChars: 800 },
    }),
  ],
});

searchWeb
A passive context source that the LLM can invoke when it needs to perform a web search. This source only returns a list of results and does not fetch any result pages.
import { translate } from "@vertana/facade";
import { searchWeb } from "@vertana/context-web";

const text = "Please translate this and cite sources.";

const result = await translate(model, "ko", text, {
  contextSources: [searchWeb],
});

When the LLM needs to find a relevant resource, it can call the search-web tool with a query to obtain a list of results (title, URL, snippet).
Options
- query: The search query keyword(s).
- maxResults: Maximum number of results to return. Defaults to 10.
- region: DuckDuckGo region parameter (kl), e.g. "kr-kr" or "us-en".
- timeRange: Time range filter (df): "d" (day), "w" (week), "m" (month), "y" (year).
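As a rough picture of how region and timeRange could map onto DuckDuckGo's kl and df query parameters, here is a minimal sketch. SearchOptions and buildSearchUrl are hypothetical names, and both the endpoint URL and the q parameter are assumptions; the package may build its requests differently.

```typescript
// Hypothetical option shape mirroring the options listed above.
interface SearchOptions {
  query: string;
  region?: string; // sent as `kl`
  timeRange?: "d" | "w" | "m" | "y"; // sent as `df`
}

function buildSearchUrl(options: SearchOptions): string {
  // Assumption: DuckDuckGo's HTML endpoint; the package may use a different one.
  const url = new URL("https://html.duckduckgo.com/html/");
  url.searchParams.set("q", options.query);
  // Optional filters are only added when the caller supplies them.
  if (options.region != null) url.searchParams.set("kl", options.region);
  if (options.timeRange != null) url.searchParams.set("df", options.timeRange);
  return url.toString();
}

const searchUrl = buildSearchUrl({
  query: "vertana translation",
  region: "kr-kr",
  timeRange: "w",
});
```

maxResults has no counterpart in this sketch; a limit of that kind would presumably be applied to the parsed result list rather than sent as a query parameter.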
Combining sources
For most cases, combining the two passive sources gives the LLM enough flexibility to gather context only when it actually helps the translation:
- fetchWebPage lets the LLM fetch a specific URL for more detail.
- searchWeb helps the LLM find relevant pages when the input has no links.
import { translate } from "@vertana/facade";
import { fetchWebPage, searchWeb } from "@vertana/context-web";

const text = `
Read the introduction at https://example.com/intro.
`;

const result = await translate(model, "ko", text, {
  contextSources: [
    // The LLM may fetch a specific URL when it needs more context.
    fetchWebPage,
    // The LLM may run a web search when it needs more context.
    searchWeb,
  ],
});

If the source text has a small, trusted set of links you want pulled in up-front, you can add fetchLinkedPages alongside the passive sources. Mind the warning in the fetchLinkedPages section above before doing so on large or untrusted documents:
import { translate } from "@vertana/facade";
import { fetchLinkedPages, fetchWebPage, searchWeb } from "@vertana/context-web";

const text = `
Read the introduction at https://example.com/intro.
`;

const result = await translate(model, "ko", text, {
  contextSources: [
    fetchLinkedPages({ text, mediaType: "text/plain" }),
    fetchWebPage,
    searchWeb,
  ],
});

extractLinks utility
The extractLinks function extracts URLs from text. It's used internally by fetchLinkedPages but is also exported for custom use cases.
import { extractLinks } from "@vertana/context-web";

// From plain text
const plainUrls = extractLinks(
  "Check https://example.com for info.",
  "text/plain"
);
// => ["https://example.com"]

// From Markdown
const mdUrls = extractLinks(
  "See [this article](https://example.com/article).",
  "text/markdown"
);
// => ["https://example.com/article"]

// From HTML
const htmlUrls = extractLinks(
  '<a href="https://example.com">Link</a>',
  "text/html"
);
// => ["https://example.com"]

CLI usage
The Vertana CLI provides a -L (or --fetch-links) flag that enables web context fetching:
vertana translate -t ko -L document.md

This automatically:
- Extracts links from the input document.
- Fetches and extracts content from those linked pages.
- Provides the content as context for translation.
This flag wires up fetchLinkedPages, so the same caveat applies: it is appropriate for documents with a short set of links you trust, but on inputs with many large linked pages the fetched material can dominate the prompt and cause the translator to echo a fetched page back as the translation. Treat -L as an opt-in convenience for those cases, not as a default.
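The "fetched material dominates the prompt" condition can be made concrete with a simple size comparison. The sketch below is only an illustration: referenceDominates is a hypothetical helper, and the default ratio of 3 is an arbitrary assumption, not a threshold that Vertana defines or applies.

```typescript
// Hypothetical guard for illustration; the ratio threshold is an arbitrary assumption.
function referenceDominates(
  sourceText: string,
  fetchedPages: string[],
  maxRatio = 3,
): boolean {
  // Total size of all reference material, in UTF-16 code units.
  const referenceChars = fetchedPages.reduce((sum, page) => sum + page.length, 0);
  // An empty source is dominated by any non-empty reference material.
  if (sourceText.length === 0) return referenceChars > 0;
  return referenceChars / sourceText.length > maxRatio;
}

const sourceSample = "x".repeat(100);
const smallSet = referenceDominates(sourceSample, ["y".repeat(200)]); // false
const largeSet = referenceDominates(sourceSample, ["y".repeat(200), "z".repeat(200)]); // true
```

A check along these lines could help decide whether link prefetching is appropriate for a given document, or whether the passive sources are the safer choice.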
See the CLI reference for more details.