Web context
The @vertana/context-web package provides context sources that fetch and extract content from web pages. This is useful when translating documents that reference external articles or resources.
Installation
deno add jsr:@vertana/context-web
npm add @vertana/context-web
pnpm add @vertana/context-web
yarn add @vertana/context-web
bun add @vertana/context-web
Overview
This package provides three main context sources:
- fetchWebPage: A passive context source that fetches a single URL on demand. The LLM can call this tool when it needs additional context.
- fetchLinkedPages: A required context source factory that extracts all links from the source text and fetches their content before translation begins.
- searchWeb: A passive context source that performs a web search and returns a list of results (title, URL, snippet).
fetchWebPage and fetchLinkedPages use Mozilla's Readability algorithm to extract the main article content from web pages, filtering out navigation, ads, and other noise.
fetchWebPage
A passive context source that the LLM can invoke when it needs to fetch a specific URL.
import { translate } from "@vertana/facade";
import { fetchWebPage } from "@vertana/context-web";
const text = `
This article discusses the concept explained at https://example.com/guide.
`;
const result = await translate(model, "ko", text, {
  contextSources: [fetchWebPage],
});
When the LLM encounters a reference it wants to understand better, it can call the fetch-web-page tool with the URL to retrieve the page content.
fetchLinkedPages
A factory function that creates a required context source. It extracts all URLs from the source text and fetches their content before translation begins.
import { translate } from "@vertana/facade";
import { fetchLinkedPages } from "@vertana/context-web";
const text = `
Check out https://example.com/article for background.
Also see https://example.com/reference for more details.
`;
const result = await translate(model, "ko", text, {
  contextSources: [
    fetchLinkedPages({
      text,
      mediaType: "text/plain",
    }),
  ],
});
Options
- text: The source text to extract links from.
- mediaType: The media type of the text ("text/plain", "text/markdown", or "text/html"). This affects how links are extracted.
- maxLinks: Maximum number of links to fetch. Defaults to 10.
- timeout: Timeout for each fetch request in milliseconds. Defaults to 10000.
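To make the effect of maxLinks and timeout more concrete, here is a minimal sketch of a pre-fetch loop that caps the number of URLs and aborts slow requests. It is not the package's implementation: prefetchLinks, Fetcher, and PrefetchOptions are hypothetical names used only for illustration, and skipping failed pages is an assumption about reasonable behavior rather than documented behavior.

```typescript
// Hypothetical sketch; none of these names are part of @vertana/context-web.
type Fetcher = (url: string, signal: AbortSignal) => Promise<string>;

interface PrefetchOptions {
  maxLinks?: number; // cap on how many links are fetched (10, as in the docs)
  timeout?: number; // per-request timeout in milliseconds (10000, as in the docs)
}

// Default fetcher built on the standard fetch API; it honors the abort signal.
const defaultFetcher: Fetcher = async (url, signal) => {
  const response = await fetch(url, { signal });
  return await response.text();
};

async function prefetchLinks(
  urls: readonly string[],
  options: PrefetchOptions = {},
  fetcher: Fetcher = defaultFetcher,
): Promise<Map<string, string>> {
  const { maxLinks = 10, timeout = 10000 } = options;
  // Drop duplicate URLs, then keep only the first `maxLinks` of them.
  const selected = [...new Set(urls)].slice(0, maxLinks);
  const pages = new Map<string, string>();
  await Promise.all(
    selected.map(async (url) => {
      // Each request gets its own abort controller and timer.
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), timeout);
      try {
        pages.set(url, await fetcher(url, controller.signal));
      } catch {
        // Assumption: a page that fails or times out is skipped, not fatal.
      } finally {
        clearTimeout(timer);
      }
    }),
  );
  return pages;
}
```

Passing the fetcher as a parameter keeps the sketch testable without network access; a real context source would presumably also run the fetched HTML through content extraction before handing it to the translator.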
searchWeb
A passive context source that the LLM can invoke when it needs to perform a web search. This source returns only a list of results; it does not fetch the result pages themselves.
import { translate } from "@vertana/facade";
import { searchWeb } from "@vertana/context-web";
const text = "Please translate this and cite sources.";
const result = await translate(model, "ko", text, {
  contextSources: [searchWeb],
});
When the LLM needs to find a relevant resource, it can call the search-web tool with a query to obtain a list of results (title, URL, snippet).
Options
- query: The search query keyword(s).
- maxResults: Maximum number of results to return. Defaults to 10.
- region: DuckDuckGo region parameter (kl), e.g. "kr-kr" or "us-en".
- timeRange: Time range filter (df): "d" (day), "w" (week), "m" (month), "y" (year).
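As a rough illustration of how these options relate to DuckDuckGo's query parameters, the sketch below builds a search URL from them. The helper names (buildSearchUrl, limitResults, SearchOptions, SearchResult) are hypothetical, and the endpoint URL is an assumption; only the kl and df parameter names come from the option descriptions above.

```typescript
// Hypothetical sketch; not the package's actual implementation.
interface SearchOptions {
  maxResults?: number;
  region?: string; // sent as `kl`, e.g. "kr-kr" or "us-en"
  timeRange?: "d" | "w" | "m" | "y"; // sent as `df`
}

interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

function buildSearchUrl(query: string, options: SearchOptions = {}): URL {
  // Assumption: DuckDuckGo's HTML endpoint; the package may use another one.
  const url = new URL("https://html.duckduckgo.com/html/");
  url.searchParams.set("q", query);
  // Optional filters are added only when the caller supplies them.
  if (options.region != null) url.searchParams.set("kl", options.region);
  if (options.timeRange != null) url.searchParams.set("df", options.timeRange);
  return url;
}

// maxResults is not a URL parameter here; it trims the parsed result list.
function limitResults(
  results: readonly SearchResult[],
  maxResults = 10,
): SearchResult[] {
  return results.slice(0, maxResults);
}
```

For example, buildSearchUrl("vertana", { region: "kr-kr", timeRange: "w" }) yields a URL whose query string carries q, kl, and df, while omitted options leave the corresponding parameters out entirely.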
Combining sources
For best results, use these sources together:
- fetchLinkedPages provides context from links already present in the input.
- searchWeb helps the LLM find relevant pages when the input has no links.
- fetchWebPage lets the LLM fetch a specific result URL for more detail.
import { translate } from "@vertana/facade";
import { fetchLinkedPages, fetchWebPage, searchWeb } from "@vertana/context-web";
const text = `
Read the introduction at https://example.com/intro.
`;
const result = await translate(model, "ko", text, {
  contextSources: [
    // Pre-fetch all links in the text
    fetchLinkedPages({ text, mediaType: "text/plain" }),
    // Allow LLM to search the web and fetch additional URLs on demand
    searchWeb,
    fetchWebPage,
  ],
});
extractLinks utility
The extractLinks function extracts URLs from text. It's used internally by fetchLinkedPages but is also exported for custom use cases.
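To show why the media type matters, here is a simplified sketch of media-type-aware link extraction. It is not the package's actual implementation: extractLinksSketch is a hypothetical name, and the regular expressions are deliberately naive (a real implementation would more likely use proper Markdown and HTML parsers).

```typescript
// Hypothetical sketch; the real extractLinks may work quite differently.
type MediaType = "text/plain" | "text/markdown" | "text/html";

function extractLinksSketch(text: string, mediaType: MediaType): string[] {
  const patterns: Record<MediaType, RegExp> = {
    // Plain text: bare http(s) URLs up to whitespace or a closing delimiter.
    "text/plain": /https?:\/\/[^\s<>"')\]]+/g,
    // Markdown: inline links of the form [label](url).
    "text/markdown": /\[[^\]]*\]\((https?:\/\/[^\s)]+)\)/g,
    // HTML: quoted href attribute values.
    "text/html": /href\s*=\s*["'](https?:\/\/[^"']+)["']/gi,
  };
  const urls = new Set<string>(); // a Set drops duplicates, keeps first-seen order
  for (const match of text.matchAll(patterns[mediaType])) {
    // Prefer the capture group when the pattern has one, else the whole match;
    // then strip sentence punctuation that trails a bare URL.
    const url = (match[1] ?? match[0]).replace(/[.,;:!?]+$/, "");
    urls.add(url);
  }
  return [...urls];
}
```

The same input can yield different links depending on the media type: in Markdown or HTML mode, this sketch only picks up URLs that appear inside link syntax, while plain-text mode picks up any bare URL.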
import { extractLinks } from "@vertana/context-web";
// From plain text
const plainUrls = extractLinks(
  "Check https://example.com for info.",
  "text/plain"
);
// => ["https://example.com"]
// From Markdown
const mdUrls = extractLinks(
  "See [this article](https://example.com/article).",
  "text/markdown"
);
// => ["https://example.com/article"]
// From HTML
const htmlUrls = extractLinks(
  '<a href="https://example.com">Link</a>',
  "text/html"
);
// => ["https://example.com"]
CLI usage
The Vertana CLI provides the -L (or --fetch-links) flag, which enables web context fetching:
vertana translate -t ko -L document.md
This automatically:
- Extracts all links from the input document
- Fetches and extracts content from linked pages
- Provides the content as context for translation
See the CLI reference for more details.