HTTP or Cron trigger
Scraping and data extraction at the edge
Pair an HTTP or cron trigger with a fetch and you have a data pipeline: scrape a page, aggregate an RSS feed, unfurl a URL, or normalize a third-party API. Every outbound request is SSRF-filtered and byte-counted automatically.
The problem
Scrapers are simple to write and a pain to operate: they need scheduling, retries, egress control, and somewhere safe to run untrusted fetches. Running them on your own servers risks SSRF and surprise bandwidth bills.
How hostfunc solves it
- Every outbound fetch passes through an egress worker that blocks private-network targets and counts bytes.
- Run on demand via HTTP or on a schedule via cron — same function, just a trigger change.
- Parse and transform inline, then return clean JSON that other functions or clients can consume.
- Per-execution metrics show exactly how much egress and CPU each scrape used.
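To make "blocks private-network targets" concrete, here is a rough sketch of the kind of check an egress filter performs. It is an illustration only, not hostfunc's implementation: the names `isBlockedIPv4` and `isBlockedUrl` are hypothetical, and the sketch covers only IPv4 literals and `localhost`. A production filter would also need to resolve DNS, handle IPv6, and re-check every redirect hop.

```typescript
// Illustrative only: NOT hostfunc's egress filter. Function names are hypothetical.

// True when `host` is an IPv4 literal in a private, loopback, or link-local range.
export function isBlockedIPv4(host: string): boolean {
  const parts = host.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) {
    return false; // not an IPv4 literal
  }
  const octets = parts.map(Number);
  if (octets.some((n) => n > 255)) return false;
  const [a, b] = octets;
  return (
    a === 0 || // "this network"
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback
    (a === 169 && b === 254) || // link-local, including 169.254.169.254 metadata endpoints
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  );
}

// True when a URL should be refused before any request is made.
export function isBlockedUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return true; // unparseable input is refused
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;
  return url.hostname === "localhost" || isBlockedIPv4(url.hostname);
}
```

A check like this is what you would otherwise have to write and maintain yourself before every outbound fetch on your own servers.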
rss-aggregate.ts
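// Fetch an RSS feed and normalize its items into clean JSON.
export async function main(input: { feedUrl: string }) {
  const res = await fetch(input.feedUrl);
  if (!res.ok) throw new Error(`Feed request failed: HTTP ${res.status}`);
  const xml = await res.text();
  // Read one child element's text, unwrapping an optional CDATA section.
  // Regex parsing keeps the example dependency-free; it is not a full XML parser.
  const text = (block: string, tag: string) =>
    new RegExp(`<${tag}[^>]*>\\s*(?:<!\\[CDATA\\[)?([\\s\\S]*?)(?:\\]\\]>)?\\s*</${tag}>`)
      .exec(block)?.[1]?.trim() ?? "";
  const items = [...xml.matchAll(/<item[\s>][\s\S]*?<\/item>/g)].map((m) => ({
    title: text(m[0], "title"),
    link: text(m[0], "link"),
  }));
  // count is the total number of items found; items is capped at the first 20.
  return { count: items.length, items: items.slice(0, 20) };
}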
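The same shape covers the "unfurl a URL" case mentioned earlier. The sketch below assumes the same `main(input)` entry point as the RSS function above. The file name `unfurl.ts`, the helper `extractOpenGraph`, and the returned field names are hypothetical choices for illustration. It reads Open Graph `<meta>` tags with regular expressions, which is good enough for a preview card but is not a full HTML parser.

```typescript
// unfurl.ts (hypothetical file name): turn a URL into link-preview metadata.

// Collect Open Graph properties from an HTML string. The first occurrence of a property wins.
export function extractOpenGraph(html: string): Record<string, string> {
  // Read a quoted attribute value, allowing either quote style.
  const attr = (tag: string, name: string): string | undefined => {
    const m = new RegExp(`\\s${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`, "i").exec(tag);
    return m ? (m[1] ?? m[2]) : undefined;
  };

  const found = new Map<string, string>();
  for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
    const property = attr(tag, "property");
    const content = attr(tag, "content");
    if (property === undefined || content === undefined) continue;
    if (!property.toLowerCase().startsWith("og:")) continue;
    const key = property.slice(3); // drop the "og:" prefix
    if (!found.has(key)) found.set(key, content);
  }
  return Object.fromEntries(found);
}

export async function main(input: { url: string }) {
  const res = await fetch(input.url);
  if (!res.ok) throw new Error(`Request failed: HTTP ${res.status}`);
  const og = extractOpenGraph(await res.text());
  return {
    url: input.url,
    title: og.title ?? "",
    description: og.description ?? "",
    image: og.image ?? "",
  };
}
```

Keeping the parsing in a pure function separate from the fetch makes it easy to test against saved HTML without making a network call.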