HTTP or Cron trigger

Scraping and data extraction at the edge

Pair an HTTP or cron trigger with a fetch and you have a data pipeline: scrape a page, aggregate an RSS feed, unfurl a URL, or normalize a third-party API. Every outbound request is SSRF-filtered and byte-counted automatically.

The problem

Scrapers are simple to write and a pain to operate: they need scheduling, retries, egress control, and somewhere safe to run untrusted fetches. Running them on your own servers exposes your internal network to SSRF and invites surprise bandwidth bills.

How hostfunc solves it

Every outbound fetch passes through an egress worker that blocks private-network targets and counts bytes.
Run on demand via HTTP or on a schedule via cron: the function stays the same and only the trigger changes.
Parse and transform inline; return clean JSON other functions or clients can consume.
Per-execution metrics show exactly how much egress and CPU each scrape used.
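
To make "blocks private-network targets" concrete, here is a minimal sketch of the kind of check an egress filter performs. It is not hostfunc's actual egress worker: the function name isBlockedTarget and the exact set of blocked ranges are assumptions chosen for illustration, and a real filter would also have to check the IP address a hostname resolves to.

```typescript
// Hypothetical helper, for illustration only (not hostfunc's egress filter).
// Returns true when a URL points at a loopback, link-local, or private target.
export function isBlockedTarget(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return true; // unparseable URLs are rejected outright
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  // IPv6 literals keep their brackets in URL.hostname, e.g. "[::1]".
  if (host.startsWith("[")) {
    const v6 = host.slice(1, -1);
    return (
      v6 === "::1" || // loopback
      v6 === "::" || // unspecified
      v6.startsWith("::ffff:") || // IPv4-mapped, blocked conservatively
      /^f[cd][0-9a-f]{2}:/.test(v6) || // unique local, fc00::/7
      /^fe[89ab][0-9a-f]:/.test(v6) // link-local, fe80::/10
    );
  }

  // Dotted IPv4 literal.
  const m = /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/.exec(host);
  if (!m) return false; // a DNS name: this sketch does not resolve it
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) || // link-local, including cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}
```

A string check like this is only the first layer. A public hostname can still resolve to a private address, which is why the filtering belongs in the egress path rather than in each function.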

rss-aggregate.ts

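// Fetch and normalize an RSS feed into clean JSON.
export async function main(input: { feedUrl: string }) {
  const res = await fetch(input.feedUrl);
  if (!res.ok) throw new Error(`Feed request failed: ${res.status}`);
  const xml = await res.text();
  // Read a tag's text, tolerating attributes, line breaks, and CDATA wrappers.
  const text = (block: string, tag: string) => {
    const m = new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`).exec(block);
    return (m?.[1] ?? "").replace(/^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/, "$1").trim();
  };
  const items = [...xml.matchAll(/<item\b[^>]*>([\s\S]*?)<\/item>/g)].map((m) => ({
    title: text(m[1], "title"),
    link: text(m[1], "link"),
  }));
  // count is the total found in the feed; items is capped at the first 20.
  return { count: items.length, items: items.slice(0, 20) };
}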
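
The same fetch-and-parse pattern covers the URL unfurling case mentioned at the top of the page. The sketch below is one possible shape, not a shipped hostfunc example: the helper name extractOpenGraph and the returned field names are assumptions, and the regex-based parsing is deliberately simple (it will miss meta tags whose attribute values contain a ">" character).

```typescript
// Hypothetical unfurl helper: collects Open Graph fields from raw HTML.
export function extractOpenGraph(html: string): Record<string, string> {
  const meta: Record<string, string> = {};
  for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
    // property="og:title" in either quote style
    const prop = /\bproperty\s*=\s*["']og:([^"']+)["']/i.exec(tag)?.[1];
    // content="..." or content='...'
    const content = /\bcontent\s*=\s*"([^"]*)"|\bcontent\s*=\s*'([^']*)'/i.exec(tag);
    // Keep the first occurrence of each property.
    if (prop && content && !(prop in meta)) {
      meta[prop] = content[1] ?? content[2] ?? "";
    }
  }
  return meta;
}

// Fetch a page and return a small preview object.
export async function main(input: { url: string }) {
  const res = await fetch(input.url);
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const og = extractOpenGraph(await res.text());
  return {
    url: input.url,
    title: og.title ?? "",
    description: og.description ?? "",
    image: og.image ?? "",
  };
}
```

Keeping the parser as a pure function separate from the fetch makes it easy to test against saved HTML without any network access.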

Build this in minutes

Start building
