ostrio:spiderable-middleware

v2.3.0, published 3 weeks ago

Spiderable middleware

Spiderable Middleware is a lightweight Node.js package designed for modern JavaScript web apps — including those built with React, Preact, Vue, Svelte, Angular, Ember, Backbone, Meteor, and others. It ensures that search engine crawlers and social media bots receive fully-rendered HTML, not just an empty skeleton. This bridges the gap between client-rendered apps and the indexing and preview requirements of platforms like Google, Facebook, and Twitter.

When paired with the global CDN and smart caching layer from ostr.io, spiderable-middleware significantly improves SEO scores, link previews, and performance metrics such as TTFB and Lighthouse scores, all without requiring changes to your existing codebase. It reroutes bot traffic to ostr.io rendering endpoints, which reduces server load, database queries, and infrastructure costs.

Why Pre-render?

  • 🕸 Execute JavaScript to get a rendered HTML page and its content;
  • 🏎️ Improve delivery for dynamic and static pages via our advanced CDN and caching;
  • 🏃‍♂️ Boost response rate and decrease response time with caching;
  • 🚀 Optimized HTML markup for the best SEO score;
  • 🎛️ Improve TTFB, LCP, INP, CLS, and other Lighthouse metrics, which in turn improves the overall SEO score;
  • 🖥 Supports PWAs and SPAs;
  • 📱 Supports mobile-like crawlers;
  • 💅 Supports styled-components;
  • ⚡️ Supports AMP (Accelerated Mobile Pages);
  • 🤓 Works with Content-Security-Policy and other complex front-end security rules;
  • 📦 This package ships with types and TS examples;
  • ❤️ Search engines and social network crawlers love straightforward and pre-rendered pages;
  • 📱 Consistent link previews in messaging apps such as iMessage, Messages, Facebook, Slack, Telegram, WhatsApp, Viber, VK, Twitter, and others;
  • 💻 Image, title, and description previews for links posted on social networks such as Facebook, X/Twitter, Instagram, and others.

About Package

This package works as Express-style middleware, intercepting requests from crawlers and social media bots to your Node.js application. It seamlessly proxies those requests to a pre-rendering service, which returns fully-rendered static HTML — optimized for indexing and rich previews.

Built with developers in mind, spiderable-middleware is lightweight, well-structured, and easy to customize. Whether you're scaling a production app or prototyping, it's designed to be hackable and flexible enough to fit any project. By offloading bot traffic to the pre-rendering engine, it helps reduce backend load and improve server performance effortlessly.

[!NOTE] This package proxies real HTTP headers and response codes. To avoid excessive requests, try not to use HTTP-redirect headers such as Location. Read how to return genuine status code and handle JS-redirects

[!IMPORTANT] This is a server-only package. Import and initialize it only within server code

This middleware was tested and works like a charm with:

All other frameworks that follow the Node/Express middleware convention will work too.

[!TIP] This package was originally developed for the ostr.io service, but it is not limited to it and can proxy-pass requests to any other rendering endpoint.

Installation

Install spiderable-middleware package from NPM:

# using npmjs
npm install spiderable-middleware --save

# using yarn
yarn add spiderable-middleware

Usage

Set up the pre-rendering middleware in a few lines of code.

Basic usage

Start by adding the fragment meta tag to the head of your HTML template or page:

[!TIP] See usage examples in .js and .ts for quick copy-paste experience

<html>
  <head>
    <meta name="fragment" content="!">
    <!-- ... -->
  </head>
  <body>
    <!-- ... -->
  </body>
</html>

Import or require spiderable-middleware package:

// ES6 import
import Spiderable from 'spiderable-middleware';

// or CommonJS require
const Spiderable = require('spiderable-middleware');

Register middleware handle:

import express from 'express';
import Spiderable from 'spiderable-middleware';

const spiderable = new Spiderable({
  rootURL: 'http://example.com',
  auth: 'test:test',
});

const app = express();
// register this handle first, before any other middleware or route,
// to reduce response time and server load
app.use(spiderable.handle).get('/', (req, res) => {
  res.send('Hello World');
});

app.listen(3000);

[!TIP] Several values are available for serviceURL, listed under "Rendering Endpoints". Each endpoint has its own features to fit different project needs

Return genuine status code

To pass the expected response status code from a front-end JavaScript framework to browsers and crawlers, use a specially formatted HTML comment. The comment can be placed anywhere in the HTML page; the head or body tag is the best place for it.

Format

html:

<!-- response:status-code=404 -->

jade:

// response:status-code=404

This package supports any standard or custom status code:

  • 201 - <!-- response:status-code=201 -->
  • 401 - <!-- response:status-code=401 -->
  • 403 - <!-- response:status-code=403 -->
  • 500 - <!-- response:status-code=500 -->
  • 514 - <!-- response:status-code=514 --> (non-standard)
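As a minimal sketch, a client-side app might emit this comment at runtime, for example from a "not found" route, by appending a comment node to the document head. The helper names statusCodeComment and setResponseStatus are hypothetical and not part of this package; the sketch assumes the pre-rendering engine reads the comment from the final rendered DOM, as described above.

```javascript
// Hypothetical helpers; names are illustrative, not part of spiderable-middleware.

// Build the text of the status-code comment (without the <!-- --> delimiters).
function statusCodeComment(code) {
  if (!Number.isInteger(code)) {
    throw new TypeError(`Status code must be an integer, got: ${code}`);
  }
  return ` response:status-code=${code} `;
}

// Append the comment to <head> of the given document.
// In a browser, pass the global `document`.
function setResponseStatus(doc, code) {
  const node = doc.createComment(statusCodeComment(code));
  doc.head.appendChild(node);
  return node;
}

// Example, inside a "not found" route of a front-end router:
// setResponseStatus(document, 404);
```

Passing the document as an argument keeps the helper easy to test outside a browser.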

Speed-up rendering

To speed up rendering, the JS runtime should tell the Spiderable engine when the page is ready. Set window.IS_RENDERED to false, and once the page is ready set it to true. Example:

<html>
  <head>
    <meta name="fragment" content="!">
    <script>
      window.IS_RENDERED = false;
    </script>
  </head>
  <body>
    <!-- ... -->
    <script type="text/javascript">
      // Somewhere deep in app code, once the page is ready:
      window.IS_RENDERED = true;
    </script>
  </body>
</html>
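In a real app the "ready" moment usually follows asynchronous work. The sketch below flips the flag once a set of promises has settled. markRenderedWhen is a hypothetical helper, not part of this package, and it assumes the promises you pass resolve only after the page content is actually in the DOM.

```javascript
// Hypothetical helper; not part of spiderable-middleware.
// Sets IS_RENDERED to false immediately and to true once all
// given promises have settled (fulfilled or rejected).
function markRenderedWhen(win, promises) {
  win.IS_RENDERED = false;
  return Promise.allSettled(promises).then(() => {
    win.IS_RENDERED = true;
  });
}

// Example in a browser (renderArticle and renderComments are placeholders
// for functions that return promises resolved after the DOM is updated):
// markRenderedWhen(window, [renderArticle(), renderComments()]);
```

Promise.allSettled is used so that a failed request still marks the page as ready instead of leaving the engine waiting.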

Detect request from Pre-rendering engine during runtime

The pre-rendering engine sets the window.IS_PRERENDERING global variable to true. Detecting requests from the pre-rendering engine is as easy as:

if (window.IS_PRERENDERING) {
  // This request is coming from Pre-rendering engine
}

[!NOTE] window.IS_PRERENDERING can be undefined on initial page load and may change during runtime. That's why we recommend pre-defining a setter for IS_PRERENDERING:

let isPrerendering = false;
Object.defineProperty(window, 'IS_PRERENDERING', {
  set(val) {
    isPrerendering = val;
    if (isPrerendering === true) {
      // This request is coming from Pre-rendering engine
    }
  },
  get() {
    return isPrerendering;
  }
});

Detect type of the Pre-rendering engine

Like browsers, crawlers and bots may request a page as "mobile" (small-screen touch devices) or as "desktop" (large screens without touch events). The pre-rendering engine supports both types. For cases where content needs to be optimized for different screens, the pre-rendering engine sets the window.IS_PRERENDERING_TYPE global variable to desktop or mobile:

if (window.IS_PRERENDERING_TYPE === 'mobile') {
  // This request is coming from "mobile" web crawler and "mobile" pre-rendering engine
} else if (window.IS_PRERENDERING_TYPE === 'desktop') {
  // This request is coming from "desktop" web crawler and "desktop" pre-rendering engine
} else {
  // This request is coming from user
}

JavaScript redirects

To redirect a browser or crawler inside the application while a page is loading (imitating navigation), use any classic JS redirect, including the framework's navigation or History.pushState():

window.location.href = 'http://example.com/another/page';
window.location.replace('http://example.com/another/page');

Router.go('/another/page'); // framework's navigation (pseudo-code)

[!IMPORTANT] Only 4 redirects are allowed during one request. After 4 redirects the session will be terminated.

API

Create a new instance and pass the middleware to the server's route chain.

Constructor

new Spiderable(opts?: SpiderableOptions);
  • opts {SpiderableOptions?} - [Optional] Configuration options
  • opts.serviceURL {string} - Valid URL to Spiderable endpoint (local or foreign). Default: https://render.ostr.io. Can be set via environment variables: SPIDERABLE_SERVICE_URL or PRERENDER_SERVICE_URL
  • opts.rootURL {string} - Valid root URL of a website. Can be set via an environment variable: ROOT_URL
  • opts.auth {string} - Auth string in the following format: user:pass. Can be set via environment variables: SPIDERABLE_SERVICE_AUTH or PRERENDER_SERVICE_AUTH. Default: null
  • opts.sanitizeUrls {boolean} - Sanitize URLs in order to "fix" badly composed URLs. Default false
  • opts.botsUA {string[]} - An array of strings (case insensitive) with additional User-Agent names of crawlers that need to be intercepted. See default bot names. Set to ['.*'] to match all browsers and robots and serve static pages to all users/visitors
  • opts.ignoredHeaders {string[]} - An array of strings (case insensitive) with HTTP header names to exclude from response. See default list of ignored headers. Set to ['.*'] to ignore all headers
  • opts.ignore {string[]} - An array of strings (case sensitive) with ignored routes. Note: it's based on first match, so route /users will cause ignoring of /part/users/part, /users/_id and /list/of/users, but not /user/_id or /list/of/blocked-users. Default null
  • opts.only {(string|RegExp)[]} - An array of strings (case sensitive) or regular expressions (can be mixed). Defines exclusive route rules for pre-rendering. Can be used together with opts.onlyRE rules. Note: to define "safe" rules as {RegExp}, each expression should start with ^ and end with $, for example: [/^\/articles\/?$/, /^\/article\/[A-Za-z0-9]{16}\/?$/]
  • opts.onlyRE {RegExp} - Regular Expression with exclusive route rules for pre-rendering. Could be used with opts.only rules
  • opts.timeout {number} - Number, proxy-request timeout to rendering endpoint in milliseconds. Default: 180000
  • opts.requestOptions {RequestOptions} - Options for request module (like: timeout, lookup, insecureHTTPParser), for all available options see http API docs
  • opts.debug {boolean} - [Optional] Enable debug and extra logging, default: false
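The first-match behavior described for opts.ignore can be pictured as a plain substring check. The function below only illustrates the documented examples and is not the package's actual implementation; isIgnored is a hypothetical name.

```javascript
// Illustration only: approximates the documented behavior of `opts.ignore`
// as a case-sensitive substring match. `isIgnored` is a hypothetical name.
function isIgnored(path, ignore) {
  return ignore.some((rule) => path.includes(rule));
}

const ignore = ['/users'];

console.log(isIgnored('/part/users/part', ignore)); // true
console.log(isIgnored('/users/_id', ignore)); // true
console.log(isIgnored('/list/of/users', ignore)); // true
console.log(isIgnored('/user/_id', ignore)); // false
console.log(isIgnored('/list/of/blocked-users', ignore)); // false
```

Because a short rule can match more paths than intended, it may be worth writing ignore rules with a trailing slash, such as '/account/', when only a sub-tree should be skipped.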

[!IMPORTANT] Setting .onlyRE and/or .only rules is highly recommended. Otherwise all routes, including those randomly generated by bots, will be subject to pre-rendering and may cause unexpectedly high usage.
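To see why anchoring matters for only and onlyRE rules, the standalone snippet below compares anchored and unanchored expressions using plain RegExp.test(). It does not call the package; the paths are made-up examples.

```javascript
const loose = /\/posts/; // unanchored: matches anywhere in the path
const safe = /^\/posts\/?$/; // anchored: matches only "/posts" or "/posts/"

console.log(loose.test('/admin/posts/secret')); // true
console.log(safe.test('/admin/posts/secret')); // false
console.log(safe.test('/posts/')); // true

// [A-z] spans ASCII 65-122, so it also admits [ \ ] ^ _ and `
const sloppyId = /^\/post\/[A-z0-9]{16}\/?$/;
const strictId = /^\/post\/[A-Za-z0-9]{16}\/?$/;

const underscores = '/post/' + '_'.repeat(16);
console.log(sloppyId.test(underscores)); // true
console.log(strictId.test(underscores)); // false
console.log(strictId.test('/post/a1B2c3D4e5F6g7H8')); // true
```

An unanchored rule lets bots trigger pre-rendering for any URL that merely contains the fragment, which is the higher-usage scenario the note above warns about.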

// ES6 import
import Spiderable from 'spiderable-middleware';

// or CommonJS
// const Spiderable = require('spiderable-middleware');

// Minimal setup:
const spiderableBasic = new Spiderable({
  rootURL: 'http://example.com',
  auth: 'test:test'
});

// More complex setup (recommended):
const spiderable = new Spiderable({
  rootURL: 'http://example.com',
  serviceURL: 'https://render.ostr.io',
  auth: 'test:test',
  only: [
    /^\/?$/, // Root of the website
    /^\/posts\/?$/, // "/posts" path with(out) trailing slash
    /^\/post\/[A-Za-z0-9]{16}\/?$/ // "/post/:id" path with(out) trailing slash
  ],
  // [Optionally] force ignore for secret paths:
  ignore: [
    '/account/', // Ignore all routes under "/account/*" path
    '/billing/' // Ignore all routes under "/billing/*" path
  ]
});

Configuration via env.vars

The same configuration can be achieved by setting environment variables:

ROOT_URL='http://example.com'
SPIDERABLE_SERVICE_URL='https://render.ostr.io'
SPIDERABLE_SERVICE_AUTH='APIUser:APIPass'

Alternatively, when migrating from another pre-rendering service, keep using the existing variables. The following are supported for compatibility:

ROOT_URL='http://example.com'
PRERENDER_SERVICE_URL='https://render.ostr.io'
PRERENDER_SERVICE_AUTH='APIUser:APIPass'
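If you prefer to make the configuration explicit and fail fast when a variable is missing, the variables above can be read by hand and passed to the constructor. This is only a sketch: optionsFromEnv is a hypothetical helper, and the precedence shown (SPIDERABLE_* before PRERENDER_*) is an assumption of the sketch, not documented behavior of the package.

```javascript
// Hypothetical helper; the precedence (SPIDERABLE_* before PRERENDER_*)
// is an assumption made for this sketch, not documented behavior.
function optionsFromEnv(env) {
  if (!env.ROOT_URL) {
    throw new Error('ROOT_URL environment variable is required');
  }

  return {
    rootURL: env.ROOT_URL,
    // Falls back to the documented default endpoint
    serviceURL:
      env.SPIDERABLE_SERVICE_URL ||
      env.PRERENDER_SERVICE_URL ||
      'https://render.ostr.io',
    // Falls back to the documented default (null)
    auth: env.SPIDERABLE_SERVICE_AUTH || env.PRERENDER_SERVICE_AUTH || null,
  };
}

// Example:
// const spiderable = new Spiderable(optionsFromEnv(process.env));
```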

handle

Middleware handle

const spiderable = new Spiderable();
spiderable.handle(req: IncomingMessage, res: ServerResponse, next: NextFunction): void;

// Alias that returns {boolean}:
// true if pre-rendering takes over the request
spiderable.handler(req: IncomingMessage, res: ServerResponse, next: NextFunction): boolean;

Example using the connect or express package:

import { createServer } from 'node:http';
import express from 'express';
// import connect from 'connect';
import Spiderable from 'spiderable-middleware';

const app = express();
// const app = connect();
const spiderable = new Spiderable();

app.use(spiderable.handle).use((_req, res) => {
  res.end('Hello from Connect!\n');
});

createServer(app).listen(3000);

Example using the Node.js http server:

import { createServer } from 'node:http';
import Spiderable from 'spiderable-middleware';

const spiderable = new Spiderable();

// HTTP(s) Server
createServer((req, res) => {
  spiderable.handle(req, res, () => {
    // Callback, triggered if this request
    // is not a subject of spiderable pre-rendering
    res.writeHead(200, {'Content-Type': 'text/plain; charset=UTF-8'});
    res.end('Hello vanilla NodeJS!');
    // Or do something else ...
  });
}).listen(3000);

Types

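Import types directly from the NPM package:

import type { IncomingMessage, ServerResponse } from 'node:http';
// expectType is assumed here to come from the "tsd" type-testing package
import { expectType } from 'tsd';
import Spiderable from 'spiderable-middleware';
import type { SpiderableOptions, NextFunction } from 'spiderable-middleware';

// Placeholders for the request and response objects of a real server
declare const req: IncomingMessage;
declare const res: ServerResponse;

const options: SpiderableOptions = {
  rootURL: 'http://example.com',
  auth: 'test:test',
  debug: false,
  /* ..and other options.. */
};
expectType<SpiderableOptions>(options);

const spiderable = new Spiderable(options);
expectType<Spiderable>(spiderable);

const next: NextFunction = (_err?: unknown): void => {};
expectType<void>(spiderable.handle(req, res, next));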

AMP Support

To properly serve Accelerated Mobile Pages (AMP), the following URI schemes are supported:

# Regular URIs:
https://example.com/index.html
https://example.com/articles/article-title.html
https://example.com/articles/article-uniq-id/article-slug

# AMP optimized URIs (prefix):
https://example.com/amp/index.html
https://example.com/amp/articles/article-title.html
https://example.com/amp/articles/article-uniq-id/article-slug

# AMP optimized URIs (extension):
https://example.com/amp/index.amp.html
https://example.com/amp/articles/article-title.amp.html

[!IMPORTANT] All URLs with .amp. extension and /amp/ prefix will be optimized for AMP.
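The rule above can be expressed as a small predicate, which may be handy for logging or for deciding which routes to list in only. This is an illustration of the documented rule rather than the engine's actual detection logic; isAmpUrl is a hypothetical name, and the sketch assumes the /amp/ prefix sits at the start of the path.

```javascript
// Illustration only: mirrors the documented rule that URLs with an
// "/amp/" prefix or an ".amp." extension are treated as AMP pages.
// Assumes the prefix is at the very start of the path.
function isAmpUrl(url) {
  const { pathname } = new URL(url);
  return pathname.startsWith('/amp/') || pathname.includes('.amp.');
}

console.log(isAmpUrl('https://example.com/amp/index.html')); // true
console.log(isAmpUrl('https://example.com/articles/article-title.amp.html')); // true
console.log(isAmpUrl('https://example.com/index.html')); // false
```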

Rendering Endpoints

  • render (default) - https://render.ostr.io - This endpoint has "optimal" settings and should fit 98% of cases. This endpoint respects cache headers of both the crawler and the origin server
  • render-bypass (devel/debug) - https://render-bypass.ostr.io - This endpoint will bypass caching mechanisms. Use it when experiencing an issue, or during development, to make sure responses are not cached. It's safe to use this endpoint in production, but it may result in higher usage and response time
  • render-cache (under attack) - https://render-cache.ostr.io - This endpoint has the most aggressive caching mechanism. Use it to achieve the shortest response time, and when outdated pages (for 6-12 hours) are acceptable

To change the default endpoint, take the integration example code and replace render.ostr.io with an endpoint from the list above. For the NPM integration, change the value of the serviceURL option.

Note: The described differences in caching behavior relate to intermediate proxy caching. The Cache-Control header will always be set to the value defined in "Cache TTL". Cached results on the "Pre-rendering Engine" side can be purged at any time.
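One possible way to switch endpoints per environment is sketched below. The endpoint URLs come from the list above; the policy itself (bypass caching outside production) and the names ENDPOINTS and pickServiceURL are assumptions of this sketch, not a recommendation from the package.

```javascript
// Endpoint URLs as listed above
const ENDPOINTS = {
  render: 'https://render.ostr.io',
  bypass: 'https://render-bypass.ostr.io',
  cache: 'https://render-cache.ostr.io',
};

// Hypothetical policy: use the default endpoint in production and
// bypass caching everywhere else, so changes show up immediately.
function pickServiceURL(nodeEnv) {
  return nodeEnv === 'production' ? ENDPOINTS.render : ENDPOINTS.bypass;
}

// Example:
// const spiderable = new Spiderable({
//   rootURL: 'http://example.com',
//   serviceURL: pickServiceURL(process.env.NODE_ENV),
// });
```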

Convert dynamic website to static

The spiderable-middleware package can be used to convert dynamic websites into rendered, cached, and lightweight static pages. Set botsUA to ['.*'] to achieve this behavior:

import Spiderable from 'spiderable-middleware';

const spiderable = new Spiderable({
  botsUA: ['.*'],
  /* ... other options ...*/
});

Debugging

Pass { debug: true } or set DEBUG=true environment variable to enable debugging mode.

[!TIP] To make sure a server can reach a rendering endpoint, run a cURL command or send a request via Node.js (replace example.com with your domain name):

# cURL example:
curl -v "https://test:test@render-bypass.ostr.io/?url=http://example.com"

In this example we use the render-bypass.ostr.io endpoint to avoid any cached results; read more about rendering endpoints. The API credentials here are test:test; this part of the URL can be replaced with the auth option from the Node.js example.

[!TIP] The API credentials and instructions can be found at the very bottom of the Pre-rendering Panel: click on the name of your website, then on Integration Guide at the bottom of the page

// Node.js example:
const https = require('https');

https.get('https://test:test@render-bypass.ostr.io/?url=http://example.com', (resp) => {
  let data = '';

  resp.on('data', (chunk) => {
    data += chunk.toString('utf8');
  });

  resp.on('end', () => {
    console.log(data);
  });
}).on('error', (error) => {
  console.error(error);
});

Running Tests

  1. Clone this package
  2. In Terminal (Console) go to directory where package was cloned
  3. Then run:

Node.js/Mocha

# Install development NPM dependencies:
npm install --save-dev
# Install NPM dependencies:
npm install --save

# Link package to itself
npm link
npm link spiderable-middleware

# Run tests:
ROOT_URL=http://127.0.0.1:3003 npm test

# Run same tests with extra-logging
DEBUG=true ROOT_URL=http://127.0.0.1:3003 npm test
# http://127.0.0.1:3003 can be changed to any local address; the port is required