Spiderable middleware
Google, Facebook, Twitter, Yahoo, Bing, and all other crawlers and search engines are constantly trying to view your website. If your website is built on top of a JavaScript framework such as (but not limited to) Angular, Backbone, Ember, Meteor, React, or MEAN, most front-end solutions return only basic HTML markup and script tags to crawlers, not the content of your page. The mission of spiderable-middleware and ostr.io is to boost your SEO experience without a headache.
Why Pre-render?
- 🕸 Execute JavaScript, as web-crawlers and search engines can't run JS code;
- 🏃♂️ Boost response rate and decrease response time with caching;
- 🚀 Optimized HTML markup for best SEO score;
- 🖥 Support for PWAs and SPAs;
- 📱 Support for mobile-like crawlers;
- 💅 Support styled-components;
- ⚡️ Support AMP (Accelerated Mobile Pages);
- 🤓 Works with Content-Security-Policy and other "complicated" front-end security;
- ❤️ Search engines and social network crawlers love straightforward and pre-rendered pages;
- 📱 Consistent link previews in messaging apps, like iMessage, Messages, Facebook, Slack, Telegram, WhatsApp, Viber, VK, Twitter, etc.;
- 💻 Image, title, and description previews for posted links at social networks, like Facebook, Twitter, VK and others.
About Package
This package acts as middleware and intercepts requests to your Node.js application from web crawlers. All such requests are proxied to the pre-rendering service, which returns static, rendered HTML.
This is a SERVER-only package. For NPM, make sure you're importing the library only in Node.js. For Meteor.js, make sure the library is imported and executed only on the SERVER.
We made this package with developers in mind. It's written in a simple way, is hackable, and is easily tunable to meet your needs. It can also be used to turn dynamic pages into rendered, cached, and lightweight static pages for every visitor: just set botsUA to ['.*']. This is the best option to offload servers unless a website gets updated more than once in 4 hours.
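As a rough illustration of the interception step, the sketch below shows one way a middleware could decide whether a request comes from a crawler, by testing the User-Agent header against a list of patterns. The names isCrawler and SAMPLE_BOTS and the short bot list are hypothetical; they are not the package's internals or its default list. Only the ['.*'] value comes from the text above.

```javascript
// Hypothetical helper, NOT the package's actual implementation.
// The bot names below are illustrative and not the package's default list.
const SAMPLE_BOTS = ['googlebot', 'bingbot', 'facebookexternalhit'];

function isCrawler(userAgent, botsUA = SAMPLE_BOTS) {
  if (typeof userAgent !== 'string' || userAgent.length === 0) {
    return false;
  }
  if (!Array.isArray(botsUA) || botsUA.length === 0) {
    return false;
  }
  // Join all patterns into a single case-insensitive regular expression
  const pattern = new RegExp(botsUA.join('|'), 'i');
  return pattern.test(userAgent);
}

console.log(isCrawler('Mozilla/5.0 (compatible; Googlebot/2.1)')); // true
console.log(isCrawler('Mozilla/5.0 (Macintosh) Safari/605.1.15')); // false
// With ['.*'] every visitor is treated like a crawler:
console.log(isCrawler('Mozilla/5.0 (Macintosh) Safari/605.1.15', ['.*'])); // true
```

Because the patterns are joined into a regular expression, a single '.*' entry matches any non-empty User-Agent, which is why that setting serves static pages to all visitors.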
- Note: This package proxies real HTTP headers and response codes. To reduce the number of redundant requests, try to avoid HTTP-redirect headers, like Location and others. Read how to return genuine status code and handle JS-redirects.
- Note: This is a server-only package. For isomorphic environments, like Meteor.js, this package should be imported and initialized only in the server code base.
This middleware was tested and works like a charm with:
All other frameworks which follow Node's middleware convention will work too.
This package was originally developed for the ostr.io service, but it's not limited to it and can proxy requests to any other rendering endpoint.
ToC
- Installation
- Basic usage
- MeteorJS usage
- Return genuine status code
- Speed-up rendering
- Detect request from Prerendering engine during runtime
- Detect request from Prerendering engine during Meteor/Blaze runtime
- JavaScript redirects
- AMP Support
- Rendering Endpoints
- API
- Debugging
- Running Tests
Installation
This package is distributed via NPM for Node.js and Atmosphere for Meteor.js. The NPM version is also safe to use in a Meteor backend.
NPM:
npm install spiderable-middleware --save
Meteor:
meteor add webapp
meteor add ostrio:spiderable-middleware
Usage
Get ready in a few lines of code
Basic usage
See all examples.
First, add fragment meta-tag to your HTML template:

<html>
  <head>
    <meta name="fragment" content="!">
    <!-- ... -->
  </head>
  <body>
    <!-- ... -->
  </body>
</html>

// Make sure this code isn't exported to the Browser bundle
// and executed only on SERVER (Node.js)
const express = require('express');
const app = express();
const Spiderable = require('spiderable-middleware');
const spiderable = new Spiderable({
  rootURL: 'http://example.com',
  serviceURL: 'https://render.ostr.io',
  auth: 'APIUser:APIPass'
});

app.use(spiderable.handler).get('/', (req, res) => {
  res.send('Hello World');
});

app.listen(3000);

We provide various options for serviceURL as "Rendering Endpoints"; each endpoint has its own features to fit every project's needs.
Meteor specific usage
// Install necessary packages:
// meteor add webapp
// meteor add ostrio:spiderable-middleware

// Make sure this code is executed only on SERVER
// Use `if (Meteor.isServer) {/*...*/}` blocks
// or place this code under `/server/` directory
import { WebApp } from 'meteor/webapp';
import Spiderable from 'meteor/ostrio:spiderable-middleware';

WebApp.connectHandlers.use(new Spiderable({
  rootURL: 'http://example.com',
  serviceURL: 'https://render.ostr.io',
  auth: 'APIUser:APIPass'
}));
Return genuine status code
To pass the expected response code from a front-end JavaScript framework to browsers/crawlers, create a specially formatted HTML comment. This comment can be placed in any part of the HTML page; the head or body tag is the best place for it.
Format
html:
<!-- response:status-code=404 -->
jade:
// response:status-code=404
This package supports any standard and custom status codes:
- 201 - <!-- response:status-code=201 -->
- 401 - <!-- response:status-code=401 -->
- 403 - <!-- response:status-code=403 -->
- 500 - <!-- response:status-code=500 -->
- 514 - <!-- response:status-code=514 --> (non-standard)
Note: Reserved status codes for internal service communications: 49[0-9].
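To make the comment format concrete, here is a minimal sketch of how a consumer of the rendered HTML could read the status code back out. The function name extractStatusCode and the fallback of 200 are assumptions made for illustration; the rendering service's real parsing logic is not shown in this document.

```javascript
// Hypothetical sketch: read the status code from the specially formatted HTML comment.
// Falls back to 200 (an assumption) when no such comment is present.
function extractStatusCode(html, fallback = 200) {
  const match = /<!--\s*response:status-code=(\d{3})\s*-->/i.exec(html);
  return match ? parseInt(match[1], 10) : fallback;
}

const notFoundPage = '<html><head><!-- response:status-code=404 --></head><body>Not found</body></html>';
const regularPage = '<html><head></head><body>Hello</body></html>';

console.log(extractStatusCode(notFoundPage)); // 404
console.log(extractStatusCode(regularPage)); // 200
```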
Speed-up rendering
To speed up rendering, tell the Spiderable engine when your page is ready. Set window.IS_RENDERED to false, and once your page is ready set this variable to true. Example:
<html>
  <head>
    <meta name="fragment" content="!">
    <script>
      window.IS_RENDERED = false;
    </script>
  </head>
  <body>
    <!-- ... -->
    <script type="text/javascript">
      // Somewhere deep in your app-code:
      window.IS_RENDERED = true;
    </script>
  </body>
</html>
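In a real application, "ready" usually means that the data needed for the first render has arrived. The sketch below is one possible way to flip the flag once a set of asynchronous tasks has finished. The helper markRenderedWhenReady and the placeholder tasks are hypothetical, and root falls back to globalThis only so that the snippet also runs outside a browser.

```javascript
// In a browser `root` is `window`; the fallback keeps this sketch runnable in Node.js
const root = typeof window !== 'undefined' ? window : globalThis;
root.IS_RENDERED = false;

// Hypothetical helper: wait until every task has settled, then signal readiness.
// allSettled() is used so a failed task does not leave the flag stuck at false.
function markRenderedWhenReady(tasks) {
  return Promise.allSettled(tasks).then(() => {
    root.IS_RENDERED = true;
  });
}

// Placeholder tasks standing in for real data loading:
const ready = markRenderedWhenReady([
  Promise.resolve('articles'),
  new Promise((resolve) => setTimeout(resolve, 10, 'comments'))
]);

ready.then(() => {
  console.log('IS_RENDERED:', root.IS_RENDERED);
});
```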
Detect request from Pre-rendering engine during runtime
The pre-rendering engine will set the window.IS_PRERENDERING global variable to true. Detecting requests from the pre-rendering engine is as easy as:
if (window.IS_PRERENDERING) {
  // This request is coming from Pre-rendering engine
}
Note: window.IS_PRERENDERING can be undefined on initial page load, and may change during runtime. That's why we recommend pre-defining a setter for IS_PRERENDERING:
let isPrerendering = false;
Object.defineProperty(window, 'IS_PRERENDERING', {
  set(val) {
    isPrerendering = val;
    if (isPrerendering === true) {
      // This request is coming from Pre-rendering engine
    }
  },
  get() {
    return isPrerendering;
  }
});
Detect request from Pre-rendering engine in Meteor.js
The pre-rendering engine will set the window.IS_PRERENDERING global variable to true. As everything in Meteor/Blaze should be reactive, let's bind it to a ReactiveVar:
import { Template } from 'meteor/templating';
import { ReactiveVar } from 'meteor/reactive-var';

const isPrerendering = new ReactiveVar(window.IS_PRERENDERING || false);
Object.defineProperty(window, 'IS_PRERENDERING', {
  set(val) {
    isPrerendering.set(val);
  },
  get() {
    return isPrerendering.get();
  }
});

// Make globally available Blaze helper,
// Feel free to omit this line in case if you're not using Blaze
// or going to handle logic in JavaScript part
Template.registerHelper('IS_PRERENDERING', () => isPrerendering.get());
Note: window.IS_PRERENDERING can be undefined on initial page load, and may change during runtime.
JavaScript redirects
If you need to redirect a browser/crawler inside your application while a page is loading (imitating navigation), you're free to use any of the classic JS redirects as well as your framework's navigation or history.pushState():
window.location.href = 'http://example.com/another/page';
window.location.replace('http://example.com/another/page');

Router.go('/another/page'); // framework's navigation !pseudo code
Note: Only 4 redirects are allowed during one request; after 4 redirects the session will be terminated.
API
Constructor new Spiderable([opts])
opts {Object} - Configuration options
- opts.serviceURL {String} - Valid URL to Spiderable endpoint (local or foreign). Default: https://render.ostr.io. Can be set via environment variables: SPIDERABLE_SERVICE_URL or PRERENDER_SERVICE_URL
- opts.rootURL {String} - Valid root URL of your website. Can be set via an environment variable: ROOT_URL (common for Meteor)
- opts.auth {String} - [Optional] Auth string in the following format: user:pass. Can be set via environment variables: SPIDERABLE_SERVICE_AUTH or PRERENDER_SERVICE_AUTH. Default: null
- opts.botsUA {[String]} - [Optional] An array of strings (case insensitive) with additional User-Agent names of crawlers you would like to intercept. See default bot's names. Set to ['.*'] to match all browsers and robots, to serve static pages to all users/visitors
- opts.ignoredHeaders {[String]} - [Optional] An array of strings (case insensitive) with HTTP header names to exclude from response. See default list of ignored headers. Set to ['.*'] to ignore all headers
- opts.ignore {[String]} - [Optional] An array of strings (case sensitive) with ignored routes. Note: it's based on first match, so route /users will cause ignoring of /part/users/part, /users/_id and /list/of/users, but not /user/_id or /list/of/blocked-users. Default: null
- opts.only {[String|RegExp]} - [Optional] An array of strings (case sensitive) or regular expressions (could be mixed). Define exclusive route rules for pre-rendering. Could be used with opts.onlyRE rules. Note: To define "safe" rules as {RegExp} it should start with ^ and end with $ symbols, examples: [/^\/articles\/?$/, /^\/article\/[A-Za-z0-9]{16}\/?$/]
- opts.onlyRE {RegExp} - [Optional] Regular Expression with exclusive route rules for pre-rendering. Could be used with opts.only rules
- opts.timeout {Number} - [Optional] Proxy-request timeout to rendering endpoint in milliseconds. Default: 180000
- opts.requestOptions {Object} - [Optional] Options for request module (like: timeout, debug, proxy), for all available options see request-libcurl API docs
- opts.debug {Boolean} - [Optional] Enable debug and extra logging. Default: false
Note: Setting .onlyRE and/or .only rules is highly recommended. Otherwise all routes, including those randomly generated by bots, will be subject to pre-rendering and may cause unexpectedly high usage.
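The interplay of the ignore, only and onlyRE rules can be pictured with the following sketch. It is an illustration under assumptions, not the package's source: the function name shouldPrerender, the order of the checks, and exact-equality matching for string rules in only are all hypothetical. The substring matching for ignore is chosen because it reproduces the /users examples given in the option description.

```javascript
// Hypothetical sketch of route filtering; not the package's source code.
function shouldPrerender(path, { only = null, onlyRE = null, ignore = null } = {}) {
  // Ignored routes: substring match, so '/users' also matches '/list/of/users'
  if (ignore && ignore.some((route) => path.includes(route))) {
    return false;
  }

  // No exclusive rules defined: every remaining route is eligible
  if (!only && !onlyRE) {
    return true;
  }

  if (onlyRE && onlyRE.test(path)) {
    return true;
  }

  // Assumption: string rules must equal the path, RegExp rules must match it
  return Boolean(only) && only.some((rule) => {
    return rule instanceof RegExp ? rule.test(path) : path === rule;
  });
}

const rules = {
  only: [/^\/?$/, /^\/posts\/?$/],
  ignore: ['/users']
};

console.log(shouldPrerender('/posts/', rules)); // true
console.log(shouldPrerender('/random-bot-url', rules)); // false
console.log(shouldPrerender('/list/of/users', rules)); // false
```

Anchoring each regular expression with ^ and $ is what keeps randomly generated bot URLs out of the pre-rendering queue in this sketch.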
// Pick the import that matches your environment:
// CommonJS
// const Spiderable = require('spiderable-middleware');

// ES6 import
// import Spiderable from 'spiderable-middleware';

// ES6 import (Meteor.js)
// import Spiderable from 'meteor/ostrio:spiderable-middleware';

// Basic setup:
const spiderable = new Spiderable({
  rootURL: 'http://example.com',
  serviceURL: 'https://render.ostr.io',
  auth: 'APIUser:APIPass'
});

// More complex setup (recommended),
// declared under a different name to avoid redeclaring `spiderable`:
const spiderableWithRules = new Spiderable({
  rootURL: 'http://example.com',
  serviceURL: 'https://render.ostr.io',
  auth: 'APIUser:APIPass',
  only: [
    /^\/?$/, // Root of the website
    /^\/posts\/?$/, // "/posts" path with(out) trailing slash
    /^\/post\/[A-Za-z0-9]{16}\/?$/ // "/post/:id" path with(out) trailing slash
  ],
  // [Optionally] force ignore for secret paths:
  ignore: [
    '/account/', // Ignore all routes under "/account/*" path
    '/billing/' // Ignore all routes under "/billing/*" path
  ]
});
spiderable.handler(req, res, next)
Middleware handler. Alias: spiderable.handle.
// Express, Connect:
app.use(spiderable.handler);

// Meteor:
WebApp.connectHandlers.use(spiderable);

// HTTP(s) Server
const http = require('http');

http.createServer((req, res) => {
  spiderable.handler(req, res, () => {
    // Callback, triggered if this request
    // is not a subject of spiderable pre-rendering
    res.writeHead(200, {'Content-Type': 'text/plain; charset=UTF-8'});
    res.end('Hello vanilla NodeJS!');
    // Or do something else ...
  });
}).listen(3000);
AMP Support
To properly serve pages for Accelerated Mobile Pages (AMP), we support the following URI schemes:
# Regular URIs:
https://example.com/index.html
https://example.com/articles/article-title.html
https://example.com/articles/article-uniq-id/article-slug

# AMP optimized URIs (prefix):
https://example.com/amp/index.html
https://example.com/amp/articles/article-title.html
https://example.com/amp/articles/article-uniq-id/article-slug

# AMP optimized URIs (extension):
https://example.com/amp/index.amp.html
https://example.com/amp/articles/article-title.amp.html
All URLs with .amp. extension and /amp/ prefix will be optimized for AMP.
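If your own routing needs to tell AMP URLs apart from regular ones, a check along these lines may help. The helper isAmpUrl is hypothetical and only mirrors the two URI schemes listed above; how the rendering service itself detects AMP URLs is not described here.

```javascript
// Hypothetical helper mirroring the two documented schemes:
// an "/amp/" path prefix, or an ".amp." marker in the file extension.
function isAmpUrl(url) {
  const { pathname } = new URL(url);
  return pathname.startsWith('/amp/') || /\.amp\.[^/]*$/.test(pathname);
}

console.log(isAmpUrl('https://example.com/amp/articles/article-title.html')); // true
console.log(isAmpUrl('https://example.com/articles/article-title.amp.html')); // true
console.log(isAmpUrl('https://example.com/articles/article-title.html')); // false
```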
Rendering Endpoints
- render (default) - https://render.ostr.io - This endpoint has "optimal" settings and should fit 98% of cases. Respects cache headers of both the crawler and your server;
- render-bypass (devel/debug) - https://render-bypass.ostr.io - This endpoint bypasses the caching mechanism (almost no cache). Use it if you're experiencing an issue, or during development, to make sure you're not running into the intermediate cache. You're safe to use this endpoint in production, but it may result in higher usage and response time;
- render-cache (under attack) - https://render-cache.ostr.io - This endpoint has the most aggressive caching mechanism. Use it if you're looking for the shortest response time and don't really care about outdated pages in cache for 6-12 hours.
To change the default endpoint, grab the integration example code and replace render.ostr.io with the endpoint of your choice. For NPM/Meteor integration, change the value of the serviceURL option.
Note: The described differences in caching behavior relate to intermediate proxy caching; the Cache-Control header will always be set to the value defined in "Cache TTL". Cached results at the "Pre-rendering Engine" end can be purged at any time.
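One way to switch endpoints without touching code is to derive serviceURL from the environment. The sketch below is only a suggestion: the helper pickServiceURL and the use of NODE_ENV are assumptions, while the three endpoint URLs and the SPIDERABLE_SERVICE_URL variable come from this document.

```javascript
const ENDPOINTS = {
  render: 'https://render.ostr.io',
  bypass: 'https://render-bypass.ostr.io',
  cache: 'https://render-cache.ostr.io'
};

// Hypothetical helper: an explicit SPIDERABLE_SERVICE_URL wins,
// otherwise use the bypass endpoint in development and the default elsewhere.
// Relying on NODE_ENV is an assumption about your deployment setup.
function pickServiceURL(env = process.env) {
  if (env.SPIDERABLE_SERVICE_URL) {
    return env.SPIDERABLE_SERVICE_URL;
  }
  return env.NODE_ENV === 'development' ? ENDPOINTS.bypass : ENDPOINTS.render;
}

console.log(pickServiceURL({ NODE_ENV: 'development' })); // https://render-bypass.ostr.io
console.log(pickServiceURL({ NODE_ENV: 'production' })); // https://render.ostr.io
```

The returned value can then be passed as the serviceURL option when constructing the middleware.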
Debugging
To make sure a server can reach our rendering endpoint, run a cURL command or send a request via Node.js to https://test:test@render-bypass.ostr.io/?url=http://example.com (replace example.com with your domain name).
In this example we're using the render-bypass.ostr.io endpoint to avoid any possible cached results; read more about rendering endpoints. As API credentials we're using test:test; this part of the URL can be replaced with the auth option from the Node.js example. Your API credentials and instructions can be found at the very bottom of the Pre-rendering Panel of a host; click on "Integration Guide".
# cURL example:
curl -v "https://test:test@render-bypass.ostr.io/?url=http://example.com"

// Node.js example:
const https = require('https');

https.get('https://test:test@render-bypass.ostr.io/?url=http://example.com', (resp) => {
  let data = '';

  resp.on('data', (chunk) => {
    data += chunk.toString('utf8');
  });

  resp.on('end', () => {
    console.log(data);
  });
}).on('error', (error) => {
  console.error(error);
});
Running Tests
- Clone this package
- In Terminal (Console) go to directory where package was cloned
- Then run:
Node.js/Mocha
# Install development NPM dependencies:
npm install --save-dev
# Install NPM dependencies:
npm install --save
# Run tests:
ROOT_URL=http://127.0.0.1:3003 npm test

# Run same tests with extra-logging
DEBUG=true ROOT_URL=http://127.0.0.1:3003 npm test
# http://127.0.0.1:3003 can be changed to any local address, PORT is required!
Meteor/Tinytest
meteor test-packages ./ --port 3003

# Run same tests with extra-logging
DEBUG=true meteor test-packages ./ --port 3003
# PORT is required, and can be changed to any local open port
Get $50 off pre-rendering service
Get $50 off the second purchase, use this link to sign up. Valid only for new users.