28 September 2011

Drupal bootstrap early page cache

The previous article in this series focused on phase 1 of the Drupal bootstrap process (DRUPAL_BOOTSTRAP_CONFIGURATION). This article will now focus on the second phase - DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE.

The early page cache phase might not accomplish much for some Drupal sites, but for others it can play an important role in improving overall performance by minimizing latency when delivering content to a user's browser. This is accomplished by caching already rendered pages for anonymous users (i.e. users who haven't logged in). This phase runs early in the bootstrap process because it doesn't rely on any later bootstrap phases (which are used to gather content and render pages) and because its purpose is to keep latency to a minimum - in other words, the sooner it executes, the faster a user receives the requested content.

There are two forms of caching with Drupal: non-database and database. The early page cache relies on non-database mechanisms, such as file system and in-memory caching. Memcached (http://memcached.org) is one example of an in-memory cache. It stores data - in our case rendered pages - as key-value pairs in a hash table kept in memory. I've not had the opportunity to use this type of caching mechanism with Drupal, and for some it isn't even an option: shared hosting providers would need to install Memcached servers and the required PHP extension in their shared environment. For security reasons this is not usually done, so this option is all but limited to dedicated hosts or virtual private servers (VPS).

For my analysis, I ended up trying a file system-based mechanism called fastpath_fscache. This Drupal module uses the server's file system to cache rendered pages. Using the file system can be faster than a database - especially on a heavily loaded server in a shared environment.

To use this module (http://drupal.org/project/fastpath_fscache), download and install it as you would any other module. (Note: you need to add the configuration parameters to the settings.php file manually, since we cannot rely on the database to store configuration; the database is only bootstrapped in a later phase. It would be nice to use the drupal_rewrite_settings() function in install.inc instead of relying on users to edit the file by hand, however.) EDIT: Unfortunately the drupal_rewrite_settings() function is limited to constants and string type variables, and is really only intended to be used during the initial setup of Drupal. So the only option is to modify the file by hand.

The configuration parameters set the page_cache_fastpath flag, specify the path location of the module's implementation of the caching interface (e.g. cache_set() and cache_get() functions), and specify a path for the file cache.

With the module installed and configured, the DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE phase can now do something meaningful. It begins by including the cache implementation file (as specified earlier in settings.php). It then tests whether the page_cache_fastpath flag has been set and, if so, tests the result returned by the page_cache_fastpath() function (implemented by the module). If both are true, the page has been served to the user and processing stops; otherwise, bootstrapping continues to the later phases. The page_cache_fastpath() function first checks that no form was submitted and that the user is not logged in; if either is the case, a cached page should not be served. It then checks whether the requested page is currently stored in the file system, and serves it if present.
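The flow just described can be modelled in a few lines. This is only a rough sketch in Python, not Drupal's PHP and not the module's actual code: the 'file_cache' setting name, the MD5-based file naming, and treating any cookie as "logged in" are my own assumptions for illustration.

```python
import hashlib
import os


def cache_key(url):
    # Hypothetical naming scheme: one file per URL, named by its MD5 hash.
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def page_cache_fastpath(request, cache_dir):
    """Return the cached page, or None if a full bootstrap is needed."""
    if request["method"] != "GET":
        return None  # a form was submitted: never serve from cache
    if request.get("cookies"):
        return None  # assumption: a cookie means the user has a session
    path = os.path.join(cache_dir, cache_key(request["url"]))
    if not os.path.isfile(path):
        return None  # nothing cached for this page yet
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def early_page_cache_phase(conf, request):
    """Model of the phase: serve from cache only if the flag is set."""
    if conf.get("page_cache_fastpath", False):
        page = page_cache_fastpath(request, conf["file_cache"])
        if page is not None:
            return page  # Drupal would print the page and stop here
    return None  # fall through to the later bootstrap phases
```

A return value of None stands in for "carry on bootstrapping"; in Drupal itself the request simply ends once the cached page has been sent.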

In my limited time using early page cache, I did see a noticeable improvement (roughly 70%) in page loading time compared with no caching. I ran tests using Proxy Sniffer (http://www.proxy-sniffer.com/) on my local development environment using fastpath_fscache, and the average page load time dropped from 3.5 sec to approximately 1 sec. That may not seem like much to some, but in this day and age information delivery is paramount and users don't like to wait longer than they have to in order to see the content they've requested. I also ran some tests using Drupal's out-of-the-box database caching mechanism, and it actually performed slightly better than the file system caching. This test was also done on my development environment, but it would be interesting to test on a larger system that is hosting several sites to see if the results are comparable. In any event, if your site's performance is important you should definitely consider some form of caching to improve page loading time for your users. Until next time, keep IT simple.

The next article will focus on the DRUPAL_BOOTSTRAP_DATABASE phase. Stay tuned!

Drupal 6 Bootstrap Process

While investigating a recent rash of spam user registrations on my site, I began looking through Drupal core to gain a better understanding of its inner workings and how modules could be used to guard against these nuisances. Like every journey there is a beginning, and for me it all started with index.php.

This is not a very big file, but it plays a vital role because all access to content on any Drupal site (via HTTP GET or POST requests) is directed to this one file. It starts off simply enough by including a bootstrap PHP file (bootstrap.inc). This file in turn defines several constants referenced throughout the Drupal code base as well as several core utility functions. Then the magic begins. The drupal_bootstrap function is called with the constant DRUPAL_BOOTSTRAP_FULL as its argument (both the function and the constant are defined by the aforementioned include file). This function executes a series of phases in order to process all requests coming to the server. How many phases are executed depends on the argument given to the function. In the case of index.php, the highest phase is used. The drupal_bootstrap function executes every phase in turn from low to high until it reaches the one it was invoked with. The phases, in order (as defined by the constants), are: DRUPAL_BOOTSTRAP_CONFIGURATION, DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE, DRUPAL_BOOTSTRAP_DATABASE, DRUPAL_BOOTSTRAP_ACCESS, DRUPAL_BOOTSTRAP_SESSION, DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE, DRUPAL_BOOTSTRAP_LANGUAGE, DRUPAL_BOOTSTRAP_PATH and DRUPAL_BOOTSTRAP_FULL.

This series of articles will focus on each of these phases and attempt to explain what's going on in more detail.
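To make the "low to high" execution concrete, here is a small model of how the phases are stepped through. It is a sketch in Python rather than Drupal's PHP; the Bootstrap class and its method names are my own invention, and it only records phase names instead of doing any real work.

```python
# Drupal 6 bootstrap phases, lowest to highest.
PHASES = [
    "DRUPAL_BOOTSTRAP_CONFIGURATION",
    "DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE",
    "DRUPAL_BOOTSTRAP_DATABASE",
    "DRUPAL_BOOTSTRAP_ACCESS",
    "DRUPAL_BOOTSTRAP_SESSION",
    "DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE",
    "DRUPAL_BOOTSTRAP_LANGUAGE",
    "DRUPAL_BOOTSTRAP_PATH",
    "DRUPAL_BOOTSTRAP_FULL",
]


class Bootstrap:
    """Hypothetical stand-in for drupal_bootstrap() and its static state."""

    def __init__(self):
        self.next_index = 0  # first phase that has not run yet
        self.completed = []  # names of the phases run so far

    def run(self, phase):
        target = PHASES.index(phase)
        # Run every outstanding phase up to and including the target.
        while self.next_index <= target:
            self.completed.append(PHASES[self.next_index])
            self.next_index += 1
        return list(self.completed)
```

Calling run("DRUPAL_BOOTSTRAP_DATABASE") executes the first three phases; a later call with DRUPAL_BOOTSTRAP_FULL picks up where the first left off rather than starting again.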


The first phase, DRUPAL_BOOTSTRAP_CONFIGURATION, is fairly straightforward and, as its name suggests, initializes the system configuration. Before doing so, however, it calls drupal_unset_globals() to unset all disallowed global variables. This is essentially everything except the superglobals _ENV, _GET, _POST, _COOKIE, _FILES, _SERVER, _REQUEST, and GLOBALS. Since PHP 4.2.0 the automatic registration of globals has been disabled by default, as it was deemed to cause potential vulnerabilities.
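The clean-up amounts to a whitelist filter. The sketch below models it in Python with a plain dictionary standing in for PHP's global scope; the function name and the register_globals parameter are mine, and my understanding is that Drupal only performs the clean-up when PHP's register_globals setting is on.

```python
# Names that survive the clean-up. PHP spells the last one GLOBALS,
# without a leading underscore.
ALLOWED = {
    "_ENV", "_GET", "_POST", "_COOKIE",
    "_FILES", "_SERVER", "_REQUEST", "GLOBALS",
}


def unset_disallowed_globals(global_vars, register_globals):
    """Drop every variable that is not on the whitelist."""
    if not register_globals:
        return global_vars  # nothing was auto-registered, nothing to undo
    for name in list(global_vars):  # copy the keys: we delete as we go
        if name not in ALLOWED:
            del global_vars[name]
    return global_vars
```

With register_globals on, a request such as index.php?is_admin=1 would otherwise create a global variable called is_admin; after the filter only the whitelisted arrays remain.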

The next step in the phase is to create a timer to track the start of the processing for the page of content being requested.

The final step is the invocation of the conf_init() function, which loads the system configuration and sets various global variables (e.g. base URL, database URL, path, etc.). The following are the steps taken by the conf_init() function:

  • validates the HTTP_HOST header sent by the client, setting it to '' (empty string) if an older (pre-HTTP/1.1) browser did not send it
  • checks for the existence of a settings.php file and, if present, includes it. It follows a specific sequence of directory traversal, using both the HTTP_HOST provided and the script's path to locate the settings.php file (e.g. directory /sites/www.example.com would be searched before /sites/example.com). The Drupal API http://api.drupal.org/api/function/conf_path/6 has a more detailed example. If the initial directory traversal doesn't yield any results, it defaults to the /sites/default path
  • sets the global database url variable $db_url (this will be required by subsequent bootstrap phases that require database access)
  • sets the global base url variable $base_url (defined in settings.php or created automatically from _SERVER array)
  • sets the global session name variable and cookie domain variable, $session_name and $cookie_domain respectively
  • calls session_name() to set the session name to 'SESS' followed by the MD5 hash of the $session_name variable
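Two of these steps are easier to follow with an example: the order in which directories are searched for settings.php, and how the session name is built. The sketch below is a Python model based on my reading of conf_path() and conf_init(), not the PHP itself; both function names are hypothetical.

```python
import hashlib


def settings_search_order(http_host, script_name):
    """List the directories tried for settings.php, most specific first."""
    uri = script_name.split("/")
    # A port, if present, is moved in front of the host: "host:8080"
    # becomes the parts ["8080", "host", ...].
    host = ".".join(reversed(http_host.rstrip(".").split(":")))
    server = host.split(".")
    candidates = []
    for i in range(len(uri) - 1, 0, -1):         # longest path first
        for j in range(len(server), 0, -1):      # longest host name first
            name = ".".join(server[-j:]) + ".".join(uri[:i])
            candidates.append("sites/" + name)
    candidates.append("sites/default")           # fallback
    return candidates


def session_cookie_name(session_name):
    """'SESS' followed by the MD5 hash of the session name."""
    return "SESS" + hashlib.md5(session_name.encode("utf-8")).hexdigest()
```

For a request to http://www.example.com/index.php this gives sites/www.example.com, sites/example.com, sites/com and finally sites/default. The first directory in the list that contains a settings.php file wins.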

The next article will focus on the DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE phase.