DWiki has optional caching to speed up generating the same results repeatedly. DWiki uses a disk-based cache for this (although the interface is abstracted and alternate forms of caching may be introduced someday). There are three caches, which can be enabled separately: the renderer cache, a brute force page cache, and an in-memory brute force page cache that is only used if DWiki is running as a preforking SCGI server.
DWiki never removes the files of out of date cache entries from the
disk cache; instead, it stops considering out of date ones to be
valid. Cleaning out the detritus is left for an external process.
ChrisSiebenmann considers this safer; giving a program an automated
unlink() makes him nervous.
See ConfigurationFile for the options controlling the behavior of the caches.
In theory DWiki's caching is optional. In practice a decent-sized DWiki is simply too slow without caching for some of the more expensive operations, so caching becomes more or less a necessity. ChrisSiebenmann now believes that you should configure all levels of caching in basically any DWiki unless you have some unusual need and are sure about it.
The brute force page cache is about as simple as you can get: it caches complete requests for a configured time (called a time-to-live, or TTL). That's it. The BFC is intended as a load-shedding measure when DWiki is under significant load, so it only acts under certain circumstances:
(For speed, when something is valid in the cache DWiki just serves it without checking the system load.)
A good BFC TTL is on the order of 30 seconds to three minutes or so; long enough to shed significant load if you are getting a lot of hits to a few pages and short enough that dynamic pages won't become too outdated. (And that waiting to see a comment show up or whatever is not too annoying.)
Because Atom syndication requests are among the most expensive pages to
compute, the BFC can be set to give them a longer TTL than usual. There
is a second TTL that can be set for Atom requests that aren't using
conditional GET; the idea is that if requesters cannot be bothered to
be polite, we can't be bothered to serve fresh content. Setting this
option always caches the results of such requests, even if the load is
low, which means that even people doing proper conditional GET requests
will use the cached results for as long as their (lower) TTL says to.
It's actually faster to serve static pages from the static page server code than from the BFC, so the BFC doesn't try to cache static pages.
It's important to understand that the BFC does not check load when it is checking to see if something is in its cache. This means there are two stages to processing a request: deciding what TTL to use for cache checks, and deciding whether to cache something that was not current in the cache.
The TTL used is:
bfc-atom-nocond-ttl if this is an unconditional request for an Atom
view, if set.
bfc-atom-ttl for Atom view requests in general, if set.
Pages enter the BFC cache either because the system seems to be
loaded or because bfc-atom-nocond-ttl was set and they were an
unconditional request for an Atom view.
Once something is in the cache, it will be served from the cache if it is not older than the check TTL. Different requests can use different check TTLs for the same cached page; for example, conditional GETs versus other requests for Atom views.
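The two stages can be sketched in Python roughly as follows. This is only an illustration, not DWiki's actual code: the `Request` class, the function names, and the `default_ttl` fallback argument are assumptions made for the example; only the option names bfc-atom-ttl and bfc-atom-nocond-ttl come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request summary; DWiki's real request objects differ.
    is_atom: bool = False
    is_conditional: bool = False


def check_ttl(req, cfg, default_ttl):
    """Stage 1: pick the TTL used for the cache check (load is not consulted)."""
    if req.is_atom and not req.is_conditional and "bfc-atom-nocond-ttl" in cfg:
        return cfg["bfc-atom-nocond-ttl"]
    if req.is_atom and "bfc-atom-ttl" in cfg:
        return cfg["bfc-atom-ttl"]
    # Assumed fallback: the general BFC TTL, passed in by the caller.
    return default_ttl


def is_servable(entry_age, ttl):
    """A cached page is served if it is not older than the check TTL."""
    return entry_age <= ttl


def should_store(req, cfg, system_loaded):
    """Stage 2: decide whether a freshly generated page enters the cache."""
    forced = (req.is_atom and not req.is_conditional
              and "bfc-atom-nocond-ttl" in cfg)
    return system_loaded or forced
```

With made-up settings of 300 seconds for bfc-atom-ttl and 900 seconds for bfc-atom-nocond-ttl, a cached Atom page that is 400 seconds old would still be served to an unconditional request but would be regenerated for a conditional GET.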
The in-memory cache is essentially a version of the brute force cache that holds pages in memory instead of on disk. It's only effective in environments where DWiki serves multiple requests from the same process; currently it's only used if DWiki is running as a preforking SCGI server. Because it holds pages in memory as page response objects, the in-memory cache is about the fastest way that DWiki can serve requests. In particular it's faster to serve static pages from the IMC than from disk, so unlike the BFC the IMC does cache static pages.
Because IMC entries disappear automatically and are essentially free to create, the IMC caches pages unconditionally when active (unlike the BFC). This means that it should normally have a relatively low TTL, often lower than the BFC's TTL. Note that because the IMC is checked before the BFC, it can load its cache from BFC cache hits.
For obvious reasons, it's pointless to set the IMC cache size to be larger than the number of requests a preforked SCGI process will serve before exiting.
To keep IMC memory usage under control, the IMC has a settable maximum page size that it will cache. Tune this as appropriate for your environment.
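As a rough picture of how these three knobs (TTL, cache size, and maximum page size) interact, here is a small Python sketch of an in-memory cache. It is not DWiki's implementation: the class name, the method names, and the oldest-first eviction policy are all assumptions made for the example.

```python
import time
from collections import OrderedDict


class InMemoryCache:
    """Hypothetical TTL cache with entry-count and page-size limits."""

    def __init__(self, ttl, max_entries, max_page_size, clock=time.monotonic):
        self.ttl = ttl
        self.max_entries = max_entries
        self.max_page_size = max_page_size
        self.clock = clock
        self._entries = OrderedDict()  # key -> (stored_at, body)

    def store(self, key, body):
        # Pages are cached unconditionally, except for oversized ones.
        if len(body) > self.max_page_size:
            return False
        self._entries.pop(key, None)
        self._entries[key] = (self.clock(), body)
        # Assumed policy: evict the oldest entries once over the size limit.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return True

    def fetch(self, key):
        hit = self._entries.get(key)
        if hit is None:
            return None
        stored_at, body = hit
        if self.clock() - stored_at > self.ttl:
            del self._entries[key]  # expired, so drop it
            return None
        return body
```

Since everything lives in one process's memory, the whole cache vanishes when that process exits, which is why the entries are essentially free to create and never need external cleanup.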
The IMC can be deliberately forced on with imc-force-on, in case
you're running DWiki in some other preforking environment (for example
as a WSGI application under a preforking WSGI server such as uWSGI).
Under some setups, DWiki will only be running as a (preforking) SCGI server when it's under heavy load; in others DWiki is running this way all of the time, even when the load is light. Because the IMC unconditionally caches pages the latter situation can be annoying; it means that someone who, say, writes and posts a new comment may not see that comment until the IMC TTL expires. DWiki makes some attempt to bypass the IMC (and the BFC) in the common case of someone leaving a comment. However this is not perfect (in part because it requires the web browser to accept cookies from DWiki).
(This also applies if you're running DWiki in some other preforking environment and have forced the IMC on.)
If you're running DWiki full time in an IMC-on environment, you likely want to set a quite low IMC TTL, such as 15 to 30 seconds. If you're running DWiki with the IMC on only under heavy load you can set a higher IMC TTL, such as two minutes (120 seconds).
The renderer cache is actually two caches. The renderer cache proper caches the output of various renderers (cf TemplateSyntax). The output is cached with a validator and the cached results are fully validated before they get used; this means that renderer cache entries do not normally use a TTL and in theory could be valid for years.
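The idea of validated caching can be illustrated with a short Python sketch. This is an assumption-laden example rather than DWiki's code: the text does not say what a validator contains, so the sketch simply assumes it records the modification time and size of every input file, and all of the names are made up.

```python
import os


def mtime_validator(paths):
    """Hypothetical validator: (path, mtime, size) for every input file."""
    result = []
    for p in sorted(paths):
        st = os.stat(p)
        result.append((p, st.st_mtime_ns, st.st_size))
    return tuple(result)


class ValidatedCache:
    def __init__(self):
        self._store = {}  # key -> (validator, output)

    def get_or_render(self, key, paths, render):
        current = mtime_validator(paths)
        hit = self._store.get(key)
        if hit is not None and hit[0] == current:
            return hit[1]  # inputs unchanged, so the cached output is valid
        output = render()  # inputs changed or nothing cached: re-render
        self._store[key] = (current, output)
        return output
```

Because validity is decided by comparing the stored validator against the current state of the inputs, no TTL is involved; an entry stays usable for as long as its inputs stay unchanged.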
The (heuristic) generator cache caches the output of some expensive
precursor generator routines. These cache entries only have heuristic
validators, where DWiki can be fooled if people try hard enough.
Generator cache entries do have a TTL, so that if the heuristic is
fooled DWiki will pick up the new result sooner or later. Some cache
entries can also be explicitly invalidated by DWiki in a pretty reliable
process; by default, these have a much longer TTL than plain heuristic
cache entries. These are called 'flagged' (heuristic) generator entries
and various ConfigurationFile settings controlling how they behave are
(Trivia: the 'flagged' name is because such entries are invalidated using a flag file, or more accurately a flag cache entry.)
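The difference between plain and flagged generator entries might be pictured like this in Python. It is a loose sketch under stated assumptions, not DWiki's implementation: the class and method names are invented, flags are kept in a dictionary instead of in flag cache entries, and the TTL values are whatever the caller passes in.

```python
import time


class GeneratorCache:
    """Hypothetical TTL cache where some entries can also be invalidated by a flag."""

    def __init__(self, plain_ttl, flagged_ttl, clock=time.time):
        self.plain_ttl = plain_ttl
        self.flagged_ttl = flagged_ttl  # normally much longer than plain_ttl
        self.clock = clock
        self._entries = {}  # key -> (stored_at, flag name or None, value)
        self._flags = {}    # flag name -> time the flag was last raised

    def store(self, key, value, flag=None):
        self._entries[key] = (self.clock(), flag, value)

    def raise_flag(self, flag):
        # Explicit invalidation: everything stored under this flag goes stale.
        self._flags[flag] = self.clock()

    def fetch(self, key):
        hit = self._entries.get(key)
        if hit is None:
            return None
        stored_at, flag, value = hit
        ttl = self.flagged_ttl if flag else self.plain_ttl
        if self.clock() - stored_at > ttl:
            return None  # too old even if nothing flagged it
        if flag and self._flags.get(flag, float("-inf")) >= stored_at:
            return None  # flag raised after (or as) the entry was stored
        return value
```

The TTL remains as a backstop in both cases; the flag just means that flagged entries rarely have to wait for it.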
Currently the main renderer cache caches the output of various wikitext
to HTML rendering routines while the generator cache caches the results
of various filesystem 'find all descendants' walks that are used to
build lists of comments (for Atom comments feeds and some wikitext
macros; this uses explicit invalidation) and lists of pages (for Atom
feeds and various blog renderers such as
Unfortunately, a DWiki page that has comment or access restrictions must be cached separately for each DWiki user that views it. Under some situations this can result in a number of identical copies being cached under different names. If you want to avoid this, DWiki lets you turn off renderer caching for non-anonymous users.
The general validator for blog::prevnext cache entries is the
modification time for all of the directories involved that had files in
them at the time (the latter condition is for technical reasons). The
heuristic validator checks that some of the file timestamps are still
the same, but it can't check all of them and still be a useful cache.
So the easy way to invalidate this is to change the modification
time of a directory involved, for example with
The 'list of pages' cache is similarly invalidated by changing a
directory modification time. Unlike the blog::prevnext case, the
directory times are the only thing that this cache checks. This is a bit
of a pity but the performance improvements from caching this information
are very visible.
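If you want to bump a directory's modification time from a script, Python's standard os.utime() will do it. The helper below is a hypothetical convenience wrapper, not part of DWiki, and which directory to touch depends on your own page layout.

```python
import os


def touch_directory(path, when=None):
    """Set a directory's access and modification times to `when`, or to now."""
    os.utime(path, None if when is None else (when, when))
```

Calling touch_directory() on one of the directories involved changes its modification time, which the description above says is enough to invalidate these cache entries.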
Much like comments, each page that has something cached for it
becomes a subdirectory, with the various cached things in files. The
different sorts of caches use different top-level directories under the
cachedir, so you have paths like
Because some results include absolute URLs that mention the
current hostname, DWiki must maintain separate caches for each
Host: header it sees in the BFC and the general renderers cache.
These are handled as subdirectories in each cache directory, so
cachedir/bfc/localhost/... and so on. Entries in the generator
cache don't depend on the current
Host: header, so there is only one
(sub)cache for all requests,
Generally the general renderers cache uses the largest amount of disk space, followed by the BFC, and the generator cache is the smallest.
If you're using caching (and as mentioned, you probably want to), you'll want to periodically trim the caches. ChrisSiebenmann just does this by hand every so often by removing the cache directories entirely; DWiki will then rebuild them as necessary.
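If removing everything is too blunt, an external cleaner could remove only old files. The following Python sketch is one way to do that; it is not something DWiki ships, the function name and parameters are made up, and it judges staleness purely by file modification time, which is an assumption (validated renderer cache entries can be old and still valid, although DWiki would simply rebuild them). In keeping with the nervousness about automated unlink(), it defaults to a dry run.

```python
import os
import time


def trim_cache(cachedir, max_age_seconds, now=None, dry_run=True):
    """Find files under cachedir older than max_age_seconds.

    Returns the list of stale paths; deletes them only if dry_run is False.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(cachedir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if now - os.lstat(path).st_mtime > max_age_seconds:
                    stale.append(path)
            except FileNotFoundError:
                continue  # vanished while we were walking; ignore it
    if not dry_run:
        for path in stale:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass  # someone else removed it first
    return stale
```

Running it first with dry_run left on and looking over the returned list is a cheap way to check the age threshold before letting it delete anything.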