dwiki: Recent Entries

Subdirectories: NewFeatures.

2013-09-12

ConfigurationFile, 16:07:54 by cks

DWiki's configuration file

DWiki's configuration file has a simple format. Blank lines and comments (any line that has a '#' as the first non-whitespace character) are just skipped, and everything else is interpreted as a configuration directive to set. Directives can be continued with additional lines by starting the continued lines with whitespace (as in email headers). The continuation whitespace will be turned into a single space in the final, un-continued version of the line.

Configuration directives have optional values, which are separated from the configuration item by whitespace. (Whitespace within the value is not interpreted, although trailing whitespace is removed from lines.)

So an example set of configuration file lines might be:
root		/web/data/dwiki
pagedir		pages
tmpldir		templates
wikiname	TestWiki
wikititle	Testing Wiki
DWiki requires and uses some configuration directives. Unused configuration directives are not errors; all configuration directives (and their values) become part of the context variables available for template ${...} expansion.
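
The line format described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated (comments, blank lines, whitespace continuation, optional values), not DWiki's actual parser; the function name parse_config is made up, and the sketch assumes that a repeated directive simply overwrites the earlier one, which is not how multi-valued directives really behave.

```python
def parse_config(text):
    """Parse DWiki-style configuration text into a dict (illustrative sketch)."""
    logical = []
    for raw in text.splitlines():
        line = raw.rstrip()          # trailing whitespace is removed
        stripped = line.lstrip()
        if not stripped or stripped.startswith("#"):
            continue                 # blank lines and comments are skipped
        if line[0].isspace() and logical:
            # Continuation: the leading whitespace becomes a single space.
            logical[-1] += " " + stripped
        else:
            logical.append(stripped)

    config = {}
    for line in logical:
        parts = line.split(None, 1)  # directive name, then optional value
        # Assumption: a directive without a value is stored as an empty string.
        config[parts[0]] = parts[1] if len(parts) > 1 else ""
    return config


if __name__ == "__main__":
    sample = "root\t\t/web/data/dwiki\n# a comment\nwikititle\tTesting\n   Wiki\n"
    print(parse_config(sample))
```

Whitespace inside a value is left alone, which is why 'Testing Wiki' survives as a single value.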

To simplify life, configuration directives are put through a canonicalization process. This operates like so:

if root is specified, it must be an absolute path to a directory.

if something ending in dir or file is not an absolute path and root is defined, DWiki sees if tacking on root results in the right sort of thing.

any directive ending in dir or file must wind up (possibly after the root prefixing above) being an absolute path to a directory or a file respectively.

if root is defined and pagedir, tmpldir, or rcsdir are not defined, DWiki sees if directories called pages, templates, or rcsroot exist under root and if so sets up the configuration directives appropriately.
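
As a rough sketch of the canonicalization rules above, assuming the configuration has already been read into a dict: the function name canonicalize and the error handling are invented for illustration, and the real DWiki code may well order or report things differently.

```python
import os


def canonicalize(config):
    """Apply the root-relative path rules to a config dict (illustrative sketch)."""
    cfg = dict(config)
    root = cfg.get("root")
    if root is not None and not (os.path.isabs(root) and os.path.isdir(root)):
        raise ValueError("root must be an absolute path to a directory")

    # Default the well-known directories if they exist under root.
    if root is not None:
        defaults = (("pagedir", "pages"), ("tmpldir", "templates"), ("rcsdir", "rcsroot"))
        for name, subdir in defaults:
            candidate = os.path.join(root, subdir)
            if name not in cfg and os.path.isdir(candidate):
                cfg[name] = candidate

    for name, value in list(cfg.items()):
        if name.endswith("dir"):
            right_sort = os.path.isdir
        elif name.endswith("file"):
            right_sort = os.path.isfile
        else:
            continue
        # Try tacking root on the front of relative paths.
        if not os.path.isabs(value) and root is not None:
            candidate = os.path.join(root, value)
            if right_sort(candidate):
                value = candidate
        if not (os.path.isabs(value) and right_sort(value)):
            raise ValueError("%s: %r is not an absolute path of the right sort" % (name, value))
        cfg[name] = value
    return cfg
```

With this, a configuration that sets only root (plus the non-path required directives) picks up pages and templates automatically if those directories exist.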

Required configuration directives are: pagedir, tmpldir, wikiname, and rooturl. This means that with defaulting, the minimal DWiki configuration file is:
root	/some/where
rooturl /some/thing
wikiname SomeThing
Configuration directives and their meanings:

Core where-to-find-things configuration:

root

If present, this is taken as the root directory that further configuration directives can specify paths relative to.

pagedir

The root directory of the page hierarchy. (Required.)

tmpldir

The root directory of the template hierarchy. (Required; cannot be the same as pagedir.)

usercs

Support checking RCS files for things like page history, page locker, and so on. Whether or not usercs is set, DWiki refuses to serve files ending with ,v or in RCS directories; see InvalidPageNames. As a result, setting usercs is only necessary if you want page history et al to be visible to people visiting the DWiki; you can use RCS yourself on page files without setting it.

rcsroot

The root directory of the separate RCS file storage hierarchy; used only if usercs is on.
Normally, RCS files are expected to be in RCS directories under pagedir, where basic RCS commands put them (if you make those directories; DWiki requires you to work this way). With this directive on, the RCS ,v files for files under pagedir are instead found under here, in a mirror of the directory structure in pagedir, so you have pagedir/foo/bar and rcsdir/foo/bar,v. This keeps pagedir neater at the expense of requiring some scripting support.

Web configuration

wikiname

The short, one-word name of this DWiki. You probably want to have a CamelCased name. This shows up as the name of the breadcrumbs, among other places. (Required.)

wikititle

The full, multi-word title of this DWiki.

wikiroot

The front page of the DWiki; the page you get redirected to when you request the DWiki's root. If this isn't set or doesn't exist, DWiki tries wikiname's value as a page name; if that doesn't work, people see the DWiki's root directory in a directory view.

rooturl

The URL of the directory that is the root of the DWiki instance; use '/' to mean 'the root of the web server'.

publicurl

If set, DWiki puts this directory's URL on the front of DWiki URLs instead of rooturl.

staticdir

The directory to serve static files from. DWiki only serves files from this hierarchy; requests for a directory will fail.

staticurl

The URL of the directory that is the root of static files. If staticurl doesn't start with a slash, it's taken as a subdirectory of rooturl. (Requires staticdir to be set.)

charset

If set, DWiki claims that all text/html and text/plain content it generates is in this character set in HTTP replies. Normally 'UTF-8' these days. If unset, DWiki does not label text/html and text/plain HTTP replies with character set information. You should set this to 'UTF-8'. Really. It shouldn't even be optional.

cssurlprefix

This is technically not a DWiki configuration directive as such because it isn't interpreted by the program. Instead it's used by the standard html/css template as one option for where to find DWiki's standard CSS file, dwiki.css. If this is set it's the URL of a directory (without a trailing slash). If this is not set, the html/css template assumes that dwiki.css can be found at ${staticurl}/dwiki.css. It's more efficient to serve dwiki.css outside of DWiki itself, since it's a static file.

Note that various parts of DWikiText rendering do not look right if the CSS is missing (in particular, all sorts of tables are likely to look bad).

DWiki URL to file mapping

When DWiki gets a request for a URL, it tries to turn it into a request for something under either staticurl (if defined) or rooturl; whatever is left after subtracting the appropriate thing is the path being served relative to staticdir or pagedir. staticurl is checked first, so it can be a subset of the URL space available under rooturl.

For safety reasons, DWiki only tries to process a request if the request's URL falls under either staticurl or rooturl. If DWiki receives a request for anything outside those two, something is clearly wrong and it generates a terse error page.
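
The mapping can be pictured with a small sketch. It assumes staticurl has already been turned into an absolute URL path (so a relative staticurl has had rooturl put on the front), and the function name map_request and its return values are made up for illustration rather than taken from DWiki.

```python
def map_request(url_path, rooturl, staticurl=None):
    """Return ('static', relpath), ('page', relpath), or None (illustrative sketch)."""

    def under(prefix):
        # Strip a trailing slash so that '/' and '/wiki/' both work as prefixes.
        base = prefix.rstrip("/")
        if url_path == base or url_path.startswith(base + "/"):
            return url_path[len(base):].lstrip("/")
        return None

    # staticurl is checked first, so it may live inside rooturl's URL space.
    if staticurl is not None:
        rel = under(staticurl)
        if rel is not None:
            return ("static", rel)
    rel = under(rooturl)
    if rel is not None:
        return ("page", rel)
    # Outside both: something is wrong and the caller should produce an error page.
    return None


if __name__ == "__main__":
    print(map_request("/wiki/static/dwiki.css", "/wiki", "/wiki/static"))
    print(map_request("/wiki/ConfigurationFile", "/wiki", "/wiki/static"))
    print(map_request("/elsewhere", "/wiki", "/wiki/static"))
```

The relative path that comes back is what would then be looked up under staticdir or pagedir.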

When it generates URLs for DWiki pages DWiki normally puts rooturl on front (as a directory). However, if you set publicurl DWiki puts that on the front instead.

This is useful if for internal reasons you receive requests with their URLs rewritten to something users shouldn't (or can't) use. The case ChrisSiebenmann knows is Apache with URL aliases and the DWiki CGI-BIN being run via suexec.

Authentication

See Authentication for more information on the authentication system.

authfile

Where DWiki can find user / password / group information for the DWiki's users. If this is set, the DWiki has authentication.

defaultuser

If set, all connections that are not otherwise authenticated get to be this user, if the user is in authfile. This should be used carefully, as it makes all requests to the DWiki be authenticated (since they all have a user, even if only the default user). If this is set, the username it is set to is said to be the 'guest user'.

global-authseed

This is a special magic token to make it harder to brute-force people's DWiki passwords in some situations. It can be any value and should be kept secret.

global-authseed-file

This is the file to read global-authseed from, if it is set. The file has no special format, but should contain some randomness and its contents should be kept secret.

authcookie-path

This controls the 'path=' value for the authentication cookies generated by DWiki. If the directive is present without a value, we use the root URL; if it has a value, we use that value as is. If it is not set at all, authentication cookies have no explicit 'path=' setting. ChrisSiebenmann has come to believe that you don't want to set this, and it remains as a vestigial remnant.

logins-report-bad

If present in the configuration file, DWiki will log the username (or at least the first 50 characters of it) for bad logins with unknown usernames. This is not necessarily a good idea but at one point was interesting to track what form-stuffing spammers were doing.

Comments

commentsdir

The root directory for storing comments in. The only place DWiki writes permanent data to.

comments-on

Enable commenting in this DWiki. This requires that commentsdir be defined and that authentication be enabled.

comments-in-normal

Your standard templates display comments on the normal view of the page instead of the 'showcomments' view.

remap-normal-to-showcomments

DWiki will remap the 'normal' view for pages to the 'showcomments' view, thereby implementing comments-in-normal without you needing to change the standard templates.

If you want to enable anonymous comments you should create a guest user in the DWiki authfile and then set guest as the defaultuser. (Well, you can use the username of your choice, but guest is conventional.)

Caching

DWiki can optionally cache the results of page generation to speed up response time. See Caching for a longer discussion.

cachedir

The root directory for storing the caches. It should not be used for anything else (ie, it should not also be pagedir, tmpldir, or commentsdir). DWiki will write scratch files to here.

cache-warn-errors

Log warnings about cache store errors. (These are non-fatal but indicate that your cache isn't caching.)

render-cache

Enable caching the results of selected renderers and renderer components. (Requires cachedir to be set.)

render-heuristic-ttl

The TTL of renderer cache entries with heuristic validators, in seconds. The default value is an hour.

render-anonymous-only

Use the renderer cache only for the guest user or for connections that are not authenticated.

render-heuristic-flagged-ttl

The TTL of renderer cache entries that have explicit invalidation (aka 'flagged' cache entries), in seconds. The default value is 48 hours, as explicit invalidation is considered safer than heuristic invalidation.

render-heuristic-flagged-delta

In order to lessen the chance of races between renderer cache invalidation and renderer cache regeneration, flagged cache entries must be at least this many seconds more recent than the invalidation marker (if it exists). Defaults to 30 seconds.

bfc-cache-ttl

Enable a brute force page cache of complete pages with a TTL of this many seconds. (Requires cachedir to be set.)

bfc-time-min

A complete page will be cached if it took at least this much of a second to be generated. Defaults to 0.75 of a second.

bfc-load-min

A complete page will be cached if the load average is at least this high. No default; the BFC normally doesn't look at the load average at all.

bfc-time-triv

Regardless of the setting of bfc-load-min, don't bother looking at the load average if the page took at most this long to generate. Defaults to 0.09 of a second.

bfc-atom-ttl

Use this TTL for Atom syndication requests, instead of the normal one.

bfc-atom-nocond-ttl

Use this TTL for Atom syndication requests that are not using conditional GET, and also force the caching of the results of these requests regardless of the load.

bfc-skip-robots

If set, this is a list of User-Agent substrings (formatted as for bad-robots, see later) for robots that should not cause entries to be put into the BFC.

imc-cache-entries

Enable an in-memory cache of complete pages with this many entries. The IMC skips all pages that the BFC skips. The IMC is only meaningful if the same process handles more than one request, so by default it is only enabled if DWiki knows that it is running using dwiki-scgi.py as a preforking SCGI server.

imc-force-on

Force the IMC on even if DWiki would not enable it. You probably only want to use this if you are running DWiki as a WSGI application inside a preforking WSGI server such as uWSGI.

imc-cache-ttl

The TTL, in seconds, of entries in the in-memory cache; must be provided if imc-cache-entries is.

imc-resp-max-size

The maximum size (in kilobytes) of pages that will be cached in the in-memory cache. The default value is 256 KB.

slow-requests-by

Delay all requests by this much, in fractional seconds. Normally used only for testing BFC.

In practice some degree of caching is mandatory for decent performance once your DWiki gets big enough and so it's recommended that you turn on render-cache and bfc-cache-ttl unless you have a good reason to do otherwise. Turn on imc-cache-entries and imc-cache-ttl if you're using SCGI.
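
One way to read how bfc-time-min, bfc-load-min, and bfc-time-triv combine is sketched below. This is an interpretation of the descriptions above, not DWiki's code: the function name should_cache is invented, the defaults are the documented ones, and the forced caching done by bfc-atom-nocond-ttl and the bfc-skip-robots exclusion are left out.

```python
def should_cache(gen_time, loadavg, time_min=0.75, load_min=None, time_triv=0.09):
    """Decide whether a complete page goes into the BFC (illustrative sketch).

    gen_time is how long the page took to generate, in seconds;
    loadavg is the current load average.
    """
    # Slow pages are always worth caching.
    if gen_time >= time_min:
        return True
    # With no bfc-load-min, the load average is never consulted.
    if load_min is None:
        return False
    # Trivially fast pages are not cached regardless of load.
    if gen_time <= time_triv:
        return False
    return loadavg >= load_min


if __name__ == "__main__":
    print(should_cache(1.2, 0.1))                 # slow page
    print(should_cache(0.3, 6.0, load_min=4.0))   # fast page, loaded machine
    print(should_cache(0.05, 6.0, load_min=4.0))  # trivial page, loaded machine
```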

Syndication feed controls

atomfeed-display-howmany

How many items at most an Atom feed should display. If set, it must be a positive integer; if not set, atom::pages and atom::comments use a default of 100 items.

feed-max-size

How many kilobytes atom::pages or atom::comments should try to limit their output to. If set, each of them stops adding new entries (regardless of how many entries have been processed already) once it has generated that many kilobytes or more of output. Because of the 'or more' clause, you should allow for a safety margin. If unset, syndication feeds are not size-limited.

feed-max-size-ips

If set, this is a whitespace separated list of IPv4 addresses, tcpwrappers style IPv4 address prefixes (eg '66.150.15.'), or IPv4 CIDRs (eg '66.150.15.0/25') that feed-max-size applies to. Syndication requests from any other addresses are not size-limited. If unset, feed-max-size applies to all syndication requests, regardless of what IP address makes the request. This option can be specified multiple times; if so, all the addresses are merged together.

feed-start-time

If set, pages older than this time will not appear in Atom feeds, which is handy if you want to move a DWiki, redirect the old URLs, and not flood people's Atom feeds (because the Atom <id> for pages is the page's full URL unless you've set atomfeed-tag). The value can be specified either as an integer Unix timestamp, as 'YEAR-MO-DA [HH:MM[:SS]]', 'YEAR/MO/DA', or an Atom format time string, and is always in local time (even when specified as an Atom format time string; sorry).

atomfeed-tag

If set, the atom::pagetag renderer will use it to generate Atom <id>s for pages in the format <tag>:/<page path>. This should normally be set to a tag:-based URI; see here for a discussion.

atomfeed-tag-time

If set, the atom::pagetag renderer will only generate tag-formatted Atom <id>s for pages more recent than this time. This can be used to make a graceful transition into tag-based Atom <id>s for an existing DWiki (and then, with feed-start-time, to gracefully move it). This has the same time format as feed-start-time.

atomfeed-virt-only-adv

If set, restrict what Atom page feeds are advertised for virtual directories. If we are displaying a vdir and it is not a listed type, we advertise the Atom feed for the real directory instead (eg, for 'blog/2007/10/' the Atom feed advertised would be for 'blog/'). This is a space-separated list of vdir types; the allowed types are latest, oldest, range, calendar, and the calendar subtypes year, month, and day.

atomfeed-virt-only-in

If set, restrict what virtual directories allow Atom page feed requests. A disallowed latest or range feed request is (permanently) redirected to the real directory's feed; other disallowed feeds get 404 responses. The format and list of vdir types are the same as for atomfeed-virt-only-adv. If atomfeed-virt-only-adv is set, it becomes atomfeed-virt-only-in's default value. If both are set, this should be a superset of atomfeed-virt-only-adv's value; otherwise DWiki will advertise feeds that it will refuse requests for.
You should normally allow feeds for latest because this gives people a way of controlling how large a feed they pull from you; they can use, eg, 'blog/latest/10/?atom' to pull only a ten-entry feed instead of your full-sized feed.
These two directives don't change or affect what Atom comment feeds are advertised or allowed; they affect only Atom feeds for pages.
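
The address list format used by feed-max-size-ips (plain IPv4 addresses, tcpwrappers style prefixes, and CIDRs) can be matched along the following lines. This is a sketch using Python's standard ipaddress module; the function name ip_matches is made up and DWiki's own matching code may be organized quite differently.

```python
import ipaddress


def ip_matches(addr, patterns):
    """Check an IPv4 address string against a list of patterns (illustrative sketch)."""
    ip = ipaddress.ip_address(addr)
    for pat in patterns:
        if "/" in pat:
            # CIDR, eg '66.150.15.0/25'
            if ip in ipaddress.ip_network(pat, strict=False):
                return True
        elif pat.endswith("."):
            # tcpwrappers style prefix, eg '66.150.15.'
            if addr.startswith(pat):
                return True
        elif addr == pat:
            # plain address
            return True
    return False


if __name__ == "__main__":
    # The directive's value is whitespace separated, so split() gives the patterns.
    patterns = "66.150.15. 192.0.2.0/25 198.51.100.7".split()
    print(ip_matches("66.150.15.9", patterns))
    print(ip_matches("192.0.2.200", patterns))
```

Since the option can be given several times, the pattern lists from each occurrence would simply be concatenated before matching.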

Other features:

alias-path

This sets the DWiki path for the third place to try to find CamelCase links in (see Formatting). This allows a DWiki to have a collection of CamelCase names for things that are globally usable but that don't clutter up the DWiki root directory.
This is a DWiki path, not a filesystem path (and is implicitly always an absolute DWiki path). The conventional value is Aliases.

search-on

Enables searching. If it has the value 'authenticated', only authenticated users can search. Note that if you have a guest user set, all users are authenticated.

blog-display-howmany

How many items the blog::blog renderer should try to restrict most pages it displays to. If set, it must be a positive integer; if not set, blog::blog uses a default.

canon-hosts

If set, this is a space-separated list of canonical hostnames for this DWiki. If a request has a Host: header that is not in this list, DWiki immediately serves up a redirection to the first hostname in the list (or canon-host-url, if that is set), which is assumed to be the preferred hostname.

canon-host-url

If set, this is the canonical URL for the host of this DWiki (without the ending /, but including http or https and the port if necessary). DWiki will generate redirects and absolute URLs that use this URL. If canon-hosts is also set, this should be the full version of the first entry in canon-hosts.
(This is primarily useful in some hopefully unusual situations involving HTTP-to-HTTPS transitions.)

literal-words

If set, this is a list of strings, separated by ' | ' (space, |, space), that will be rendered literally and not considered to contain markup, as if each of them had been specified in '.pn lit <whatever>' processing note directives.

Special oddities

dump-req-times

Report the amount of time that requests took to standard error. This is set by the standard -T option.

dump-atom-reqs

Report on Atom requests to standard error. This is set by the standard -A option.

stamp-messages

Add timestamp and client IP address to messages reported by the above two options. This is set by the standard --stamp option.

These are documented because you might want to set them directly if you're running DWiki as a WSGI application inside some standard WSGI server (such as uWSGI, Apache's mod_wsgi, or gUnicorn).

Dealing with bad clients:

bad-robots

If set, this is a list of User-Agent substrings, separated by ' | ' (space, |, space), for robots that should get permission denied responses when they try to fetch pages in various views that no robot should be fetching. Currently the list of bad views is atom, atomcomments, source, and writecomment, all of which are typically fetched by robots that don't respect rel="nofollow" on links.

no-ua-is-bad-robot

If set, any request with a missing User-Agent header is considered to be from a bad robot.

banned-robots

If set, this is a list of User-Agent substrings (formatted as for bad-robots) for robots that should get permission denied responses on all requests.

banned-ips

If set, this is a list of IPv4 addresses, tcpwrapper style IP prefixes, or CIDRs (as for feed-max-size-ips) for addresses that will get access denied responses for all requests. It can be specified multiple times.

banned-comment-ips

If set, this is like banned-ips but only applies to attempts to write comments.

bad-robot-ips

If set, this is like banned-ips but only applies to requests that try to fetch pages in various views that no robot should be fetching (as in bad-robots).

Under normal circumstances it's more efficient to use your web server's access controls to totally ban IP addresses and bad user-agents; your web server usually has faster code for this and you don't have to get DWiki involved in the process. banned-robots and banned-ips exist because this is not always possible.
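
To make the bad-robots and no-ua-is-bad-robot rules concrete, here is a small sketch of the check. The function name is invented, and it assumes case-sensitive substring matching, which the text above doesn't actually specify.

```python
# The views that no robot should be fetching, per the bad-robots description.
BAD_VIEWS = {"atom", "atomcomments", "source", "writecomment"}


def is_bad_robot_request(user_agent, view, bad_robots, no_ua_is_bad=False):
    """Should this request get a permission denied response? (illustrative sketch)"""
    if view not in BAD_VIEWS:
        return False
    if not user_agent:
        # A missing User-Agent only counts if no-ua-is-bad-robot is set.
        return no_ua_is_bad
    # The directive value is a list separated by ' | ' (space, |, space).
    substrings = [s for s in bad_robots.split(" | ") if s]
    return any(s in user_agent for s in substrings)


if __name__ == "__main__":
    robots = "ExampleBot | SomeCrawler/2"
    print(is_bad_robot_request("Mozilla/5.0 (compatible; ExampleBot/1.0)", "atom", robots))
    print(is_bad_robot_request("Mozilla/5.0 (compatible; ExampleBot/1.0)", "normal", robots))
```

banned-robots would use the same substring test but without the restriction to particular views.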

2013-08-30

Authentication, 11:39:21 by cks

DWiki Authentication

DWiki has optional support for authenticating users, which is a prerequisite for restricting access to pages and for allowing people to comment. User authentication is done by cookies, which means that people wanting to be authenticated have to accept cookies from the DWiki's web server.

Whether authentication is on is controlled by the authfile setting in the ConfigurationFile; if it is set, it specifies a password file for the DWiki. Once enabled, a login box will appear at the bottom of pages where people can enter their login and password into a form and submit it to the wiki. If the password is correct, DWiki will send back a login cookie and the session is now authenticated (provided that the user's browser then sends the cookie back to DWiki with future requests).

An authenticated person has a login name and may optionally be in some groups. When checking permissions, logins and groups are treated the same (so you should not create groups that have the same name as users; this is either pointless or dangerous, depending on how many people are in the group). What groups a login is part of is specified in the password file.

To be precise, an authenticated request is any request that has a valid associated login name. Normally this happens because the user's browser sent back a valid DWiki login cookie, but a DWiki may have a default login, set in the ConfigurationFile. If the default login is set and exists in the password file, everything is authenticated; either as a 'real' (passworded) login or as the default login.

Because DWiki is hard-coded to require authentication before people can write comments, setting a default user is the only way to let the world (potentially) comment on your DWiki.

Using Authentication

Authentication is used by the {{Restricted}} and {{CanComment}} DWikiText macros. Without arguments they restrict the page to authenticated people or allow comments by authenticated people (respectively). With arguments, they restrict things more tightly. There are two sorts of arguments:

positive arguments are plain logins or groups, and require the authenticated session to be one of the things named.

negative arguments start with '-' and are then logins or groups, and require the authenticated session to not be one of the things named.

If only negative arguments are given, anyone not mentioned passes; if both positive and negative arguments are given, you must pass the positive arguments and not fail the negative arguments.
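
The positive and negative argument rules can be sketched as a small function. It assumes the request is already authenticated and that the login and its groups have been collected into one set of names (which matches logins and groups being treated the same); the function name passes is made up.

```python
def passes(identities, args):
    """Check a session's login and groups against macro arguments (illustrative sketch).

    identities: set containing the login name and all of its groups.
    args: the macro arguments, eg ['staff', '-guest'].
    """
    positive = {a for a in args if not a.startswith("-")}
    negative = {a[1:] for a in args if a.startswith("-")}
    # Failing any negative argument always loses.
    if identities & negative:
        return False
    # With positive arguments, the session must be one of the things named.
    if positive:
        return bool(identities & positive)
    # No arguments, or only negative ones that were not matched: pass.
    return True


if __name__ == "__main__":
    print(passes({"cks", "staff"}, ["staff", "-guest"]))
    print(passes({"guest"}, ["-guest"]))
    print(passes({"someone"}, []))
```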

Directories can create default permissions for everything under them by having a special file called __access with either or both of Restricted and CanComment macros. __access files are checked backwards from the page being looked at, and the first one that contains a Restricted or a CanComment (depending on what is at issue) wins. __access files can have other content, although ChrisSiebenmann doesn't expect people to look at them very often.

Note: this means that subdirectories can give back permissions that were denied by a higher-level directory. This is deliberate.
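
The backwards search for __access files amounts to walking up from the page's directory towards the DWiki root. A sketch of the order of files checked, assuming the page is a file and that the root directory's own __access is the last one consulted (the function name is invented):

```python
def access_candidates(page):
    """List the __access files to check for a page, nearest first (illustrative sketch)."""
    dirs = page.strip("/").split("/")[:-1]   # the directories above the page
    candidates = []
    while True:
        candidates.append("/".join(dirs + ["__access"]))
        if not dirs:
            break
        dirs.pop()
    return candidates


if __name__ == "__main__":
    for path in access_candidates("blog/2013/SomeEntry"):
        print(path)
```

The first candidate that contains the macro in question wins, which is how a subdirectory can hand back permissions that a higher-level directory took away.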

Authentication limits

DWiki authentication protects only file contents. It does not protect directory contents and it thus doesn't protect a page's (file) name. Moral: don't put sensitive information into page names.

Password security

Note: DWiki doesn't specially encrypt login / password information while it's being sent to the web server. Unless the entire connection is running over SSL, people can theoretically snoop the password in clear text.

DWiki doesn't store someone's clear text password (even in its password file); instead it stores a hash of the password, using a format that guarantees that if two different people use the same password they will get different hashes. (Barring the hash function itself being broken.)

As always, people should be strongly discouraged from using important passwords (eg, their Unix account passwords) for any web service, a DWiki included. Using one's Unix login name as one's DWiki login name is harmless and even convenient.

The cookie

The cookie DWiki uses has the login name in clear text, and is authenticated with an added hash value. If you want the gory details, see authcookie.py and htmlauth.py in the DWiki source code. With a proper global-authseed secret in the ConfigurationFile, it is believed to be secure from all brute-force attacks.

The cookie is normally quite long-lived. It becomes invalid if the user's password or the DWiki global authseed change.

The cookie is not restricted to coming from a single IP address or anything like that.

Format of the password file

The password file has a simple format. Blank lines and comment lines (lines that have a '#' character as their first non-whitespace) are ignored. Otherwise, lines have the format:
<login>	<password-hash>		[<group> ....]
There can be any amount of whitespace between elements; groups are optional.

The easy way to add logins or change passwords is with the dpasswd.py program in the DWiki source. Adding or changing groups, or deleting logins, you get to do by editing the file directly.

DWiki has no support for creating logins or changing passwords over the web. This is deliberate.

How you manage this process in general is up to you; in non-paranoid environments ChrisSiebenmann uses a group-writeable password file owned by an appropriate (Unix) group.

As a hack, the password file can also contain supplemental information about a DWiki login in the format:
.also <login>	<'real' name> | <url>
This line must come after the main line for a given login but it doesn't have to be immediately afterwards. If present the real name and URL are used as the default values for these when that user is writing comments. Either or both may be blank (although if both are blank, there's no point to the entire .also entry). Giving the default login a name (such as 'Anonymous') means that anonymous comments will not normally have their submission IP address shown (the default templates do not show the IP address if name information is available).
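
For illustration, a reader for this file format might look like the sketch below. It is not dpasswd.py or DWiki's own code; the function name, the shape of the result, and the choice to quietly ignore malformed lines and .also lines for unknown logins are all assumptions. The password hashes in the example are placeholders, since the hash format isn't described here.

```python
def parse_passwd(text):
    """Parse DWiki-style password file text into a dict by login (illustrative sketch)."""
    users = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] == ".also":
            # .also <login> <'real' name> | <url>
            rest = line.split(None, 2)
            if len(rest) < 2 or rest[1] not in users:
                continue  # must come after the login's main line
            info = rest[2] if len(rest) > 2 else ""
            name, _, url = info.partition("|")
            users[rest[1]]["name"] = name.strip()
            users[rest[1]]["url"] = url.strip()
            continue
        if len(fields) < 2:
            continue  # assumption: skip lines without a password hash
        users[fields[0]] = {"hash": fields[1], "groups": fields[2:], "name": "", "url": ""}
    return users


if __name__ == "__main__":
    sample = "# logins\ncks\tPLACEHOLDER1\t\tadmins\nguest PLACEHOLDER2\n.also guest Anonymous |\n"
    print(parse_passwd(sample))
```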

2013-08-29

GlobalVariables, 16:47:40 by cks

DWiki Global Variables

As TemplateSyntax discusses, one can use global variables in templates in several ways. However, it helps to know what global variables are available. Thus this incomplete listing.

First, all ConfigurationFile directives are available as global variables.

Then, during request processing DWiki internally defines a number of additional global variables:

page The current page's full path, in DWiki form.

abspage The current page's full path, including a '/' at the start.

pagename The page's name; its last path component.

pagetype The type of the page, usually 'file' or 'dir'.

view-format The current view being processed.

relname In blog::blog, the name of the current page relative to the blog directory being displayed.

basepage In a VirtualDirectory context, the full path of the non-virtual directory. Otherwise the same as page.

:wikitext:title After a piece of wikitext has been rendered (more exactly after any wikitext template renderer has been used, including wikitext:cache), this is its title if any exists. The 'title' of a piece of wikitext is the text of the header that is at the start of the text, if there is one. This is the same as the wikitext:title template renderer but may be more convenient to use.

:wikitext:title:nohtml This is the title but without any HTML markup, making it useful for eg a <title>. It's the same as the :wikitext:title:nohtml template renderer.

login The currently authenticated user.

comment-ip IP address that posted the current comment.

comment-login Login of the user that posted the current comment, if it is not the anonymous user.

comment-name The supplied name of the user that posted the current comment, if any.

comment-url The user's supplied website URL (if any) for the current comment.

:comment:post The result of an attempt to post a comment. One of 'good', 'bad', 'badchar', or 'nocomment' (the latter if it was an attempt to post an empty comment). (Only defined during comment posting.)

:error:error Error type. Only defined during error processing.

:error:code The numerical HTTP status code for an error. Only defined during error processing.

http-command The type of HTTP command being processed, either GET or POST.

http-version The (claimed) version of HTTP that the current request used.

remote-ip The IP address the current request came from.

server-name The hostname or IP address for this web server that the sender of the current request claims to have used.

Not all of these are defined all of the time. Generally a context-dependent variable is only defined when the current thing being processed has that sort of information.

There are other global variables that get set, but they are for more internal use, and you're best off browsing the source code for them.


2013-08-27

NewFeatures/RSS2Feeds

DWiki can now generate RSS 2.0 format syndication feeds for recently changed pages. This is a terrible hack that should not exist but ChrisSiebenmann has to deal with a few things that don't accept Atom format feeds, only RSS 2.0 feeds. RSS 2.0 page feeds are just like Atom page feeds and all Atom page feed restrictions and configuration options apply to them too. They are not advertised anywhere (either in page tools or in feed autodiscovery); to get access to them you must specify the feed URL directly, using the view name 'rss2' (as in http://you.cim/dwiki/?rss2).

See dwiki/view-rss2.tmpl and syndication/rss2entry.tmpl for what RSS 2.0 elements are used and how.

There is no RSS 2.0 feed for page comments.

(Because this is a hack, asking for the RSS 2.0 feed of VirtualDirs that are restricted such that they get redirections to the base directory, per AtomFeedsAndVirtualDirs, will get you a redirection to the Atom feed for that base directory. This is considered acceptable since people aren't supposed to be using those feeds anyways.)

Written 10:32:53 by cks.

TemplatesUsed, 10:17:02 by cks

What templates DWiki uses

Per ProcessingModel, DWiki ultimately produces output by expanding a template. This means that DWiki has to figure out what template to use for this process, and because the TemplateSyntax is fairly limited, it is much simpler for DWiki to start with a separate template for every different view of things it wants to have.

This means that while DWiki tries not to hardcode template names or the structure of the template directory, there are a certain number of hardcoded names it knows about that need to be there for proper DWiki operation.

The short list of such templates is:

dwiki/view-*.tmpl, dwiki.tmpl: starting view templates.

views/*: conventional location for templates that display a particular ordinary view.

error.tmpl, errors/*: displaying errors (always 404 responses).

login-error.tmpl: displaying a login error (a regular page, not a 404).

Comment templates:
comment/comment.tmpl: used to show each comment when we're showing all comments.

comment/posting.tmpl: used to show the result of posting a comment. By convention, comment/posted-<result>.tmpl is used to display specific results, where <result> is one of 'good' (the comment was posted successfully), 'bad' (something went wrong), 'badchars' (the comment has bad characters in it), or 'nocomment' (the comment was empty and DWiki refused to post it).

blog/blogdirpage.tmpl: used to show each page in BlogDir view.

blog/blogentry.tmpl: used to show each page in Blog view.

syndication/atomentry.tmpl: used to render an Atom feed entry for each page.

syndication/atomcomment.tmpl: used to render an Atom feed entry for each comment.

syndication/rss2entry.tmpl: used to render an RSS 2.0 feed entry for each page.

All paths are relative to the template directory.

Determining a template for a view

For views that are displayed using templates, DWiki tries to find the starting template by looking in three places, in order:

dwiki/view-<view>-<pagetype>.tmpl

dwiki/view-<view>.tmpl

dwiki.tmpl
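The lookup order above can be sketched in Python roughly like this. This is only an illustration and not DWiki's actual code; the function names and the `exists` callback (which reports whether a template is present, named relative to the template directory) are hypothetical.

```python
def candidate_start_templates(view, pagetype):
    """Return the starting-template names to try, most specific first."""
    return [
        "dwiki/view-%s-%s.tmpl" % (view, pagetype),
        "dwiki/view-%s.tmpl" % view,
        "dwiki.tmpl",
    ]


def find_start_template(view, pagetype, exists):
    """Return the first candidate for which exists(name) is true, else None.

    'exists' is a hypothetical callback; a real version would look in
    the template directory.
    """
    for name in candidate_start_templates(view, pagetype):
        if exists(name):
            return name
    return None
```

With only dwiki/view-atom.tmpl and dwiki.tmpl present, an 'atom' view would start from dwiki/view-atom.tmpl and everything else would fall through to dwiki.tmpl.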

By convention, everything that generates text/html pages just goes through dwiki.tmpl so that there is one place that does top-level 'skinning' for the entire DWiki. Only views that both use templates and generate something besides text/html sidestep this.

The standard dwiki.tmpl uses the #{<...} first-found template inclusion mechanism (see TemplateSyntax) to pull in the real per-view content. It looks in four places to try to find this content, in this order:

Overrides/...$(page)/$(view-format).tmpl

Overrides/...$(page)/all.tmpl

views/$(view-format)-$(pagetype).tmpl

views/$(view-format).tmpl

The first two allow page and directory hierarchy specific overrides; the latter two are the generic places. Most views don't need to distinguish between file types, but the 'normal' view must use different templates for files and directories (since a directory doesn't have wikitext to display).

The current template-based views are: normal, history, search, blog, blogdir, atom, atomcomments, sitemap, showcomments, and writecomment. The login and logout views are 'synthetic' and don't actually display anything unless an error happens. The 'source' view simply dumps the page content out straight without getting anywhere near templates.

Note that the atom and atomcomments views are special: although they render through templates, they generate application/atom+xml content instead of text/html. Thus they use dwiki/view-* templates directly, bypassing dwiki.tmpl. The sitemap view is similarly special, although it generates application/xml content.

Error templates

Errors are rendered by the template error.tmpl. There are special error renderers error::title and error::body that look for error-specific additional templates in the subdirectory errors/. Each type of error looks for titles as errors/<error>-title.tmpl and main error body as errors/<error>.tmpl (with internal defaults if they don't exist).

Current error types: badaccess, badformat, badpage, inconsistpage, nopage.

Everything else is free and floating

That's it. DWiki has no other hardcoded template names.

2013-03-06

Caching, 13:46:35 by cks

DWiki's caching system

DWiki has optional caching in order to speed up generating results repeatedly. DWiki uses a disk-based cache for this (although the interface is abstracted and alternate forms of caching may be introduced someday). There are three caches, which can be enabled separately: the renderer cache, a brute force page cache, and an in-memory brute force page cache that is only used if DWiki is running as a preforking SCGI server.

DWiki never removes the files of out of date cache entries from the disk cache; instead, it stops considering out of date ones to be valid. Cleaning out the detritus is left for an external process. ChrisSiebenmann considers this safer; giving a program an automated unlink() makes him nervous.

See ConfigurationFile for the options controlling the behavior of the caches.

In theory DWiki's caching is optional. In practice a decent sized DWiki is simply too slow without caching for some of the more expensive operations and caching becomes more or less a necessity. ChrisSiebenmann now believes that you should configure all levels of caching in basically any DWiki unless you have some unusual need and are sure about it.

The brute force cache

The brute force page cache is about as simple as you can get: it caches complete requests for a configured time (called a time-to-live, or TTL). That's it. The BFC is intended as a load-shedding measure when DWiki is under significant load, so it only acts under certain circumstances:

only on GET or HEAD requests.

only on requests without a Cookie: header.

requests only get put into the cache if the system seems loaded.

(For speed, when something is valid in the cache DWiki just serves it without checking the system load.)
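As a rough sketch, the request-level conditions might look like the following in Python. The function name and the idea of passing headers as a plain dictionary are assumptions made for illustration, not DWiki's real interface. The load check is left out here since it only matters when deciding whether to store a new result.

```python
def bfc_eligible(method, headers):
    """Can the brute force cache be involved in this request at all?

    'headers' is assumed to be a dict of request header names to values.
    """
    if method not in ("GET", "HEAD"):
        return False
    # HTTP header names are case-insensitive, so compare in lower case.
    return not any(name.lower() == "cookie" for name in headers)
```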

A good BFC TTL is on the order of 30 seconds to three minutes or so; long enough to shed significant load if you are getting a lot of hits to a few pages and short enough that dynamic pages won't become too outdated. (And that waiting to see a comment show up or whatever is not too annoying.)

Because Atom syndication requests are among the most expensive pages to compute, the BFC can be set to give them a longer TTL than usual. There is a second TTL that can be set for Atom requests that aren't using conditional GET; the idea is that if requesters cannot be bothered to be polite, we can't be bothered to serve fresh content. Setting this option always caches the results of such requests, even if the load is low, which means that even people doing proper conditional GET requests will use the cached results for as long as their (lower) TTL says to.

It's actually faster to serve static pages from the static page server code than from the BFC, so the BFC doesn't try to cache static pages.

The two sides of the BFC

It's important to understand that the BFC does not check load when it is checking to see if something is in its cache. This means there are two stages to processing a request: deciding what TTL to use for cache checks, and deciding whether to cache something that was not current in the cache.

The TTL used is:

bfc-atom-nocond-ttl if this is an unconditional request for an Atom view, if set.

bfc-atom-ttl for Atom view requests in general, if set.

bfc-cache-ttl otherwise.
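In Python, this selection could be sketched as follows. It assumes the configuration is available as a dictionary mapping directive names to TTLs in seconds, which is an assumption for illustration; the function and parameter names are hypothetical too.

```python
def bfc_check_ttl(config, is_atom, is_conditional):
    """Pick the TTL (in seconds) used to check a cached page's freshness."""
    if is_atom and not is_conditional and "bfc-atom-nocond-ttl" in config:
        return config["bfc-atom-nocond-ttl"]
    if is_atom and "bfc-atom-ttl" in config:
        return config["bfc-atom-ttl"]
    return config["bfc-cache-ttl"]
```

Note how an unconditional Atom request falls back first to bfc-atom-ttl and then to bfc-cache-ttl if the more specific options are not set.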

Pages enter the BFC cache either because the system seems to be loaded or because bfc-atom-nocond-ttl was set and they were an unconditional request for an Atom view.

Once something is in the cache, it will be served from the cache if it is not older than the check TTL. Different requests can use different check TTLs for the same cached page; for example, conditional GETs versus other requests for Atom views.
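The two sides can be illustrated with a pair of small Python functions. This is a sketch under assumptions, not the real code: the names are hypothetical, the configuration is assumed to be a dictionary, times are in seconds, and the request is assumed to have already passed the basic GET/HEAD and no-cookie checks.

```python
def bfc_should_store(system_loaded, is_atom, is_conditional, config):
    """Decide whether a freshly generated page goes into the BFC."""
    if system_loaded:
        return True
    # Unconditional Atom requests are always cached if the option is set.
    return is_atom and not is_conditional and "bfc-atom-nocond-ttl" in config


def bfc_is_fresh(stored_at, check_ttl, now):
    """A cached page is usable if it is not older than the check TTL."""
    return (now - stored_at) <= check_ttl
```

For example, with made-up TTLs of 900 seconds for unconditional Atom requests and 300 seconds for conditional ones, an Atom page cached 400 seconds ago is still served to the unconditional requester but is regenerated for the conditional one.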

The in-memory cache

The in-memory cache is essentially a version of the brute force cache that holds pages in memory instead of on disk. It's only effective in environments where DWiki serves multiple requests from the same process; currently it's only used if DWiki is running as a preforking SCGI server. Because it holds pages in memory as page response objects, the in-memory cache is about the fastest way that DWiki can serve requests. In particular it's faster to serve static pages from the IMC than from disk, so unlike the BFC the IMC does cache static pages.

Because IMC entries disappear automatically and are essentially free to create, the IMC caches pages unconditionally when active (unlike the BFC). This means that it should normally have a relatively low TTL, often lower than the BFC's TTL. Note that because the IMC is before the BFC, it can load its cache from BFC cache hits.

For obvious reasons, it's pointless to set the IMC cache size to be larger than the number of requests a preforked SCGI process will serve before exiting.

To keep IMC memory usage under control, the IMC has a settable maximum page size that it will cache. Tune this as appropriate for your environment.
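To make the moving parts concrete, here is a minimal Python sketch of a cache with a TTL, a maximum number of entries, and a maximum page size. It is not DWiki's implementation; the class and parameter names are invented, pages are treated as plain byte strings rather than page response objects, and evicting the oldest entry first is an assumption.

```python
import time
from collections import OrderedDict


class InMemoryCache:
    """A tiny TTL cache with a cap on entry count and on entry size."""

    def __init__(self, ttl, max_entries, max_page_size, clock=time.time):
        self.ttl = ttl
        self.max_entries = max_entries
        self.max_page_size = max_page_size
        self.clock = clock
        self._entries = OrderedDict()  # key -> (stored_at, page)

    def store(self, key, page):
        """Cache 'page' unconditionally unless it is too large."""
        if len(page) > self.max_page_size:
            return False
        self._entries.pop(key, None)
        self._entries[key] = (self.clock(), page)
        # Evict the oldest entries once we are over the entry limit.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return True

    def fetch(self, key):
        """Return the cached page, or None if missing or past its TTL."""
        hit = self._entries.get(key)
        if hit is None:
            return None
        stored_at, page = hit
        if self.clock() - stored_at > self.ttl:
            del self._entries[key]
            return None
        return page
```

Since everything lives in one process's memory, the cache simply vanishes when that process exits, which is why nothing here ever needs to be cleaned up on disk.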

The IMC can be deliberately forced on with imc-force-on, in case you're running DWiki in some other preforking environment (for example as a WSGI application under a preforking WSGI server such as uWSGI).

Considerations for the IMC TTL

Under some setups, DWiki will only be running as a (preforking) SCGI server when it's under heavy load; in others DWiki is running this way all of the time, even when the load is light. Because the IMC unconditionally caches pages the latter situation can be annoying; it means that someone who, say, writes and posts a new comment may not see that comment until the IMC TTL expires. DWiki makes some attempt to bypass the IMC (and the BFC) in the common case of someone leaving a comment. However this is not perfect (in part because it requires the web browser to accept cookies from DWiki).

(This also applies if you're running DWiki in some other preforking environment and have forced the IMC on.)

If you're running DWiki full time in an IMC-on environment, you likely want to set a quite low IMC TTL, such as 15 to 30 seconds. If you're running DWiki with the IMC on only under heavy load you can set a higher IMC TTL, such as two minutes (120 seconds).

The renderer cache

The renderer cache is actually two caches. The renderer cache proper caches the output of various renderers (cf TemplateSyntax). The output is cached with a validator and the cached results are fully validated before they get used; this means that renderer cache entries do not normally use a TTL and in theory could be valid for years.

The (heuristic) generator cache caches the output of some expensive precursor generator routines. These cache entries only have heuristic validators, where DWiki can be fooled if people try hard enough. Generator cache entries do have a TTL, so that if the heuristic is fooled DWiki will pick up the new result sooner or later. Some cache entries can also be explicitly invalidated by DWiki in a pretty reliable process; by default, these have a much longer TTL than plain heuristic cache entries. These are called 'flagged' (heuristic) generator entries, and the various ConfigurationFile settings controlling how they behave are named render-heuristic-flagged-....

(Trivia: the 'flagged' name is because such entries are invalidated using a flag file, or more accurately a flag cache entry.)

Currently the main renderer cache caches the output of various wikitext to HTML rendering routines while the generator cache caches the results of various filesystem 'find all descendants' walks that are used to build lists of comments (for Atom comments feeds and some wikitext macros; this uses explicit invalidation) and lists of pages (for Atom feeds and various blog renderers such as blog::prevnext).

Unfortunately, a DWiki page that has comment or access restrictions must be cached separately for each DWiki user that views it. Under some situations this can result in a number of identical copies being cached under different names. If you want to avoid this, DWiki lets you turn off renderer caching for non-anonymous users.

Force-invalidating list of pages caches

The general validator for blog::prevnext cache entries is the modification time for all of the directories involved that had files in them at the time (the latter condition is for technical reasons). The heuristic validator checks that some of the file timestamps are still the same, but it can't check all of them and still be a useful cache.

So the easy way to invalidate this is to change the modification time of a directory involved, for example with touch.

The 'list of pages' cache is similarly invalidated by changing a directory modification time. Unlike the blog::prevnext case, the directory times are the only thing that this cache checks. This is a bit of a pity but the performance improvements from caching this information are very visible.
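If you would rather do this from a script than with touch, a minimal Python equivalent is below. The function name is made up; `os.utime` with `None` sets both the access and modification times to the current time, which is what touch does to an existing directory.

```python
import os


def bump_dir_mtime(path):
    """Set a directory's access and modification times to now, like touch."""
    os.utime(path, None)
```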

Disk space usage and directories

Much like comments, each page that has something cached for it becomes a subdirectory, with the various cached things in files. The different sorts of caches use different top-level directories under the cachedir, so you have paths like cachedir/bfc, cachedir/renderers, and cachedir/generators.

Because some results include absolute URLs that mention the current hostname, DWiki must maintain separate caches for each Host: header it sees in the BFC and the general renderers cache. These are handled as subdirectories in each cache directory, so cachedir/bfc/localhost/... and so on. Entries in the generator cache don't depend on the current Host: header, so there is only one (sub)cache for all requests, cachedir/generators/all/....
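The resulting layout can be summarized with a small Python sketch. The function is hypothetical and only mirrors the paths named above; a real implementation would also have to make sure the Host: value is safe to use as a directory name, which this sketch does not attempt.

```python
import os.path


def cache_subdir(cachedir, cache, host=None):
    """Return the directory a given cache uses for a request.

    'cache' is one of "bfc", "renderers", or "generators".
    """
    if cache == "generators":
        # Generator entries do not depend on the Host: header.
        return os.path.join(cachedir, "generators", "all")
    if host is None:
        raise ValueError("the %s cache is split up by Host: header" % cache)
    return os.path.join(cachedir, cache, host)
```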

Generally the general renderers cache uses the largest amount of disk space, followed by the BFC, and the generator cache is the smallest.

If you're using caching (and as mentioned, you probably want to), you'll want to periodically trim the caches. ChrisSiebenmann just does this by hand every so often by removing the cache directories entirely; DWiki will then rebuild them as necessary.
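A by-hand cleanup along these lines could be scripted in Python as below. This is a sketch meant to be run deliberately by a person, not something DWiki does or ships; the function name is invented and the directory names are the ones mentioned earlier in this section.

```python
import os
import shutil


def clear_caches(cachedir, names=("bfc", "renderers", "generators")):
    """Remove the named top-level cache directories; return those removed."""
    removed = []
    for name in names:
        path = os.path.join(cachedir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

Be very sure that the path you pass in really is the cache directory before running anything like this.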

2013-02-01

NewFeatures/AtomFeedsAndVirtualDirs

DWiki can now restrict what sorts of VirtualDirs advertise AtomFeeds (both in SyndicationDiscovery and in the Atom toolbar) and/or provide them if they're requested by URL.

It turns out that when you have a fair amount of content in a DWiki your VirtualDirs and thus your AtomFeeds proliferate like over-active rabbits. Then SyndicationDiscovery kicks in so that anyone who looks at a virtual directory can discover its Atom feed and either start polling it or just crawl it. Once your DWiki gets big enough this becomes not really a good thing, as Chris has found out with his techblog.

Written 17:46:19 by cks.

2011-12-09

NewFeatures/DisallowDirViews

Directories can now say that they don't want to be rendered in specific view types. The use case Chris has in mind is his techblog, where the blogdir view of categories is utterly huge because it renders hundreds of entries. Because this is intended to be a graceful, gentle fix, trying to view a directory in a disallowed view generates a (permanent) redirection to the default view of the directory. To avoid redirection loops, this redirection only happens if the view has been specified explicitly as a URL parameter.

(For obvious reasons, disallowed views are also disallowed in virtual directories derived from a particular real directory.)

This is done similarly to DefaultDirViews: touch a file in the directory called .flag.noview:<viewname>. Unlike default views, this is not currently inherited by child directories.
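For illustration, here is a small Python sketch that creates such a flag file and lists which views a directory has disallowed. The function names are hypothetical and this is not how DWiki itself reads the flags. Because the flags are not inherited, only the directory itself is examined.

```python
import os

FLAG_PREFIX = ".flag.noview:"


def disallow_view(dirpath, viewname):
    """Create the (empty) flag file, much as touch would."""
    flag = os.path.join(dirpath, FLAG_PREFIX + viewname)
    with open(flag, "a"):
        pass
    return flag


def disallowed_views(dirpath):
    """Return the set of view names disallowed by flag files in dirpath."""
    return {
        name[len(FLAG_PREFIX):]
        for name in os.listdir(dirpath)
        if name.startswith(FLAG_PREFIX)
    }
```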

The 'See As' page tools links also exclude disallowed view types. Right now they do so a little bit too thoroughly, in that they exclude the default view if it's also disallowed. Moral: don't do that, even though the code saves you from a redirection loop in this case.

Right now there are no restrictions on what (directory) views can be disallowed, so you can disallow Atom feeds. This is probably not a feature and will probably not be staying, although Chris may change his mind about this or just be lazy.

Written 13:43:38 by cks.

2011-06-01

NewFeatures/VariousConfigBits

New: various new configuration options

Not covered before now are various new configuration options that have been quietly added to DWiki over the five or so years that I have been using it as mostly a blogging engine. As you might expect, a bunch of these have to do with dealing with obnoxious clients of various sorts.

They are by and large documented in ConfigurationFile. I am not going to try to remember them here.

Written 10:09:45 by cks.

NewFeatures/ProcessingNotes

New: DWikiText has 'processing notes' (and better quoting)

These are directives that change how DWikiText is interpreted to do things like turn off certain font characters or map a simple-to-type character sequence like '->' to a HTML entity. They are documented in DWikiText so I am not going to repeat myself here.

In the process I added a new and less annoying plain quoting mechanism: ``...''. It looks better in ASCII than it probably does in the font here.

(The code for this was written in August of 2007 or so, but I sat on it because I wasn't entirely sure I liked the feature. Well, nuts to that; time to roll things out and just go.)

Written 10:06:13 by cks.
