dwiki: Chronological entries

Subdirectories: NewFeatures.

ConfigurationFile , 2013-09-12 16:07:54 by cks

DWiki's configuration file

DWiki's configuration file has a simple format. Blank lines and comments (any line that has a '#' as the first non-whitespace character) are just skipped, and everything else is interpreted as a configuration directive to set. Directives can be continued with additional lines by starting the continued lines with whitespace (as in email headers). The continuation whitespace will be turned into a single space in the final, un-continued version of the line.

Configuration directives have optional values, which are separated from the configuration item by whitespace. (Whitespace within the value is not interpreted, although trailing whitespace is removed from lines.)

So an example set of configuration file lines might be:
root		/web/data/dwiki
pagedir		pages
tmpldir		templates
wikiname	TestWiki
wikititle	Testing Wiki

DWiki requires and uses some configuration directives. Unused configuration directives are not errors; all configuration directives (and their values) become part of the context variables available for template ${...} expansion.
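The parsing rules above can be sketched in a few lines of Python. This is only an illustration of the described file format, not DWiki's own parser: the function name parse_config is made up, and storing a value-less directive as an empty string is an assumption.

```python
def parse_config(text):
    """Parse DWiki-style configuration text into a dict (illustrative only)."""
    logical = []
    for raw in text.splitlines():
        line = raw.rstrip()                  # trailing whitespace is removed
        stripped = line.lstrip()
        if not stripped or stripped.startswith("#"):
            continue                         # blank lines and comments are skipped
        if line[0].isspace() and logical:
            # Continuation line: the leading whitespace becomes a single space.
            logical[-1] += " " + stripped
        else:
            logical.append(stripped)
    config = {}
    for line in logical:
        parts = line.split(None, 1)          # directive, then optional value
        # Assumption: a directive without a value is stored as "".
        config[parts[0]] = parts[1] if len(parts) > 1 else ""
    return config
```

Whitespace inside a value is left alone, so 'wikititle Testing Wiki' yields the value 'Testing Wiki'.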

To simplify life, configuration directives are put through a canonicalization process. This operates like so:

if root is specified, it must be an absolute path to a directory.

if something ending in dir or file is not an absolute path and root is defined, DWiki sees if tacking on root results in the right sort of thing.

any directive ending in dir or file must wind up (possibly after the root prefixing above) being an absolute path to a directory or a file respectively.

if root is defined and pagedir, tmpldir, or rcsdir are not defined, DWiki sees if directories called pages, templates, or rcsroot exist under root and if so sets up the configuration directives appropriately.
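A rough sketch of these canonicalization rules, assuming the configuration has already been read into a dict. The function name canonicalize and the order in which the rules are applied are assumptions made for illustration; DWiki's real code may differ.

```python
import os

# Default subdirectory names looked for under root, per the rules above.
DEFAULT_DIRS = {"pagedir": "pages", "tmpldir": "templates", "rcsdir": "rcsroot"}

def canonicalize(config):
    """Return a canonicalized copy of config, or raise ValueError."""
    cfg = dict(config)
    root = cfg.get("root")
    if root is not None:
        if not (os.path.isabs(root) and os.path.isdir(root)):
            raise ValueError("root must be an absolute path to a directory")
        # Fill in pagedir, tmpldir, and rcsdir if the default directories exist.
        for name, subdir in DEFAULT_DIRS.items():
            candidate = os.path.join(root, subdir)
            if name not in cfg and os.path.isdir(candidate):
                cfg[name] = candidate
    for name, value in list(cfg.items()):
        if name.endswith("dir"):
            exists = os.path.isdir
        elif name.endswith("file"):
            exists = os.path.isfile
        else:
            continue
        # Relative paths are tried under root, if root is defined.
        if not os.path.isabs(value) and root is not None:
            joined = os.path.join(root, value)
            if exists(joined):
                value = joined
        if not (os.path.isabs(value) and exists(value)):
            raise ValueError(f"{name}: {value!r} is not an absolute path of the right kind")
        cfg[name] = value
    return cfg
```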

Required configuration directives are: pagedir, tmpldir, wikiname, and rooturl. This means that with defaulting, the minimal DWiki configuration file is:
root	/some/where
rooturl /some/thing
wikiname SomeThing

Configuration directives and their meanings:

Core where-to-find-things configuration:

root

If present, this is taken as the root directory that further configuration directives can specify paths relative to.

pagedir

The root directory of the page hierarchy. (Required.)

tmpldir

The root directory of the template hierarchy. (Required; cannot be the same as pagedir.)

usercs

Support checking RCS files for things like page history, page locker, and so on. Whether or not usercs is set, DWiki refuses to serve files ending with ,v or in RCS directories; see InvalidPageNames. As a result, setting usercs is only necessary if you want page history et al to be visible to people visiting the DWiki; you can use RCS yourself on page files without setting it.

rcsdir

The root directory of the separate RCS file storage hierarchy; used only if usercs is on.
Normally, RCS files are expected to be in RCS directories under pagedir, where basic RCS commands put them (if you make those directories; DWiki requires you to work this way). With this directive on, the RCS ,v files for files under pagedir are instead found under here, in a mirror of the directory structure in pagedir, so you have pagedir/foo/bar and rcsdir/foo/bar,v. This keeps pagedir neater at the expense of requiring some scripting support.

Web configuration

wikiname

The short, one-word name of this DWiki. You probably want to have a CamelCased name. This shows up as the name of the breadcrumbs, among other places. (Required.)

wikititle

The full, multi-word title of this DWiki.

wikiroot

The front page of the DWiki; the page you get redirected to when you request the DWiki's root. If this isn't set or doesn't exist, DWiki tries wikiname's value as a page name; if that doesn't work, people see the DWiki's root directory in a directory view.

rooturl

The URL of the directory that is the root of the DWiki instance; use '/' to mean 'the root of the web server'.

publicurl

If set, DWiki puts this directory's URL on the front of DWiki URLs instead of rooturl.

staticdir

The directory to serve static files from. DWiki only serves files from this hierarchy; requests for a directory will fail.

staticurl

The URL of the directory that is the root of static files. If staticurl doesn't start with a slash, it's taken as a subdirectory of rooturl. (Requires staticdir to be set.)

charset

If set, DWiki claims that all text/html and text/plain content it generates is in this character set in HTTP replies. Normally 'UTF-8' these days. If unset, DWiki does not label text/html and text/plain HTTP replies with character set information. You should set this to 'UTF-8'. Really. It shouldn't even be optional.

cssurlprefix

This is technically not a DWiki configuration directive as such because it isn't interpreted by the program. Instead it's used by the standard html/css template as one option for where to find DWiki's standard CSS file, dwiki.css. If this is set it's the URL of a directory (without a trailing slash). If this is not set, the html/css template assumes that dwiki.css can be found at ${staticurl}/dwiki.css. It's more efficient to serve dwiki.css outside of DWiki itself, since it's a static file.

Note that various parts of DWikiText rendering do not look right if the CSS is missing (in particular, all sorts of tables are likely to look bad).

DWiki URL to file mapping

When DWiki gets a request for a URL, it tries to turn it into a request for something under either staticurl (if defined) or rooturl; whatever is left after subtracting the appropriate thing is the path being served relative to staticdir or pagedir. staticurl is checked first, so it can be a subset of the URL space available under rooturl.

For safety reasons, DWiki only tries to process a request if the request's URL falls under either staticurl or rooturl. If DWiki receives a request for anything outside those two, something is clearly wrong and it generates a terse error page.
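As an illustration of the mapping just described, here is a small Python sketch. The function name map_url and its return convention are invented for this example; it assumes the URL has already been reduced to a path and ignores query strings.

```python
def map_url(path, rooturl, staticurl=None):
    """Map a request path to ("static", relpath), ("page", relpath), or None."""
    def strip_prefix(url, prefix):
        base = prefix.rstrip("/")
        if url == base or url.startswith(base + "/"):
            return url[len(base):].lstrip("/")
        return None

    if staticurl is not None:
        # A staticurl without a leading slash is a subdirectory of rooturl.
        if not staticurl.startswith("/"):
            staticurl = rooturl.rstrip("/") + "/" + staticurl
        rest = strip_prefix(path, staticurl)   # staticurl is checked first
        if rest is not None:
            return ("static", rest)
    rest = strip_prefix(path, rooturl)
    if rest is not None:
        return ("page", rest)
    return None                                # outside both: an error page
```

With rooturl '/wiki' and staticurl 'static', the path '/wiki/static/dwiki.css' maps to the file 'dwiki.css' relative to staticdir, while '/wiki/Foo/Bar' maps to 'Foo/Bar' relative to pagedir.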

When it generates URLs for DWiki pages DWiki normally puts rooturl on front (as a directory). However, if you set publicurl DWiki puts that on the front instead.

This is useful if for internal reasons you receive requests with their URLs rewritten to something users shouldn't (or can't) use. The case ChrisSiebenmann knows is Apache with URL aliases and the DWiki CGI-BIN being run via suexec.

Authentication

See Authentication for more information on the authentication system.

authfile

Where DWiki can find user / password / group information for the DWiki's users. If this is set, the DWiki has authentication.

defaultuser

If set, all connections that are not otherwise authenticated get to be this user, provided the user is in authfile. This should be used carefully, as it makes all requests to the DWiki be authenticated (since they all have a user, even if only the default user). If this is set, the username it is set to is said to be the 'guest user'.

global-authseed

This is a special magic token to make it harder to brute-force people's DWiki passwords in some situations. It can be any value and should be kept secret.

global-authseed-file

This is the file to read global-authseed from, if it is set. The file has no special format, but should contain some randomness and its contents should be kept secret.

authcookie-path

This controls the 'path=' value for the authentication cookies generated by DWiki. If it is set without a value, we use the root URL; if it is set with a value, we use the value straight. If it is not set at all, authentication cookies have no explicit 'path=' setting. ChrisSiebenmann has come to believe that you don't want to set this, and it remains as a vestigial remnant.

logins-report-bad

If present in the configuration file, DWiki will log the username (or at least the first 50 characters of it) for bad logins with unknown usernames. This is not necessarily a good idea but at one point was interesting to track what form-stuffing spammers were doing.

Comments

commentsdir

The root directory for storing comments in. The only place DWiki writes permanent data to.

comments-on

Enable commenting in this DWiki. This requires that commentsdir be defined and that authentication be enabled.

comments-in-normal

Your standard templates display comments on the normal view of the page instead of the 'showcomments' view.

remap-normal-to-showcomments

DWiki will remap the 'normal' view for pages to the 'showcomments' view, thereby implementing comments-in-normal without you needing to change the standard templates.

If you want to enable anonymous comments you should create a guest user in the DWiki authfile and then set guest as the defaultuser. (Well, you can use the username of your choice, but guest is conventional.)

Caching

DWiki can optionally cache the results of page generation to speed up response time. See Caching for a longer discussion.

cachedir

The root directory for storing the caches. It should not be used for anything else (ie, it should not also be pagedir, tmpldir, or commentsdir). DWiki will write scratch files to here.

cache-warn-errors

Log warnings about cache store errors. (These are non-fatal but indicate that your cache isn't caching.)

render-cache

Enable caching the results of selected renderers and renderer components. (Requires cachedir to be set.)

render-heuristic-ttl

The TTL of renderer cache entries with heuristic validators, in seconds. The default value is an hour.

render-anonymous-only

Use the renderer cache only for the guest user or for connections that are not authenticated.

render-heuristic-flagged-ttl

The TTL of renderer cache entries that have explicit invalidation (aka 'flagged' cache entries), in seconds. The default value is 48 hours, as explicit invalidation is considered safer than heuristic invalidation.

render-heuristic-flagged-delta

In order to lessen the chance of races between renderer cache invalidation and renderer cache regeneration, flagged cache entries must be at least this many seconds more recent than the invalidation marker (if it exists). Defaults to 30 seconds.

bfc-cache-ttl

Enable a brute force page cache of complete pages with a TTL of this many seconds. (Requires cachedir to be set.)

bfc-time-min

A complete page will be cached if it took at least this much of a second to be generated. Defaults to 0.75 of a second.

bfc-load-min

A complete page will be cached if the load average is at least this high. No default; the BFC normally doesn't look at the load average at all.

bfc-time-triv

Regardless of the setting of bfc-load-min, don't bother looking at the load average if the page took at most this long to generate. Defaults to 0.09 of a second.
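One reading of how bfc-time-min, bfc-load-min, and bfc-time-triv interact can be sketched as follows. This is an interpretation of the descriptions above, not DWiki's actual code; the function name should_cache and the loadavg parameter (there so the logic can be exercised without the real load average) are made up.

```python
import os

def should_cache(elapsed, time_min=0.75, load_min=None, time_triv=0.09, loadavg=None):
    """Decide whether a page that took `elapsed` seconds goes into the BFC."""
    if elapsed >= time_min:
        return True                    # slow enough to cache on time alone
    if load_min is None or elapsed <= time_triv:
        return False                   # load is not consulted at all
    if loadavg is None:
        loadavg = os.getloadavg()[0]   # one-minute load average (Unix only)
    return loadavg >= load_min
```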

bfc-atom-ttl

Use this TTL for Atom syndication requests, instead of the normal one.

bfc-atom-nocond-ttl

Use this TTL for Atom syndication requests that are not using conditional GET, and also force the caching of the results of these requests regardless of the load.

bfc-skip-robots

If set, this is a list of User-Agent substrings (formatted as for bad-robots, see later) for robots that should not cause entries to be put into the BFC.

imc-cache-entries

Enable an in-memory cache of complete pages with this many entries. The IMC skips all pages that the BFC skips. The IMC is only meaningful if the same process handles more than one request, so by default it is only enabled if DWiki knows that it is running using dwiki-scgi.py as a preforking SCGI server.

imc-force-on

Force the IMC on even if DWiki would not enable it. You probably only want to use this if you are running DWiki as a WSGI application inside a preforking WSGI server such as uWSGI.

imc-cache-ttl

The TTL, in seconds, of entries in the in-memory cache; must be provided if imc-cache-entries is.

imc-resp-max-size

The maximum size (in kilobytes) of pages that will be cached in the in-memory cache. The default value is 256 KB.

slow-requests-by

Delay all requests by this much, in fractional seconds. Normally used only for testing BFC.

In practice some degree of caching is mandatory for decent performance once your DWiki gets big enough and so it's recommended that you turn on render-cache and bfc-cache-ttl unless you have a good reason to do otherwise. Turn on imc-cache-entries and imc-cache-ttl if you're using SCGI.

Syndication feed controls

atomfeed-display-howmany

How many items at most an Atom feed should display. If set, it must be a positive integer; if not set, atom::pages and atom::comments use a default of 100 items.

feed-max-size

How many kilobytes atom::pages or atom::comments should try to limit their output to. If set, either stops adding new entries (regardless of how many entries have been processed already) once they have generated that many kilobytes or more of output. Because of the 'or more' clause, you should allow for a safety margin. If unset, syndication feeds are not size-limited.

feed-max-size-ips

If set, this is a whitespace separated list of IPv4 addresses, tcpwrappers style IPv4 address prefixes (eg '66.150.15.'), or IPv4 CIDRs (eg '66.150.15.0/25') that feed-max-size applies to. Syndication requests from any other addresses are not size-limited. If unset, feed-max-size applies to all syndication requests, regardless of what IP address makes the request. This option can be specified multiple times; if so, all the addresses are merged together.
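The three accepted address forms can be matched with a few lines of Python. This is a sketch of the matching rules as described, using the standard ipaddress module; the function name ip_matches is hypothetical.

```python
import ipaddress

def ip_matches(addr, spec):
    """True if IPv4 address `addr` matches any entry in the whitespace-separated `spec`."""
    for entry in spec.split():
        if entry.endswith("."):
            # tcpwrappers-style prefix, eg '66.150.15.'
            if addr.startswith(entry):
                return True
        elif "/" in entry:
            # CIDR, eg '66.150.15.0/25'
            network = ipaddress.ip_network(entry, strict=False)
            if ipaddress.ip_address(addr) in network:
                return True
        elif addr == entry:
            return True
    return False
```

Since the directive can be given several times and the addresses are merged, the merged list can simply be joined with spaces before matching.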

feed-start-time

If set, pages older than this time will not appear in Atom feeds, which is handy if you want to move a DWiki, redirect the old URLs, and not flood people's Atom feeds (because the Atom <id> for pages is the page's full URL unless you've set atomfeed-tag). The value can be specified either as an integer Unix timestamp, as 'YEAR-MO-DA [HH:MM[:SS]]', 'YEAR/MO/DA', or an Atom format time string, and is always in local time (even when specified as an Atom format time string; sorry).

atomfeed-tag

If set, the atom::pagetag renderer will use it to generate Atom <id>s for pages in the format <tag>:/<page path>. This should normally be set to a tag:-based URI; see here for a discussion.

atomfeed-tag-time

If set, the atom::pagetag renderer will only generate tag-formatted Atom <id>s for pages more recent than this time. This can be used to make a graceful transition into tag-based Atom <id>s for an existing DWiki (and then, with feed-start-time, to gracefully move it). This has the same time format as feed-start-time.

atomfeed-virt-only-adv

If set, restrict what Atom page feeds are advertised for virtual directories. If we are displaying a vdir and it is not a listed type, we advertise the Atom feed for the real directory instead (eg, for 'blog/2007/10/' the Atom feed advertised would be for 'blog/'). This is a space-separated list of vdir types; the allowed types are latest, oldest, range, calendar, and the calendar subtypes year, month, and day.

atomfeed-virt-only-in

If set, restrict what virtual directories allow Atom page feed requests. A disallowed latest or range feed request is (permanently) redirected to the real directory's feed; other disallowed feeds get 404 responses. The format and list of vdir types is the same as for atomfeed-virt-only-adv. If this is set, it becomes atomfeed-virt-only-adv's default value. If both are set, this should be a superset of atomfeed-virt-only-adv's value; otherwise DWiki will advertise feeds that it will refuse requests for.
You should normally allow feeds for latest because this gives people a way of controlling how large a feed they pull from you; they can use, eg, 'blog/latest/10/?atom' to pull only a ten-entry feed instead of your full-sized feed.
These two directives don't change or affect what Atom comment feeds are advertised or allowed; they affect only Atom feeds for pages.

Other features:

alias-path

This sets the DWiki path for the third place to try to find CamelCase links in (see Formatting). This allows a DWiki to have a collection of CamelCase names for things that are globally usable but that don't clutter up the DWiki root directory.
This is a DWiki path, not a filesystem path (and is implicitly always an absolute DWiki path). The conventional value is Aliases.

search-on

Enables searching. If it has the value 'authenticated', only authenticated users can search. Note that if you have a guest user set, all users are authenticated.

blog-display-howmany

How many items the blog::blog renderer should try to restrict most pages it displays to. If set, it must be a positive integer; if not set, blog::blog uses a default.

canon-hosts

If set, this is a space-separated list of canonical hostnames for this DWiki. If a request has a Host: header that is not in this list, DWiki immediately serves up a redirection to the first hostname in the list (or canon-host-url, if that is set), which is assumed to be the preferred hostname.

canon-host-url

If set, this is the canonical URL for the host of this DWiki (without the ending /, but including http or https and the port if necessary). DWiki will generate redirects and absolute URLs that use this URL. If canon-hosts is also set, this should be the full version of the first entry in canon-hosts.
(This is primarily useful in some hopefully unusual situations involving HTTP-to-HTTPS transitions.)

literal-words

If set, this is a list of strings, separated by ' | ' (space, |, space), that will be rendered literally and not considered to contain markup, as if each of them had been specified in '.pn lit <whatever>' processing note directives.

Special oddities

dump-req-times

Report the amount of time that requests took to standard error. This is set by the standard -T option.

dump-atom-reqs

Report on Atom requests to standard error. This is set by the standard -A option.

stamp-messages

Add timestamp and client IP address to messages reported by the above two options. This is set by the standard --stamp option.

These are documented because you might want to set them directly if you're running DWiki as a WSGI application inside some standard WSGI server (such as uWSGI, Apache's mod_wsgi, or gUnicorn).

Dealing with bad clients:

bad-robots

If set, this is a list of User-Agent substrings, separated by ' | ' (space, |, space), for robots that should get permission denied responses when they try to fetch pages in various views that no robot should be fetching. Currently the list of bad views is atom, atomcomments, source, and writecomment, all of which are typically fetched by robots that don't respect rel="nofollow" on links.

no-ua-is-bad-robot

If set, any request with a missing User-Agent header is considered to be from a bad robot.
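A minimal sketch of how bad-robots and no-ua-is-bad-robot could be applied, assuming plain case-sensitive substring matching (the text above does not say whether matching is case-sensitive). The function name denied is invented for the example.

```python
# The views that no robot should be fetching, per the description above.
BAD_VIEWS = {"atom", "atomcomments", "source", "writecomment"}

def denied(user_agent, view, bad_robots, no_ua_is_bad=False):
    """True if this request should get a permission denied response."""
    if view not in BAD_VIEWS:
        return False
    if not user_agent:
        return no_ua_is_bad            # missing User-Agent header
    # The directive value is a list separated by ' | ' (space, |, space).
    substrings = [s for s in bad_robots.split(" | ") if s]
    return any(s in user_agent for s in substrings)
```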

banned-robots

If set, this is a list of User-Agent substrings (formatted as for bad-robots) for robots that should get permission denied responses on all requests.

banned-ips

If set, this is a list of IPv4 addresses, tcpwrapper style IP prefixes, or CIDRs (as for feed-max-size-ips) for addresses that will get access denied responses for all requests. It can be specified multiple times.

banned-comment-ips

If set, this is like banned-ips but only applies to attempts to write comments.

bad-robot-ips

If set, this is like banned-ips but only applies to requests that try to fetch pages in various views that no robot should be fetching (as in bad-robots).

Under normal circumstances it's more efficient to use your web server's access controls to totally ban IP addresses and bad user-agents; your web server usually has faster code for this and you don't have to get DWiki involved in the process. banned-robots and banned-ips exist because this is not always possible.

Authentication , 2013-08-30 11:39:21 by cks

DWiki Authentication

DWiki has optional support for authenticating users, which is a prerequisite for restricting access to pages and for allowing people to comment. User authentication is done by cookies, which means that people wanting to be authenticated have to accept cookies from the DWiki's web server.

Whether authentication is on is controlled by the authfile setting in the ConfigurationFile; if it is set, it specifies a password file for the DWiki. Once authentication is enabled, a login box appears at the bottom of pages, where people can enter their login and password into a form and submit it to the wiki. If the password is correct, DWiki sends back a login cookie and the session is then authenticated (provided that the user's browser sends the cookie back to DWiki with future requests).

An authenticated person has a login name and may optionally be in some groups. When checking permissions, logins and groups are treated the same (so you should not create groups that have the same name as users; this is either pointless or dangerous, depending on how many people are in the group). What groups a login is part of is specified in the password file.

To be precise, an authenticated request is any request that has a valid associated login name. Normally this happens because the user's browser sent back a valid DWiki login cookie, but a DWiki may have a default login, set in the ConfigurationFile. If the default login is set and exists in the password file, everything is authenticated; either as a 'real' (passworded) login or as the default login.

Because DWiki is hard-coded to require authentication before people can write comments, setting a default user is the only way to let the world (potentially) comment on your DWiki.

Using Authentication

Authentication is used by the {{Restricted}} and {{CanComment}} DWikiText macros. Without arguments they restrict the page to authenticated people or allow comments by authenticated people (respectively). With arguments, they restrict things more tightly. There are two sorts of arguments:

positive arguments are plain logins or groups, and require the authenticated session to be one of the things named.

negative arguments start with '-' and are then logins or groups, and require the authenticated session to not be one of the things named.

If only negative arguments are given, anyone not mentioned passes; if both positive and negative arguments are given, you must pass the positive arguments and not fail the negative arguments.
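The argument rules can be expressed as a short Python function. This is an illustration of the stated rules only; the function name passes and its parameters are hypothetical, and it treats a login of None as an unauthenticated request.

```python
def passes(args, login, groups):
    """Check a Restricted/CanComment argument list against an authenticated session."""
    if login is None:
        return False                        # not authenticated at all
    identities = {login} | set(groups)      # logins and groups are treated the same
    positive = [a for a in args if not a.startswith("-")]
    negative = [a[1:] for a in args if a.startswith("-")]
    if any(n in identities for n in negative):
        return False                        # failed a negative argument
    if positive:
        return any(p in identities for p in positive)
    return True                             # no arguments, or only negatives
```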

Directories can create default permissions for everything under them by having a special file called __access with either or both of Restricted and CanComment macros. __access files are checked backwards from the page being looked at, and the first one that contains a Restricted or a CanComment (depending on what is at issue) wins. __access files can have other content, although ChrisSiebenmann doesn't expect people to look at them very often.

Note: this means that subdirectories can give back permissions that were denied by a higher-level directory. This is deliberate.
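The backwards search through __access files might look like the following sketch. To keep it self-contained, the __access files are modelled as a dict from directory path ('' for the root) to file contents, and a macro is detected with a crude substring test; both are simplifications for illustration, and find_access_rule is a made-up name.

```python
def find_access_rule(page, access_files, macro):
    """Return (directory, text) of the nearest __access file that uses `macro`, or None."""
    parts = page.split("/")[:-1]            # directories containing the page
    while True:
        directory = "/".join(parts)
        text = access_files.get(directory)
        if text is not None and ("{{" + macro) in text:
            return directory, text          # the first match wins
        if not parts:
            return None                     # the root was checked; nothing found
        parts.pop()                         # move up one directory
```

Because the nearest file wins, an __access file in a subdirectory overrides one higher up, which is how a subdirectory can give back permissions.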

Authentication limits

DWiki authentication protects only file contents. It does not protect directory contents and it thus doesn't protect a page's (file) name. Moral: don't put sensitive information into page names.

Password security

Note: DWiki doesn't specially encrypt login / password information while it's being sent to the web server. Unless the entire connection is running over SSL, people can theoretically snoop the password in clear text.

DWiki doesn't store someone's clear text password (even in its password file); instead it stores a hash of the password, using a format that guarantees that if two different people use the same password they will get different hashes. (Barring the hash function itself being broken.)

As always, people should be strongly discouraged from using important passwords (eg, their Unix account passwords) for any web service, a DWiki included. Using one's Unix login name as one's DWiki login name is harmless and even convenient.

The cookie

The cookie DWiki uses has the login name in clear text, and is authenticated with an added hash value. If you want the gory details, see authcookie.py and htmlauth.py in the DWiki source code. With a proper global-authseed secret in the ConfigurationFile, it is believed to be secure from all brute-force attacks.

The cookie is normally quite long-lived. It becomes invalid if the user's password or the DWiki global authseed change.

The cookie is not restricted to coming from a single IP address or anything like that.
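To make the properties above concrete, here is a generic sketch of a cookie that carries the login in clear text plus a keyed hash. This is explicitly not DWiki's actual cookie format (see authcookie.py and htmlauth.py for that); the cookie layout, the use of HMAC-SHA256, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import hmac

def make_cookie(login, password_hash, authseed):
    """Build 'login:mac'; the MAC depends on the stored password hash and the seed."""
    key = (authseed + ":" + password_hash).encode("utf-8")
    mac = hmac.new(key, login.encode("utf-8"), hashlib.sha256).hexdigest()
    return login + ":" + mac

def check_cookie(cookie, lookup_hash, authseed):
    """Return the login if the cookie is valid, else None."""
    login, sep, _mac = cookie.rpartition(":")
    if not sep:
        return None
    password_hash = lookup_hash(login)      # eg a dict's .get method
    if password_hash is None:
        return None
    expected = make_cookie(login, password_hash, authseed)
    # Constant-time comparison of the whole cookie value.
    if hmac.compare_digest(cookie.encode("utf-8"), expected.encode("utf-8")):
        return login
    return None
```

In a scheme like this, changing either the user's password hash or the global seed changes the expected hash, so old cookies stop validating, which matches the behaviour described above.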

Format of the password file

The password file has a simple format. Blank lines and comment lines (lines that have a '#' character as their first non-whitespace character) are ignored. Otherwise, lines have the format:

<login>	<password-hash>		[<group> ....]

There can be any amount of whitespace between elements; groups are optional.

The easy way to add logins or change passwords is with the dpasswd.py program in the DWiki source. Adding or changing groups, or deleting logins, you get to do by editing the file directly.

DWiki has no support for creating logins or changing passwords over the web. This is deliberate.

How you manage this process in general is up to you; in non-paranoid environments ChrisSiebenmann uses a group-writeable password file owned by an appropriate (Unix) group.

As a hack, the password file can also contain supplemental information about a DWiki login in the format:
.also <login>	<'real' name> | <url>
This line must come after the main line for a given login but it doesn't have to be immediately afterwards. If present the real name and URL are used as the default values for these when that user is writing comments. Either or both may be blank (although if both are blank, there's no point to the entire .also entry). Giving the default login a name (such as 'Anonymous') means that anonymous comments will not normally have their submission IP address shown (the default templates do not show the IP address if name information is available).
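A sketch of a reader for this file format, including the .also lines. It is an illustration based on the description above rather than DWiki's own code; the function name and the shape of the returned dict are invented, and malformed lines are simply skipped.

```python
def parse_password_file(text):
    """Return {login: {"hash": ..., "groups": [...], "name": ..., "url": ...}}."""
    users = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                         # blank and comment lines are ignored
        fields = line.split(None, 2)
        if fields[0] == ".also":
            if len(fields) < 2 or fields[1] not in users:
                continue                     # must come after the login's main line
            rest = fields[2] if len(fields) > 2 else ""
            name, _, url = rest.partition("|")
            users[fields[1]]["name"] = name.strip()
            users[fields[1]]["url"] = url.strip()
            continue
        fields = line.split()                # any amount of whitespace between elements
        if len(fields) < 2:
            continue
        users[fields[0]] = {"hash": fields[1], "groups": fields[2:],
                            "name": "", "url": ""}
    return users
```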

GlobalVariables , 2013-08-29 16:47:40 by cks

DWiki Global Variables

As TemplateSyntax discusses, one can use global variables in templates in several ways. However, it helps to know what global variables are available. Thus this incomplete listing.

First, all ConfigurationFile directives are available as global variables.

Then, during request processing DWiki internally defines a number of additional global variables:

page The current page's full path, in DWiki form.

abspage The current page's full path, including a '/' at the start.

pagename The page's name; its last path component.

pagetype The type of the page, usually 'file' or 'dir'.

view-format The current view being processed.

relname In blog::blog, the name of the current page relative to the blog directory being displayed.

basepage In a VirtualDirectory context, the full path of the non-virtual directory. Otherwise the same as page.

:wikitext:title After a piece of wikitext has been rendered (more exactly after any wikitext template renderer has been used, including wikitext:cache), this is its title if any exists. The 'title' of a piece of wikitext is the text of the header that is at the start of the text, if there is one. This is the same as the wikitext:title template renderer but may be more convenient to use.

:wikitext:title:nohtml This is the title but without any HTML markup, making it useful for eg a <title>. It's the same as the wikitext:title:nohtml template renderer.

login The currently authenticated user.

comment-ip IP address that posted the current comment.

comment-login Login of the user that posted the current comment, if it is not the anonymous user.

comment-name The supplied name of the user that posted the current comment, if any.

comment-url The user's supplied website URL (if any) for the current comment.

:comment:post The result of an attempt to post a comment. One of 'good', 'bad', 'badchar', or 'nocomment' (the latter if it was an attempt to post an empty comment). (Only defined during comment posting.)

:error:error Error type. Only defined during error processing.

:error:code The numerical HTTP status code for an error. Only defined during error processing.

http-command The type of HTTP command being processed, either GET or POST.

http-version The (claimed) version of HTTP that the current request used.

remote-ip The IP address the current request came from.

server-name The hostname or IP address for this web server that the sender of the current request claims to have used.

Not all of these are defined all of the time. Generally a context-dependent variable is only defined when the current thing being processed has that sort of information.

There are other global variables that get set, but they are for more internal use, and you're best off browsing the source code for them.

TemplatesUsed , 2013-08-27 10:17:02 by cks

What templates DWiki uses

Per ProcessingModel, DWiki ultimately produces output by expanding a template. This means that DWiki has to figure out what template to use for this process, and because the TemplateSyntax is fairly limited, it is much simpler for DWiki to start with a separate template for every different view of things it wants to have.

This means that while DWiki tries not to hardcode template names or the structure of the template directory, there is a certain number of hardcoded names it knows about that need to be there for proper DWiki operation.

The short list of such templates is:

dwiki/view-*.tmpl, dwiki.tmpl: starting view templates.

views/*: conventional location for templates that display a particular ordinary view.

error.tmpl, errors/*: displaying errors (always 404 responses).

login-error.tmpl: displaying a login error (a regular page, not a 404).

Comment templates:
comment/comment.tmpl: used to show each comment when we're showing all comments.

comment/posting.tmpl: used to show the result of posting a comment. By convention, comment/posted-<result>.tmpl is used to display specific results, where <result> is one of 'good' (the comment was posted successfully), 'bad' (something went wrong), 'badchars' (the comment has bad characters in it), or 'nocomment' (the comment was empty and DWiki refused to post it).

blog/blogdirpage.tmpl: used to show each page in BlogDir view.

blog/blogentry.tmpl: used to show each page in Blog view.

syndication/atomentry.tmpl: used to render an Atom feed entry for each page.

syndication/atomcomment.tmpl: used to render an Atom feed entry for each comment.

syndication/rss2entry.tmpl: used to render an RSS 2.0 feed entry for each page.

All paths are relative to the template directory.

Determining a template for a view

For views that are displayed using templates, DWiki tries to find the starting template by looking in three places, in order:

dwiki/view-<view>-<pagetype>.tmpl

dwiki/view-<view>.tmpl

dwiki.tmpl
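The three-step search above can be sketched roughly as follows. This is an illustration and not DWiki's actual code: the function name, its arguments, and the 'file' page type value used in the usage note are all assumptions.

```python
import os

def find_start_template(tmpldir, view, pagetype):
    """Return the first starting template that exists, relative to
    tmpldir, or None if none of the three candidates is there.
    (Hypothetical helper; the search order is taken from the text.)"""
    candidates = [
        "dwiki/view-%s-%s.tmpl" % (view, pagetype),
        "dwiki/view-%s.tmpl" % view,
        "dwiki.tmpl",
    ]
    for rel in candidates:
        if os.path.isfile(os.path.join(tmpldir, rel)):
            return rel
    return None
```

With only dwiki.tmpl and dwiki/view-atom.tmpl present, an 'atom' view of a 'file' page would pick dwiki/view-atom.tmpl and a 'normal' view would fall through to dwiki.tmpl.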

By convention, everything that generates text/html pages just goes through dwiki.tmpl so that there is one place that does top-level 'skinning' for the entire DWiki. Only views that both use templates and generate something besides text/html sidestep this.

The standard dwiki.tmpl uses the #{<...} first-found template inclusion mechanism (see TemplateSyntax) to pull in the real per-view content. It looks in four places to try to find this content, in this order:

Overrides/...$(page)/$(view-format).tmpl

Overrides/...$(page)/all.tmpl

views/$(view-format)-$(pagetype).tmpl

views/$(view-format).tmpl

The first two allow page and directory hierarchy specific overrides; the latter two are the generic places. Most views don't need to distinguish between file types, but the 'normal' view must use different templates for files and directories (since a directory doesn't have wikitext to display).

The current template-based views are: normal, history, search, blog, blogdir, atom, atomcomments, sitemap, showcomments, and writecomment. The login and logout views are 'synthetic' and don't actually display anything unless an error happens. The 'source' view simply dumps the page content out straight without getting anywhere near templates.

Note that the atom and atomcomments views are special: although they render through templates, they generate application/atom+xml content instead of text/html. Thus they use dwiki/view-* templates directly, bypassing dwiki.tmpl. The sitemap view is similarly special, although it generates application/xml content.

Error templates

Errors are rendered by the template error.tmpl. There are special error renderers error::title and error::body that look for error-specific additional templates in the subdirectory errors/. Each type of error looks for titles as errors/<error>-title.tmpl and main error body as errors/<error>.tmpl (with internal defaults if they don't exist).

Current error types: badaccess, badformat, badpage, inconsistpage, nopage.

Everything else is free and floating

That's it. DWiki has no other hardcoded template names.

Caching , 2013-03-06 13:46:35 by cks

DWiki's caching system

DWiki has optional caching in order to speed up generating results repeatedly. DWiki uses a disk-based cache for this (although the interface is abstracted and alternate forms of caching may be introduced someday). There are three caches, which can be enabled separately: the renderer cache, a brute force page cache, and an in-memory brute force page cache that is only used if DWiki is running as a preforking SCGI server.

DWiki never removes the files of out of date cache entries from the disk cache; instead, it stops considering out of date ones to be valid. Cleaning out the detritus is left for an external process. ChrisSiebenmann considers this safer; giving a program an automated unlink() makes him nervous.

See ConfigurationFile for the options controlling the behavior of the caches.

In theory DWiki's caching is optional. In practice a decent sized DWiki is simply too slow without caching for some of the more expensive operations and caching becomes more or less a necessity. ChrisSiebenmann now believes that you should configure all levels of caching in basically any DWiki unless you have some unusual need and are sure.

The brute force cache

The brute force page cache is about as simple as you can get: it caches complete requests for a configured time (called a time-to-live, or TTL). That's it. The BFC is intended as a load-shedding measure when DWiki is under significant load, so it only acts under certain circumstances:

only on GET or HEAD requests.

only on requests without a Cookie: header.

requests only get put into the cache if the system seems loaded.

(For speed, when something is valid in the cache DWiki just serves it without checking the system load.)
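As a rough sketch of these rules (not DWiki's real code), the decision splits into "may this request use the BFC at all" and "may the result be stored". The function names, the headers dictionary, and the system_loaded flag are hypothetical; how DWiki actually measures load is not covered here.

```python
def bfc_eligible(method, headers):
    """True if the request is one the BFC will consider at all:
    a GET or HEAD request without a Cookie: header."""
    if method not in ("GET", "HEAD"):
        return False
    # Header names are case-insensitive, so compare in lower case.
    return not any(name.lower() == "cookie" for name in headers)

def bfc_may_store(method, headers, system_loaded):
    """True if a freshly generated response may be put into the BFC.
    Serving an already valid entry only needs bfc_eligible(); the
    load check applies to storing, not to lookups."""
    return bfc_eligible(method, headers) and system_loaded
```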

A good BFC TTL is on the order of 30 seconds to three minutes or so; long enough to shed significant load if you are getting a lot of hits to a few pages and short enough that dynamic pages won't become too outdated. (And that waiting to see a comment show up or whatever is not too annoying.)

Because Atom syndication requests are among the most expensive pages to compute, the BFC can be set to give them a longer TTL than usual. There is a second TTL that can be set for Atom requests that aren't using conditional GET; the idea is that if requesters cannot be bothered to be polite, we can't be bothered to serve fresh content. Setting this option always caches the results of such requests, even if the load is low, which means that even people doing proper conditional GET requests will use the cached results for as long as their (lower) TTL says to.

It's actually faster to serve static pages from the static page server code than from the BFC, so the BFC doesn't try to cache static pages.

The two sides of the BFC

It's important to understand that the BFC does not check load when it is checking to see if something is in its cache. This means there are two stages to processing a request: deciding what TTL to use for cache checks, and deciding whether to cache something that was not current in the cache.

The TTL used is:

bfc-atom-nocond-ttl if this is an unconditional request for an Atom view, if set.

bfc-atom-ttl for Atom view requests in general, if set.

bfc-cache-ttl otherwise.

Pages enter the BFC cache either because the system seems to be loaded or because bfc-atom-nocond-ttl was set and they were an unconditional request for an Atom view.

Once something is in the cache, it will be served from the cache if it is not older than the check TTL. Different requests can use different check TTLs for the same cached page; for example, conditional GETs versus other requests for Atom views.
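The two stages can be sketched like this. The three directive names come from the text above; everything else (the cfg dictionary, the function names, the TTL values in the usage note) is an assumption made for illustration.

```python
import time

def bfc_check_ttl(cfg, is_atom, is_conditional):
    """Pick the TTL used when checking the cache for this request."""
    if is_atom and not is_conditional and "bfc-atom-nocond-ttl" in cfg:
        return cfg["bfc-atom-nocond-ttl"]
    if is_atom and "bfc-atom-ttl" in cfg:
        return cfg["bfc-atom-ttl"]
    return cfg["bfc-cache-ttl"]

def bfc_should_store(cfg, is_atom, is_conditional, system_loaded):
    """Decide whether a freshly generated page enters the cache."""
    forced = is_atom and not is_conditional and "bfc-atom-nocond-ttl" in cfg
    return system_loaded or forced

def bfc_is_fresh(stored_at, ttl, now=None):
    """A cached page is served if it is not older than the check TTL."""
    now = time.time() if now is None else now
    return (now - stored_at) <= ttl
```

For example, with bfc-cache-ttl at 60, bfc-atom-ttl at 300, and bfc-atom-nocond-ttl at 1800, an Atom page cached 400 seconds ago is still fresh for an unconditional request but stale for a conditional GET.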

The in-memory cache

The in-memory cache is essentially a version of the brute force cache that holds pages in memory instead of on disk. It's only effective in environments where DWiki serves multiple requests from the same process; currently it's only used if DWiki is running as a preforking SCGI server. Because it holds pages in memory as page response objects, the in-memory cache is about the fastest way that DWiki can serve requests. In particular it's faster to serve static pages from the IMC than from disk, so unlike the BFC the IMC does cache static pages.

Because IMC entries disappear automatically and are essentially free to create, the IMC caches pages unconditionally when active (unlike the BFC). This means that it should normally have a relatively low TTL, often lower than the BFC's TTL. Note that because the IMC is before the BFC, it can load its cache from BFC cache hits.

For obvious reasons, it's pointless to set the IMC cache size to be larger than the number of requests a preforked SCGI process will serve before exiting.

To keep IMC memory usage under control, the IMC has a settable maximum page size that it will cache. Tune this as appropriate for your environment.
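To make the moving parts concrete, here is a minimal sketch of a cache with a TTL, a maximum number of entries, and a maximum page size. It is not how DWiki implements the IMC; the class name, the method names, and the oldest-first eviction policy are all assumptions.

```python
import time
from collections import OrderedDict

class InMemoryCache:
    """Toy per-process page cache (hypothetical, for illustration)."""

    def __init__(self, ttl, max_entries, max_page_size):
        self.ttl = ttl
        self.max_entries = max_entries
        self.max_page_size = max_page_size
        self._entries = OrderedDict()   # key -> (stored_at, body)

    def store(self, key, body, now=None):
        """Cache body unconditionally unless it is over the size limit."""
        if len(body) > self.max_page_size:
            return False
        now = time.time() if now is None else now
        self._entries.pop(key, None)
        self._entries[key] = (now, body)
        # Assumed policy: drop the oldest entries once over the limit.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return True

    def fetch(self, key, now=None):
        """Return the cached body, or None if missing or past the TTL."""
        now = time.time() if now is None else now
        hit = self._entries.get(key)
        if hit is None:
            return None
        stored_at, body = hit
        if now - stored_at > self.ttl:
            del self._entries[key]
            return None
        return body
```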

The IMC can be deliberately forced on with imc-force-on, in case you're running DWiki in some other preforking environment (for example as a WSGI application under a preforking WSGI server such as uWSGI).

Considerations for the IMC TTL

Under some setups, DWiki will only be running as a (preforking) SCGI server when it's under heavy load; in others DWiki is running this way all of the time, even when the load is light. Because the IMC unconditionally caches pages the latter situation can be annoying; it means that someone who, say, writes and posts a new comment may not see that comment until the IMC TTL expires. DWiki makes some attempt to bypass the IMC (and the BFC) in the common case of someone leaving a comment. However this is not perfect (in part because it requires the web browser to accept cookies from DWiki).

(This also applies if you're running DWiki in some other preforking environment and have forced the IMC on.)

If you're running DWiki full time in an IMC-on environment, you likely want to set a quite low IMC TTL, such as 15 to 30 seconds. If you're running DWiki with the IMC on only under heavy load you can set a higher IMC TTL, such as two minutes (120 seconds).

The renderer cache

The renderer cache is actually two caches. The renderer cache proper caches the output of various renderers (cf TemplateSyntax). The output is cached with a validator and the cached results are fully validated before they get used; this means that renderer cache entries do not normally use a TTL and in theory could be valid for years.

The (heuristic) generator cache caches the output of some expensive precursor generator routines. These cache entries only have heuristic validators, where DWiki can be fooled if people try hard enough. Generator cache entries do have a TTL, so that if the heuristic is fooled DWiki will pick up the new result sooner or later. Some cache entries can also be explicitly invalidated by DWiki in a pretty reliable process; by default, these have a much longer TTL than plain heuristic cache entries. These are called 'flagged' (heuristic) generator entries, and the various ConfigurationFile settings controlling how they behave are named render-heuristic-flagged-....

(Trivia: the 'flagged' name is because such entries are invalidated using a flag file, or more accurately a flag cache entry.)

Currently the main renderer cache caches the output of various wikitext to HTML rendering routines while the generator cache caches the results of various filesystem 'find all descendants' walks that are used to build lists of comments (for Atom comments feeds and some wikitext macros; this uses explicit invalidation) and lists of pages (for Atom feeds and various blog renderers such as blog::prevnext).

Unfortunately, a DWiki page that has comment or access restrictions must be cached separately for each DWiki user that views it. Under some situations this can result in a number of identical copies being cached under different names. If you want to avoid this, DWiki lets you turn off renderer caching for non-anonymous users.

Force-invalidating list of pages caches

The general validator for blog::prevnext cache entries is the modification time for all of the directories involved that had files in them at the time (the latter condition is for technical reasons). The heuristic validator checks that some of the file timestamps are still the same, but it can't check all of them and still be a useful cache.

So the easy way to invalidate this is to change the modification time of a directory involved, for example with touch.

The 'list of pages' cache is similarly invalidated by changing a directory modification time. Unlike the blog::prevnext case, the directory times are the only thing that this cache checks. This is a bit of a pity but the performance improvements from caching this information are very visible.
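If you would rather do this from a script than with touch, a minimal equivalent in Python looks like the following. The function name is made up and the directory you pass is whatever page directory is involved; nothing here is DWiki-specific.

```python
import os

def touch_directory(path):
    """Set a directory's access and modification times to now,
    much as running touch on it would."""
    os.utime(path, None)
```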

Disk space usage and directories

Much like comments, each page that has something cached for it becomes a subdirectory, with the various cached things in files. The different sorts of caches use different top-level directories under the cachedir, so you have paths like cachedir/bfc, cachedir/renderers, and cachedir/generators.

Because some results include absolute URLs that mention the current hostname, DWiki must maintain separate caches for each Host: header it sees in the BFC and the general renderers cache. These are handled as subdirectories in each cache directory, so cachedir/bfc/localhost/... and so on. Entries in the generator cache don't depend on the current Host: header, so there is only one (sub)cache for all requests, cachedir/generators/all/....
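The layout can be summarised with a small sketch. The helper name is hypothetical; the directory names and the 'all' subcache are the ones described above, and this only builds the per-cache top level (the per-page subdirectories underneath are not modelled).

```python
import posixpath

def cache_subdir(cachedir, kind, host=None):
    """Return the directory a given cache uses under cachedir.
    The BFC and renderers caches are split by Host: header; the
    generator cache has a single 'all' subcache."""
    if kind == "generators":
        return posixpath.join(cachedir, "generators", "all")
    if kind in ("bfc", "renderers"):
        if not host:
            raise ValueError("the %s cache needs a Host: value" % kind)
        return posixpath.join(cachedir, kind, host)
    raise ValueError("unknown cache kind: %r" % kind)
```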

Generally the general renderers cache uses the largest amount of disk space, followed by the BFC, and the generator cache is the smallest.

If you're using caching (and as mentioned, you probably want to), you'll want to periodically trim the caches. ChrisSiebenmann just does this by hand every so often by removing the cache directories entirely; DWiki will then rebuild them as necessary.
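A script version of the 'remove the cache directories entirely' approach might look like this. It is only a sketch: the function name is invented, the list of subdirectories is taken from the paths mentioned above, and you should point it at your own cachedir with appropriate care, since it deletes whole directory trees.

```python
import os
import shutil

def clear_caches(cachedir, kinds=("bfc", "renderers", "generators")):
    """Remove the named top-level cache directories under cachedir.
    Returns the list of kinds that were actually removed."""
    removed = []
    for kind in kinds:
        path = os.path.join(cachedir, kind)
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(kind)
    return removed
```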

Formatting , 2006-03-25 21:01:04 by cks

Ye Olde Formatting Teste

This is, naturally, a test of how all of our formatting actually looks. (It has turned into sort of documenting things, too. You probably want to view the source, using the toolbar at the bottom.)

Lists:

I'm not going to try to explain lists. Once you follow View Source, it should be obvious. The only tricky bit is that a list line that is continued on following lines must be indented; a flush-left line will be taken as returning to the paragraph (or starting one). List demo:

This is an unordered list.

with another entry continued on another line (see View Source).
A nested numbered list.
... and going deeper!

Still nested, but we've switched styles.

And we can abruptly switch styles as we de-nest, as we did here.

And back to the original list entry.

back down to the original unordered list.

And back to the land of happy paragraphs. We've also got the third type of list on hand:

definition lists

... which may come in handy when I get around to writing up detailed documentation on this thing.

this thing being

DWiki. Configuration, TemplateSyntax, operation, etc.

Definition lists don't come out quite as they're written in ASCII, but it's closer than some of the other choices. Definition lists nest with more leading characters, like the other lists.

Nested Lists

The primary way of writing nested lists is to actually indent the nested lists in the wikitext, as you can see in the 'View Source' for this page. Sometimes this is awkward; in that case, you can use more than one of the list-start characters, like this:

A list.
Another nested list with another line.
Really nested.

List types can change.

And we're done.

Note that if you continue such a nested list on a new line, the new line's indentation must be deeper than the start characters for the list.

Tables:

	left	right
up	10	20
down	30	40

Tables are extremely low-rent. Chris figures that this pretty much matches what he wants.

The downside is that more complicated tables may render, how shall we say, a little less than optimally. You're best off sticking to tables that have something in all of the cells and that are always the same shape.

'Horizontal' tables, where the only border lines are horizontal and fainter, are created by starting a table with |_. instead of |:

Code	Meaning	Look at body?
200	Successful page fetch	Yes (if GET)
301	(Permanent) redirect	No
304	Not modified	No
404	Access denied (don't retry)	Only for error text

FIXME: I need to do more work on styling tables well. At the moment they are barely better than just sticking ASCII blocks in. I can steal ideas from other WikiText implementations.

Links:

Link formats:

Straight URL as text: http://www.google.com/

Explicitly marked wiki links: Chris. For in-wiki links, the name shown is shortened to the last component. These can also be external http:// links, or absolute local URLs if they're written with < and > around them, eg [[</>]].

Marked links with specific text: the crazed person behind this. Because I keep using || instead of just |, you can use either to separate the label and the link.

Marked links with space-separated words: Chris Siebenmann. The last word is taken as the link destination.

CamelCase names as wiki links: People/ChrisSiebenmann. CamelCase words are only links if the target page exists.

The preferred link format for internal wiki links is the explicit wiki link [[....]], because that allows a wider variety of useful names than CamelCase. (eg, I do not want to have to CamelCase the names of all of the machines I want to write about.)

Interpreting wiki links

The DWiki path '/' is the wiki root, in an analogy to Unix and URI roots. An absolute wiki link starts with / and always refers to that absolute DWiki page.

[[...]] links are considered relative by default (and can include '..' and so on as desired), except that if there's no page by the relative name and there is a page if we consider it an absolute link, DWiki does so. (This keeps me from having to write / at the start of all my absolute links in [[...]]'s.)

CamelCase links are considered absolute by default, but if the absolute version isn't found and a relative version is, that gets used. If neither is found, DWiki tries an alias directory if that's configured, and if that fails the CamelCase is not a link.

Thus all of the CamelCase DWiki's in this paragraph actually refer to the root /DWiki. The wiki link DWiki refers to the current directory one, /dwiki/DWiki. (This is unfortunately obscured by DWiki's new habit of rewriting CamelCase links that point to redirects to the redirection target, but trust Chris, this is what's actually happening.)

[Actually these days DWiki retargets all links that point to redirects, which may or may not be the right thing to do but does make it much harder to see this.]
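The two lookup orders can be sketched as follows. This is an illustration of the rules as described, not DWiki's code; the function names and the page_exists callback are hypothetical, and redirect retargeting is ignored.

```python
import posixpath

def _absolute(name):
    return posixpath.normpath(posixpath.join("/", name))

def _relative(curdir, name):
    return posixpath.normpath(posixpath.join("/", curdir, name))

def resolve_bracket_link(name, curdir, page_exists):
    """[[...]] links: relative by default, but absolute if only the
    absolute page exists."""
    if name.startswith("/"):
        return _absolute(name)
    relative, absolute = _relative(curdir, name), _absolute(name)
    if not page_exists(relative) and page_exists(absolute):
        return absolute
    return relative

def resolve_camelcase_link(name, curdir, page_exists, alias_dir=None):
    """CamelCase links: absolute, then relative, then the alias
    directory if configured. None means 'not a link'."""
    candidates = [_absolute(name), _relative(curdir, name)]
    if alias_dir is not None:
        candidates.append(_relative(alias_dir, name))
    for candidate in candidates:
        if page_exists(candidate):
            return candidate
    return None
```

With pages /DWiki and /dwiki/DWiki both present and the current directory being dwiki, the CamelCase form resolves to /DWiki and the [[DWiki]] form resolves to /dwiki/DWiki, matching the example above.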

Link abbreviations

Every time you give a [[...]] link both text and a link (with either [[...|...]] or [[... ... ...]]), DWiki remembers the pairing of the text and the link. Afterwards, you can use either as a link abbreviation; it will expand to the full pair.

The (almost) unambiguous form is to use | at the start or the end of the [[...]]: at the end to use the name of the link, at the start to use the URL of the link (whichever is shorter). Eg, Chris Siebenmann.

You can write [[<text>]] without the |. This is always taken to be a name abbreviation, and only if there are spaces in <text> or <text> isn't an absolute URL (http:// or with < and > around it) or a real DWiki page.

Text formatting:

Running text (in paragraphs, lists, tables, and in general all containers) is styled with fonts, links, macros, and magic line breaks.

A ' \\' (space backslash backslash) at the end of a line, and only at the end of a line, produces a <br/>.

Font styles:

Bold, typewriter text, and italic. Note that if we don't close one, like say ~~bold here, it dies at the end of the paragraph.

No stray formatting putting 2/3rds of your text into italic, nosirree. I like my formatting self-contained.

There is one other font style: code style, which produces things like 'char *dp_null;'. Code style is monospace with no further font interpretation, and is done by a ((...)) construct. It exists because ChrisSiebenmann keeps doing it by other, hackier means.

I could have used / for italics, but one major usage of dwiki is going to be documenting our Unix systems. When doing this I will be writing a lot more file paths than italics. Similarly, monospace gets used more often than italics (or underline).

NOTE: the font styles are applied with heuristics. See DWikiText for the full details.

Macro font styles

The ST macro is written {{ST:<style>:text ...}}, and formats the enclosed text in the given HTML font style, which must be one of big, del, ins, small, strike, sub, sup, or u.

The C macro is used to insert a HTML character entity as either a decimal number, a hexadecimal number starting with x, or a named character entity from the list in CharacterEntities. Note that not all of them are sensible entities, and some of the more exotic of these may not render in the browser of your choice, although all of them are valid HTML 4.01 transitional.

Some examples: И, the Cyrillic capital letter "I"; 水, the Chinese character for water. Certain sorts of cuteness are ruthlessly exterminated, like {{C:funky}}, {{C:10}} or {{C:x1F}}.
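A sketch of the kind of checking involved, written against the modern Python standard library. It is not the real C macro: the function name is invented, and the exact rejection rule (anything below U+0020 or beyond U+10FFFF) is an assumption that merely happens to cover the rejected examples above.

```python
import re
from html.entities import name2codepoint

def char_entity(arg):
    """Turn a C macro argument into an HTML character reference,
    or return None if the argument is not acceptable."""
    if arg in name2codepoint:
        return "&%s;" % arg                 # named entity
    if re.fullmatch(r"[0-9]+", arg):
        code = int(arg, 10)                 # decimal number
    elif re.fullmatch(r"x[0-9A-Fa-f]+", arg):
        code = int(arg[1:], 16)             # hexadecimal, leading x
    else:
        return None                         # unknown name
    # Assumed rule: no control characters, nothing beyond Unicode.
    if code < 0x20 or code > 0x10FFFF:
        return None
    return "&#%d;" % code
```

Under these assumptions 'x418' and '27700' are accepted (the Cyrillic and Chinese examples above), while 'funky', '10', and 'x1F' all come back as None.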

Having numeric character entities be valid in your DWiki's chosen character set is up to you. (Of course, the only really sensible character set these days is utf-8.)

HTML <abbr> elements are written {{AB:<abbreviation>[:text ...]}}. Once an abbreviation has been used once its expansion is remembered, so you can write API once and then thereafter use just {{AB:API}} to get API. An abbreviation without an available expansion is considered an invalid macro, so that you notice.

(AB torture test: SWD.)

Unlike most macros, these can be used in comments.

Others:

A line of dashes will produce a horizontal cut, like:

this. You need at least four. These can come pretty much anywhere.

If you don't like really big horizontal lines, there's also the three-stars separator style, like so:

* * *

This is written as '* * *' without line indentation (although you can put more whitespace between the stars if you want).

Indents produce code:
Like so.
This is literal preformatted text and is going to stay that way.
(I suppose you can do ASCII tables if you're so inclined.)
Notice that that was all one <pre> block. Also notice that that HTML markup was quoted, just like this '&' will be.

You need at least one whitespace character on the line. More than one whitespace character produces real in-<pre> whitespace out of the rest, like so:
Left.
 Indented one more space, with & and <pre> thrown in as a bonus.
Back left.
Quoting things

I don't quite know what to call this, but you can quote things just like you would in email: put '> ' at the start of the quote lines.

Like so.
This is a new quoted paragraph.

Quotes nest, too.

You can put anything in a quote that you could put in normal text, and it will all work out right. For example:

lists.

and everything nests.

Even if you go back one level.

Quotes disappear when you stop putting the quotes in. Despite what the semantic markup people may tell you, feel free to use quotes to produce indentation if it works for you.

Macros

{{...}} is a macro. Macros are used to do special magic expansion. Macros can take parameters, separated with :'s. Available macros currently are:

AB: Generate an inline HTML <abbr> element. The first argument is the abbreviation and the following arguments are the expansion. Once the abbreviation has been used once, the expansion is optional.

AllPages: List all pages. Arguments are prefixes of page paths and page names to restrict the list to. If you simply want to list all pages under a particular directory, you should use AllPagesUnder instead; it is more efficient (and more aesthetic).

AllPagesUnder: List all pages under a particular directory, in alphabetical order. Page names are shown relative to this directory (eg 'fred' instead of 'blog/fred' if blog is the directory). If there is no argument, the directory is the current directory of this page; if there is a single argument, it is the directory.

C: Insert a character entity. The character entity may be given as a decimal number or as a HTML 4.01 character entity name. See the ShowCharEnts macro for how to display the list of known character entity names.

CanComment: Allow authenticated users to comment on a page. Arguments are users to allow or deny access to, as with the Restricted macro. A DWiki without authentication disallows comments, as no one is authenticated.

CutShort: Cut off rendering a page right at that point in some contexts. Optional arguments restrict this effect to the specified view(s); it's generally not useful to do this, but if you want to, some values are blog, blogdir, or atom. Rendering as a full page can never be cut off. An important note: CutShort is not really a macro. You must put it at the start of a non-indented, non-nested line; it doesn't work anywhere else in text.

DocAll: Enumerate all of the first argument (must be 'macros', 'processnotes', 'renderers', or 'textmacros') with their documentation, if any, as a real HTML list. (In other words, you're reading its output.)

EnumerateAll: Enumerate all of the first argument, which must be 'macros', 'processnotes', 'renderers', or 'textmacros', as a comma-separated list. The short form version of DocAll.

IMG: Generate an inline image. Usage is {{IMG:<loc> width height alt text ...}}, where the height should normally be 'auto', which sets the image so that it will automatically scale down for the browser width in modern CSS environments. If the location is not absolute (http:, https:, or starts with a /) it is taken as a location relative to the DWiki staticurl directory. The location cannot include spaces; % encode them if necessary. After the first time you use an image, specifying the width, height, and alt text is optional; if not specified, they default to the last values. If the alt text contains ' ||| ', it is split there to be alt text (before) and title text (afterwards). The title text is what browsers show when you hover over the image.

ListDir: List what's in the current directory. An argument restricts it to either files ('files') or subdirectories ('directory').

ListRefs: List pages with references to one of the arguments, or where one of the arguments is a word in the page name. This is an expensive operation in a DWiki of any decent size, since it must search through all pages.

MatchingPagesUnder: List all pages under a particular directory if their page name contains a word in the provided word list. Usage is {{MatchingPagesUnder:directory:match ....:option ...:exclude ...}}; the third and fourth arguments are optional and may be blank or omitted. Page names are shown relative to the directory (eg 'fred' instead of 'blog/fred' if blog is the directory). By default, pages are shown in reverse chronological order. The optional third argument sets various options, and may include 'c[hron]' for the default reverse chronological order, '+c[hron]' for forward chronological order, 'a[lpha]' for alphabetical order based on the page name, 't[able]' for a table display similar to what TitleIndex shows, and 'u[tilstoo]' to include utility pages in the results. Without the 't' option, this may be used with PTitles or Striped, although use with PTitles doesn't change the alphabetical sort to use the titles; it is still based on page names.

PTitles: Make another macro generate lists of pages using the titles of the pages (if possible), instead of the names of the pages. Invoked as {{PTitles:<macro>[:arg:arg...]}}.

RecentChanges: List recently changed pages. First argument is how many to cut the list at, default 50; 0 means no limit, showing everything. Additional arguments are which directories to include or (with a dash at the start) to exclude from the list; you can use '.' to mean the current directory. To preserve the default limit, use a value for the first argument that is not a number. If we're Striped, list pages under their name not their full path.

RecentCommentedPages: List pages with recent comments. Arguments are the same as for RecentChanges.

RecentComments: List recent comments. Arguments are the same as for RecentChanges. Use with Striped is somewhat dubious.

Restricted: Restrict a page to authenticated users. Arguments are which users or groups to allow access to or, with a dash at the front, to deny access to. If both allow and deny arguments are given, the viewing user must pass both tests. Restricted has no effect if the DWiki has no authentication configured.

ST: Style text with a particular HTML font style. The first argument is the HTML font style; the remainder are the text to be in that style. Valid styles are big, del, ins, small, strike, sub, sup, and u.

ShowCfgVar: Insert the value of a DWiki configuration variable. The argument is which variable to insert. Only a few variables may be displayed, currently wikiname, wikititle, server-name, pagedir, and charset.

ShowCharEnts: Show all the known character entities accepted by the C macro as a real HTML list. Takes no arguments.

Striped: Make another macro generate lists of pages as a comma-separated line, instead of the real list it would normally use. Invoked as {{Striped:<macro>[:arg:arg...]}}.

TitleIndex: Insert a table of dates and entry titles (or relative paths for entries without titles), linking to entries and to the day pages. The table is in reverse chronological order. The single argument is the page hierarchy to do this for; if it is not specified, it is done for the current directory. The actual rendering is done by the blog::titleindex renderer.

This list is generated by the same code that finds macros when turning DWikiText into HTML, so it's guaranteed to be complete. The documentation is hopefully complete, but ChrisSiebenmann may have forgotten to update (or provide) it when he changed the code.

Macros that generate lists of pages generate them as links to the pages in question, which is what you want.

Escaping things:

You can put a ! in front of http://foobar, [[..]], or {{..}} to escape their special meaning. Technically this just escapes the meaning of the special leadin, leaving everything else to get styled stylishly.

If you write [[<text>|]], ie you supply no link name/URL, the text is produced un-DWikiText-ified. (This is different from the case where there is a link, in which the text will be DWikiTexted for fonts (but not links or macros).) This is the only general way to escape font styling (as ((...)) is not exactly general).

Testing: Google. Yep, that text is styled.

Pragmas

Pragmas have to be the very first line in the page. There are two:

#pragma pre (or #pragma plaintext) forces the rest of the page to be treated as plaintext, not wikitext.

#pragma search DIR1 [...] adds any listed directories to where DWiki searches for relative links, after all of the hard-coded searches.

The search pragma is handy when drafting pages somewhere other than their final directory.

And that's all folks

At least until ChrisSiebenmann starts adding more.

Disclaimer: not entirely guaranteed to be complete and comprehensive. See wikirend.py in the source code.

CharacterEntities , 2006-02-25 02:17:06 by cks

Character entity names recognized by DWiki

This is the automatically generated big list of character entities recognized by (this) DWiki in the C macro. For the meanings and more precise specifications of these, see the Character entity references in HTML 4 from the HTML 4.01 specification.

Note that not everything in this list may show as a character in your browser, or in other people's browsers. Also, this list is pretty much the full HTML 4.01 list. Older versions of HTML don't define all of these entities; the Wikipedia page is the best reference for what entities are defined in what version of HTML. (Since DWiki doesn't know what HTML version you're labelling its output as, it can't restrict the entities C will accept.)

The character entity list, with names and how they show:

Aacute: Á

aacute: á

Acirc: Â

acirc: â

acute: ´

AElig: Æ

aelig: æ

Agrave: À

agrave: à

alefsym: ℵ

Alpha: Α

alpha: α

amp: &

and: ∧

ang: ∠

Aring: Å

aring: å

asymp: ≈

Atilde: Ã

atilde: ã

Auml: Ä

auml: ä

bdquo: „

Beta: Β

beta: β

brvbar: ¦

bull: •

cap: ∩

Ccedil: Ç

ccedil: ç

cedil: ¸

cent: ¢

Chi: Χ

chi: χ

circ: ˆ

clubs: ♣

cong: ≅

copy: ©

crarr: ↵

cup: ∪

curren: ¤

Dagger: ‡

dagger: †

dArr: ⇓

darr: ↓

deg: °

Delta: Δ

delta: δ

diams: ♦

divide: ÷

Eacute: É

eacute: é

Ecirc: Ê

ecirc: ê

Egrave: È

egrave: è

empty: ∅

emsp:

ensp:

Epsilon: Ε

epsilon: ε

equiv: ≡

Eta: Η

eta: η

ETH: Ð

eth: ð

Euml: Ë

euml: ë

euro: €

exist: ∃

fnof: ƒ

forall: ∀

frac12: ½

frac14: ¼

frac34: ¾

frasl: ⁄

Gamma: Γ

gamma: γ

ge: ≥

gt: >

hArr: ⇔

harr: ↔

hearts: ♥

hellip: …

Iacute: Í

iacute: í

Icirc: Î

icirc: î

iexcl: ¡

Igrave: Ì

igrave: ì

image: ℑ

infin: ∞

int: ∫

Iota: Ι

iota: ι

iquest: ¿

isin: ∈

Iuml: Ï

iuml: ï

Kappa: Κ

kappa: κ

Lambda: Λ

lambda: λ

lang: ⟨

laquo: «

lArr: ⇐

larr: ←

lceil: ⌈

ldquo: “

le: ≤

lfloor: ⌊

lowast: ∗

loz: ◊

lrm: ‎

lsaquo: ‹

lsquo: ‘

lt: <

macr: ¯

mdash: —

micro: µ

middot: ·

minus: −

Mu: Μ

mu: μ

nabla: ∇

nbsp:

ndash: –

ne: ≠

ni: ∋

not: ¬

notin: ∉

nsub: ⊄

Ntilde: Ñ

ntilde: ñ

Nu: Ν

nu: ν

Oacute: Ó

oacute: ó

Ocirc: Ô

ocirc: ô

OElig: Œ

oelig: œ

Ograve: Ò

ograve: ò

oline: ‾

Omega: Ω

omega: ω

Omicron: Ο

omicron: ο

oplus: ⊕

or: ∨

ordf: ª

ordm: º

Oslash: Ø

oslash: ø

Otilde: Õ

otilde: õ

otimes: ⊗

Ouml: Ö

ouml: ö

para: ¶

part: ∂

permil: ‰

perp: ⊥

Phi: Φ

phi: φ

Pi: Π

pi: π

piv: ϖ

plusmn: ±

pound: £

Prime: ″

prime: ′

prod: ∏

prop: ∝

Psi: Ψ

psi: ψ

quot: "

radic: √

rang: ⟩

raquo: »

rArr: ⇒

rarr: →

rceil: ⌉

rdquo: ”

real: ℜ

reg: ®

rfloor: ⌋

Rho: Ρ

rho: ρ

rlm: ‏

rsaquo: ›

rsquo: ’

sbquo: ‚

Scaron: Š

scaron: š

sdot: ⋅

sect: §

shy:

Sigma: Σ

sigma: σ

sigmaf: ς

sim: ∼

spades: ♠

sub: ⊂

sube: ⊆

sum: ∑

sup: ⊃

sup1: ¹

sup2: ²

sup3: ³

supe: ⊇

szlig: ß

Tau: Τ

tau: τ

there4: ∴

Theta: Θ

theta: θ

thetasym: ϑ

thinsp:

THORN: Þ

thorn: þ

tilde: ˜

times: ×

trade: ™

Uacute: Ú

uacute: ú

uArr: ⇑

uarr: ↑

Ucirc: Û

ucirc: û

Ugrave: Ù

ugrave: ù

uml: ¨

upsih: ϒ

Upsilon: Υ

upsilon: υ

Uuml: Ü

uuml: ü

weierp: ℘

Xi: Ξ

xi: ξ

Yacute: Ý

yacute: ý

yen: ¥

Yuml: Ÿ

yuml: ÿ

Zeta: Ζ

zeta: ζ

zwj: ‍

zwnj: ‌

Some tedious but possibly important detail: DWiki takes its list of known character entities from the Python htmlentitydefs module.
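
As an illustration only (this is not DWiki's own generator), a list in the same 'name: character' shape can be rebuilt from the standard library. The sketch assumes Python 3, where the table from the Python 2 htmlentitydefs module lives in html.entities as name2codepoint. The function name entity_list and the sort order are assumptions made for this example.

```python
from html.entities import name2codepoint


def entity_list():
    """Return 'name: character' lines for every known entity name.

    The sort is case-insensitive with the uppercase form first, which
    is an assumption chosen to resemble the ordering of the list above.
    """
    names = sorted(name2codepoint, key=lambda n: (n.lower(), n))
    return ["%s: %s" % (name, chr(name2codepoint[name])) for name in names]


if __name__ == "__main__":
    for line in entity_list():
        print(line)
```

Some entries (nbsp, shy, zwj and friends) print as whitespace or as nothing visible, just as they do in the list above.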

TemplateSyntax , 2005-09-27 16:59:21 by cks

What Templates Can Contain

Templates are literal text except for four magic template expansions (call them substitutions or macros if you want): ${...}, #{...}, @{...}, and %{...}.

Generally, it is a fatal error for any of the expansions not to work: undefined variables, missing templates, no renderer by the name you listed in the template, etc.

${...} inserts the value of the named global variable. There are three modifiers to variable expansion:

${|var1|var2|...} is alternatives: it inserts the value of the first of var1, var2, etc that are defined.

${?...} is error-free expansion: it makes it not an error for the rest of the expansion to use undefined variables; instead an empty result is inserted. ${?|...|..} works.

${!...} is cancelling expansion: if the variable or variable sequence isn't defined, the whole template produces nothing.

A ? or ! modifier must come before a | modifier.

Variable expansion always produces valid HTML-quoted results.
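
The rules above can be modelled in a few lines of Python. This is a minimal sketch, not DWiki's implementation: the names expand_variable and CancelTemplate are hypothetical, the context is assumed to be a plain dictionary, and 'defined' is assumed to mean 'present in that dictionary'.

```python
import html


class CancelTemplate(Exception):
    """Hypothetical signal: a ${!...} failed, so the template is empty."""


def expand_variable(spec, context):
    """Expand the inside of a ${...}; spec excludes the '${' and '}'."""
    mode = ""
    # A '?' or '!' modifier has to come before any '|' modifier.
    if spec[:1] in ("?", "!"):
        mode, spec = spec[0], spec[1:]
    # A leading '|' introduces alternatives; otherwise there is one name.
    names = spec[1:].split("|") if spec.startswith("|") else [spec]
    for name in names:
        if name in context:
            # Variable expansion is always HTML-quoted.
            return html.escape(str(context[name]))
    if mode == "?":
        return ""
    if mode == "!":
        raise CancelTemplate(spec)
    raise KeyError("undefined template variable: " + spec)
```

With a context of {"wikiname": "TestWiki"}, expanding "|wikititle|wikiname" falls through to wikiname, "?wikititle" yields an empty string, and a bare "wikititle" is an error.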

@{...} invokes the named renderer and inserts its output. That's it; renderers take no arguments (or guff).

%{...} invokes the named renderer and inserts its output, except that if the renderer produces no output the entire template will produce no output. Thus a template consisting of
Last modified: %{lastmodified} <br/>
is entirely empty if the lastmodified renderer produces nothing, instead of being 'Last modified:' and a line break (which would look ugly).

#{...} is template inclusion: the named template is recursively expanded. Template names are just file names for files under the template root directory (set in DWiki's configuration file). There are three variations:

#{|t1|t2|...} is alternative expansion: it inserts the first of t1, t2, etc that expanded to something non-blank.

#{?t1|t2|...} is conditional expansion: it only expands the additional templates if t1 expanded to something.

#{<t1[|t2|...]} and #{!t1[|t2|...]} are first found expansion, which requires a longer explanation.

First Found Expansion

First found expansion is a way of testing a number of possibly existing templates and using the first one that actually exists. With the #{!...} form it is a fatal error for no template to be found; with the #{<...} form it is not, and the whole expansion is just empty.

Each of the t1, t2, etc alternatives is a path, augmented with expansion operators. There are two operators:

$(<varname>) expands a global variable, like ${...} at the template level. (Unlike ${...}, the variable expansion is not HTML-quoted.)

...<rest> first tries the full path <rest>, and then tries backing up to each of <rest>'s parent directories until they run out. (That's a literal three dots at the start.)

An example may help. With a $(pagename) of dwiki/TemplateSyntax and a $(view-format) of normal, the template inclusion
#{<Overrides/...$(pagename)/magic.tmpl|default/$(view-format).tmpl}
would first try Overrides/dwiki/TemplateSyntax/magic.tmpl, then Overrides/dwiki/magic.tmpl, then Overrides/magic.tmpl, and finally default/normal.tmpl.
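
The search order in that example can be reproduced with a short Python sketch. It is written only from the description above, not from DWiki's source; candidate_paths and first_found are hypothetical names, and exists stands in for whatever check DWiki really makes against the template directory.

```python
import re


def candidate_paths(alternative, context):
    """Return the template paths one alternative would try, in order."""
    # $(var) expansion; the value is inserted without HTML quoting.
    path = re.sub(r"\$\(([^)]+)\)",
                  lambda m: str(context[m.group(1)]), alternative)
    prefix, dots, rest = path.partition("...")
    if not dots:
        return [path]
    parts = rest.split("/")
    dirs, leaf = parts[:-1], parts[-1]
    # Try the full path first, then drop one trailing directory at a time.
    return [prefix + "/".join(dirs[:depth] + [leaf])
            for depth in range(len(dirs), -1, -1)]


def first_found(spec, context, exists):
    """Return the first candidate for which exists(path) is true, or None."""
    for alternative in spec.split("|"):
        for path in candidate_paths(alternative, context):
            if exists(path):
                return path
    return None
```

In this model a None result is where the two forms differ: #{!...} would treat it as a fatal error, while #{<...} would expand to nothing.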

An example:

This DWiki uses %{..} and #{|t1|t2} expansion to produce a nice message about a directory being entirely empty of pages if it is, instead of 'The following pages are available in this directory:' followed by nothing at all. (You can see this at Tests/SubTestDir.)

A simplified version of the template for directories is:
#{structure/header.tmpl}
<h1> Directory ${pagename} </h1>
#{|dir/dirconts.tmpl|dir/dirempty.tmpl}
#{structure/footer.tmpl}
The dir/dirconts.tmpl template is:
<p>The following pages are available in this directory: %{listdir}</p>
while the dir/dirempty.tmpl template is:
<p> This directory is empty. </p>
The %{listdir} in dirconts.tmpl makes the entire template empty if the listdir renderer returns nothing (ie, the directory is empty). Then the #{|..|..} sees that the first template is empty and goes on to use dirempty.tmpl. If there are files in the directory, the dirconts.tmpl template has content and dirempty.tmpl does not get used.
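
The interaction between the two expansions can be modelled in Python. This is a toy sketch, not DWiki code: render_percent and first_nonblank are hypothetical names, and treating whitespace-only text as 'blank' in the #{|...} case is an assumption.

```python
def render_percent(prefix, renderer_output, suffix):
    """Model a template holding one %{...}: empty if the renderer is."""
    if not renderer_output:
        return ""
    return prefix + renderer_output + suffix


def first_nonblank(*expansions):
    """Model #{|t1|t2|...}: the first expansion that is not blank."""
    for text in expansions:
        if text.strip():  # assumption: whitespace-only counts as blank
            return text
    return ""


def directory_body(listing):
    """Combine the two templates the way the directory template does."""
    dirconts = render_percent(
        "<p>The following pages are available in this directory: ",
        listing, "</p>")
    dirempty = "<p> This directory is empty. </p>"
    return first_nonblank(dirconts, dirempty)
```

Calling directory_body("") gives the 'empty' message, while any non-empty listing gives the 'following pages' paragraph.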

Available renderers

For convenience (mostly ChrisSiebenmann's), here is the canonical list of all available renderers. This is generated by the code itself, so it is guaranteed to be 100% accurate (at least for names; your mileage may vary for documentation):

anchor::comment: Generate an anchor start for the current comment. You must close the anchor by hand.

anchor::self: Generates an anchor start where the name is the full path to the current page. You must close the anchor by hand.

anchor::short: Generates an anchor start where the name is the name of the current page. You must close the anchor by hand.

atom::autodisc: Generate a suitable Atom feed autodiscovery <link> string, suitable for inclusion in the <head> section. Generates nothing if there is no Atom recent changes feed.

atom::comment: Display the current comment in a way suitable for inclusion in an Atom feed.

atom::commentfeed: Generate a link to the Atom comments feed for the current page, if comments are turned on.

atom::commentid: Generate a hopefully unique ID for the current comment.

atom::comments: Generate an Atom feed of recent comments on or below the current page. Each comment is rendered through syndication/atomcomment.tmpl. Supports VirtualDirectory restrictions, which limit which pages the feed will include comments for.

atom::commentstamp: Generate an Atom feed format timestamp for the current comment.

atom::commenturl: Generate the URL for the current comment.

atom::dirfeed: Generate a link to the Atom feed for the current page if the current page is a directory or the wiki root.

atom::feeds: Generate a comma-separated list of all Atom feed links that are applicable for the current page.

atom::feedtitle: Generate an Atom feed title for the current page.

atom::feedurl: Generate the URL of this page for the current feed.

atom::modstamp: Generate an Atom timestamp for the current page based on its change time.

atom::now: Generate an Atom timestamp for right now.

atom::pages: Generate an Atom feed of the current directory and all its descendants (showing only the most recent so many entries, newest first). Each page is rendered through syndication/atomentry.tmpl, which should result in a valid Atom feed entry. Supports VirtualDirectory restrictions.

atom::pagetag: Generate an Atom entry ID. If the atomfeed-tag configuration option is not defined, this is the same as atom::pageurl. If atomfeed-tag is defined, the entry ID is <tag value>:/<page path>. If atomfeed-tag-time is defined, only pages from after this time are given tag-based IDs; for pages before then, this is the same as atom::pageurl.

atom::pageterse: Generate wikitext:terse run through a HTML entity quoter, thus suitable for use in Atom feeds.

atom::pageterse:notitle: Generate wikitext:terse:notitle run through a HTML entity quoter, thus suitable for use in Atom feeds.

atom::pageurl: Generate the URL of this page in its normal view.

atom::recentcomment: Generate an Atom format timestamp for the most recent comment that will be displayed in a comment syndication feed.

atom::recentpage: Generate an Atom format timestamp for an Atom page feed for the current directory (and all its descendants).

atom::timestamp: Generate an Atom timestamp for the current page.

auth::loginbox: Generate the form for a login or logout box. Generates nothing if DWiki authentication is disabled. As a side effect, kills page modification time if it generates anything.

blog::blog: Generate a Blog rendering of the current directory: all descendant real pages, from most recent to oldest, possibly truncated at a day boundary if there's 'too many', and sets up information for blog navigation renderers. Each displayed page is rendered with the blog/blogentry.tmpl template. Supports VirtualDirectory restrictions.

blog::blogdir: Generate a BlogDir rendering of the current directory: display all real pages in the current directory from most recent to oldest, rendering each with the template blog/blogdirpage.tmpl. Supports VirtualDirectory restrictions.

blog::date: Generates a YYYY-MM-DD timestamp of the current page.

blog::datecrumbs: Create date breadcrumbs for the blog directory if the current page is in a blog directory but is not being displayed inside a virtual directory. The 'blog directory' is the directory that made the blog view the default view.

blog::datemarker: Inside a blog::blog or blog::blogdir rendering, generate a YYYY-MM-DD date stamp for the current page if this has changed from the last page; otherwise, generates nothing.

blog::namedate: Generate a Month DD, YYYY timestamp of the current page.

blog::next:title: Create a link to the next page (if one exists) for the current page if the current page is in a blog directory but is not being displayed inside a virtual directory; the title of the link is the page's title if available. The 'blog directory' is the directory that made the blog view the default view.

blog::owner: Display the owner of the current page.

blog::prev:title: Create a link to the previous page (if one exists) for the current page if the current page is in a blog directory but is not being displayed inside a virtual directory; the title of the link is the page's title if available. The 'blog directory' is the directory that made the blog view the default view.

blog::prevnext: Create Previous and Next links for the current page if the current page is in a blog directory but is not being displayed inside a virtual directory. The 'blog directory' is the directory that made the blog view the default view.

blog::seemonthyear: With blog::blog, generate a 'see more' set of links for the month and the year of the next entry if the display of pages has been truncated.

blog::seemore: With blog::blog, generates a 'see more' link to the date of the next entry if the display of pages has been truncated; the text of the link is the target date. This renderer is somewhat misnamed.

blog::time: Generate a YYYY-MM-DD HH:MM:SS timestamp of the current page.

blog::timeofday: Generates a HH:MM:SS timestamp of the current page.

blog::titles: Like blog::blog, except that instead of rendering entries through a template, it just displays a table of dates and entry titles (or relative paths for entries without titles), linking to entries and to the day pages. Respects VirtualDirectory restrictions. Unlike blog::blog, it always displays information for all applicable entries.

breadcrumbs: Display a 'breadcrumbs' hierarchy of links from the DWiki root to the current page.

comment::atomlink: Just like comment::countlink, except that the URL is absolute and the HTML is escaped so that it can be used in an Atom syndication feed.

comment::author: Display the author information for a comment, drawing on the given name, website URL, DWiki login, and comment IP address as necessary and available. Only works inside comment::showall. This potentially generates HTML, not just plain text.

comment::comment: Display a particular comment. Only works inside comment::showall.

comment::count: Display a count of comments for the current page.

comment::countlink: Display the count of comments as a link to show them for the current page.

comment::date: Display the date of a comment. Only works inside comment::showall.

comment::form: Create the form for writing a new comment in, if the page is commentable by the current user.

comment::pre: In a comment-writing context, generate a <pre> block of the comment being written.

comment::preview: In a comment-writing context, show a preview of the comment being written.

comment::showall: Display all of the comments for the current page (if any), using the template comment/comment.tmpl for each in succession.

comment::user: Display the user who wrote a comment if it isn't the default DWiki user. Only works inside comment::showall.

comment::write: Generate a link to start writing comments on the current page, if the current user can comment on the page.

cond::anonymous: Succeeds (by generating a space) if this is an anonymous request, one with no logged in real user. Fails otherwise.

cond::blogclipped: Succeeds (by generating a space) if we are in a blog view that is clipped. Fails otherwise.

cond::blogyearmonth: Succeeds (by generating a space) if we are a directory, in a blog view, and we are in a month or year VirtualDirectory. Fails otherwise.

cond::invirtual: Succeeds (by generating a space) if we are in a VirtualDirectory (either directly or during rendering of a subpage). Fails otherwise.

cond::notblogroot: Succeeds (by generating a space) if we are a directory that is in a default blog view but is not the directory that made it the default view. Fails otherwise.

cond::pageinblog: Succeeds (by rendering a space) if the current page is in a blog directory but is not being displayed inside a virtual directory (ie the page itself is being displayed). This also excludes 'utility' pages.

cond::realuser: Succeeds (by generating a space) if this is a request made by a logged-in real user. Fails otherwise. This is the opposite of cond::anonymous.

dir::altviews: Generate a list of links to acceptable alternate ways to view the page if it is a directory.

error::body: Generates the body for an error page from a template in errors/, if the template exists; otherwise uses a default. Only usable during generation of an error page.

error::title: Generate the title for an error from a template in errors/, if the template exists; otherwise uses a default. Only usable during generation of an error page.

hist::dirty: If the current page has been RCS-locked, display whether or not it has been modified from the version in RCS.

hist::lockedby: If the current page is under RCS and is locked, display who has locked it.

hist::revtable: If the current page is under RCS, display a version history table.

inject::blogreadme: Like inject::readme, except it looks for __readme only in the 'blog directory', the directory that made the blog view the default view. If there is no such directory between the current directory and the DWiki root directory, this does nothing.

inject::index: Insert the wikitext file __index in HTML form, if such a file exists in the current directory.

inject::readme: Insert the wikitext file __readme, in HTML form, if such a file exists in the current directory.

inject::upreadme: Like inject::readme, except it searches for __readme all the way back to the DWiki root directory, not just in the current directory.

lastchangetime: Display the page's last change time, if it has one. The change time is taken from the inode ctime.

lastmodified: Display the page's last modification time, if it has one. (This is not the same as the last-modified time that the HTTP response will have, which is taken from all of the pieces that contribute to displaying the page, including all templates.)

linkhistory: Generate a link to this page's history called 'View History', if it has any.

linknormal: Generate a link to this page's normal view called 'View Normal' if it is a file and we are not displaying it in normal view.

linkrelname: Inside blog::blog, generate a link to this page titled with the page's path relative to the blog::blog page. Outside that context, the same as linktoself.

linkshort: A link to this page, titled with the page's name.

linkshortnormal: A link to this page in the normal view, titled with the page's name.

linksource: Generate a link to this page's source called 'View Source', if it has any and you can see it.

linktocomments: Create a link to this page that will show comments (if any). Otherwise the same as linktonormal.

linktonormal: A link to this page in the normal view, titled with the full page path.

linktoself: A link to this page, titled with the full page path.

listdir: List the contents of the current directory, with links to each page and subdirectory. Supports VirtualDirectory restrictions, but always shows subdirectories.

listofdirs: Display a list of the subdirectories in the current directory.

pagetools: Generate a comma-separated list of all 'page tools' links, such as 'View Source' and alternate directory views, that are applicable to the current page.

post::oldpage: Generate a link to the origin page for a POST request in a POST form context.

range::bar: Display a simple range navigation bar inside a VirtualDirectory.

range::blogrange: With blog::blog, generates a day navigation bar if the display of pages has been truncated.

range::calbar: With blog::blog, generates a calendar-based navigation bar.

range::moreclip: With blog::blog, generate a 'or back N more' link if the display of pages has been truncated outside of a VirtualDirectory context.

readmore: Generate a 'Read more' link to this page.

rooturl: Generate the URL to the root of this DWiki.

rss2::pages: Generate a RSS 2.0 feed of the current directory and all its descendants (showing only the most recent so many entries, newest first). Each page is rendered through syndication/rss2entry.tmpl, which should result in a valid RSS 2.0 feed entry. Supports VirtualDirectory restrictions.

rss2::recentpage: Generate an RSS 2.0 format timestamp for an RSS 2.0 page feed for the current directory (and all its descendants).

rss2::timestamp: Generate a RSS 2.0 timestamp for the current page.

search::display: Display the results of a search.

search::enter: Create the search form, if searching is enabled.

seterror:permissions: If we are rendering the top level page of a request (ie, not rendering a subpage for blog, blogdir, atom feed, etc context), mark this page as having a permission error. This causes the page to be reported as a HTTP 403 error.

sitemap::minurlset: Generate a Google Sitemap set of <url> entities for the directory hierarchy starting at the current directory. Supports VirtualDirectory restrictions.

wikitext: Convert wikitext into HTML.

wikitext:cache: Convert wikitext into HTML but do not display the result; instead it is just cached for later (re)use. This has three effects. First, it makes variables like ${:wikitext:title} available (as do all other wikitext renderers). Second, it's somewhat more efficient if you intend to use a sequence of wikitext renderers, such as a title one followed by a text one. Third, it can be used as a conditional renderer to check permissions; this renderer succeeds (by generating a space) if permissions allow the wikitext to be displayed, and fails (generating nothing) if they don't.

wikitext:firstpara: Convert wikitext into HTML, showing only the first paragraph (and the title) if this is possible. This renderer fails if there is no findable first paragraph. It honors the {{CutShort}} macro.

wikitext:notitle: Convert wikitext into HTML but without the title.

wikitext:short: Convert wikitext into HTML, honoring the {{CutShort}} macro.

wikitext:terse: Convert wikitext into terse 'absolute' HTML, with all links fully qualified and no macros having any effect except CutShort, CanComment, IMG, and Restricted.

wikitext:terse:notitle: Convert wikitext into terse 'absolute' HTML with all links fully qualified et al (as with wikitext:terse) but omit the title of the page, as with wikitext:notitle.

wikitext:title: Generate and return the title of a wikitext page.

wikitext:title:html: Generate and return the title of a wikitext page complete with its surrounding '<hN>' and '</hN>' tags.

wikitext:title:nohtml: Generate and return the title of a wikitext page without HTML markup.

wikitext:title:nolinks: Generate and return the title of a wikitext page without links.

For quite a lot of these, the best real documentation is to see how they are used in the default DWiki template set. (Which is unfortunately a bit of a dark twisted maze at the moment.)

Renderers normally produce things about or from the current page, although some of them (for use in peculiar contexts) operate on other things. Unless otherwise specified, all renderers are silent if they can't produce something appropriate, which is handy for use in %{....} or just in general.

VirtualDirectory , 2005-09-03 01:11:04 by cks

Virtual Directories in DWiki

A virtual directory is a way of restricting what pages get shown out of a real directory. It works by tacking on 'virtual' directories after the real directory (ie, as subdirectories) to tell DWiki what you want to see.

Virtual directories restrict pages based on their most recent modification time. There are four versions available:

calendar: with the format <year>/[<month>/[<day>]], all as digits. Only pages most recently changed in the time period get selected.

latest: with the format latest/<howmany>. They show just the most recently changed <howmany> pages.

oldest: with the format oldest/<howmany>. They show just the least recently changed <howmany> pages.

range: with the format range/<start>-<end>. They show the start'th to the end'th most recently changed page.
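
A rough Python model of the four restrictions follows. It is a sketch under stated assumptions rather than DWiki's code: select_pages is a hypothetical name, pages are assumed to be (name, datetime) pairs, range positions are assumed to be 1-based and inclusive, and returning the oldest pages oldest-first is a guess about ordering.

```python
from datetime import datetime


def select_pages(pages, virtual):
    """Apply a virtual directory such as 'latest/5' to (name, when) pairs."""
    newest_first = sorted(pages, key=lambda p: p[1], reverse=True)
    kind, _, arg = virtual.strip("/").partition("/")
    if kind == "latest":
        return newest_first[:int(arg)]
    if kind == "oldest":
        return newest_first[::-1][:int(arg)]
    if kind == "range":
        start, end = (int(n) for n in arg.split("-"))
        return newest_first[start - 1:end]
    # Otherwise assume a calendar form: <year>/[<month>/[<day>]]
    want = tuple(int(piece) for piece in virtual.strip("/").split("/"))

    def in_period(when):
        return (when.year, when.month, when.day)[:len(want)] == want

    return [page for page in newest_first if in_period(page[1])]


if __name__ == "__main__":
    demo = [("Security", datetime(2005, 6, 8)),
            ("VirtualDirectory", datetime(2005, 9, 3))]
    print(select_pages(demo, "2005/06"))
```

Anything that is not a recognised form raises ValueError in this sketch; how DWiki itself reports a malformed virtual directory is not covered here.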

All pieces of a virtual directory must really be virtual. If you have a directory Foo/ with a Foo/2005/ subdirectory (or file), you cannot use the virtual directory Foo/2005/05/ to see things from May of 2005 in Foo/. Moral: let DWiki organize things based on time for you, don't do it yourself.

Virtual directories are paid attention to by some renderers, which are generally used in some views. You can get the full list in TemplateSyntax.

ToDo , 2005-06-13 18:17:32 by cks

DWiki bugs/needfix

/{....} as a template comment, because I think I want them. (maybe another character, but ehh; this sort of looks like a C/etc comment.)

inode ctime is last modified, inode mtime is created. The split has started. This may or may not work well; I'll have to see. (Partly based on what else screws with ctimes in our Unix environment.)

It seems clear that ctime is not too useful in at least some context. I should use it for safety in Atom feed generation and some other contexts, but not otherwise by default.

http://projects.edgewall.com/trac/wiki/WikiFormatting documents some stuff better than me, plus has 'processors'. I could steal that.

Searching needs to be less lame, at least for searching through the searchbox. It probably wants to be case-independent and possibly only for word starts (instead of word boundaries on both sides; arguably all searches should be only word boundary start ones).

The real rule is not 'identifier boundary', it is 'identifier component boundary', which is \b or a-z0-9 at the start, and \b or A-Z at the end.

It should be possible to create an Atom feed template that included all of the comments as part of the page.

CSS work. This implies that I need to actually understand CSS. I laugh at myself, hollowly. (Progress: we now style some stuff with CSS.)

We should be able to see the history page for any RCS-but-not-displayable page.

There should be some form of RecentChanges that throws in time information. (Clearly not Striped'able.)

Open issues

Do I want a 'render this page as wikitext' magic template option? That's what the injectors hard-code right now.

writecomment needs some way to generate a good link to help/DWikiText, so that people can actually know what to write a comment in. (It has one now, but the way may be a bit lame.)

Do we need a way to turn off WikiWord links? (The current approach is to use [[...|]], which is perhaps good enough for the rare cases.)

Should we forbid switching to alternate views in a virtual directory? The 'normal' view doesn't work entirely right (drops subdirectories); this may be a bug. (Fixed now: the listdir renderer needs to always include all subdirectories, despite their timestamps possibly being outside the restriction.)

We need to sort out when a link stays in the same view and when it doesn't. At the moment it is somewhat ad-hoc.

[[...]] links don't chase redirects, and they should. Well, now they do and I'm not convinced it's the right thing. It's convenient, but it changes the explicitly written link text; this might be good or it might be bad.

Decide: should access restrictions look sort of like Unix access restrictions, being enforced top-down, or the current bottom-up way? I am starting to think that bottom-up is open to some reliability issues. But on the other hand, top-down has semantic issues too.

Profile the code. Laugh hysterically. Fix what I can.

DWiki should be more configurable through the filesystem. Can we support adding new views (directory and/or file) by reading the canonical template directory, for example? This would suffice for anything that doesn't require special handling.

Long-range:

DWiki knows a lot about what views do what. Unfortunately I suspect that this is impossible to work around, especially given how htmlviews.py is set up.

Templates should mark up with <div> and so on.
wikirend.py needs to style-mark much of the things that it emits. I would like to find some general augmentation mechanism, although it's probably not going to be pretty.

We're going to need to genericize access control. I think it will be some matrix of view + file patterns + file attributes. (Punt for now, everyone can see everything.)

This is a permanently FIXME page.

Security , 2005-06-08 17:04:54 by cks

Security Aspects of DWiki

DWiki has a general attitude about security: it really distrusts incoming requests, it somewhat distrusts itself, but it has a rational trust of the people creating DWiki templates and pages. DWiki will try to save people from accidental mistakes, but doesn't bother with things that are just half-hearted attempts to stop people from deliberately sidestepping security restrictions. Moral: don't let people write DWiki pages unless you trust them.

Some knowledge of the ProcessingModel and the ConfigurationFile (and what can be set there) may be helpful for the rest of this discussion.

A Quick Summary

DWiki itself is written in Python (a lot of Python). This means that unless there is a gross implementation error in the Python interpreter, it is secure from simple problems such as buffer overruns. While DWiki uses some components from the standard Python libraries, they too are well-tested and believed to be entirely safe.

Because it is quite careful at multiple levels about how it handles requests, hostile HTTP requests should not be able to trick DWiki into serving anything from outside the page directory (or the comments directory, or the static content directory). InvalidPageNames discusses things it won't serve even inside them.

DWiki doesn't attempt to stop insiders from using DWiki to serve 'bad' content, ultimately because there are so many ways a malicious insider can do that. ChrisSiebenmann feels that it is better to be honest about not making any attempt rather than making an attempt and causing people to put more trust in it than it warrants.

If run as a CGI-BIN, DWiki should not be run with a UID that has any special access to restricted files. But then, no CGI-BIN should be run that way.

DWiki has some degree of optional Authentication, but it is no stronger than the usual run of the mill login and password on other web sites. Really sensitive content is probably best not served from a web server that the public (whatever that means to you) can access.

Pages versus Templates

What people can do with the ability to write DWikiText in DWiki pages is somewhat less powerful than what they can do with the ability to write DWiki templates. Similarly, errors in DWikiText are considered far less fatal than errors in templates; DWikiText errors just result in funny-looking pages, while template errors result in terse web error pages.

Thus: while it's safe to let people write DWiki pages in general, you probably want to restrict (at least somewhat) who can write or modify your templates. Plus, your templates (being, you know, templates) shouldn't need modification all that often. People can create and modify pages all the time.

How DWiki tries to be secure

Cautious processing

Internally, DWiki tries to operate in a relatively 'security conservative' fashion. For example, the frontend rejects clearly invalid things without passing them through to the DWiki core, because the core has a lot more power than the frontend so a mistake has larger ramifications.

DWiki also is deliberately structured so as to give itself as little power as possible.

Errors Abort Processing

DWiki can hit a number of internal problems while processing a request; for example, a template that's called for might be missing. When this happens, DWiki aborts processing the entire request, throwing an error all the way back to the front end, which generates a terse error page about the situation.

This may be abrupt ... but it is safe.

File Access

DWiki reads only a few files: the ConfigurationFile, the global-authseed-file file, the authfile password file, and things under the page, template, RCS, static files, and comments directories (if those are configured on).

Except for the password file, the DWiki core only accesses files through a simple storage layer abstraction, which provides 'storage pools' to the rest of DWiki. Each storage pool confines all file requests to relative paths under the pool's root, explicitly ruling out InvalidPageNames when retrieving files for the rest of DWiki.
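
The confinement idea can be sketched in Python as below. This is an illustration of the technique, not DWiki's storage layer: StoragePool and resolve are hypothetical names, and the rejection rule (no absolute paths, no empty, '.' or '..' components) is only an assumed stand-in for the real InvalidPageNames rules.

```python
import os.path


class StoragePool:
    """Hypothetical pool that maps relative paths to files under one root."""

    def __init__(self, root):
        self.root = os.path.abspath(root)

    def resolve(self, relpath):
        """Return the absolute path for relpath, or None if it is rejected."""
        parts = relpath.split("/")
        # Reject absolute paths and any component that could climb out
        # of the pool or alias another name (assumed rule set).
        if relpath.startswith("/") or any(p in ("", ".", "..") for p in parts):
            return None
        return os.path.join(self.root, *parts)
```

The check is purely lexical, so it does nothing about symbolic links inside the pool.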

The storage layer has no general file writing capabilities. The only interface it has for writing files is specifically designed for comments, using a specific naming and storage scheme. And only the comments directory uses a storage pool that supports this abstraction.

Following Symlinks

Unlike some web servers (eg, Apache), DWiki takes no special care to not follow symbolic links that point outside one of its storage pool directory roots. If you put such a symbolic link into a storage pool area, DWiki assumes that you know what you're doing.

This is deliberate. Attempting to duplicate the kernel's namei() function in user space is inevitably very complicated (and prone to surprising races). Rather than run the risk of making a mistake in the amount of code required, DWiki is honest about the whole situation.

Limitations DWiki imposes on itself

Limited URL scope

DWiki refuses to serve any request that is not under staticurl (if set) or rooturl. Anything under staticurl must be a static request and is served only as such.
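
The URL scope rule can be sketched in a few lines of Python. The function name classify_request and its return values are made up for illustration; only the rooturl and staticurl names come from the configuration directives.

```python
def classify_request(path, rooturl, staticurl=None):
    """Return 'static', 'dynamic', or None if the path is out of scope."""

    def under(prefix):
        # True if path is the prefix itself or lies below it.
        prefix = prefix.rstrip("/")
        return path == prefix or path.startswith(prefix + "/")

    # staticurl is checked first, so anything under it is static only,
    # even when staticurl itself sits below rooturl.
    if staticurl and under(staticurl):
        return "static"
    if under(rooturl):
        return "dynamic"
    return None
```

Comparing against the prefix plus a trailing slash keeps a request for /wikix from being treated as if it were under /wiki.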

Limited static-content serving

In addition to dynamic DWiki pages, DWiki can serve static content via the staticdir ConfigurationFile directive. Since DWiki's goals for serving static content are very modest (CSS files, images, etc), DWiki refuses requests for static directories. As mentioned in ProcessingModel, static content is served by the frontend, thereby keeping the amount of code involved in the process down.

In addition, DWiki rejects any request for static content that is not in the default 'normal' view.

ProcessingModel , 2005-06-08 16:38:20 by cks

A brief sketch of the DWiki processing model

The core of DWiki is a template expansion engine and a collection of (text) renderers; DWiki displays pages by figuring out what template to use and then rendering it out.

Renderers generate text based on the current context, such as the page that is being displayed. The most important (and largest) renderer is the wikitext renderer, which takes page content in DWiki's wiki text format and turns it into HTML.

Other renderers create things like the navigation 'breadcrumbs' up at the top of this page and the page tools and last-modified lines at the bottom. Renderers generally create only the essential pieces of that information; surrounding text is created through template expansion. Renderers are hardcoded parts of DWiki and are thus written in Python.

Templates are text files; they get expanded by the template engine through a recursive process of applying template 'macros' to their text. Template macros can insert other (expanded) templates, insert text taken from context variables, and insert the results of renderers. A typical template might look like:
<html><head><title>${|wikititle|wikiname} :: ${page}</title></head>
<body> @{breadcrumbs} <br/>
@{wikitext}
<hr> #{footer.tmpl} #{site-sig.tmpl} </body> </html>
(the actual templates that render this DWiki are somewhat more complicated than that, but this shows the flavour.)
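
To make the recursive expansion concrete, here is a small Python sketch of a template engine in this style. It assumes, going by the example above, that ${...} inserts a context variable, @{...} inserts the output of a renderer, and #{...} inserts another template after expanding it. The function name expand, the dictionary-based lookups, and the depth limit of 10 are all assumptions, and the sketch does not handle the ${|wikititle|wikiname} form.

```python
import re

# Matches ${name}, @{name}, and #{name}.
MACRO = re.compile(r"([$@#])\{([^{}]+)\}")


def expand(text, context, renderers, templates, depth=0):
    """Recursively expand ${var}, @{renderer}, and #{template} macros."""
    if depth > 10:
        # Assumed guard against templates that include each other forever.
        raise RuntimeError("template recursion too deep")

    def replace(match):
        kind, name = match.group(1), match.group(2)
        if kind == "$":
            # Context variable; a missing variable expands to nothing here.
            return str(context.get(name, ""))
        if kind == "@":
            # Renderer: a callable that produces text from the context.
            return renderers[name](context)
        # '#': insert another template, itself expanded. A missing
        # template raises KeyError, which aborts the whole expansion.
        return expand(templates[name], context, renderers, templates, depth + 1)

    return MACRO.sub(replace, text)
```

A quick usage example, with made-up context values:

```python
ctx = {"page": "Home", "wikiname": "TestWiki"}
rend = {"breadcrumbs": lambda c: "TestWiki > " + c["page"]}
tmpl = {"footer.tmpl": "-- ${wikiname}"}
print(expand("<title>${page}</title> @{breadcrumbs} #{footer.tmpl}", ctx, rend, tmpl))
```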

DWiki produces all pages this way. Displaying different types of pages (regular pages versus directories) and different views of the same page (such as the history view) is done by selecting a different starting template; the template (presumably) uses different renderers than the normal view.

Errors are also rendered using templates (if an appropriate template exists). This allows some error pages to reuse renderers as appropriate; for example, the no-such-page error template includes breadcrumbs just as regular pages do, as you can see at NoSuchPage.

Wart: the view source display is not done by a template: it just barfs the content out straight as plain text. One current limitation of renderers and templates is that they can't control the content-type, which is set in the HTML view core.

Wart: the mapping of view + file attributes to templates is currently hard-coded.

The frontend versus the core

DWiki is divided into two components: the front end and the core. The front end receives raw HTTP requests, figures out if they are proper requests, and then passes them to the core to go through the core's processing. If the front end can detect that an HTTP request is not something that the core can handle, it rejects it immediately with a terse error.

Similarly, if the core encounters a processing error it throws an exception up to the front end, which logs it and generates another terse error.
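
The division of labour between the two can be sketched roughly as follows. Everything named here (CoreError, core_process, frontend_handle, the dictionary-shaped request, and the specific status codes) is hypothetical; the sketch only shows the shape of 'reject early in the front end, catch core errors at the top and answer tersely'.

```python
import logging


class CoreError(Exception):
    """Hypothetical exception type for processing failures in the core."""


def core_process(request):
    # Stand-in for the core: fail if the needed template is missing.
    if request.get("template") is None:
        raise CoreError("missing template")
    return 200, "rendered page"


def frontend_handle(request):
    # Reject requests the core could not handle before calling it.
    if not request.get("path", "").startswith("/"):
        return 400, "Bad request"
    try:
        return core_process(request)
    except CoreError as exc:
        # Log the details, but show only a terse error to the client.
        logging.error("core error for %s: %s", request["path"], exc)
        return 500, "Internal error"
```

The core never has to produce a half-finished page: either it returns a complete response or the front end replaces the whole thing with the error.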

It is the front end that can optionally serve static files; the core is not involved in that process.

Features , 2005-06-06 13:19:56 by cks

DWiki features

DWiki's job is to be a good way to display version controlled wiki-text pages that you write in a real editor.

The important DWiki features:

simple but reasonably powerful text rendering (based on WikiText).

natural support for arbitrarily-named links: you don't have to follow some magic page naming standard that doesn't fit well with the natural names for things.

pages are normal, simple files, and you edit them directly in Unix.

support for putting pages in RCS, with strong disincentives to hand-edit files without checking them out (they stop displaying).

directories can display like changelogs: pages inline, most recent first.

can generate Atom syndication feeds for recently changed things.

The inevitable feature list:

In no particular order:

simple WikiText-like text rendering. (Chris wrote pages in GNU Emacs and relentlessly smushed anything that got in the way of how GNU Emacs wanted to autoformat things.)

The text rendering choices are designed to make it easy to write about Unix systems.

full support for directly editing wiki pages.

does not force a flat page namespace; uses straightforward Unix files and directories to organize the DWiki page space. (Thereby keeping the Unix view of DWiki's pages simple.)

supports a blog-like view of a directory that inlines pages there, most recent first.

in-filesystem page redirects make it trivial to support plurals, moved/renamed pages, etc.

text-based page templates control how all pages appear, making it easy to control various bits of a DWiki's appearance.

pages can be put in RCS for version control and multi-person editing access. RCS files can live in either the page directory hierarchy (for simplicity) or another parallel directory tree (for neatness).

forces people not to edit RCS-controlled files without locking them by refusing to display inconsistent unlocked files.

generates Atom syndication feeds for recently changed pages and recent comments, for the entire DWiki or any subtree thereof.

written in Python.

simple-ish yet powerful enough (I hope) user authentication system, with an equally simple yet powerful way of restricting who can read DWiki pages.

supports the option of letting people (possibly including the world) comment on some or all of the pages.

takes some pride in properly generating and handling Last-Modified: and ETag: headers in HTTP responses.

wikitext to HTML generates fully HTML 4.01 Transitional compliant HTML provided only that you don't jump multiple indent levels at once in lists (thus Formatting doesn't validate).

can run as a CGI-BIN or standalone, and support for additional environments (SCGI, WSGI, whatever) should be easy to add if it is needed. Disclaimer: standalone does not use a production-quality webserver implementation; it uses Python's BaseHTTPServer with a hack to use threading.

Missing DWiki features

Also in no particular order:

you can't edit DWiki pages from the web, but see WhyNotWebEditing.

no user authentication.

therefore, no access restrictions on who can read what.

searching is primitive at best.

A necessary acknowledgement:

A number of DWiki's features and design decisions are shamelessly inspired by C.J. Silverio's as yet (22 May 2005) unfinished Snippy. Note that Snippy is much more powerful than DWiki probably ever will be, plus if it had been finished when I was writing DWiki I probably wouldn't have.

InvalidPageNames , 2005-06-01 17:07:18 by cks

Page Names That DWiki Won't Serve

There are some paths and page names that DWiki categorically refuses to serve, even if they seem to resolve to real files. Because they're enforced by both low-level code and high-level code, they apply to DWiki pages, static files being served by DWiki, and even templates. (Technically they apply to comments too, but comments can't generate file names that violate these rules.)

What gets rejected:

Any path that includes a path component that starts with a ., ends with ,v or a ~, or is RCS.

Any non-relative path that includes .., ., or a sequence //; usually this might appear in the URL of an incoming request. (Incoming requests are not supposed to include things like that. But ChrisSiebenmann declines to believe that everyone sending DWiki requests is going to do what they're supposed to.)

DWiki will reject REDIRECT files that either have too many '..' entries (so that they are trying to escape the root of the page directory) or that fail these checks after they've potentially been converted from relative path names to absolute inside-DWiki paths.

When DWiki rejects bad paths, generally it says that there is no page by that name. Sometimes it rejects the request entirely in huge flames.
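
The component rules above are simple enough to sketch as a single Python function. The name is_invalid_path is made up, and the sketch assumes paths use '/' as the separator and arrive without a leading slash; how the real code treats the empty (root) path is not covered here.

```python
def is_invalid_path(path):
    """Return True if a relative, '/'-separated path breaks the rules."""
    for part in path.split("/"):
        if part == "":
            # An empty component means a leading, trailing, or doubled '/'.
            # (Assumption: the root page is handled before this check.)
            return True
        if part.startswith("."):
            # Covers '.', '..', and dotfiles.
            return True
        if part.endswith(",v") or part.endswith("~"):
            # RCS files and editor backup files.
            return True
        if part == "RCS":
            return True
    return False
```

Checking component by component means that a name like RCSnotes or a page called v1,v2 is still fine; only whole components that match the rules are refused.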

RedirectFile , 2005-06-01 15:13:23 by cks

Redirection Files

Files in the page directory can create HTTP redirections, making it trivial to support plurals, moved/renamed pages, and so on. There are two ways of doing it: REDIRECT content and symbolic links.

If a file starts with a line that says 'REDIRECT somewhere', and does not have more than a few lines of content, DWiki considers it a redirection. The somewhere is basically interpreted as if it was appearing in a [[....]], so it can be:

redirection to another DWiki page.

redirection to an external web site, written as http://....

redirection to an absolute URL on this web site, written as <...>

These files are generically called REDIRECT files.
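
As a sketch of how REDIRECT content might be recognized, here is a small Python function. The name parse_redirect, the returned labels, and the three-line limit are assumptions (the text above only says 'a few lines'); the sketch also does not resolve page names, it merely classifies the target.

```python
MAX_REDIRECT_LINES = 3  # assumed value; the text only says "a few lines"


def parse_redirect(content):
    """Return (kind, target) for a REDIRECT file, or None otherwise."""
    lines = content.splitlines()
    if not lines or len(lines) > MAX_REDIRECT_LINES:
        return None
    words = lines[0].split(None, 1)
    if len(words) != 2 or words[0] != "REDIRECT":
        return None
    target = words[1].strip()
    if target.startswith("http://"):
        # External web site.
        return ("external", target)
    if target.startswith("<") and target.endswith(">"):
        # Absolute URL on this web site, written as <...>.
        return ("local-url", target[1:-1])
    # Anything else is taken to be another DWiki page.
    return ("page", target)
```

The line limit is what keeps an ordinary long page that merely happens to start with the word REDIRECT from being swallowed as a redirection.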

A symbolic link is only considered a redirect if DWiki can 'resolve' it into an existing page. To resolve the symbolic link redirect, DWiki tries to interpret the symbolic link's value as if it was appearing in a [[...]] as a DWiki relative page name.

If the symbolic link doesn't resolve this way, DWiki treats the whole thing as an ordinary page; this keeps 'ordinary' uses of symlinks intact in most cases, including when the symlinks point to something outside the DWiki page directory.

Redirects to http:// links or absolute URL links are a convenient way of creating WikiWord abbreviations to external things for local use. Make an appropriate REDIRECT file, stick it in your Aliases area, and now every page in the DWiki can say GoogleSearch or something and get a link, bam.

(WikiWord redirection rewriting means that in many cases the generated link will even point to the real target instead of the REDIRECT file, as you can see here.)

DWiki , 2005-05-27 19:46:34 by cks

Hello, I am a DWiki

DWiki stands for 'Dinky Wiki', which is what ChrisSiebenmann calls this for lack of a better name. As far as Chris knows on a casual scan, 'DWiki' is not being used as the name of any other Wiki software.

I (ChrisSiebenmann) wrote DWiki to have a good environment for writing documentation about how we do system administration work on our Unix systems. You can see a bigger list of Features if you want.

DWiki's design choices are very slanted towards making it as easy as possible for us to write, revise, and maintain system documentation. (Because on past evidence, if it isn't dirt easy we just aren't going to do it.)

I like WikiText, because it is a simple, low-effort way to write documentation that comes out looking decent. But I don't like the wiki interface for writing it: web browsers make crappy editing environments and limit the sort of changes you can easily do. So I wrote DWiki to have a good system for displaying wikitext that we'd write through other means.

Wiki or CMS?

Some people say (strongly) that DWiki is not a Wiki because people (currently) can't edit DWiki pages over the web; they argue that instead it should be called a CMS (a (web) Content Management System).

I'm willing to accept this argument in theory, but ...

I like punchy, usefully descriptive names. My personal opinion is that 'wiki' is one: calling DWiki a wiki tells you a great deal about it, while calling it a CMS tells you almost nothing.

So while I may be willing to accept the argument, DWiki stays being called a wiki (and keeps the 'Wiki' in the name).

I maintain web-based editing makes little sense for us, and wrote a bunch about why in WhyNotWebEditing.

And one can always have a big debate over at the granddaddy Wiki, at http://c2.com/cgi/wiki?WikiPrinciples.

WhyNotWebEditing , 2005-05-27 19:21:10 by cks

Why DWiki doesn't have web page editing

One of the signature Wiki features is that people edit pages over the web, often anyone and without restrictions (as the original Wiki was/is). It's said that this is a defining trait of Wikis, and that without it what you have isn't really a wiki.

DWiki has no from-web editing of pages. There are several reasons why.

Interface power

A web browser's form input text boxes are a totally crappy editing environment compared to what I have on a Unix system. Yeah, sure, I could require Javascript and load a huge editing library and maybe get somewhere, but a) I browse with Javascript off and b) am I going to get half as good as GNU Emacs or vi or sam? (I don't think so.)

So I want the primary way of editing DWiki pages to be from Unix, through the filesystem, with real editors. (And it is.)

Global edit doesn't make sense for us

The principle that global edit permissions lead to the world helping write your pages simply doesn't make sense for us. DWiki's goal is to let us easily document how our Unix systems work. We're the only people who can write most of that documentation; outsiders can at best add side commentary.

This would be different if we were interested in running a Wiki on system administration best practices or the like. But we're not; we're just documenting our systems. We let other people read it so that they can learn from anything interesting we do (and that's primarily aimed at other people at the University of Toronto).

It's a drain we can't afford

Anything that allows semi-public writing on the Internet requires tending. Wikis are no exception to this rule.

Like many places, we are historically very bad at creating documentation. The more effortless I can make the process, the better the odds that we will actually write documentation.

Keeping DWiki running is part of the overall process; the less effort this takes, the better, especially if we aren't actively writing documentation at the time. Thus, I don't want DWiki to take up any time when we're not actively writing things with it.

If DWiki allowed web writing from anything except a small set of people, we would have to tend it. It is simpler and less risky to avoid that, especially given that we can't expect significant contributions from outsiders.

Skipping hard design problems

Eliminating web-based editing immediately kills the need to tackle a bunch of hard problems, because Unix handles them for me. Particularly, I don't need to authenticate people or do access control, provided I'm willing to let everyone read (I am, so far).

Access control, authentication, and registering people is not an easy area. It's also one where failures and program bugs can have severe consequences. Not having to worry about that means that DWiki is faster to write, smaller, and safer.

I also don't have to worry about random outsiders writing pages that make extensive use of expensive DWiki features, or writing things in pages that cause rendering errors.

But web editing can be done from anywhere!

Pragmatically, the odds of us wanting to edit our systems documentation from anywhere that we can't just run ssh to log in to our servers are fairly low. This is especially the case given that there are Java SSH applets, so that any browser that runs Java can let us log in to our servers.

DWiki is aimed at the low-hanging fruit of the 90% or 80% or so solution. (I maintain that any wiki is, partly because the text rendering is deliberately simplified.)

The future: maybe limited web editing

The most likely web editing feature for DWiki to pick up is to let web people write comments on pages but not edit the pages themselves. This would let outsiders give us feedback and commentary without running the risks of scribbling over valuable page content.

This would still require me to either write an authentication system or live with the likelihood of comment spammers showing up to yammer madly. Plus some of the above worries.

(The clever person will notice that some of this future has arrived. DWiki now has an authentication system and comments, although both will be improved in the future.)

CodeStructure , 2005-05-24 02:42:24 by cks

The code structure:

There are three chunks of code: the HTTP layer, the HTML view core, and the model.

The model deals with view-independent wiki level things, primarily retrieving raw pages and templates. (To do this it calls on storage pools managed by a storage layer; the storage layer handles much of the RCS magic.)

The DWiki HTML view core gets a request context and is responsible for returning a response, whether that be rendered page content, redirections, or (rendered) errors. Renderers and template expansion are part of the HTML view core.

The HTTP layer is responsible for generating the request context, sending the response (including conditional GETs and other fun), and giving the HTML view core ways of generating proper URLs for given wiki pages.
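
For readers who haven't met conditional GETs, the decision the HTTP layer has to make looks roughly like this in Python. This is a generic sketch of the standard HTTP behaviour, not DWiki's actual code: the function name conditional_status is made up, headers are assumed to arrive as a plain dictionary, and weak ETag handling is left out.

```python
from email.utils import parsedate_to_datetime


def conditional_status(headers, etag, last_modified):
    """Return 304 if the client's cached copy is still good, else 200."""
    inm = headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        tags = [t.strip() for t in inm.split(",")]
        return 304 if "*" in tags or etag in tags else 200
    ims = headers.get("If-Modified-Since")
    if ims is not None:
        try:
            if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(ims):
                return 304
        except (TypeError, ValueError):
            pass  # unparseable date: fall through to a full response
    return 200
```

Since the HTML view core only hands back a response, this check can sit entirely in the HTTP layer, which is the sort of web-server-facing detail the split is meant to isolate.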

The split between HTML view core and HTTP layer exists because the HTML view core is agnostic about how it is connected to the web, while the HTTP layer is intimately tied to CGI-BIN versus Python BaseHTTPServer versus etc. So the HTML view core has everything that is web server independent.

FIXME: investigate this Python SCGI thing I've heard bits about. Or is that WSGI? See http://www.python.org/peps/pep-0333.html and http://wiki.python.org/moin/WSGIImplementations. Unfortunately on a preliminary look it seems I might as well write to CGI-BIN to start with.

I think that this is almost but not quite Model-View-Controller, but then I don't understand how MVC works (especially on the web).

To look at:

http://java.sun.com/blueprints/patterns/MVC-detailed.html

http://www.phpwact.org/pattern/model_view_controller

This suggests I am almost MVC except that my Controller is smeared over View code and that I have split the View into two pieces: the HTTP layer and the HTML view core.