Welcome to mwlib’s documentation!

Contents:

Getting started

mwlib provides a library for parsing MediaWiki articles and converting them to different output formats.

The collection extension is a MediaWiki extensions enabling users to collect articles and generate PDF files from those.

Both components are used by wikipedia’s ‘Print/export’ feature.

If you’re running a low-traffic public mediawiki installation, you only have to install the collection extension. You’ll have to use the public render server run by pediapress GmbH. Please read Collection Extension for MediaWiki.

If you need to run your own render server instance, you’ll have to install mwlib and mwlib.rl first. Please read Installation of mwlib.

Contact/Need help

If you need help with mwlib or the Collection extension you can either browse the mwlib mailing list or subscribe to it via mail.

The developers can also be found on IRC in the #pediapress channel

Dev Support

Need help with architectural advice or design validation? Get development help from our core team!

Want more information about our Development Support? Let us know how to get a hold of you and we’ll be in touch soon.

contact us

contact@brainbot.com / +49 (0) 6131 2116391

Production Support

Have mwlib live in production and looking for SLA-based support? In production? Get reliable support from team that built mwlib.

We’d love to talk to you about our Production Support. Let us know how to get a hold of you and we’ll be in touch soon.

contact us

contact@brainbot.com / +49 (0) 6131 2116391

Installation of mwlib

If you’re running Ubuntu 10.04 or a similar system, and you just want to copy and paste some commands, please read Installation Instructions for Ubuntu 10.04 LTS

Microsoft Windows is not supported.

Basic Prerequisites

You need to have a C compiler, a C++ compiler, make and the python development headers installed. mwlib will work with python 2.6 and 2.7. It will not work with python versions >= 3 or < 2.6. mwlib requires a recent UNIX-like operating system.

mwlib requires the python imaging library (pil) and the python lxml package. In order to compile pil from source the libjpeg, zlib, freetype and lcms header files and libraries must be present on the system. Compiling lxml requires the libxslt and libxml2 header files and libraries.

mwlib is split into multiple namespace packages, that each provide different functionality:

mwlib
core functionality; provides a parser
mwlib.rl
generates PDF files from mediawiki articles. This is what is being used on wikipedia in order to generate PDF output.
mwlib.zim
generate ZIM files from mediawiki articles

Installation of mwlib with pip/easy_install

We recommend that you use a virtualenv for installation. If you don’t use a virtualenv for installation, the commands below must probably be run as root.

Installation of mwlib can be done with:

$ pip install -i http://pypi.pediapress.com/simple/ mwlib

Make sure the output of the last command contains:

...
--- JPEG support available
--- ZLIB (PNG/ZIP) support available
--- FREETYPE2 support available
...

This will install mwlib and it’s dependencies. The “-i http://pypi.pediapress.com/simple/” command line arguments instruct pip to use our private pypi server. It contains known “good versions” of mwlib dependencies and bugfixes for the greenlet package.

Installation of mwlib.rl with pip/easy_install

The following command installs the mwlib.rl package:

pip install -i http://pypi.pediapress.com/simple/ mwlib.rl

If you want to render right-to-left texts, you must also install the pyfribidi package:

pip install -i http://pypi.pediapress.com/simple/ pyfribidi

Testing the installation

Use the following two commands to test the installation:

mw-zip -c :en -o test.zip Acdc Number
mw-render -c test.zip -o test.pdf -w rl

Open test.pdf in your PDF viewer of choice and make sure that the result looks reasonable.

Optional Dependencies

mwlib uses a set of external programs in order to handle certain mediawiki formats. You may have to install some or all of the following programs depending on your needs:

  • imagemagick
  • texvc
  • latex
  • blahtexml

Installation Instructions for Ubuntu 10.04 LTS

The following commands can be used to install mwlib on Ubuntu 10.04 LTS. Run the following as root:

apt-get install -y gcc g++ make python python-dev python-virtualenv   \
  libjpeg-dev libz-dev libfreetype6-dev liblcms-dev                   \
  libxml2-dev libxslt-dev                                             \
  ocaml-nox git-core                                                  \
  python-imaging python-lxml                                          \
  texlive-latex-recommended ploticus dvipng imagemagick               \
  pdftk

After that switch to a user account and run:

virtualenv --distribute --no-site-packages ~/pp
export PATH=~/pp/bin:$PATH
hash -r
export PIP_INDEX_URL=http://pypi.pediapress.com/simple/
pip install pyfribidi mwlib mwlib.rl

Install texvc:

git clone https://github.com/pediapress/texvc
cd texvc; make; make install PREFIX=~/pp

Then test the installation.

Development version

The source code is managed via git and hosted on github. Please visit pediapress’s profile on github to get an overview of what’s available and for further instruction on how to checkout the repositories.

You will also need to install cython, re2c and gettext if you plan to build from the git repositories.

Running a renderserver

Overview

Running a renderserver consists in running multiple programs [1]. Unless you have some special requirements, you should be able to start a working renderserver by running the following commands:

$ nserve
$ mw-qserve
$ nslave --cachedir ~/cache/
$ postman

These programs have the following purposes:

nserve
nserve is a HTTP server. The Collection extension is talking to that program directly. nserve uses at least one mw-qserve instance in order to distribute and manage jobs.
mw-qserve
mw-qserve is a job queue server used to distribute and manage jobs. You should start one mw-qserve instance for each machine that is supposed to render pdf files. Unless you’re operating the Wikipedia installation, one machine should suffice.
nslave
nslave pulls new jobs from exactly one mw-qserve instance and calls the mw-zip and mw-render programs in order to download article collections and convert them to different output formats. nslave uses a cache directory to store the generated documents. nslave also starts an internal http server serving the content of the cache directory.
postman
postman uploads zip collections to pediapress in case someone likes to order printed books. You should start one instance for each mw-qserve instance.

None of the programs has the ability to run as a daemon. We recommend using runit for process supervision. daemontools is similar solution. Another alternative is to use supervisor.

[1]In mwlib prior to version 0.13 it was possible to get away with running a single mw-serve program or even running no program at all by using the mwlib.cgi script. These programs have been removed in favor of the new tools, which provide the ability to scale an installation.

nserve usage

nserve understands the following options:

--port=PORT

specify port to listen on. Default is to listen on port 8899 on any interface.
--qserve=HOST:PORT
register qserve instance running on host HOST listening on port PORT

Any additional arguments are interpreted as additional qserve instances to register.

The following command starts nserve listening on port 8000 using two qserve instances:

nserve --port 8000 example1:14311 example2

mw-qserve usage

mw-qserve understands the following options:

-p PORT
specify port to listen on. Default is to listen on port 14311
-i INTERFACE
specify interface to listen on. Default is to listen on any interface.

nslave usage

nslave understands the following options:

--cachedir=CACHEDIR

specify cachedir to use. this is where nslave will store generated documents.
--serve-files-port
port on which to start the http server (default is 8898)
--url=URL
specify url under which the cache directory is being served. The default is to compute this value dynamically.
--numprocs=NUMPROCS
allow up to NUMPROCS parallel jobs to be executed

postman usage

postman understands the following options:

--cachedir=CACHDIR
specify cachedir to use. use the same value as specified when calling nslave

command line tools

Common Options

This section contains a description of options that are accepted by more than one command.

-h, --help

Show usage information and exit.

-c, --config=CONFIG

The value for this option describes the source of MediaWiki articles and images for the command and can be of one of the following types:

  • A “base URL” of a MediaWiki installation. A base URL is the URL up to, but not including the index.php/api.php part.

    This URL can differ from the prefix seen in “pretty” article URLs. For example the article Physics in the English Wikipedia has the URL http://en.wikipedia.org/wiki/Physics, but the base URL is http://en.wikipedia.org/w/.

    If you’ve set up your own MediaWiki you probably know what your base URL should be, but if you’re using a different MediaWiki, you can see the base URL if add a query string to the URL, e.g. by clicking on the edit link or by looking at an older revision of an article.

    This value for --config corresponds to type=mwapi in a configuration file (see docs/configfiles.txt), i.e. articles and images are fetched with the MediaWiki API. Specifying the URL directly as value for --config is usually the quicker way to achieve exactly the same result.

    This requires MediaWiki 1.11 or later.

  • A shortcut for a base URL. Currently there are the following shortcuts:

  • A filename of a ZIP file generated with the the mw-zip Command.

  • A filename of a configuration file (see docs/configfiles.txt).

-m, --metabook=METABOOK

Description of the article collection to be rendered in JSON format. This is used by the Collection extension to transfer this information to mw-serve which in turn passes the information to mw-render and mw-zip.

--collectionpage=COLLECTIONPAGE

Title of a saved article collection (using the Collection extension)

-x, --no-images

If given, no images are included in the output document.

-i, --imagesize=IMAGESIZE

Maximum size (which can be either width or height, whichever is greater) of images. If images exceed this maximum size, they’re scaled down.

-o, --output=OUTPUT

Write output to given file.

-l, --logfile=LOGFILE

Log output to the given file.

--login=USERNAME:PASSWORD[:DOMAIN]

For MediaWikis that restrict the viewing of pages, login with given USERNAME, PASSWORD and optionally DOMAIN.

Currently this is only supported for mwapidb, i.e. when the –config argument is a base URL or shortcut, or when type=mwapi in the configuration file.

--title

Specify a title for the article collection. This is e.g. used by some writers to produce a title page. This title overrides titles contained in ZIP files or metabook files.

--subtitle

Specify a subtitle for the article collection. This is e.g. used by some writers to produce a title page (note that subtitle might require a tilte). This subtitle overrides subtitles contained in ZIP files or metabook files.

The mw-render Command

Render MediaWiki articles to one of several output formats like PDF or OpenDocument Text.

Usage

mw-render [OPTIONS] [ARTICLETITLE...]

Specific Options

-w, --writer

Name of the writer to produce the output. The list of available writers can be seen with mw-render --list-writers.

--list-writers

List the available writers.

-W, --writer-options

Writer specific options in a “;” separated list (depending on your shell, quoting with “…” or ‘…’ might be needed). Each item in that list can either be a single option or an option=value pair. To list the available writer options use mw-render --writer-info WRITERNAME.

--writer-info=WRITER

Show available options and some additional information about the given writer.

-s, --status-file=STATUS_FILE

Write status/progress information in JSON format to this file. The file is continuously updated during the execution of mw-render.

-e, --error-file=ERROR_FILE

If an error occurs, write the error message to this file. If no error occurs this file is not written/created.

--keep-zip=FILENAME

Do not remove the (otherwise temporary) ZIP file, but save it under FILENAME.

The mw-zip Command

Generate a ZIP file containing

  • articles,
  • images,
  • templates and
  • additional meta information (especially if --metabook is given, see Common Options) like name and URL of the MediaWiki, licensing information and title, subtitle and the hierarchical structure of the article collection.

Usage

mw-zip [OPTIONS] [ARTICLETITLE...]

Specific Options

-p, --posturl=POSTURL

Upload the ZIP file with an HTTP POST request to the given URL.

-g , --getposturl

Retrieve the POSTURL from PediaPress and open the upload page in the web browser.

The mw-post Command

Send a ZIP file generated with the mw-zip command to a given or an automatically retrieved URL via HTTP POST request.

Usage

mw-post [OPTIONS]

Specific Options

-i, --input=INPUT

Filename of ZIP file.

-p, --posturl=POSTURL

Upload the ZIP file with an HTTP POST request to the given URL.

-g , --getposturl

Retrieve the POSTURL from PediaPress and open the upload page in the web browser.

The mw-serve-ctl command

--purge-cache=HOURS

Remove all cached files in –cache-dir that haven’t been touched for the last HOURS hours. This is meant to be run as a cron job.

--clean-up

Report errors for processes that have died irregularly.

Internals

The following section describes some of the internals of mwlib. Only read this if you plan to extend mwlib’s functionality.

Writers

A writer in mwlib generates output from a collection of MediaWiki articles in some writer-specific format.

The writer function

Essentially a writer is just a Python function with the following signature:

def writer(env, output, status_callback, **kwargs): pass

Note that the function doesn’t necessarily have to be called “writer”.

The env argument is an mwlib.wiki.Environment instance which always has the wiki attribute set to the configured WikiDB instance and the metabook attribute set to a filled-in mwlib.metabook.MetaBook instance. If images are used, the images attribute of the env object is set to the configure ImageDB instance.

The output argument is a filename of a file in which the writer should write its output.

The status_callback argument is a callable with the following signature:

def status_callback(status=None, progress=None, article=None): pass

which should be called from time to time to update the status/progress information. status should be set to a short, English description of what’s happening (e.g. “parsing”, “rendering”), progress should be an integer value between 0 and 100 indicating the percentage of progress (actually you don’t have to worry about setting it to 0 at the start and to 100 at the end, this is done by mw-render) and article should be the unicode string of the currently processed article. All parameters are optional, so you can pass only one or two of the parameters to status_callback() and the other parameters will keep their previous value.

The return value of the writer function is not used: If the function returns, this is treated as success. To indicate failure, the writer must raise an exception. Use the WriterError exception defined in mwlib.writerbase (or a subclass thereof) and instantiate it with a human readable English error message if you want the message to be written to the error file specified with the --error-file option of mw-render. For all other exceptions, the traceback is written to the error file.

Your writer function can define additional keyword arguments (indicated by the “**kwargs” above) that can be passed to the writer with the --writer-options argument of the mw-render command (see below). If the user specified a writer option with option=value, the kwarg option gets passed the string "value", if she specified a writer option just with option, the kwarg option gets passed the value True. All writer options should be optional and documented using the options attribute on the writer object (see below).

Attributes

Optionally – and preferably – this function object has the following additional attributes:

writer.description = 'Some short description'
writer.content_type = 'Content-Type of the output'
writer.file_extension = 'File extension for documents'
writer.options = {
    'foo: {
        'help': 'help text for "switch" foo',
    },
    'bar': {
        'param': 'PARAM',
        'help': 'help text for option bar with parameter PARAM',
    }
}

For example the writer “odf” (defined in mwlib.odfwriter) sets the attributes to these values:

writer.description = 'OpenDocument Text'
writer.content_type = 'application/vnd.oasis.opendocument.text'
writer.file_extension = 'odt'

and the writer “rl” from mwlib.rl (defined in mwlib.rl.rlwriter) sets the attributes to these values:

writer.description = 'PDF documents (using ReportLab)'
writer.content_type = 'application/pdf'
writer.file_extension = 'pdf'
writer.options = {
    'coverimage': {
        'param': 'FILENAME',
        'help': 'filename of an image for the cover page',
    }
}

The description is used when the list of writers is displayed with mw-render --list-writers, all information is displayed with mw-render --writer-info SOMEWRITER. The content type and file extension are written to a file, if one is specified with the --status-file argument of mw-render.

Publishing the writer

Writers are made available as plugins using setuptools entry points. They have a name and must belong to the entry point group “mwlib.writers”. To publish writers in your distribution, add all included writers to the entry group by passing the entry_points kwarg to the call to setuptools.setup() in your setup.py file:

setup(
    ...
    entry_points = {
        'mwlib.writers': [
            'foo = somepackage.foo:writer',
            'bar = somepackage.barbaz:bar_writer',
            'baz = somepackage.barbaz:baz_writer',
        ],
    },
    ...
)

Using writers

From the command line, writers can be used with the mw-render command. Called with just the --list-writers option, mw-render lists the available writers together with their description. A name of an available writer can then be passed with the --writer option to produce output with that writer. For example this will use the ODF writer (named “odf”) to produce a document in the OpenOffice Text format:

$ mw-render --config :en --writer odf --output test.odt Test

Additional options for the writer can be specified with the --writer-options argument, whose value is a “;” separated list of keywords or “key=value” pairs.

Metabooks

A Metabook describes a collection of articles and chapters together with some metadata like title or version. The actual data (e.g. the wikitext of articles) is not contained in the Metabook.

The Metabook is a simple dictionary containing lists, integers, strings (which are Unicode-safe; they are represented as unicode in Python) and other dictionaries. When read from/written to a file or sent over the network, it”s serialized in JSON format.

Metabook Types

Every dictionary contained in the Metabook (and the Metabook dicionary itself) has a type. The different types are described below. The Metabook dictionary itself has type “collection”.

Collection

type (string):

Fixed value “collection”

version (integer):

Protocol version, 1 for now

title (string, optional):

Title of the collection

subtitle (string, optional):

Subtitle of the collection

editor (string, optional):

Editor of the collection

items (list of article and/or chapter objects, can be empty):

Chapters and top-level articles contained in the collection

licenses (list of license objects):

List of licenses for articles in this collection

License

type (string)

Fixed value “license”

name (string)

Name of license

mw_license_url (string, optional)

URL to license text in wikitext format

mw_rights_page (string, optional)

Title of article containing license text

mw_rights_icon (string, optional)

URL of license icon

mw_rights_url (string, optional)

URL to license text in any format

mw_rights_text (string, optional)

Name and possibly a short description of the license

Article

type (string):

Fixed value “article”

content_type (string):

Fixed value “text/x-wiki”

title (string):

Title of this article

displaytitle (string, optional):

Title to be used in rendered output instead of the real title

revision (string, optional):

Revision of article, i.e. oldid for MediaWiki. If omitted, the latest revision is used.

timestamp (integer, optional):

UNIX timestamp (seconds since 1970-1-1) of the revision of this article

url (string):

URL to article in source wiki

authors (list of strings):

list of principal authors

source-url (string)

URL of source wiki. This URL is the key to an item in the sources dictionary in the content.json object of the ZIP file.

Chapter

type (string):

Fixed value “chapter”

title (string):

Title of this chapter

items (list of article objects, can be empty):

List of articles contained in this chapter

Source

type (string)

Fixed value “source”

system (string):

Fixed value “MediaWiki” for now

url (string, optional):

“home” URL of source, e.g. “http://en.wikipedia.org/wiki/Main_Page” (same as key for this entry)

name (string):

Unique name of source, e.g. “Wikipedia (en)”

language (string)

2-character ISO code of language, e.g. “en”

interwikimap (dictionary mapping prefixes to interwiki objects, optional)

Interwiki

Interwiki entries can describe language links and interwiki links

type (string)

Fixed value “interwiki”

prefix (string)

Prefix is MediaWiki links, i.e. the part before the “:”. This is the key in the interwikimap attribute of a source object.

url (string)

URL template, the string “$1” gets replaced with the link target (w/out prefx)

local (bool, optional)

True if the interwiki link is a “local” one

language (string, optional)

Name of the language, if this interwiki describes language links

Example

Given in JSON notation:

{
    "type": "collection",
    "version": 1,
    "title": "This is the Collection Title",
    "subtitle": "An optional subtitle",
    "editor": "Jane Doe",
    "items": [
        {
            "type": "article",
            "title": "Top-level Article",
            "content_type": "text/x-wiki"
        },
        {
            "type": "chapter",
            "title": "First Chapter",
            "items": [
                {
                    "type": "article",
                    "title": "First Article in Chapter",
                    "revision": "1234",
                    "timestamp": 122331212312,
                    "content_type": "text/x-wiki"
                    "source-url": "http://en.wikipedia.org/wiki/Main_Page",
                },
                {
                    "type": "article",
                    "title": "Second Article in Chapter",
                    "content_type": "text/x-wiki"
                    "source-url": "http://en.wikipedia.org/wiki/Main_Page",
                }
            ]
        },
    ],
    "licenses": [
        {
            "type": "license",
            "name": "GFDL",
            "mw_license_url": "http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License"
        }
    ]
}

Collection Extension for MediaWiki

About the Collection Extension

The Collection extension for MediaWiki allows users to collect articles and generate downloadable version in different formats (PDF, OpenDocument Text etc.) for article collections and single articles.

The extension has been developed for and tested with MediaWiki version 1.14 and later. Some features may not be available with older MediaWikis that don’t have the MediaWiki API enabled.

The extension is being developed under the GNU General Public License by PediaPress GmbH in close collaboration with Wikimedia Foundation and the Commonwealth of Learning.

Copyright (C) 2008-2012, PediaPress GmbH, Siebrand Mazeland, Marcin Cieślak and other contributors

Prerequisites

If you use a render server the MediaWiki API must be enabled (i.e. just don’t override the default value of true for $wgEnableApi in your LocalSettings.php).

Install PHP with cURL support

Currently Collection extension needs PHP with cURL support, see http://php.net/curl

Installation and Configuration of the Collection Extension

  • For MediaWiki versions up to and including 1.18: Download the Collection extension matching your MediaWiki version from http://www.mediawiki.org/wiki/Special:ExtensionDistributor/Collection and unpack it into your mediawiki extensions directory:

    cd /srv/http/wiki/extensions
    tar -xzf Collection-MW1.18-113990.tar.gz -C /var/www/mediawiki/extensions
    
  • For MediaWiki versions 1.19 and newer: You can checkout the newest code of the Collection extension from the Git repository into the extensions directory of your MediaWiki installation:

    cd extensions/
    git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Collection
    
  • Put this line in your LocalSettings.php:

    require_once("$IP/extensions/Collection/Collection.php");
    

If you intend to use the public render server, you’re now ready to go.

Install and Setup a Render Server

Rendering and ZIP file generation is done by a server, which can run separately from the MediaWiki installation and can be shared by different MediaWikis.

If you have a low-traffic MediaWiki you can use the public render server running at http://tools.pediapress.com/mw-serve/. In this case, just keep the configuration variable $wgCollectionMWServeURL (see below) at its default value.

Your MediaWiki must be accessible from the render server, i.e. if your MediaWiki is behind a firewall you cannot use the public render server.

If you can’t use the public render server, you’ll have to install mwlib and run your own render server. See http://mwlib.readthedocs.org/ for more information.

Finally you’ll have to set $wgCollectionMWServeURL in your LocalSetting.php:

$wgCollectionMWServeURL (string)

Set this to the URL of a render server (see above).

The default is http://tools.pediapress.com/mw-serve/, the public render server hosted by PediaPress.

Password protected wikis

Password protected wikis require some more information. You’ll have to set the $wgCollectionMWServeCredentials variable.

$wgCollectionMWServeCredentials (string)

Set this to a string of the form “USERNAME:PASSWORD” (or “USERNAME:PASSWORD:DOMAIN” if you’re using LDAP), if the MediaWiki requires to be logged in to view articles. The render server will then login with these credentials using MediaWiki API before doing other requests.

SECURITY NOTICE: If the MediaWiki and the render server communicate over an insecure channel (for example on an unencrypted channel over the internet), please DO NOT USE THIS SETTING, as the credentials will be exposed to eavesdropping!

Advanced Settings

The following variables can be set in LocalSetting.php. Most people do not have to change them:

$wgCollectionMWServeCert (string)
Filename of a SSL certificate in PEM format for the mw-serve render server. This needs to be used for self-signed certificates, otherwise cURL will throw an error. The default is null, i.e. no certificate.
$wgCollectionFormats

An array mapping names of mwlib writers to the name of the produced format. The default value is:

array(
    'rl' => 'PDF',
)

i.e. only PDF enabled. If you want to add OpenDocument Text in addition to PDF you can set $wgCollectionFormats to something like this:

$wgCollectionFormats = array(
    'rl' => 'PDF',
    'odf' => 'ODT',
);

On the public render server tools.pediapress.com, currently the following writers are available:

  • docbook: DocBook XML
  • odf: OpenDocument Text
  • rl: PDF
  • xhtml: XHTML 1.0 Transitional

If you’re using your own render server, the list of available writers can be listed with the following mwlib command:

$ mw-render --list-writers
$wgCollectionContentTypeToFilename (array)

An array matching content types to filenames for downloaded documents. The default is:

$wgCollectionContentTypeToFilename = array(
        'application/pdf' => 'collection.pdf',
        'application/vnd.oasis.opendocument.text' => 'collection.odt',
);
$wgCollectionPortletFormats (array)

An array containing formats (keys in $wgCollectionFormats) that shall be displayed as “Download as XYZ” links in the “Print/export” portlet. The default value is:

array( 'rl' );

i.e. there’s one link “Download as PDF”.

$wgCollectionHierarchyDelimiter (string or null)

If not null, treat wiki pages whose title contains the configured delimiter as subpages.

For example, to treat article [[Foo/Bar]] as subpage of article [[Foo]] set this variable to “/”. This makes sense e.g. on wikibooks.org, but it’s questionable on wikipedia.org (cf. [[AC/DC]]).

The (only) effect is that the display title for subpages in collections is set to the title of the (deepest) subpage. For example, the title of article [[Foo/Bar]] will be displayed/rendered as “Bar”.

The defaul value is null, which means that no hierarchy is assumed.

$wgCollectionArticleNamespaces (array)

List of namespace numbers for pages which can be added to a collection. Category pages (NS_CATEGORY) are always an exception (all articles in a category are added, not the category page itself). Default is:

array(
  NS_MAIN,
  NS_TALK,
  NS_USER,
  NS_USER_TALK,
  NS_PROJECT,
  NS_PROJECT_TALK,
  NS_MEDIAWIKI,
  NS_MEDIAWIKI_TALK,
  100,
  101,
  102,
  103,
  104,
  105,
  106,
  107,
  108,
  109,
  110,
  111,
);
$wgCommunityCollectionNamespace (integer)

Namespace for “community collections”, i.e. the namespace where non-personal article collection pages are saved.

Note: This configuration setting is only used if the system message Coll-community_book_prefix has not been set (see below).

Default is NS_PROJECT.

$wgCollectionMaxArticles (integer)

Maximum number of articles allowed in a collection.

Default is 500.

$wgCollectionLicenseName (string or null)

License name for articles in this MediaWiki. If set to null the localized version of the word “License” is used.

Default is null.

$wgCollectionLicenseURL (string or null)

HTTP URL of an article containing the full license text in wikitext format for articles in this MediaWiki. E.g.

$wgCollectionLicenseURL = 'http://en.wikipedia.org/w/index.php?title=Wikipedia:Text_of_the_GNU_Free_Documentation_License&action=raw';

for the GFDL. If set to null, the standard MediaWiki variables $wgRightsPage, $wgRightsUrl and $wgRightsText are used for license information.

If your MediaWiki contains articles with different licenses, make sure that each article contains the name of the license and set $wgCollectionLicenseURL to an article that contains all needed licenses.

$wgCollectionPODPartners (array or false)

Array of parameters needed to define print on demand providers:

$wgCollectionPODPartners = array(
        'pediapress' => array(
                'name' => 'PediaPress',
                'url' => 'http://pediapress.com/',
                'posturl' => 'http://pediapress.com/api/collections/',
                'infopagetitle' => 'coll-order_info_article',
        ),
);

(This is the default.)

name, url and posturl are mandatory parameters to display information on the list of available providers.

If infopagetitle is present, it will be interpreted as the MediaWiki message that contains the name of the short information on particular provider. For example, it can be coll-order_info_mypress and if the message contains Help:Books/MyPress order information, a contents of this page will be used. The message itself can be localized for different languages.

Setting $wgCollectionPODPartners to false disables ordering interface altogether.

$wgEnableWriteAPI

If you want to let users save their collections as wiki pages, make sure $wgEnableWriteAPI is set to true, i.e. put this line in your LocalSettings.php:

$wgEnableWriteAPI = true;

(This is the default.)

There are two MediaWiki rights that are checked, before users are allowed to save collections: To be able to save collection pages under the User namespace, users must have the right ‘collectionsaveasuserpage’; to be able to save collection pages under the community namespace (see $wgCommunityCollectionNamespace), users must have the right ‘collectionsaveascommunitypage’. For example, if all logged-in users shall be allowed to save collection pages under the User namespace, but only autoconfirmed users, shall be allowed to save collection pages under the community namespace, add this to your LocalSettings.php:

$wgGroupPermissions['user']['collectionsaveasuserpage'] = true;
$wgGroupPermissions['autoconfirmed']['collectionsaveascommunitypage'] = true;

You may also want to configure some of the following:

  • As the current collection of articles is stored in the session, the session timeout should be set to some sensible value (at least a few hours, maybe one day). Adjust session.cookie_lifetime and session.gc_maxlifetime in your php.ini accordingly.

  • Add a help page (for example Help:Books for wikis in English language).

    A repository of help pages in different languages can be found on Meta-Wiki.

    The name of the help page is stored in the system message Coll-helppage and can be adjusted by editing the wiki page [[MediaWiki:Coll-helppage]].

  • Add a template [[Template:saved_book]] which is transcluded on top of saved collection pages. An example for such a template can be found on the English Wikipedia: http://en.wikipedia.org/wiki/Template:Saved_book

    The name of the template can be adjusted via the system message Coll-savedbook_template, i.e. by editing [[MediaWiki:Coll-savedbook_template]].

  • To enable ZENO and Okawix export, uncomment the corresponding lines in $wgCollectionFormats (file Collection.php). These exports are devoted to the Wikimedia projects and their mirrors.

    They cannot be used on other wikis since they get data and search engine indexes from the cache of wikiwix.com.

Customization via System Messages

There are several system messages, which can be adjusted for a MediaWiki installation. They can be changed by editing the wiki page [[MediaWiki:SYSTEMMESSAGENAME]], where SYSTEMMESSAGENAME is the name of the system message.

  • Coll-helppage: The name of the help page (see above).

    The default for English language is Help:Books, and there exist translations for lots of different languages.

  • Coll-user_book_prefix: Prefix for titles of “user books” (i.e. books for personal use, as opposed to “community books”). If the system message is empty or ‘-‘ (the default), the title of user book pages is constructed as User:USERNAME/Books/BOOKTITLE. If the system message is set and its content is PREFIX, the title of user book pages is constructed by directly concatenating PREFIX and the BOOKTITLE, i.e. there’s no implicitly inserted ‘/’ inbetween!

  • Coll-community_book_prefix: Prefix for titles of “community books” (cf. “user books” above). If the system message is empty or ‘-‘ (the default), the title of community pages is constructed as NAMESPACE:Books/BOOKTITLE, where NAMESPACE depends on the value of $wgCommunityCollectionNamespace (see above). If the system message is set and its content is PREFIX, the title of community book pages is constructed by directly concatenating PREFIX and BOOKTITLE, i.e. there’s no implicitly inserted ‘/’ inbetween. Thus it’s possible to define a custom namespace ‘Book’ and set the system message to ‘Book:’ to produce community book page titles Book:BOOKTITLE.

  • Coll-savedbook_template: The name of the template (w/out the Template: prefix) included at the top of saved book pages (see above).

    The default is: saved_book, and there exist translations for lots of different languages.

  • Coll-bookscategory: Name of a category (w/out the Category: prefix) to which all saved book pages should be added (optional, set to an empty value or “-” to turn that feature off).

  • Coll-book_creator_text_article: The name of a wiki page which is transcluded on the “Start book creator” page (the page which is shown when a user clicks on “Create a book”).

    The default is: {{MediaWiki:Coll-helppage}}/Book creator text i.e. a subpage of the configured help page named “Book creator text”

  • Coll-suggest_enabled: If set to 1, the suggestion tool is enabled. Any other value will disable the suggestion tool.

    The default is: ‘1’, i.e. the suggestion tool is enabled.

  • Coll-order_info_article: The name of a wiki page which is included on the Special:Book page to show order information for printed books.

    The default value is: {{MediaWiki:Coll-helppage}}/PediaPress order information i.e. a subpage of the configured help page named “PediaPress order information”.

    This wiki page is used only if included in the $wgCollectionPODPartners configuration.

  • Coll-rendering_page_info_text_article: The name of a wiki page with additional informations to be displayed when single pages are being rendered.

  • Coll-rendering_collection_info_text_article: The name of a wiki page with additional informations to be displayed when collections are being rendered.

Changelog

mwlib

2016-02-18 mwlib 0.15.16

  • ignore hatnote
  • implement TIME Tag extension for wp.fr

2014-02-19 mwlib 0.15.15

  • catch IOError when reading image sizes
  • prevent race conditions in rlwriter.toc by using a tmp directory

2014-01-13 mwlib 0.15.14

  • set user agent from environment variable MWLIB_USER_AGENT

2014-01-09 mwlib 0.15.13

  • add –disable-all-writers argument to nserve
  • add note about professional support
  • adapt bot filtering a bit

2013-11-11 mwlib 0.15.12

  • workaround ‘first run’ tox issue
  • fix tests with new wsgi_intercept 0.6 and require that version
  • Match IPv6 addresses as anonymous users
  • handle __NOGLOSSARY__ magicword
  • Fix #37

2013-08-09 mwlib 0.15.11

  • fix possible problems on solaris containers in init_tmp_cleaner
  • don’t waste people’s lifetime in init_tmp_cleaner
  • fix xnet tests
  • use os.urandom in utils.uid
  • generate junitxml files
  • remove empty reference nodes

2013-07-04 mwlib 0.15.10

  • add some tests for purge_cache
  • don’t step into nested directories in purge_cache
  • catch errors while examining directory in purge_cache
  • only log error if it’s not ENOENT in purge_cache
  • Add –serve-files-address parameter to nslave.
  • make make-manifest useable as pre-commit hook
  • is_good_baseurl(): eliminate some false positives

2013-07-02 mwlib 0.15.9

  • set timeout for makezip in postman
  • remove more template blacklisting/template exclusion handling code
  • get rid of template blacklisting/print templates in nslave and postman
  • mention that template blacklisting and print templates do not work anymore.
  • use tox 1.5’s whitelist_externals in order to suppress warnings
  • fix imports in test_nserve.py and move it to tests directory

2013-04-23 mwlib 0.15.8

  • do not install pil in tox testenv
  • install Pillow
  • also fetch used images when fetching ‘redirected revisions’

2013-04-23 mwlib 0.15.7

  • remove explicitly positioned nodes regardless of nesting level. fix bug where children were skipped and not removed

2013-03-26 mwlib 0.15.6

  • fix redirect handling when fetching by articles by revision

2013-03-26 mwlib 0.15.5

  • fix redirect handling

2013-03-26 mwlib 0.15.4

  • fix missing img attribute translation
  • remove duplicate coordinates

2013-03-12 mwlib 0.15.3

  • fix nserve, nslave, postman

2013-03-12 mwlib 0.15.2

  • use post request when posting text to action=expandtemplates

2013-03-12 mwlib 0.15.1

  • fix mw-serve-ctl

2013-03-12 mwlib 0.15.0

Note

you’ll have to adapt your start scripts, some programs have been renamed!

Note

Unfortunately the ‘template blacklisting’ and ‘print templates’ functionality had to be removed in order to support the scribunto extension. The documentation has not been updated and may still mention those features.

  • nslave.py, nserve.py, postman.py have been renamed to nslave, nserve and postman
  • require python 2.6, python 2.5 isn’t supported anymore
  • fetch expanded articles
  • force pyparsing < 2
  • remove open street maps used in wikivoyage - they can’t be rendered currently
  • fix for missing revid attribute
  • fix and improve wikivoyage tagextensions
  • allow item lists in div
  • transform single-col, single-row table into div, even if it is an “infobox”
  • tweak region lists for wikivoyage
  • fix bug for article http://en.wikivoyage.org/wiki/Africa (and possibly more from wikivoyage)
  • quick hack to expand the {{REVISIONID}}

2012-12-04 mwlib 0.14.3

  • prefer UTF-8 locales for use in formatnum

2012-12-03 mwlib 0.14.2

  • remove byte order mark (bom) in _do_request
  • return unicode from formatnum
  • improve table border code
  • add noprint css class “rellink”

2012-09-24 mwlib 0.14.1

  • implement locale aware formatnum
  • implement wikipedia’s braindamaged scientific notation
  • adapt single col splitting heuristics of treecleaner

2012-06-18 mwlib 0.14.0

  • get rid of the _Version class, up version to 0.14.0
  • install scripts via plain old distutils instead of “console_scripts” entry point
  • remove cdbwiki
  • remove mwlib.xfail, use pytest.mark.xfail instead
  • expect setuptools or distribute to be installed
  • remove some problematic dependencies in PP_MAINTAINER mode

2012-06-18 mwlib 0.13.11

  • skip checkpil if PP_MAINTAINER is set
  • relax simplejson requirement a bit
  • fix content disposition header when filenames contain commas
  • make it easier to test the content disposition logic

2012-06-17 mwlib 0.13.10

  • fix handling of filenames with spaces

2012-06-17 mwlib 0.13.9

  • use filenames derived from content for downloads
  • synchronize documentation with MediaWiki

2012-06-11 mwlib 0.13.8

  • do not embed apipkg anymore
  • make sure temp files are removed even if mw-render is killed

2012-05-08 mwlib 0.13.7

  • unconditionally require simplejson
  • workaround a inspect module bug
  • fix pypi url used by tox
  • improve transformSingleColTables in treecleaner
  • expose DumpParser’s redirect-ignoring functionality as an optional boolean command-line flag to mw-buildcdb

2012-03-07 mwlib 0.13.6

  • make mw-zip -gg post test.pediapress.com
  • implement protocol relative urls in named links

2012-02-29 mwlib 0.13.5

  • simplify the brain-damaged iferror_rx regular expression, fixes #10
  • support syntaxhighlight nodes

2012-02-15 mwlib 0.13.4

  • require qserve >= 0.2.7 in order to be compatible with the latest gevent
  • move our custom argument parser to mwlib
  • prefer simplejson to json
  • allow nserve to listen on a specific interface with -i/–interface
  • fix styleutils: limit rgb values to [0,1]
  • remove mw-watch in setup.py

2012-01-12 release 0.13.3

  • fix pagename when expanding <pages> tag
  • handle the case where NAMESPACE is called as a template
  • get rid of lxml warnings

2012-01-11 release 0.13.2

  • add support for adding spacing for cjk text
  • add initial support for the pages tag
  • protect page-break info from removal in divs and spans

2011-12-13 release 0.13.1

  • replaced mw-serve with nserve.py
  • removed CGI support
  • removed lots of obsolete code
  • updated documentation, available online at http://mwlib.readthedocs.org

2011-10-24 release 0.12.17

  • handle siteinfo without “magicwords” key in templ.parser
  • use gevent instead of twisted in mw-zip/mw-render
  • show memory usage in mw-zip
  • use sqlite3dmb to store html
  • fix directionality of math nodes for RTL documents

2011-08-31 release 0.12.16

  • remove xhtmlwriter
  • remove docbookwriter
  • fix_wikipedia_siteinfo for kdb, ltg and xmf
  • remove zipwiki
  • implement safesubst
  • match noinclude and onlyinclude tags with whitespace
  • bail out when running setup.py with an unsupported python version

2011-08-12 release 0.12.15

  • require lxml.
  • dont switch fonts for direction switch chars lrm/rlm
  • set teletype style by css
  • fix rtl direction check bug
  • quick fix in order to support the kbd tag.
  • fix switch statements with localized #default case.
  • dont remove direction switching nodes
  • resolve aliases when expanding templates.
  • support localized parser functions.
  • make tests work with latest py.test 2.1.
  • add support for css direction switching
  • Code and Var nodes now use teletype style
  • be more verbose when collection params can not be retrieved
  • fix subpage links (bugzilla #28055)
  • fix for https://bugzilla.wikimedia.org/show_bug.cgi?id=29354
  • dont die on treecleaner errors
  • remove paragraphs from galleries
  • add license templates
  • get rid of some more parsing calls
  • cache img display info in licensehandler
  • speed up getting template args (for licensehandling)
  • always show full text of contributors of images
  • fix for getAllDisplayText
  • add nofilter to licensehandling
  • make licensechecker less fragile to bad config format
  • improve image license handling
  • improve stats for licensechecker
  • add custom element to metabook
  • dont throw away collapsible boxes. fixes: #935
  • decrease api_request_limit
  • limit max. simultaneous img downloads to 15
  • moar categories. less whitespace. untangle revision/category fetching
  • increase standard resolution of images
  • fix getting html with revisions
  • clean up after fixNesting
  • fetch extension images
  • prevent adding same api url twice
  • retry failed img downloads
  • workaround for missing descriptionurl
  • fix: descriptionurl returned from api seems be “false” sometimes.
  • fix for #925. make syntaxhighlighting work again
  • fix for #755
  • support older mediawikis
  • add lower bound on word splitting hints
  • mwlib.refine: parse <caption> tags inside tables
  • be more generous when trying to detect see also
  • fix for “See Also “Section removal
  • fix #905: remove See also sections.
  • remove edit links
  • magics.py: handle second argument to fullurl magic function.
  • convert tiff images to png
  • fix for infobox detection
  • handle Abbreviation node in xhtmlwriter
  • add Abbreviation node
  • improve table splitting

2010-10-29 release 0.12.14

  • magics.py: fix NS magic function.
  • refine/core.py: do not parse links if link target would contain newlines.
  • setup.py: require lockfile==0.8.
  • add xr formatting in #time
  • replace mwlib.async with qserve package.
  • move fontswitcher to writer dir
  • remove collapsible elements
  • fix for #830
  • move gallery nodes out of tables.
  • handle overflow:auto crap
  • fix for reference handling
  • better handling for references nodes.
  • fix for ReferenceLists
  • fix whitespace handling and implicit newlines in template arguments. fixes http://code.pediapress.com/wiki/ticket/877.
  • Add support for more PageMagic as per http://meta.wikimedia.org/wiki/Help:Magic_words
  • Fix PageMagic to consider page as argument
  • fetch parsed html from mediawiki and store it as parsed_html.json. We store the raw result from mediawiki since it’s not clear what’s really needed.
  • make mwapi work for non query actions.

2010-7-16 release 0.12.13

  • omit passwords from error file
  • make login work with latest mediawiki.
  • use content_type, not content-type in metabooks
  • filter crap from ref node names
  • try to set GDFONTPATH to some sane value. call EasyTimeline with font argument.
  • do not scale easytimeline images after rendering rather scale then in EasyTimeline.pl
  • update EasyTimeline to 1.13
  • another fix for nested references
  • fix for broken tables
  • make #IFEXIST handle images
  • add treecleaner method to avoid large cells
  • fix img alignment
  • fix nesting of section with same level
  • do not let tablemode get negative.
  • fix #815
  • call fix_wikipedia_siteinfo based on contents of server (instead of sitename)
  • workaround for broken interwikimap. fixes #807
  • handle the case, where the <br> ends up in a new paragraph. fixes #804
  • move the poem tag implementation to mwlib.refine.core and make it expand templates
  • add #ifeq node. fixes #800
  • fix for images with spaces in file extensions
  • fix and test for #795
  • pull tables out of DefinitionDescriptions
  • add getVerticalAlign to styleutils
  • remove tables from image captions
  • remove –clean-cache option to mw-serve
  • allow floats as –purge-cache argument
  • workaround for buggy lockfile module.
  • implement DISPLAYTITLE
  • generate higher resolution timelines
  • handle abbr and hiero tags
  • make sure print_template_pattern is written to nfo.json, when getting it as part of the collection params
  • relax odfpy requirement a bit
  • make hash-mark only links work again
  • remove empty images

2009-12-16 release 0.12.12

  • dont remove sections containing only images.
  • improve handling of galleries
  • fix use of uninitialized last variable
  • do not ‘split’ links when expanding templates
  • quick workaround for http://code.pediapress.com/wiki/ticket/754

2009-12-8 release 0.12.11

  • beware python 2.4 is not supported anymore
  • parse paragraphs before spans
  • parse named urls before links.
  • fix urllinks inside links
  • fix named urls inside double brackets
  • avoid splitting up Reference nodes.
  • parse lines/lists before span.
  • add getScripts method. improve rtl compat. for fontswitching
  • do not replace uniq strings with their content when preprocessing gallery tags. fixes e.g. ref tags inside gallery tags.
  • run template expansion for each line in gallery tags
  • handle mhr, ace, ckb, mwl interwiki links
  • add clearStyles method
  • add another condition to avoid single col tables in border-boxes
  • refactor node style handling
  • remove fixInfoBoxes from treecleaner
  • fix for identifiying image license information
  • handle closing ul/ol tags inside enumerations
  • correctly determine text alignment of node.
  • fix for image only table check
  • add code for simple rpc servers/clients based on the gevent library.
  • add flag for split itemlists
  • do not blacklist articles
  • add upper limit for font sizes

2009-10-20 release 0.12.10

  • fix race condition when fetching siteinfo
  • introduce flag to suppress automatic escaping when cleaning text
  • sent error mails only once
  • add ‘pageby’, ‘uml’, ‘graphviz’, ‘categorytree’, ‘summary’ to list of tags to ignore

2009-10-13 release 0.12.9

  • fix #709
  • allow higher resolution in math formulas
  • fetch collection parameters and use them (template exclusion category,…)
  • fix #699
  • fix <ref> inside table caption
  • refactor filequeue
  • adjust table splitting parameter
  • move invisible, named references out of table nodes
  • fix late #if
  • fix bug with inputboxes
  • fix parsing of collection pages: titles/subtitles may but do not need to have spaces
  • use new default license URL
  • fix race condition in mw-serve/mw-watch

2009-9-25 release 0.12.8

  • fix argument handling in mw-serve Previously it had been possible to overwrite any file by passing arguments containing newlines to mw-serve.

2009-9-23 release 0.12.7

  • ensure that files extracted from zip files end up in the destination directory.

2009-9-15 release 0.12.6

  • fix for reference nodes
  • allow most characters in urls
  • fix for setting content-length in response
  • fix problem with blacklisted templates creating preformatted nodes (#630)
  • do not split preformatted nodes on non-empty whitespace only lines
  • do not create preformatted nodes inside li tags
  • pull garbage out of table rows. fix #17.
  • dont remove empty spans if an explicit size is given.
  • uncomment fix_wikipedia_siteinfo and add pnb as interwiki link
  • remove mwxml writer.
  • add mw-version program

2009-9-8 release 0.12.5

  • fix missing page case in get_page when looking for redirects
  • some minor bugfixes

2009-8-25 release 0.12.3

  • better compatibility with older mediawiki installations

2009-8-18 release 0.12.2

  • fix status callbacks to pod partner

2009-8-17 release 0.12.1

  • added mw-client and mw-check-service
  • mw-serve-ctl can now send report mails
  • fixes for race conditions in mwlib.filequeue (mw-watch)
  • lots of other improvements…

2009-5-6 release 0.11.2

  • fixes

2009-5-5 release 0.11.1

  • merge of the nuwiki branch: better, faster resource fetching with twisted_api, new ZIP file format with nuwiki

2009-4-21 release 0.10.4

  • fix chapter handling
  • fix bad #tag params

2009-4-17 release 0.10.3

  • fix issue with self-closing tags
  • fix issue with “disappearing” table rows

2009-4-15 release 0.10.2

  • fix for getURL() method in zipwiki

2009-4-9 release 0.10.1

  • the parser has been completely rewritten (mwlib.refine)
  • fix bug in recorddb.py: do not overwrite articles
  • removed mwapidb.WikiDB.getTemplatesForArticle() which was broken and wasn’t used.

2009-3-5 release 0.9.13

  • normalize template names when checking against blacklist
  • make NAMESPACE magic work for non-main namespaces
  • make NS template work

2009-03-02 release 0.9.12

  • fix template expansion bug with non self-closing ref tags containing equal signs

2009-2-25 release 0.9.11

  • added –print-template-pattern
  • fix bug in LOCALURLE with non-ascii characters (#473)
  • fix ‘upright’ image modifier handling (#459)
  • allow star inside URLs (#483)
  • allow whitespace in image width modifiers (#475)

2009-2-19 release 0.9.10

  • do not call check() in zipcreator: better some missing articles than an error message

2009-2-18 release 0.9.8

  • localize image modifiers
  • fix bug in serve with forced rendering
  • fix bug in writerbase when no URL is returned
  • return only unqiue image contributors, sorted
  • #expr with whitespace only argument now returns the empty string instead of marking the result as an error.
  • added mw-serve-ctl command line tool (#447)
  • mwapidb: omit title in URLs with oldid
  • mwapidb: added getTemplatesForArticle()
  • zipcreator: check articles and sources to prevent broken ZIP files
  • mwapidb: do query continuation to find out all authors (#420)
  • serve: use a deterministic checksum for metabooks (#451)

2009-2-9 release 0.9.7

  • fix bug in #expr parsing
  • fix bug in localised namespace handling/#ifexist
  • fix bug in redirect handling together with specific revision in mwapidb

2009-2-3 release 0.9.6

  • mwapidb: return authors alphabetically sorted (#420)
  • zipcreator: fixed classname from DummyScheduler to DummyJobScheduler; this bug broke the –no-threads option
  • serve: if rendering is forced, don’t re-use ZIP file (#432)
  • options: remove default value “Print” from –print-template-prefix
  • mapidb: expand local* functions, add them to source dictionary
  • expander: fix memory leak in template parser (#439)
  • expander: better noinclude, includeonly handling (#426)
  • expander: #iferror now uses a regular expression (#435)
  • expander: workaround dateutils bug (resulting in a TypeError: unsupported operand type(s) for +=: ‘NoneType’ and ‘int’)

2009-1-26 release 0.9.5

  • initial release

Indices and tables