Installing Cyrus Backups

    Cyrus Backups are a replication-based backup service for Cyrus IMAP
    servers. This is currently an experimental feature. If you have the
    resources to try it out alongside your existing backup solutions,
    feedback would be appreciated. 

Introduction and assumptions

    This document is intended to be a guide to the configuration and
    administration of Cyrus Backups.

    This document is a work in progress and at this point is incomplete. 

    This document assumes that you are familiar with compiling, installing,
    configuring and maintaining Cyrus IMAP servers generally, and will only
    discuss backup-related portions in detail. 

    This document assumes a passing familiarity with Cyrus Replication. 

Limitations

    Cyrus Backups are experimental and incomplete. 

    The following components exist and appear to work: 

      * backupd, and therefore inbound replication 
      * autovivification of backup storage for new users, with automatic
        partition selection 
      * rebuilding of backup indexes from backup data files 
      * compaction of backup files to remove stale data and combine chunks
        for better compression 
      * deep verification of backup file/index state 
      * examination of backup data 
      * locking tool, for safe non-cyrus operations on backup files 

    The following components don't yet exist in a workable state -- these
    tasks must be performed manually (with care):

      * recovery of data back into a Cyrus IMAP server 
      * reconstruction of backups.db from backup files

    The following types of information are currently backed up:

      * mailbox state and annotations 
      * messages 
      * mailbox message records, flags, and annotations 

    The following types of information are not currently backed up:

      * sieve scripts 
      * subscriptions 
      * quota information 
      * seen data (?) other than the basic \Seen flag attached to a mailbox
        message record 

Architecture

    Cyrus Backups are designed to run on one or more standalone, dedicated
    backup servers, with suitably-sized storage partitions. These servers
    generally do not run an IMAP daemon, nor do they have conventional
    mailbox storage. 

    Your Cyrus IMAP servers synchronise mailbox state to the Cyrus Backup
    server(s) using the Cyrus replication (aka sync, aka csync) protocol. 

    Backup data is stored in two files per user: a data file, containing
    gzipped chunks of replication commands; and an SQLite database, which
    indexes the current state of the backed up data. User backup files are
    stored in a hashed subdirectory of their containing partition. 

    A twoskip database, backups.db, stores mappings of users to their
    backup file locations.

Installation

Requirements

      * At least one Cyrus IMAP server, serving and storing user data. 
      * At least one machine which will become the first backup server. 

Cyrus Backups server

     1. Compile Cyrus with the --enable-backup configure option and install
        it. 
     2. Set up an imapd.conf file for it with the following options
        (default values shown): 

        backup_db: twoskip 
        backup_db_path: {configdirectory}/backups.db 
                The backups db contains a mapping of user ids to their
                backup locations.
        backup_staging_path: {temp_path}/backup 
                Directory to use for staging message files during backup
                operations. The replication protocol will transfer as many
                as 1024 messages in a single sync operation, so,
                conservatively, this directory needs to contain enough
                storage for 1024 * your maximum message size * number of
                running backupd's, plus some wiggle room. 
        backup_retention_days: 7 
                Number of days for which backup data (messages etc) should
                be kept within the backup storage after the corresponding
                item has been deleted/expunged from the Cyrus IMAP server. 
        backuppartition-name: /path/to/this/partition 
                You need at least one backuppartition-name to store backup
                data. These work similarly to regular/archive IMAP
                partitions, but note that there is no relationship between
                backup partition names and regular/archive partition names.
                New users will have their backup storage provisioned
                according to the usual partition selection rules.
        backup_compact_minsize: 0 
                The ideal minimum data chunk size within backup files, in
                kB. The compact tool will try to combine chunks that are
                smaller than this into neighbouring chunks. Larger values
                tend to yield better compression ratios, but if the data is
                corrupted on disk, the entire chunk will become unreadable.
                Zero turns this behaviour off. 
        backup_compact_maxsize: 0 
                The ideal maximum data chunk size within backup files, in
                kB. The compact tool will try to split chunks that are
                larger than this into multiple smaller chunks. Zero turns
                this behaviour off. 
        backup_compact_work_threshold: 1 
                The number of chunks within a backup file that must
                obviously need compaction before the compact tool will
                attempt to compact the file. Larger values are expected to
                reduce compaction I/O load at the expense of delayed
                recovery of storage space. 

     3. Create a user for authenticating to the backup system, and add it
        to the admins setting in imapd.conf.
     4. Add appropriate sasl_* settings for your authentication method to
        imapd.conf.
     5. Set up a cyrus.conf file for it: 

          * In the SERVICES section, arrange for backupd to run:
            backupd cmd="backupd" listen="csync" prefork=0 
          * You probably don't need any other SERVICES entries.
          * In the EVENTS section, arrange for compaction to occur at some
            interval(s):
            compact cmd="ctl_backups compact -A" at=0400 

     6. Start up the server, and use synctest to verify that you can
        authenticate to backupd.

Cyrus IMAP servers

    Your Cyrus IMAP servers must be running version 3 or later of Cyrus,
    and must have been compiled with the --enable-replication configure
    option. They do not need to be recompiled with the --enable-backup
    option.

    It's recommended to set up a dedicated replication channel for backups,
    so that your backup replication can run independently of your other
    replication configurations.

    Add settings to imapd.conf like: 

    sync_log_channels: channel 
            Add a new channel "channel" to whatever was already here. We
            suggest calling it "backup".
    sync_log: 1 
            Enable sync log (if it wasn't already enabled) if you want
            rolling replication to the backup server.
    channel_sync_host: backup-server.example.com 
            The host name of your Cyrus Backup server.
    channel_sync_port: csync 
            The port on which your Cyrus Backup server's backupd process
            listens.
    channel_sync_authname: ... 
    channel_sync_password: ... 
            Credentials for authenticating to the Cyrus Backup server.
    channel_sync_repeat_interval: 1 
            Minimum time in seconds between rolling replication runs. A
            smaller value means more up-to-date backups but more network
            I/O. A larger value reduces I/O.

    Update cyrus.conf to arrange for replication to occur. If you want to
    use rolling replication, add a sync_client invocation to the SERVICES
    section specifying (at least) the -r and -n channel options. 

    If you want to use scheduled replication, add sync_client invocations
    to the EVENTS section (or cron, etc), specifying at least the -n
    channel option (to use the channel-specific configuration), plus
    whatever other options you need for selecting users to back up. See the
    sync_client manpage for details. 

Administration

Storage requirements

    It's not really known yet how to predict the storage requirements for a
    backup server. Experimentation in a dev environment suggests that
    compressed backup files are around 20-40% of the size of the backed up
    data, depending on compact settings, but this is with relatively tiny
    mailboxes and non-pathological data.
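
    As a rough illustration of that 20-40% figure, the following Python
    sketch turns a total mailbox size into a low/high range. The function
    name and defaults are hypothetical, and the percentages are only the
    dev-environment observations quoted above, not a guarantee for real
    workloads.

```python
def estimate_backup_storage(mailbox_bytes, low_percent=20, high_percent=40):
    """Return a (low, high) estimate, in bytes, of compressed backup size.

    The default percentages are the dev-environment figures quoted in the
    text; real mailboxes may compress very differently.
    """
    if mailbox_bytes < 0:
        raise ValueError("mailbox_bytes must not be negative")
    # Integer arithmetic avoids floating-point rounding surprises.
    low = mailbox_bytes * low_percent // 100
    high = mailbox_bytes * high_percent // 100
    return low, high


if __name__ == "__main__":
    total = 500 * 2**30  # hypothetical: 500 GiB of mailbox data
    low, high = estimate_backup_storage(total)
    print(f"plan for roughly {low // 2**30} to {high // 2**30} GiB")
```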

    The backup staging spool conservatively needs to be large enough to
    hold an entire sync's worth of message files at once: your maximum
    message size * 1024 messages * the number of backupd processes you're
    running, plus some wiggle room. In practice it is unlikely to hit this
    limit unless someone is deliberately trying to. (Most users, I suspect,
    don't have 1024 maximum-sized messages in their account, or don't
    receive them all at once anyway.)
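
    The staging spool arithmetic can be sketched in Python as below. The
    function name and the 10% headroom default are assumptions made for
    illustration (the text only says "some wiggle room"); the 1024 figure
    is the per-sync message limit described above.

```python
MESSAGES_PER_SYNC = 1024  # upper bound on messages per sync operation


def staging_space_bytes(max_message_bytes, backupd_processes,
                        headroom_percent=10):
    """Conservative staging spool size in bytes.

    headroom_percent is an assumed stand-in for "wiggle room"; choose
    whatever margin suits your environment.
    """
    base = MESSAGES_PER_SYNC * max_message_bytes * backupd_processes
    return base + base * headroom_percent // 100


if __name__ == "__main__":
    # Hypothetical site: 50 MiB maximum message size, 4 backupd processes.
    needed = staging_space_bytes(50 * 1024 * 1024, 4)
    print(f"staging spool: about {needed / 2**30:.1f} GiB")
```

    With those hypothetical inputs the worst case works out to 200 GiB
    before headroom, which shows how quickly the conservative bound grows
    with the number of backupd processes.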

    Certain invocations of ctl_backups and cyr_backup also require staging
    spool space, due to the way replication protocol (and thus backup data)
    parsing handles messages. Keep this in mind when sizing the staging
    spool.

Initial backups

    Once a Cyrus Backup system is configured and running, new users that
    are created on the IMAP servers will be backed up seamlessly without
    administrator intervention. 

    The very first backup taken of a pre-existing mailbox will be big --
    the entire mailbox in one hit. When initially provisioning a Cyrus
    Backup server for an existing Cyrus IMAP environment, it's suggested
    that you run the sync_client commands carefully, for a small group of
    mailboxes at a time, until all/most of your mailboxes have been backed
    up at least once. If you wish, also run ctl_backups compact on the
    backups to break up big chunks. Only then should you enable
    rolling/scheduled replication.
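
    One way to stage the initial backups in small groups is to generate
    the sync_client command lines in batches and run them one at a time.
    The Python sketch below only builds the commands; it does not execute
    anything. It assumes a channel named "backup" and uses the -n (channel)
    and -u (user mode) options of sync_client; the user names and batch
    size are hypothetical. Check the sync_client man page for your version
    before relying on these options.

```python
import shlex


def batched(items, size):
    """Yield consecutive slices of items, each at most size long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sync_client_commands(users, channel="backup", batch_size=10):
    """Build one sync_client argument list per batch of users."""
    return [
        ["sync_client", "-n", channel, "-u", *batch]
        for batch in batched(list(users), batch_size)
    ]


if __name__ == "__main__":
    users = [f"user{n}" for n in range(1, 6)]  # hypothetical user ids
    for argv in sync_client_commands(users, batch_size=2):
        # Print rather than run, so each batch can be reviewed first.
        print(shlex.join(argv))
```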

Restoring from backups

    There's no dedicated tooling for this (yet). For now, you need to use
    cyr_backup invocations to extract the relevant information, then
    massage it back into the user's mailbox(es) by hand (and then probably
    reconstruct). 

File locking

    All backupd/ctl_backups/cyr_backup operations first obtain a lock on
    the relevant backup file. ctl_backups and cyr_backup will try to do
    this without blocking (unless told otherwise), whereas backupd will
    block.

    This means, for now, that if you hold a lock on a user's backup while
    working on it, backup replication will stall when backupd next tries
    to back up that user, until the lock is released.

    It's anticipated that in the future backupd will (configurably) not
    block waiting for a lock, but sync_client doesn't currently know how to
    deal with a MAILBOX_LOCKED response, so backupd needs to block and wait
    for now. 

Moving backup files to different backup partitions

    There's no tool for this (yet). To do it manually, stop backupd, copy
    the files to the new partition, then use cyr_dbtool to update the
    user's backups.db entry to point to the new location. Run ctl_backups
    verify on both the new filename (-f mode) and the user's userid (-u
    mode) to ensure everything is okay, then restart backupd. 

Provoking a backup for a particular user/user group/everyone/etc right now

    Just run sync_client by hand with appropriate options (as the cyrus
    user, of course). See its man page for ways of specifying items to
    replicate.

What about tape backups?

    As long as backupd, ctl_backups and cyr_backup are not currently
    running (and assuming no-one's poking around in things otherwise), it's
    safe to take/restore a filesystem snapshot of backup partitions. So to
    schedule, say, a nightly tape dump of your Cyrus Backup server, make
    your cron job shut down Cyrus, make the copy, then restart Cyrus. 

    Meanwhile, your Cyrus IMAP servers are still online and available.
    Regular backups will resume once your backupd is running again. 

    If you can work at a finer granularity than file system, you don't need
    to shut down backupd. Just use ctl_backups lock to hold a lock on each
    backup while you work with its files, and the rest of the backup system
    will work around that. 

    Restoring is more complicated, depending on what you actually need to
    do: when you restart the backupd after restoring a filesystem snapshot,
    the next time your Cyrus IMAP server replicates to it, the restored
    backups will be brought up to date. Probably not what you wanted -- so
    don't restart backupd until you've done whatever you were doing. 

Multiple IMAP servers, one backup server

    This is fine, as long as each user being backed up is only being backed
    up by one server (or they are otherwise synchronised). If IMAP servers
    have different ideas about the state of a user's mailboxes, one of
    those will be in sync with the backup server and the other will get a
    lot of replication failures. 

Multiple IMAP servers, multiple backup servers

    Make sure the sync_client configuration(s) on each IMAP server know
    which users are being backed up to which backup servers, and select
    them appropriately. See the sync_client man page for options for
    specifying users, and run it as an event (rather than rolling).

    Or just distribute it at server granularity, such that backup server A
    serves IMAP servers A, B and C, and backup server B serves IMAP servers
    D, E, F, etc. 

One IMAP server, multiple backup servers

    Configure one channel plus one rolling sync_client per backup server,
    and your IMAP server can be more or less simultaneously backed up to
    multiple backup destinations. 
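
    The per-channel settings follow the naming pattern shown earlier
    (channel_sync_host, channel_sync_port and so on, with "channel"
    replaced by the channel name), so they can be generated for several
    backup servers at once. The Python sketch below renders such a
    fragment. The channel names, host names and credentials are
    hypothetical placeholders, and it assumes sync_log_channels takes a
    space-separated list; any channels you already have configured must be
    kept in that list too.

```python
def render_channel_settings(channels):
    """Render imapd.conf lines for one backup channel per backup server.

    channels maps a channel name to a dict with the keys host, authname
    and password, plus optional port and repeat_interval.
    """
    lines = [
        "sync_log: 1",
        # Assumption: space-separated channel list; merge with existing ones.
        "sync_log_channels: " + " ".join(channels),
    ]
    for name, cfg in channels.items():
        lines.append(f"{name}_sync_host: {cfg['host']}")
        lines.append(f"{name}_sync_port: {cfg.get('port', 'csync')}")
        lines.append(f"{name}_sync_authname: {cfg['authname']}")
        lines.append(f"{name}_sync_password: {cfg['password']}")
        interval = cfg.get("repeat_interval", 1)
        lines.append(f"{name}_sync_repeat_interval: {interval}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_channel_settings({
        # Hypothetical servers and placeholder credentials.
        "backup1": {"host": "backup-a.example.com",
                    "authname": "backupuser", "password": "CHANGE-ME"},
        "backup2": {"host": "backup-b.example.com",
                    "authname": "backupuser", "password": "CHANGE-ME"},
    }), end="")
```

    Each channel then needs its own rolling sync_client invocation with
    the matching -n channel option, as described above.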

Reducing load

    To reduce load on your client-facing IMAP servers, configure sync log
    chaining on their replicas and let those take the load of replicating
    to the backup servers. 

    To reduce network traffic, do the same thing, specifically using
    replicas that are already co-located with the backup server. 

Other setups

    The use of the replication protocol and sync_client allows a lot of
    interesting configuration possibilities to shake out. Have a rummage in
    the sync_client man page for inspiration. 

Tools

ctl_backups

    This tool is generally for mass operations that require few/fixed
    arguments across multiple/all backups.

    Supported operations: 

    compact Reduce backups' disk usage by: 

              * combining small chunks for better gzip compression --
                especially important for hot backups, which produce many
                tiny chunks 
              * removing deleted content that has passed its retention
                period 

            Note that the original backup/index files are preserved (with a
            timestamped filename), so that in the event of compact
            bugs/failures, data is not lost. But this also means that
            compact actually increases disk usage in practice, until the
            old files are cleaned up. This will probably be automated in
            some way once things are stable and reliable. 
    list    List known backups. Add more -v's for more detail. 
    lock    Lock a single backup, so you can safely work on it with
            non-cyrus tools. (This may be moved into cyr_backup at some
            point.) 
    reindex Regenerate indexes for backups from their data files. Useful if
            an index becomes corrupted by some bug, or invalidated by working
            on data with non-cyrus tools. Note that the original index file
            is preserved (with a timestamped filename), so that in the
            event of reindex bugs/failures, data is not lost. But this also
            means that reindex increases disk usage in practice, until the
            old files are cleaned up. This will probably be automated in
            some way once things are stable and reliable. 
    verify  Deep verification of backups. Verifies that: 

              * Checksums for each chunk in index match data 
              * Mailbox states are in the chunk that the index says they're
                in 
              * Mailbox states match indexed states 
              * Messages are in the chunk the index says they're in 
              * Message data checksum matches indexed checksums 

    There's no man page yet, but run it without arguments to see a full
    usage summary.

cyr_backup

    This tool is generally for operations on a single mailbox that require
    multiple additional arguments.

    Supported operations:

    list [ chunks | mailboxes | messages | all ] 
            Line-per-item listing of information stored in a backup. 
    show [ chunks | mailboxes | messages ] items... 
            Paragraph-per-item listing of information for specified items.
            Chunk items are specified by id, mailboxes by mboxname or
            uniqueid, messages by guid. 
    dump [ chunk | message ] item 
            Full dump of one item. chunk dumps the uncompressed content of
            a chunk (i.e. a bunch of sync protocol commands). message dumps
            a raw RFC 822 message (useful for manually restoring until a
            proper restore tool appears).

    There's no man page yet, but run it without arguments to see a full
    usage summary.
