This directory contains some of the functionality needed to provide
eager update-everywhere replication for PostgreSQL version 6.4.2.

The basic work for this replication concept was done at ETH Zurich in
the context of the DRAGON project. The main contributors are:

    Win Bausch  (bausch@inf.ethz.ch)
    Michael Baumer (baumer@inf.ethz.ch)
    Ignaz Bachmann
    Bettina Kemme (kemme@cs.mcgill.ca)


Description of the Files (further information can be found within the
files):

replicaManager.c:

Control of the replication protocol takes place at the replication
manager (created at server start-up by postmaster).  The replication
manager is implemented as a message handling process. It receives
messages from the local and remote backends and forwards write sets
and decision messages via the communication manager to the other
sites. It also receives the messages delivered by the communication
manager and forwards them to the corresponding backends. The
replication manager keeps track of the state of every transaction
running at the local server. This includes the phase each transaction
is in (read, send, lock or write phase) and which messages have been
delivered so far. This information is needed to trigger the
appropriate actions in the case of a failure. The replication manager
maintains a two-way channel, implemented as buffered Unix sockets, to
each backend (libpq/bufferedSock.c).  The channel between a local
backend and the replication manager is created when the backend sends
its first transaction to the replication manager and closes when the
client disconnects and the backend is killed. The channel between a
remote backend and the replication manager is created once at backend
startup time and maintained until Postgres-R is shut down. 
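
The per-transaction bookkeeping described above could be sketched
roughly as follows. This is only an illustration, not the code in
replicaManager.c: the names TxnPhase, TxnState, txn_init, txn_advance
and txn_awaiting_decision are hypothetical, PHASE_DONE is an added
terminal marker, and the strict read, send, lock, write ordering is an
assumption taken from the order in which the phases are listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Phases named in the text; PHASE_DONE is an added terminal marker. */
typedef enum {
    PHASE_READ,
    PHASE_SEND,
    PHASE_LOCK,
    PHASE_WRITE,
    PHASE_DONE
} TxnPhase;

/* Hypothetical per-transaction record kept by the replication manager. */
typedef struct {
    int      txn_id;
    TxnPhase phase;
    bool     writeset_delivered;  /* the write set has been delivered */
    bool     decision_delivered;  /* a decision message has been delivered */
} TxnState;

/* Start tracking a transaction in the read phase, nothing delivered yet. */
void txn_init(TxnState *t, int txn_id)
{
    assert(t != NULL);
    t->txn_id = txn_id;
    t->phase = PHASE_READ;
    t->writeset_delivered = false;
    t->decision_delivered = false;
}

/* Move to the next phase in order; returns false once the transaction
 * has already reached PHASE_DONE. */
bool txn_advance(TxnState *t)
{
    assert(t != NULL);
    if (t->phase == PHASE_DONE)
        return false;
    t->phase = (TxnPhase)(t->phase + 1);
    return true;
}

/* True if the write set was delivered but no decision has arrived yet,
 * the kind of question a failure handler might ask (assumption). */
bool txn_awaiting_decision(const TxnState *t)
{
    assert(t != NULL);
    return t->writeset_delivered && !t->decision_delivered;
}
```

A caller would keep one TxnState per running transaction, call
txn_advance() as the protocol progresses, and set the two delivery
flags as messages arrive from the communication manager.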

pg_ensemble_simple.c
pg_ensemble_global.c

implement the two communication managers (one for globally ordered
messages, one for simple ordered messages). They are also created at
server start-up by the postmaster.  They provide a simple socket-based
interface between the replication manager and the group communication
system Ensemble. The communication managers of all servers are the
members of the communication group and messages are multicast within
this group.  The separation between replication and communication
managers allows us to hide the interface and characteristics of the
group communication system. If another group communication system is
used (Transis, Spread, etc.), then only the communication manager files
have to be rewritten. The replication manager maintains two-way
channels (again implemented as Unix sockets) to the communication
managers: broadcast channels to send messages, a total-order channel
to receive totally ordered write sets and a no-order channel to listen
for decision messages from the communication system.  There are two
receiving channels because we want decision messages to be received at
any time, while reception of totally ordered write sets will be
blocked in certain phases.
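
One way to get this behaviour is to always watch the no-order channel
and to watch the total-order channel only while write sets may be
accepted. The sketch below shows the idea with select() on Unix
socket pairs. It is an assumption about how such a loop could look,
not the actual replication manager code; channel_open, channel_close,
poll_channels and the CH_* flags are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#define CH_DECISION 0x1   /* no-order (decision) channel is readable */
#define CH_TOTAL    0x2   /* total-order (write set) channel is readable */

/* Create a two-way channel as a Unix socket pair; returns 0 on success. */
int channel_open(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}

/* Close both ends of a channel. */
void channel_close(int fds[2])
{
    close(fds[0]);
    close(fds[1]);
}

/* Non-blocking poll. The decision channel is always watched; the
 * total-order channel is watched only if accept_writesets is true.
 * Returns a mask of CH_* flags, or -1 if select() fails. */
int poll_channels(int decision_fd, int total_fd, bool accept_writesets)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };   /* return immediately */
    int maxfd = decision_fd;
    int ready = 0;

    assert(decision_fd >= 0 && total_fd >= 0);

    FD_ZERO(&rfds);
    FD_SET(decision_fd, &rfds);
    if (accept_writesets) {
        FD_SET(total_fd, &rfds);
        if (total_fd > maxfd)
            maxfd = total_fd;
    }

    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;

    if (FD_ISSET(decision_fd, &rfds))
        ready |= CH_DECISION;
    if (accept_writesets && FD_ISSET(total_fd, &rfds))
        ready |= CH_TOTAL;
    return ready;
}
```

While accept_writesets is false, a pending write set simply stays
queued in the socket buffer and is picked up by a later call, so no
message is lost by blocking the total-order channel.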


writeset.c

contains functions that
* marshal and unmarshal messages. These might be replaced by other
  functions that access the WAL.
* call newly created functions in mainExec (executor) to execute write
  sets.
* call newly created functions in lock.c/multi.c etc. to request
  all locks for a write set in an atomic step.
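
To illustrate what marshalling a write set message involves, here is a
minimal sketch that packs a message into a byte buffer and unpacks it
again. The wire layout (two 32-bit fields in network byte order
followed by an opaque payload) and the names WriteSetMsg, ws_marshal
and ws_unmarshal are assumptions made for this example; the real
format is defined in writeset.c.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WS_HEADER_LEN 8   /* txn_id (4 bytes) + payload length (4 bytes) */

/* Hypothetical write set message: a transaction id plus opaque payload. */
typedef struct {
    uint32_t    txn_id;
    uint32_t    len;       /* payload length in bytes */
    const char *payload;
} WriteSetMsg;

/* Pack m into buf. Returns the number of bytes written, or 0 if the
 * buffer is too small. */
size_t ws_marshal(const WriteSetMsg *m, unsigned char *buf, size_t cap)
{
    size_t need;
    uint32_t v;

    assert(m != NULL && buf != NULL);
    need = WS_HEADER_LEN + (size_t) m->len;
    if (cap < need)
        return 0;

    v = htonl(m->txn_id);
    memcpy(buf, &v, 4);
    v = htonl(m->len);
    memcpy(buf + 4, &v, 4);
    if (m->len > 0)
        memcpy(buf + WS_HEADER_LEN, m->payload, m->len);
    return need;
}

/* Unpack a message from buf. The payload pointer refers into buf, so
 * buf must outlive *out. Returns 0 on success, -1 if buf is truncated. */
int ws_unmarshal(const unsigned char *buf, size_t n, WriteSetMsg *out)
{
    uint32_t v;

    assert(buf != NULL && out != NULL);
    if (n < WS_HEADER_LEN)
        return -1;

    memcpy(&v, buf, 4);
    out->txn_id = ntohl(v);
    memcpy(&v, buf + 4, 4);
    out->len = ntohl(v);
    if (n - WS_HEADER_LEN < out->len)
        return -1;

    out->payload = (const char *) (buf + WS_HEADER_LEN);
    return 0;
}
```

Using network byte order for the header keeps the message readable by
sites with a different native byte order, which matters once write
sets are multicast to other servers.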

rmgrLib.c

Functions provided in this file are used by backends
to communicate with the replication manager.

