$Header: /home/t-ishii/repository/pgpool/README,v 1.15 2004/02/05 14:27:44 t-ishii Exp $

pgpool version 0.1.8 (palia) README

1. What is pgpool

   pgpool is a connection pool server for PostgreSQL. pgpool runs
   between PostgreSQL's client (frontend) and server (backend). Any
   PostgreSQL client can connect to pgpool as if it were a real
   PostgreSQL server.

   pgpool caches connections to the PostgreSQL server and reuses
   them, which reduces the overhead of establishing a new connection
   for each client.

   pgpool can also use two PostgreSQL servers for fail over. If the
   first server goes down, pgpool automatically switches to the
   secondary server.

2. Advantages of pgpool

   Other connection pool servers exist. This section explains why
   you might choose pgpool:

   1) you do not need to modify your applications

   Some connection pool servers require applications to use a special
   API (Application Program Interface). Since pgpool looks like a
   PostgreSQL server from the client's point of view, existing
   PostgreSQL applications can be used without any modification.

   2) any programming language can be used

   Since pgpool is not an API, applications written in any language,
   including PHP, Perl and Java, can be used.

   3) prefork architecture

   pgpool uses a prefork architecture: server processes are started
   in advance, so there is no need to start a process for each client
   connection. This gives better performance.

   4) resource usage control

   pgpool can limit the number of connections to the PostgreSQL
   server. This helps avoid overloading PostgreSQL, especially in Web
   application environments.

   5) fail over

   pgpool provides a function called "fail over". If the first
   server goes down, pgpool automatically switches to the secondary
   server.


3. Disadvantages of pgpool

   1) overhead

   Any access to PostgreSQL must go through pgpool, which means some
   overhead is added to each database access. In my testing with
   pgbench, the performance penalty was 7 to 15%. This number may
   vary depending on the testing environment.
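
   As a rough sketch of how such a penalty figure can be derived, the
   snippet below compares two tps (transactions per second) values,
   one measured by connecting to PostgreSQL directly and one measured
   through pgpool. The function name overhead_pct and the sample
   numbers are hypothetical; they are not measurements from this
   README.

```shell
# overhead_pct DIRECT_TPS POOLED_TPS
# Print the performance penalty of the pooled run relative to the
# direct run, as a percentage with one decimal place.
overhead_pct() {
    awk -v direct="$1" -v pooled="$2" \
        'BEGIN { printf "%.1f\n", (direct - pooled) * 100 / direct }'
}

# Hypothetical sample values: 200 tps direct, 180 tps via pgpool.
overhead_pct 200 180    # prints 10.0
```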

   2) not all libpq protocols are supported

   Currently the following are not supported:

   o any authentication method other than "trust"

   3) no access control to pgpool using pg_hba.conf

   Any client can connect to pgpool. If this is a concern, you can
   limit access with other software such as iptables.
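
   A minimal sketch of such a restriction with iptables follows. It
   assumes pgpool accepts TCP connections on port 9999 and that
   clients come from a single trusted network; the function name
   pgpool_firewall_rules and the network 192.168.0.0/24 are
   hypothetical. The function only prints the commands so that they
   can be reviewed before being run as root.

```shell
# pgpool_firewall_rules SUBNET [PORT]
# Print iptables commands that accept TCP connections to pgpool's port
# from the local host and from SUBNET, and drop all others.
pgpool_firewall_rules() {
    subnet=$1
    port=${2:-9999}
    echo "iptables -A INPUT -p tcp --dport $port -s 127.0.0.1 -j ACCEPT"
    echo "iptables -A INPUT -p tcp --dport $port -s $subnet -j ACCEPT"
    echo "iptables -A INPUT -p tcp --dport $port -j DROP"
}

# 192.168.0.0/24 is a hypothetical trusted network.
pgpool_firewall_rules 192.168.0.0/24
```

   The ACCEPT rules are appended before the DROP rule because
   iptables evaluates rules in order. Connections via UNIX domain
   sockets are not affected by these rules.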

4. Supported environments

   pgpool supports libpq protocol version 2 (used by PostgreSQL 7.0
   to 7.3). If you are going to use pgpool with PostgreSQL 7.1 or
   earlier, you need to modify the following line in pool.h:

   change:

   #undef NO_RESET_ALL

   to:

   #define NO_RESET_ALL
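
   The change above can also be scripted. The sketch below assumes
   that the line in pool.h reads exactly "#undef NO_RESET_ALL" and
   starts in the first column; the function name enable_no_reset_all
   is hypothetical.

```shell
# enable_no_reset_all FILE
# Rewrite "#undef NO_RESET_ALL" to "#define NO_RESET_ALL" in FILE.
# A temporary file is used instead of "sed -i", which is not portable.
enable_no_reset_all() {
    file=$1
    sed 's/^#undef NO_RESET_ALL/#define NO_RESET_ALL/' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Run from the pgpool source directory before building:
#   enable_no_reset_all pool.h
```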

   Here is a short list of environments in which users have reported
   pgpool running:

   Vine Linux 2.6CR (kernel 2.4.20-0vl29.1)/PostgreSQL 7.3.4
   RedHat Linux 8.0 (kernel 2.4.18-14)/PostgreSQL 7.3.2
   FreeBSD 4.7-RELEASE/PostgreSQL 7.2.4
   FreeBSD 4.2-RELEASE/PostgreSQL 7.3.2

5. How to install pgpool

      ./configure
      make
      make install

      Note that "make" should be read as "gmake" if you are using
      FreeBSD or Solaris.

      The default installation locations are:

      /usr/local/bin/pgpool		pgpool executable
      /usr/local/etc/pgpool.conf.sample	example configuration file

      You can change the installation directory by giving the --prefix
      option to configure:

      ./configure --prefix=path...

6. Setting up pgpool.conf

   pgpool.conf is the configuration file for pgpool.

   Copy pgpool.conf.sample to pgpool.conf and change it if necessary.

   Here is an explanation of pgpool.conf's grammar.

   1) a configuration variable is set by writing a pair of the form:

      item = value

   2) if the value is numeric, just write the number. If the value is
      a string, enclose it in single quotes. Example:

      'foo'

   3) empty lines are ignored

   4) lines starting with # are ignored.
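
   The four rules above can be seen together in a small example. The
   sketch below writes a sample file and then prints only the lines
   that carry settings. The file path is hypothetical; the items port
   and logdir are described in the list that follows.

```shell
# Write a small sample file that follows the grammar above.
# The path is hypothetical; it is not where pgpool looks by default.
conf=${TMPDIR:-/tmp}/pgpool.conf.example
cat > "$conf" <<'EOF'
# numeric values are written as-is
port = 9999

# string values are enclosed in single quotes
logdir = '/tmp'
EOF

# Show only the effective settings by dropping comment and empty lines.
grep -v -e '^#' -e '^[[:space:]]*$' "$conf"
```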


   Here is a list of the available items:

   allow_inet_domain_socket

   set this to 1 to allow connections via TCP/IP. The default value
   is 0. Note that connections via UNIX domain sockets are always
   allowed.

   port

   the port number on which pgpool accepts connections. The default
   value is 9999.

   backend_host_name

   the host name of the real PostgreSQL server that pgpool connects
   to. The default value is '' (empty string), which means pgpool
   connects via UNIX domain sockets. Any string other than '' is
   taken as the name of the host where the PostgreSQL server is
   running. In this case the server's pg_hba.conf file must be set
   up so that pgpool is allowed to connect.

   backend_port

   the port number on which the real PostgreSQL server is running.
   The default value is 5432.

   secondary_backend_host_name

   if you are going to use the fail over function of pgpool, set this
   to the host name of the secondary server, or to '' as for
   backend_host_name. The default value is ''.

   secondary_backend_port

   if you are going to use the fail over function of pgpool, set this
   to the port number on which the secondary PostgreSQL server is
   running. The default value is 0, which means fail over is
   disabled.
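
   As an illustration of the four backend items working together, the
   sketch below writes a configuration fragment for a fail over setup
   and checks that the secondary port is non-zero. The host names
   db1.example.com and db2.example.com and the file path are
   hypothetical.

```shell
# Write a fail over fragment to a scratch file (hypothetical path).
conf=${TMPDIR:-/tmp}/pgpool-failover.conf.example
cat > "$conf" <<'EOF'
# first PostgreSQL server
backend_host_name = 'db1.example.com'
backend_port = 5432

# secondary server; a non-zero port enables fail over
secondary_backend_host_name = 'db2.example.com'
secondary_backend_port = 5432
EOF

# Report whether the fragment enables fail over.
if grep -q '^secondary_backend_port = [1-9]' "$conf"; then
    echo "fail over enabled"
else
    echo "fail over disabled"
fi
```

   The lines inside the fragment would be merged into your own
   pgpool.conf rather than used as a complete configuration file.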

   num_init_children

   the number of pgpool processes forked at startup. The default
   value is 16.

   max_pool

   the number of connection pools each pgpool server process keeps.
   pgpool makes a new connection whenever there is no pooled
   connection for the requested user name and database name pair.
   max_pool must therefore be large enough to cover all possible
   pairs; otherwise you will get "too many connections already"
   errors. The default value is 1.

   Note that the total number of connections to the PostgreSQL server
   can be calculated as follows:

   num_init_children*max_pool
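
   For example, the calculation with the default values looks like
   this. The variable total is introduced here only for illustration.

```shell
# Values as they would appear in pgpool.conf; these are the defaults.
num_init_children=16
max_pool=1

# Total number of connections pgpool may open to the PostgreSQL server.
total=$((num_init_children * max_pool))
echo "total connections: $total"    # prints "total connections: 16"
```

   The result is worth comparing with the max_connections setting of
   the PostgreSQL server, since pgpool cannot open more connections
   than the server allows.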

   connection_life_time

   the life time of each idle connection in seconds. 0 means the
   connection never expires. The default value is 0.

   logdir

   the name of the directory in which pgpool stores its log files.
   Currently only a file named pgpool.pid (containing pgpool's
   process id) is stored there. The default value is '/tmp'.
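
   The pgpool.pid file can be used to check whether pgpool is
   running. The sketch below assumes that the file contains nothing
   but the process id as text; the function name pgpool_is_running is
   hypothetical.

```shell
# pgpool_is_running [LOGDIR]
# Succeed if LOGDIR/pgpool.pid exists and names a live process.
pgpool_is_running() {
    pidfile=${1:-/tmp}/pgpool.pid
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests that the process exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}

if pgpool_is_running /tmp; then
    echo "pgpool is running"
else
    echo "pgpool is not running"
fi
```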

7. Starting pgpool

   The simplest way to start pgpool is:

   $ pgpool

   pgpool will load /usr/local/etc/pgpool.conf.

   The available options for pgpool are:

   -f path

   the path to the configuration file.

   -n

   do not start as a daemon. Error messages go to stdout or stderr,
   so you can use utilities such as logger and rotatelogs. If you
   use this option you need to put pgpool in the background
   explicitly.

   -d

   print lots of debugging messages

   -h

   print the help message and quit
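
   As a sketch of how the -n option might be used, the function below
   starts pgpool without daemonizing, appends its messages to a log
   file and puts the process in the background explicitly. The
   function name start_pgpool_logged, the PGPOOL variable and the log
   path are hypothetical.

```shell
# start_pgpool_logged LOGFILE
# Start pgpool with -n (no daemon), append stdout and stderr to
# LOGFILE, and put the process in the background explicitly.
# PGPOOL may be set to the full path of the pgpool executable.
start_pgpool_logged() {
    logfile=$1
    "${PGPOOL:-pgpool}" -n >> "$logfile" 2>&1 &
    echo "started pgpool (pid $!), logging to $logfile"
}

# Example (the log path is hypothetical):
#   start_pgpool_logged /var/log/pgpool.log
```

   The same pattern works with a pipe in place of the redirection if
   you prefer to hand the messages to logger or rotatelogs.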

8. Stopping pgpool

   You can stop pgpool with the "stop" option:

   $ pgpool stop

9. Playing with regression tests

   With pgpool running on port 9999, the PostgreSQL regression tests
   can be run through it:

   $ cd /usr/local/src/postgresql-7.3.4/src/test/regress
   $ make all
   $ ./pg_regress --schedule=parallel_schedule --port=9999

10. Playing with benchmarking

   Here is a brief explanation of how to run a benchmark using
   pgbench, PHP and ab.

   Initialize the pgbench database.

   $ pgbench -i test
  
   Prepare a PHP script. Here is an example PHP script.

   <?php
    ini_set("track_errors", "1");
    define_syslog_variables();

    // connect through pgpool (port 9999), not directly to PostgreSQL
    $con = pg_pconnect("dbname=test user=postgres port=9999");
    if ($con === FALSE) {
      syslog(LOG_ERR, "could not connect $php_errormsg");
      trigger_error("Could not connect to DB", E_USER_ERROR);
      exit;
    }
    // look up one randomly chosen account created by "pgbench -i"
    $aid = rand(1,10000);
    pg_query($con, "SELECT * FROM accounts WHERE aid = $aid");
    pg_close($con);
   ?>

   Run ab against the script (saved here as bench.php):

   $ /usr/local/apache/bin/ab -c 100 -n 1000 "http://localhost/bench.php"
