Installation and prerequisites
==============================

keygrabber is deliberately designed to run on modest hardware, and to depend
on a minimum of freely available third-party software for its runtime
environment.


Computing platform
------------------

keygrabber is used extensively on Linux platforms, dominated by `CentOS 6
<http://centos.org>`_, a clone of `Red Hat Enterprise Linux 6
<http://www.redhat.com/products/enterprise-linux>`_. Occasional development
and testing occur on a range of current `Fedora <http://fedoraproject.org>`_
releases. The stock packages in CentOS 6 for key software components
(`Python <http://www.python.org>`_, `PostgreSQL <http://www.postgresql.org>`_)
are sufficient for keygrabber's needs.

While there is no reason why keygrabber could not thrive on a 32-bit platform,
use of a 64-bit platform is strongly recommended, mainly due to system memory
constraints on 32-bit platforms.

keygrabber has not been tested on `FreeBSD <http://www.freebsd.org>`_ or
Solaris, but there are no known significant barriers to deployment on either
platform.


Computing hardware
------------------

The two primary resource bottlenecks for keygrabber are system memory and
disk storage. System memory matters chiefly because it is used for caching,
which mitigates the impact of a relatively slow disk subsystem. In addition
to speed, the disk subsystem must provide sufficient capacity; a million rows
of data will occupy roughly 150 megabytes of disk space. Minimum requirements
for system memory, disk bandwidth, and disk capacity depend on the event
volume expected to be recorded by keygrabber. Small instances may be fine on
a low-end workstation with two gigabytes of system memory and a conventional
hard drive.

Any modern processor, so long as it is not grossly underpowered, would be
sufficient to handle even a busy keygrabber instance. Disk and memory
performance are far more critical.

A single computer at Mt. Hamilton is used to record roughly 65 million events
per week. This system is running 64-bit CentOS 6, with eight gigabytes of
system memory, and a pair of three terabyte disks in a software RAID mirror.
This system is operating at more or less its maximum capacity; any intensive
operations, such as a background resilvering of the RAID mirror, will trigger
a backlog of unprocessed keygrabber events. The CPU in this host is a quad-core
2.5 GHz part, and is largely idle except when compressing (or uncompressing)
archival data.
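
The figures quoted above can be combined into a rough capacity estimate. The
following is a minimal sketch, not part of keygrabber; it assumes that one
recorded event corresponds to one row, uses decimal units, and ignores index
overhead and any compression of archival data. The function names are
hypothetical::

        # Back-of-the-envelope disk usage estimate. Both constants come
        # from the figures quoted in this section; treating one event as
        # one row is an assumption.
        MB_PER_MILLION_ROWS = 150.0
        EVENTS_PER_WEEK = 65e6

        def weekly_growth_mb(events_per_week, mb_per_million=MB_PER_MILLION_ROWS):
            """Return the estimated growth per week, in megabytes."""
            return events_per_week / 1e6 * mb_per_million

        def weeks_until_full(capacity_mb, events_per_week):
            """Return how many weeks of events fit in capacity_mb."""
            return capacity_mb / weekly_growth_mb(events_per_week)

        print("Growth per week: %.0f MB" % weekly_growth_mb(EVENTS_PER_WEEK))
        print("Weeks to fill 3 TB: %.0f" % weeks_until_full(3e6, EVENTS_PER_WEEK))

Under these assumptions the Mt. Hamilton event rate works out to a little
under ten gigabytes per week, so a three terabyte mirror would hold roughly
300 weeks of uncompressed data.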


kroot
-----

keygrabber is installed as a kroot component. Full instructions on proper
acquisition and installation of kroot are beyond the scope of this document.

Beyond the basic kroot requirements, an acceptable Python interpreter must be
discovered as part of the kroot installation. As of May 2013, this means
Python 2.4 or later, up to and including any Python 2.7 release. The output
from the ``configure`` script in ``kroot/etc`` will immediately indicate
whether an acceptable Python interpreter was located; post-installation,
running ``make tell var=PYTHON2`` should return the location of a valid
Python interpreter. In general, more modern releases of Python will be
measurably faster than earlier releases, but are otherwise interchangeable
from keygrabber's perspective.
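
The version bounds described above can also be checked by hand. This is a
minimal sketch of such a check, not the test performed by the ``configure``
script; ``is_acceptable`` is a hypothetical helper, and the bounds are simply
the ones stated in this section::

        import sys

        # Bounds taken from the text: Python 2.4 through any 2.7 release.
        OLDEST = (2, 4)
        NEWEST = (2, 7)

        def is_acceptable(version):
            """Return True if a (major, minor, ...) tuple is within bounds."""
            return OLDEST <= tuple(version[:2]) <= NEWEST

        if is_acceptable(sys.version_info):
            print("acceptable interpreter: %s" % sys.executable)
        else:
            print("unsuitable interpreter: %s" % sys.executable)

Running this with the interpreter reported by ``make tell var=PYTHON2``
should report an acceptable interpreter.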

keygrabber itself resides in ``svn/kroot/util/keygrabber``, and will need to
be checked out separately if not included in the svn module used for the
initial kroot checkout on the target computer.

Retrieving keyword history data via ``gshow`` will require a modern checkout
of ``svn/kroot/music``.

A typical svn checkout to support keygrabber requires checking out the
``keygrabber`` subdirectory, for example::

        cd ~/svn/kroot
        ./svnget util/keygrabber
        cd util && make install


PyGreSQL
--------

keygrabber relies on the :mod:`pgdb` module for all database interactions.
This is typically provided by the `PyGreSQL <http://www.pygresql.org>`_ software
package, which is available on Red Hat Linux systems via yum as
``postgresql-python``. This module needs to be installed before keygrabber
will build or install, though its absence will not prevent the rest of kroot
from building or installing.
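
Whether the module is present can be confirmed before attempting a build.
The sketch below is illustrative only; ``have_module`` is a hypothetical
helper, and the check should be run with the same interpreter that kroot
will use::

        def have_module(name):
            """Return True if the named module can be imported."""
            try:
                __import__(name)
            except ImportError:
                return False
            return True

        if have_module("pgdb"):
            print("pgdb is available")
        else:
            print("pgdb is missing; install PyGreSQL (postgresql-python)")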


PostgreSQL
----------

While keygrabber could be abstracted to work with alternate database backends
(such as `MySQL <http://www.mysql.com>`_), it is not; because PostgreSQL is
readily available on all platforms of interest, there has been no motivation
to pursue this abstraction.

keygrabber has been directly tested with a variety of PostgreSQL releases
between 8.1 and 9.2, inclusive. While known to work with 8.1, the minimum
recommended release is 8.3, as it has improved support for autovacuum
functionality. For performance reasons, the newest readily available stable
release should be used.

Once installed, a PostgreSQL role should be created for the exclusive use
of the keygrabber daemon. The default name for this role is *turk*; designating
an alternate role can be done either on the command line or in the keygrabber
configuration file. Example role creation::

        $ createuser turk
        Shall the new role be a superuser? (y/n) n
        Shall the new role be allowed to create databases? (y/n) n
        Shall the new role be allowed to create more new roles? (y/n) n

A dedicated database should also be created for the use of keygrabber. This
database should be owned by the dedicated keygrabber PostgreSQL role. The
default name for the database is *keywordlog*, but again, keygrabber can use
an alternate database specified on either the command line or in the keygrabber
configuration file. There is no need to create tables in advance; keygrabber
will create them at run-time as necessary. Example database creation::

        $ createdb -O turk keywordlog
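
Once the role and database exist, a short connection test can confirm that
both are usable. This is a sketch under several assumptions: it uses the
default names given above, it is run on the database host by a user allowed
to connect as the role, and ``check_database`` is a hypothetical helper that
is not part of keygrabber. The connect function is passed in as an argument
so that any DB-API module can be substituted::

        def check_database(connect, database="keywordlog", user="turk"):
            """Connect and return the (role, database) actually in use.

            connect is a DB-API connect function such as pgdb.connect.
            """
            connection = connect(database=database, user=user)
            try:
                cursor = connection.cursor()
                cursor.execute("SELECT current_user, current_database()")
                return tuple(cursor.fetchone())
            finally:
                connection.close()

        # Typical use, with PyGreSQL installed:
        #
        #     import pgdb
        #     print(check_database(pgdb.connect))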

For acceptable performance, the amount of memory allocated to PostgreSQL
buffers should also be increased in ``postgresql.conf``. One quarter of
system memory should be adequate; for example, two gigabytes on a host
with eight gigabytes of system memory::

        shared_buffers = 2048MB
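
The one-quarter rule is simple enough to compute for any host. The sketch
below is illustrative only; both function names are hypothetical, and
``physical_memory_mb`` assumes a platform, such as Linux, where the
``SC_PAGE_SIZE`` and ``SC_PHYS_PAGES`` sysconf names are available::

        import os

        def shared_buffers_setting(total_mb):
            """Return a postgresql.conf line allotting 1/4 of total_mb."""
            return "shared_buffers = %dMB" % (total_mb // 4)

        def physical_memory_mb():
            """Return physical memory in megabytes, as seen by the kernel."""
            page_size = os.sysconf("SC_PAGE_SIZE")
            page_count = os.sysconf("SC_PHYS_PAGES")
            return page_size * page_count // (1024 * 1024)

        print(shared_buffers_setting(physical_memory_mb()))

The kernel usually reports slightly less than the nominal installed memory,
so the computed value will tend to fall just short of a round number;
rounding it by hand is harmless.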

By default, PostgreSQL only listens for connections on the localhost interface.
If desired, ``postgresql.conf`` can be changed to enable connections over
the local area network::

        listen_addresses = '*'

Lastly, PostgreSQL can be configured to allow roles to connect over the
network. The default configuration is for 'ident' authentication when
connecting via the localhost interface. The full flexibility of the PostgreSQL
authentication scheme is available for use: on the simplest side,
``pg_hba.conf`` can be configured to 'trust' connections from specific network
addresses, or ranges of network addresses; with a modest amount of extra work,
passwords can be assigned to the roles of interest, and connecting users can
establish ``.pgpass`` files in their home directories to semi-securely store
passwords for subsequent access. Regardless of the method chosen, if the
keygrabber database is going to be accessed via anything but the localhost
interface, at least one additional authentication method must be enabled.
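
As an illustration of the password-based approach, the sketch below prints
one ``pg_hba.conf`` entry and one matching ``.pgpass`` entry. The subnet,
host name, and password are placeholders, both function names are
hypothetical, and the choice of 'md5' authentication is an assumption rather
than a keygrabber requirement::

        def pg_hba_line(database, role, address, method="md5"):
            """Return a pg_hba.conf entry for TCP/IP ('host') connections."""
            return "host    %s    %s    %s    %s" % (
                database, role, address, method)

        def pgpass_line(host, port, database, role, password):
            """Return a .pgpass entry, escaping colons and backslashes."""
            fields = [host, str(port), database, role, password]
            escaped = [field.replace("\\", "\\\\").replace(":", "\\:")
                       for field in fields]
            return ":".join(escaped)

        # Placeholder values; substitute the local subnet, host, and password.
        print(pg_hba_line("keywordlog", "turk", "192.168.1.0/24"))
        print(pgpass_line("dbhost.example.org", 5432,
                          "keywordlog", "turk", "s3cret"))

The first line belongs in ``pg_hba.conf`` on the server, the second in the
connecting user's ``.pgpass`` file. The ``.pgpass`` file must be readable
only by its owner (mode 0600) or it will be ignored.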