Installation and prerequisites

keygrabber is deliberately designed to run on modest hardware, with a minimum of freely available third-party software making up its runtime environment.

Computing platform

keygrabber is used extensively on Linux platforms, predominantly CentOS 6, a clone of Red Hat Enterprise Linux 6. Occasional development and testing take place on a range of current Fedora releases. The stock CentOS 6 packages for key software components (Python, PostgreSQL) are sufficient for keygrabber’s needs.

While there is no reason why keygrabber could not thrive on a 32-bit platform, a 64-bit platform is strongly recommended, mainly because of the limits 32-bit platforms place on addressable system memory.

keygrabber has not been tested on FreeBSD or Solaris, but there are no known significant barriers to deployment on either platform.

Computing hardware

The two primary resource bottlenecks for keygrabber are system memory and disk storage. Even then, the system memory bottleneck is largely a matter of caching: ample memory mitigates the impact of a relatively slow disk subsystem. In addition to speed, the disk subsystem must provide sufficient capacity; a million rows of data occupy roughly 150 megabytes of disk space. Appropriate minimums for system memory, disk bandwidth, and disk capacity depend on the event volume keygrabber is expected to record. Small instances may be fine on a low-end workstation with two gigabytes of system memory and a conventional hard drive.
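As a rough illustration of the capacity arithmetic, the sketch below converts an expected weekly event volume into an approximate disk footprint using the figure of roughly 150 megabytes per million rows. It assumes one database row per recorded event and takes the 150 MB figure at face value, ignoring compression of archival data and filesystem or RAID overhead. The function name is hypothetical and not part of keygrabber.

```python
# Hypothetical sizing helper; not part of keygrabber.
# Assumes one row per recorded event and roughly 150 MB per million rows.
MB_PER_MILLION_ROWS = 150.0


def estimate_disk_mb(events_per_week, weeks):
    """Approximate disk usage in megabytes after `weeks` of recording."""
    million_rows = events_per_week * weeks / 1e6
    return million_rows * MB_PER_MILLION_ROWS


if __name__ == "__main__":
    # Example: a volume of about 65 million events per week.
    per_week = estimate_disk_mb(65e6, 1)
    per_year = estimate_disk_mb(65e6, 52)
    print("per week: %.0f MB" % per_week)
    print("per year: %.0f GB" % (per_year / 1000.0))
```

At 65 million events per week this gives about 9,750 MB per week, or roughly 507 GB per year, before any compression of archival data.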

Any modern processor that is not grossly underpowered should be sufficient to handle even a busy keygrabber instance; disk and memory performance are far more critical.

A single computer at Mt. Hamilton records roughly 65 million events per week. This system runs 64-bit CentOS 6, with eight gigabytes of system memory and a pair of three-terabyte disks in a software RAID mirror. It operates at more or less its maximum capacity; any intensive operation, such as a background resilvering of the RAID mirror, will trigger a backlog of unprocessed keygrabber events. The CPU in this host is a quad-core 2.5 GHz part, and is largely idle except when compressing (or uncompressing) archival data.
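For a sense of scale, the weekly figure for the Mt. Hamilton host averages out to roughly 107 events per second. The sketch below does the arithmetic. It assumes events are spread evenly over the week, which real traffic need not be, so peak rates may be considerably higher. The function name is illustrative only.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800


def mean_event_rate(events_per_week):
    """Average events per second, assuming events are spread evenly over the week."""
    return events_per_week / float(SECONDS_PER_WEEK)


if __name__ == "__main__":
    print("%.1f events/s" % mean_event_rate(65e6))
```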

kroot

keygrabber is installed as a kroot component. Full instructions on proper acquisition and installation of kroot are beyond the scope of this document.

Beyond the basic kroot requirements, an acceptable Python interpreter must be discovered as part of the kroot installation. As of May 2013, this means Python 2.4 or later, up to and including any Python 2.7 release. The output from the configure script in kroot/etc will immediately indicate whether an acceptable Python interpreter was located; post-installation, running make tell var=PYTHON2 should return the location of a valid Python interpreter. In general, more modern releases of Python will be measurably faster than earlier releases, but are otherwise interchangeable from keygrabber’s perspective.
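Outside of the kroot configure script, one quick way to confirm that a particular interpreter falls within the supported range is to inspect sys.version_info. The following is a hypothetical stand-alone check written against the May 2013 requirement quoted above (Python 2.4 through 2.7). It is not part of kroot or keygrabber.

```python
import sys

# Supported range as of May 2013: Python 2.4 through any 2.7 release.
MIN_VERSION = (2, 4)
MAX_VERSION = (2, 7)


def is_supported(version_info):
    """Return True if the (major, minor) part lies within the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    if is_supported(sys.version_info):
        print("Python %d.%d is acceptable" % tuple(sys.version_info[:2]))
    else:
        print("Python %d.%d is outside the 2.4-2.7 range" % tuple(sys.version_info[:2]))
```

Running this script with the interpreter reported by make tell var=PYTHON2 checks that specific interpreter rather than whichever python is first on the PATH.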

keygrabber itself resides in svn/kroot/util/keygrabber, and will need to be checked out separately if not included in the svn module used for the initial kroot checkout on the target computer.

Retrieving keyword history data via gshow requires a recent checkout of svn/kroot/music.

A typical svn checkout to support keygrabber requires checking out the keygrabber subdirectory, for example:

cd ~/svn/kroot
./svnget util/keygrabber
cd util && make install

PyGreSQL

keygrabber relies on the pgdb module for all database interactions. This module is typically provided by the PyGreSQL package, which is available on Red Hat Linux systems via yum as postgresql-python. It must be installed before keygrabber will build or install, though its absence will not prevent the rest of kroot from building or installing.
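A missing pgdb module only becomes apparent when keygrabber itself is built, so it can be worth checking for it in advance with the interpreter kroot will use. The minimal sketch below simply tries to import a module by name. The helper name is hypothetical.

```python
def module_available(name):
    """Return True if `name` can be imported by this interpreter."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if module_available("pgdb"):
        print("pgdb found")
    else:
        print("pgdb missing; install PyGreSQL (postgresql-python via yum)")
```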

PostgreSQL

Although keygrabber could be abstracted to work with alternate database backends (such as MySQL), it currently is not; because PostgreSQL is readily available on all platforms of interest, there has been no motivation to pursue this abstraction.

keygrabber has been directly tested with a variety of PostgreSQL releases between 8.1 and 9.2, inclusive. While known to work with 8.1, the minimum recommended release is 8.3, as it has improved support for autovacuum functionality. For performance reasons, the newest readily available stable release should be used.
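To compare a server against these recommendations, you can parse the version string it reports (for example, the output of SHOW server_version in psql) into numbers. The hypothetical sketch below looks only at the first two components, which is how PostgreSQL releases of this era identify a major version. The thresholds come from the paragraph above, and the messages are illustrative.

```python
import re

OLDEST_TESTED = (8, 1)     # oldest release keygrabber has been tested with
NEWEST_TESTED = (9, 2)     # newest release keygrabber has been tested with
MIN_RECOMMENDED = (8, 3)   # improved autovacuum support


def parse_major(version_string):
    """Turn a string such as '9.2.4' into the tuple (9, 2)."""
    match = re.match(r"\s*(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognized version: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def assess(version_string):
    """Classify a server version against the tested and recommended releases."""
    major = parse_major(version_string)
    if major < OLDEST_TESTED:
        return "untested (older than 8.1)"
    if major > NEWEST_TESTED:
        return "untested (newer than 9.2)"
    if major < MIN_RECOMMENDED:
        return "tested, but 8.3 or later is recommended"
    return "tested and recommended"


if __name__ == "__main__":
    for v in ("8.1.23", "8.4.20", "9.2.4"):
        print("%s: %s" % (v, assess(v)))
```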

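Once PostgreSQL is installed, a role should be created for the exclusive use of the keygrabber daemon. The default name for this role is turk; an alternate role can be designated either on the command line or in the keygrabber configuration file. Example role creation (newer PostgreSQL releases prompt only when createuser is given --interactive; passing -S -D -R answers all three questions non-interactively):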

$ createuser turk
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n

A dedicated database should also be created for keygrabber’s use, owned by the dedicated keygrabber PostgreSQL role. The default name for the database is keywordlog, but again, keygrabber can use an alternate database specified either on the command line or in the keygrabber configuration file. There is no need to create tables in advance; keygrabber creates them at run time as necessary. Example database creation:

$ createdb -O turk keywordlog

For acceptable performance, the amount of memory allocated to PostgreSQL buffers should also be increased in postgresql.conf. One quarter of system memory should be adequate; for example, two gigabytes on a host with eight gigabytes of system memory:
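The one-quarter rule of thumb is simple arithmetic. The sketch below computes a suggested shared_buffers value in megabytes from total system memory. It also includes an optional helper that reads total memory from /proc/meminfo, where MemTotal is reported in kB. That helper is Linux-specific and is an assumption of this sketch, not something keygrabber provides.

```python
def suggested_shared_buffers_mb(total_memory_mb):
    """One quarter of system memory, in whole megabytes."""
    return int(total_memory_mb) // 4


def total_memory_mb(meminfo_path="/proc/meminfo"):
    """Read MemTotal (reported in kB) from a Linux /proc/meminfo file."""
    f = open(meminfo_path)
    try:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) // 1024
    finally:
        f.close()
    raise ValueError("MemTotal not found in %s" % meminfo_path)


if __name__ == "__main__":
    # The eight-gigabyte example from the text.
    print("shared_buffers = %dMB" % suggested_shared_buffers_mb(8192))
```

On a Linux host, suggested_shared_buffers_mb(total_memory_mb()) applies the same rule to the local machine.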

shared_buffers = 2048MB

By default, PostgreSQL only listens for connections on the localhost interface. If desired, postgresql.conf can be changed to accept connections over the local area network (like shared_buffers, this setting takes effect only after PostgreSQL is restarted):

listen_addresses = '*'

Lastly, PostgreSQL can be configured to allow roles to connect over the network. The default configuration uses ‘ident’ authentication for connections via the localhost interface. The full flexibility of the PostgreSQL authentication scheme is available: at its simplest, pg_hba.conf can be configured to ‘trust’ connections from specific network addresses or ranges of addresses. With a modest amount of extra work, passwords can be assigned to the roles of interest, and connecting users can keep .pgpass files in their home directories to store those passwords semi-securely for subsequent access. Whichever method is chosen, if the keygrabber database will be accessed via anything other than the localhost interface, at least one additional authentication method must be enabled.
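For the password-based approach, each line of a .pgpass file has the form hostname:port:database:username:password. libpq ignores the file unless its permissions deny all group and world access (mode 0600 or stricter). The sketch below formats one entry, backslash-escaping any colon or backslash within a field as the file format requires, and appends it with restrictive permissions. The helper names, host name and password are placeholders for illustration only.

```python
import os


def pgpass_escape(field):
    """Backslash-escape backslashes and colons, as .pgpass requires."""
    return str(field).replace("\\", "\\\\").replace(":", "\\:")


def pgpass_line(host, port, database, user, password):
    """Format one hostname:port:database:username:password entry."""
    fields = (host, port, database, user, password)
    return ":".join([pgpass_escape(f) for f in fields])


def append_pgpass_entry(line, path=None):
    """Append `line` to ~/.pgpass (by default) and restrict the file to mode 0600."""
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".pgpass")
    f = open(path, "a")
    try:
        f.write(line + "\n")
    finally:
        f.close()
    # libpq ignores the file if group or world have any access to it.
    os.chmod(path, 0o600)


if __name__ == "__main__":
    # Hypothetical host and password, shown only to illustrate the format.
    print(pgpass_line("db.example.org", 5432, "keywordlog", "turk", "s3cret"))
```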