Installation and prerequisites
keygrabber is deliberately designed to run on modest hardware and to rely on a minimum of freely available third-party software for its runtime environment.
Computing platform
keygrabber is used extensively on Linux platforms, predominantly CentOS 6, a clone of Red Hat Enterprise Linux 6. Occasional development and testing occur on a range of current Fedora releases. The stock CentOS 6 packages for key software components (Python, PostgreSQL) are sufficient for keygrabber’s needs.
Although there is no reason keygrabber could not thrive on a 32-bit platform, a 64-bit platform is strongly recommended, mainly because of the system memory constraints of 32-bit platforms.
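Whether the Python interpreter on a host is a 64-bit build can be checked from Python itself by measuring the size of a C pointer. This is a minimal sketch, not part of keygrabber; `pointer_bits` is a hypothetical name. Note that it reports on the interpreter, not the kernel: a 32-bit interpreter can still run on a 64-bit operating system.

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width, in bits, of the running Python interpreter."""
    # struct.calcsize("P") is the size in bytes of a C void pointer.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    bits = pointer_bits()
    print("Python interpreter is %d-bit" % bits)
    if bits < 64:
        sys.stderr.write("warning: a 64-bit platform is recommended\n")
```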
keygrabber has not been tested on FreeBSD or Solaris, but there are no significant barriers to deployment on either platform.
Computing hardware
The two primary resource bottlenecks for keygrabber are system memory and disk storage. System memory matters largely because of caching: ample memory mitigates the impact of a relatively slow disk subsystem. In addition to speed, the disk subsystem must provide sufficient capacity; a million rows of data occupy roughly 150 megabytes of disk space. Minimums for system memory, disk bandwidth, and disk capacity depend on the event volume keygrabber is expected to record. Small instances may be fine on a low-end workstation with two gigabytes of system memory and a conventional hard drive.
Any modern processor that is not grossly underpowered should be sufficient to handle even a busy keygrabber instance; disk and memory performance are far more critical.
A single computer at Mt. Hamilton is used to record roughly 65 million events per week. This system runs 64-bit CentOS 6 with eight gigabytes of system memory and a pair of three-terabyte disks in a software RAID mirror. It operates at more or less its maximum capacity: any intensive operation, such as a background resilvering of the RAID mirror, will trigger a backlog of unprocessed keygrabber events. The CPU in this host is a quad-core 2.5 GHz part, and is largely idle except when compressing (or uncompressing) archival data.
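The capacity figure and the Mt. Hamilton workload above lend themselves to a back-of-the-envelope storage estimate. This sketch is illustrative only: it assumes one recorded event is stored as one row, uses the approximate figure of 150 MB per million rows, and ignores compression of archival data, index growth, and filesystem overhead. The function names are hypothetical.

```python
# Approximate figure quoted above: ~150 MB of disk per million rows.
# Assumption (hypothetical): one recorded event corresponds to one row.
MB_PER_MILLION_ROWS = 150.0


def estimated_megabytes(rows):
    """Approximate on-disk size, in megabytes, of the given number of rows."""
    return rows / 1e6 * MB_PER_MILLION_ROWS


def weeks_until_full(capacity_tb, events_per_week):
    """Approximate weeks of recording that fit in capacity_tb terabytes.

    Uses decimal units (1 TB = 1,000,000 MB) and ignores archival
    compression, index bloat, and any other data on the same disks.
    """
    capacity_mb = capacity_tb * 1e6
    return capacity_mb / estimated_megabytes(events_per_week)


if __name__ == "__main__":
    weekly = estimated_megabytes(65e6)  # the Mt. Hamilton workload
    print("about %.0f MB per week" % weekly)  # about 9750 MB per week
    # A mirror of two 3 TB disks provides 3 TB of usable space.
    print("about %.0f weeks on 3 TB" % weeks_until_full(3, 65e6))
```

Under these assumptions the Mt. Hamilton rate works out to roughly 9.75 gigabytes per week, or on the order of 300 weeks on a 3 TB mirror before any archival compression; real usage will differ.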
kroot
keygrabber is installed as a kroot component. Full instructions on proper acquisition and installation of kroot are beyond the scope of this document.
Beyond the basic kroot requirements, an acceptable Python interpreter must be discovered as part of the kroot installation. As of May 2013, this means Python 2.4 or later, up to and including any Python 2.7 release. The output from the configure script in kroot/etc will immediately indicate whether an acceptable Python interpreter was located; after installation, running make tell var=PYTHON2 should return the location of a valid Python interpreter. In general, more modern releases of Python are measurably faster than earlier releases, but are otherwise interchangeable from keygrabber’s perspective.
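The authoritative check remains the kroot configure output (or make tell var=PYTHON2). As a quick cross-check, the documented version range can be expressed directly in Python; this is a minimal sketch, and `python_is_acceptable` is a hypothetical name. It is written to run under Python 2.4 and later as well as Python 3, so it can be run with whichever interpreter make tell reports, and will report a Python 3 interpreter as outside the range.

```python
import sys

# Documented range as of May 2013: Python 2.4 up to any Python 2.7 release.
MINIMUM = (2, 4)
MAXIMUM = (2, 7)


def python_is_acceptable(version_info=None):
    """Return True if version_info falls within the 2.4 through 2.7 range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MINIMUM <= major_minor <= MAXIMUM


if __name__ == "__main__":
    print("%s acceptable: %s" % (sys.executable, python_is_acceptable()))
```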
keygrabber itself resides in svn/kroot/util/keygrabber, and will need to be checked out separately if it is not included in the svn module used for the initial kroot checkout on the target computer.
Retrieving keyword history data via gshow requires a recent checkout of svn/kroot/music.
A typical svn checkout to support keygrabber requires checking out the keygrabber subdirectory, for example:
cd ~/svn/kroot
./svnget util/keygrabber
cd util && make install
PyGreSQL
keygrabber relies on the pgdb module for all database interactions. This module is typically provided by the PyGreSQL software package, which is available on Red Hat Linux systems via yum as postgresql-python. It must be installed before keygrabber will build or install, though its absence will not prevent the rest of kroot from building or installing.
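Before building keygrabber, it can be useful to confirm that pgdb is importable by the interpreter kroot discovered. A minimal sketch; `pgdb_available` is a hypothetical helper, not part of keygrabber or PyGreSQL.

```python
def pgdb_available():
    """Return True if the PyGreSQL pgdb module can be imported."""
    try:
        import pgdb  # provided by PyGreSQL (postgresql-python on Red Hat)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if pgdb_available():
        print("pgdb found")
    else:
        print("pgdb missing; install PyGreSQL (e.g. postgresql-python)")
```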
PostgreSQL
Although keygrabber could be abstracted to work with alternate database backends (such as MySQL), it currently supports only PostgreSQL; because PostgreSQL is readily available on all platforms of interest, there has been no motivation to pursue this abstraction.
keygrabber has been directly tested with a variety of PostgreSQL releases between 8.1 and 9.2, inclusive. Although keygrabber is known to work with 8.1, the minimum recommended release is 8.3, which has improved support for autovacuum. For performance reasons, use the newest readily available stable release.
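To compare an installed server against the recommended minimum, the version string reported by PostgreSQL (for example the output of psql -At -c 'SHOW server_version', or of psql --version) can be parsed and compared as a tuple. A minimal sketch with hypothetical function names; it only looks at the first two numeric components, which is sufficient for this comparison.

```python
import re

# 8.3 is the recommended minimum (8.1 through 9.2 have been tested).
RECOMMENDED_MINIMUM = (8, 3)


def parse_server_version(version_string):
    """Extract (major, minor) from strings like '9.2.4' or
    'psql (PostgreSQL) 8.1.23'; a missing minor number is treated as 0."""
    match = re.search(r"(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version_string)
    return (int(match.group(1)), int(match.group(2) or 0))


def meets_recommendation(version_string):
    """True if the server is at or above the recommended minimum release."""
    return parse_server_version(version_string) >= RECOMMENDED_MINIMUM
```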
Once PostgreSQL is installed, create a PostgreSQL role for the exclusive use of the keygrabber daemon. The default name for this role is turk; an alternate role can be designated either on the command line or in the keygrabber configuration file. Example role creation:
$ createuser turk
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n
A dedicated database should also be created for the use of keygrabber, owned by the dedicated keygrabber PostgreSQL role. The default name for the database is keywordlog, but again, keygrabber can use an alternate database specified either on the command line or in the keygrabber configuration file. There is no need to create tables in advance; keygrabber will create them at run time as necessary. Example database creation:
$ createdb -O turk keywordlog
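When scripting an installation, the interactive createuser prompts shown above can be answered with command-line flags instead: -S (not a superuser), -D (cannot create databases), and -R (cannot create roles). The sketch below builds and runs the equivalent commands; the helper names are hypothetical, and the commands must be run by a PostgreSQL user allowed to create roles and databases (typically the postgres account).

```python
import subprocess

# Defaults mirror keygrabber's default role and database names.
DEFAULT_ROLE = "turk"
DEFAULT_DATABASE = "keywordlog"


def createuser_command(role=DEFAULT_ROLE):
    # -S: not a superuser; -D: cannot create databases; -R: cannot create roles
    return ["createuser", "-S", "-D", "-R", role]


def createdb_command(database=DEFAULT_DATABASE, owner=DEFAULT_ROLE):
    # -O makes the keygrabber role the owner of the new database
    return ["createdb", "-O", owner, database]


def create_role_and_database(role=DEFAULT_ROLE, database=DEFAULT_DATABASE):
    """Run both commands, raising CalledProcessError if either one fails."""
    subprocess.check_call(createuser_command(role))
    subprocess.check_call(createdb_command(database, role))
```

Calling create_role_and_database() with no arguments reproduces the two example commands above; passing other names corresponds to choosing an alternate role or database in the keygrabber configuration.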
For acceptable performance, the amount of memory allocated to PostgreSQL buffers should also be increased in postgresql.conf. One quarter of the system memory should be adequate; for example, two gigabytes on a host with eight gigabytes of system memory:
shared_buffers = 2048MB
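The one-quarter guideline is simple arithmetic; this sketch (hypothetical function name) turns a system memory size in megabytes into the corresponding postgresql.conf line, rounding down to whole megabytes. On Linux, the total system memory is reported as MemTotal in /proc/meminfo.

```python
def shared_buffers_setting(system_memory_mb, fraction=0.25):
    """Return a postgresql.conf line allocating a fraction of system memory.

    The default of one quarter follows the guidance above; the result is
    rounded down to a whole number of megabytes.
    """
    buffers_mb = int(system_memory_mb * fraction)
    return "shared_buffers = %dMB" % buffers_mb


if __name__ == "__main__":
    # Eight gigabytes of system memory, as in the example above.
    print(shared_buffers_setting(8192))  # shared_buffers = 2048MB
```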
By default, PostgreSQL only listens for connections on the localhost interface. If desired, postgresql.conf can be changed to accept connections over the local area network:
listen_addresses = '*'
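Both settings above can be applied to postgresql.conf by hand, or with a small script. The sketch below is a hypothetical helper operating on the file's text: it uncomments and replaces the first existing line for a setting (such as the commented-out defaults in a stock postgresql.conf), or appends a new line if none exists. It does not preserve trailing comments on the replaced line. Changes to shared_buffers and listen_addresses only take effect after PostgreSQL is restarted.

```python
import re


def set_conf_option(conf_text, name, value):
    """Return conf_text with 'name = value' set.

    The first line assigning name (commented out or not) is replaced;
    otherwise a new line is appended at the end.
    """
    # [ \t]* rather than \s* so the match never spans blank lines.
    pattern = re.compile(r"^[ \t]*#?[ \t]*%s[ \t]*=.*$" % re.escape(name),
                         re.MULTILINE)
    new_line = "%s = %s" % (name, value)
    if pattern.search(conf_text):
        # A function replacement avoids backslash processing in value.
        return pattern.sub(lambda match: new_line, conf_text, count=1)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"
```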
Lastly, if network access is desired, PostgreSQL must also be configured to allow roles to connect over the network. The default configuration is for ‘ident’ authentication when connecting via the localhost interface. The full flexibility of the PostgreSQL authentication scheme is available: at the simplest, pg_hba.conf can be configured to ‘trust’ connections from specific network addresses or ranges of network addresses; with a modest amount of extra work, passwords can be assigned to the roles of interest, and connecting users can create .pgpass files in their home directories to semi-securely store passwords for subsequent access. Regardless of the method chosen, if the keygrabber database will be accessed via anything other than the localhost interface, at least one additional authentication method must be enabled.
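The sketch below illustrates both approaches with hypothetical helpers: formatting a pg_hba.conf host record (type, database, user, address, method) for a network range, and writing a .pgpass entry in the hostname:port:database:username:password format. Backslashes and colons inside a .pgpass field must be escaped with a backslash, and libpq ignores a .pgpass file that is group- or world-accessible, so the file is restricted to mode 0600. Note that ‘trust’ admits any connection from the listed addresses without a password; pg_hba.conf is reread when the server is reloaded.

```python
import ipaddress
import os
import stat


def pg_hba_line(network, database="keywordlog", role="turk", method="md5"):
    """Format a pg_hba.conf 'host' record admitting role from network.

    method is typically "trust" (no password) or "md5" (password).
    The network is validated and normalized to its CIDR form.
    """
    net = ipaddress.ip_network(network, strict=False)
    return " ".join(["host", database, role, str(net), method])


def _escape_pgpass_field(value):
    # Backslashes and colons inside a .pgpass field must be escaped.
    return str(value).replace("\\", "\\\\").replace(":", "\\:")


def pgpass_line(host, port, database, role, password):
    """Format one hostname:port:database:username:password entry."""
    fields = (host, port, database, role, password)
    return ":".join(_escape_pgpass_field(field) for field in fields)


def append_pgpass_entry(line, path=None):
    """Append line to ~/.pgpass (or path) and restrict it to mode 0600."""
    if path is None:
        path = os.path.expanduser("~/.pgpass")
    with open(path, "a") as handle:
        handle.write(line + "\n")
    # libpq ignores the file if group or others can access it.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```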