Mirroring NetEc

Thomas Krichel

August 1996


1: Introduction

2: Requirements

3: Arrangements

1: Introduction

This document describes a mirror arrangement for NetEc called the Tokyo scheme, since it was first tried on a server in that city. It is not a general mirror arrangement for NetEc, but it applies to all sites that mirror the NetEc site in Manchester (the midas site). Other NetEc sites may have different arrangements, in particular if they are not mirroring the services offered at the midas site. Note that this document lists the ideal requirements for a Tokyo-style mirror; if some of them cannot be met, it should be possible to find local workarounds. In the following, the term mirror means a Tokyo-style mirror.

2: Requirements

The requirements for a mirror are

3: Arrangements

NetEc consists of two kinds of projects. BibEc and WoPEc are database projects: they collect data in files known as AFA files. These are kept in ~/AFA/BibEc and ~/AFA/WoPEc. The midas implementation provides a relational database using the digger implementation of the whois++ protocol, as described in Internet RFC 1835. For a transitional period the midas implementation also provides a full-text WAIS index for these projects.

Other projects are html based, in the sense that they come essentially as a set of html pages. For these pages the midas implementation provides only a full-text WAIS index. This is the case for WebEc and for Bill Goffe's EconFAQ. Although the EconFAQ is not a part of NetEc, we provide mirroring and indexing services for it on the midas site. CodEc is a special case: although it uses some database elements, these are not yet in a form that is usable for whois++.

At the moment the directories to be mirrored are ~/AFA, ~/global and all its subdirectories (i.e. ~/global/etc, ~/global/perl and ~/global/html), as well as some parts of the ~/WWW tree. These are ~/WWW/WebEc, ~/WWW/CodEc, ~/WWW/doc, ~/WWW/images, ~/WWW/icons and ~/WWW/EconFAQ. More directories could be added at any time without changing anything manually on any of the mirror sites. The list of all packages is kept on the Manchester machine in a file called ~/global/etc/mirrorall, which is mirrored to all other machines; see below.

The overall imperative for the mirror is that all information about local circumstances should be made available in ~/etc/local. This includes, for example, path information, handle information for digger servers, the name of the local machine, etc. This local information has to be hard-wired into some of the configuration and html files, e.g. "This server is located in...". To enable this hard-wiring, the configuration files and some html files are mirrored as forms, and a script ~/global/perl/localpages converts the forms into the pages and configuration files that are needed on the local mirror site.

For example, ~/etc/local defines a variable $home as the home directory of user adnetec. The value of that variable differs depending on the local setup, so all configuration files that require absolute pathnames have to be changed. A special form is ~/global/etc/mirrorall. This form is converted by localpages into a file ~/mirror/packages/NetEc, which defines all the packages to be mirrored. The special string &&&&HOME&&&& is changed to the value of $home; thus ~adnetec/global/perl/localpages contains something like:

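# Install the mirrored form as the local package file for mirror
$netecmirror="$home/mirror/packages/NetEc" ;
system("cp $home/global/etc/mirrorall $netecmirror") ;
# Replace the &&&&HOME&&&& placeholder in place with the local home directory
system("perl -pi -e \"s@&&&&HOME&&&&@".$home."@;\" $netecmirror") ;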
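The same form-to-file substitution can be pictured in a more general shape. The following is only a minimal sketch in Python, not the localpages script itself: it assumes that every placeholder follows the &&&&NAME&&&& pattern of &&&&HOME&&&& (only HOME is confirmed by this document), and the function name fill_form and the dictionary local are hypothetical stand-ins for values that would really come from ~/etc/local.

```python
import re

# Placeholders are assumed to look like &&&&NAME&&&& (only HOME is documented).
PLACEHOLDER = re.compile(r"&&&&([A-Z]+)&&&&")


def fill_form(form_text, local):
    """Replace each &&&&NAME&&&& with local[NAME]; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        return local.get(name, match.group(0))
    return PLACEHOLDER.sub(substitute, form_text)


# Hypothetical local settings; on a real mirror these would come from ~/etc/local.
local = {"HOME": "/home/adnetec"}
print(fill_form("local_dir=&&&&HOME&&&&/WWW/WebEc", local))
```

Using a replacement function rather than a replacement string means that a home directory containing characters such as a backslash or the @ delimiter is inserted literally.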

Forms like ~adnetec/global/html/local/NetEc.form are mirrored. Since the ~/global/perl/localpages script takes all local information out of ~/etc/local, it can be mirrored as well.

The way the mirror runs is as follows. Each night, at a convenient time, a cronjob fires up ~/global/perl/nightly. This job starts by firing up mirror with the packages defined in ~/mirror/packages/NetEc, which was generated by localpages during the previous night. Mirror mirrors all the files listed in ~/mirror/packages/NetEc. The files to be mirrored include the special file ~/global/etc/mirrorall, which defines all the packages to be mirrored. ~/global/etc/mirrorall contains lines like

package=NetEc-WebEc
site=netec.mcc.ac.uk
remote_user=madnetec
local_dir=&&&&HOME&&&&/WWW/WebEc
remote_dir=/home/cs6400a/adnetec/WWW/WebEc
user=adnetec
group=users
dir_mode=0755
file_mode=0644

(the password line is not included here for security reasons)
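To make the structure of such a package file concrete, here is a small Python sketch that groups the key=value lines into one record per package= entry. It is an illustration only, not part of the mirror program: the function name parse_packages is hypothetical, the handling of blank lines and # comments is an assumption, and the second package in the sample (NetEc-CodEc) is made up for the example.

```python
def parse_packages(text):
    """Split a package file into one dict per package= entry."""
    packages = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines, # comments and lines without "=" carry no settings.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == "package":
            packages.append({})      # a package= line starts a new record
        if packages:                 # settings before the first package= are ignored
            packages[-1][key] = value
    return packages


sample = """package=NetEc-WebEc
site=netec.mcc.ac.uk
local_dir=&&&&HOME&&&&/WWW/WebEc

package=NetEc-CodEc
site=netec.mcc.ac.uk
"""
for package in parse_packages(sample):
    print(package["package"], package.get("local_dir", "(no local_dir)"))
```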

When the mirror is finished, the localpages script installs all local html pages from the form files in ~/global/html and all configuration files from the forms in ~/global/etc. Thus ~/global/etc/mirrorall is converted to ~/mirror/packages/NetEc, the digger boot files $home/etc/digger.BibEc.boot and $home/etc/digger.WoPEc.boot are generated from ~/global/etc/digger.boot.form by adding information retrieved from ~/etc/local, and so on.

When the local pages are made, the updating of the derived pages and indexes starts. For the html based projects (CodEc, WebEc and EconFAQ), the project data are reindexed if it appears that new files have been copied. To find out if any new files have arrived, nightly performs a grep -A 1 project | grep Got (hence the requirement for GNU grep), where project is the package name of the project, on the mirror log that is kept in ~/var/log/mirror. The indexing is performed simultaneously for all non-database projects, using background jobs that call the mirrored indexing script index, which receives the name of the project as an argument. Note that WAIS indexing cannot be performed in an incremental fashion, because when updated records are inserted the old versions of those records are not deleted. Since these datasets are either small or do not change often, this is not a tragic waste of CPU.
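The new-file check on the mirror log can be imitated outside the shell. The Python sketch below mirrors the logic of grep -A 1 project | grep Got: a project counts as changed if a log line naming it, or the line directly after it, contains the word Got. The function name has_new_files is hypothetical, the project name is matched as a plain substring rather than as a grep pattern, and the sample log lines are invented for illustration, since the real layout of ~/var/log/mirror is not shown in this document.

```python
def has_new_files(log_text, project):
    """Return True if a line naming the project, or the line after it, contains 'Got'."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if project in line:
            window = lines[i:i + 2]   # the matching line plus one line of trailing context
            if any("Got" in entry for entry in window):
                return True
    return False


# Invented log excerpt; the real mirror log format may differ.
sample_log = """package=NetEc-WebEc
Got index.html 1234
package=NetEc-CodEc
nothing to do
"""
print(has_new_files(sample_log, "NetEc-WebEc"))
print(has_new_files(sample_log, "NetEc-CodEc"))
```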

After these non-database projects, the nightly script attacks WoPEc and BibEc, in that order. If there are any new files, a digger feed of the last day's records is performed in the background. Then the ~/global/perl/wopa and ~/global/perl/bipa scripts, respectively, are called; they build html out of the AFA files. These scripts are mirrored from the midas site, so that any technical change on the midas site is copied to the local mirror. Indexing for both projects has to wait until these scripts are finished. Again, indexing is performed with a mirrored script. Clearly, indexing will take some time in the case of BibEc. We still think that indexing as soon as something has changed is preferable to indexing at regular intervals, not only because it increases the currency of the resource but also because it reduces the amount of indexing for BibEc compared with, say, the weekly re-indexing that we have done in the past. Note that the midas site plans to abolish WAIS indexing of BibEc and WoPEc as soon as possible.

What happens if the WAIS indexing lasts more than 24 hours? In that case starting a new index would result in the two indexes mixing up. To avoid this situation we need to check whether a previous index is still running before proceeding to a new one. All indexing activity finishes by packing the log file, which is ~/var/log/index.project, where project is the name of the project. If the index log is not packed, nightly prints a message to its log and does not index. To deal with the case where the machine is rebooted while an index is running, which would give rise to a false diagnosis of a running index, we need to erase the non-packed index logs when the machine reboots. This instruction is found in ~/global/etc/boot, which is called by ~/etc/boot when the machine starts. ~/global/etc/boot also fires up the digger daemons for both BibEc and WoPEc.
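The unpacked log therefore acts as a simple lock. A minimal Python sketch of the two checks follows. It rests on assumptions that this document does not confirm: that packing replaces index.project with a file carrying an extra suffix (.gz is used here purely as an example) and that project names contain no dots. The function names index_running and clear_stale_logs are hypothetical, and the demonstration runs in a temporary directory rather than in ~/var/log.

```python
import os
import tempfile


def index_running(log_dir, project):
    """An unpacked index log means a previous index has not finished."""
    return os.path.exists(os.path.join(log_dir, "index." + project))


def clear_stale_logs(log_dir):
    """At boot, remove unpacked index logs left behind by an interrupted index."""
    removed = []
    for name in sorted(os.listdir(log_dir)):
        # Assumption: unpacked logs are named index.<project>; packed ones have one more suffix.
        if name.startswith("index.") and name.count(".") == 1:
            os.remove(os.path.join(log_dir, name))
            removed.append(name)
    return removed


# Demonstration in a scratch directory.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "index.BibEc"), "w").close()      # index running or interrupted
open(os.path.join(demo_dir, "index.WoPEc.gz"), "w").close()   # index finished, log packed
running_before = index_running(demo_dir, "BibEc")
removed = clear_stale_logs(demo_dir)
running_after = index_running(demo_dir, "BibEc")
print(running_before, removed, running_after)
```

In this picture nightly would call the first check before starting an index, and the boot script would call the second once, before any new index can start.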

For WebEc and BizEc, the code is kept in ~adnetec/WebEc and ~adnetec/BizEc. The raw code supplied by Cartwell and Saarinen needs to be adapted to the user adnetec. This is done on midas only, with cd $HOME/WebEc; webecabsolute *.html and cd $HOME/BizEc; bizecabsolute *.html respectively. These scripts must be fired up manually on midas each time a new version of the code is installed. Essentially these two scripts (which are not mirrored) make sure that a (~adnetec) is added into the URLs, so as to make the URLs portable. Adding the user adnetec into the URL also enables all pages to be found correctly from the main page. It is this modified version of the code that is mirrored.

On each mirror, as soon as the mirror indicates that data has changed, a script ~/global/perl/laurimirror is fired up with the name of the project as an argument. This script ensures that local resources are being pointed to. For example, it would be nonsense to point to the UK version of EconFAQ from the Japanese machine. After laurimirror is completed, the code for the project is indexed using the index command. On the mirrors, index and laurimirror are fired up automatically when there is a change to the code. On midas these two scripts have to be called manually as soon as there is a change in the code. To sum up, suppose BizEc is being updated on midas: the following commands have to be given, one after the other (no background processing!): cd BizEc, bizecabsolute, laurimirror BizEc, index BizEc.