1: Introduction
2: Requirements
3: Arrangements
The requirements for a mirror are
~/WWW. This can be done with a link.
NetEc.html -> ~adnetec/local/NetEc.html
BibEc.html -> ~adnetec/BibEc/BibEc.html
CodEc.html -> ~adnetec/CodEc/CodEc.html
WebEc.html -> ~adnetec/WebEc/WebEc.html
This allows each project to be addressed by an individual URL. Further links may need to be added, but when Apache is used it may not be necessary to provide those links, as far as I understand its virtual server technique.
~adnetec/WWW/cgi-bin
~adnetec/etc/boot to be executed on startup of the system as
user adnetec. This would normally only call up
a mirrored script ~/global/etc/boot but may do other things
as well according to local circumstances. The main task
of ~/global/etc/boot is to fire up the digger daemon
upon reboot of the machine. A script for this can be found in
Manchester as /etc/init.d/adnetec and is available on request.
~/AFA/BibEc and ~/AFA/WoPEc. The midas
implementation provides a relational database using the digger
implementation of the whois++ protocol as described in
RFC 1835. For a transitional period the midas implementation also
provides a full-text WAIS index for these projects.
Other projects are html based in the sense that they come essentially
as a set of html pages. For these pages the midas implementation
provides only a full-text WAIS
index. This is the case for WebEc and for Bill Goffe's
EconFAQ. Although the EconFAQ is not a part of NetEc, we provide
mirroring and indexing services for it over the midas site. CodEc is a special
case. Although it uses some database elements, these are
not yet in a form that is usable for whois++.
At the moment the directories to be mirrored are ~/AFA,
~/global and all its subdirectories, i.e. ~/global/etc,
~/global/perl and ~/global/html, as well as some parts of the
~/WWW tree. These are ~/WWW/WebEc, ~/WWW/CodEc, ~/WWW/doc,
~/WWW/images, ~/WWW/icons and ~/WWW/EconFAQ. More directories
could be added at any time without changing anything manually on any
of the mirror sites. The list of all packages is kept on the Manchester
machine in a file called ~/global/etc/mirrorall which is mirrored
to all other machines, see below.
The overall imperative for the mirror is that information about local
circumstances should all be made available in ~/etc/local. This
includes for example path information, handle information for digger
servers, name of local machine etc. This local information has to be
hard-wired into some of the configuration and html files, e.g. "This
server is located in...". To enable this hard-wiring, the
configuration files and some html files are mirrored as forms, and a
script ~/global/perl/localpages is able to convert the forms into the
pages and configuration files that are needed on the local mirror
site. For example, ~/etc/local defines a variable $home as the
home directory of user adnetec. The value of that variable is
different depending on the local setup. All configuration files that
require absolute pathnames have to be changed. A special form is
~/global/etc/mirrorall. This is a form that is converted by
localpages into a file ~/mirror/packages/NetEc and defines all
the packages to be mirrored. The special string &&&&HOME&&&& is
changed to the value of $home, thus
~adnetec/global/perl/localpages contains something like:
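# copy the mirrored form into place under the local home directory
$netecmirror="$home/mirror/packages/NetEc" ;
system("cp $home/global/etc/mirrorall $netecmirror") ;
# edit the copy in place: replace the &&&&HOME&&&& marker with the value of $home
system("perl -pi -e \"s@&&&&HOME&&&&@".$home."@;\" $netecmirror") ;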
Forms like ~adnetec/global/html/local/NetEc.form are mirrored.
Since the ~/global/perl/localpages script takes all local
information out of ~/etc/local, it can be mirrored.
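The placeholder substitution described above can also be sketched outside Perl. The Python fragment below is only an illustration and not the actual localpages script: the function name expand_form and the example home directory /home/adnetec are assumptions, and only the &&&&HOME&&&& marker and the local_dir line are taken from the text.

```python
PLACEHOLDER = "&&&&HOME&&&&"  # marker used in the mirrored forms


def expand_form(form_text: str, home: str) -> str:
    """Return form_text with every placeholder replaced by the local home directory."""
    return form_text.replace(PLACEHOLDER, home)


# Hypothetical home directory; on a real mirror the value would come from ~/etc/local.
form = "package=NetEc-WebEc\nlocal_dir=&&&&HOME&&&&/WWW/WebEc\n"
print(expand_form(form, "/home/adnetec"))
```

Because the form itself never contains a local path, the same form can be mirrored unchanged to every site, and only the value passed as home differs.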
The way the mirror runs is as follows. Each night, at a convenient
time, a cronjob fires up ~/global/perl/nightly. This job starts by
firing up mirror with the packages defined in ~/mirror/packages/NetEc,
which was generated by localpages during the previous night.
Mirror mirrors all the files listed in ~/mirror/packages/NetEc. The
files to be mirrored include the special file
~/global/etc/mirrorall, which defines all the packages to be
mirrored. ~/global/etc/mirrorall contains lines like
package=NetEc-WebEc
site=netec.mcc.ac.uk
remote_user=madnetec
local_dir=&&&&HOME&&&&/WWW/WebEc
remote_dir=/home/cs6400a/adnetec/WWW/WebEc
user=adnetec
group=users
dir_mode=0755
file_mode=0644
(the password line is not included here for security reasons)
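To make the structure of such a package file concrete, here is a minimal Python sketch that groups the key=value lines into one record per package. It is not part of the mirror software. The function name parse_packages is hypothetical, and the sketch assumes that every package begins with a package= line and that lines starting with # are comments.

```python
def parse_packages(text: str) -> list[dict[str, str]]:
    """Split key=value lines into one dictionary per package."""
    packages: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines, '#' comments and lines without '=' carry no settings.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == "package":
            packages.append({})  # a package= line opens a new record
        if packages:  # settings before the first package= line are ignored
            packages[-1][key] = value
    return packages


sample = "package=NetEc-WebEc\nsite=netec.mcc.ac.uk\nlocal_dir=&&&&HOME&&&&/WWW/WebEc\n"
for package in parse_packages(sample):
    print(package["package"], "->", package["local_dir"])
```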
When the mirror is finished, the localpages script will install
all local html pages out of form files in ~/global/html and
configuration files from ~/global/etc. Thus
~/global/etc/mirrorall is converted to
~/mirror/packages/NetEc, the digger boot files
$home/etc/digger.BibEc.boot and $home/etc/digger.WoPEc.boot
are generated from ~/global/etc/digger.boot.form by adding
information retrieved from ~/etc/local etc.
When the local pages are made, the updating of the derived pages and
indexes starts. For the html based projects (CodEc, WebEc and EconFAQ),
if it appears that new files have been copied, the project
data are reindexed. To find out if any new files have arrived,
nightly performs a grep -A 1 project | grep Got
(hence the requirement for GNU grep), where project is the
package name of the project, on the mirror log that is kept in
~/var/log/mirror. The indexing is performed simultaneously for all
non-database projects using background jobs that call up the mirrored
indexing script index, which receives the name of the project as an
argument. Note that WAIS indexing cannot be performed in an
incremental fashion because when updated records are inserted,
the old versions of those records are not deleted. Since these
datasets are either small or do not change often, this is not
a tragic waste of CPU.
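The new-file check can be sketched in Python as follows. This is an illustration of what the grep pipeline computes, not the code used by nightly. The function name project_has_new_files is hypothetical, the project name is matched as a plain substring rather than as a grep pattern, and the sample log lines are made up, since the exact format of the mirror log is not given here.

```python
def project_has_new_files(log_lines: list[str], project: str) -> bool:
    """Mimic `grep -A 1 project | grep Got` on the lines of the mirror log."""
    for i, line in enumerate(log_lines):
        if project in line:
            # grep -A 1 keeps the matching line and the one line that follows it
            window = log_lines[i:i + 2]
            if any("Got" in entry for entry in window):
                return True
    return False


# Made-up log lines for illustration only; not the real mirror log format.
log = ["NetEc-WebEc (header line)", "Got some-file", "NetEc-CodEc (header line)", "no change"]
print(project_has_new_files(log, "NetEc-WebEc"))
print(project_has_new_files(log, "NetEc-CodEc"))
```

As in the pipeline, a Got line counts only if it is the matching line itself or the line directly after it.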
After these non-database projects, the nightly script attacks
WoPEc and BibEc, in that order. If there are any new files, a digger
feed of the last day's records is performed in the background. Then
the ~/global/perl/wopa and ~/global/perl/bipa scripts,
respectively, are called to build html out of the AFA files.
These scripts are mirrored from the midas site so that any technical
change on the midas site is copied to the local mirror. Indexing for
both projects has to wait until these scripts are finished. Again
indexing is performed with a mirrored script. Clearly indexing will
take some time in the case of BibEc, but we still think that this method
of indexing as soon as something has changed is preferable to
indexing at regular intervals, not only because it increases the
currency of the resource but also because it reduces the amount of
indexing for BibEc compared to, say, the weekly re-indexing that we have
done in the past. Note that the midas site plans to abolish WAIS
indexing of BibEc and WoPEc as soon as possible.
What happens if the WAIS indexing lasts more than 24 hours? In that
case starting a new index would result in the two indexes getting
mixed up. To avoid this situation we need to check if a previous index is
still running before proceeding to a new index. All indexing activity
finishes by packing the log file, which is
~/var/log/index.project, where project is the name of the
project. If the index log is not packed, nightly prints
a message to its log and does not index. To deal with the case where
the machine is rebooted while an index is running, which would give
rise to a false diagnosis of a running index, we need to erase the
non-packed index logs when the machine reboots. This instruction is
found in ~/global/etc/boot, which is called up by ~/etc/boot
when the machine starts. ~/global/etc/boot also fires up the
digger daemons for both BibEc and WoPEc.
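The log file thus acts as a simple lock. The Python sketch below illustrates the two checks and is not the code of nightly or boot. The function names and the set PACKED_SUFFIXES are hypothetical, and the sketch assumes that packing replaces the plain log index.project by a compressed file with an added suffix such as .gz or .Z.

```python
from pathlib import Path

# Assumption: a packed log carries one of these suffixes (e.g. index.BibEc.gz).
PACKED_SUFFIXES = {".gz", ".Z"}


def index_is_running(log_dir: Path, project: str) -> bool:
    """An unpacked index.<project> log means a previous index run has not finished."""
    return (log_dir / f"index.{project}").exists()


def clear_stale_logs(log_dir: Path) -> list[str]:
    """At boot, delete unpacked index logs left behind by an interrupted run."""
    removed = []
    for log in sorted(log_dir.glob("index.*")):
        if log.suffix not in PACKED_SUFFIXES:
            log.unlink()
            removed.append(log.name)
    return removed
```

In this sketch nightly would call index_is_running before starting an index and skip the project if it returns True, while the boot script would call clear_stale_logs once, since no index can still be running right after a reboot.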
For WebEc and BizEc,
the code is kept in ~adnetec/WebEc and
~adnetec/BizEc. The raw code supplied by Cartwell and Saarinen
needs to be adapted to the user adnetec. This is done on midas only,
with cd $HOME/WebEc; webecabsolute *.html and
cd $HOME/BizEc; bizecabsolute *.html respectively. These
scripts must be fired up manually on midas each time a new version
of the code is installed. Essentially these
two scripts (not mirrored) make sure that the user part (~adnetec)
is added to the URLs so as to make the URLs portable.
Adding the user adnetec to the URL also enables all pages
to be found correctly from the main page.
It is this modified version of the code that is mirrored.
On each mirror, as soon as the mirror indicates that data has
changed, a script ~/global/perl/laurimirror is fired up
with the name of the project as an argument. This script ensures that local
resources are being pointed to. For example, it would be nonsense
to point to the UK version of EconFAQ from the Japanese machine.
After laurimirror is completed, the code for the project is
indexed using the index command. On the mirrors, index and laurimirror are
fired up automatically when there is a change to the code. On midas
these two scripts have to be called manually as soon as there is
a change in the code. To sum up, suppose BizEc is being updated on
midas: the following commands have to be given, one after
the other (no background processing!): cd BizEc,
bizecabsolute, laurimirror BizEc, index BizEc.
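The manual sequence on midas could be wrapped in a small driver such as the Python sketch below. It is a sketch under assumptions, not an existing NetEc script: the function name run_update is hypothetical, the three commands are assumed to be on the PATH, and the *.html arguments are expanded by hand because no shell is involved.

```python
import subprocess
from pathlib import Path


def run_update(project: str, home: Path, run=subprocess.run) -> list[list[str]]:
    """Run the absolute script, laurimirror and index strictly one after the other."""
    absolute = {"BizEc": "bizecabsolute", "WebEc": "webecabsolute"}[project]
    workdir = Path(home) / project  # corresponds to the "cd BizEc" step
    pages = sorted(page.name for page in workdir.glob("*.html"))
    steps = [[absolute, *pages], ["laurimirror", project], ["index", project]]
    for command in steps:
        # Each call blocks until the command exits (no background processing);
        # check=True raises an error and stops the sequence if a step fails.
        run(command, cwd=workdir, check=True)
    return steps


# Example call, assuming the code lives under the home directory of adnetec:
# run_update("BizEc", Path.home())
```

The run parameter exists only so that the sequence can be tried out with a stand-in function before it is pointed at the real commands.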