Web Automata - Supplement


Web automata and distributed runs of systems of Web automata are defined formally in [STVdB02]. This section provides supplementary information concerning the present implementation of both Web automata and their distributed runs. A self-contained and more detailed description of the implementation will be subject of a separate paper.

For technical and efficiency reasons the currently implemented version of Web automata slightly deviates from the formal definition in [STVdB02]. In the following, we briefly explain the most important changes, starting with the central notion of a Web automaton definition.

Web automaton definition. A Web automaton definition has the form 

id:       "STRING"
source:   "STRING"
spawnPot:   NUM
spawnMax:   NUM
program:

PROGRAM

where STRING, NUM, and PROGRAM stand for a string, a natural number, and a Web automaton program, respectively. The first two lines specify the id and source of the query computed by the defined automaton. Lines three and four specify the spawn potential and spawn maximum of the automaton. Each component is discussed below. Here is an example of a Web automaton definition (taken from WebAuto/auto.def):
id:       "test"
source:   "index.html"
spawnPot: 10
spawnMax: 2
program:

if ! headOfQ = _ then
   send(headOfQ)
else
   if ( Contains("PODS'02", thisNode) & r1 = 0 ) then
      send(thisNode);
      r1 := 1
   else
      send()
   endif
endif


ID. Since many different Web queries may run concurrently on a Web automaton server, each query is equipped with a unique ID tag. All automata participating in the computation of one and the same query use the query's ID for identification. In a Web automaton definition it is (usually) save to choose the user's name as ID. If a conflict occurs, the Web automaton server will issue a complain before accepting the query.

Source. In order to run a Web automaton, it must be associated with a node. In principle, a node can be any object on the Web, be it a word in a text file or the contents of an entire Web server. However, since the present implementation is still in an experimental state - its main purpose is to test different types of protocols by which automata can communicate, currently a Web automaton can only be associated with (and thus run at) a single HTML or ASCII file.

The source of a Web query is the node where the query is initiated [AV00, STVdB02]. Intuitively, it is the location from which we start exploring the Web in order to answer the query. The source specified in a Web automaton definition determines the location of the source automaton, i.e., the copy of the automaton which will run at the source node. Currently, the source must be a path which uniquely determines a file in the subtree rooted at base. For instance, if "index.html" is specified as source, then the source automaton will be associated with the file base/index.html.

Spawn potential. The spawn potential of a Web automaton is a natural number which determines the automaton's ability to create copies of itself at neighboring nodes. Currently, a neighboring node is a Web page linked to from the page at which the automaton is running. An automaton can `spawn' if and only if its spawn potential is greater than 0. Note that the spawn potential specified in a Web automaton definition determines the spawn potential of the source automaton only. The spawn potentials of all descendants of the source automaton are computed internally. In the present implementation, the spawn potential of a non-source automaton is computed by simply subtracting 1 from the spawn potential of its creator. As a consequence, the spawn potential of the source automaton is identical with the maximal distance of Web pages (from the source) at which copies of the automaton can be created and run. Hence, the spawn potential specified in a Web automaton definition can be viewed as the search and query depth. Of course, other, more sophisticated heuristics for controlling the creation of automata can be implemented.

Spawn maximum. The spawn maximum of a Web automaton is an upper bound on the number of children (direct descendants) which the automaton can trigger in parallel. Let us briefly recall the situation in the formal model [STVdB02]. Each automaton participating in a distributed run sends messages to the automaton which created it, following a simple protocol based on three rules:
If you do not have children, start computing the next message whenever the user triggers the system.
If you do have children, start computing the next message only when your queue contains `enough' input (i.e., messages from children); when all children have send their messages, start anyway.
Once you have sent a message, stop and wait for the next trigger.
This naturally organizes a distributed run into rounds, where each round is triggered by the user.

In the present system, the second of the above rules is implemented by the following algorithm. If an automaton discovers that it has not enough input for computing the next message, it triggers n children in parallel (i.e., it sends them "please send me your next message" requests), where n = min{number of children, spawn maximum}. If the n new messages sent by those children still do not suffice for completing the computation, the automaton triggers the next n children, and so forth, until all children have been triggered.

The spawn maximum specified in a Web automaton definition determines the spawn maximum of all copies of the defined automaton. This enables the user to limit the number of parallel computations during a distributed run. For instance, if the spawn maximum is chosen to be 1, the execution order of automata is strictly sequential, which is space efficient but (maybe) slow. On the other hand, choosing a high spawn maximum implies that many automata can run in parallel, possibly resulting in a shorter response time of the system and (maybe) extensive space allocation on the participating machines.

Web automaton program. The syntax of Web automaton programs has been slightly expanded; it is now defined by the following grammar:
 

PROGRAM -> ACTION
| if  FORMULA  then  PROGRAM  endif
| if  FORMULA  then  PROGRAM  else  PROGRAM  endif
| PROGRAM ; PROGRAM
ACTION -> r NUM := TERM  |  send()  | send( TERMS )
TERM -> 0  |  1  |  r NUM  | thisNode  |  headOfQ  |  _
TERMS -> TERM  |  TERM , TERMS
FORMULA -> ATOM
| ! FORMULA
| ( FORMULA & FORMULA )
| ( FORMULA | FORMULA )
ATOM -> AtSource  |  TERM = TERM  |  PREDICATE

where NUM stands for a natural number greater than 0, "_" on the right-hand side of the TERM rule represents the empty queue, and "AtSource" on the right-hand side of the ATOM rule replaces the unary predicate Source (see [STVdB02]). Furthermore, PREDICATE stands for one of the following expressions:

  HasTitle(TERM)
  Contains("STRING", TERM)
  TitleContains("STRING", TERM)
  AnchorContains("STRING", TERM)
  Link(thisNode, TERM)
The semantics of Web automaton programs remains the same. The only exception is that update and send actions are now performed sequentially according to their occurrence in a program, and not in parallel as defined in [STVdB02]. The meaning of if-then-else rules is as usual. The currently implemented predicates HasTitle, Contains, TitleContains, AnchorContains, and Link have the obvious interpretation.