Queuebusters revisited

Thinking about the future of linuxcnc, it is helpful to revisit the term 'queuebuster operation'. The result is a bit surprising - there are no 'queuebuster operations', there are only 'queuebuster state references'. I sketch a different, more flexible, less error prone model which is data-driven, not operation-driven.

It turns out this model supports any language in the frontend, without requiring that yet-to-be-defined language having to explicitly handle 'queuebusters'.

The current interpreter execution model

classify *operations* into 'queuebuster' and 'non-queuebuster' operations
when a block containing a queuebuster is parsed, stop the interpreter at this block, and wait 'until the machine has caught up'
synchronize interpreter state with the machine state (the 'sync' operation)
continue - now the parameters reflect what actually happened.

A different perspective on the 'queuebuster problem'

Standing back a bit we see:

there is distributed state in the system (interpreter/machine/user interface/what have you)
references to state in one component need to take into account the fact that either:
- state may have changed (eg any HAL pin value)
- a desired property of some state does not yet hold (e.g., probe operation not yet finished, hence probe position still invalid)

So this is really a cache consistency problem. A cache always brings a tradeoff between consistency and performance. Under this angle 'readahead' really means 'generate blocks as long as the state cache is valid'.

Explicit and implicit state references

The key term here is 'reference': that is, when state is referred to, either implicitly or explicitly.

Examples of state references:

explicit state reference: using, say, #5410 (current tool dia) in an expression like '#123 = [#5401 * 2]'. The right-hand-side expression *refers* to some state (tool dia), and it might not what it used to be.

implicit state reference: G91 G0 X5 - move relative to the current point (implies that #5420-#5428 is current)

Tainted references and the syncing requirement

Some machine operations 'taint' certain parameters. For instance, 'probe' taints the current point, and parameters #5061 up to #5070.

Tainting here means: 'the result of the affected parameter cannot be determined at this point (readahead time)'. Not every change to a parameter taints it: for instance, for endpoint calculation, if the starting point is known, then the endpoint will be known to match between interp and machine, so for instance a move does not taint the current point..

tainting example

Lets look at an example program under the angle 'what information gets tainted by which operation':

 N10    G90
 N20    G38.2 (taints current point, i.e. #5420-#5428, plus #5061 and #5070)
 N30    G91
 N40    G0 X5 (implicit state reference to current point, #5420-#5428)
 N50    o100 if [#5070 EQ 1]   (explicit state reference to #5070)

By tracking the tainting behavior of an instruction, it is possible to automatically determine that a synchronization of the tainted state is required, and where that must happen the latest. In the above example, syncing #5420-#5428 must happen before N40 is executed, and syncing #5070 must happen before N50 is executed.

Forms of state tainting

If we classify interpreter state with respect to tainting we find:

Some parameters never get tainted - for instance local variables, #31-#5000.
some parameters must be assumed always tainted, for instance parameters which implicitly refer to HAL pins may never be assumed to have a valid cached value. Assuming changing tool length offset or cutter radius during pause is possible, both tool length offset and cutter radius need to be assumed "always tainted" as well.
some parameters are tainted by a a given G-code operation.

The #5XXX parameters by tainting category

Going through the #5xxx parameters at http://www.linuxcnc.org/docs/devel/html/gcode/overview.html#_numbered_parameters_a_id_sub_numbered_parameters_a we find:

never tainted (I assume here origins and g28, g30 are frozen at program start)
- 31-5000 (local variables)
- 5161-5169
- 5181-5189
- 5211-5219
- 5210
- 5220
- 5221-5229
- 5241-5249
- 5261-5269
- 5281-5289
- 5301-5309
- 5321-5329
- 5341-5349
- 5361-5369
- 5381-5389
- 5599

tainted by probe (category 'byprobe')
- 5061-5069
- 5070
- 5420-5428 current position

Since the current point is often used, we define its own category for it:

category 'currentpoint' (also tainted by 'byprobe'):
- 5420-5428 current position

always tainted (originally: tainted by input read M66, see below)
- 5399
- 5500 (see example next section)

tainted by tool change (M6, category 'bytoolchange'))
- 5400 (tool number)
- 5401-5409 (tlo)
- 5410 (tool dia)
- 5411 - frontangle
- 5412 - back angle
- 5413 orientation
- 5600 tc fault indicator
- 5601 toolchanger fault code
- 5420-5428 current position (assuming user is allowed to jog around)

So we really have only 4 tainting categories: never, by probe, by input read, by toolchange. For further discussion, we better rename 'by input read' to mean 'assume as always tainted' . Because it is so common, we add a fifth category: 'current point' (current point valid)

When a category gets tainted, all parameters which are in that category become stale cache values. When any stale parameter is needed as reference, it needs to be refreshed 'by sync'.

Example: Introducing and categorizing a new parameter

Let's assume we want to introduce a new parameter, 'execution time', and give it #5500 for now. We want it to reflect the execution time, in seconds, when the referencing statement is executed. So a use case would be:

     (debug, we are now #5500 seconds into the program)

Clearly, this is a parameter which must be considered 'always tainted' - it is a queuebuster parameter. So to create meaningful output, readahead must stop before this block, wait until the machine 'executes' this statement, signaling that point in execution has been reached. Then it would be ok to determine the elapsed time, set #5500, and continue.

So we'd add #5500 to the 'always tainted' category above.

There are no 'queuebuster operations' - the queue is busted by referencing stale state

I chose the 'execution time' example on purpose - there is no corresponding G-code 'operation', there is just a parameter reference, and *still* we have a queuebuster?

The important conclusion is: readahead needs stopping *when stale state is referenced*, not when 'some queuebuster operation is executed'.

The inverse is also true: Not all programs containing our former 'queuebuster operations' actually need syncing - only those referencing stale state do.

That realization suggests that 'queuebusting' better be handled at the state reference level, not at the operation model.

Tracking tainting and automatic syncing

For each block, we can determine the static pre- and postconditions with respect to state references at parse time:

block precondition: a set of state categories required to be untainted before the block can execute. The union of all requirements of all words, and parameters referenced in this block.
taint set:: the set of state categories tainted after the block has been executed.

Then we have a 'current cache condition' which describes which state categories are currently in sync. At program start or interpreter initialization, the 'current cache condition is set to 'all state references invalid:

        valid = {}

This will cause an initial sync, so post-sync we have

       valid = {'byprobe','bytoolchange', 'currentpoint'}.

We do not record the 'never' category. Also, we do not add the 'always' category.

Outline of cache coherency algorithm

This would be executed as the blocks list is generated, which could be parse time or later.

 initial sync. # start cache condition now: s = {'byprobe','bytoolchange', 'currentpoint'}. 
 while not end of program
      inspect current block and determine set of preconditions p =  {...}
   redo:
      i = intersect( p, s)
      m = i - p # determine missing categories
      if m != {} # current block references some tainted state
      	 sync(m)
	 wait for sync to succeed
	 s = union(s,m)
	 goto redo:
      q = {...} # set of categories tainted by current block
      s = s - p # invalidate these in current valid set

An example:

 ;s = {'byprobe','bytoolchange','currentpoint'} through initial sync
 G90 ;p = {}, q = {} # G90 has no implicit state references, no categories tainted

 ;s = {'byprobe','bytoolchange','currentpoint'}
 G38.2 ;p = {}, q = {'byprobe','currentpoint'}

 ;s = {'bytoolchange'}
 G91 ;p = {}, q = {} # G91 has no implicit state references, no categories tainted

 ;s = {'bytoolchange'}
 ; since in relative mode, current  point needs to be valid to determine endpoint
 G0 X5 ;p = {'currentpoint'}, q = {}

 --> s is missing the 'currentpoint' category here, so a sync is
    needed.
    Post-sync we again have
    s = {'byprobe','bytoolchange','currentpoint'}
    so the execution succeeds since 'currentpoint' is in the valid set.

 ; s = {'currentpoint','byprobe','bytoolchange'}
 o100 if [#5070 EQ 1]  ;p = {'byprobe'}, q = {} 
 ; no sync needed since 'byprobe' category valid

Determining the precondition set of a block

expressions: for each parameter referenced in the expression, add the taint category to the precondition set. (above 'o100' example: the reference to #5070 adds 'byprobe' to the precondition set)
The precondition of a block is the union set of the preconditions of all words in the block, and all expressions.
G91 (relative) and a motion is generated: add 'currentpoint' to preconditions

Determining the taint set of a block:

if a HALpin reference, add 'always'.
if an M6, add 'bytoolchange', and 'currentpoint'.
if an G38.x, add 'byprobe' and 'currentpoint'.
expressions: currently I dont see how assignments can add to the taint set.

Implementation outline

(not fully thought through, but straightforward and really simple):

parameter management: spin out into separate class
the rs274 codes (interp_arrays.cc) need descriptor masks added (really state category masks, preconditions and taint sets)
the parameters need descriptor masks
active modes/gcodes may add to the precondition of a block; in one point in the code only, namely after parse, and before execute
the current mask needs to be set on sync, and adapted after each block
access to parameters needs to go through 'getters/setters' instead of directly accessing float values; the getter determines whether a sync is needed and returns only thereafter
ANY SYNC OPERATION HAPPENS ONLY IN A STATE 'getter', and as such is in principle language-independent.
- Assuming the state 'getter' can block until sync complete, the interpreter has no need to know what a sync() is. It can be coded in a completely linear fashion.
- This would suggest that state management would be done with a task of its own, interacting with the world model as needed.
- The preview instance of the interpreter has no need to sync with the world model, and as such call the state getter with a parameter indicating 'no sync needed', or link to a simplified state getter which chooses to ignore taint sets and syncing.
- an interesting side effect of accumulating the current taint set in the interpreter is the fact that the resulting taint set at the end of the preview run will allow to tell whether the preview path will coincide with the path generated by the actual program run, or may deviate from it - if the result taint set is empty, they must coincide

Memory requirements

set size: given that all features including save/restore are recorded here an unsigned32 is fine with lots of reserve.

the current-cache-validity descriptor: a set
per parameter: a set; Note named parameters already have an attribute mask; numbered params dont.
per M/G-code: two sets: the precondition, and the taint set. probably best held in interp_array.cc .
per remapped code: as for M/G-code. An ini parameter on the REMAP= line.
per block: the union of preconditions and taint sets of all codes in the block.

Processing requirements

during parsing a block: on a per-block basis, accumulate the preconditions and taint sets in the corresponding per-block sets. An unsigned32 'or' operation.
during parsing an expression: collect the taint set of the expression. An unsigned32 or operation.
modal state: depending on modal state, preconditons might need to be added to the current block, for instance 'G91 relative mode' adds the 'current point valid' preconditon.
during block evaluation: if a precondition is missing, a sync MAY be generated in advance. See 'Processing tradeoffs' below.
parameter reference: if sync is done at the block level: no change - direct reference to underlying float possible. If reference-driven, a parameter reference can cause a sync and hence must go through getter method.
parameter assignment: no change, I dont see any tainted r/w parameters around, so at this point a setter isnt needed. It might make sense long term. Since it can be inlined it could be very lightweight.
interpreter init: clear the current-cache-validity set.

Processing tradeoffs

I see the following ways to deal with syncing tainted values:

automatic minimal sync (only sync whats needed right now - more syncs but less information synced per operation)
automatic full sync (less syncs but all possibly tainted parameters synced)
sync explicitly triggered by interpreter when a block is evaluated

Options 1 and 2 mean 'the interpreter has no need to know about tainted parameters'; this would be the future as any new interpreter could choose to just ignore the problem, but it needs to go through getter methods.

Option 3 is likely the easiest to do in the current interpreter code. In this case, getters are optional.

In terms of number of syncs generated, 2 and 3 are identical. not sure whether the difference between 1 and 2 is worth making configurable.

Stepping

The current interpreter doesnt have a lot of state introspection capabilites, and none are used during stepping, but other interpreter code bases might.

Since performance is not an issue while stepping the interpreter, it could just as well sync at every step. The only change needed to achieve this is to set the current-cache-validity set to 'all values tainted' before a block is evaluated, which will cause a sync before every block evalutaion. There is no point in doing that when using the automatic scheme (options 1 or 2).

Summary

Dealing with queuebusters can be handled at the state access level
this access is completely language independent
- an interpreter for a new language need only define the operation preconditions and taint sets, which are static descriptors
changing the handling of a parameter due to different execution semantics can be achieved by only changing the static descriptor of that parameter

Note added 8-8-2012

It just occurred to me that in the algorithm outline above, I implicitly assumed a strict separation of parse and execute phases for a block; during parse one would determine precondition and taint sets, sync if required, only then execute the block.

This assumed separation in the current rs274ngc code doesnt exist, in particular wrt expressions: expressions and assignments are evaluated at parse time. This implies that either syncing must happen on the fly during reference to stale parameters during expression evaluation, or expression handling is separated into a parse tree generation during read() and parse tree evaluation during execute().

QueuebustersRevisited

Memory requirements

Processing requirements

Processing tradeoffs

Stepping

Note added 8-8-2012

Queuebusters revisited

The current interpreter execution model

A different perspective on the 'queuebuster problem'

Explicit and implicit state references

Examples of state references:

Tainted references and the syncing requirement

tainting example

Forms of state tainting

The #5XXX parameters by tainting category

Example: Introducing and categorizing a new parameter

There are no 'queuebuster operations' - the queue is busted by referencing stale state

Tracking tainting and automatic syncing

Outline of cache coherency algorithm

An example:

Determining the precondition set of a block

Determining the taint set of a block:

Implementation outline

Memory requirements

Processing requirements

Processing tradeoffs

Stepping

Summary

Note added 8-8-2012