LinuxCNC Documentation Wiki: WhyManualWhilePausedIsHard

This page was taken in its entirety from a post by John Kasunich to the emc-users list (emc-users@lists.sourceforge.net), sent on 5/17/2010 at 2:50 AM.

(Note: to understand this, you will have to think like a developer for a while. That means you need to understand the basic internal architecture of EMC2, and you need to think in detail about how to implement what you want, not just what you want the machine to do.)

First: Basic EMC2 architecture. EMC consists of several levels. From the top down, it goes GUI, Interpreter, Motion Controller, HAL. (I'm leaving out a lot, but this is still going to be too long, so forgive any oversimplification.) The first two levels are normal "user space" programs. Like all normal programs, they are at the mercy of the operating system and any other programs that are running at the same time.

When the computer gets busy, regular programs temporarily stop or slow down while the operating system or another program does something else. Everybody has experienced that with everyday programs. You click on something that normally happens instantly, but it takes a half-second or a couple of seconds instead. That kind of thing happens all the time, usually for a tenth of a second, or a hundredth, and you never notice, but it is there. Not a big deal with normal computer programs, and not even a big deal for the GUI of a machine tool. But not acceptable for the low-level motion control.

To avoid this problem, EMC runs the motion controller (and HAL) as realtime processes. When a realtime process is configured to run every 1000th of a second, that is exactly what you get, no matter how busy the rest of the computer gets. (There is still a small amount of variation, measured in microseconds, but we're ignoring that.)

The motion controller runs 1000 times a second. Most of the time, all it does is calculate a new position a little farther along the line or arc described by the current line of g-code. But sooner or later that line or arc ends, and a new one starts. When that happens, the info about the next line of g-code MUST be available. That info comes from the g-code interpreter. But what if the interpreter happens to be right in the middle of a 1/10 second delay?

EMC solves this problem with the motion queue. The queue holds a couple hundred motions (lines, arcs, etc). The interpreter runs as fast as it can, turning g-code into simple motions and putting them in the queue. The motion controller takes them out of the queue and moves the tool.
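To make the queue idea concrete, here is a minimal sketch in Python. It is not EMC2 code: the class and function names are made up for illustration, and the depth of 200 is just a stand-in for "a couple hundred". The interpreter fills the queue until it is full, and the motion controller takes moves off the front.

```python
from collections import deque

QUEUE_DEPTH = 200  # assumption: stands in for "a couple hundred motions"


class MotionQueue:
    """Bounded FIFO between the interpreter (producer) and the
    motion controller (consumer). Hypothetical, for illustration only."""

    def __init__(self, depth=QUEUE_DEPTH):
        self.depth = depth
        self.items = deque()

    def put(self, motion):
        """Add a motion; return False if the queue is already full."""
        if len(self.items) >= self.depth:
            return False
        self.items.append(motion)
        return True

    def get(self):
        """Remove and return the oldest motion, or None if empty."""
        return self.items.popleft() if self.items else None


def interpreter_fill(queue, program, start):
    """Queue as many moves as fit, starting at program[start].
    Returns the index of the first move that did not fit."""
    i = start
    while i < len(program) and queue.put(program[i]):
        i += 1
    return i


program = [("line", n) for n in range(500)]
q = MotionQueue()
next_line = interpreter_fill(q, program, 0)  # interpreter runs ahead
executing = q.get()                          # controller starts the first move
```

The point of the sketch is the gap between `next_line` and the move in `executing`: the interpreter has already finished with everything in between.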

What this means is that the interpreter is usually many lines ahead of the motion controller. The interpreter applies work offsets to each move. It translates units from whatever the program uses (inches or mm) to machine units. It applies cutter compensation and tool length offset. It breaks canned cycles down into individual lines and arcs. After doing all of that, it puts the lines and arcs into the motion queue.
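As a rough illustration of "the interpreter does the conversions before queuing", the sketch below turns one programmed target into machine coordinates. Everything here is an assumption made for the example: a machine that works in mm, offsets stored in machine units, the sign convention for the tool length offset, and the function name itself. Cutter compensation and canned cycles are left out.

```python
MM_PER_INCH = 25.4


def interpret_target(target, program_units, work_offset, tool_length_offset):
    """Turn one programmed XYZ target into machine coordinates (mm).

    Hypothetical helper: assumes the machine works in mm and that
    work_offset and tool_length_offset are already in machine units.
    """
    scale = MM_PER_INCH if program_units == "inch" else 1.0
    x, y, z = (coord * scale for coord in target)
    off_x, off_y, off_z = work_offset
    # The tool length offset only affects Z in this simplified model.
    return (x + off_x, y + off_y, z + off_z + tool_length_offset)


# The value that goes into the motion queue is the finished result;
# the offsets used to compute it are no longer visible to the controller.
machine_target = interpret_target(
    (1.0, 2.0, -0.5), "inch", (100.0, 50.0, -200.0), 75.0
)
```

Once `machine_target` is in the queue, nothing downstream knows which offsets produced it. That detail matters for the touch-off discussion later on.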

The motion controller pulls lines and arcs out of the queue and makes the tool move along that path. A particular line or arc might sit in the queue for a couple tenths of a second, if you have a program that consists of many short moves. It also might be in the queue for minutes or even hours, if the program has very long, very slow moves. A short program can be completely interpreted and in the queue before the tool ever touches metal.

All of the above information is background - a very simplified version of what happens as EMC runs a program, just enough to explain what the motion queue is and why we have it. Now let's think about implementing "pause/jog/run".

Steve has put his thoughts into the wiki page at http://wiki.linuxcnc.org/cgi-bin/wiki.pl?ManualWhilePaused and says: "EMC need only remember the axis positions it stopped at and on resume should always do a combined move back to that position."

So, how can we do that? I assume he doesn't want to wait till the end of a line or arc to stop. If the tool breaks or swarf wraps 1 inch into a 10 inch long cut, you need to stop now. So that means we MUST do this in the motion controller, since the interpreter simply queues up complete moves and doesn't know anything about the middle of a move.

It might be practical to implement pause/jog/resume entirely in the motion controller. Jogging is currently done there, so it isn't out of the question for the controller to remember the current position, let you jog away, then do a move back to the remembered position. There are plenty of messy details, but no fundamental problems - as long as ALL you want to do is jog.
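The jog-only case can be sketched in a few lines. This is a toy model, not the real motion controller: the class name and methods are hypothetical, and it ignores the "messy details" (acceleration limits, feed rate for the return move, soft limits, and so on). It only shows the bookkeeping: remember the position at pause, allow jogs, then produce one combined move back.

```python
class PausedJogSketch:
    """Toy model of pause/jog/resume handled inside the motion controller."""

    def __init__(self, position):
        self.position = list(position)
        self.resume_point = None  # set only while paused

    def pause(self):
        """Remember where the tool stopped, possibly mid-move."""
        self.resume_point = tuple(self.position)

    def jog(self, axis, distance):
        """Move one axis by hand; only allowed while paused in this sketch."""
        if self.resume_point is None:
            raise RuntimeError("not paused")
        self.position[axis] += distance

    def resume(self):
        """Return the (start, end) of the single combined move back to
        the remembered position, then leave the paused state."""
        if self.resume_point is None:
            raise RuntimeError("not paused")
        move_back = (tuple(self.position), self.resume_point)
        self.position = list(self.resume_point)
        self.resume_point = None
        return move_back


ctrl = PausedJogSketch((10.0, 5.0, -2.0))
ctrl.pause()
ctrl.jog(2, 25.0)   # lift Z clear of the work
ctrl.jog(0, -3.0)   # slide X out of the way
start, end = ctrl.resume()
```

Note that nothing here touches the queue or the interpreter, which is why this part is only "messy" and not fundamentally hard.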

Steve also writes: "The only requirements are to jog, toggle spindle and coolant on/off and touch off current tool. MDI may be useful to allow accurate axis positioning."

I already said jogs can probably be done. Next is spindle and coolant. Normally, when the interpreter encounters a spindle state change (on/off/speed) or a coolant change (on/off), it stops queuing movement commands. When the motion controller finishes processing all the moves in the queue, the interpreter sees that, sends the spindle or coolant command, and starts queuing motions again. We refer to such events as "queue busters". (And as an aside, the fact that spindle speed changes are queue busters might explain why the guy who is trying to change laser intensity on the fly with S words sees motion pause briefly for each change.)
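Here is a small sketch of the queue-buster behavior, with made-up names and a made-up program format. It is only meant to show the rule "do not send a spindle or coolant command until the queue is empty", not how the real interpreter is structured.

```python
from collections import deque

QUEUE_BUSTERS = {"spindle", "coolant"}  # assumption: just the two named here


def interpreter_step(program, index, queue, sent):
    """Advance the interpreter by at most one program item.

    Motions go into the queue. A queue buster is held back until the
    queue is empty, then sent directly. Returns the new program index.
    """
    if index >= len(program):
        return index
    kind, arg = program[index]
    if kind in QUEUE_BUSTERS:
        if queue:
            return index          # moves still pending: wait, queue nothing
        sent.append((kind, arg))  # queue drained: send the command now
        return index + 1
    queue.append((kind, arg))
    return index + 1


program = [("line", 1), ("line", 2), ("spindle", "off"), ("line", 3)]
queue, sent = deque(), []

idx = 0
for _ in range(3):
    idx = interpreter_step(program, idx, queue, sent)
stalled_at = idx            # interpreter is stuck on the spindle command

queue.clear()               # stands in for the controller finishing both moves
idx = interpreter_step(program, idx, queue, sent)  # spindle command goes out
idx = interpreter_step(program, idx, queue, sent)  # queuing starts again
```

While the interpreter is stalled, the queue only ever gets shorter. That is the brief pause in motion mentioned in the aside about S words.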

So - here we are, with the motion controller paused in the middle of an arc, and another 2 or 20 or 200 lines and/or arcs sitting in the queue. We can't wait for the queue to empty, because it won't. So we need another method to get the spindle and coolant commands to the motion controller. This is not impossible, but it adds complexity and might introduce bugs - when you have two channels that can send commands to the motion controller, you need to carefully coordinate them to make sure the controller is listening to the right one at the right time. Let's assume that we can deal with all those nasty details - spindle and coolant control is possible, just tricky.

Next on the requirements list is touch-off. Here it gets very nasty. Touch-off changes a coordinate system offset. But the interpreter applies offsets before any line or arc goes into the motion queue. When we stopped the motion controller, there were still 2 or 20 or 200 lines and arcs in the queue using the old offsets. Those motions will happen after we resume with the new tool, so they will be wrong. We must throw away all the queued motions, back the interpreter up, and re-queue with the new offsets. This is NOT just a matter of nasty details. There are fundamental problems with "backing up" the interpreter.
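A few lines of arithmetic show why the queued moves go stale. The numbers and names below are invented for the example, and only Z is modeled. The queue holds values that were computed with the old offset; a touch-off changes the offset but cannot reach back into the queue.

```python
def queue_moves(program_z_values, work_offset_z):
    """Apply the offset the way the interpreter does, before queuing.
    Returns the machine-coordinate Z values that end up in the queue."""
    return [z + work_offset_z for z in program_z_values]


program_z = [0.0, -1.0, -2.0]

old_offset = -100.0                      # offset in effect before the pause
queued = queue_moves(program_z, old_offset)

new_offset = -97.5                       # hypothetical result of the touch-off
correct = queue_moves(program_z, new_offset)

# Every move still sitting in the queue is off by the offset change.
stale_error = [q - c for q, c in zip(queued, correct)]
```

The only way to get `correct` into the queue is to run the interpreter again over the same program lines, which is exactly the "backing up" problem.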

You can't just hand-wave away the complexity of canned cycles, subroutines, etc. for this problem. Just because you paused in the middle of a very ordinary move doesn't mean that the queue doesn't contain motions that were part of a canned cycle that happens later in the program. You have absolutely no control over what got discarded from the queue.

After the touch-off requirement, Steve said that MDI "may be useful". Consider that MDI uses the exact same interpreter as normal program execution. If you've ever noticed that you can type several lines of MDI while the first one is running, that is because MDI lines are interpreted and stuffed into the motion queue just like regular lines.

So, to do MDI we not only have to throw away every programmed motion that is in the queue, we also have to use the interpreter to process the MDI commands. That means that any of the internal data used by the interpreter might change as the operator uses MDI. This makes it even harder to "back up" the interpreter when it is time to resume the g-code program.

The above are simple facts about how EMC works, and some of the reasons why what you want is somewhere between "very hard" and "impossible". It took me over an hour to think it through and write it up, and I've barely scratched the surface of the problem.

Although I've contributed thousands of lines of code to EMC2, I'm mostly focused on HAL and the motion controller, and I don't know enough about the interpreter to understand all the issues involved in this feature. It would take several more hours and input from other programmers before I could even begin to estimate the amount of work needed, or have any confidence that all "show-stopper" issues have been identified.