Collision management in X10 pure RF

toasterking · June 25, 2015, 10:14:05 PM

I recently had the misfortune of dealing with the RF side of X10 a bit more than I wanted to. I set up some appliance and lamp modules, an SS13A Stick-a-Switch, and an MS16A motion detector in my office. Since it is in an office building that I do not manage and I do not have the luxury of isolating noise makers and signal suckers on the power lines all over the building, I just have my X10 PLC modules on two power strips, one at each end of the room, each isolated by an XPPF filter and using an RR501 transceiver on the power strip to forward RF commands to the PLC modules. I wanted to add some manner of "scene changes" also, so I also added a CM19A RF computer interface.

I figured that since the RF protocol is much faster than the PLC protocol, RF signal collisions would likely be a non-issue since the possibility of overlap would be greatly reduced, right? Wrong. It would be right except that every device seems to transmit its RF command repeatedly for a duration of about one second, so the commands actually monopolize the airwaves for even longer on RF than they do the power lines on PLC. Even worse, no transmitter has a receiver or any sort of avoidance/detection implementation for RF signal collisions. The CM19A has a receiver, but it doesn't implement any sort of collision management. At least, not with the ActiveHome Scripting SDK, which is what I am using. If an X10 signal is already being sent from another device (and received by the CM19A) when the SDK is given a command to send, it just blasts it out the CM19A anyway with no regard to the transmission that it ends up overlapping. It was driving me crazy that frequently, one or more commands in a sequence sent by the computer would be lost because the damn motion detector was blindly sending an ON command, like it does for 1 full second out of every 10 while it's sensing motion, which it does almost the entire time that someone is in the room. I had to do something about this.
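A rough back-of-the-envelope check on how often a blind transmission would land on one of those bursts, as a minimal sketch. It assumes the motion detector's bursts are strictly periodic (1 second out of every 10, as described above) and that the computer starts sending at a random moment; both are simplifications, and the function name is made up for illustration.

```python
def overlap_probability(burst_s: float, period_s: float, tx_s: float) -> float:
    """Chance that a transmission lasting tx_s overlaps a periodic burst.

    Assumes one burst of burst_s seconds every period_s seconds and a
    transmission start time that is uniformly random within the period.
    """
    return min(1.0, (burst_s + tx_s) / period_s)


# Default behavior: the command is repeated for about 1 second.
default_tx = overlap_probability(burst_s=1.0, period_s=10.0, tx_s=1.0)

# A four-command "scene change" sent back to back (about 4 seconds of airtime).
scene = overlap_probability(burst_s=1.0, period_s=10.0, tx_s=4.0)
```

Under those assumptions, roughly one in five single commands and about half of all four-command sequences would overlap a motion detector burst, which is consistent with commands going missing "frequently".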

So I did something, and it works pretty reliably. I implemented a very lame algorithm for RF collision avoidance and detection. The SDK gives me control of the duration of every RF command sent, and I found that even a value of 1 (the shortest duration, only sending a single copy of the command) is very reliable. So why all the redundancy in other devices? If I only make the CM19A send each command once and other devices are blasting copies of the same command for 1000ms, then the CM19A should be able to "hear" the colliding commands so I can avoid them. Well, sort of. It turns out that the CM19A can either send or receive at any given moment, but not both. And immediately after sending, there is a period in which it is "deaf" and misses other RF signals completely. Maybe the AGC on its receiver has to settle and fade back up after its own transmission deafens it. I'm not sure of the technical reason, but I know that it happens. After a lot of experimenting, I came up with a simple algorithm and timings that work. I'm not happy with the timings; it's still too slow. But this is as fast as I could make it and still keep it reliable, and it's still much better than missing commands.

Simply, it is the following:

Every time any copy of an RF command is received, reset a global timer.

Transmission sequence:

COLLISION AVOIDANCE: Before transmitting, check to see if any commands were received in the last 200ms. If so, delay the transmission until 200ms has elapsed since the last received command. (This accounts for the delay between reception of repeated commands. Based on my measurements, 90ms should have been more than sufficient, but for some reason, it's not in practice.)
Transmit the command with a duration of 1 (shortest).
After transmitting, wait 1100ms before allowing the next transmission to be sent.
COLLISION DETECTION: If a command is received during the ("non-deaf" portion of the) 1100ms window, check to see what the command was.
If it was the same command that was just sent, ignore it because it is probably an echo.
If it was any other command, treat it as a collision and immediately repeat the entire transmission sequence.
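
The sequence above could be sketched in Python roughly as follows. This is not the code used in the post: the class name, the `send` callback, the polling interval and the `max_attempts` cap are all assumptions made for illustration, and the ActiveHome SDK calls are left out entirely. `on_receive` stands in for whatever handler the SDK invokes for each received RF command.

```python
import time
from typing import Callable, Optional


class CollisionManagedSender:
    """Guard-time check before sending, then listen for foreign commands."""

    def __init__(
        self,
        send: Callable[[str], None],   # hypothetical: transmits one copy of a command
        guard_ms: float = 200,         # quiet time required before transmitting
        post_tx_ms: float = 1100,      # hold-off and listening window after transmitting
        poll_ms: float = 10,           # assumed wake-up interval while listening
        clock: Callable[[], float] = lambda: time.monotonic() * 1000.0,
        sleep: Callable[[float], None] = lambda ms: time.sleep(ms / 1000.0),
    ) -> None:
        self._send = send
        self._guard_ms = guard_ms
        self._post_tx_ms = post_tx_ms
        self._poll_ms = poll_ms
        self._clock = clock
        self._sleep = sleep
        self._last_rx_ms: Optional[float] = None  # the "global timer"
        self._in_flight: Optional[str] = None     # command we are listening behind
        self._collision = False

    def on_receive(self, command: str) -> None:
        """Call this for every copy of an RF command the receiver reports."""
        self._last_rx_ms = self._clock()
        if self._in_flight is not None and command != self._in_flight:
            self._collision = True  # a different command: treat it as a collision
        # The same command as the one just sent is probably an echo, so ignore it.

    def _wait_for_quiet(self) -> None:
        """Collision avoidance: block until guard_ms has passed since the last reception."""
        while self._last_rx_ms is not None:
            wait = self._last_rx_ms + self._guard_ms - self._clock()
            if wait <= 0:
                return
            self._sleep(wait)

    def transmit(self, command: str, max_attempts: int = 5) -> int:
        """Send `command`, retrying on collision; return the number of attempts used."""
        for attempt in range(1, max_attempts + 1):
            self._wait_for_quiet()
            self._collision = False
            self._send(command)
            self._in_flight = command
            deadline = self._clock() + self._post_tx_ms
            # Collision detection: listen until the window ends or a foreign command arrives.
            while not self._collision:
                remaining = deadline - self._clock()
                if remaining <= 0:
                    break
                self._sleep(min(self._poll_ms, remaining))
            self._in_flight = None
            if not self._collision:
                return attempt
        raise RuntimeError(f"gave up on {command!r} after {max_attempts} attempts")
```

In a real setup `on_receive` would probably be called from the SDK's event thread, so the shared fields would need a lock; the sketch ignores that. The clock and sleep functions are injectable only so the timing logic can be exercised without hardware.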

Of course, this entire approach relies on other transmitters sending their usual long blast of repeated commands to ensure that they extend into the "non-deaf" part of the CM19A's reception window. Though it is kludgy and inefficient, it is sufficient to deal with the incredibly rude motion detector, it's only slightly slower than the default behavior of repeating the same transmission for 1 second with NO collision management, and I am quite happy that I was able to make this work. And considering how much more reliable I just made the system, I didn't even put a whole lot of effort into doing so. But this is exactly the thing that bugs me: If it is this simple to add very basic collision management to X10's PC RF transceiver with X10's own software tools, why didn't X10 do it themselves? It makes a massive difference for its reliability.

I still feel as though I'm missing something that should be obvious to me and would have made this whole endeavour unnecessary, so whatever it is, please feel free to point it out!

Brian H · June 26, 2015, 07:30:51 AM

I can't say much on your solution but can verify the X10 RF commands are sent multiple times. They say for better success in getting the message to the transceiver.

dhouston · June 26, 2015, 07:50:01 AM

First, most X10 RF transmitters send 5, 6 or more copies of the RF signal. (AFAIK, the SH624 is the only one that sends a single copy.) The reason for this is that their RF receivers have AGC and, with weak signals, it may take multiple copies to reset the AGC, allowing for a clean reception and interpretation. FCC limits of RF transmitter power are rather stringent. Europe allows 10x the power and Australia/NZ allow about 100x. You can see a 'scope screenshot that shows a signal emerging from the noise here...
http://davehouston.org/rf-noise.htm

Second, I implemented a similar solution (at a much lower level) about 15 years ago with the BX24-AHT (and with other designs that followed). For 600mS after receiving and implementing a valid RF code, I examine subsequent codes. If the same code, I ignore; if a different and valid code, I implement. But, that is at a much lower point in the food chain, just as the RF is received.
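
The 600 ms rule described above can be illustrated with a small filter. This is only a sketch of the idea, not the BX24-AHT firmware: the class and method names are invented, and it assumes the window is measured from the moment a code was last acted on, which the description leaves open.

```python
from typing import Optional


class RepeatFilter:
    """Ignore repeats of the code just acted on; act on any different valid code."""

    def __init__(self, window_ms: float = 600) -> None:
        self._window_ms = window_ms
        self._last_code: Optional[str] = None
        self._last_time_ms: Optional[float] = None

    def accept(self, code: str, now_ms: float) -> bool:
        """Return True if `code` should be acted on, False if it is a repeat."""
        is_repeat = (
            code == self._last_code
            and self._last_time_ms is not None
            and now_ms - self._last_time_ms < self._window_ms
        )
        if is_repeat:
            return False
        # A new code (or the same code after the window): act on it and restart the window.
        self._last_code = code
        self._last_time_ms = now_ms
        return True
```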

Third, the probability of an RF collision resulting in a valid X10 code is near zero. X10 used the NEC protocol which repeats each byte as its complement. IOW, all X10 RF receivers have built-in avoidance/detection implementation for RF signal collisions as it is built in to the NEC protocol.
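
To make the complement check concrete, here is a minimal sketch. It assumes a four-byte frame in which each data byte is immediately followed by its bitwise complement; the byte values are arbitrary, and the `overlay` function is a deliberately crude model of two overlapping transmissions (a bit reads as 1 if either sender sent 1), not a description of what a real receiver sees.

```python
def make_frame(byte1: int, byte2: int) -> bytes:
    """Build a 4-byte frame: each data byte followed by its bitwise complement."""
    return bytes([byte1, byte1 ^ 0xFF, byte2, byte2 ^ 0xFF])


def is_valid_frame(frame: bytes) -> bool:
    """Accept a frame only if both complement bytes match their data bytes."""
    return (
        len(frame) == 4
        and frame[1] == (frame[0] ^ 0xFF)
        and frame[3] == (frame[2] ^ 0xFF)
    )


def overlay(a: bytes, b: bytes) -> bytes:
    """Crude collision model (an assumption): bitwise OR of two frames."""
    return bytes(x | y for x, y in zip(a, b))


good = make_frame(0x60, 0x00)
one_bit_flipped = bytes([good[0] ^ 0x01]) + good[1:]
collided = overlay(make_frame(0x60, 0x00), make_frame(0x90, 0x10))
```

Under this model, a single flipped bit and an overlay of two different frames both fail the check, because turning a 0 into a 1 in a data byte would need the matching 1 in the complement byte to turn into a 0 at the same time.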

Finally, you are seeing the codes only after X10 hardware/firmware has done its (unknown & mysterious) things with the raw signal. Given, their history, I ~~find~~ think it unlikely they would report an invalid code. So, when the SDK reports a single RF code, it is likely to have discarded 4-5 previous, invalid receptions. And, even with RF collisions, unless the two (or more) signals have identical amplitudes, the receiver may, because of AGC, react only to the strongest signal. See the discussion of the data-slicer, here...
http://davehouston.org/RFTipsTricks.htm

toasterking · June 26, 2015, 11:40:26 AM

Thank you, Brian and Dave, for the replies!

Quote from: dhouston on June 26, 2015, 07:50:01 AM
their RF receivers have AGC and, with weak signals, it may take multiple copies to reset the AGC, allowing for a clean reception and interpretation.

This makes a lot of sense. All of the devices in the office are in very close proximity, which may be one reason it works so well for me when sending a single command.

Quote from: dhouston on June 26, 2015, 07:50:01 AM
the probability of an RF collision resulting in a valid X10 code is near zero.

Quote from: dhouston on June 26, 2015, 07:50:01 AM
Given, their history, I ~~find~~ think it unlikely they would report an invalid code.

It may be a bug in the firmware or SDK, but the SDK captured the M1 ON code several times while I was experimenting and I don't have any devices on house code M. It was always when I was intentionally causing overlapping transmissions. IIRC, the conflicting codes that were being sent at the time were F3 ON and either F2 DIM or F2 BRIGHT. I didn't capture any other errant codes besides M1 ON.

Quote from: dhouston on June 26, 2015, 07:50:01 AM
IOW, all X10 RF receivers have built-in avoidance/detection implementation for RF signal collisions as it is built in to the NEC protocol.

Call it what you want. It doesn't change the fact that the system was broken for me right out of the box. I'm sure you are correct and maybe the behavior you describe is accurate to the terminology, but what I really desire is arbitration so that all commands get through to a receiver in a decodable state. In my testing, two overlapping commands could result in a variety of outcomes. In many cases, both commands were properly received and acted upon, but not always: Either both RR501 transceivers would receive both commands, would receive only one or the other, one RR501 would receive one command and the other RR501 would miss it, or each RR501 would receive a different command, and it was pretty unpredictable. Depending on the specifics of the implementation of that protocol, it may be theoretically possible to completely avoid losing commands, but this is the implementation I am stuck with unless I want to build my own transceiver. (And let's admit it: I have neither the skill nor the patience to do that!) So the only thing I decided I could do about it was detect that another command was received in the crossfire and repeat the transmission.

Quote from: dhouston on June 26, 2015, 07:50:01 AM
Finally, you are seeing the codes only after X10 hardware/firmware has done its (unknown & mysterious) things with the raw signal. [...] when the SDK reports a single RF code, it is likely to have discarded 4-5 previous, invalid receptions.

I thought about the same thing, and at first I had serious doubts whether the solution I implemented would even be possible at such a high level. I am aware that there is much more than meets the eye in the underlying protocol and the way it is processed. I just had to hope that the CM19A and the RR501 process the same receptions the same way and agree on which ones are invalid, and as far as I can tell, they seem to. It is also obvious that the SDK hides some of the repeated codes that are received, only forwarding the "key down" and "key up" events (which are misnamed, as we already discussed), until a certain threshold is reached, then begins forwarding all the received commands. Since my solution is dependent on the SDK, this is probably why I needed such a high guard time of 200ms before deciding that the line is clear of any transmissions.

The extra information you provided helps to shed light on the reasons for some of the behavior I observed, but I don't think it changes how I need to deal with it from behind the SDK.

dhouston · June 26, 2015, 01:11:09 PM

Quote from: toasterking on June 26, 2015, 11:40:26 AM
The extra information you provided helps to shed light on the reasons for some of the behavior I observed, but I don't think it changes how I need to deal with it from behind the SDK.

I didn't intend to dissuade you and my tone may have come across as more critical than intended.

I was impressed that you came up with much the same solution despite being at much more of a disadvantage since you were dealing with the output from the RR501s & CM19A as well as the SDK. I handled the RF directly in order to avoid as much of that as possible (including the SDK) although I still had to deal with how the CM11A and/or TW523 reported only what they considered valid codes.

A couple of additional considerations: Having 2 or more polite PLC transmitters can lead to problems. If you read the X10 spec there are a limited number of time slots for repeat transmissions which can lead to chains of PLC collisions/retransmissions. I found it fairly easy to trigger PLC storms whenever an RR501 was in the mix. And M is frequently the housecode reported as evidence of phantom codes from collisions. So much so that I suspect it's a firmware or SDK issue.

If I manage to finish the projects I've started, they will provide low level access to all the raw RF & PLC signals free of any interpretation by X10 firmware/SDK issues.

dhouston · June 26, 2015, 02:57:39 PM

One more clarification. When capturing/decoding low level X10 RF, I've never seen anything suggesting a collision resulting in a phantom code. I've only seen the valid codes expected during my extensive testing. That's the reason I suspect the reports of such collisions are due to faulty logic in X10 firmware/software.

dhouston · June 26, 2015, 06:07:51 PM

Quote from: toasterking on June 26, 2015, 11:40:26 AM
It may be a bug in the firmware or SDK, but the SDK captured the M1 ON code several times while I was experimenting and I don't have any devices on house code M. It was always when I was intentionally causing overlapping transmissions. IIRC, the conflicting codes that were being sent at the time were F3 ON and either F2 DIM or F2 BRIGHT. I didn't capture any other errant codes besides M1 ON.

Was the M1 ON reported as RF, PLC or both?

toasterking · June 27, 2015, 10:54:54 PM

Quote from: dhouston on June 26, 2015, 01:11:09 PM
I didn't intend to dissuade you and my tone may have come across as more critical than intended.

No worries! I did not feel dissuaded, no tone was interpreted, and I took no offense. I was just summarizing my conclusion.

Quote from: dhouston on June 26, 2015, 01:11:09 PM
Having 2 or more polite PLC transmitters can lead to problems.

I was aware of this, but only because you had already documented it so well at http://davehouston.net/multiples.htm!

Quote from: dhouston on June 26, 2015, 01:11:09 PM
If I manage to finish the projects I've started, they will provide low level access to all the raw RF & PLC signals free of any interpretation by X10 firmware/SDK issues.

I am interested to see what you and others develop with that! I would certainly welcome an attempt at a do-over for RF decoding firmware!

Quote from: dhouston on June 26, 2015, 01:11:09 PM
M is frequently the housecode reported as evidence of phantom codes from collisions. So much so that I suspect it's a firmware or SDK issue.

Quote from: dhouston on June 26, 2015, 02:57:39 PM
One more clarification. When capturing/decoding low level X10 RF, I've never seen anything suggesting a collision resulting in a phantom code. I've only seen the valid codes expected during my extensive testing. That's the reason I suspect the reports of such collisions are due to faulty logic in X10 firmware/software.

I'm seeing one more reason not to use housecode M for anything important. I already had two others:

Absence of a house code wheel equates to house code M, so modules with dirty contacts on the code wheels are probably more likely to send commands for house code M than any other errant house code.
The raw sequence for start code + house code M on PLC is 111010101010, so if there is continuous noise around 120kHz only on the half-cycle, all it takes is for one of those empty (0) frames to be filled in with another errant noise spike, and the noise becomes a valid sequence for house code M.

Quote from: dhouston on June 26, 2015, 06:07:51 PM
Was the M1 ON reported as RF, PLC or both?

It was definitely reported as RF because I was filtering to only "recvrf" events from the SDK. With that in mind, it is possible that there was a matching "recvplc" event but I would not have seen it.

dhouston · June 28, 2015, 08:21:56 AM

Quote from: toasterking on June 27, 2015, 10:54:54 PM
The raw sequence for start code + house code M on PLC is 111010101010, so if there is continuous noise around 120kHz only on the half-cycle, all it takes is for one of those empty (0) frames to be filled in with another errant noise spike, and the noise becomes a valid sequence for house code M.

I think M is 01010101 at the PLC level. Changing any of those 0 half-bits to 1 results in a 1110 start sequence (which will be followed by an invalid code).

It also requires that a 1 half-bit be changed to a 0 half-bit. Changing a 0 half-bit to 1 is simple; changing the adjacent 1 half-bit to 0 is not. IOW, it requires changing a 10 sequence to 01 or vice versa.

The NEC protocol used for RF follows each byte with its complement, so there, a 0 bit changed to 1 requires a corresponding 1 to 0 change in the following byte. That seems even harder to imagine but, when you consider the physical layer where a 1 bit and 0 bit have different times between rising edges, it becomes extremely difficult to transpose corresponding bits in adjacent bytes without some shift in the time/space continuum.

At a higher logic level where 01010101=M becomes 0000=M it appears far more doable without calling on Dr. Who.

toasterking · June 28, 2015, 01:27:46 PM

Quote from: dhouston on June 28, 2015, 08:21:56 AM
I think M is 01010101 at the PLC level. Changing any of those 0 half-bits to 1 results in a 1110 start sequence (which will be followed by an invalid code).

You are correct as usual. I was thinking that the complementary half-bit came before the actual half-bit, but it is after. That means that 111010101010 is valid for house code J, not M. It is disappointing to me that I have been avoiding the wrong house code for all this time and have 16 devices on house code J!
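
The corrected ordering (each bit followed by its complement) can be checked with a few lines of Python. This is a sketch with made-up function names; it includes only the two house codes discussed here, with M as 0000 and J as 1111.

```python
START_CODE = "1110"

# Only the two house codes discussed in this thread.
HOUSE_BITS = {"M": "0000", "J": "1111"}


def to_half_bits(bits: str) -> str:
    """Expand each bit into the bit itself followed by its complement."""
    return "".join(b + ("0" if b == "1" else "1") for b in bits)


def start_and_house(house: str) -> str:
    """Raw PLC half-bit sequence for the start code plus a house code."""
    return START_CODE + to_half_bits(HOUSE_BITS[house])
```

With this encoding, `start_and_house("M")` gives 111001010101 and `start_and_house("J")` gives 111010101010, matching the correction above.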

Quote from: dhouston on June 28, 2015, 08:21:56 AM
It also requires that a 1 half-bit be changed to a 0 half-bit. Changing a 0 half-bit to 1 is simple; changing the adjacent 1 half-bit to 0 is not. IOW, it requires changing a 10 sequence to 01 or vice versa.

I was envisioning a noise source that only spews its 120kHz burst on the AC half-cycle, i.e. only on a rising ZC or only on a falling ZC. Wouldn't that be interpreted by an X10 receiver as 101010101010? Then if only one of those 0 half-bits gets changed to a 1 half-bit, you get 111010101010, which is a valid start code and house code J (not M; my mistake). My thinking was that if you have a valid start code and house code composed of just line noise, that just raises the probability, with only 5 bits to go, that eventually, you could have an entire valid X10 PLC command composed of line noise.

This is a tangent because we were initially discussing the RF protocol, not the PLC one, but still an interesting tangent!

dhouston · June 28, 2015, 03:21:36 PM

Quote from: toasterking on June 28, 2015, 01:27:46 PM
I was envisioning a noise source that only spews its 120kHz burst on the AC half-cycle, i.e. only on a rising ZC or only on a falling ZC. Wouldn't that be interpreted by an X10 receiver as 101010101010? Then if only one of those 0 half-bits gets changed to a 1 half-bit, you get 111010101010, which is a valid start code and house code J (not M; my mistake). My thinking was that if you have a valid start code and house code composed of just line noise, that just raises the probability, with only 5 bits to go, that eventually, you could have an entire valid X10 PLC command composed of line noise.

That is a bit more plausible. While most switching power supplies use full-wave rectifiers, there may be some that use half-wave and they might spew harmonics during half cycles. Still, I find it difficult to envision how to get a housecode and function code pair, especially those that are frequently reported. It's much easier to envisage them as noise sources blocking X10 signals.

Also, it begs the question, "Why does it seem to show up when you are trying to generate collisions but not at other times?"

toasterking · July 02, 2015, 12:28:29 PM

Quote from: dhouston on June 28, 2015, 03:21:36 PM
Still, I find it difficult to envision how to get a housecode and function code pair, especially those that are frequently reported. It's much easier to envisage them as noise sources blocking X10 signals.

I checked the PLC protocol specification, and after the 4 bits for the house code, a 1 means it's a command code and 1 for each of the remaining 4 bits is "Status Request". So if the aforementioned scenario occurs in which there is noise only on the AC half-cycle and one 0 half-bit is changed to 1 by another random noise spike, an X10 PLC receiver will see 1110101010101010101010, which translates to "J STATUSREQUEST". So at least in theory, that is the most likely command to be received errantly as the result of line noise and it's a rather innocuous one, since it won't signal any module to change its state nor affect the perceived state of any module nor select a different unit code. In theory, it would elicit a status response from whichever unit code was last selected if that unit has 2-way capabilities, but nothing else. This could be a coincidental assignment in the protocol, but I choose to believe that it was carefully selected by whatever engineer chose the assignment for 1111.
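
That decoding can be walked through in code. The sketch below assumes the framing discussed in this thread (a 1110 start code, then each bit followed by its complement) and only names the one function code mentioned here; the function names are invented for illustration.

```python
START_CODE = "1110"


def decode_half_bits(raw: str) -> str:
    """Strip the start code and collapse each bit/complement pair to one bit."""
    if not raw.startswith(START_CODE):
        raise ValueError("missing 1110 start code")
    body = raw[len(START_CODE):]
    if len(body) % 2:
        raise ValueError("odd number of half-bits")
    bits = []
    for i in range(0, len(body), 2):
        pair = body[i:i + 2]
        if pair not in ("10", "01"):
            raise ValueError(f"half-bit pair {pair!r} is not a bit and its complement")
        bits.append(pair[0])
    return "".join(bits)


def noise_with_one_spike(half_bits: int, spike_at: int) -> str:
    """Noise on every other half-cycle (1010...), with one 0 filled in by a spike."""
    pattern = list("10" * (half_bits // 2))
    pattern[spike_at] = "1"
    return "".join(pattern)


raw = noise_with_one_spike(22, spike_at=1)
decoded = decode_half_bits(raw)
house_bits, key_bits = decoded[:4], decoded[4:]
```

Here `raw` comes out as 1110101010101010101010, `house_bits` is 1111 (house code J) and `key_bits` is 11111, the all-ones pattern identified above as Status Request.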

However, this doesn't give us any clues as to why M is the most frequently reported house code resulting from signal collisions. I've only seen it on RF so far, which uses a different protocol than the PLC one I've been discussing here. But if it's a bug in the SDK or PC interface firmware, it's possible that the bug is at a higher level than the processing of either protocol.

toasterking · April 30, 2019, 12:53:46 PM

I just wanted to post a follow up on this. 4 years later, I am still using mostly the same setup in my office and still using the same algorithm for collision avoidance and detection. However, I have tweaked some values:

I check to see if any commands were received in the last 750 ms (instead of 200 ms). I changed that because if an RF transmitter key is held down, the SDK waits up to 728 ms to begin showing repeats of a code, and I need to also be able to detect if a code has recently begun repeating.
Commands are now transmitted with a duration of 90 (instead of 1). This seems the lowest value that will cause the CM19A to transmit the command twice rather than once. Very rarely, a command would not get received by both RR501s when transmitted only once.
After transmitting, I delay the next transmission by at least 1200 ms (instead of 1100 ms).

So the full algorithm with its current values is:

Quote

Every time any copy of an RF command is received, reset a global timer.
Transmission sequence:
COLLISION AVOIDANCE: Before transmitting, check to see if any commands were received in the last 750ms (using the timer defined above). If so, delay the transmission until 750ms has elapsed since the last received command. (This accounts for the delay between reception of repeated commands, including repeating ones.)
Transmit the command with a duration of 90 (transmits 2 copies).
After transmitting, wait 1200ms before allowing the next transmission to be sent.
COLLISION DETECTION: If a command is received during the ("non-deaf" portion of the) 1200ms window, check to see what the command was.
If it was the same command that was just sent, ignore it because it is probably an echo.
If it was any other command, treat it as a collision and immediately repeat the entire transmission sequence.

This got me to about 95% reliability. But on rare occasions, the CM19A's receiver seemed "deaf" for much longer than usual after the transmission window. I was not able to predict the occurrence nor duration of this "deafness window" with any consistency. I think, at this point, that it is a race condition within the SDK that rarely gets triggered. And I wasn't happy with 95%, so eventually, reluctantly, I added one more tweak: I added an X10 MR26A receiver and a USB serial adapter (with PL-2303 chipset). So just to be clear, I have both a CM19A RF transceiver and an MR26A RF receiver connected to the same PC, and every X10 RF message is transmitted once (by the CM19A) and received twice (once by each receiver).

The advantage (sadly) of the old fashioned MR26A receiver is that no SDK functionality is offered. I had to write my own code to decode the X10 messages from the serial stream, but I am not subject to the software bugs in the SDK. This is probably the best tweak I have made so far! It works with the existing algorithm I had in place for handling collisions and greatly boosts the reception range for most devices. The exception is the SS13A Stick-a-Switch; that doesn't work more than about 4 inches away. I tried with two MR26As! (I assume its frequency is poorly tuned, but I have no way to test.) I could actually lower some of my values and get the system working faster with the MR26A, but I'm leaving it as is to err on the side of reliability. The only disadvantage of the MR26A is that it does not recognize security codes. Those codes are still received by the CM19A, but they don't get the extra edge on range or collision handling from the MR26A. (I'm not using any so-equipped devices, so it's irrelevant to me.)

EDIT 8/22/2019: Fixed values in the quote block that did not get updated when this message was posted originally. Weird.
