I really don't know enough about Linux, C#, etc., so I'm partly guessing that it's the additional overhead of the various systems that is creating the delays. Here's a quote from the Wikipedia entry for C#:
Although C# applications are intended to be economical with regard to memory and processing power requirements, the language was not intended to compete directly on performance and size with C or assembly language.
My programming, whether for Windows, Linux, OS X, Android, iOS or embedded targets (PIC, Atmel), has always used various versions of Basic (PureBasic, ZBasic, PicBasicPro, Basic4android, etc.) that compile to either C or assembly, and the results are usually competitive in both size and speed with the more advanced languages used by professional programmers. Also, after reading Jan Axelson's excellent books on USB and embedded USB programming, I have tried to avoid dealing with the complexity of USB by opting for USB-serial converter chips wherever possible.
In my own efforts, I've tried to get as close to the powerline as possible to get around X10's censorship (i.e., reporting only valid X10 codes) and its delayed reports. I designed a daughterboard that replaced the microcontroller in the RR501, turning it into a CM15A-like interface but with a serial port that reported all activity. I also designed a daughterboard built around an LM567 tone decoder that could be added to a PL513, turning it into a two-way TTL interface that reported all activity in real time. Unfortunately, my health and X10's health put an end to my projects and to the RR501 and PL513.
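For anyone curious how an LM567 gets pointed at X10 traffic, here is a rough sizing sketch. It assumes the decoder is tuned to X10's 120 kHz powerline carrier and uses the approximate centre-frequency relation from the LM567 datasheet, f0 ≈ 1/(1.1 × R1 × C1). The post gives no component values, so the 1 nF capacitor and the helper names below are hypothetical and chosen only for illustration; real boards typically need trimming because the formula is approximate.

```python
# Minimal sketch (assumption: LM567 tuned to the 120 kHz X10 carrier).
# Approximate LM567 relation from its datasheet: f0 ~= 1 / (1.1 * R1 * C1).

X10_CARRIER_HZ = 120_000.0  # X10 powerline carrier frequency


def lm567_center_hz(r1_ohms: float, c1_farads: float) -> float:
    """Hypothetical helper: approximate free-running centre frequency in Hz."""
    return 1.0 / (1.1 * r1_ohms * c1_farads)


def lm567_r1_for(f0_hz: float, c1_farads: float) -> float:
    """Hypothetical helper: solve f0 ~= 1/(1.1*R1*C1) for R1 in ohms."""
    return 1.0 / (1.1 * f0_hz * c1_farads)


if __name__ == "__main__":
    c1 = 1e-9  # hypothetical 1 nF timing capacitor, for illustration only
    r1 = lm567_r1_for(X10_CARRIER_HZ, c1)
    print(f"R1 = {r1:.0f} ohms for C1 = 1 nF")
    print(f"f0 check = {lm567_center_hz(r1, c1):.0f} Hz")
```

With a 1 nF capacitor this works out to roughly 7.6 kΩ for R1; in practice a trimmer in series with a fixed resistor would let the centre frequency be adjusted onto the carrier.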