We have moved at community.getvera.com

Author Topic: lu_block_jobs  (Read 701 times)

Offline Buxton

  • Full Member
  • ***
  • Posts: 209
  • Karma: +12/-0
lu_block_jobs
« on: January 20, 2019, 11:29:12 pm »
Hey AK,

Did you ever come across the titled command when you wrote openLuup?    http://wiki.micasaverde.com/index.php/Luup_Plugins_ByHand#.3Crun.2Fjob.2Fincoming.2Ftimeout.3E

I'm asking because I've been trying to solve a dropped connection problem that I keep having with a plugin I have installed, and it occurred to me that it may be the result of a race condition of some sort.  Basically, anytime the connection drops (IP TCP socket), the plugin tries to re-open the connection with an io.open to no avail.  Here's the code:
Code: [Select]
    luup.io.open(lul_device, PLUGIN_SETTINGS.PANEL_IP_ADDRESS, PLUGIN_SETTINGS.PANEL_IP_PORT)
luup.io.intercept(lul_device)
local wStatus = luup.io.write("^01,00$\r\n", lul_device)
local pData = luup.io.read(2, lul_device)

A  luup reload will often, but not always, clear the problem.  The drop happens most often after long periods of connectivity, or if another plugin in behaving badly (per error messages in the log.)  I can also clear the problem by logging into the exterior device via putty. Once cleared, the device will once again connect with openLuup on reload.

Anyway, just looking for advice and not any sort of action or investigation as part of what's going on here is probably related to the way the exterior device handles repeated connections......

Offline akbooer

  • Moderator
  • Master Member
  • *****
  • Posts: 6387
  • Karma: +290/-70
  • "Less is more"
Re: lu_block_jobs
« Reply #1 on: January 21, 2019, 06:01:27 am »
That link you gave was the most read Vera Wiki page ever when I was developing openLuup in the first place!

This is, undeniably, the worst implemented, worst documented, least understood, and error prone part of the Vera/MiOS Luup architecture.

Countless experiments by myself and others have failed to converge on the 'correct' implementation for openLuup.  Here are a few:


It's worth, perhaps, reading at least of bit of those threads to understand some of the issues.

One basic problem is lack of a luup.io.close() function, another is the detailed understanding of what luup.io.intercept() actually does, yet another is the differing requirements of devices once the connection is lost and needs to be re-established.

The solution generally adopted by plugin developers has been to abandon any attempt to use the luup.io module in favour of returning to the LuaSocket library and the functions available there.  You still have to overcome the semantic issue of what the particular device protocol requires.

Have you actually tried this on Vera and compared to openLuup?  I'm really interested in any observed discrepancies.  My expectation is that repeated calls to luup.io.open() will invariably fail.  The fact that a reload does not clear the problem, but external intervention with the device does, indicates that this is not perhaps an entirely openLuup issue?

Not an easy one.

______________

Edit: I should also note that I abandoned (early on) trying to emulate Vera exactly in the I/O implementation, witness the User Guide caveat that's been there from day one:

Quote
Doesn't support the <incoming> or <timeout> action tags in service files, 
but does support the device-level <incoming> tag.
« Last Edit: January 21, 2019, 06:06:19 am by akbooer »
3x Vera Lite-UI5/Edge-UI7, 25x Fibaro, 23x TKB, 9x MiniMote, 2x NorthQ Power, 2x Netatmo, 1x Foscam FI9831P, 9x Philips Hue,
Razberry, MySensors Arduino, HomeWave, AltUI, AltHue, DataYours, Grafana, openLuup, ZWay, ZeroBrane Studio.

Offline Buxton

  • Full Member
  • ***
  • Posts: 209
  • Karma: +12/-0
Re: lu_block_jobs
« Reply #2 on: January 22, 2019, 11:47:24 pm »
OK, read through those very informative posts.  And will probably read through again as there were some insights in the comms protocol that might be at work here.  In the meantime, what do you make of the following.  These are sequential log entries that show the socket not being released--I'm assuming this as the same socket id is used.  Openluup does have an explicit close within the open function (I bracketed the close with log calls to get a sense of what's happening as well as creating a return variable for the close.)
Code: [Select]
_log("Now closing the socket for ".. tostring(device).. " ".. tostring(sock))
        local tempClose  = sock: close ()            -- close our end
_log("should now be closed for ".. tostring(device).. " ".. tostring(ip) .." ".. tostring(port) .." " .. tostring(tempClose))
        scheduler.socket_unwatch (sock)
        _log ("socket connection closed " .. tostring (sock))

Here are the relevant log entries:
Code: [Select]
2019-01-22 20:18:53.373   luup.io.open:: connecting to 10.17.2.30:4025, using crlf protocol tcp{client}: 0xaaab0c36f2a8
2019-01-22 20:18:53.373   openLuup.io:: connect OK

2019-01-22 20:19:03.143   luup.io.incoming:: bytes received: 6, status: OK tcp{client}: 0xaaab0c36f2a8
2019-01-22 20:19:03.143   luup_log:7: (EVL3VistaAlarmPanel::processLogin) EVL3 Interface requesting Login.
2019-01-22 20:19:03.143   luup.variable_set:: 7.urn:micasaverde-com:serviceId:EVL3VistaAlarmPanel1.Plugin_Status was: Startup completed. now: Start Login #hooks:0
2019-01-22 20:19:03.143   luup.io.write:: message length: 6, bytes sent: 6, status: OK tcp{client}: 0xaaab0c36f2a8
2019-01-22 20:19:03.144   luup.io.incoming:: bytes received: 10, status: OK tcp{client}: 0xaaab0c36f2a8

2019-01-22 20:19:03.151   openLuup.io:: Now closing the socket for 7 tcp{client}: 0xaaab0c36f2a8
2019-01-22 20:19:03.151   openLuup.io:: should now be closed for 7 10.17.2.30 4025 1
2019-01-22 20:19:03.152   openLuup.io:: socket connection closed tcp{client}: 0xaaab0c36f2a8

and then minutes later  after the explicit close, there is an attempt to open the non-existent socket

2019-01-22 20:22:04.556   luup.variable_set:: 7.urn:micasaverde-com:serviceId:EVL3VistaAlarmPanel1.ConnectionDown was: 0 now: 1 #hooks:1
2019-01-22 20:22:09.562   luup.io.write:: error: timeout tcp{client}: 0xaaab0c36f2a8
2019-01-22 20:22:11.565   luup.io.read:: error: timeout tcp{client}: 0xaaab0c36f2a8
2019-01-22 20:22:11.566   luup_log:7: EVL Data for io read while down nil iostatus false lul_device 7 lug_device 7

The attempt to open the non-existent socket is part of the code I posted in my opening question.

Is the release and re-acquire of A socket supposed to be handled in the device code??  If so, how to do so using luup.io.  I'm asking in the sense that I cannot envision doing anything different if I were to create my own socket connection.  I would not proceed differently as far as flags, timeouts etc ---from what is now being done internally by openluup.

As always, your insights greatly appreciated.
« Last Edit: January 22, 2019, 11:50:37 pm by Buxton »

Offline Buxton

  • Full Member
  • ***
  • Posts: 209
  • Karma: +12/-0
Re: lu_block_jobs
« Reply #3 on: January 23, 2019, 12:03:16 am »
And here's an example where the socket close occurs directly after a luup reload and what seems to be a successful data read:

Code: [Select]
2019-01-22 20:57:32.122   luup.io.incoming:: bytes received: 6, status: OK tcp{client}: 0xaaab0ce3aa18
2019-01-22 20:57:32.122   luup_log:7: (EVL3VistaAlarmPanel::processLogin) EVL3 Interface requesting Login.
2019-01-22 20:57:32.122   luup.variable_set:: 7.urn:micasaverde-com:serviceId:EVL3VistaAlarmPanel1.Plugin_Status was: Startup completed. now: Start Login #hooks:0
2019-01-22 20:57:32.123   luup.io.write:: message length: 6, bytes sent: 6, status: OK tcp{client}: 0xaaab0ce3aa18
2019-01-22 20:57:32.124   luup.io.incoming:: bytes received: 10, status: OK tcp{client}: 0xaaab0ce3aa18
2019-01-22 20:57:32.124   openLuup.http:: GET /data_request?id=lu_status2&output_format=json&DataVersion=219291509&Timeout=60&MinimumDelay=1500&_=1548216570158 HTTP/1.1 tcp{client}: 0xaaab0dad8d28
2019-01-22 20:57:32.125   openLuup.http:: GET /console?page=log HTTP/1.1 tcp{client}: 0xaaab0ddb9858
2019-01-22 20:57:32.125   openLuup.wsapi:: using openLuup/console.lua for console
2019-01-22 20:57:32.125   openLuup.wsapi:: using REQUIRE to load .lua CGI
2019-01-22 20:57:32.143   openLuup.http:: request completed (57475 bytes, 0 chunks, 18 ms) tcp{client}: 0xaaab0ddb9858
2019-01-22 20:57:32.144   openLuup.io:: Now closing the socket for 7 tcp{client}: 0xaaab0ce3aa18
2019-01-22 20:57:32.144   openLuup.io:: should now be closed for 7 10.17.2.30 4025 1
2019-01-22 20:57:32.144   openLuup.io:: socket connection closed tcp{client}: 0xaaab0ce3aa18

Offline akbooer

  • Moderator
  • Master Member
  • *****
  • Posts: 6387
  • Karma: +290/-70
  • "Less is more"
Re: lu_block_jobs
« Reply #4 on: January 23, 2019, 10:39:46 am »
The attempt to open the non-existent socket is part of the code I posted in my opening question.

Yes.  As I said...

My expectation is that repeated calls to luup.io.open() will invariably fail.

Regarding this...

And here's an example where the socket close occurs directly after a luup reload and what seems to be a successful data read:

...it would be the expected behaviour if the remote device closed the socket at its end after sending.  Given that this whole transaction happens in milliseconds, then I'm pretty sure that's what it's doing.

Let me ask again...

Have you actually tried this on Vera and compared to openLuup? 

...because I'm imagining that you will find the same behaviour there.

Perhaps what you are asking for is a DIFFERENT behaviour from Vera, wherein you can re-open a socket?
3x Vera Lite-UI5/Edge-UI7, 25x Fibaro, 23x TKB, 9x MiniMote, 2x NorthQ Power, 2x Netatmo, 1x Foscam FI9831P, 9x Philips Hue,
Razberry, MySensors Arduino, HomeWave, AltUI, AltHue, DataYours, Grafana, openLuup, ZWay, ZeroBrane Studio.

Offline Buxton

  • Full Member
  • ***
  • Posts: 209
  • Karma: +12/-0
Re: lu_block_jobs
« Reply #5 on: January 23, 2019, 03:12:55 pm »
Yes, sorry I didn't answer about a Vera install.  Ironically, this very program that is giving me grief is the plugin that made me buy a Veralite in the first place as I wanted integration with my alarm system, and at the time, the EVL plugin was the only thing available.  With further irony, it is this plugin that led me to openLuup as when the plugin was loaded on my Vera, the machine would do a luup restart several times a day. Not to mention some oddities with Vera itself about reporting sensor states.  So it is impossible to tell what kind of behavior was occurring because the entire machine was unstable.  Needless to say, it worked out for the best as openLuup had been a lot of fun and has led me to all kinds of interesting integrations! That being said, I believe there are others using this plugin on Vera (it may work better on the faster machines with more memory.)

To the problem at hand--- one thing I'm not clear on is whether the same socket ID (in this case 0xaaab0ddb9858) is assigned to a socket within a given instance of openLuup regardless of whether it has been previously closed.  I'm guessing that the ID is a handle to the socket and not a hard identifier of an ip address + port combination as the id changes with a luup reload.  So part of the problem may be that a new socket handle needs to be created in the event of a socket close (closed internally by openLuup) that is then followed by a an open attempt. 

Per the millsecond timeout, this may also be the result of not supplying a valid login on a retry of the open.  That is certainly a code bug that I can fix with a hack so I need to look closer at the retries.  The EVL will immediately kick the connection if you try to write to the device while not logged in. 

One thing that I did find with testing is that if I increase the value of the IO timeout constants from 5 to 15, the plugin loads more reliably on a luup reload.  So maybe a partial fix would be to either allow a timeout value to be passed to the IO.open as an extra parameter, or to expose the timeouts so that they can be set dynamically in the luup variables.  The first approach would allow more customization per plugin.....

Offline akbooer

  • Moderator
  • Master Member
  • *****
  • Posts: 6387
  • Karma: +290/-70
  • "Less is more"
Re: lu_block_jobs
« Reply #6 on: January 23, 2019, 05:59:22 pm »
So it is impossible to tell what kind of behavior was occurring because the entire machine was unstable.

Ah.

Quote
To the problem at hand--- one thing I'm not clear on is whether the same socket ID (in this case 0xaaab0ddb9858) is assigned to a socket within a given instance of openLuup regardless of whether it has been previously closed.  I'm guessing that the ID is a handle to the socket and not a hard identifier of an ip address + port combination as the id changes with a luup reload.  So part of the problem may be that a new socket handle needs to be created in the event of a socket close (closed internally by openLuup) that is then followed by a an open attempt. 

Yes, this is exactly right.  The whole implementation was based on my understanding of Vera's behaviour, that a single socket is assigned to the device on startup (if an IP is given) or on io.open() and is not subsequently changed.  This is what I meant when I said that there is no luup.io.close().  Having come to a point where the socket has timed out or been closed from the remote end, there IS no routine in the defined API to close  the socket and allow another call to luup.io.open() and without doing that explicitly, there no way to handle any initial login/handshake that a new TCP connection may require.

Having said that, since the Vera implementation is so poor, and therefore the openLuup copy equally so, I'm not beyond changing the way it works for the better.  I just never got my head around HOW that would exactly work.  This may indeed be the very test case that I was looking for.

Quote
So maybe a partial fix would be to either allow a timeout value to be passed to the IO.open as an extra parameter, or to expose the timeouts so that they can be set dynamically in the luup variables.  The first approach would allow more customization per plugin.....

Yes, possibly.  I'm mostly against adding extra parameters to the Luup API (in case they add ones in the future) but, on the other hand, I have done that in extremis a couple of times.

I'm willing to take a look at this if you're able to test out some of my attempts.
3x Vera Lite-UI5/Edge-UI7, 25x Fibaro, 23x TKB, 9x MiniMote, 2x NorthQ Power, 2x Netatmo, 1x Foscam FI9831P, 9x Philips Hue,
Razberry, MySensors Arduino, HomeWave, AltUI, AltHue, DataYours, Grafana, openLuup, ZWay, ZeroBrane Studio.

Offline Buxton

  • Full Member
  • ***
  • Posts: 209
  • Karma: +12/-0
Re: lu_block_jobs
« Reply #7 on: January 23, 2019, 06:06:53 pm »
Yes, I have a secondary linux machine that I use for testing so there's no problem with running untested openLuup versions.

Offline reneboer

  • Beta Testers
  • Hero Member
  • *****
  • Posts: 1574
  • Karma: +110/-31
Re: lu_block_jobs
« Reply #8 on: January 24, 2019, 05:19:33 am »
Hi,

I am currently thinking of testing the waters with luup.io and the incoming routine for a websocket connection rather then how I done it now with a socket and a one second call_delay checking for waiting data using sock.select. The remote side will close the connection after one minute idle time, so I can test what works. Last time I tried it seems that the raw protocol is not 100% raw causing some problems on Vera at least. Also using incoming for all incoming data proved to be sooo slow it was useless. So I want to try to use luup.io.read after receiving the first byte (or two). Testing a new luup.io.open is needed I would make depended on the result of luup.io.is_connected.

Once I have results for both Vera and openLuup I'll let you know.

Cheers Rene
2xVeraLite, VeraEdge, openLuup, ALTUI, 20 switches, 10 dimmers, 20 sensors, 10 scene controllers, 1 Harmony Hub, many plug-ins. Not enough time.

Offline akbooer

  • Moderator
  • Master Member
  • *****
  • Posts: 6387
  • Karma: +290/-70
  • "Less is more"
Re: lu_block_jobs
« Reply #9 on: January 24, 2019, 06:47:54 am »
Excellent!  Many thanks for the offer.
3x Vera Lite-UI5/Edge-UI7, 25x Fibaro, 23x TKB, 9x MiniMote, 2x NorthQ Power, 2x Netatmo, 1x Foscam FI9831P, 9x Philips Hue,
Razberry, MySensors Arduino, HomeWave, AltUI, AltHue, DataYours, Grafana, openLuup, ZWay, ZeroBrane Studio.

Offline rigpapa

  • Beta Testers
  • Hero Member
  • *****
  • Posts: 1121
  • Karma: +187/-3
Re: lu_block_jobs
« Reply #10 on: January 24, 2019, 01:55:08 pm »
I'm following this conversation as well and working in the background on making WebSocket work with the Emby plugin, and a non-WebSocket continuous feed for Yeelight. I had both working with direct socket calls, and polling the socket, same as reneboer. Now I'm converting Emby to <incoming> and luup.io, to see if I can get something more like async behavior.

One thing I've noticed right away that does not work: if you have a child device, and handleChildren=1, and the children have their own sockets for talking to different endpoints (so you call luup.io.open with the child device number), you can luup.io.write() and luup.io..read() no problem, but I have not been successful getting incoming data routed through the parent implementation's <incoming> block to the child--I can see the data coming in (log level 52), but the incoming block is never called. If I make all the luup.io calls using the root device number, the incoming block starts working, but that's not a solution because it seems only one socket can be bound to a device in the luup.io model, so the parent can't maintain sockets for more than one child.  I would call this a (Vera) bug (perhaps incomplete implementation), because everything else on children seems to work, including the use of the <protocol> and <ioPort> tags in the device files for the child types.

I've got to run off to do some errands this afternoon, but when I return I'm going to see if <incoming> blocks on jobs for child devices work. That could be a workaround.
Author of Reactor, DelayLight, SiteSensor, Rachio, Deus Ex Machina II, Intesis WMP Gateway, Auto Virtual Thermostat and VirtualSensor plugins. Vera Plus w/100+ Z-wave devices. Vera3, Lite. Hassio, Slapdash.

Offline akbooer

  • Moderator
  • Master Member
  • *****
  • Posts: 6387
  • Karma: +290/-70
  • "Less is more"
Re: lu_block_jobs
« Reply #11 on: January 24, 2019, 03:37:58 pm »
Not surprised it doesn't work for the child devices... no one ever told me that it should!  Are you sure that's the way it works on Vera?  Where. I GHG that be documented?

I lost the will to live trying to reverse engineer the Vera I/O, so I really stopped at the basics, but even then got frustrated by the lack of accurate documentation.  I ended up doing a fair amount of experimentation.

With your help, though, I?m happy to try and improve.!
3x Vera Lite-UI5/Edge-UI7, 25x Fibaro, 23x TKB, 9x MiniMote, 2x NorthQ Power, 2x Netatmo, 1x Foscam FI9831P, 9x Philips Hue,
Razberry, MySensors Arduino, HomeWave, AltUI, AltHue, DataYours, Grafana, openLuup, ZWay, ZeroBrane Studio.

Offline rigpapa

  • Beta Testers
  • Hero Member
  • *****
  • Posts: 1121
  • Karma: +187/-3
Re: lu_block_jobs
« Reply #12 on: January 24, 2019, 04:37:02 pm »
Not surprised it doesn't work for the child devices... no one ever told me that it should!  Are you sure that's the way it works on Vera?  Where. I GHG that be documented?

No, that's not openLuup. Sorry if I wasn't clear about that. It's Vera that's broken. Vera can't route for child devices through the top-level <incoming>. That's a real pain in the backside. Everything else for the child works--they can have different protocols and ports, etc. All fine. It's just the <incoming> routing that fails. So, for a solo device like an alarm panel, where there's only one endpoint to talk to, that's no problem, the parent can do all of the talking and everything works. But if you need a dozen sockets because you have a dozen children connected to a dozen endpoints... out of luck. The only thing that might save it is if the device reply includes some identifying information that would let you figure out which endpoint the message came from and dispatch it to the correct device, but for Yeelights, for example, no such luck. If Vera had even included the IP address of the peer in the incoming callback... hmmm... I wonder if that's hiding in there somewhere... time to go dump _G in the callback and see what it's got...

Quote
I lost the will to live trying to reverse engineer the Vera I/O, so I really stopped at the basics, but even then got frustrated by the lack of accurate documentation.  I ended up doing a fair amount of experimentation.

 ;D ;D ;D ;D ;D

It's not good. Especially with the above.

My bigger-scope prediction is that the bulk of IoT devices are going to be WiFi, and Z-Wave and its ilk will slowly be relegated to a very specific subset of devices. Time to market and cost will be drivers away from protocols like Z-Wave, not to mention the ubiquity of WiFi. So, for that to be at all sane, Vera needs to get its act together on sockets. LuaSocket under Luup doesn't cut it, and luup.io isn't showing well either. This all needs fresh eyes on it.
Author of Reactor, DelayLight, SiteSensor, Rachio, Deus Ex Machina II, Intesis WMP Gateway, Auto Virtual Thermostat and VirtualSensor plugins. Vera Plus w/100+ Z-wave devices. Vera3, Lite. Hassio, Slapdash.

Offline rigpapa

  • Beta Testers
  • Hero Member
  • *****
  • Posts: 1121
  • Karma: +187/-3
Re: lu_block_jobs
« Reply #13 on: January 24, 2019, 05:57:06 pm »
So, good news for me, but probably not for you, @akbooer...

If, instead of trying to use the "global" incoming, I use <incoming> in a job action that runs on the child, the received data is dispatched correctly (on Vera) to the job (and therefore the child).

Code: [Select]
        <action>
            <serviceId>urn:toggledbits-com:serviceId:EmbyServer1</serviceId>
            <name>Update</name>
            <job>
                return 5,30 -- waiting for callback, 30 second timeout
            </job>
            <incoming>
                EmbyPlugin.handleIncoming( lul_data, lul_device )
                return 5,30 -- waiting for callback, 30-second timeout
            </incoming>
        </action>
Author of Reactor, DelayLight, SiteSensor, Rachio, Deus Ex Machina II, Intesis WMP Gateway, Auto Virtual Thermostat and VirtualSensor plugins. Vera Plus w/100+ Z-wave devices. Vera3, Lite. Hassio, Slapdash.

Offline akbooer

  • Moderator
  • Master Member
  • *****
  • Posts: 6387
  • Karma: +290/-70
  • "Less is more"
Re: lu_block_jobs
« Reply #14 on: January 25, 2019, 05:26:59 am »
This is for a luup.io.open() issued on from the parent with the child device number?
3x Vera Lite-UI5/Edge-UI7, 25x Fibaro, 23x TKB, 9x MiniMote, 2x NorthQ Power, 2x Netatmo, 1x Foscam FI9831P, 9x Philips Hue,
Razberry, MySensors Arduino, HomeWave, AltUI, AltHue, DataYours, Grafana, openLuup, ZWay, ZeroBrane Studio.