Ticket #133 (new enhancement)
debathena-zephyr-config should kill off zhm initscript
Reported by: | broder | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | Current Semester |
Component: | -- | Keywords: | |
Cc: | | Fixed in version: | |
Upstream bug: | | | |
Description
Since zhm doesn't handle a missing network connection very well, Karl suggests that, instead of hacking an if-up.d script into place, we disable his zhm initscript entirely on systems that predate squeeze and instead provide if-up.d and if-down.d scripts that start and stop zhm.
We should be sure to get the patch to Karl, as well, so he can include it in squeeze.
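A minimal sketch of what such a hook might look like, assuming the zhm initscript has been disabled, that zhm writes /var/run/zhm.pid itself, and that ifupdown exports IFACE and MODE ("start" under ifup, "stop" under ifdown) to hook scripts. The file location and the function names `zhm_hook_action` and `zhm_hook_main` are hypothetical, not taken from debathena-zephyr-config.

```shell
#!/bin/sh
# Hypothetical hook: install as /etc/network/if-up.d/zhm and as
# /etc/network/if-down.d/zhm (a copy or symlink). Assumes the zhm
# initscript is disabled and that zhm writes /var/run/zhm.pid itself.

ZHM=/usr/sbin/zhm
ZHM_PIDFILE=/var/run/zhm.pid

# Map ifupdown's hook variables to an action.
# $1 = interface ($IFACE), $2 = "start" under ifup or "stop" under ifdown ($MODE).
zhm_hook_action() {
    case "$1" in
        ""|lo) echo skip; return ;;  # unset interface or loopback: ignore
    esac
    case "$2" in
        start|stop) echo "$2" ;;
        *) echo skip ;;
    esac
}

zhm_hook_main() {
    zhm_action=$(zhm_hook_action "${IFACE:-}" "${MODE:-}")
    case "$zhm_action" in
        start)
            # With --pidfile, start-stop-daemon matches against the PID in
            # that file; --oknodo makes "already running" exit 0. This does
            # NOT prevent a second zhm if the first has not yet written its
            # pidfile. -f is the flag the stock initscript passes (see the
            # ps output in this ticket).
            start-stop-daemon --start --quiet --oknodo \
                --pidfile "$ZHM_PIDFILE" --exec "$ZHM" -- -f
            ;;
        stop)
            start-stop-daemon --stop --quiet --oknodo \
                --pidfile "$ZHM_PIDFILE" --exec "$ZHM"
            ;;
    esac
}

zhm_hook_main
```

Because the if-down.d copy stops zhm whenever any non-loopback interface goes down, a machine with several interfaces would also need a check that no other interface is still up; the sketch leaves that out.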
Change History
comment:1 Changed 15 years ago by broder
- Priority changed from major to minor
- Component set to --
- Milestone set to The Distant Future
comment:2 Changed 14 years ago by jdreed
- Milestone changed from The Distant Future to Fall 2010
We should maybe look at this sooner rather than later? I ended up with mmanley's "two zhms" problem today on Lucid (-workstation). (See the 3/17 zlogs, currently debathena.172) It was fine on Friday, but it did take some updates over the weekend and rebooted. Of course, now that I've added some logging, I can't reproduce it. But there is at least one other workstation in N42 experiencing these transient symptoms.
I'll also note that the initscript's "restart" action restarts zhm with a "-N" option, which does not exist according to the man page. So the initscript is somewhat broken anyway:
    jdreed@INFINITE-LOOP:~$ sudo /etc/init.d/zhm stop
    Stopping zephyr host manager: zhm.
    jdreed@INFINITE-LOOP:~$ ps -ef | grep -i zhm
    jdreed 2555 2035 0 10:02 pts/0 00:00:00 grep -i zhm
    jdreed@INFINITE-LOOP:~$ sudo /etc/init.d/zhm start
    Starting zephyr host manager: zhm.
    jdreed@INFINITE-LOOP:~$ ps -ef | grep -i zhm
    root 2562 1 0 10:02 ? 00:00:00 /usr/sbin/zhm -f
    jdreed 2565 2035 0 10:02 pts/0 00:00:00 grep -i zhm
    jdreed@INFINITE-LOOP:~$ sudo /etc/init.d/zhm restart
    Restarting zephyr host manager: zhm.
    jdreed@INFINITE-LOOP:~$ ps -ef | grep -i zhm
    root 2573 1 0 10:02 ? 00:00:00 /usr/sbin/zhm -N -f
    jdreed 2575 2035 0 10:02 pts/0 00:00:00 grep -i zhm
    jdreed@INFINITE-LOOP:~$
comment:3 Changed 14 years ago by jdreed
- Priority changed from minor to major
I encountered this on a cluster machine too, which is a first. There were two zhm processes, PIDs 2018 and 2019, and 2018 was the one recorded in /var/run/zhm.pid. So there's clearly a race condition here. I think I blame start-stop-daemon, but I'm not sure.
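One way to spot this condition is to compare the running zhm processes against the PID recorded in /var/run/zhm.pid. This is a sketch assuming pgrep (from procps) is installed and that the process name is exactly `zhm`; the function names `zhm_stray_pids` and `zhm_check` are hypothetical.

```shell
#!/bin/sh
# Hypothetical diagnostic: print zhm PIDs that are NOT the one recorded in
# the pidfile. Any output means more than one zhm is (or was) running.

# $1 = PID from the pidfile (may be empty); remaining args = running PIDs.
zhm_stray_pids() {
    recorded=$1
    shift
    for pid in "$@"; do
        [ "$pid" = "$recorded" ] || echo "$pid"
    done
}

# $1 = pidfile path (defaults to /var/run/zhm.pid).
zhm_check() {
    pidfile=${1:-/var/run/zhm.pid}
    recorded=""
    [ -r "$pidfile" ] && read -r recorded < "$pidfile"
    # pgrep -x matches the process name exactly; the unquoted substitution
    # is intentional so each PID becomes a separate argument.
    zhm_stray_pids "$recorded" $(pgrep -x zhm)
}
```

Sourcing the file and running `zhm_check` on the machine described above would have printed 2019, the PID missing from the pidfile.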
We should pursue this upstream, but as a short-term fix, how about a "sleep 1" in /etc/network/if-up.d/debathena-zephyr-config before it restarts zhm?
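A "sleep 1" narrows the window but does not close it. If the race really comes from two hook invocations overlapping (an assumption; the ticket does not establish the cause), a swapped-in alternative is to serialize the restart with an exclusive lock using flock(1) from util-linux. The helper name `run_locked` and the lock path /var/run/zhm-restart.lock are hypothetical, and the restart command is passed in rather than guessed from the real hook.

```shell
#!/bin/sh
# Hypothetical helper: run a command while holding an exclusive lock, so
# concurrent if-up.d invocations restart zhm one at a time instead of racing.

# $1 = lock file path; remaining args = command to run under the lock.
run_locked() {
    lockfile=$1
    shift
    (
        # Wait up to 30 seconds for the lock on fd 9; give up if not acquired.
        flock -w 30 9 || exit 1
        "$@"
    ) 9>"$lockfile"
}

# Possible use from the if-up.d hook, keeping the proposed sleep:
# run_locked /var/run/zhm-restart.lock sh -c 'sleep 1; invoke-rc.d zhm restart'
```

The helper returns the command's exit status, so the hook can still report a failed restart.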