Ticket #1305 (closed enhancement: fixed)

Opened 11 years ago

Last modified 8 years ago

auto-upgrade should support re-installation

Reported by: jdreed
Owned by:
Priority: normal
Milestone: Current Semester
Component: --
Keywords:
Cc:
Fixed in version: debathena-auto-update 1.43
Upstream bug:

Description

auto-upgrade takes great pains to ensure it can't auto-upgrade to the current version. However, I think we want bulk reinstall functionality, e.g. this summer, to pick up a new disk layout.

My current naive plan is to add support for a new flag in clusterinfo, 'r' (for 'reinstall'). I'm going to test this in -bleeding.

Change History

comment:1 Changed 11 years ago by jdreed

  • Milestone changed from The Distant Future to Current Semester

comment:2 Changed 11 years ago by jdreed

  • Milestone changed from Current Semester to Summer 2013

comment:3 Changed 11 years ago by jdreed

Yeah, uh, we don't actually want a flag on version data, because then machines will reinstall endlessly. Instead, we support a new clusterinfo flag, "reinstall_at", which is a value in seconds since the epoch. If /var/log/athena-install.log's last mod time is less than that value, the machine will be reinstalled at the current release.

comment:4 Changed 11 years ago by jdreed

  • Status changed from new to committed
  • Fixed in version set to debathena-auto-update 1.43

comment:5 follow-up: ↓ 6 Changed 11 years ago by jweiss

Are there any cases where the install can fail before it starts, but still update the (timestamp on) the log file? Are we worried about the race condition where machines start a small update before the magic timestamp, but finish afterwards? Have we thought about what other screw cases might exist?

comment:6 in reply to: ↑ 5 Changed 11 years ago by jdreed

Replying to jweiss:

Are there any cases where the install can fail before it starts, but still update the (timestamp on) the log file?

I'm not sure what you mean by "fail before it starts", but that log file is created the instant the Debathena installer is launched in the postinstall. If the postinstall fails at any point, the machine comes up with the "Call hotline" greeter. Reinstallation is no more likely to fail than a release upgrade, as they are fundamentally the same operation.

Are we worried about the race condition where machines start a small update before the magic timestamp, but finish afterwards?

I'm not sure what you mean by "small update". Nothing updates the timestamp other than the PXE installation. auto-upgrade runs independently of auto-update. There could be a "race condition" in that a machine is installed (or re-installed by hotline) between when we set the timestamp in Moira, and when it propagates to Hesiod, but we can fix that by setting the timestamp to take into account the next scheduled DCM and the TTL. We could also update the value later.
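One way to read "take into account the next scheduled DCM and the TTL" is to push the timestamp past the point where every client is guaranteed to see the new value. The sketch below is only an illustration of that arithmetic: the function name, its parameters, and the optional safety margin are hypothetical and are not part of auto-upgrade or Moira.

```python
def choose_reinstall_at(next_dcm_epoch, hesiod_ttl_seconds, margin_seconds=0):
    """Pick a reinstall_at value (seconds since the epoch).

    Assumption: the new value becomes visible to all clients no later than
    the next DCM run plus the Hesiod TTL. A machine installed before that
    moment has an install log older than the returned value, so it is still
    picked up for reinstallation.
    """
    return int(next_dcm_epoch) + int(hesiod_ttl_seconds) + int(margin_seconds)
```

For example, with a DCM run at epoch 1370000000 and a one-hour TTL, the value to store would be 1370003600.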

Have we thought about what other screw cases might exist?

Yes. No new ones were introduced with this. If there are no new releases, then we check if we should be reinstalled. We ensure that the "reinstall_at" value is numeric. stat can only return a numeric value or the empty string; we test for the latter and force it to the number 0. The numeric comparison cannot fail at this point, barring severe internal RAM corruption. If the machine was installed before the timestamp, it will reinstall the current release and bypass the sanity check that prevents "upgrades" to the current release. If not, nothing happens. I have cleaned up the logic a bit.
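The checks described above can be sketched roughly as follows. This is not the actual auto-upgrade code (which is not shown in this ticket); the function names are hypothetical, and only the log path and the "reinstall_at" semantics come from the comments above.

```python
import os
import re

# Path named in comment:3; its last-modified time marks the last install.
INSTALL_LOG = "/var/log/athena-install.log"


def install_mtime(path=INSTALL_LOG):
    """Return the log's mtime in epoch seconds, forced to 0 if unavailable."""
    try:
        return int(os.stat(path).st_mtime)
    except OSError:
        # Mirrors "test for the empty string and force it to the number 0".
        return 0


def should_reinstall(reinstall_at, installed_at):
    """Decide whether the machine should be reinstalled.

    reinstall_at: raw clusterinfo value (expected: epoch seconds as digits).
    installed_at: epoch seconds of the last install, e.g. install_mtime().
    """
    value = str(reinstall_at).strip()
    # Reject anything that is not a plain non-negative integer.
    if not re.fullmatch(r"[0-9]+", value):
        return False
    # Installed before the cutoff: reinstall. Otherwise nothing happens.
    return installed_at < int(value)
```

With this shape, a missing log file yields 0 and therefore compares as "installed before" any positive cutoff, while a malformed "reinstall_at" value never triggers a reinstall.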

comment:7 Changed 11 years ago by jweiss

Actually, I think my first two questions don't make sense. I missed that it was looking at athena-install.log, and not update.log or upgrade.log. So, eh "nevermind".

comment:8 Changed 8 years ago by andersk

  • Status changed from committed to closed
  • Resolution set to fixed