OS X 10.9.5 stops responding with OpenZFS on OS X 1.4.5

See http://web.archive.org/web/20151107061126/https://openzfsonosx.org/forum/viewtopic.php?f=26&t=2373 for a snapshot of the topic that preceded this post. (Some of that content disappeared, so I continued here.)

2009 MacBookPro5,2 with 8 GB memory and Mavericks:

Maybe related: ARC Duelling with OS X File Cache · Issue #292 · openzfsonosx/zfs · GitHub

Also remarkable, although I never experimented with a ZVOL: After heavy reads on zvols and datasets, the system hangs with 1.4.5 · Issue #419 · openzfsonosx/zfs

Re https://openzfsonosx.org/wiki/Performance I experimented with a tune away from the defaults. If I recall correctly:

  • 2 GB ARC
  • 1 GB ARC meta.
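
A sketch of how a tune like that might be expressed at the command line. The two sysctl names are assumptions based on the tunables that O3X exposes under kstat.zfs.darwin.tunable, and "GB" is read here as GiB; check both against the output of `sysctl kstat.zfs.darwin.tunable` for the installed version before applying anything. The commands are printed, not run.

```shell
# Sizes in bytes (assumption: "GB" in the list above means GiB).
arc_max=$((2 * 1024 * 1024 * 1024))         # 2 GiB for the ARC
arc_meta_limit=$((1 * 1024 * 1024 * 1024))  # 1 GiB for ARC metadata

# Print the commands for review instead of executing them.
# Assumed tunable names; verify with: sysctl kstat.zfs.darwin.tunable
echo "sudo sysctl -w kstat.zfs.darwin.tunable.zfs_arc_max=${arc_max}"
echo "sudo sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_limit=${arc_meta_limit}"
```

Values set this way would not be expected to survive a restart, so a persistent tune needs whatever mechanism the wiki page describes for the installed version.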

Then, for a pool named zhandy on a Transcend StoreJet 25M (25 mobile) (probably TS640GSJ25M) with a USB 2.0 connection, I used PC-BSD 11.0-CURRENTNOV2015 on a different notebook to:

  • begin a scrub
  • unmount the one file system of the pool
  • export the pool.
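
The three steps above can be sketched as follows. This is a reconstruction rather than a transcript: it assumes that the one file system is the root dataset of the pool (so it has the same name as the pool) and that the commands are run with root privileges. The hypothetical `run` helper only prints each command unless RUN=1 is set.

```shell
pool=zhandy

# Hypothetical helper: dry run by default, set RUN=1 to execute for real.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

run zpool scrub "$pool"    # begin a scrub
run zfs unmount "$pool"    # unmount the file system (assumed: the root dataset)
run zpool export "$pool"   # export the pool while the scrub is still in progress
```

Exporting with a scrub under way leaves the scrub unfinished, which is consistent with disk activity resuming after the later import on the Mac.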

Saturday 2015-11-07 sometime around 06:30 I reconnected that mobile hard disk drive to the MacBookPro5,2 and commanded:

sudo zpool import zhandy

Soon after entering the password, I could no longer see an on-screen pointer so, suspecting a problem, I keyed Control-T (SIGINFO) then, at 06:39:11, the chord for sysdiagnose(1).

The on-screen clock stopped at 06:40:54 but activity (the scrub, presumably) remained visible at the mobile hard disk drive. So at the nearby notebook with PC-BSD, I began writing these notes and attempted a connection to the Mac:

ssh bbsadmin-l@192.168.1.3

The GUI of the Mac was almost completely at a standstill – very slow blinking of the rectangular cursor at the command line – and at 07:30 in real time, the clock in the menu bar of the Mac appeared stuck at 06:43:40.

No response to the ssh command so I could not tell, remotely, whether spindump had begun.

At the mobile HDD, activity remained visible.

At the command line, still, the result of SIGINFO:

load: 130.55  cmd: zpool 8006 running 0.00u 45.04s

I keyed Control-T once more but there was no response. I continued drafting this post, then with the real time around 07:32 I saw that the clock on the Mac had progressed to 07:31:45.

Around 08:00, with the clock still stuck at 07:31:45, I keyed Command-Control-Power to force a restart of the OS.

Soon afterwards, mobile HDD activity ceased to be visible but the forced restart was unsuccessful, so after 08:05 I forced off the Mac and then used single user mode, hoping to find a directory –

/private/var/tmp/sysdiagnose_…

– if found, I would have moved it to a non-volatile area before proceeding with multi-user mode.

Unfortunately, the unresponsive state of the OS had not allowed the stackshot(1) daemon to run the sysdiagnose tool – so there was no spindump, and so on. A single user mode view of /private/var/log showed that the most recent write was to system.log at 06:38 – before the key chord for sysdiagnose was used.

Alongside system.log one other file – org.openzfsonosx.zed.err – was also modified at 06:38.
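
For anyone repeating this kind of check, a minimal sketch of one way to find the most recently written files in a log directory. The `newest` function is a hypothetical helper, not part of any tool mentioned here; it relies only on `ls -t`, which sorts by modification time with the newest entry first.

```shell
# Hypothetical helper: print the names of the N most recently modified
# entries in a directory (default 5), newest first.
newest() {
  ls -t "$1" | head -n "${2:-5}"
}

# Only look at the OS X log directory if it exists on this machine.
if [ -d /private/var/log ]; then
  newest /private/var/log
fi
```

Adding `-l` to the `ls` call would show the modification times alongside the names.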

Later, I made a copy of that .err file but there are no time stamps; I can’t guess which lines were the last before the OS stopped responding. I might paste the content of that file to a pastebin (and link from this post), but I doubt that it’ll help towards a bug report …
