
HDD warming: Global data threat?


Matsuda


If there's one thing I hate, it's unsettled science. For instance: the effect of temperature on disk drives. Shorten their life or not? Most studies say no - including a new one - but Microsoft researchers disagree. Can’t we all just get along?

The folks at Backblaze published a detailed blog post on observed effects of temperature on disk drives. Like most studies, it didn't find one: "After looking at data on over 34,000 drives, I found that overall there is no correlation between temperature and failure rate."

But then they ruined it - damn you, Backblaze! - by linking to a study by Microsoft and UVA researchers who DID find an issue. That blew my day as I had to, you know, look at the data and THINK. Hate that. But here goes.

The Backblaze data

Backblaze looked at 17 drive models from Seagate, WD, Hitachi and Toshiba. Author Brian Beach used a point-biserial correlation coefficient on drive average temperatures and whether drives failed.
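For readers who haven't met it, the point-biserial coefficient is just the Pearson correlation between a continuous variable (average temperature) and a yes/no variable (failed or not). Here is a minimal sketch of the calculation. The function name and the sample numbers are made up for illustration; this is not Backblaze's code or data.

```python
from math import sqrt
from statistics import fmean, pstdev


def point_biserial(temps, failed):
    """Point-biserial correlation between a continuous variable (temps)
    and a binary one (failed: True/False per drive)."""
    if len(temps) != len(failed):
        raise ValueError("inputs must have the same length")
    temps_failed = [t for t, f in zip(temps, failed) if f]
    temps_ok = [t for t, f in zip(temps, failed) if not f]
    n1, n0, n = len(temps_failed), len(temps_ok), len(temps)
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one failed and one surviving drive")
    spread = pstdev(temps)  # population standard deviation of all temps
    if spread == 0:
        raise ValueError("temperatures must vary")
    # Difference of group means, scaled by spread and group proportions.
    return (fmean(temps_failed) - fmean(temps_ok)) / spread * sqrt(n1 * n0 / n**2)


# Hypothetical drives: average temperature in °C and whether each one failed.
avg_temp_c = [22.0, 24.5, 25.0, 26.0, 27.5, 28.0, 29.0, 30.5]
failed = [False, False, True, False, False, True, False, True]

r_pb = point_biserial(avg_temp_c, failed)
print(f"r_pb = {r_pb:.3f}")
```

A value near 0 means temperature tells you little about which drives fail; values near +1 or -1 mean failures cluster at the hot or cold end.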

He found one drive - a Seagate 1.5TB Barracuda LP - that had a weak but statistically significant correlation between failure rate and higher temperature. The Annual Failure Rate (AFR) doubled from cool drives to warm (above average temperature) drives. But because so many continued to work fine at any temperature, the correlation was weak.
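To make "AFR doubled" concrete, here is a rough sketch assuming the common drive-years definition of AFR (failures divided by drive-years in service, as a percentage). The post doesn't spell out the formula, and the counts below are invented for illustration.

```python
def annual_failure_rate(failures, drive_days):
    """AFR in percent, assuming AFR = failures / drive-years * 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return 100.0 * failures / drive_years


# Hypothetical counts for one drive model, split at its average temperature.
cool_afr = annual_failure_rate(failures=10, drive_days=365_000)
warm_afr = annual_failure_rate(failures=20, drive_days=365_000)
print(f"cool: {cool_afr:.1f}%  warm: {warm_afr:.1f}%")
```

Even with the warm group failing twice as often, the vast majority of drives in both groups survive, which is why the correlation can stay weak.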

Two more models, a Seagate Barracuda 3TB and a Hitachi Deskstar, showed weaker correlations - but in opposite directions. The Hitachi failed slightly more often at 21°C than at 31°C, while the Seagate failed slightly more often at the higher temperature.

Oh great! Now too cold is bad too.

Microsoft/UVA study

The 2010 Microsoft study, Datacenter Scale Evaluation of the Impact of Temperature on Hard Disk Drive Failures by Sriram Sankar, Mark Shaw and Kushagra Vaid of Microsoft and Sudhanva Gurumurthi, U of Virginia, came to very different conclusions:

1) We show strong correlation between temperature observed at different location granularities and failures observed. . . .

2) Although average temperature shows a correlation to disk failures, we show that variations in temperature or workload changes do not show significant correlation to failures observed in drive locations.

3) We . . . show that Chassis design knobs (disk placement, fan speeds) have a larger impact than tuning Workload knobs (intensity, different workload patterns), on disk temperature.

4) With the help of Arrhenius based temperature models and the datacenter cost model, we . . . show that datacenter temperature control has a significant cost advantage over increased fan speeds.
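The Arrhenius model mentioned in point 4 can be sketched as an acceleration factor: how much faster failures are expected at one temperature relative to a reference temperature. This is the textbook form of the equation, not the paper's code. The activation energy is an assumed placeholder, so treat the output as illustrative only.

```python
from math import exp

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(temp_c, ref_temp_c, activation_energy_ev):
    """Arrhenius acceleration factor of temp_c relative to ref_temp_c.

    AF = exp((Ea / k) * (1/T_ref - 1/T)), with temperatures in kelvin.
    """
    temp_k = temp_c + 273.15
    ref_k = ref_temp_c + 273.15
    return exp(activation_energy_ev / BOLTZMANN_EV_PER_K * (1 / ref_k - 1 / temp_k))


# Assumed activation energy (eV); a placeholder, not a value taken from the study.
ASSUMED_EA_EV = 0.5

for t in (30, 40, 50, 60):
    af = acceleration_factor(t, ref_temp_c=40, activation_energy_ev=ASSUMED_EA_EV)
    print(f"{t}°C: AF = {af:.2f}")
```

The factor is exactly 1 at the reference temperature, above 1 when hotter and below 1 when cooler. How steeply it climbs depends entirely on the activation energy you plug in.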

Here are a couple of relevant tables from the study:

[Table: temperature vs. AFR (MS/UVA study)]
[Table: temperature acceleration factor (MS/UVA study)]

Drive vendors have their say

Most drives today are spec'd at a 60°C (140°F) or even 70°C (158°F) operating temperature. Per the MS-UVA study, it is the average temperature, not variations in temperature, that affects drive life the most. If drives get really hot once in a while, not a big deal.

And hey, they say they'll operate, not that they'll last.

Reconciliation, to a point
Look at the data: Backblaze temps stop at 31°C while the MS/UVA study showed that AFRs are relatively flat up to 33°C and then start climbing. Not much disagreement between Backblaze and MS/UVA.

The Storage Bits take
One of the most popular myths about disk drives is that they are very sensitive to temperature. That may have been true 20 years ago, but it is clearly less so now. The drive vendors seem unconcerned as well.

Given that most users have a few dozen mixed age/vendor/chassis at most, these statistical musings have little predictive value. If you are running a data center and have thousands of drives, you should do a more careful analysis of the tradeoff between energy costs and increased disk failures.

The hidden storage market - between the 3 drive vendors and 8 or so Internet giants - is driving storage requirements now, not PCs or the enterprise. These warehouse-scale systems are designed to tolerate drive failures gracefully, much more so than most enterprise infrastructures.

Eyeballing the stats from these and other studies, most enterprises should aim for about 35°C (95°F) disk temps in temperature-controlled data centers. Save money and reduce global warming.

Comments welcome, as always. Scientists, always picking at each other: feature, not a bug. People who say "settled science" don't understand science.





MidnightDistortions

Basically I think it's the extremes in temps and the constant fluctuations: the bigger the fluctuation, the greater the chance your drive could fail. If you've got a drive reading 10°C and you turn it on and get it to 60°C, you could be looking at a failure. At least that's what I believe. It's hard to tell though, because these tests generally push the drives instead of running them in normal operation. The idea is to either put the drives on a fan or don't. If you're doing both then you're just asking for a paperweight. That's probably a good reason not to buy refurb or used drives: one previous owner might have kept it in hot conditions, while another might have had fans on it keeping it at normal temps.

I also think that power fluctuations from a bad PSU or flickering lights (having a surge protector or a UPS would counteract that) could slowly kill an HDD, depending on the environment and the integrity of the HDD components.



Two things are the biggest destroyers of hard drives:

Heat

Turning on and off.

My own "Store" runs 24 x 7 x 52 and gets rebooted once a day without actually being shut off. That machine has 1 x 128GB 830 Samsung Pro, 5 x 2TB Samsung Spinrites and 1 x 3TB Seagate. I will be adding one more of those as I am running out of storage.

I hoover off the various covers once a fortnight and blow out all the dust with a clean air line weekly.



I am not going to comment on my setup at home out of fear that as soon as I do, Murphy's Law will kick in and all my drives will die.
