Original Link: https://www.anandtech.com/show/6884/crucial-micron-m500-review-960gb-480gb-240gb-120gb
The Crucial/Micron M500 Review (960GB, 480GB, 240GB, 120GB)
by Anand Lal Shimpi on April 9, 2013 9:59 AM EST
This is probably the most excited I've been about any SSD launch in quite a while. At CES this year, Crucial announced its M500 SSD - the world's first to use Micron's new 128Gbit MLC NAND die. Courtesy of the cost savings and density increase associated with this new 128Gbit NAND, the M500 would be available in a 960GB capacity, priced at $599. That works out to be around $0.62 per GB for a truly gigantic drive by today's standards. It's exciting. For the past five years I've been learning to live off of less storage than I thought I needed, but the M500 had the potential to spoil me once again.
The M500 starts out with a familiar refrain: a Marvell controller with custom firmware from Crucial/Micron and of course, Micron NAND. All of these parts get updated though, some in more interesting ways than others. The controller is now Marvell’s 88SS9187, an updated version of the 9174 used in the m4. The 9187 is a speed/feature bump over the 9174 and is also used in Plextor’s M5 Pro. I should note that this time around both the Crucial (end user) and Micron (OEM) drives will feature the same M500 branding.
One of the benefits of Marvell’s 9187 is the support for DDR3 memory, which we see exercised on the M500. In its largest configuration, the M500 features 1GB of DDR3-1600. Crucial claims only 2 - 4MB of user data ever ends up in this DRAM; the overwhelming majority of the DRAM is used to cache the page/indirection table that maps logical block addresses to pages in NAND. Like most SSD makers, Crucial won’t talk about the structure of its mapping table, but given the size of the DRAM I think it’s safe to assume that we’re looking at a relatively flat structure that should be easy to manage (more on this later).
Crucial / Micron M500 Specifications
Capacity | 120GB | 240GB | 480GB | 960GB
Controller | Marvell 88SS9187
NAND | Micron 20nm 2bpc MLC NAND (128Gbit die)
Form Factor | 2.5" 7mm/9.5mm, mSATA, M.2 | 2.5" 7mm/9.5mm, mSATA, M.2 | 2.5" 7mm/9.5mm, mSATA, M.2 | 2.5" 7mm/9.5mm
Sequential Read | 500MB/s | 500MB/s | 500MB/s | 500MB/s
Sequential Write | 130MB/s | 250MB/s | 400MB/s | 400MB/s
4KB Random Read | 62K IOPS | 72K IOPS | 80K IOPS | 80K IOPS
4KB Random Write | 35K IOPS | 60K IOPS | 80K IOPS | 80K IOPS
Drive Lifetime | 72TB Writes (90% full, 25/75% sequential/random IO - 50% 4KB, 40% 64KB, 10% 128KB)
Warranty | 3 years
While the M500’s controller is nothing new, its NAND is. The M500 is the first drive to ship with the latest version of IMFT’s 20nm MLC NAND, featuring 128Gbit die. All previous NAND devices from IMFT (as well as its competitors) top out at 64Gbit (8GB) per 2-bit MLC NAND die. The move to larger die decreases the number of die/devices needed to hit each capacity point, and it also makes 1TB SSDs cost effective for the first time ever.
The cost savings come from the fact that these 128Gbit die aren’t simple doublings of last year’s 64Gbit devices; they include a few changes. The most prominent is a shift in page size from 8KB to 16KB. Larger page sizes are more desirable to implement at smaller NAND geometries, which is why you normally see these page size transitions with major shifts in process technology (e.g. 4KB to 8KB page size transition back at 25nm). The good news is that larger page sizes increase sequential throughput, but at the expense of latency. Given that NAND program times increase with smaller NAND geometries, once again the deck is stacked against manufacturers looking to increase performance as they exploit the benefits of Moore’s Law.
The other big change with the 128Gbit implementation of IMFT’s 20nm process is the inclusion of ONFI 3.0 support. There are some power savings courtesy of ONFI 3.0 (lower voltages, on-die termination), but the big news here is an increase in max interface speed. The previous ONFI interface standard (2.x) topped out at around 200MB/s, while ONFI 3.0 kicks that up to 400MB/s. Crucial’s implementation seems to be limited to around 330MB/s, but the drive isn’t anywhere close to saturating that. Remember the interface speed governs the maximum rate at which you can transfer data to/from a NAND device. Most NAND devices are capable of dual-channel operation so in the higher capacity implementations we’re talking about a maximum NAND-to-controller transfer rate of over 600MB/s. There’s more than enough headroom here.
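As a rough sanity check on the headroom claim, the arithmetic can be sketched in Python. The interface speeds and the dual-channel figure come from the paragraph above; the roughly 600MB/s SATA 6Gbps payload ceiling is an assumption added here for comparison, and the function name is purely illustrative.

```python
# Figures quoted in the text above (MB/s).
ONFI2_MAX_MBPS = 200
ONFI3_MAX_MBPS = 400
M500_CHANNEL_MBPS = 330      # approximate limit of Crucial's implementation
CHANNELS_PER_DEVICE = 2      # dual-channel NAND devices

# Assumption (not from the article): SATA 6Gbps carries about 600MB/s of
# payload once 8b/10b line encoding is accounted for.
SATA_6G_PAYLOAD_MBPS = 6000 * 8 / 10 / 8


def device_bandwidth_mbps(channel_mbps, channels=CHANNELS_PER_DEVICE):
    """Peak NAND-to-controller rate for one multi-channel NAND device."""
    return channel_mbps * channels


m500_device_mbps = device_bandwidth_mbps(M500_CHANNEL_MBPS)
print(f"Per-device ceiling: {m500_device_mbps} MB/s")
print(f"Exceeds SATA payload rate: {m500_device_mbps > SATA_6G_PAYLOAD_MBPS}")
```

Even at the reduced 330MB/s per channel, a single dual-channel device already has more interface bandwidth than the host link can consume under these assumptions.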
Supporting the new controller, new NAND die, larger page sizes and ONFI 3.0 obviously requires new firmware, so the M500 ships with an evolution of what Crucial developed for the m4. The end result is vastly improved performance across the board. The big question is how well it compares to the rest of the world, given how much has changed since the m4 first arrived on the market.
The 20nm 128Gbit NAND: Larger Pages, Larger Blocks, Lower Performance & Cost?
Intel/Micron NAND Evolution
Process | 50nm | 34nm | 25nm | 20nm | 20nm
Single Die Max Capacity | 16Gbit | 32Gbit | 64Gbit | 64Gbit | 128Gbit
Page Size | 4KB | 4KB | 8KB | 8KB | 16KB
Pages per Block | 128 | 128 | 256 | 256 | 512
Read Page (max) | - | - | 75 µs | 100 µs | 115 µs
Program Page (typical) | 900 µs | 1200 µs | 1300 µs | 1300 µs | 1600 µs
Erase Block (typical) | - | - | 3 ms | 3 ms | 3.8 ms
Die Size | - | 172mm² | 167mm² | 118mm² | 202mm²
Gbit per mm² | - | 0.186 | 0.383 | 0.542 | 0.634
Rated Program/Erase Cycles | 10000 | 5000 | 3000 | 3000 | 3000
There's a lot of data in the table above, but if you look closely you'll see a couple of trends. The obvious ones are increasing page and block size over time. NAND program latency has also climbed steadily over the years, while endurance decreased. All in all, the picture looks pretty bleak. It's impressive that performance keeps going up each generation given how much the deck is stacked against seeing continued performance improvements. The increase in program time gives you a preview of what we're going to see in the performance pages. Small writes will take longer. Garbage collection routines on a full drive will also take longer to run as each block that needs to be recycled for use has more pages and more data to deal with. Although Crucial uses a faster controller in the M500 vs. m4, the internal housekeeping it has to do goes up tremendously as well. The M500 isn't a drive that was built in pursuit of peak performance. Instead this drive targets the mainstream.
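To put numbers on why block recycling gets more expensive, here is a small Python sketch built from the table's page size, pages per block and typical program time. The serial-programming estimate is a deliberate simplification (it ignores multi-plane operation and interleaving across die), so treat it as a rough worst case rather than a measured figure.

```python
# (generation, page size in KB, pages per block, typical program time in µs)
GENERATIONS = [
    ("50nm", 4, 128, 900),
    ("34nm", 4, 128, 1200),
    ("25nm", 8, 256, 1300),
    ("20nm 64Gbit", 8, 256, 1300),
    ("20nm 128Gbit", 16, 512, 1600),
]


def block_size_kb(page_kb, pages_per_block):
    """Erase block size is simply page size times pages per block."""
    return page_kb * pages_per_block


def full_block_program_ms(pages_per_block, program_us):
    """Time to program every page of one block, one page after another."""
    return pages_per_block * program_us / 1000


for name, page_kb, pages, program_us in GENERATIONS:
    print(f"{name}: {block_size_kb(page_kb, pages)} KB per block, "
          f"~{full_block_program_ms(pages, program_us):.1f} ms to program a full block")
```

The block grows from 512KB at 50nm to 8MB on the 128Gbit die, which is the extra housekeeping burden the paragraph above describes.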
Looking at the difference in density between the two 20nm NAND devices, there's nearly a 17% increase in density from moving to the larger page/block sizes. It's a remarkable improvement especially when you consider the gains are decoupled from a new process node. Ultimately this is Micron's answer to TLC for the time being. Rather than sacrificing endurance to get to lower price points, the 20nm 128Gbit 2bpc MLC NAND device at mature yields should deliver competitive pricing at higher endurance. Indeed this is the message behind Crucial's M500. The company isn't targeting Samsung's SSD 840 Pro, but rather the TLC based 840.
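The density figures can be reproduced directly from die capacity and die size in the table. This Python sketch only restates that arithmetic; the dictionary and function names are illustrative.

```python
# generation -> (die capacity in Gbit, die area in mm^2), from the table above
DIES = {
    "34nm 32Gbit": (32, 172),
    "25nm 64Gbit": (64, 167),
    "20nm 64Gbit": (64, 118),
    "20nm 128Gbit": (128, 202),
}


def density_gbit_per_mm2(capacity_gbit, area_mm2):
    return capacity_gbit / area_mm2


for name, (capacity, area) in DIES.items():
    print(f"{name}: {density_gbit_per_mm2(capacity, area):.3f} Gbit/mm^2")

old = density_gbit_per_mm2(*DIES["20nm 64Gbit"])
new = density_gbit_per_mm2(*DIES["20nm 128Gbit"])
gain = new / old - 1
print(f"Density gain within the 20nm node: {gain:.1%}")
```

Doubling capacity while growing the die by well under 2x is where the cost advantage comes from.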
Price Comparison
Capacity | 120/128GB | 240/256GB | 480/512GB | 960GB
Crucial M500 | $129 ($129) | $219 ($202) | $399 ($442) | $599 ($570)
Intel SSD 335 | $181 | $220 | - | -
Samsung SSD 840 | $100 | $169 | $333 | -
Samsung SSD 840 Pro | $139 | $229 | $463 | -
The reality of it all is the M500's MSRPs are closer to the 840 Pro's street prices than the 840's. MSRPs tend to run a bit high on SSDs, so I wouldn't be too surprised to see the M500 eventually settle down closer to the 840 (remember the MSRPs for the 840/840 Pro at 250/256GB are $199 and $269, respectively). It's definitely a different approach to driving costs down vs. going to TLC, and it's one that can't necessarily be repeated each generation, but for now the answer works. I'm not sure how meaningful the added endurance is for most client users, although you could make an interesting case for the M500 in some enterprise workloads that the TLC 840 wouldn't be able to make it into.
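For a feel of what the endurance rating means in practice, here is a hedged Python sketch. It uses the 72TB rating, 3 year warranty and 3000 P/E cycles from the tables earlier in the review. Two things are assumptions: that the 72TB figure applies to every capacity (as the spec table suggests), and that on-board NAND is counted in binary gigabytes. The "headroom" ratio is a made-up name for raw NAND write budget divided by rated host writes.

```python
RATED_TB_WRITTEN = 72     # drive lifetime rating from the spec table
WARRANTY_YEARS = 3
PE_CYCLES = 3000          # rated program/erase cycles for 20nm MLC


def host_writes_per_day_gb(rated_tb=RATED_TB_WRITTEN, years=WARRANTY_YEARS):
    """Daily host writes that would exhaust the rating over the warranty."""
    return rated_tb * 1000 / (years * 365)


def raw_nand_write_budget_tb(nand_gib, pe_cycles=PE_CYCLES):
    """Total data the NAND could absorb if every cell used all its cycles."""
    return nand_gib * 2**30 * pe_cycles / 1e12


def write_amplification_headroom(nand_gib, rated_tb=RATED_TB_WRITTEN):
    """How much write amplification the rating leaves room for."""
    return raw_nand_write_budget_tb(nand_gib) / rated_tb


print(f"{host_writes_per_day_gb():.1f} GB/day over the warranty period")
for label, nand_gib in [("120GB", 128), ("960GB", 1024)]:
    print(f"{label}: {raw_nand_write_budget_tb(nand_gib):.0f} TB raw budget, "
          f"{write_amplification_headroom(nand_gib):.1f}x headroom")
```

Under these assumptions the rating is most conservative on the largest drive, which is consistent with the point that the added endurance matters more for heavier workloads than for typical client use.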
The big news is of course the 960GB capacity point. At $599 the 960GB M500 is by far the cheapest drive available at anywhere near that capacity. A quick search on Newegg reveals a $1000 Mushkin 960GB drive and a $3000 1TB OCZ Octane. At $599, the 960GB is a steal at $0.62/GB. Even the Phison based 960GB BP4 from MyDigitalSSD weighs in at $799, and OWC's Mercury Electra MAX (3Gbps SATA) is still over $1000. To put the drive's excellent price in perspective, the 960GB M500 has roughly the same MSRP as Intel's 80GB X25-M had back in 2008. That's an order of magnitude more storage capacity at the same price in 5 years' time. Moore's Law makes me happy.
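The cost per GB across the lineup is easy to tabulate. The sketch below uses the M500 MSRPs from the price table; treating the X25-M comparison as a pure capacity ratio at equal price is a simplification taken from the "roughly the same MSRP" remark above.

```python
# capacity in GB -> MSRP in dollars, from the price comparison table
M500_MSRP = {120: 129, 240: 219, 480: 399, 960: 599}


def dollars_per_gb(price, capacity_gb):
    return price / capacity_gb


for capacity, price in M500_MSRP.items():
    print(f"{capacity}GB: ${dollars_per_gb(price, capacity):.2f}/GB")

# Capacity bought for roughly the same money: 960GB M500 vs 80GB X25-M.
capacity_ratio = 960 / 80
print(f"Capacity ratio vs the 80GB X25-M: {capacity_ratio:.0f}x")
```

Cost per GB falls steadily with capacity, and the 960GB drive is the only one in the family that drops well below $0.80/GB at MSRP.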
Encryption Done Right?
Arguably one of the most interesting features of the M500 is its hardware encryption engine. Like many modern drives, the M500 features a 256-bit AES encryption engine - all data written to the drive is stored encrypted. By default you don't need to supply a password to access the data; the key is just stored in the controller and everything is encrypted/decrypted on the fly. As with most SSDs with hardware encryption, if you set an ATA password you'll force the generation of a new key and that'll ensure no one gets access to your data.
Unfortunately, most ATA passwords aren't very secure so the AES-256 engine ends up being a bit overkill when used in this way. Here's where the M500 sets itself apart from the pack. The M500's firmware is TCG Opal 2.0 and IEEE-1667 compliant. The TCG Opal support alone lets you leverage third party encryption tools to more securely lock down your system. The combination of these two compliances however makes the M500 compatible with Microsoft's eDrive standard.
In theory, Windows 8's BitLocker should leverage the M500's hardware encryption engine instead of using a software encryption layer on top of it. The result should be better performance and power consumption. Simply enabling BitLocker didn't seem to work for me (initial encryption time should take a few seconds not 1+ hours if it's truly leveraging the M500's hardware encryption), however according to Crucial it's a matter of making sure both my test platform and the drive support the eDrive spec. There's hardly any good info about this online so I'm still digging on how to make it work. Once I figure it out I'll update this post. Update: It works!
Assuming this does work however, the M500 is likely going to be one of the first drives that's a must have if you need to run with BitLocker enabled on Windows 8. The performance impact of software encryption isn't huge on non-SandForce drives, but minimizing it to effectively nothing would be awesome.
Crucial is also printing a physical security ID on all M500 drives. The PSID is on the M500's information label and is used in the event that you have a password protected drive that you've lost the auth code for. In the past you'd have a brick on your hands. With the M500 and its PSID, you can do a PSID revert using 3rd party software and at least get your drive back. The data will obviously be lost forever but the drive will be in an unlocked and usable state. I'm also waiting to hear back from Crucial on what utilities can successfully do a PSID revert on the M500.
NAND Configurations, Spare Area & DRAM
I've got the full lineup of M500s here for review. All of the drives are 2.5" 7mm form factor designs, but they all ship with a spacer you can stick on the drive for use in trays that require a 9.5mm drive (mSATA and M.2/NGFF versions will ship in Q2). The M500 chassis is otherwise a pretty straightforward 8 screw design (4 hold the chassis together, 4 hold the PCB in place). There's a single large thermal pad that covers both the Marvell 9187 controller and DDR3-1600 DRAM, allowing them to use the metal chassis for heat dissipation. The M500 is thermally managed. Should the controller temperature exceed 70C, the firmware will instruct the drive to reduce performance until it returns to normal operating temperature. The drive reduces speed without changing SATA PHY rate, so it should be transparent to the host.
The M500 is Crucial's first SSD to use 20nm NAND, which means this is the first time it has had to deal with error and defect rates at 20nm. For the most part, really clever work at the fabs and on the firmware side keeps the move to 20nm from being a big problem. Performance goes down but endurance stays constant. According to Crucial however, defects are more prevalent at 20nm - especially today when the process, particularly for these new 128Gbit die parts, is still quite new. To deal with potentially higher defect rates, Crucial introduced RAIN (Redundant Array of Independent NAND) support to the M500. We've seen RAIN used on Micron's enterprise SSDs before, but this is the first time we're seeing it used on a consumer drive.
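Crucial hasn't detailed how RAIN computes its redundancy, so the following is only a generic illustration of the idea of parity across independent NAND locations, in the style of RAID 5 XOR parity. It is not Crucial's implementation, and every name in it is hypothetical.

```python
from functools import reduce


def xor_pages(pages):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def make_stripe(data_pages):
    """Return the data pages followed by one parity page."""
    return list(data_pages) + [xor_pages(data_pages)]


def rebuild(stripe, lost_index):
    """Recover the page at lost_index from all surviving pages."""
    survivors = [page for i, page in enumerate(stripe) if i != lost_index]
    return xor_pages(survivors)


# Three tiny stand-in "pages"; real NAND pages would be 16KB each.
data = [bytes([value] * 8) for value in (1, 2, 3)]
stripe = make_stripe(data)
recovered = rebuild(stripe, lost_index=1)
print(recovered == data[1])
```

The cost of any scheme like this is capacity: the parity pages have to live somewhere, which is why redundancy and spare area are linked.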
You'll notice that Crucial uses SandForce-like capacity points with the M500. While the m4/C400 had an industry standard ~7% of its NAND set aside as spare area, the M500 roughly doubles that amount. The extra spare area is used exclusively for RAIN and to curb failure due to NAND defects, not to reduce write amplification. Despite the larger amount of spare area, if you want more consistent performance you're going to have to overprovision the M500 as if it were a standard 7% OP drive.
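The "roughly doubles" claim can be checked with a quick Python sketch. It assumes on-board NAND is counted in binary gigabytes (GiB) while advertised capacity is in decimal gigabytes, which is the usual convention but isn't spelled out in the article.

```python
GIB = 2**30
GB = 10**9


def spare_fraction(nand_gib, user_gb):
    """Share of raw NAND that isn't exposed as user capacity."""
    raw_bytes = nand_gib * GIB
    return (raw_bytes - user_gb * GB) / raw_bytes


DRIVES = [
    ("m4 256GB", 256, 256),
    ("M500 240GB", 256, 240),
    ("M500 960GB", 1024, 960),
]

for name, nand_gib, user_gb in DRIVES:
    print(f"{name}: {spare_fraction(nand_gib, user_gb):.1%} spare area")
```

With those conventions the m4 lands at about 7% and the M500 at a little under 13%, in line with the description above.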
The breakdown of capacities vs. NAND/DRAM on-board is below:
Crucial M500 NAND/DRAM Configuration
Capacity | # of NAND Packages | # of Die per Package | Total NAND on-board | DRAM
960GB | 16 | 4 | 1024GB | 1GB
480GB | 16 | 2 | 512GB | 512MB
240GB | 16 | 1 | 256GB | 256MB
120GB | 8 | 1 | 128GB | 256MB
As with any transition to higher density NAND, there's a reduction in the number of individual NAND die and packages in any given configuration. The 9187 controller has 8 NAND channels and can interleave requests on each channel. In general we've seen the best results when 16 or 32 devices are connected to an 8-channel controller. In other words, you can expect a substantial drop off in performance when going to the 120GB M500. Peak performance will come with the 480GB and 960GB drives.
You'll also note the lack of a 60GB offering. Given the density of this NAND, a 60GB drive would only populate four channels - cutting peak sequential performance in half. Crucial felt it would be best not to come out with a 60GB drive at this point, and simply release a version that uses 64Gbit die at some point in the future.
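The relationship between capacity, die count and channel population can be sketched in a few lines of Python. The package and die counts come from the configuration table; spreading die evenly across the controller's 8 channels is an assumption, and the 60GB entry is hypothetical since no such drive exists.

```python
CHANNELS = 8           # NAND channels on the Marvell 9187
DIE_CAPACITY_GB = 16   # one 128Gbit die holds 16GB

# capacity label -> (NAND packages, die per package), from the table above
CONFIGS = {
    "960GB": (16, 4),
    "480GB": (16, 2),
    "240GB": (16, 1),
    "120GB": (8, 1),
}


def layout(packages, die_per_package, channels=CHANNELS):
    """Summarize die count and channel population for one configuration."""
    total_die = packages * die_per_package
    return {
        "total_die": total_die,
        "total_nand_gb": total_die * DIE_CAPACITY_GB,
        "channels_populated": min(total_die, channels),
        "die_per_channel": total_die / channels,
    }


for name, (packages, die) in CONFIGS.items():
    print(name, layout(packages, die))

# A hypothetical 60GB drive built from the same die would need only 4 of them.
print("60GB (hypothetical)", layout(4, 1))
```

The 120GB drive has just one die per channel, leaving nothing to interleave, and a 60GB drive would leave half the channels empty.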
The heavy DRAM requirements point to a flat indirection table, similar to what we saw Intel move to with the S3700. Less than 5MB of user data is ever stored in the M500's DRAM at any given time; the bulk of the DRAM is used to cache the drive's OS, firmware and logical to physical mapping (indirection) table. Relatively flat maps should be easy to defragment, but that's assuming the M500's garbage collection and internal defragmentation routines are optimal.
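To see why the DRAM sizes are suggestive of a flat map, here is a back-of-the-envelope Python sketch. Crucial doesn't disclose its mapping granularity or entry size, so the 4KB granularity and 4-byte entries used here are assumptions chosen for illustration only.

```python
def flat_map_size_mib(user_gb, granularity_bytes=4096, entry_bytes=4):
    """Size of a one-entry-per-logical-unit table.

    granularity_bytes and entry_bytes are assumed values, not vendor data.
    """
    entries = user_gb * 10**9 // granularity_bytes
    return entries * entry_bytes / 2**20


# user capacity in GB -> on-board DRAM in MiB, from the configuration table
DRAM_MIB = {960: 1024, 480: 512, 240: 256, 120: 256}

for user_gb, dram_mib in DRAM_MIB.items():
    size = flat_map_size_mib(user_gb)
    print(f"{user_gb}GB: ~{size:.0f} MiB map vs {dram_mib} MiB DRAM")
```

Under those assumptions the table for each capacity fits in the DRAM provided with modest room left over, which is at least consistent with a flat design.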
Performance Consistency
In our Intel SSD DC S3700 review I introduced a new method of characterizing performance: looking at the latency of individual operations over time. The S3700 promised a level of performance consistency that was unmatched in the industry, and as a result needed some additional testing to show that. The reason we don't have consistent IO latency with SSDs is because inevitably all controllers have to do some amount of defragmentation or garbage collection in order to continue operating at high speeds. When and how an SSD decides to run its defrag and cleanup routines directly impacts the user experience. Frequent (borderline aggressive) cleanup generally results in more stable performance, while delaying that can result in higher peak performance at the expense of much lower worst case performance. The graphs below tell us a lot about the architecture of these SSDs and how they handle internal defragmentation.
To generate the data below I took a freshly secure erased SSD and filled it with sequential data. This ensures that all user accessible LBAs have data associated with them. Next I kicked off a 4KB random write workload across all LBAs at a queue depth of 32 using incompressible data. I ran the test for just over half an hour, nowhere near what we run our steady state tests for but enough to give me a good look at drive behavior once all spare area filled up.
I recorded instantaneous IOPS every second for the duration of the test. I then plotted IOPS vs. time and generated the scatter plots below. Each set of graphs features the same scale. The first two sets use a log scale for easy comparison, while the last set of graphs uses a linear scale that tops out at 40K IOPS for better visualization of differences between drives.
The high level testing methodology remains unchanged from our S3700 review. Unlike in previous reviews however, I did vary the percentage of the drive that I filled/tested depending on the amount of spare area I was trying to simulate. The buttons are labeled with the advertised user capacity had the SSD vendor decided to use that specific amount of spare area. If you want to replicate this on your own all you need to do is create a partition smaller than the total capacity of the drive and leave the remaining space unused to simulate a larger amount of spare area. The partitioning step isn't absolutely necessary in every case but it's an easy way to make sure you never exceed your allocated spare area. It's a good idea to do this from the start (e.g. secure erase, partition, then install Windows), but if you are working backwards you can always create the spare area partition, format it to TRIM it, then delete the partition. Finally, this method of creating spare area works on the drives we've tested here but not all controllers may behave the same way.
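If you want to work out how large to make the partition, the calculation is simple. This Python sketch assumes spare area is expressed as a fraction of the raw NAND on board, counted in binary gigabytes; that definition is an assumption, so adjust it if your tool reports capacity differently.

```python
def partition_size_gib(nand_gib, spare_fraction):
    """Partition size that leaves spare_fraction of raw NAND unused."""
    if not 0 <= spare_fraction < 1:
        raise ValueError("spare_fraction must be in [0, 1)")
    return nand_gib * (1 - spare_fraction)


for label, nand_gib in [("256GB-class drive", 256), ("M500 960GB", 1024)]:
    size = partition_size_gib(nand_gib, 0.25)
    print(f"{label}: partition of about {size:.0f} GiB for 25% spare area")
```

Anything beyond the end of that partition should stay unallocated and trimmed so the controller can treat it as free space.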
The first set of graphs shows the performance data over the entire 2000 second test period. In these charts you'll notice an early period of very high performance followed by a sharp dropoff. What you're seeing in that case is the drive allocating new blocks from its spare area, then eventually using up all free blocks and having to perform a read-modify-write for all subsequent writes (write amplification goes up, performance goes down).
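The link between free blocks and write amplification can be illustrated with a simplified steady-state model. This is not taken from the review: it assumes every recycled block holds the same number of still-valid pages, which must be copied elsewhere before the block is erased. The 512 pages per block figure is from the NAND table.

```python
PAGES_PER_BLOCK = 512  # 128Gbit 20nm die


def write_amplification(valid_pages, pages_per_block=PAGES_PER_BLOCK):
    """NAND pages programmed per host page written.

    Recycling one block relocates valid_pages and frees the rest for new
    host data, so the whole block's worth of programming buys only
    (pages_per_block - valid_pages) host pages.
    """
    if not 0 <= valid_pages < pages_per_block:
        raise ValueError("valid_pages must be in [0, pages_per_block)")
    freed = pages_per_block - valid_pages
    return pages_per_block / freed


for valid in (0, 256, 384, 448):
    print(f"{valid} valid pages per recycled block -> "
          f"write amplification {write_amplification(valid):.1f}x")
```

A fresh drive sits at the top of that list; a full one with little spare area drifts toward the bottom, which is the dropoff visible in the charts.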
The second set of graphs zooms in to the beginning of steady state operation for the drive (t=1400s). The third set also looks at the beginning of steady state operation but on a linear performance scale. Click the buttons below each graph to switch source data.
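If you log your own per-second IOPS, summarizing the steady-state window takes only a few lines of Python. The sample series below is synthetic placeholder data made up to exercise the function; it is not a measurement from any drive in this review, and the function name is illustrative.

```python
from statistics import mean, pstdev


def steady_state_summary(iops_per_second, start_s=1400):
    """Min, mean and standard deviation of IOPS from start_s onward."""
    window = iops_per_second[start_s:]
    if not window:
        raise ValueError("no samples at or after start_s")
    return {"min": min(window), "mean": mean(window), "stdev": pstdev(window)}


# Synthetic placeholder samples: a fast fresh phase, then a slower plateau.
samples = [30000] * 600 + [3000 + (t % 5) * 100 for t in range(1400)]

summary = steady_state_summary(samples)
print(summary)
```

The minimum and the standard deviation are the numbers to watch here, since they capture worst-case behavior and consistency rather than peak speed.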
[Graphs: 4KB random write IOPS vs. time over the entire 2000 second test period, log scale. Drives: Corsair Neutron 240GB, Crucial m4 256GB, Crucial M500 960GB, Plextor M5 Pro Xtreme 256GB, Samsung SSD 840 Pro 256GB. Configurations: Default, 25% Spare Area.]
Like most consumer drives, the M500 exhibits the same pattern of awesome performance for a short while before substantial degradation. The improvement over the m4 is just insane though. Whereas the M500 sees its floor at roughly 2600 IOPS, the m4 will drop down to as low as 28 IOPS. That's slower than mechanical hard drive performance and around the speed of random IO in a mainstream ARM based tablet. To say that Crucial has significantly improved IO consistency from the m4 to the M500 would be an understatement.
Plextor's M5 Pro is an interesting comparison because it uses the same Marvell 9187 controller. While both drives attempt to be as consistent as possible, you can see differences in firmware/gc routines clearly in these charts. Plextor's performance is more consistent and higher than the M500 as well.
The 840 Pro comparison is interesting because Samsung manages better average performance, but has considerably worse consistency compared to the M500. The 840 Pro does an amazing job with 25% additional spare area however, something that can't be said for the M500. Although performance definitely improves with 25% spare area, the gains aren't as dramatic as what happens with Samsung. Although I didn't have time to run through additional spare area points, I do wonder if we might see better improvements with even more spare area when you take into account that ~7% of the 25% spare area is reserved for RAIN.
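[Graphs: 4KB random write IOPS vs. time at the beginning of steady state operation (from t=1400s), log scale. Drives: Corsair Neutron 240GB, Crucial m4 256GB, Crucial M500 960GB, Plextor M5 Pro Xtreme 256GB, Samsung SSD 840 Pro 256GB. Configurations: Default, 25% Spare Area.]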
I am relatively pleased by the M500's IO consistency without any additional over provisioning. I suspect that anyone investing in a 960GB SSD would want to use as much of it as possible. At least in the out of box scenario, the M500 does better than the 840 Pro from a consistency standpoint. None of these drives holds a candle to Corsair's Neutron, however. The Neutron's LAMD controller shows its enterprise roots and delivers remarkably high and consistent performance out of the box.
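[Graphs: 4KB random write IOPS vs. time at the beginning of steady state operation, linear scale up to 40K IOPS. Drives: Corsair Neutron 240GB, Crucial m4 256GB, Crucial M500 960GB, Plextor M5 Pro Xtreme 256GB, Samsung SSD 840 Pro 256GB. Configurations: Default, 25% Spare Area.]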
A Preview of The Destroyer, Our 2013 Storage Bench
When I built the AnandTech Heavy and Light Storage Bench suites in 2011 I did so because we didn't have any good tools at the time that would begin to stress a drive's garbage collection routines. Once all blocks have a sufficient number of used pages, all further writes will inevitably trigger some sort of garbage collection/block recycling algorithm. Our Heavy 2011 test in particular was designed to do just this. By hitting the test SSD with a large enough and write intensive enough workload, we could ensure that some amount of GC would happen.
There were a couple of issues with our 2011 tests that I've been wanting to rectify however. First off, all of our 2011 tests were built using Windows 7 x64 pre-SP1, which meant there were potentially some 4K alignment issues that wouldn't exist had we built the trace on a system with SP1. This didn't really impact most SSDs but it proved to be a problem with some hard drives. Secondly, and more recently, I've shifted focus from simply triggering GC routines to really looking at worst case scenario performance after prolonged random IO. For years I'd felt the negative impacts of inconsistent IO performance with all SSDs, but until the S3700 showed up I didn't think to actually measure and visualize IO consistency. The problem with our IO consistency tests is that they are very focused on 4KB random writes at high queue depths and full LBA spans, not exactly a real world client usage model. The aspects of SSD architecture that those tests stress however are very important, and none of our existing tests were doing a good job of quantifying that.
I needed an updated heavy test, one that dealt with an even larger set of data and one that somehow incorporated IO consistency into its metrics. I think I've come up with the test, but given the short timeframe for this review (I only got my M500 drives a few days ago) I couldn't get a ton of data ready for you all today. The new benchmark doesn't even have a name, I've just been calling it The Destroyer (although AnandTech Storage Bench 2013 is likely a better fit for PR reasons).
Everything about this new test is bigger and better. The test platform moves to Windows 8 Pro x64. The workload is far more realistic. Just as before, this is an application trace based test - I record all IO requests made to a test system, then play them back on the drive I'm measuring and run statistical analysis on the drive's responses.
Imitating most modern benchmarks I crafted the Destroyer out of a series of scenarios. For this benchmark I focused heavily on Photo editing, Gaming, Virtualization, General Productivity, Video Playback and Application Development. Rough descriptions of the various scenarios are in the table below:
AnandTech Storage Bench 2013 Preview - The Destroyer
Workload | Description | Applications Used
Photo Sync/Editing | Import images, edit, export | Adobe Photoshop CS6, Adobe Lightroom 4, Dropbox
Gaming | Download/install games, play games | Steam, Deus Ex, Skyrim, Starcraft 2, BioShock Infinite
Virtualization | Run/manage VM, use general apps inside VM | VirtualBox
General Productivity | Browse the web, manage local email, copy files, encrypt/decrypt files, backup system, download content, virus/malware scan | Chrome, IE10, Outlook, Windows 8, AxCrypt, uTorrent, AdAware
Video Playback | Copy and watch movies | Windows 8
Application Development | Compile projects, check out code, download code samples | Visual Studio 2012
While some tasks remained independent, many were stitched together (e.g. system backups would take place while other scenarios were taking place). The overall stats give some justification to what I've been calling this test internally:
AnandTech Storage Bench 2013 Preview - The Destroyer, Specs
Metric | The Destroyer (2013) | Heavy 2011
Reads | 38.83 million | 2.17 million
Writes | 10.98 million | 1.78 million
Total IO Operations | 49.8 million | 3.99 million
Total GB Read | 1583.02 GB | 48.63 GB
Total GB Written | 875.62 GB | 106.32 GB
Average Queue Depth | ~5.5 | ~4.6
Focus | Worst case multitasking, IO consistency | Peak IO, basic GC routines
SSDs have grown in their performance abilities over the years, so I wanted a new test that could really push high queue depths at times. The average queue depth is still realistic for a client workload, but the Destroyer has some very demanding peaks. When I first introduced the Heavy 2011 test, some drives would take multiple hours to complete it - today most high performance SSDs can finish the test in under 90 minutes. The Destroyer? So far the fastest I've seen it go is 10 hours. Most high performance drives I've tested seem to need around 12 - 13 hours per run, with mainstream drives taking closer to 24 hours. The read/write balance is also a lot more realistic than in the Heavy 2011 test. Back in 2011 I just needed something that had a ton of writes so I could start separating the good from the bad. Now that the drives have matured, I felt a test that was a bit more balanced would be a better idea.
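The shift in read/write balance and the overall scale fall straight out of the totals in the specs table. The Python sketch below only rearranges those published numbers; it assumes the GB figures are decimal and uses the 10 hour best-case run time mentioned above.

```python
# Totals from the specs table above.
WORKLOADS = {
    "The Destroyer (2013)": {"gb_read": 1583.02, "gb_written": 875.62},
    "Heavy 2011": {"gb_read": 48.63, "gb_written": 106.32},
}


def write_share(workload):
    """Fraction of all transferred data that is written."""
    total = workload["gb_read"] + workload["gb_written"]
    return workload["gb_written"] / total


def average_rate_mbps(workload, hours):
    """Mean throughput if the whole trace completes in the given time."""
    total_mb = (workload["gb_read"] + workload["gb_written"]) * 1000
    return total_mb / (hours * 3600)


for name, workload in WORKLOADS.items():
    print(f"{name}: {write_share(workload):.0%} of data is written")

destroyer = WORKLOADS["The Destroyer (2013)"]
print(f"Destroyer at 10 hours: ~{average_rate_mbps(destroyer, 10):.0f} MB/s average")
```

Writes drop from roughly two thirds of the data moved in Heavy 2011 to about a third in the Destroyer, even though the absolute amount written is about eight times larger.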
Despite the balance recalibration, there's just a ton of data moving around in this test. Ultimately the sheer volume of data here and the fact that there's a good amount of random IO courtesy of all of the multitasking (e.g. background VM work, background photo exports/syncs, etc...) makes the Destroyer do a far better job of giving credit for performance consistency than the old Heavy 2011 test. Both tests are valid, they just stress/showcase different things. As the days of begging for better random IO performance and basic GC intelligence are over, I wanted a test that would give me a bit more of what I'm interested in these days. As I mentioned in the S3700 review - having good worst case IO performance and consistency matters just as much to client users as it does to enterprise users.
Given the sheer amount of time it takes to run through the Destroyer, and the fact that the test was only completed a little over a week ago, I don't have many results to share. I'll be populating this database over the coming weeks/months. I'm still hunting for any issues/weirdness with the test so I'm not ready to remove the "Preview" label from it just yet. But the results thus far are very telling.
I'm reporting two primary metrics with the Destroyer: average data rate in MB/s and average service time in microseconds. The former gives you an idea of the throughput of the drive during the time that it was running the Destroyer workload. This can be a very good indication of overall performance. What average data rate doesn't do a good job of is taking into account response time of very bursty (read: high queue depth) IO. By reporting average service time we heavily weigh latency for queued IOs. You'll note that this is a metric I've been reporting in our enterprise benchmarks for a while now. With the client tests maturing, the time was right for a little convergence.
I'll also report standard deviation for service times to give you some idea of IO consistency.
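For readers who want to compute similar metrics from their own traces, here is a minimal Python sketch. The exact formulas used for the Destroyer aren't spelled out, so treating average data rate as total bytes over elapsed replay time is an assumption, and the record format and function name are hypothetical. The example records are made up purely to exercise the function.

```python
from statistics import mean, pstdev


def summarize(ios, elapsed_s):
    """Summarize a replayed trace.

    ios: iterable of (bytes_transferred, service_time_us) tuples.
    elapsed_s: wall-clock time the replay took, in seconds.
    """
    ios = list(ios)
    total_bytes = sum(size for size, _ in ios)
    service_times = [time_us for _, time_us in ios]
    return {
        "avg_data_rate_mbps": total_bytes / 1e6 / elapsed_s,
        "avg_service_time_us": mean(service_times),
        "service_time_stdev_us": pstdev(service_times),
    }


# Made-up records, not measured results.
example = [(4096, 100), (131072, 900), (4096, 200), (65536, 400)]
result = summarize(example, elapsed_s=0.001)
print(result)
```

Two drives can post the same average data rate while differing widely in the last two numbers, which is why all three are worth reporting.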
Average data rates already show us something very surprising. The Corsair Neutron, which definitely places below Samsung's SSD 840 Pro in our Heavy 2011 test, takes second place here. If you look at the IO consistency graphs from the previous page however, this shouldn't come as a huge shock. Without additional spare area, the 840 Pro can definitely back itself into a corner - very similar to the old m4 in fact. The M500 dramatically improves IO consistency and worst case scenario IO performance, and it shows.
The SF-2281 based Vertex 3 does extremely well, taking the crown. SandForce's real time compression/de-dupe engine has always given it wonderful performance, even when running these heavy workloads as long as there's some portion of data that's compressible. The problem with SandForce wasn't performance, it was always a reliability concern that drove us elsewhere.
The results are echoed here, and exaggerated quite significantly. The SF-2281 based Vertex 3 does very well as it's able to work as if it has more spare area thanks to the fact that some of the workload can be compressed in real time. I did fill all drives with incompressible data at first, but given that not all parts of the workload are incompressible the SandForce drive gets a bit of an advantage - similar to what would happen in the real world.
Note that the Vertex 3 and Neutron swap spots as we look at average service time. This is exactly what I was talking about earlier. Here we're looking more at how a drive handles bursty (high queue depth) workloads vs. overall performance in our suite. Both metrics are important, but this one is likely more relevant to how fast your system feels.
Although the Neutron clearly has the response time advantage, the M500 delivers a remarkably competitive consistency story. Absolute performance may not be great in its lowest performing state, but the M500 keeps things consistent. Comparing to the old m4 we see just how bad things used to be.
Random Read/Write Speed
The four corners of SSD performance are as follows: random read, random write, sequential read and sequential write speed. Random accesses are generally small in size, while sequential accesses tend to be larger and thus we have the four Iometer tests we use in all of our reviews.
Our first test writes 4KB in a completely random pattern over an 8GB space of the drive to simulate the sort of random access that you'd see on an OS drive (even this is more stressful than a normal desktop user would see). I perform three concurrent IOs and run the test for 3 minutes. The results reported are in average MB/s over the entire time. We use both standard pseudo randomly generated data for each write as well as fully random data to show you both the maximum and minimum performance offered by SandForce based drives in these tests. The average performance of SF drives will likely be somewhere in between the two values for each drive you see in the graphs. For an understanding of why this matters, read our original SandForce article.
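Since the spec table quotes random performance in IOPS while these tests report MB/s, a conversion helper is handy. This Python sketch assumes decimal megabytes (1MB = 1,000,000 bytes) and 4096-byte transfers.

```python
def iops_to_mbps(iops, transfer_bytes=4096):
    """Convert an IO rate to throughput in decimal MB/s."""
    return iops * transfer_bytes / 1e6


def mbps_to_iops(mbps, transfer_bytes=4096):
    """Convert throughput in decimal MB/s back to an IO rate."""
    return mbps * 1e6 / transfer_bytes


# Rated 4KB random write figures from the spec table.
RATED_RANDOM_WRITE_IOPS = {"120GB": 35000, "240GB": 60000, "480GB": 80000, "960GB": 80000}

for capacity, iops in RATED_RANDOM_WRITE_IOPS.items():
    print(f"{capacity}: {iops} IOPS = {iops_to_mbps(iops):.1f} MB/s at 4KB")
```

Keep in mind that rated figures are typically quoted at high queue depths, while this test uses only three concurrent IOs, so the two won't line up exactly.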
Random read performance starts out quite nicely. There's a good improvement over the old m4 and the M500 lineup finds itself hot on the heels of the Samsung SSD 840. There's not much variance between the various capacities here.
It's with the random write performance that we get some insight into how write parallelism works on the M500. The 480GB and 960GB drives deliver roughly the same performance, so all you really need to saturate the 9187 is 32 NAND die. The 240GB sees a slight drop in performance, but the 120GB version with only 8 NAND die sees the biggest performance drop. This is exactly why we don't see a 60GB M500 at launch using 128Gbit die.
Ramping up queue depth causes some extra scaling on the 32/64 die drives, but the 240GB and 120GB parts are already at their limits. There physically aren't enough NAND die to see any tangible gains in performance between high and low queue depths on the smaller drives. This is a problem that everyone will ultimately have to deal with; the M500 just encounters it first.
Sequential Read/Write Speed
To measure sequential performance I ran a 1 minute long 128KB sequential test over the entire span of the drive at a queue depth of 1. The results reported are in average MB/s over the entire test length.
Low queue depth sequential read performance looks ok but the M500 is definitely not class leading here.
There's pretty much the same story when we look at sequential writes, although once again the 120GB M500 shows its limits very openly. The 840 and M500 have similar performance levels at the same capacity point, but the M500 is significantly behind the higher end offerings as you'd expect.
AS-SSD Incompressible Sequential Read/Write Performance
The AS-SSD sequential benchmark uses incompressible data for all of its transfers. The result is a pretty big reduction in sequential write speed on SandForce based controllers.
Ramping up queue depth we see a substantial increase in sequential read performance, but there's still a big delta between the M500 and all of the earlier drives.
The high-queue depth sequential write story is a bit better for the M500. It's tangibly quicker than the 840 here.
Performance vs. Transfer Size
ATTO is a useful tool for quickly measuring the impact of transfer size on performance. You can get the complete data set in Bench.
These charts give us a great look at the various gradations of performance as we scale up NAND die count within the M500 family. The 480/960GB drives perform identically, while the 120/240GB drives show significant steps down in max sequential read performance.
Write speed is a bit closer between all of the M500 capacities, but none approach the peak performance of Samsung's 840 Pro.
AnandTech Storage Bench 2011
Two years ago we introduced our AnandTech Storage Bench, a suite of benchmarks that took traces of real OS/application usage and played them back in a repeatable manner. I assembled the traces myself out of frustration with the majority of what we have today in terms of SSD benchmarks.
Although the AnandTech Storage Bench tests did a good job of characterizing SSD performance, they weren't stressful enough. All of the tests performed less than 10GB of reads/writes and typically involved only 4GB of writes specifically. That's not even enough to exceed the spare area on most SSDs. Most canned SSD benchmarks don't even come close to writing a single gigabyte of data, but that doesn't mean that simply writing 4GB is acceptable.
Originally I kept the benchmarks short enough that they wouldn't be a burden to run (~30 minutes) but long enough that they were representative of what a power user might do with their system.
Not too long ago I tweeted that I had created what I referred to as the Mother of All SSD Benchmarks (MOASB). Rather than only writing 4GB of data to the drive, this benchmark writes 106.32GB. It's the load you'd put on a drive after nearly two weeks of constant usage. And it takes a *long* time to run.
1) The MOASB, officially called AnandTech Storage Bench 2011 - Heavy Workload, mainly focuses on the times when your I/O activity is the highest. There is a lot of downloading and application installing that happens during the course of this test. My thinking was that it's during application installs, file copies, downloading and multitasking with all of this that you can really notice performance differences between drives.
2) I tried to cover as many bases as possible with the software I incorporated into this test. There's a lot of photo editing in Photoshop, HTML editing in Dreamweaver, web browsing, game playing/level loading (Starcraft II & WoW are both a part of the test) as well as general use stuff (application installing, virus scanning). I included a large amount of email downloading, document creation and editing as well. To top it all off I even use Visual Studio 2008 to build Chromium during the test.
The test has 2,168,893 read operations and 1,783,447 write operations. The IO breakdown is as follows:
AnandTech Storage Bench 2011 - Heavy Workload IO Breakdown
IO Size | % of Total
4KB | 28%
16KB | 10%
32KB | 10%
64KB | 4%
Only 42% of all operations are sequential; the rest range from pseudo to fully random (with most falling in the pseudo-random category). Average queue depth is 4.625 IOs, with 59% of operations taking place in an IO queue of 1.
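Those two queue depth figures imply that the remaining operations run at a much deeper queue. The following back-of-the-envelope sketch assumes the 4.625 average is weighted per operation in the same way as the 59% share, which the trace description does not confirm; the function name is hypothetical.

```python
AVG_QD = 4.625       # average queue depth quoted for the heavy workload
SHARE_QD1 = 0.59     # share of operations issued at a queue depth of 1

def implied_avg_qd_rest(avg_qd, share_at_one):
    """Average queue depth of the operations NOT issued at QD1.

    Solves avg = share * 1 + (1 - share) * rest for `rest`.
    """
    return (avg_qd - share_at_one * 1.0) / (1.0 - share_at_one)

REST_QD = implied_avg_qd_rest(AVG_QD, SHARE_QD1)
```

Under that assumption the other 41% of operations average a queue depth of nearly 10, which is why the workload can still stress a drive even though most of its IOs arrive one at a time.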
Many of you have asked for a better way to really characterize performance. Simply looking at IOPS doesn't really say much. As a result I'm going to be presenting Storage Bench 2011 data in a slightly different way. We'll have performance represented as Average MB/s, with higher numbers being better. At the same time I'll be reporting how long the SSD was busy while running this test. These disk busy graphs will show you exactly how much time was shaved off by using a faster drive vs. a slower one during the course of this test. Finally, I will also break out performance into reads, writes and combined. The reason I do this is to help balance out the fact that this test is unusually write intensive, which can often hide the benefits of a drive with good read performance.
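To make the disk busy metric concrete, here is one plausible way to derive it from a trace: merge overlapping IO service intervals and sum what remains, so idle gaps are not counted. This is a sketch under that assumption, not the tooling actually used for Storage Bench, and the function name is hypothetical.

```python
def busy_time(intervals):
    """Total time covered by (start, end) IO intervals, ignoring idle gaps.

    Overlapping intervals (concurrent IOs) are only counted once.
    """
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this IO: close out the previous busy period
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # IO overlaps the current busy period: extend it
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

With two overlapping IOs spanning 0 to 3 seconds and a third from 10 to 11 seconds, the drive is busy for 4 seconds even though 11 seconds of wall-clock time elapse.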
There's also a new light workload for 2011. This is a far more reasonable, typical every day use case benchmark. Lots of web browsing, photo editing (but with a greater focus on photo consumption), video playback as well as some application installs and gaming. This test isn't nearly as write intensive as the MOASB but it's still multiple times more write intensive than what we were running in 2010.
As always I don't believe that these two benchmarks alone are enough to characterize the performance of a drive, but hopefully along with the rest of our tests they will help provide a better idea.
The testbed for Storage Bench 2011 has changed as well. We're now using a Sandy Bridge platform with full 6Gbps support for these tests.
AnandTech Storage Bench 2011 - Heavy Workload
We'll start out by looking at average data rate throughout our new heavy workload test:
Our heavy workload from 2011 illustrates the culmination of everything we've shown thus far: the M500 can even be slower than the outgoing m4. There's no doubt in my mind that this is a result of the tradeoffs associated with moving to 128Gbit NAND die. The M500's performance is by no means bad, but it's definitely below what we've come to expect from Intel and Samsung flagships.
The next three charts just represent the same data, but in a different manner. Instead of looking at average data rate, we're looking at how long the disk was busy for during this entire test. Note that disk busy time excludes any and all idles; this is just how long the SSD was busy doing something:
AnandTech Storage Bench 2011 - Light Workload
Our new light workload actually has more write operations than read operations. The split is as follows: 372,630 reads and 459,709 writes. The relatively close read/write ratio does better mimic a typical light workload (although even lighter workloads would be far more read centric).
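For reference, the operation counts quoted for the two workloads work out to nearly mirrored read/write mixes. A quick calculation (the helper name is hypothetical):

```python
def rw_mix(reads, writes):
    """Return (read_fraction, write_fraction) by operation count."""
    total = reads + writes
    return reads / total, writes / total

HEAVY = rw_mix(2_168_893, 1_783_447)   # heavy workload op counts
LIGHT = rw_mix(372_630, 459_709)       # light workload op counts
```

By operation count the heavy workload is roughly 55% reads, while the light workload is roughly 55% writes. These are counts of operations, not bytes transferred, so they say nothing about how much data moves in each direction.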
The I/O breakdown is similar to the heavy workload at small IOs; however, you'll notice that there are far fewer large IO transfers:
AnandTech Storage Bench 2011 - Light Workload IO Breakdown
IO Size | % of Total
4KB | 27%
16KB | 8%
32KB | 6%
64KB | 5%
The story in the light workload looks a bit better. While the M500 still brings up the rear, the margin of victory for the 840 and other drives is much smaller.
Power Consumption
The M500 supports the new Device Sleep (DevSleep) standard, which will see platform support with Haswell this year. Crucial claims DIPM-enabled idle power as low as 80mW; however, even with DIPM enabled on our testbed we weren't able to get anything south of ~1W at idle. I'm digging to see if this is an M500 issue or one specific to our testbed, but Crucial is confident that in a notebook you'd see very little idle power consumption with the M500. Supporting DevSleep is important as that'll quickly become a must-have feature for Haswell notebooks.
Load power looks excellent, which gives me hope that Crucial's idle power is indeed as good as they claim. The M500 is a direct competitor to Samsung's SSD 840 Pro when it comes to power consumption under load. Given how power efficient the 840 Pro is, the M500 is in good company.
Final Words
For SSDs to become more cost effective they need to implement higher density NAND, which is often at odds with performance, endurance or both. Samsung chose the endurance side of the equation, but kept performance largely intact with the vanilla 840. Given that most client workloads aren't write heavy, the tradeoff made a lot of sense. With the M500, Crucial came at the problem from the performance angle. Keep endurance the same, but sacrifice performance in order to hit the right cost target. In the long run I suspect it'll need to be a combination of both approaches, but for now that leaves us in a unique position with the M500.
The M500's performance is by no means bad, but it's definitely slower than the competition. Crucial targeted Samsung's SSD 840, but in most cases the TLC based 840 is faster than the M500. There's probably some room for improvement in the M500's firmware, but there's no escaping the fact that read, program and erase latencies are all higher as a result of the move to larger pages/blocks with the drive's 128Gbit NAND die. The benefit to all of this should be cost, but we'll have to wait and see just how competitive the smaller capacities of the M500 are on cost.
The saving grace when it comes to the M500's performance, at least compared to Samsung's offerings, is worst case IO consistency in a full drive state. If you have the luxury of keeping around 20% of your drive free, Samsung maintains its performance advantage. If, on the other hand, you plan on using almost all of your drive's capacity, the M500 does have better behavior than even the 840 Pro. It's an interesting tradeoff, but going forward I feel like we're going to have to start distinguishing between the two usage models. The M500 definitely isn't the best when it comes to delivering both high performance and consistent IO; that title continues to belong to Corsair with its Link_A_Media based Neutron drive. But among the current crop of non-SandForce tier 1 SSD manufacturers, the M500 does reasonably well.
The encryption story on the M500 is potentially very interesting. Assuming the drive is indeed fully supported as a Windows 8 eDrive like Crucial claims, the M500 would be the obvious choice for anyone who had to run with BitLocker enabled. The prospect of seeing more SSDs with hardware encryption that can be leveraged by the OS is downright exciting. Honestly I wasn't aware of the eDrive spec until testing the M500, but now I want to see something similar from Apple as well.
Power consumption is another potentially good story from Crucial, assuming idle power in a notebook is truly as low as it claims. Power under load is competitive with Samsung's SSD 840 Pro, and actually even lower than the vanilla 840. Given that neither of those drives is particularly power hungry, the M500 does well there. Support for DevSleep is a nice addition. The combination of the M500's encryption support and DevSleep gives us a good idea of two platform features that we should hope to see from all modern drives during this next generation.
All of this brings us to recommendation time. The easiest of the M500 drives to recommend and dismiss are the highest and lowest capacity versions, respectively. The 960GB M500 is the cheapest 1TB-class SSD I've seen to date, and it's likely the best buy if you need that much storage in a single drive. Performance still falls short of the fastest drives in this space, but if you need the capacity and plan on using all of it the M500 is really the only game in town. I've been hammering on the 960GB very hard over the past few days and while it hasn't been long enough to clear the drive as reliable, so far it's handled everything I've thrown at it very well (including our new Destroyer benchmark). I know I've personally been waiting for a good, high-capacity SSD for notebook use and based on my options today, I'd have no issues going with the 960GB M500.
On the other side of the fence, the 120GB version sacrifices a lot of performance as a result of only using a total of 8 NAND die within the drive. Unless its street price is significantly more attractive than its MSRP, I don't see a reason to choose the 120GB M500.
Recommending the two middle capacities (240/480GB) will really depend on street pricing. Based on their MSRPs, the M500 doesn't appear to be any more competitive here. I suspect that we will see closer-to-840 pricing after a few weeks of being in the channel, at which point they may be worth another look. For now, we play the waiting game.