As processors have become faster, the time it takes to go off-chip to access data has become more and more of an issue. This is where on-chip high-bandwidth caches come in, and it is why some chip makers have been adding DRAM to a chip's packaging. The problem is that DRAM is significantly different from the SRAM typically used for on-chip caches, which is why MIT researchers have developed a new cache management system.
The critical difference between SRAM and DRAM concerns how the two memory technologies store data and the impact this has on locating specific data. Every piece of cached data is tagged with metadata identifying where it is located in the system's main memory, and these tags are run through a hash function. The purpose of the hashing is to produce very different values for similar inputs, which prevents bottlenecks at specific locations. The outputs of the hash function are stored in a hash table. Sometimes multiple data items are referenced by one entry because they share the same hash output, but checking these few items is still more efficient than going through the entire tag list. This is where the difference between SRAM and DRAM comes in: SRAM uses six transistors for each bit of data while DRAM uses only one. This gives DRAM an advantage in space efficiency, but SRAM has some processing capability that lets it search the hash table for the desired information, whereas the processor has to do this search for DRAM-stored data, which takes time and bandwidth.
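To make the hash-table idea concrete, here is a minimal Python sketch of a tag lookup that only scans the handful of items sharing a hash output rather than the entire tag list. It is an illustration only, not how any real cache is built: `TagTable`, `bucket_index`, `NUM_BUCKETS`, and the choice of SHA-256 as the hash are all assumptions made for this example.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical table size, chosen only for illustration


def bucket_index(tag: int) -> int:
    """Hash a tag so that similar tags land in very different buckets."""
    digest = hashlib.sha256(tag.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:4], "little") % NUM_BUCKETS


class TagTable:
    """Toy hash table mapping memory tags to cache locations."""

    def __init__(self) -> None:
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, tag: int, location: str) -> None:
        self.buckets[bucket_index(tag)].append((tag, location))

    def lookup(self, tag: int):
        """Return (location, comparisons); location is None on a miss."""
        comparisons = 0
        # Only the items sharing this hash output are checked.
        for stored_tag, location in self.buckets[bucket_index(tag)]:
            comparisons += 1
            if stored_tag == tag:
                return location, comparisons
        return None, comparisons
```

With 64 tags spread over 16 buckets, a lookup compares against only the entries in one bucket instead of all 64, which is the saving described above.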
The solution from MIT, dubbed Banshee, adds three bits to each entry in the hash table: one indicates whether the data can be found in the DRAM cache, and the other two give its location relative to the other data items sharing the same hash index. As an entry in the table is already around 100 bits, this is not much overhead, especially as it can increase the data rate of on-chip DRAM by 33 to 50%. Banshee also adds a tag buffer to address the problem of one processor core not knowing when another has put data in the DRAM cache. The buffer is only 5 KB, so it does not take up much space, and when it is full all of the cores have their virtual-memory tables updated, allowing the buffer to be cleared and start fresh.
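The three extra bits can be pictured with a short Python sketch. The bit order used here (the "in DRAM cache" flag as the high bit, the two position bits below it) and the function names are assumptions for illustration; the article does not say how Banshee actually lays the bits out.

```python
CACHED_BIT = 0b100  # assumed: high bit says the data is in the DRAM cache
WAY_MASK = 0b011    # assumed: low two bits give the position (0 to 3)
EXTRA_BITS = 3


def pack_extra_bits(cached: bool, way: int) -> int:
    """Combine the cached flag and a 2-bit position into three bits."""
    if not 0 <= way <= 3:
        raise ValueError("two bits can only encode positions 0 to 3")
    return (CACHED_BIT if cached else 0) | way


def unpack_extra_bits(bits: int):
    """Split three bits back into (cached, way)."""
    return bool(bits & CACHED_BIT), bits & WAY_MASK


def attach_extra_bits(entry: int, bits: int) -> int:
    """Append the three bits to an existing table entry (hypothetical layout)."""
    return (entry << EXTRA_BITS) | (bits & 0b111)
```

Two position bits can distinguish four items sharing a hash index, and three bits on top of a roughly 100-bit entry is about a 3% increase in entry size.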