Publications by Samira Manabi Khan | Publications

Status message

The Publications site is currently under construction, as a result some publications might be missing.

2017

Detecting and mitigating data-dependent DRAM failures by exploiting current memory content

Samira Manabi Khan, Chris Wilkerson, Zhe Wang, Alaa R. Alameldeen, Donghyuk Lee, and Onur Mutlu

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, MA, USA, October 2017

DRAM cells in close proximity can fail depending on the data content in neighboring cells. These failures are called data-dependent failures. Detecting and mitigating these failures online, while the system is running in the field, enables various optimizations that improve reliability, latency, and energy efficiency of the system. For example, a system can improve performance and energy efficiency by using a lower refresh rate for most cells and mitigate the failing cells using higher refresh rates or error correcting codes. All these system optimizations depend on accurately detecting every possible data-dependent failure that could occur with any content in DRAM. Unfortunately, detecting all data-dependent failures requires the knowledge of DRAM internals specific to each DRAM chip. As internal DRAM architecture is not exposed to the system, detecting data-dependent failures at the system-level is a major challenge. In this paper, we decouple the detection and mitigation of data-dependent failures from physical DRAM organization such that it is possible to detect failures without knowledge of DRAM internals. To this end, we propose MEMCON, a memory content-based detection and mitigation mechanism for data-dependent failures in DRAM. MEMCON does not detect every possible data-dependent failure. Instead, it detects and mitigates failures that occur only with the current content in memory while the programs are running in the system. Such a mechanism needs to detect failures whenever there is a write access that changes the content of memory. As detection of failure with a runtime testing has a high overhead, MEMCON selectively initiates a test on a write, only when the time between two consecutive writes to that page (i.e., write interval) is long enough to provide significant benefit by lowering the refresh rate during that interval. MEMCON builds upon a simple, practical mechanism that predicts the long write intervals based on our observation that the write intervals in real workloads follow a Pareto distribution: the longer a page remains idle after a write, the longer it is expected to remain idle. Our evaluation shows that compared to a system that uses an aggressive refresh rate, MEMCON reduces refresh operations by 65--74%, leading to a 10%/17%/40% (min) to 12%/22%/50% (max) performance improvement for a single-core and 10%/23%/52% (min) to 17%/29%/65% (max) performance improvement for a 4-core system using 8/16/32 Gb DRAM chips.

@inproceedings{abc,
abstract = {DRAM cells in close proximity can fail depending on the data content in neighboring cells. These failures are called data-dependent failures. Detecting and mitigating these failures online, while the system is running in the field, enables various optimizations that improve reliability, latency, and energy efficiency of the system. For example, a system can improve performance and energy efficiency by using a lower refresh rate for most cells and mitigate the failing cells using higher refresh rates or error correcting codes. All these system optimizations depend on accurately detecting every possible data-dependent failure that could occur with any content in DRAM. Unfortunately, detecting all data-dependent failures requires the knowledge of DRAM internals specific to each DRAM chip. As internal DRAM architecture is not exposed to the system, detecting data-dependent failures at the system-level is a major challenge.

In this paper, we decouple the detection and mitigation of data-dependent failures from physical DRAM organization such that it is possible to detect failures without knowledge of DRAM internals. To this end, we propose MEMCON, a memory content-based detection and mitigation mechanism for data-dependent failures in DRAM. MEMCON does not detect every possible data-dependent failure. Instead, it detects and mitigates failures that occur only with the current content in memory while the programs are running in the system. Such a mechanism needs to detect failures whenever there is a write access that changes the content of memory. As detection of failure with a runtime testing has a high overhead, MEMCON selectively initiates a test on a write, only when the time between two consecutive writes to that page (i.e., write interval) is long enough to provide significant benefit by lowering the refresh rate during that interval. MEMCON builds upon a simple, practical mechanism that predicts the long write intervals based on our observation that the write intervals in real workloads follow a Pareto distribution: the longer a page remains idle after a write, the longer it is expected to remain idle.

Our evaluation shows that compared to a system that uses an aggressive refresh rate, MEMCON reduces refresh operations by 65--74\%, leading to a 10\%/17\%/40\% (min) to 12\%/22\%/50\% (max) performance improvement for a single-core and 10\%/23\%/52\% (min) to 17\%/29\%/65\% (max) performance improvement for a 4-core system using 8/16/32 Gb DRAM chips.},
author = {Samira Manabi Khan and Chris Wilkerson and Zhe Wang and Alaa R. Alameldeen and Donghyuk Lee and Onur Mutlu},
booktitle = {Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture},
title = {Detecting and mitigating data-dependent DRAM failures by exploiting current memory content},
venue = {Cambridge, MA, USA},
year = {2017}
}

Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms.

Donghyuk Lee, Samira Manabi Khan, Lavanya Subramanian, Saugata Ghose, Rachata Ausavarungnirun, Gennady Pekhimenko, Vivek Seshadri, and Onur Mutlu

Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, Urbana-Champaign, IL, USA, June 2017

@inproceedings{abc,
	author = {Donghyuk Lee and Samira Manabi Khan and Lavanya Subramanian and Saugata Ghose and Rachata Ausavarungnirun and Gennady Pekhimenko and Vivek Seshadri and Onur Mutlu},
	booktitle = {Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, Urbana-Champaign, IL, USA},
	title = {Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms.},
	url = {http://doi.acm.org/10.1145/3078505.3078533},
	year = {2017}
}

SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies.

Hasan Hassan, Nandita Vijaykumar, Samira Manabi Khan, Saugata Ghose, Kevin K. Chang, Gennady Pekhimenko, Donghyuk Lee, Oguz Ergin, and Onur Mutlu

2017 IEEE International Symposium on High Performance Computer Architecture, HPCA 2017, Austin, TX, USA, February 2017

@inproceedings{abc,
	author = {Hasan Hassan and Nandita Vijaykumar and Samira Manabi Khan and Saugata Ghose and Kevin K. Chang and Gennady Pekhimenko and Donghyuk Lee and Oguz Ergin and Onur Mutlu},
	booktitle = {2017 IEEE International Symposium on High Performance Computer Architecture, HPCA 2017, Austin, TX, USA},
	title = {SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies.},
	url = {https://doi.org/10.1109/HPCA.2017.62},
	year = {2017}
}

Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms.

Donghyuk Lee, Samira Manabi Khan, Lavanya Subramanian, Saugata Ghose, Rachata Ausavarungnirun, Gennady Pekhimenko, Vivek Seshadri, and Onur Mutlu

POMACS, January 2017

@article{abc,
	author = {Donghyuk Lee and Samira Manabi Khan and Lavanya Subramanian and Saugata Ghose and Rachata Ausavarungnirun and Gennady Pekhimenko and Vivek Seshadri and Onur Mutlu},
	journal = {POMACS},
	title = {Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms.},
	url = {http://doi.acm.org/10.1145/3084464},
	year = {2017}
}

2016

Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation.

Kevin Hsieh, Samira Manabi Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, and Onur Mutlu

34th IEEE International Conference on Computer Design, ICCD 2016, Scottsdale, AZ, USA, October 2016

@inproceedings{abc,
	author = {Kevin Hsieh and Samira Manabi Khan and Nandita Vijaykumar and Kevin K. Chang and Amirali Boroumand and Saugata Ghose and Onur Mutlu},
	booktitle = {34th IEEE International Conference on Computer Design, ICCD 2016, Scottsdale, AZ, USA},
	title = {Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation.},
	url = {http://dx.doi.org/10.1109/ICCD.2016.7753257},
	year = {2016}
}

Zorua: A holistic approach to resource virtualization in GPUs.

Nandita Vijaykumar, Kevin Hsieh, Gennady Pekhimenko, Samira Manabi Khan, Ashish Shrestha, Saugata Ghose, Adwait Jog, Phillip B. Gibbons, and Onur Mutlu

49th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2016, Taipei, Taiwan, October 2016

@inproceedings{abc,
	author = {Nandita Vijaykumar and Kevin Hsieh and Gennady Pekhimenko and Samira Manabi Khan and Ashish Shrestha and Saugata Ghose and Adwait Jog and Phillip B. Gibbons and Onur Mutlu},
	booktitle = {49th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2016, Taipei, Taiwan},
	title = {Zorua: A holistic approach to resource virtualization in GPUs.},
	url = {http://dx.doi.org/10.1109/MICRO.2016.7783718},
	year = {2016}
}

Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization.

Kevin K. Chang, Abhijith Kashyap, Hasan Hassan, Saugata Ghose, Kevin Hsieh, Donghyuk Lee, Tianshi Li, Gennady Pekhimenko, Samira Manabi Khan, and Onur Mutlu

Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, Antibes Juan-Les-Pins, France, June 2016

@inproceedings{abc,
	author = {Kevin K. Chang and Abhijith Kashyap and Hasan Hassan and Saugata Ghose and Kevin Hsieh and Donghyuk Lee and Tianshi Li and Gennady Pekhimenko and Samira Manabi Khan and Onur Mutlu},
	booktitle = {Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, Antibes Juan-Les-Pins, France},
	title = {Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization.},
	url = {http://doi.acm.org/10.1145/2896377.2901453},
	year = {2016}
}

PARBOR: An Efficient System-Level Technique to Detect Data-Dependent Failures in DRAM.

Samira Manabi Khan, Donghyuk Lee, and Onur Mutlu

46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2016, Toulouse, France, June 2016

@inproceedings{abc,
	author = {Samira Manabi Khan and Donghyuk Lee and Onur Mutlu},
	booktitle = {46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2016, Toulouse, France},
	title = {PARBOR: An Efficient System-Level Technique to Detect Data-Dependent Failures in DRAM.},
	url = {http://dx.doi.org/10.1109/DSN.2016.30},
	year = {2016}
}

Simultaneous Multi-Layer Access: Improving 3D-Stacked Memory Bandwidth at Low Cost.

Donghyuk Lee, Saugata Ghose, Gennady Pekhimenko, Samira Manabi Khan, and Onur Mutlu

TACO, January 2016

@inproceedings{abc,
	author = {Donghyuk Lee and Saugata Ghose and Gennady Pekhimenko and Samira Manabi Khan and Onur Mutlu},
	booktitle = {TACO},
	title = {Simultaneous Multi-Layer Access: Improving 3D-Stacked Memory Bandwidth at Low Cost.},
	url = {http://doi.acm.org/10.1145/2832911},
	year = {2016}
}

Adaptive-Latency DRAM (AL-DRAM).

Donghyuk Lee, Yoongu Kim, Gennady Pekhimenko, Samira Manabi Khan, Vivek Seshadri, Kevin Kai-Wei Chang, and Onur Mutlu

CoRR, January 2016

@article{abc,
	author = {Donghyuk Lee and Yoongu Kim and Gennady Pekhimenko and Samira Manabi Khan and Vivek Seshadri and Kevin Kai-Wei Chang and Onur Mutlu},
	journal = {CoRR},
	title = {Adaptive-Latency DRAM (AL-DRAM).},
	url = {http://arxiv.org/abs/1603.08454},
	year = {2016}
}

Reducing DRAM Latency by Exploiting Design-Induced Latency Variation in Modern DRAM Chips.

Donghyuk Lee, Samira Manabi Khan, Lavanya Subramanian, Rachata Ausavarungnirun, Gennady Pekhimenko, Vivek Seshadri, Saugata Ghose, and Onur Mutlu

CoRR, January 2016

@article{abc,
	author = {Donghyuk Lee and Samira Manabi Khan and Lavanya Subramanian and Rachata Ausavarungnirun and Gennady Pekhimenko and Vivek Seshadri and Saugata Ghose and Onur Mutlu},
	journal = {CoRR},
	title = {Reducing DRAM Latency by Exploiting Design-Induced Latency Variation in Modern DRAM Chips.},
	url = {http://arxiv.org/abs/1610.09604},
	year = {2016}
}

2015

The application slowdown model: quantifying and controlling the impact of inter-application interference at shared caches and main memory.

Lavanya Subramanian, Vivek Seshadri, Arnab Ghosh, Samira Manabi Khan, and Onur Mutlu

Proceedings of the 48th International Symposium on Microarchitecture, MICRO 2015, Waikiki, HI, USA, December 2015

@inproceedings{abc,
	author = {Lavanya Subramanian and Vivek Seshadri and Arnab Ghosh and Samira Manabi Khan and Onur Mutlu},
	booktitle = {Proceedings of the 48th International Symposium on Microarchitecture, MICRO 2015, Waikiki, HI, USA},
	title = {The application slowdown model: quantifying and controlling the impact of inter-application interference at shared caches and main memory.},
	url = {http://doi.acm.org/10.1145/2830772.2830803},
	year = {2015}
}

ThyNVM: enabling software-transparent crash consistency in persistent memory systems.

Jinglei Ren, Jishen Zhao, Samira Manabi Khan, Jongmoo Choi, Yongwei Wu, and Onur Mutlu

Proceedings of the 48th International Symposium on Microarchitecture, MICRO 2015, Waikiki, HI, USA, December 2015

@inproceedings{abc,
	author = {Jinglei Ren and Jishen Zhao and Samira Manabi Khan and Jongmoo Choi and Yongwei Wu and Onur Mutlu},
	booktitle = {Proceedings of the 48th International Symposium on Microarchitecture, MICRO 2015, Waikiki, HI, USA},
	title = {ThyNVM: enabling software-transparent crash consistency in persistent memory systems.},
	url = {http://doi.acm.org/10.1145/2830772.2830802},
	year = {2015}
}

AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems.

Moinuddin K. Qureshi, Dae-Hyun Kim, Samira Manabi Khan, Prashant J. Nair, and Onur Mutlu

45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2015, Rio de Janeiro, Brazil, June 2015

@inproceedings{abc,
	author = {Moinuddin K. Qureshi and Dae-Hyun Kim and Samira Manabi Khan and Prashant J. Nair and Onur Mutlu},
	booktitle = {45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2015, Rio de Janeiro, Brazil},
	title = {AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems.},
	url = {http://dx.doi.org/10.1109/DSN.2015.58},
	year = {2015}
}

Adaptive-latency DRAM: Optimizing DRAM timing for the common-case.

Donghyuk Lee, Yoongu Kim, Gennady Pekhimenko, Samira Manabi Khan, Vivek Seshadri, Kevin Kai-Wei Chang, and Onur Mutlu

21st IEEE International Symposium on High Performance Computer Architecture, HPCA 2015, Burlingame, CA, USA, February 2015

@inproceedings{abc,
	author = {Donghyuk Lee and Yoongu Kim and Gennady Pekhimenko and Samira Manabi Khan and Vivek Seshadri and Kevin Kai-Wei Chang and Onur Mutlu},
	booktitle = {21st IEEE International Symposium on High Performance Computer Architecture, HPCA 2015, Burlingame, CA, USA},
	title = {Adaptive-latency DRAM: Optimizing DRAM timing for the common-case.},
	url = {http://dx.doi.org/10.1109/HPCA.2015.7056057},
	year = {2015}
}

Simultaneous Multi Layer Access: A High Bandwidth and Low Cost 3D-Stacked Memory Interface.

Donghyuk Lee, Gennady Pekhimenko, Samira Manabi Khan, Saugata Ghose, and Onur Mutlu

CoRR, January 2015

@article{abc,
	author = {Donghyuk Lee and Gennady Pekhimenko and Samira Manabi Khan and Saugata Ghose and Onur Mutlu},
	journal = {CoRR},
	title = {Simultaneous Multi Layer Access: A High Bandwidth and Low Cost 3D-Stacked Memory Interface.},
	url = {http://arxiv.org/abs/1506.03160},
	year = {2015}
}

2014

The efficacy of error mitigation techniques for DRAM retention failures: a comparative experimental study.

Samira Manabi Khan, Donghyuk Lee, Yoongu Kim, Alaa R. Alameldeen, Chris Wilkerson, and Onur Mutlu

ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '14, Austin, TX, June 2014

@inproceedings{abc,
	author = {Samira Manabi Khan and Donghyuk Lee and Yoongu Kim and Alaa R. Alameldeen and Chris Wilkerson and Onur Mutlu},
	booktitle = {ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS {\textquoteright}14, Austin, TX},
	title = {The efficacy of error mitigation techniques for DRAM retention failures: a comparative experimental study.},
	url = {http://doi.acm.org/10.1145/2591971.2592000},
	year = {2014}
}

Improving cache performance using read-write partitioning.

Samira Manabi Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez

20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014, Orlando, FL, USA, February 2014

@inproceedings{abc,
	author = {Samira Manabi Khan and Alaa R. Alameldeen and Chris Wilkerson and Onur Mutlu and Daniel A. Jim{\'e}nez},
	booktitle = {20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014, Orlando, FL, USA},
	title = {Improving cache performance using read-write partitioning.},
	url = {http://dx.doi.org/10.1109/HPCA.2014.6835954},
	year = {2014}
}