Publications by Chita R. Das

2016

Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities.

Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, and Chita R. Das

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, PACT 2016, Haifa, Israel, September 2016

@inproceedings{abc,
	author = {Ashutosh Pattnaik and Xulong Tang and Adwait Jog and Onur Kayiran and Asit K. Mishra and Mahmut T. Kandemir and Onur Mutlu and Chita R. Das},
	booktitle = {Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, PACT 2016, Haifa, Israel},
	title = {Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities.},
	url = {http://doi.acm.org/10.1145/2967938.2967940},
	year = {2016}
}

μC-States: Fine-grained GPU Datapath Power Management.

Onur Kayiran, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, Xulong Tang, Mahmut T. Kandemir, Gabriel H. Loh, Onur Mutlu, and Chita R. Das

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, PACT 2016, Haifa, Israel, September 2016

@inproceedings{abc,
	author = {Onur Kayiran and Adwait Jog and Ashutosh Pattnaik and Rachata Ausavarungnirun and Xulong Tang and Mahmut T. Kandemir and Gabriel H. Loh and Onur Mutlu and Chita R. Das},
	booktitle = {Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, PACT 2016, Haifa, Israel},
	title = {$\mu$C-States: Fine-grained GPU Datapath Power Management.},
	url = {http://doi.acm.org/10.1145/2967938.2967941},
	year = {2016}
}

Exploiting Core Criticality for Enhanced GPU Performance.

Adwait Jog, Onur Kayiran, Ashutosh Pattnaik, Mahmut T. Kandemir, Onur Mutlu, Ravishankar Iyer, and Chita R. Das

Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, Antibes Juan-Les-Pins, France, June 2016

@inproceedings{abc,
	author = {Adwait Jog and Onur Kayiran and Ashutosh Pattnaik and Mahmut T. Kandemir and Onur Mutlu and Ravishankar Iyer and Chita R. Das},
	booktitle = {Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, Antibes Juan-Les-Pins, France},
	title = {Exploiting Core Criticality for Enhanced GPU Performance.},
	url = {http://doi.acm.org/10.1145/2896377.2901468},
	year = {2016}
}

A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps.

Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Saugata Ghose, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita R. Das, Mahmut T. Kandemir, Todd C. Mowry, and Onur Mutlu

CoRR, January 2016

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive. This work describes the Core-Assisted Bottleneck Acceleration (CABA) framework that employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate "assist warps" that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency. CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory. Conversely, the same framework can be employed to handle cases where the GPU is bottlenecked by the available computational units, in which case the memory pipelines are idle and can be used by CABA to speed up computation, e.g., by performing memoization using assist warps. We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. Our extensive evaluations show that CABA, when used to implement data compression, provides an average performance improvement of 41.7% (as high as 2.6X) across a variety of memory-bandwidth-sensitive GPGPU applications.

@article{abc,
abstract = {Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive.
This work describes the Core-Assisted Bottleneck Acceleration (CABA) framework that employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate "assist warps" that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency.
CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory. Conversely, the same framework can be employed to handle cases where the GPU is bottlenecked by the available computational units, in which case the memory pipelines are idle and can be used by CABA to speed up computation, e.g., by performing memoization using assist warps.
We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. Our extensive evaluations show that CABA, when used to implement data compression, provides an average performance improvement of 41.7\% (as high as 2.6X) across a variety of memory-bandwidth-sensitive GPGPU applications.},
author = {Nandita Vijaykumar and Gennady Pekhimenko and Adwait Jog and Saugata Ghose and Abhishek Bhowmick and Rachata Ausavarungnirun and Chita R. Das and Mahmut T. Kandemir and Todd C. Mowry and Onur Mutlu},
journal = {CoRR},
title = {A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps.},
url = {http://arxiv.org/abs/1602.01348},
year = {2016}
}

2015

Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance.

Rachata Ausavarungnirun, Saugata Ghose, Onur Kayiran, Gabriel H. Loh, Chita R. Das, Mahmut T. Kandemir, and Onur Mutlu

2015 International Conference on Parallel Architecture and Compilation, PACT 2015, San Francisco, CA, USA, October 2015

@inproceedings{abc,
	author = {Rachata Ausavarungnirun and Saugata Ghose and Onur Kayiran and Gabriel H. Loh and Chita R. Das and Mahmut T. Kandemir and Onur Mutlu},
	booktitle = {2015 International Conference on Parallel Architecture and Compilation, PACT 2015, San Francisco, CA, USA},
	title = {Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance.},
	url = {http://dx.doi.org/10.1109/PACT.2015.38},
	year = {2015}
}

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps.

Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita R. Das, Mahmut T. Kandemir, Todd C. Mowry, and Onur Mutlu

Proceedings of the 42nd Annual International Symposium on Computer Architecture, Portland, OR, USA, June 2015

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive. This paper introduces the Core-Assisted Bottleneck Acceleration (CABA) framework that employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate “assist warps” that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency. CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory. Conversely, the same framework can be employed to handle cases where the GPU is bottlenecked by the available computational units, in which case the memory pipelines are idle and can be used by CABA to speed up computation, e.g., by performing memoization using assist warps. We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. Our extensive evaluations show that CABA, when used to implement data compression, provides an average performance improvement of 41.7% (as high as 2.6X) across a variety of memory-bandwidth-sensitive GPGPU applications.

@inproceedings{abc,
	abstract = {Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive. This paper introduces the Core-Assisted Bottleneck Acceleration (CABA) framework that employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate {\textquotedblleft}assist warps{\textquotedblright} that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency. CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory. Conversely, the same framework can be employed to handle cases where the GPU is bottlenecked by the available computational units, in which case the memory pipelines are idle and can be used by CABA to speed up computation, e.g., by performing memoization using assist warps. We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. Our extensive evaluations show that CABA, when used to implement data compression, provides an average performance improvement of 41.7\% (as high as 2.6X) across a variety of memory-bandwidth-sensitive GPGPU applications.},
	author = {Nandita Vijaykumar and Gennady Pekhimenko and Adwait Jog and Abhishek Bhowmick and Rachata Ausavarungnirun and Chita R. Das and Mahmut T. Kandemir and Todd C. Mowry and Onur Mutlu},
	booktitle = {Proceedings of the 42nd Annual International Symposium on Computer Architecture},
	title = {A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps.},
	url = {http://doi.acm.org/10.1145/2749469.2750399},
	venue = {Portland, OR, USA},
	year = {2015}
}

2014

Managing GPU Concurrency in Heterogeneous Architectures.

Onur Kayiran, Nachiappan Chidambaram Nachiappan, Adwait Jog, Rachata Ausavarungnirun, Mahmut T. Kandemir, Gabriel H. Loh, Onur Mutlu, and Chita R. Das

47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014, Cambridge, United Kingdom, December 2014

@inproceedings{abc,
	author = {Onur Kayiran and Nachiappan Chidambaram Nachiappan and Adwait Jog and Rachata Ausavarungnirun and Mahmut T. Kandemir and Gabriel H. Loh and Onur Mutlu and Chita R. Das},
	booktitle = {47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014, Cambridge, United Kingdom},
	title = {Managing GPU Concurrency in Heterogeneous Architectures.},
	url = {http://dx.doi.org/10.1109/MICRO.2014.62},
	year = {2014}
}

2013

Orchestrated scheduling and prefetching for GPGPUs.

Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, Ravishankar Iyer, and Chita R. Das

The 40th Annual International Symposium on Computer Architecture, ISCA'13, Tel-Aviv, Israel, June 2013

@inproceedings{abc,
	author = {Adwait Jog and Onur Kayiran and Asit K. Mishra and Mahmut T. Kandemir and Onur Mutlu and Ravishankar Iyer and Chita R. Das},
	booktitle = {The 40th Annual International Symposium on Computer Architecture, ISCA{\textquoteright}13, Tel-Aviv, Israel},
	title = {Orchestrated scheduling and prefetching for GPGPUs.},
	url = {http://doi.acm.org/10.1145/2485922.2485951},
	year = {2013}
}

A heterogeneous multiple network-on-chip design: an application-aware approach.

Asit K. Mishra, Onur Mutlu, and Chita R. Das

The 50th Annual Design Automation Conference 2013, DAC '13, Austin, TX, USA, May 2013

@inproceedings{abc,
	author = {Asit K. Mishra and Onur Mutlu and Chita R. Das},
	booktitle = {The 50th Annual Design Automation Conference 2013, DAC {\textquoteright}13, Austin, TX, USA},
	title = {A heterogeneous multiple network-on-chip design: an application-aware approach.},
	url = {http://doi.acm.org/10.1145/2463209.2488779},
	year = {2013}
}

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.

Adwait Jog, Onur Kayiran, Nachiappan Chidambaram Nachiappan, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, Ravishankar Iyer, and Chita R. Das

Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, Houston, TX, March 2013

@inproceedings{abc,
	author = {Adwait Jog and Onur Kayiran and Nachiappan Chidambaram Nachiappan and Asit K. Mishra and Mahmut T. Kandemir and Onur Mutlu and Ravishankar Iyer and Chita R. Das},
	booktitle = {Architectural Support for Programming Languages and Operating Systems, ASPLOS {\textquoteright}13, Houston, TX},
	title = {OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.},
	url = {http://doi.acm.org/10.1145/2451116.2451158},
	year = {2013}
}

2012

Application-aware prefetch prioritization in on-chip networks.

Nachiappan Chidambaram Nachiappan, Asit K. Mishra, Mahmut T. Kandemir, Anand Sivasubramaniam, Onur Mutlu, and Chita R. Das

International Conference on Parallel Architectures and Compilation Techniques, PACT '12, Minneapolis, MN, September 2012

@inproceedings{abc,
	author = {Nachiappan Chidambaram Nachiappan and Asit K. Mishra and Mahmut T. Kandemir and Anand Sivasubramaniam and Onur Mutlu and Chita R. Das},
	booktitle = {International Conference on Parallel Architectures and Compilation Techniques, PACT {\textquoteright}12, Minneapolis, MN},
	title = {Application-aware prefetch prioritization in on-chip networks.},
	url = {http://doi.acm.org/10.1145/2370816.2370886},
	year = {2012}
}

2011

Aérgia: A Network-on-Chip Exploiting Packet Latency Slack.

Reetuparna Das, Onur Mutlu, Thomas Moscibroda, and Chita R. Das

IEEE Micro, January 2011

@article{abc,
	author = {Reetuparna Das and Onur Mutlu and Thomas Moscibroda and Chita R. Das},
	journal = {IEEE Micro},
	title = {A{\'e}rgia: A Network-on-Chip Exploiting Packet Latency Slack.},
	url = {http://dx.doi.org/10.1109/MM.2010.98},
	year = {2011}
}

2010

Aérgia: exploiting packet latency slack in on-chip networks.

Reetuparna Das, Onur Mutlu, Thomas Moscibroda, and Chita R. Das

37th International Symposium on Computer Architecture (ISCA 2010), Saint-Malo, France, June 2010

@inproceedings{abc,
	author = {Reetuparna Das and Onur Mutlu and Thomas Moscibroda and Chita R. Das},
	booktitle = {37th International Symposium on Computer Architecture (ISCA 2010)},
	title = {A{\'e}rgia: exploiting packet latency slack in on-chip networks.},
	url = {http://doi.acm.org/10.1145/1815961.1815976},
	venue = {Saint-Malo, France},
	year = {2010}
}

2009

Application-aware prioritization mechanisms for on-chip networks.

Reetuparna Das, Onur Mutlu, Thomas Moscibroda, and Chita R. Das

42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), New York, New York, USA, January 2009

@inproceedings{abc,
	author = {Reetuparna Das and Onur Mutlu and Thomas Moscibroda and Chita R. Das},
	booktitle = {42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009)},
	title = {Application-aware prioritization mechanisms for on-chip networks.},
	url = {http://doi.acm.org/10.1145/1669112.1669150},
	venue = {New York, New York, USA},
	year = {2009}
}