Publications by Timo Schneider


2017

Proceedings of the 25th Annual Symposium on High-Performance Interconnects (HOTI'17), Santa Clara, CA, USA, August 2017
The advent of non-volatile memory (NVM) technologies has added an interesting nuance to the node level memory hierarchy. With modern 100 Gb/s networks, the NVM tier of storage can often be slower than the high performance network in the system; thus, a new challenge arises in the datacenter. Whereas prior efforts have studied the impacts of multiple sources targeting one node (i.e., incast) and have studied multiple flows causing congestion in inter-switch links, it is now possible for a single flow from a single source to overwhelm the bandwidth of a key portion of the memory hierarchy. This can subsequently spread to the switches and lead to congestion trees in a flow-controlled network or excessive packet drops without flow control. In this work we describe protocols which avoid overwhelming the receiver in the case of a source/sink rate mismatch. We design our protocols on top of Portals 4, which enables us to make use of network offload. Our protocol yields up to 4× higher throughput in a 5k node Dragonfly topology for a permutation traffic pattern in which only 1% of all nodes have a memory write-bandwidth limitation of 1/8th of the network bandwidth.
@inproceedings{schneider2017fast,
	abstract = {The advent of non-volatile memory (NVM) technologies has added an interesting nuance to the node level memory hierarchy. With modern 100 Gb/s networks, the NVM tier of storage can often be slower than the high performance network in the system; thus, a new challenge arises in the datacenter. Whereas prior efforts have studied the impacts of multiple sources targeting one node (i.e., incast) and have studied multiple flows causing congestion in inter-switch links, it is now possible for a single flow from a single source to overwhelm the bandwidth of a key portion of the memory hierarchy. This can subsequently spread to the switches and lead to congestion trees in a flow-controlled network or excessive packet drops without flow control. In this work we describe protocols which avoid overwhelming the receiver in the case of a source/sink rate mismatch. We design our protocols on top of Portals 4, which enables us to make use of network offload. Our protocol yields up to 4{\texttimes} higher throughput in a 5k node Dragonfly topology for a permutation traffic pattern in which only 1\% of all nodes have a memory write-bandwidth limitation of 1/8th of the network bandwidth.},
	author = {Timo Schneider and James Dinan and Mario Flajslik and Keith D. Underwood and Torsten Hoefler},
	booktitle = {Proceedings of the 25th Annual Symposium on High-Performance Interconnects (HOTI{\textquoteright}17)},
	title = {Fast Networks and Slow Memories: A Mechanism for Mitigating Bandwidth Mismatches},
	venue = {Santa Clara, CA, USA},
	year = {2017}
}
PVLDB, January 2017
@article{barthels2017distributed,
	author = {Claude Barthels and Gustavo Alonso and Torsten Hoefler and Timo Schneider and Ingo M{\"u}ller},
	journal = {PVLDB},
	title = {Distributed Join Algorithms on Thousands of Cores},
	url = {http://www.vldb.org/pvldb/vol10/p517-barthels.pdf},
	year = {2017}
}

2016

Proceedings of the IEEE 24th Annual Symposium on High-Performance Interconnects (HOTI), Santa Clara, CA, USA, December 2016
Lossless networks, such as InfiniBand use flow-control to avoid packet-loss due to congestion. This introduces dependencies between input and output channels, in case of cyclic dependencies the network can deadlock. Deadlocks can be resolved by splitting a physical channel into multiple virtual channels with independent buffers and credit systems. Currently available routing engines for InfiniBand assign entire paths from source to destination nodes to different virtual channels. However, InfiniBand allows changing the virtual channel at every switch. We developed fast routing engines which make use of that fact and map individual hops to virtual channels. Our algorithm imposes a total order on virtual channels and increments the virtual channel at every hop, thus the diameter of the network is an upper bound for the required number of virtual channels. We integrated this algorithm into the InfiniBand software stack. Our algorithms provide deadlock free routing on state-of-the-art low-diameter topologies, using fewer virtual channels than currently available practical approaches, while being faster by a factor of four on large networks. Since low-diameter topologies are common among the largest supercomputers in the world, to provide deadlock-free routing for such systems is very important.
@inproceedings{schneider2016ensuring,
	abstract = {Lossless networks, such as InfiniBand use flow-control to avoid packet-loss due to congestion. This introduces dependencies between input and output channels, in case of cyclic dependencies the network can deadlock. Deadlocks can be resolved by splitting a physical channel into multiple virtual channels with independent buffers and credit systems. Currently available routing engines for InfiniBand assign entire paths from source to destination nodes to different virtual channels. However, InfiniBand allows changing the virtual channel at every switch. We developed fast routing engines which make use of that fact and map individual hops to virtual channels. Our algorithm imposes a total order on virtual channels and increments the virtual channel at every hop, thus the diameter of the network is an upper bound for the required number of virtual channels. We integrated this algorithm into the InfiniBand software stack. Our algorithms provide deadlock free routing on state-of-the-art low-diameter topologies, using fewer virtual channels than currently available practical approaches, while being faster by a factor of four on large networks. Since low-diameter topologies are common among the largest supercomputers in the world, to provide deadlock-free routing for such systems is very important.},
	author = {Timo Schneider and Otto Bibartiu and Torsten Hoefler},
	booktitle = {Proceedings of the IEEE 24th Annual Symposium on High-Performance Interconnects (HOTI)},
	title = {Ensuring Deadlock-Freedom in Low-Diameter InfiniBand Networks},
	venue = {Santa Clara, CA, USA},
	year = {2016}
}