Jon Crowcroft - iCore Inaugural Workshop Speaker and Panellist
Jon Crowcroft has been the Marconi Professor of Communications Systems
in the Computer Laboratory since October 2001. He has worked in the
area of Internet support for multimedia communications for over 30
years. Three main topics of interest have been scalable multicast
routing, practical approaches to traffic management, and the design of
deployable end-to-end protocols. Current active research areas are
Opportunistic Communications, Social Networks, and techniques and
algorithms to scale infrastructure-free mobile systems. He leans
towards a "build and learn" paradigm for research.
He graduated in Physics from Trinity College, University of Cambridge
in 1979, gained an MSc in Computing in 1981 and a PhD in 1993, both from
UCL. He is a Fellow of the Royal Society, a Fellow of the ACM, a Fellow of
the British Computer Society, a Fellow of the IET and of the
Royal Academy of Engineering, and a Fellow of the IEEE.
He likes teaching, and has published a few books based on learning materials.
Talk Title: "Data Center Networks for the Application"
Abstract: Much work in Data Center Networking has been about the need for speed.
Research and development have concentrated on raw capacity in transmission,
switching and topology management. There are occasional bursts of work addressing
a particular problem (TCP Incast, Outcast, load balancing) but in general the
work proceeds along the same path, which is quantitatively important, but has
not changed the nature of the data center network qualitatively for some time.
In this talk, I will discuss three pieces of work that we have been carrying
out in Cambridge to address three different aspects of application needs directly.
Firstly, we have devised an extremely simple scheme, QJump, to provide hard bounded latency in the network,
which is important for applications whose processing is typically "round trip time" bottlenecked (e.g. whose next
step is determined by results of an RPC), but also can be used for more interesting services. The QJump system is
simpler than any of the approaches for bounding latency that we have seen in the literature to date, and relies
on the observation that we know a lot about the traffic sources and traffic matrix in a data center, and can
reasonably rely on the absence of misuse of the scheme, since the data center is, to some extent, a cooperative
environment, and is, in any case, managed by a single organisation that can detect and remove misbehaving
applications.
Secondly, the QJump system makes it possible to provide very high probability, low latency failure
detectors. This means that distributed fault-tolerant applications (for example) that require this to complete
a majority consensus algorithm can now avoid the fate of the CAP theorem.
The ability to provide consistency, availability and partition tolerance is useful in fault-tolerant
computing in general, but its immediate application in data center networks could be to provide a simple way for
SDN to update OpenFlow rules simultaneously across the network, in the presence of faults.
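
As an illustration only, and not the QJump implementation, the link between a hard latency bound and a failure detector can be sketched as below. The sketch assumes that nodes send periodic heartbeats and that the network guarantees a known worst-case delivery latency; the class name, the parameters and the microsecond values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedLatencyFailureDetector:
    """Timeout-based failure detector (illustrative sketch).

    Assumption: every live node sends a heartbeat every
    `heartbeat_interval_us`, and the network delivers it within
    `latency_bound_us`. Both names are hypothetical.
    """
    heartbeat_interval_us: int
    latency_bound_us: int
    last_seen_us: dict = field(default_factory=dict)

    @property
    def timeout_us(self) -> int:
        # If the latency bound holds, a live node is never silent
        # for longer than one interval plus the worst-case latency.
        return self.heartbeat_interval_us + self.latency_bound_us

    def heartbeat(self, node: str, now_us: int) -> None:
        # Record the arrival time of the latest heartbeat from `node`.
        self.last_seen_us[node] = now_us

    def suspected(self, now_us: int) -> set:
        # Nodes whose silence exceeds the timeout are suspected failed.
        return {
            node
            for node, seen in self.last_seen_us.items()
            if now_us - seen > self.timeout_us
        }
```

For example, with a 10,000 microsecond heartbeat interval and an assumed 1,000 microsecond latency bound, a node that has been silent for 12,000 microseconds is suspected, while one heard from 2,000 microseconds ago is not. The tighter the latency bound the network can guarantee, the shorter the timeout can be without wrongly suspecting a live node.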
Thirdly, we go further, and move some application code into switch processing resources within the Data Center
Network. For applications that generate pathological traffic patterns that could disturb the admission control
system (for example, the shuffle phase of map/reduce) this can help reduce the problem considerably, but many other simple stages
of applications are also candidates. Part of the challenge here is to cope with the heterogeneity of switch processors.