Jon Crowcroft - iCore Inaugural Workshop Speaker and Panellist

Jon Crowcroft has been the Marconi Professor of Communications Systems in the Computer Laboratory since October 2001. He has worked in the area of Internet support for multimedia communications for over 30 years. Three main topics of interest have been scalable multicast routing, practical approaches to traffic management, and the design of deployable end-to-end protocols. Current active research areas are Opportunistic Communications, Social Networks, and techniques and algorithms to scale infrastructure-free mobile systems. He leans towards a "build and learn" paradigm for research.

He graduated in Physics from Trinity College, University of Cambridge in 1979, and gained an MSc in Computing in 1981 and a PhD in 1993, both from UCL. He is a Fellow of the Royal Society, the ACM, the British Computer Society, the IET, the Royal Academy of Engineering and the IEEE.

He likes teaching, and has published a few books based on learning materials.

Talk Title: "Data Center Networks for the Application"

Abstract: Much work in Data Center Networking has been about the need for speed. Research and development have concentrated on raw capacity in transmission, switching and topology management. There are occasional bursts of work addressing a particular problem (TCP Incast, Outcast, load balancing), but in general the work proceeds along the same path, which is quantitatively important but has not changed the nature of the data center network qualitatively for some time.

In this talk, I will discuss three pieces of work that we have been carrying out in Cambridge to address three different aspects of application needs directly.

Firstly, we have devised an extremely simple scheme, QJump, to provide hard bounded latency in the network. This is important for applications whose processing is typically bottlenecked on round trip time (e.g. whose next step is determined by the results of an RPC), but it can also be used for more interesting services. The QJump system is simpler than any of the approaches for bounding latency that we have seen in the literature to date. It relies on the observation that we know a lot about the traffic sources and traffic matrix in a data center, and can reasonably rely on the absence of misuse of the scheme, since the data center is, to some extent, a cooperative environment and is, in any case, managed by a single organisation that can detect and remove misbehaving applications.

Secondly, the QJump system makes it possible to provide very high probability, low latency failure detectors. This means that distributed fault-tolerant applications that require such detectors (for example, to complete a majority consensus algorithm) can now avoid the fate of the CAP theorem. The possibility of providing consistency, availability and partition tolerance is useful in fault-tolerant computing in general, but its immediate application in data center networks could be to provide a simple way for SDN to update OpenFlow rules simultaneously across the network in the presence of faults.

Thirdly, we go further, and move some application code into switch processing resources within the Data Center Network. For applications that generate pathological traffic patterns that could disturb the admission control system (e.g. the shuffle phase of map/reduce), this can help reduce the problem considerably, but many other simple stages of applications are also possible candidates. Part of the challenge here is to cope with the heterogeneity of switch processors.