|
|
|
Router Reliability Research |
|
|
Tomorrow's high-performance Internet routers are being designed as massive clusters and hence share a great deal with HPC platforms; indeed, a future router will be a kind of HPC data center specialized in the control of optical switching components that run at data rates of 40 Gb/s, 100 Gb/s, or higher. These routers will need to carry a wide range of network traffic, including traditional Internet content but also voice over IP, streaming media, and real-time content. They will need to achieve telephony levels of availability ("five-nines" reliability or better) and to shape traffic dynamically so as to meet QoS objectives, repel DDoS attacks, and apportion router resources efficiently. Can these tasks be accomplished? What hardware and software architectures will be required? In collaboration with Cisco and partner universities in the USA, we are developing a fault-tolerant framework for running the BGP routing protocol on these future routers. In particular, we are developing FTSS, the fault-tolerant storage system for this fault-tolerant BGP framework.
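To make the "five-nines" target concrete, the short calculation below converts an availability figure into a yearly downtime budget. It is only an illustrative sketch: the function name `downtime_budget_minutes` is hypothetical, and the year is assumed to have 365 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years (assumption)


def downtime_budget_minutes(availability):
    """Maximum minutes of downtime per year at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# "Five nines" means the router is available 99.999% of the time.
five_nines = downtime_budget_minutes(0.99999)
print(f"{five_nines:.2f} minutes per year")  # prints "5.26 minutes per year"
```

Under these assumptions, a five-nines router may be unavailable for only about five minutes per year in total, counting every failure and recovery.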
|
|
FTSS is a fault-tolerant storage solution that saves and replicates state so that, in the event of a failure, a state-dependent component can recover its previous configuration. In our architecture, FTSS stores the state of the wrapped BGP daemon (BGPD), incoming and outgoing BGP updates, the routing information table, and a small amount of additional state associated with a reliable TCP implementation, TCP-R. FTSS runs on all nodes within the router; in our target setting, this would range from a few dozen nodes to several hundred. |
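The save, replicate, and recover cycle described above can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not the FTSS design or its API: the class `ReplicatedStore`, its methods, the hash-based ring placement, the replica count of 3, and the key `bgpd/config` are all hypothetical choices made for illustration.

```python
import hashlib


class ReplicatedStore:
    """Toy in-memory store that keeps each entry on several nodes (hypothetical)."""

    def __init__(self, node_ids, replicas=3):
        if replicas > len(node_ids):
            raise ValueError("need at least as many nodes as replicas")
        self.node_ids = list(node_ids)
        self.replicas = replicas
        # node id -> that node's local key/value state
        self.nodes = {n: {} for n in self.node_ids}
        self.failed = set()

    def _placement(self, key):
        # Deterministic placement (an assumption): hash the key to a starting
        # node, then take the next `replicas` nodes in ring order.
        digest = hashlib.sha256(key.encode()).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.node_ids)
        return [self.node_ids[(start + i) % len(self.node_ids)]
                for i in range(self.replicas)]

    def save(self, key, value):
        # Write to every live replica; return how many copies were stored.
        written = 0
        for n in self._placement(key):
            if n not in self.failed:
                self.nodes[n][key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no live replica for " + key)
        return written

    def fail(self, node_id):
        # Simulate a crash: the node is marked failed and loses its local state.
        self.failed.add(node_id)
        self.nodes[node_id] = {}

    def recover(self, key):
        # Return the value from the first live replica that still holds it.
        for n in self._placement(key):
            if n not in self.failed and key in self.nodes[n]:
                return self.nodes[n][key]
        raise KeyError(key)


# Usage: save daemon state, crash one replica holder, then recover the state.
store = ReplicatedStore([f"node{i}" for i in range(8)], replicas=3)
store.save("bgpd/config", {"local_as": 64512})
primary = store._placement("bgpd/config")[0]
store.fail(primary)
recovered = store.recover("bgpd/config")
```

In this sketch a restarted component only needs the key under which its state was saved; any surviving replica can serve it. A real system would also have to handle persistence, consistency between replicas, and re-replication after a failure, none of which is modeled here.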
|
|
As part of this work, we have organized workshops on the topic: |
|