Title

Robust distributed cooperation in the presence of quantified adversity

Date of Completion

January 2003

Keywords

Computer Science

Degree

Ph.D.

Abstract

The ability to cooperatively perform a collection of tasks in a distributed setting is key to solving a broad range of computation problems, from distributed search to distributed simulation and multi-agent collaboration. Do-All, an abstraction of such cooperative activity, is the problem of using p processors to cooperatively perform n independent and idempotent tasks in the presence of adversity. The Do-All problem can be used to identify the trade-offs between efficiency and fault-tolerance in distributed cooperative computing. Solutions for Do-All may yield insight leading to efficient and fault-tolerant algorithms for distributed cooperation. Although significant research has been dedicated to studying Do-All, prior work offers only a partial understanding of this problem. In particular, while prior work shows how to achieve fault-tolerance in the presence of adversity, it does not adequately explain how the adverse environment affects the efficiency of Do-All solutions. This thesis substantially increases this understanding. One contribution is a set of failure-sensitive upper and lower bounds for Do-All in certain models of computation, which show how failures affect the efficiency of Do-All solutions. These bounds are given as functions of n, p, and f, where f is the number of failures caused by the adverse environment. Another contribution is the definition and analysis of the iterative Do-All problem, which models the repetitive use of Do-All algorithms, such as that found in typical algorithm simulations.

This thesis also studies the distributed cooperation problem in partitionable networks, where partitions may interfere with the progress of the computation. Group communication services are used to develop robust algorithms for this setting. Moreover, upper and lower bounds are proved which show that optimally competitive scheduling algorithms can be obtained in partitionable networks.
These results demonstrate precisely how partitions affect the efficiency of computation.

Overall, the thesis contributes substantially to the study of the trade-offs between efficiency and fault-tolerance in cooperative computing and advances the state of the art in the principles of robust distributed computing.