Title

Complexity of resource discovery in network computing

Date of Completion

January 2008

Keywords

Computer Science

Degree

Ph.D.

Abstract

Distributed systems provide powerful platforms for implementing cooperative computation, where large subsets of networked machines work together to accomplish a common task. For example, the networked machines may need to perform a large collection of tasks, implement a distributed file system, or implement replicated consistent object services. A necessary first step in any such application is to discover the available relevant resources that are distributed across the network. This problem is often referred to as resource discovery in networks. Here the resources can be computing nodes or file servers, and the networks can include various types of extant network systems, such as LANs, WANs, the Internet, and ad hoc wireless networks.

Once the resources are discovered, the next step is to harness the computing power of the gathered resources to carry out some large-scale computational task of interest. Both the discovery and utilization problems are made more challenging by the failures that are present in any realistic networked system. Nodes in a network routinely fail, and even when they do not, they cannot always be trusted to faithfully carry out the required tasks. A very good example of such a situation is the Internet Supercomputer available today, which comprises massive numbers of computers around the globe. Internet supercomputing is becoming a powerful tool for harnessing massive amounts of computational resources, e.g., SETI@home and the PrimeNet Server. However, one of the major concerns in such computing environments is the reliability of the results obtained from the participants, who may accidentally or maliciously return incorrect results.

In this dissertation we examine several related aspects of collaboration in distributed computing: (1) discovery of computing resources in networks under a variety of conditions, (2) reliable collaborative computation in the setting of an Internet Supercomputer, and (3) self-discovery of communication-efficient quorums in ad hoc wireless networks.

We study the resource discovery problem in a few different settings. We provide algorithms for the asynchronous setting and consider a dynamic setting with faulty nodes and new nodes joining; we also derive analytical results on some performance measures. We also consider the synchronous setting without failures but with different bandwidth limitations. For these scenarios we propose both deterministic and randomized algorithms, together with bounds on time, message, and communication complexity.

Another direction of our research is performing tasks in master-worker settings on Internet Supercomputers. Often in such settings the reliability of computation crucially depends on the ability of the master to rely on the computation performed by the workers. We consider a system consisting of a master process and a collection of worker processes that can execute tasks on behalf of the master and that may act maliciously by deliberately returning incorrect results. The master decides on the correctness of the results by assigning the same task to several workers. The master is charged one work unit for each task performed by a worker. We provide several randomized algorithms that enable the master to determine the correct result with high probability and detect the faulty workers, at a low computational cost. We also provide some lower bounds on the cost of computation.

The final topic deals with the discovery and use of communication-efficient quorum systems in ad hoc wireless networks with limited computing power and battery resources. The goal is to design quorums that have low communication cost per node per quorum access. We propose a randomized algorithm that improves the previously known bound on communication complexity.