Research Papers

18 Dec 2010

Massive Datasets and Data Streams

RadixZip: Linear Time Compression of Token Streams

by B D Vo and G S Manku

VLDB 2007 (33rd International Conference on Very Large Data Bases), p 1162-1172, Sep 2007

Detecting Near-Duplicates for Web Crawling

by G S Manku, A Jain and A D Sarma

WWW 2007 (16th International World Wide Web Conference), p 141-149, May 2007

Approximate Counts and Quantiles over Sliding Windows

by A Arasu and G S Manku

PODS 2004 (23rd ACM Symposium on Principles of Database Systems), p 286-296, June 2004

Query Processing, Resource Management and Approximation in a Data Stream Management System

by R Motwani, J Widom, A Arasu, B Babcock, S Babu, M Datar, G S Manku, C Olston, J Rosenstein and R Varma

CIDR 2003 (1st Biennial Conference On Innovative Data Systems Research), p 245-254, Jan 2003

Approximate Frequency Counts over Data Streams

by G S Manku and R Motwani

VLDB 2002 (28th VLDB), p 346-357, August 2002

Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets

by G S Manku, S Rajagopalan and B G Lindsay

SIGMOD 1999 (1999 ACM SIGMOD), p 251-262, June 1999

Approximate Medians and other Quantiles in One Pass and with Limited Memory

by G S Manku, S Rajagopalan and B G Lindsay

SIGMOD 1998 (1998 ACM SIGMOD), p 426-435, June 1998

Peer to Peer Systems -- Distributed Hash Tables

Brief Announcement: Papillon: Greedy Routing in Rings

by I Abraham, D Malkhi and G S Manku

DISC 2005 (19th International Symposium on Distributed Computing), p 514-515, September 2005

Decentralized Algorithms using Both Local and Random Probes for P2P Load Balancing

by K Kenthapadi and G S Manku

SPAA 2005 (17th ACM Symposium on Parallelism in Algorithms and Architectures), p 135-144, July 2005

Abstract: We study randomized algorithms for placing a sequence of n nodes on a circle with unit perimeter. Nodes divide the circle into disjoint arcs. We desire that a newly-arrived node (which is oblivious of its index in the sequence) choose its position on the circle by learning the positions of as few existing nodes as possible. At the same time, we desire that the variation in arc-lengths be small. To this end, we propose a new algorithm that works as follows: The k-th node chooses r random points on the circle, inspects the sizes of v arcs in the vicinity of each random point, and places itself at the mid-point of the largest arc encountered. We show that for any combination of r and v satisfying rv ≥ c log k, where c is a small constant, the ratio of the largest to the smallest arc-length is at most eight w.h.p., for an arbitrarily long sequence of n nodes. This strategy of node placement underlies a novel decentralized load-balancing algorithm that we propose for Distributed Hash Tables (DHTs) in peer-to-peer environments.
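The placement rule in the abstract above can be simulated in a few lines. The following is only a rough sketch under stated assumptions, not the paper's implementation: "vicinity" is taken to mean the arc containing the random point plus the next v - 1 arcs clockwise, the constant c = 4 is an arbitrary choice, and the function names place_nodes and arc_lengths are hypothetical.

```python
import bisect
import math
import random


def place_nodes(n, c=4.0, v=4, seed=0):
    """Simulate n arrivals on a unit-perimeter circle; return sorted positions.

    Assumptions (not from the paper): c = 4 is arbitrary, and the "vicinity"
    of a random point is the arc containing it plus the next v - 1 arcs
    clockwise.
    """
    rng = random.Random(seed)
    positions = [0.0]  # the first node sits at an arbitrary point
    for k in range(2, n + 1):
        m = len(positions)
        # Pick r so that r * v >= c * log(k), as the abstract requires.
        r = max(1, math.ceil(c * math.log(k) / v))
        best_len, best_start = -1.0, 0.0
        for _ in range(r):
            x = rng.random()
            # Index of the node that starts the arc containing x.
            i = bisect.bisect_right(positions, x) - 1
            for j in range(min(v, m)):
                a = (i + j) % m
                start = positions[a]
                end = positions[(a + 1) % m]
                # Clockwise arc length; a lone node owns the whole circle.
                length = ((end - start) % 1.0) or 1.0
                if length > best_len:
                    best_len, best_start = length, start
        # Place the new node at the mid-point of the largest arc seen.
        bisect.insort(positions, (best_start + best_len / 2.0) % 1.0)
    return positions


def arc_lengths(positions):
    """Lengths of the arcs between clockwise-consecutive nodes."""
    m = len(positions)
    return [((positions[(i + 1) % m] - positions[i]) % 1.0) or 1.0
            for i in range(m)]


if __name__ == "__main__":
    arcs = arc_lengths(place_nodes(1000))
    print("largest/smallest arc ratio:", max(arcs) / min(arcs))
```

Because every new node bisects an existing arc, all arc lengths in this simulation are powers of two, so the printed ratio is a power of two as well.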

Underlying the analysis of our algorithm is Structured Coupon Collection over n/b disjoint cliques with b nodes per clique, for any n, b ≥ 1. Nodes are initially uncovered. At each step, we choose d nodes independently and uniformly at random. If all the nodes in the corresponding cliques are covered, we do nothing. Otherwise, from among the chosen cliques with at least one uncovered node, we select one at random and cover an uncovered node within that clique. We show that as long as bd ≥ c log n, O(n) steps are sufficient to cover all nodes w.h.p. and each of the first Ω(n) steps succeeds in covering a node w.h.p. These results are then utilized to analyze a stochastic process for growing binary trees that are highly balanced -- the leaves of the tree belong to at most four different levels with high probability.
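The Structured Coupon Collection process described above is easy to simulate directly. This is a minimal sketch, assuming that n is a multiple of b and that "select one at random" means uniformly among the distinct chosen cliques that still have an uncovered node; the function name structured_coupon_collection is hypothetical.

```python
import random


def structured_coupon_collection(n, b, d, seed=0):
    """Return the number of steps needed to cover all n nodes.

    The n nodes form n // b disjoint cliques of b nodes each. Assumption
    (not from the paper): the tie-break among chosen cliques is uniform
    over the distinct cliques that still have an uncovered node.
    """
    if n % b != 0:
        raise ValueError("n must be a multiple of b")
    rng = random.Random(seed)
    uncovered = [b] * (n // b)  # uncovered nodes remaining in each clique
    remaining = n
    steps = 0
    while remaining > 0:
        steps += 1
        # Choose d nodes uniformly at random; node i belongs to clique i // b.
        chosen = sorted({rng.randrange(n) // b for _ in range(d)})
        candidates = [q for q in chosen if uncovered[q] > 0]
        if candidates:
            q = rng.choice(candidates)
            uncovered[q] -= 1  # cover one uncovered node in that clique
            remaining -= 1
        # Otherwise every chosen clique is fully covered: do nothing.
    return steps


if __name__ == "__main__":
    n = 4096
    for b, d in [(1, 1), (4, 4), (8, 8)]:
        steps = structured_coupon_collection(n, b, d)
        print(f"b={b} d={d}: {steps} steps ({steps / n:.2f} per node)")
```

Each step covers at most one node, so the step count is never below n. The case b = 1, d = 1 is the classical coupon collector, which gives a baseline for comparing larger values of bd.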

Balanced Binary Trees for ID Management and Load Balance in Distributed Hash Tables

by G S Manku

PODC 2004 (23rd ACM Symposium on Principles of Distributed Computing), p 197-205, July 2004

Know thy Neighbor's Neighbor: the Power of Lookahead in Randomized P2P Networks

by G S Manku, M Naor and U Wieder

STOC 2004 (36th ACM Symposium on Theory of Computing), p 54-63, June 2004

Optimal Routing in Chord

by P Ganesan and G S Manku

SODA 2004 (15th Annual ACM-SIAM Symposium on Discrete Algorithms), p 169-178, January 2004

Routing Networks for Distributed Hash Tables

by G S Manku

PODC 2003 (22nd ACM Symposium on Principles of Distributed Computing), p 133-142, June 2003

Symphony: Distributed Hashing in a Small World

by G S Manku, M Bawa and P Raghavan

USITS 2003 (4th USENIX Symposium on Internet Technologies and Systems), p 127-140, March 2003

SETS: Search Enhanced by Topic Segmentation

by M Bawa, G S Manku, and P Raghavan

SIGIR 2003 (26th International ACM SIGIR 2003), p 306-313, July 2003

Miscellaneous Subjects

A Loop-free Gray Code for Minimal Signed-Binary Representations

by G S Manku and J Sawada

ESA 2005 (13th Annual European Symposium on Algorithms), p 438-447, Oct 2005

Abstract: A string ...a₂a₁a₀ over an alphabet {-1, 0, 1} is said to be a minimal signed-binary representation of an integer n if n = ∑_{k ≥ 0} a_k 2^k and the number of non-zero digits is minimal. We present a loopless (and hence a Gray code) algorithm for generating all minimal signed-binary representations of a given integer n.
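To make the definition concrete, here is a small brute-force sketch. It is not the loopless algorithm of the paper and does not produce a Gray-code order. It computes the non-adjacent form (NAF), a standard representation known to have the minimum number of non-zero digits, and then enumerates every digit string of the same weight that sums to n. The function names and the fixed search width of bit_length + 1 digits are choices made for this illustration.

```python
from itertools import product


def naf(n):
    """Non-adjacent form of n, least-significant digit first."""
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)  # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def minimal_representations(n):
    """All minimal signed-binary representations of n, by brute force.

    Each result is a tuple of digits, least-significant first, zero-padded
    to a fixed width. The search is exponential in the width, so this is
    only meant for small n.
    """
    weight = sum(1 for d in naf(n) if d)  # minimal count of non-zero digits
    width = abs(n).bit_length() + 1       # a leading digit beyond this is wasteful
    reps = []
    for digits in product((-1, 0, 1), repeat=width):
        if (sum(1 for d in digits if d) == weight
                and sum(d * 2 ** k for k, d in enumerate(digits)) == n):
            reps.append(digits)
    return reps


if __name__ == "__main__":
    for rep in minimal_representations(11):
        # Print most-significant digit first.
        print(" ".join(f"{d:2d}" for d in reversed(rep)))
```

For n = 3 this finds two representations with two non-zero digits each: 2 + 1 and 4 - 1.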

Self-Similarity in File-System Traffic

by S D Gribble, G S Manku, D Roselli, E A Brewer, T J Gibson and E L Miller

SIGMETRICS 1998 (ACM SIGMETRICS 1998), p 141-150, June 1998

Structural Symmetry and Model Checking

by G S Manku, R Hojati and R K Brayton

CAV 1998 (10th International Conference on Computer-Aided Verification), p 159-171, July 1998

Object Tracking using Affine Structure for Point Correspondences

by G S Manku, P Jain, A Aggarwal, L Kumar and S Banerjee

CVPR 1997 (IEEE Conf. for Computer Vision and Pattern Recognition), p 704-709, June 1997

A New Voting Based Hardware Data Prefetch Scheme

by G S Manku, M R Prasad and D A Patterson

HiPC 1997 (Fourth International Conference on High Performance Computing), p 100-105, December 1997

A Linear Time Algorithm for the Bottleneck Biconnected Spanning Subgraph Problem

by G S Manku

IPL 1996 (Information Processing Letters), p 1-7, July 1996

Circuit Partitioning with Partial Order for Mixed Simulation Emulation Environment

by G S Manku, A Kumar and S Kumar

RSP 1995 (Sixth Intl. Conf. on Rapid System Prototyping), p 201-207, June 1995

Patents

System and Method for Searching Peer-to-Peer Computer Networks by Selecting a Computer Based on At Least a Number of Files Shared by the Computer

by W J Labio, G T Nguyen, W W Liu, G S Manku (assignee: Napster Inc.)

US Patent #07089301 (Issued: Aug 8, 2006), p 1-14, August 2006

Single Pass Space Efficient System and Method for Generating an Approximate Quantile in a Data Set Having an Unknown Size

by B G Lindsay, G S Manku, S Rajagopalan (assignee: IBM Corp.)

US Patent #06343288 (Issued: Jan 29, 2002), p 1-20, January 2002

Single Pass Space Efficient System and Method for Generating Approximate Quantiles Satisfying an Apriori User-Defined Approximation Error

by B G Lindsay, G S Manku, S Rajagopalan (assignee: IBM Corp.)

US Patent #06108658 (Issued: Aug 22, 2000), p 1-17, August 2000

Theses

Dipsea: A Modular Distributed Hash Table

by G S Manku

Ph. D. Thesis (Stanford University), p 1-198, August 2004

Structural Symmetries and Model Checking

by G S Manku

M. S. Thesis (U C Berkeley, Tech Report UCB/ERL M97/92), p 1-76, December 1997

Object Tracking using Affine Multiple Views Geometry

by G S Manku and H Nautiyal

B. Tech. Thesis (IIT Delhi; won the Best B.Tech. Project Award), p 1-56, May 1995

gurmeet@gmail.com