Computer Science Department,
Indiana University Bloomington,
Lindley Hall, 430A, 150 S. Woodlawn Ave.,
Bloomington, IN 47405, USA
Before joining IU, I spent a couple of great years at Theory Group, IBM Almaden Research Center,
and Center for Massive Data Algorithmics, Aarhus University.
I obtained my PhD at Department of Computer Science and Engineering, HKUST.
[Home] [CV] [Publication] [Activities]
I am organizing Theory Seminar. If you wish to give a talk at our seminar, or to join our mailing list please contact me.
Algorithms for Big Data: data stream algorithms; sublinear algorithms, algorithms on distributed data; I/O-efficient algorithms; data structures; database algorithms.
Complexity: communication complexity.
- Program Committee:
PODS 2017, MASSIVE 2016, NIPS 2016 (reviewer),
Some Papers ( Full List )
Edit Distance: Sketching, Streaming and Document Exchange (preliminary full version, 30 pages)
with D. Belazzougui
Proc. IEEE Symposium on Foundations of Computer Science (FOCS 16), to appear. New Brunswick, NJ, October, 2016.
- We achieve nearly optimal communication with almost linear time recovery for document exchange.
- We obtain the first exact sketching and streaming algorithms for edit distance with sublinear size/space under small distance thresholds.
- Our technique of aligning strings using multiple random walks may be of independent interest.
On Sketching Quadratic Forms (preliminary full version, 47 pages)
with A. Andoni, J. Chen, R. Krauthgamer, B. Qin and D. P. Woodruff
Proc. Innovations in Theoretical Computer Science (ITCS 16), pages 311-319. Cambridge, MA, USA, January 2016.
- We undertake a systematic study of sketching a quadratic form.
- Our results provide the first separation between the sketch size needed for the "for all" and
"for each" guarantees for Laplacian matrices.
Communication-Efficient Computation on Distributed Noisy Datasets [conf. talk]
Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 15), pages 313-322. Portland, Oregon, U.S.A., June 2015.
- We give a first theoretical investigation to the general question: How can we process distributed noisy datasets communication-efficiently?.
This follow-up work (SIGMOD 16) extends the study to the data stream model.
The Communication Complexity of Distributed Set-Joins with Applications to Matrix Multiplication
with D. Van Gucht, R. Williams and D. P. Woodruff
Proc. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 15), pages 199-212. Melbourne, VIC, Australia, May-June 2015.
- We settle the communication complexity of a set of basic distributed set-join operations.
- Our new algorithmic techniques also lead to an improved algorithm for computing transitive closure of a directed graph in terms of running time.
Subspace Embeddings and Lp Regression Using Exponential Random Variables (preliminary full version, 27 pages) [conf. talk]
with D. P. Woodruff
Proc. Conference on Learning Theory (COLT 13), pages 546-567. Princeton, NJ, U.S.A, June 2013.
- We propose a unified algorithmic framework that improves the running time of Lp-regression for all p = [1, ∞)\2.
Tight Bounds for Distributed Functional Monitoring (preliminary full version, 50 pages) [conf. talk]
with D. P. Woodruff
Proc. ACM Symposium on Theory of Computing (STOC 12), pages 941-960. New York, NY, U.S.A., May 2012.
- We resolve several fundamental questions in the area of distributed functional monitoring (a.k.a. distributed streaming).
- Surprisingly, the total communication required to keep monitoring a function is often similar to the corresponding one-shot computation!
- A new technique called "composition" is proposed for randomized multiparty communication complexity.
This follow-up work (SODA 14) resolves the communication complexity of distinct elements in all parameters.
Lower Bounds for Number-in-Hand Multiparty Communication Complexity, Made Easy (preliminary full version, 22 pages) [conf. talk]
with J. M. Phillips and E. Verbin
Proc. ACM-SIAM Symposium on Discrete Algorithms (SODA 12), pages 486-501. Kyoto, Japan. January 2012.
Journal version in SIAM Journal of Computing (SICOMP), volume 45, issue 1, pages 174-196, February 2016
- A new technique called "symmetrization" is proposed for randomized multiparty communication complexity.
Optimal Sampling From Distributed Streams (preliminary full version, 25 pages) [talk]
with G. Cormode, S. Muthukrishnan and K. Yi
Proc. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 10), pages 77-86. Indianapolis, IN, U.S.A. June 2010.
Invited to Journal of the ACM (JACM), volume 59, issue 2, pages 10:1-10:25, April 2012 [Link].
- In the distributed streaming model, random sampling is even easier than counting.
The Limits of Buffering: A Tight Lower Bound for Dynamic Membership in the External Memory Model (preliminary full version, 18 pages) [talk]
with E. Verbin
Proc. ACM Symposium on Theory of Computing (STOC 10), pages 447-456. Cambridge, MA, U.S.A., June 2010.
Journal version in SIAM Journal of Computing (SICOMP), volume 42, issue 1, pages 212-229, January 2013
- We resolve several fundamental problems in the area of external memory data structures.
- For many basic problems, buffering is impossible to achieve in the external memory model with sublogarithmic query time.
- The external memory model is clearner than the RAM model in certain perspectives.
Multi-Dimensional Online Tracking (preliminary full version, 18 pages) [conf. talk]
with K. Yi
Proc. ACM-SIAM Symposium on Discrete Algorithms (SODA 09), pages 1098-1107. New York, NY, U.S.A., January 2009.
Journal version in ACM Transactions on Algorithms (TALG), volume 8, issue 2, pages 12:1-12:16, April 2012 [Link].
Note: In all papers above, authors are ordered alphabetically.
- A new and very natural class of online problems is studied.