Algorithms for Big Data
Teaching assistance, Bachelor's course, USTC, 2025
This course introduces theoretical topics including random sampling, dimensionality reduction, data compression, distributed computing, streaming algorithms, clustering, classification, and stochastic optimization. It also covers the underlying theory and mathematical tools such as probabilistic methods, VC dimension, communication complexity, and learning theory in machine learning (statistical learning theory).
Syllabus
Probability Review and Concentration Bounds
- Dimension Reduction
- Johonson-Linenstrauss Lemma
- Nearest Neighbor Search
- Locality Sensitive Hashing
- Streaming and Sketching Algorithms
- Probabilistic Counting, Reservoir Sampling
- Estimating the Number of Distinct Elements
- Quantiles
- Heavy Hitters: Misra-Gries Algorithm, Count-Min Sketch, Count Sketch
- Clustering
- k-means/median/center
- Coreset for Clustering
- Hierarchical Clustering
- Graph-Structured Data
- Basics of Spectral Graph Theory
- Random Walks and Markov Chains
- Sublinear-Time Algorithms for Graphs
- Graph Sparsification
