Submodular Dictionary Selection for Sparse Representation Volkan Cevher [email protected] Laboratory for Information and Inference Systems - LIONS & Idiap Research Institute Joint work with Andreas Krause [email protected] Learning and Adaptive Systems Group

Sparse Representations Applications du jour: Compressive sensing, inpainting, denoising, deblurring, Sparse Representations Which dictionary should we use? This session / talk: Learning D

Which dictionary should we use? Suppose we have training data Existing Solutions Dictionary design functional space assumptions <> C2, Besov, Sobolev, ex. natural images <> smooth regions + edges induced norms/designed bases & frames ex. wavelets, curvelets, etc. Dictionary learning regularization

clustering <> identify clustering of data Bayesian <> non-parametric approaches Dictionary Selection ProblemDiSP Given candidate columns sparsity and dictionary size training data Choose D to maximize variance reduction:

reconstruction accuracy: var. reduct. for s-th data: overall var. reduction: Want to solve: Combinatorial Challenges in DiSP 1. Evaluation of Sparse reconstruction problem! 2. Finding D* NP-hard!

Combinatorial Challenges in DiSP 1. Evaluation of Sparse reconstruction problem! 2. Finding D* (NP-hard) Key observations: F(D)

<> approximately submodular submodularity <> efficient algorithms with provable guarantees (Approximate) Submodularity Set function F submodular, if D

D D D + v Large improvement + v

Small improvement Set function F approximately submodular, if A Greedy Algorithm for Submodular Opt. Greedy algorithm: Start with For i = 1:k do Choose Set

This greedy algorithm empirically performs surprisingly well! Submodularity and the Greedy Algorithm Theorem [Nemhauser et al, 78] For the greedy solution AG, it holds that Krause et al 08: For approximately submodular F: Key question: How can we show that F is approximately submodular? A Sufficient Condition: Incoherence Incoherence of columns:

Define: (modular approximation) Proposition: Thus is (sub)modular! Furthermore, is approximately submodular! An Algorithm for DiSP: SDSOMP

Algorithm SDSOMP Use Orthogonal Matching Pursuit to evaluate F for fixed D Use greedy algorithm to select columns of D Theorem: SDSOMP will produce a dictionary such that Need n and k to be much less than d SDS: Sparsifying Dictionary Selection Improved Guarantees: SDSMA Algorithm SDSMA Optimize modular approximation

instead of F Use greedy algorithm to select columns of D Theorem: SDSMA will produce a dictionary such that SDSMA <> faster + better guarantees SDSOMP

<> empirically performs better (in particular when is small) Improved Guarantees for both SDSMA & SDSOMP incoherence <> restricted singular value Theorem [Das & Kempe 11]:

SDSMA will produce a dictionary such that explains the good performance in practice Experiment: Finding a Basis in a Haystack V union of 8 orthonormal, 64-dimensional bases (Discrete Cosine Transform, Haar, Daubechies 4 & 8, Coiflets 1, 3, 5 and Discrete Meyer) Pick dictionary of size n=64 at random Generate 100 sparse (k=5) signals at random

Use SDS to pick dictionary of increasing size Evaluate fraction of correctly recovered columns variance reduction Reconstruction Performance SDSOMP <>

perfect reconstruction accuracy SDSMA <> comparable the variance reduction Battle of Bases on Natural Image Patches Seek a dictionary among existing bases discrete cosine transform (DCT), wavelets (Haar, Daub4), Coiflets, noiselets, and Gabor (frame)

SDSOMP prefers DCT+Gabor SDSMA chooses Gabor (predominantly) Optimized dictionary improves compression Sparsity on Average hard sparsity <> cardinality constraint <> group sparsity

matroid constraint Conclusions Dictionary Selection <> new problem dictionary learning + design dictionary Incoherence/ <> approximate / restricted singular value multiplicative

submodularity Two algorithms <> SDSMA and SDSOMP with guarantees Extensions to structured sparsity in ICML 2010 / J-STSP 2011 Novel connection between sparsity and submodularity Experiment: Inpainting Dictionary selection from dimensionality-reduced measurements Take Barbara with 50% pixels missing at random Partition the image into 8x8 patches

Optimize dictionary based on observed pixel values Inpaint the missing pixels via sparse reconstruction Caveat: This is not a representation problem. Results: Inpainting Comparable to state-of-the art nonlocal TV; but faster!