License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICALP.2016.44
URN: urn:nbn:de:0030-drops-63246
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2016/6324/
Blasiok, Jaroslaw; Nelson, Jelani
An Improved Analysis of the ER-SpUD Dictionary Learning Algorithm
Abstract
In dictionary learning we observe Y = AX + E for some Y in R^{n×p}, A in R^{n×m}, and X in R^{m×p}, where p >= max{n, m}, and typically m >= n. The matrix Y is observed, and A, X, E are unknown. Here E is a "noise" matrix of small norm, and X is column-wise sparse. The matrix A is referred to as a dictionary, and its columns as atoms. The goal is then, given some small number p of samples, i.e. columns of Y, to learn the dictionary A up to small error, as well as the coefficient matrix X. In applications one could for example think of each column of Y as a distinct image in a database. The motivation is that in many applications data is expected to be sparse when represented by atoms in the "right" dictionary A (e.g. images in the Haar wavelet basis), and the goal is to learn A from the data and then use it in other applications.
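As a concrete illustration of this setup, the following sketch generates synthetic data from the model Y = AX + E. The dimensions, the Bernoulli-Gaussian choice for X, and the noise level are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

# Minimal sketch of the generative model Y = A X + E (illustrative parameters only).
rng = np.random.default_rng(0)
n, m, p = 50, 50, 2000        # square case m = n, as studied in the paper
theta = 0.1                   # expected fraction of non-zeros per column of X (assumed value)

A = rng.standard_normal((n, m))           # unknown dictionary; its columns are the atoms
mask = rng.random((m, p)) < theta         # Bernoulli(theta) support pattern
X = mask * rng.standard_normal((m, p))    # column-wise sparse coefficients, subgaussian non-zeros
E = 1e-3 * rng.standard_normal((n, p))    # small-norm noise
Y = A @ X + E                             # the only matrix the learner observes
```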
Recently, the work of [Spielman/Wang/Wright, COLT'12] proposed the dictionary learning algorithm ER-SpUD with provable guarantees when E = 0 and m = n. That work showed that if X has independent entries with an expected theta*n non-zeroes per column for 1/n <~ theta <~ 1/sqrt(n), and with non-zero entries being subgaussian, then for p >~ n^2 log^2 n with high probability ER-SpUD outputs matrices A', X' which equal A, X up to permuting and scaling columns (resp. rows) of A (resp. X). They conjectured that p >~ n log n suffices, which they showed was information-theoretically necessary for any algorithm to succeed when theta ~ 1/n. Significant progress toward showing that p >~ n log^4 n might suffice was later obtained in [Luh/Vu, FOCS'15].
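For intuition, below is a hedged sketch of the core step of ER-SpUD as described in [Spielman/Wang/Wright, COLT'12]: for a constraint vector b built from one or two observed columns of Y, solve min_w ||Y^T w||_1 subject to b^T w = 1, and take w^T Y as a candidate row of X (candidate rows are then greedily deduplicated). The SciPy-based code is an illustrative LP reformulation, not the authors' implementation, and the pairing/selection rules of the variant analyzed in this paper are not specified in this abstract.

```python
import numpy as np
from scipy.optimize import linprog

def er_spud_lp_step(Y, b):
    """Solve min_w ||Y^T w||_1  s.t.  b^T w = 1, via a standard LP reformulation.

    Variables are z = [w; t] with |Y^T w| <= t entrywise, so the objective is sum(t).
    Returns the candidate row w^T Y of the coefficient matrix X.
    """
    n, p = Y.shape
    c = np.concatenate([np.zeros(n), np.ones(p)])             # minimize sum of slacks t
    # Encode |Y^T w| <= t as  Y^T w - t <= 0  and  -Y^T w - t <= 0
    A_ub = np.block([[Y.T, -np.eye(p)], [-Y.T, -np.eye(p)]])
    b_ub = np.zeros(2 * p)
    A_eq = np.concatenate([b, np.zeros(p)]).reshape(1, -1)    # b^T w = 1
    b_eq = np.array([1.0])
    bounds = [(None, None)] * n + [(0, None)] * p              # w free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    w = res.x[:n]
    return w @ Y

# Illustrative use with a hypothetical pairing of two observed columns:
# b = Y[:, 0] + Y[:, 1]
# candidate_row = er_spud_lp_step(Y, b)
```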
In this work, we show that for a slight variant of ER-SpUD, p >~ n log(n/delta) samples suffice for successful recovery with probability 1 - delta. We also show that without our slight variation made to ER-SpUD, p >~ n^{1.99} samples are required even to learn A, X with a small success probability of 1/poly(n). This resolves the main conjecture of [Spielman/Wang/Wright, COLT'12], and contradicts a result of [Luh/Vu, FOCS'15], which claimed that p >~ n log^4 n guarantees high probability of success for the original ER-SpUD algorithm.
BibTeX - Entry
@InProceedings{blasiok_et_al:LIPIcs:2016:6324,
author = {Jaroslaw Blasiok and Jelani Nelson},
title = {{An Improved Analysis of the ER-SpUD Dictionary Learning Algorithm}},
booktitle = {43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)},
pages = {44:1--44:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-013-2},
ISSN = {1868-8969},
year = {2016},
volume = {55},
editor = {Ioannis Chatzigiannakis and Michael Mitzenmacher and Yuval Rabani and Davide Sangiorgi},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2016/6324},
URN = {urn:nbn:de:0030-drops-63246},
doi = {10.4230/LIPIcs.ICALP.2016.44},
annote = {Keywords: dictionary learning, stochastic processes, generic chaining}
}
Keywords: dictionary learning, stochastic processes, generic chaining
Collection: 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)
Issue Date: 2016
Date of publication: 23.08.2016