My Favorite Papers at NeurIPS 2021

Dec 10, 2021 · Timo Denk

In December 2021 I had the pleasure of attending NeurIPS (the thirty-fifth Conference on Neural Information Processing Systems). For the second time the conference was held remotely, not in Sydney as originally planned. A total of 2334 papers were accepted to the conference, the full list of which can be found here. I worked my way through the list and read a number of the papers. My favorites are in this blog post. The choice is very subjective and depends largely on my research interests and prior knowledge. Others might still benefit from the filtering I have done, so here is the compilation:

MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers (outstanding paper award) by Krishna Pillutla et al.
Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals by Lang Liu et al.
The two papers (from the same lab) build on top of divergence frontiers: Precision-Recall Curves Using Information Divergence Frontiers by Djolonga et al. (2020). Practically, divergence frontiers allow one to quantify how well a generative model (e.g., a GAN or a language model) matches the distribution of a dataset (e.g., the one it was trained on). MAUVE specifically deals with NLP while the second paper is concerned with theoretical analyses. I find the work phenomenal in how it starts by outlining what one would like to achieve: determining $\operatorname{KL}\left(P\lVert Q\right)$ where $P$ is a generative model’s distribution and $Q$ a dataset’s generative distribution. The authors then make it computationally feasible step by step, using quantization and an auxiliary language model.
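
To make the quantities concrete, here is a minimal sketch of my own, not the authors' implementation. It assumes both distributions have already been quantized into histograms over the same bins, and it traces the mixtures $R_\lambda = \lambda P + (1-\lambda) Q$, reporting $\operatorname{KL}\left(Q\lVert R_\lambda\right)$ and $\operatorname{KL}\left(P\lVert R_\lambda\right)$ for each $\lambda$. The function names and the number of mixture weights are illustrative choices.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions; bins with p == 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def divergence_frontier(p, q, num_points=25):
    """Return rows (KL(q || r), KL(p || r)) for mixtures r = lam*p + (1-lam)*q."""
    # Exclude lam = 0 and lam = 1 so that r is positive wherever p or q is.
    lambdas = np.linspace(0.0, 1.0, num_points + 2)[1:-1]
    points = []
    for lam in lambdas:
        r = lam * p + (1.0 - lam) * q
        points.append((kl(q, r), kl(p, r)))
    return np.array(points)

# Toy histograms: the "model" p and the "data" q share only the middle bin.
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
frontier = divergence_frontier(p, q)
```

Because every mixture puts mass on all bins that either distribution uses, both divergences stay finite even when $\operatorname{KL}\left(P\lVert Q\right)$ itself would be infinite, which is one reason the frontier is easier to work with than the raw divergence.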

Alias-Free Generative Adversarial Networks (oral presentation) by Tero Karras et al.
The NVIDIA team points out a problem with state-of-the-art GANs: texture sticking, that is, details in generated images appear to be glued to image coordinates. They then explain from a signal processing perspective how current GAN architectures, specifically their nonlinearities, upsampling, and downsampling, contribute to the issue. The authors use the analysis to build an improved GAN architecture. I like how principled the paper is, and for me there was a lot of signal processing knowledge to take away.
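
The role of the nonlinearities can be illustrated with a one-dimensional toy example. This is my own sketch, not code from the paper: a sine wave well below the Nyquist frequency is passed through a pointwise ReLU, which creates harmonics above Nyquist that fold back into frequencies the input never contained. The sampling rate and frequency are arbitrary values I picked.

```python
import numpy as np

n = 64                            # samples per unit interval; Nyquist is 32 cycles
t = np.arange(n) / n
x = np.sin(2 * np.pi * 12 * t)    # band-limited input at 12 cycles
y = np.maximum(x, 0.0)            # pointwise ReLU applied to the samples

def spectrum(signal):
    """Magnitude spectrum, scaled so a unit-amplitude sine has a peak of 1."""
    return np.abs(np.fft.rfft(signal)) / (len(signal) / 2)

spec_x = spectrum(x)
spec_y = spectrum(y)

# The ReLU creates a harmonic at 48 cycles, which exceeds Nyquist and
# aliases to 64 - 48 = 16 cycles, a frequency absent from the input.
alias_before = spec_x[16]
alias_after = spec_y[16]
```

Comparing `alias_before` with `alias_after` shows energy appearing at 16 cycles only after the ReLU, even though 16 is not a multiple of the input frequency.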

Intriguing Properties of Contrastive Losses by Ting Chen et al.
Contrastive losses are widely used and exhibit interesting dynamics. Among other contributions, the Google researchers look into which features get omitted by contrastive learning when multiple redundant ones are present. This phenomenon is called feature suppression. For example, an image embedding model may embed mainly the textures of an image or the dominant colors, rather than the type of animal present in it. The authors study this behavior empirically by artificially creating datasets which contain multiple kinds of features and measuring which of them can still be found in the learned embeddings. This research might help in choosing better augmentation methods (also for audio).
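
For readers who have not worked with contrastive losses, below is a minimal NumPy sketch of one common variant, the normalized temperature-scaled cross-entropy (NT-Xent) loss. It is meant as background only: the paper studies a more general family of losses, and the function name and default temperature here are my own choices.

```python
import numpy as np

def nt_xent(z_a, z_b, temperature=0.5):
    """NT-Xent loss; z_a[i] and z_b[i] embed two augmented views of example i."""
    n = len(z_a)
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    # Row i's positive is the other view of the same example.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm
    return float(-log_prob[np.arange(2 * n), positives].mean())

# Matching views should score a lower loss than mismatched ones.
views = np.eye(4)
loss_aligned = nt_xent(views, views)
loss_shuffled = nt_xent(views, np.roll(views, 1, axis=0))
```

The loss only rewards embeddings that tell the positive pair apart from the other examples in the batch, so any single feature that already achieves this is sufficient. That is the setting in which feature suppression can arise.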

Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems by Menoua Keshishian et al.
Speech-processing systems learn invariance with respect to different speaking tempos. In the LibriSpeech corpus, for example, utterances of the word “before” vary between 200 and 800 ms in duration. The authors from Columbia University analyze a model’s behavior for different durations using the temporal context invariance paradigm which they “aim to introduce to the machine learning community”. I find the concept interesting and relevant, as there is no known theoretical way of determining a model’s effective receptive field. This method does it empirically.

Unsupervised Speech Recognition (oral presentation) by Alexei Baevski et al.
The Facebook paper deals with unsupervised speech recognition. Frankly, I was astonished that speech recognition could be approached in an unsupervised manner at all. There are a lot of tweaks needed to make it work, but it is still impressive and seems to be the first work of its kind to achieve reasonably good speech recognition results.

Evaluating Gradient Inversion Attacks and Defenses in Federated Learning (oral presentation) by Yangsibo Huang et al.
I looked into this paper from Princeton University out of curiosity about federated learning attacks. The idea here is to reconstruct training examples (which are supposed to be 100% private here) based on gradients. The paper discusses some possible attacks, points out assumptions they rely on, and suggests defenses. In general, a large batch size renders the attacks impractical.
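
A toy case shows why gradients can leak data and why batching helps. The sketch below is my own and much simpler than the attacks evaluated in the paper: it assumes a single linear layer with a bias and a squared-error loss. For a batch of one, each row of the weight gradient is the input scaled by the matching bias-gradient entry, so a division recovers the input exactly. With a larger batch the gradients are averaged and the same trick only yields a blend of examples.

```python
import numpy as np

def linear_layer_grads(W, b, X, Y):
    """Gradients of 0.5 * ||W x + b - y||^2, averaged over the batch."""
    residual = X @ W.T + b - Y            # shape (batch, out)
    grad_W = residual.T @ X / len(X)      # shape (out, in)
    grad_b = residual.mean(axis=0)        # shape (out,)
    return grad_W, grad_b

def invert_from_grads(grad_W, grad_b):
    """Divide one weight-gradient row by the matching bias-gradient entry."""
    i = int(np.argmax(np.abs(grad_b)))    # pick the best-conditioned row
    return grad_W[i] / grad_b[i]

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 5)), rng.normal(size=3)

# Batch size 1: the private input is recovered exactly.
X1, Y1 = rng.normal(size=(1, 5)), rng.normal(size=(1, 3))
recovered = invert_from_grads(*linear_layer_grads(W, b, X1, Y1))

# Batch size 8: the same division returns a weighted mix of the inputs.
X8, Y8 = rng.normal(size=(8, 5)), rng.normal(size=(8, 3))
mixed = invert_from_grads(*linear_layer_grads(W, b, X8, Y8))
```

Here `recovered` matches `X1[0]` up to floating-point error, while `mixed` does not match any single row of `X8`.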

Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research (best paper award in the datasets & benchmarks track) by Bernard Koch et al.
This paper from UCLA and Google researchers is a bit of an unusual one. On the meta-level it analyzes how datasets are used in ML research. It looks into questions along the lines of “How many different datasets are used for evaluation in image segmentation now and five years ago?”, “From how many different institutions are they?”, “How does that compare to NLP?”, etc. It is the first study I have seen of this kind. The data source is the well-structured website Papers with Code.

Robust Auction Design in the Auto-bidding World by Santiago Balseiro et al.
This Google paper is my first contact with auction design / bidding; I read it mainly out of curiosity about this field, stemming from my friends’ side hustle of running an online shop with high-quality gentlemen’s gear (cardholders, weekenders, etc.) for which they run online ads. The evaluation in the paper is based on data from search engine advertisement bidding.

UniDoc: Unified Pretraining Framework for Document Understanding by Jiuxiang Gu et al.
Since I previously worked on document intelligence I found this Adobe paper interesting. A document’s visual and textual features are embedded and then processed by an attention-based model.

My previous “favorite papers” posts are available for COLING'2020 and NeurIPS 2019. The Sydney photo at the top of this post is from Jørn Utzon.