Reading list
There are two main academic papers for understanding Modula. The first is called “Scalable optimization in the modular norm”. In this paper, we construct a recursive procedure for assigning a norm to the weight space of general neural architectures. Neural networks are automatically Lipschitz, and (when possible) Lipschitz smooth, in this norm with respect to their weights. The construction also provides a means to track the input-output Lipschitz properties of the network. A toy code sketch of the recursive construction follows the citation below. The paper is available here:
Tim Large, Yang Liu, Minyoung Huh, Hyojin Bahng, Phillip Isola & Jeremy Bernstein. NeurIPS 2024.
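To make the recursive construction above concrete, here is a minimal Python sketch. This is not the Modula library's API: the function names, the choice of the spectral norm for linear layers, and the unit scale factors are illustrative assumptions; the paper derives the actual scale factors from each module's mass and sensitivity.

```python
import numpy as np

# Toy sketch of a recursive weight-space norm (illustrative only; the paper's
# construction derives the scale factors from each module's mass and sensitivity).

def linear_norm(weight):
    # Example atomic norm for a linear layer: the spectral norm of its weight matrix.
    return np.linalg.norm(weight, ord=2)

def composite_norm(weights, child_norms, scales):
    # The norm of a composite module combines its children's norms, here as a
    # max over scaled child norms, mirroring the recursive modular structure.
    return max(s * norm(w) for w, norm, s in zip(weights, child_norms, scales))

# A two-layer composite with placeholder scale factors of 1.0.
w1 = np.random.randn(64, 32)
w2 = np.random.randn(10, 64)
print(composite_norm([w1, w2], [linear_norm, linear_norm], scales=[1.0, 1.0]))
```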
The second paper builds on the first and is called “Modular duality in deep learning”. In this paper, we take the modular norm and use it to derive optimizers via a procedure called “modular dualization”. Modular dualization chooses a weight update \(\Delta w\) to minimize the linearization of the loss \(\mathcal{L}(w)\) subject to a constraint on the modular norm \(\|\Delta w\|_{M}\) of the weight update. In symbols, we solve:

\[\Delta w \;=\; \operatorname*{arg\,min}_{\|\Delta w\|_{M} \,\leq\, \eta} \; \nabla \mathcal{L}(w)^{\top} \Delta w,\]
where \(\eta\) sets the learning rate. Thanks to the structure of the modular norm, this minimization can be solved recursively by exploiting the modular structure of the neural architecture. The result is a family of modular optimization algorithms, in which different layer types receive different update rules depending on the norm assigned to each layer. A toy sketch of the duality map for a single linear layer follows the citation below. The paper is available here:
Jeremy Bernstein & Laker Newhouse. arXiv 2024.
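As a concrete illustration, here is a minimal NumPy sketch of the duality map for a single linear layer assigned the spectral norm: minimizing the linearized loss subject to a spectral-norm constraint on the update yields a semi-orthogonal matrix built from the gradient's SVD. The function name and step size are illustrative and not part of the library.

```python
import numpy as np

def dualize_spectral(grad, eta=0.1):
    """Toy duality map for one linear layer under the spectral norm.

    Solving  min_{dw} <grad, dw>  subject to  ||dw||_spectral <= eta
    gives  dw = -eta * U @ Vt,  where  grad = U @ diag(S) @ Vt  is the reduced SVD.
    """
    U, _, Vt = np.linalg.svd(grad, full_matrices=False)
    return -eta * U @ Vt

# Usage: dualize a random "gradient" to obtain the weight update.
g = np.random.randn(64, 32)
delta_w = dualize_spectral(g, eta=0.05)
print(np.linalg.norm(delta_w, ord=2))  # spectral norm of the update equals eta
```

In the modular setting, each layer type would be assigned its own norm and hence its own duality map; this is the sense in which the resulting optimizer is modular.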
There are many other papers, by me and by other authors, that I feel contain important ideas on this topic. Here are some of them: