We organize a biweekly seminar on machine learning, every second Tuesday at 3pm (GMT+2). We discuss papers on ML, often (but not always) with connections to Earth science, climate, weather and materials science.
The seminar also allows members of the Hamburg machine learning community to connect and present their ongoing work. We meet in person at HZG, but we also welcome remote online participants and stream the meeting live on our YouTube channel.
To get updates about each meeting or suggest a topic, please join our mailing list.
15. “Smooth Criminals” 22.09.20
This week we discuss “Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation” by Chu et al. 2020. pdf 
They address the problem that GANs applied to video editing tasks usually introduce flickering or other temporal artifacts. Using a spatio-temporal discriminator together with their “Ping-Pong loss”, they outperform many previous approaches.
The results are presented on the tasks of unpaired video translation, as well as video super resolution. In our seminar we will discuss how these techniques can be used in GANs for Earth-Science tasks.
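To make the “Ping-Pong loss” idea more concrete, here is a minimal sketch based on our reading of the paper: a recurrent generator is run over the frames forwards and then backwards (a_1, ..., a_n, a_{n-1}, ..., a_1), and the outputs produced for the same frame on the two passes are penalized for differing. The toy generator and all function names below are hypothetical stand-ins for illustration, not the authors' code.

```python
import math


def toy_recurrent_generator(frame, previous_output):
    """Hypothetical recurrent generator: blends the input frame with its last output."""
    if previous_output is None:
        return list(frame)
    return [0.5 * x + 0.5 * p for x, p in zip(frame, previous_output)]


def run_generator(frames, generator):
    """Apply the generator frame by frame, feeding back its previous output."""
    outputs, previous = [], None
    for frame in frames:
        previous = generator(frame, previous)
        outputs.append(previous)
    return outputs


def ping_pong_loss(frames, generator):
    """Sum over t < n of the L2 distance between forward and backward outputs."""
    n = len(frames)
    # Ping-pong ordering: a_1, ..., a_n, a_{n-1}, ..., a_1
    ping_pong = frames + frames[-2::-1]
    outputs = run_generator(ping_pong, generator)
    forward = outputs[:n]
    backward = outputs[n - 1:][::-1]  # re-ordered so index t matches frame a_t
    loss = 0.0
    for g_fwd, g_bwd in zip(forward[:-1], backward[:-1]):
        loss += math.sqrt(sum((u - v) ** 2 for u, v in zip(g_fwd, g_bwd)))
    return loss


# A generator that accumulates history drifts, so the two passes disagree.
moving_frames = [[0.0], [1.0], [2.0]]
print(ping_pong_loss(moving_frames, toy_recurrent_generator))
```

A generator without memory (one that simply returns the current frame) gives a loss of zero here, which illustrates that the term specifically targets artifacts that accumulate through the recurrence.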
 Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., & Thuerey, N. (2020). Learning temporal coherence via self-supervision for GAN-based video generation. ACM Transactions on Graphics (TOG), 39(4), 75-1.
14. “When are we?” 08.09.20
We discuss “Viewing Forced Climate Patterns Through an AI Lens,” Barnes et al. 2019 pdf
This paper takes up the task of finding features of meteorological fields (in this case temperature and precipitation) that can be used to identify climate forcing (such as anthropogenic carbon emissions, or natural forcing due to volcanoes etc.).
The paper uses extremely small and simple feedforward neural networks, but with a clever trick – it trains these networks to predict the year of a climate simulation with simulated anthropogenic forcing from the meteorological fields. Remarkably, when trained in the right way these same networks then perform well at identifying the year from these same fields in historical observational datasets! The simple neural networks are then analyzed to determine which features they have learned.
13. “Model Data for the Data Models” 25.08.20
We discuss “Purely data-driven medium-range weather forecasting achieves comparable skill to physical models at similar resolution,” Rasp et al. 2020. This recent work builds machine learning tools for weather prediction to compete with physics-based approaches. Stephan Rasp will return to the seminar to discuss his recent work on predicting temperature, geopotential and precipitation up to 5 days in the future, published just today on arXiv!
The WeatherBench benchmark for evaluating these techniques was proposed earlier this year, also by Rasp et al. A key contributor to the performance increases in the current work was expanding the available training data beyond the weather that has actually occurred in the recorded past, by including other hypothetical situations realized through global climate modeling.
We discuss many conceptual and practical questions that arise when attempting to predict weather in this way, and consider what the future might hold for data-driven weather prediction.
12. “The Big Picture” 11.08.20
We discuss “Adversarial Super-resolution of Climatological Wind and Solar Data”, a recent study using Generative Adversarial Networks (GANs) with convolutional layers to increase the resolution of wind and irradiance fields output by climate models. This study uses high-resolution data to train a neural network to generate high-res from low-res data.
This week’s topic is related to several previous themes: we covered GANs in episode 5 (“Real Fake Clouds”), and addressed a related problem of filling in missing data using convolutional networks in episode 8 (“Uncharted History”).
We’ll discuss the approach taken in this paper, describe the Machine Learning tool SRGAN which it uses, and debate the conceptual issues that arise when using ML to “invent” new pixel outputs for your model. We’ll also mention how GAN-based superresolution can introduce bias into results, and what this could mean for climate and earth science applications.
Main paper:  Stengel et al., “Adversarial super-resolution of climatological wind and solar data,” PNAS July, 2020. https://www.pnas.org/content/117/29/16805
Technical Background:  Ledig et al., “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”, arXiv 2016. https://arxiv.org/abs/1609.04802
Blog post on bias in GANs for SR:  https://www.theverge.com/21298762/face-depixelizer-ai-machine-learning-tool-pulse-stylegan-obama-bias
11. “Teach Yourself Physics in 2 Million Easy Steps” 28.07.20
In this episode we discuss “Learning to Simulate Complex Physics with Graph Networks”, a recent study using deep learning to emulate physical dynamical systems.
We have already encountered several approaches to predicting physical systems in previous seminars (episodes 3, 4, 7 & 9). Typically, a machine learning model is trained on data generated with a numerical solver to predict a (partial) system state many time steps ahead, using the current (partial) system state as input. The network solves the task in a way that bears little, if any, resemblance to the numerical solver used to generate its training data.
Here, the authors follow another approach, where the machine learning model is trained explicitly to reproduce or “emulate” the behavior of the numerical solver over individual numerical integration steps. We will discuss how learning to carry out single-time-step updates offers certain advantages over learning to predict the future directly from the present, as the learning problem is very well posed conceptually and mathematically. However, a critical concern is whether the solutions of this ‘learned simulation’ stay realistic over many integration steps when using the emulator network instead of the numerical solver.
Another twist for the upcoming session will be that the dynamical systems studied are multi-body systems, i.e. they consist of a fixed number of discrete interacting objects. To learn state predictions on this kind of data, the authors designed their own Interaction Network, which is a subclass of so-called Graph Neural Networks. GNNs have become a very active and vast topic of research over the past years that we can only briefly touch upon in our session.
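The single-step emulation idea and the concern about long rollouts can be illustrated with a deliberately tiny example. The sketch below is not the graph network from the paper: it assumes a hypothetical scalar "solver" (one explicit Euler step of a decaying system) and fits a one-parameter linear emulator to its single-step input/output pairs. It then shows how a small one-step error is amplified when the emulator is applied repeatedly.

```python
def solver_step(x, dt=0.1, decay=0.5):
    """Hypothetical reference solver: one explicit Euler step of dx/dt = -decay * x."""
    return x + dt * (-decay * x)


def fit_one_step_emulator(states, next_states):
    """Least-squares fit of a scalar linear map x_next ~ a * x."""
    numerator = sum(x * y for x, y in zip(states, next_states))
    denominator = sum(x * x for x in states)
    return numerator / denominator


def rollout(step_fn, x0, n_steps):
    """Apply a single-step update repeatedly, feeding each output back as input."""
    trajectory = [x0]
    for _ in range(n_steps):
        trajectory.append(step_fn(trajectory[-1]))
    return trajectory


# Training data: single-step pairs generated by the solver.
train_states = [0.1 * i for i in range(1, 21)]
train_targets = [solver_step(x) for x in train_states]
a = fit_one_step_emulator(train_states, train_targets)

truth = rollout(solver_step, 1.0, 50)
emulated = rollout(lambda x: a * x, 1.0, 50)

# An emulator with a small one-step error (assumed here to be 0.01 in the
# coefficient) drifts away from the solver as steps are chained together.
biased = rollout(lambda x: (a + 0.01) * x, 1.0, 50)
errors = [abs(t - b) for t, b in zip(truth, biased)]
print("one-step error:", errors[1], "ten-step error:", errors[10])
```

In this toy case the emulator can represent the solver exactly, so the well-fitted rollout stays on track; for a neural network emulating a nonlinear solver, some one-step error always remains, which is why rollout stability is the critical test.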
Main paper:  A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. W. Battaglia, “Learning to Simulate Complex Physics with Graph Networks,” arXiv:2002.09405 [physics, stat], Feb. 2020, Accessed: Jun. 27, 2020. http://arxiv.org/abs/2002.09405.
Additional Background:  R. Grzeszczuk, D. Terzopoulos, and G. E. Hinton, “Fast Neural Network Emulation of Dynamical Systems for Computer Animation,” in Advances in Neural Information Processing Systems 11, M. J. Kearns, S. A. Solla, and D. A. Cohn, Eds. MIT Press, 1999, pp. 882–888. https://papers.nips.cc/paper/1562-fast-neural-network-emulation-of-dynamical-systems-for-computer-animation
 P. W. Battaglia, R. Pascanu, M. Lai, D. Rezende, and K. Kavukcuoglu, “Interaction Networks for Learning about Objects, Relations and Physics,” arXiv:1612.00222 [cs], Dec. 2016, Accessed: Jul. 16, 2020. [Online]. Available: http://arxiv.org/abs/1612.00222.
Further reading:  J. Zhou et al., “Graph Neural Networks: A Review of Methods and Applications,” arXiv:1812.08434 [cs, stat], Jul. 2019, Accessed: Jul. 16, 2020. [Online]. Available: http://arxiv.org/abs/1812.08434.
10. “Try to look like a little black cloud” 14.07.20
In light of recent meteorological events in Hamburg, the next ML@HZG seminar will focus on clouds.
We will begin with a well-written 3-page review that discusses how cloud resolving models (CRMs) can play an important role in our understanding of our climate and its potential changes in the future, but impose immense computational demands.
We’ll then discuss how small-scale CRMs can be used as cloud parameterizations for large-scale climate models, focusing on the Superparameterized Community Atmosphere Model (SPCAM). This approach aims to capture the two-way interactions between cloud physics and coarser-scale meteorological variables without paying the cost of a huge CRM simulation, by instead embedding a small CRM into each grid cell. Further work showed how the embedded CRMs can be simplified without compromising accuracy.
Finally, we’ll discuss how machine learning can be used to imitate the effect of the miniature CRMs used in SPCAM, which in turn aims to imitate what a large-scale CRM might look like. Recent work has shown that neural networks can be trained to reproduce the feedback between coarse-scale climate model variables and each grid cell’s CRM, with a considerable reduction in computational cost.
As we’ll discuss, often the true test of these techniques is their ability to match observed phenomena in large, long simulations!
Postscript: as discussed in the seminar, it’s not totally clear why the MJO moves east, but there are some interesting theories as to why (thanks to Eduardo Zorita for the reference).
 T. Schneider et al., “Climate goals and computing the future of clouds,” Nature Clim Change, vol. 7, no. 1, pp. 3–5, Jan. 2017, doi: 10.1038/nclimate3190.
 M. Khairoutdinov, D. Randall, and C. DeMott, “Simulations of the Atmospheric General Circulation Using a Cloud-Resolving Model as a Superparameterization of Physical Processes,” J. Atmos. Sci., vol. 62, no. 7, pp. 2136–2154, Jul. 2005, doi: 10.1175/JAS3453.1.
 M. S. Pritchard, C. S. Bretherton, and C. A. DeMott, “Restricting 32–128 km horizontal scales hardly affects the MJO in the Superparameterized Community Atmosphere Model v.3.0 but the number of cloud-resolving grid columns constrains vertical mixing,” Journal of Advances in Modeling Earth Systems, vol. 6, no. 3, pp. 723–739, 2014, doi: 10.1002/2014MS000340.
 S. Rasp, M. S. Pritchard, and P. Gentine, “Deep learning to represent subgrid processes in climate models,” PNAS, vol. 115, no. 39, pp. 9684–9689, Sep. 2018, doi: 10.1073/pnas.1810286115.
 B. Wang, F. Liu, and G. Chen, “A trio-interaction theory for Madden–Julian oscillation,” Geosci. Lett., vol. 3, no. 1, p. 34, Dec. 2016, doi: 10.1186/s40562-016-0066-z.
9. “The Best of All Possible Worlds” 30.06.20
We consider the critically important and monstrously difficult problem of tuning climate model parameters to match observations (reviewed in Hourdin et al., 2017).
This process is quite challenging, because:
- Testing new parameter combinations through simulation incurs an immense computational cost.
- The aspects of the data we wish to match (warming trends, long-term means and variances) require long, global simulations.
We discuss several approaches to this problem:
- Gradient-based Optimization attempts to adjust model parameters by following the gradients, or derivatives of climate model outputs with respect to parameters. A major challenge for this approach is that we usually lack the ability to calculate or even approximate these derivatives. Tett et al., 2017 get around this problem by using finite differencing, where derivatives are approximated using small perturbations to the parameters.
- History matching is a technique where nonlinear regression is used to learn an “emulator” or “metamodel” that maps directly between multiple tunable model parameters and real-world observables we’d like the original model to reproduce. Having estimated this parameter-observable relationship using a finite number of simulations, we can then identify all regions of parameter space for which the predicted model output is close to observations. Williamson et al., 2013 and Bellprat et al. 2012 use polynomial functions to build emulators for global and regional climate models respectively. We also consider the more recent Li et al., 2019, which replaces the polynomial functions with simple neural networks.
- To demonstrate validation of a tuning scheme, Bellprat et al. 2016 use history matching on regional climate models for two different regions, and compare the results.
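As a small illustration of the finite-differencing idea from the gradient-based approach above, the sketch below tunes two parameters of a toy model by gradient descent, approximating each derivative with one extra model run at a slightly perturbed parameter value. The toy model, the "observations" and all names are assumptions made for this example; a real climate model would replace `toy_model`, and each call would be an expensive simulation.

```python
def toy_model(params):
    """Hypothetical stand-in for a climate simulation: parameters -> output summaries."""
    a, b = params
    return [2.0 * a + b, a - b]


# Synthetic "observations", generated here from known parameters (1.0, 2.0).
OBSERVATIONS = toy_model([1.0, 2.0])


def cost(params):
    """Squared mismatch between model output and observations."""
    return sum((m - o) ** 2 for m, o in zip(toy_model(params), OBSERVATIONS))


def finite_difference_gradient(f, params, eps=1e-4):
    """Forward differences: one additional evaluation of f per parameter."""
    base = f(params)
    gradient = []
    for i in range(len(params)):
        perturbed = list(params)
        perturbed[i] += eps
        gradient.append((f(perturbed) - base) / eps)
    return gradient


def tune(params, learning_rate=0.1, n_steps=200):
    """Plain gradient descent using the approximate gradient."""
    params = list(params)
    for _ in range(n_steps):
        gradient = finite_difference_gradient(cost, params)
        params = [p - learning_rate * g for p, g in zip(params, gradient)]
    return params


tuned = tune([0.0, 0.0])
print(tuned)  # close to the generating parameters, up to finite-difference error
```

Note the cost structure: every gradient needs one model run per parameter plus a baseline run, which is exactly what makes this approach expensive when a single run is a long global simulation.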
8. “Uncharted History” 16.06.20
We are very happy to have the first-author of the paper with us to present the study!
The computer vision field of image inpainting (paper) uses several techniques to reconstruct damaged images, paintings, etc. In recent years, increasingly diverse machine learning techniques have advanced the field. A major step was taken by Liu et al. 2018 (paper, video), who used partial convolutions in a CNN. The study presented here transfers this technology to climate research. The presentation will show the journey of adapting and applying the NVIDIA technique to one of the big obstacles in climate research: missing climate information of the past. To this end, a transfer learning approach is set up using climate model data. After evaluating test suites, a reconstruction of HadCRUT4, one of the most important climate data sets, is shown and analyzed.
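For readers unfamiliar with partial convolutions, here is a minimal one-dimensional sketch of the operation as we understand it from Liu et al. 2018: the filter only sees valid (unmasked) inputs, the result is rescaled by the fraction of valid entries in the window, and the mask is updated so that holes shrink from layer to layer. The function name and the 1D simplification are assumptions for illustration; the paper works with 2D image convolutions inside a U-Net.

```python
def partial_conv1d(values, mask, kernel, bias=0.0):
    """One partial-convolution pass over a 1D signal (no padding).

    values: input signal; entries where mask == 0 are treated as missing
    mask:   1 for valid entries, 0 for missing ones
    Returns the filtered signal and the updated mask.
    """
    width = len(kernel)
    out, new_mask = [], []
    for start in range(len(values) - width + 1):
        window = values[start:start + width]
        window_mask = mask[start:start + width]
        n_valid = sum(window_mask)
        if n_valid > 0:
            # Convolve only over valid entries, then rescale by width / n_valid.
            weighted = sum(w * v * m for w, v, m in zip(kernel, window, window_mask))
            out.append(weighted * width / n_valid + bias)
            new_mask.append(1)  # at least one valid input: output counts as valid
        else:
            out.append(0.0)
            new_mask.append(0)  # window lies entirely inside the hole
    return out, new_mask


# A constant field of 2.0 with a gap; the value 99.0 marks missing data and is ignored.
signal = [2.0, 2.0, 99.0, 99.0, 99.0, 2.0, 2.0]
valid = [1, 1, 0, 0, 0, 1, 1]
averaging_kernel = [1.0 / 3.0] * 3

filled, updated_mask = partial_conv1d(signal, valid, averaging_kernel)
print(filled, updated_mask)
```

After one pass the three-point gap has shrunk to a single missing output; stacking such layers eventually fills the hole, which is the property that makes the method attractive for sparse historical observations.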
7. “Compressed Pressure”, 02.06.20
The main paper for this session will be “Latent Space Physics: Towards Learning the Temporal Evolution of Fluid Flow”, Wiewel et al., 2019. Also see their blog post.
We will also briefly discuss a follow-up from Wiewel et al. 2020, and a related paper on generative fluid modelling from the same group, Kim et al. 2019. The latter is nicely summarized in this video.
For those interested in the underlying ML methods, this session will be about autoencoders and sequence-to-sequence models:
- Autoencoders train pairs of neural networks for unsupervised learning of data representations, and Wiewel et al. use them to compress the high-dimensional volumetric fluid data.
- Sequence-to-sequence models make it possible to predict a variable-length output sequence from a variable-length input sequence, using a pair of recurrent neural networks. “seq2seq” originated in natural language processing, but as we will see it can also be used to predict sequences of 3D images.
6. “Minimalist Chaos”, 19.05.20
We’ll discuss the Lorenz ’96 model (L96) and its myriad uses. In “Predictability - a problem partly solved”, Edward Lorenz introduced a simple mathematical model exhibiting many of Earth science’s core computational challenges.
Challenging features of L96 include chaotic dynamics, nonlinearity, combination of dissipative and conservative aspects and coupling of vastly differing scales in space and time. Chaos means that small perturbations in the model state due to numerical errors or observation noise will, over time, lead to large deviations in the future model state.
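The sensitivity to small perturbations can be checked directly with a few lines of code. The sketch below implements only the single-level version of L96, dX_k/dt = (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F, with the commonly used settings of 40 variables and F = 8; the two-scale version with fast and slow variables is omitted. The Runge-Kutta integrator, step size and perturbation size are choices made for this example.

```python
def l96_tendency(x, forcing=8.0):
    """Right-hand side of single-level Lorenz '96 with periodic indexing."""
    k = len(x)
    return [(x[(i + 1) % k] - x[i - 2]) * x[i - 1] - x[i] + forcing for i in range(k)]


def rk4_step(x, dt=0.01, forcing=8.0):
    """One fourth-order Runge-Kutta step."""
    def shifted(state, slope, h):
        return [s + h * d for s, d in zip(state, slope)]

    k1 = l96_tendency(x, forcing)
    k2 = l96_tendency(shifted(x, k1, 0.5 * dt), forcing)
    k3 = l96_tendency(shifted(x, k2, 0.5 * dt), forcing)
    k4 = l96_tendency(shifted(x, k3, dt), forcing)
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def integrate(x, n_steps, dt=0.01, forcing=8.0):
    for _ in range(n_steps):
        x = rk4_step(x, dt, forcing)
    return x


# Spin up from the (unstable) equilibrium X_k = F with one variable nudged.
state = [8.0] * 40
state[19] += 0.01
state = integrate(state, 1000)

# Twin experiment: identical copy except for a perturbation of size 1e-8.
twin = list(state)
twin[0] += 1e-8
initial_gap = max(abs(a - b) for a, b in zip(state, twin))

state_later = integrate(state, 2000)
twin_later = integrate(twin, 2000)
final_gap = max(abs(a - b) for a, b in zip(state_later, twin_later))

print("initial difference:", initial_gap)
print("difference after 20 time units:", final_gap)
```

The same twin-experiment setup is a common starting point for the parameter tuning and parameterization studies listed next, since the "truth" run is cheap and fully known.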
L96 is a frequent test case for algorithms tackling many fundamental problems. We consider two of these: parameter tuning, and parameterizing sub-grid processes:
- Marcel Nonnenmacher will describe work on identifying the 4 parameters of L96. This includes “Recovering the parameters underlying the Lorenz-96 chaotic dynamics,” Mouatadid et al. 2019, “Earth System Modeling 2.0”, Schneider et al., 2017, as well as his own unpublished work.
- “Coupled online learning as a way to tackle instabilities and biases in neural network parameterizations: general algorithms and Lorenz96 case study (v1.0)”, Rasp 2020. This paper and a related blog post discuss the design of parameterizations that approximate the effect of fast, fine-scale processes on slow, coarse scale ones. Linear and ML-based parameterizations are considered.
- Tobias Finn will guide us through stochastic parameterizations, which approximate deterministic chaos using randomness. “Machine Learning for Stochastic Parameterization: Generative Adversarial Networks in the Lorenz ‘96 Model”, Gagne et al., 2020, uses Generative Adversarial Networks (GANs, see episode 5) to describe uncertainty in the tendency of coarse, slow variables as a result of unseen fast, fine variables. It builds on previous stochastic parameterizations without ML.
Finally, we’ll revisit the original paper and the issue of predictability, nearly 25 years later.
5. “Real Fake Clouds” 05.05.20
This paper uses generative adversarial networks, or GANs. In the GAN framework, a generator network learns to generate “fake” data points while a second discriminator network learns to tell real from fake data. Schmidt et al. use GANs to predict cloud reflectance fields from meteorological variables such as temperature and wind speed. Given these meteorological variables, the trained generator can produce multiple realistic output patterns instead of an ensemble average. That is, the network attempts to learn the conditional probability distribution of reflectance given the input variables.
Importantly, this paper wasn’t able to get good results just by applying the GAN framework out of the box, and had to use some of the latest specialized tricks as well. So we’ll briefly go through some of these tricks:
- Adding a term to the loss function that corresponds to supervised learning, as proposed for image to image translation tasks by Isola et al. 2018. pdf
- Multi-scale discriminator and generator networks, via Wang et al. 2018. pdf
- A least squares objective function, proposed by Mao et al. 2017 to avoid vanishing gradients. pdf
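As a concrete reference for the last item, the least squares GAN objective of Mao et al. 2017 can be written in a few lines. The sketch below uses the common target coding of 1 for real and 0 for fake samples and operates on plain lists of discriminator scores; the function names are ours, and a real implementation would compute these losses on network outputs inside a deep learning framework.

```python
def mean(values):
    return sum(values) / len(values)


def discriminator_loss(scores_real, scores_fake):
    """Least squares discriminator loss: push real scores to 1 and fake scores to 0."""
    real_term = mean([(s - 1.0) ** 2 for s in scores_real])
    fake_term = mean([s ** 2 for s in scores_fake])
    return 0.5 * real_term + 0.5 * fake_term


def generator_loss(scores_fake):
    """Least squares generator loss: push the scores of generated samples to 1."""
    return 0.5 * mean([(s - 1.0) ** 2 for s in scores_fake])


# Scores the discriminator assigns to a batch of real and generated samples.
example_real = [0.9, 0.8, 1.1]
example_fake = [0.2, -0.1, 0.4]
print(discriminator_loss(example_real, example_fake))
print(generator_loss(example_fake))
```

Because the penalty is quadratic in the score, its gradient keeps growing with the distance from the target, whereas a sigmoid cross-entropy loss saturates for samples the discriminator classifies confidently.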
4. “Far into the Future”, 21.04.20
Lennard Schmidt from UFZ presents his work. He applies machine learning to do quality control for hydrological measurement data. He also uses a sophisticated convLSTM architecture to predict hydrological dynamics in an Elbe catchment basin. Code for a convLSTM layer in TensorFlow/Keras can be found here.
Eduardo Zorita presents “Deep learning for multi-year ENSO forecasts,” Ham et al. 2019, Nature. link This paper uses machine learning algorithms to predict the El Niño/Southern Oscillation 1.5 years into the future, farther than previous methods have achieved. Notably, it trains on a combination of simulations and historical data.
Additional references on the predictability paradox in climate science: “Do seasonal‐to‐decadal climate predictions underestimate the predictability of the real world?” Eade et al. 2014, Geophys. Research Letters. link
“Skilful predictions of the winter North Atlantic Oscillation one year ahead.” Dunstone et al. 2016, Nature. link
3. “MetNet, Convolutional-Recurrent Nets, and the Self-Attention Principle” 07.04.20
Linda von Garderen presents on her work.
To understand the ML tools that went into this work, we briefly review some concepts from earlier works:
- The convolutional LSTM, which combines convolutional and recurrent neural nets into a single architecture, as introduced by Shi et al. in 2015 (paper). Review on LSTMs by Christopher Olah.
- Self-attention and the Transformer architecture, introduced by Vaswani et al. in 2017 (https://arxiv.org/pdf/1706.03762.pdf), provide a new alternative to convolutional and recurrent nets. MetNet uses a specialized variant called Axial Attention (Ho et al., 2019) (paper). We’ll turn to a blog post by Peter Bloem for helpful illustrations. For further reading on the attention concept, see Lilian Weng’s excellent blog post.
With these concepts in mind, we examine how MetNet combines them, and consider their results from the perspectives of both ML and weather prediction.
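For reference, the core of self-attention fits in a short function. The sketch below implements single-head scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, on small matrices stored as nested lists. It is a generic illustration rather than MetNet's axial variant, and the identity projection matrices in the example are a simplification; in a trained model they are learned.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: one row per token (or grid cell); w_q, w_k, w_v: projection matrices.
    Returns the attended outputs and the attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Three tokens with two features each; identity projections for simplicity.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
attended, attention_weights = self_attention(tokens, identity, identity, identity)
print(attention_weights)
```

Every output row is a weighted average of all value rows, so each position can draw on information from any other position in a single step; the cost of the full weight matrix grows quadratically with the number of positions, which is the motivation for restricted variants such as axial attention.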
Relevant discussion links:
- discussion between Stephan Rasp (TU Munich) and the MetNet authors on twitter. link
- F1 score used to quantify performance link
- code on github for axial self-attention link
2. “Don’t Fear the Sphere” 31.03.20
We cover “Spherical CNNs on Unstructured Grids,” Jiang et al. 2019, ICLR. We also survey other ML approaches to spherical data (more links in the description on YouTube). The session includes 5-minute presentations by Julianna Carvalho, Tobias Finn and Lennart Marien.
1. “Hidden Fluid Mechanics” 24.03.20
We discuss the paper “Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations,” Raissi et al. 2020, Science, and the more technical study from the same group, “Physics Informed Neural Networks,” Raissi et al., 2019, J. Computational Physics. Tobias Weigel from DKRZ explains the ML support team that forms part of the local Helmholtz AI unit.