### About The Speaker

#### Johannes Oberreuter

**Safe Sex**

The idea of synthetic data is to mimic the statistical properties of a real dataset without exposing individual entities, replacing them with synthetic ones instead. All across the industry (e.g. banks, insurance companies, …) this is in high demand due to increased privacy concerns (GDPR). Different approaches exist to generate synthetic data, such as PCA, autoencoders and generative models. The big challenge is to represent the statistical properties correctly while avoiding simply generating duplicates. Only then is the procedure safe to use, since duplicates would expose protected information. In this presentation, we compare the performance of two generative models, namely variational autoencoders and GANs, by evaluating how well their output serves to train supervised and unsupervised models, measured against the real data. Furthermore, we employ the k-nearest-neighbors (KNN) algorithm to examine the similarity between synthetic and real data and thus determine which model generates entities that can be exposed more safely. Finally, we report on the use of variational models to improve class balance.
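The KNN-based similarity check described in the abstract can be illustrated with a minimal sketch. It assumes purely numeric, already-scaled features and Euclidean distance, and computes, for each synthetic row, the distance to its single nearest real row (k = 1); synthetic rows closer than a chosen threshold are flagged as likely near-copies of real entities. The function names, the toy data and the threshold value are hypothetical illustrations, not the method used in the talk.

```python
import math
from typing import List, Sequence

# Hypothetical row type: one record as a sequence of numeric, pre-scaled features.
Row = Sequence[float]


def nearest_real_distances(synthetic: List[Row], real: List[Row]) -> List[float]:
    """For each synthetic row, return the Euclidean distance to its nearest real row (k = 1).

    Brute-force search over all real rows; fine for small illustrative data.
    """
    return [min(math.dist(s, r) for r in real) for s in synthetic]


def flag_near_copies(synthetic: List[Row], real: List[Row], threshold: float) -> List[int]:
    """Return indices of synthetic rows whose nearest real neighbor is closer than `threshold`.

    Such rows are (near-)duplicates of real entities and could leak protected information.
    The threshold is an assumed, user-chosen parameter.
    """
    distances = nearest_real_distances(synthetic, real)
    return [i for i, d in enumerate(distances) if d < threshold]


if __name__ == "__main__":
    # Toy data (invented for illustration only).
    real = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    synthetic = [[0.0, 0.0], [2.0, 2.0], [5.1, 5.0]]
    print(flag_near_copies(synthetic, real, threshold=0.5))  # → [0, 2]
```

In this toy run, the first synthetic row is an exact copy and the third lies 0.1 away from a real row, so both are flagged, while the second row (about 1.41 from its nearest real neighbor) is not. In practice the threshold might be calibrated against the nearest-neighbor distances among the real rows themselves, and for larger datasets a library implementation such as scikit-learn's `NearestNeighbors` would replace the brute-force search.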

Johannes Oberreuter works as a Data Scientist for Machine Learning Reply in Munich. He is interested in image analysis, natural language processing, deep learning and the interpretability of machine learning models. Before that, he studied physics, mathematics and philosophy at the Universities of Munich and Heidelberg. He also obtained a master's degree in mathematics at the University of Cambridge and a PhD in cosmology at the University of Amsterdam. As a research associate at the University of Göttingen and the Technical University of Munich, he did research on quantum simulation and quantum entropy. He is an enthusiastic member of Reply's quantum computing group.