Biased Data, Models, and Algorithms. Perspectives from Data Science

Cover image: Adobe Stock (licensed for educational institutions)

Data science, and machine learning in particular, requires massive amounts of data in order to produce stable and reliable models. At the same time, researchers and public advocates have pointed for several years to problems embedded in data, be it unbalanced data sets or highly problematic representations of culture and society. The event brings together critical approaches and high-end data science applications. We want to discuss how we can cope with the challenges of data-hungry algorithms in a society aware of collection and representation shortcomings.

Join us on 31 October 2022 at Mittelstrasse 43, Room 320.

The event is part of the series «Kritische Perspektiven auf die Digitalisierung» (Critical Perspectives on Digitization) mandated by the Vice-Rectorate Quality.

Call for projects and data sets

We are looking for projects that want to discuss their data sets or collections with the community.
We want to discuss what data you have accumulated and how you envision raising awareness of bias and/or even reducing it.

For the workshop part, we have six slots. Please reach out to Laura (see above) and pitch your data set in two paragraphs.
We strive for a variety of fields and communities represented in the workshop.

Keynotes

Krikamol Muandet

Towards Collective Intelligence in Heterogeneous Learning

Abstract: Democratization of AI involves training and deploying machine learning models across heterogeneous and potentially massive environments. While a diversity of data can create new possibilities to advance AI systems, it simultaneously poses pressing concerns such as privacy, security, and equity that restrict the extent to which information can be shared across environments. Inspired by social choice theory, I will first present a choice-theoretic perspective of machine learning as a tool to analyze learning algorithms in heterogeneous environments. To understand the fundamental limits, I will then provide a minimum requirement in terms of intuitive and reasonable axioms under which empirical risk minimization (ERM) is the only rational learning algorithm in heterogeneous environments. This impossibility result implies that Collective Intelligence (CI), the ability of algorithms to successfully learn across heterogeneous environments, cannot be achieved without sacrificing at least one of these essential properties. Lastly, I will discuss the implications of this result in critical areas of machine learning such as out-of-distribution generalization, federated learning, algorithmic fairness, and multi-modal learning.
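For readers unfamiliar with the terminology, the following is a minimal, self-contained sketch (not the speaker's formalism) of what empirical risk minimization over heterogeneous environments looks like in practice: a single model is fitted by minimizing the average loss on the pooled data, and its empirical risk can then differ from environment to environment. All environment names and data below are synthetic placeholders.

# Minimal sketch (illustration only, not the speaker's formalism): pooled ERM
# across heterogeneous environments, with per-environment risks reported.
import numpy as np

rng = np.random.default_rng(0)

def make_env(slope, n=200):
    """Synthetic regression environment: y = slope * x + noise."""
    x = rng.normal(size=(n, 1))
    y = slope * x[:, 0] + 0.1 * rng.normal(size=n)
    return x, y

# Two environments with different data-generating slopes (heterogeneity).
environments = {"env_A": make_env(1.0), "env_B": make_env(2.0)}

# Pooled ERM for least squares: minimize the average squared loss over all data.
X = np.vstack([x for x, _ in environments.values()])
y = np.concatenate([y for _, y in environments.values()])
w = np.linalg.lstsq(X, y, rcond=None)[0]  # ERM solution on the pooled sample

# The single pooled model trades the environments off against each other:
# its empirical risk differs per environment.
for name, (x_e, y_e) in environments.items():
    risk = np.mean((x_e @ w - y_e) ** 2)
    print(f"{name}: empirical risk of pooled ERM model = {risk:.3f}")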

Bio: Krikamol Muandet is a tenure-track faculty member at CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. Before joining CISPA, he was a research group leader in the Empirical Inference Department at the Max Planck Institute for Intelligent Systems (MPI-IS), Tübingen, Germany, and a lecturer in the Department of Mathematics at Mahidol University, Bangkok, Thailand. He received his Ph.D. in computer science from the University of Tübingen in 2015, working mainly with Prof. Bernhard Schölkopf. He received his master's degree in machine learning from University College London (UCL), United Kingdom, where he worked mostly with Prof. Yee Whye Teh at the Gatsby Computational Neuroscience Unit. He served as a publication chair of AISTATS 2021 and as an area chair for ICLR 2023, AISTATS 2022, NeurIPS 2021, NeurIPS 2020, NeurIPS 2019, and ICML 2019, among others.

His research interests include kernel methods, kernel mean embedding of distributions, learning under distributional shifts, domain generalization, counterfactual inference, and how to regulate the deployment of machine learning models.

Anna Jobin

What bias? Whose bias? A multi-level perspective on data and Artificial Intelligence.

Abstract: AI technologies such as machine learning, deep learning and artificial neural networks are reshaping data processing and analysis. They have been heralded as solutions to complex problems and are increasingly being used in a variety of sectors including communication, healthcare, and transportation. In light of their powerful transformative force and profound impact, recent scandals have sparked ample debate in academic literature and media coverage. Some AI systems have been shown to discriminate and generally fail to deliver on the promises made.

What are the origins of such failures, and what are the principles and values that should guide the development and use of AI? This talk will address these issues by centering on the notion of bias. Drawing on insights from STS (science and technology studies) as well as on critical algorithm studies, it will discuss why unbiased AI is neither achievable nor desirable. Data is never neutral, and data processing is always deeply intertwined with human decision making. Dealing with bias is therefore not only a technological matter but also one of policy and society.

Bio: Anna Jobin is a researcher with a multidisciplinary background in sociology, economics, and information management. Currently, she serves as a Senior Researcher at the Humboldt Institute for Internet and Society (HIIG) in Berlin, as well as a lecturer at EPFL (Lausanne, Switzerland) and the MCI (Innsbruck, Austria).

In addition, she is an inaugural member of the Swiss Young Academy, an advisory member of the Lab "Platform Governance, Media, and Technology" at the ZeMKI (University of Bremen), and an associate member of the Science and Technology Studies Laboratory (STSLab) of the University of Lausanne. Previous affiliations include EPFL, ETH, Cornell University, and Tufts University. Her research projects are situated at the intersection of science, technology, and society, with a particular focus on interaction with algorithmic systems, (digital) ethics in research and citizen science, and ethical artificial intelligence.

As an internationally recognized expert on the intersection of digital technology and society, her research and expertise have been featured in popular and specialized media alike. She volunteers as a board member of the Swiss STS Association and as a member of the steering committee of the Swiss Internet Governance Forum.

Since October 2021, Anna Jobin has been president of the Swiss Federal Media Commission, an extra-parliamentary commission tasked with advising the Swiss Federal Council on media policy.

Datasets

Biases in Climate Data

Stefan Brönnimann, Institute of Geography and Oeschger Centre for Climate Change Research, University of Bern

Weather stations do not sample the atmosphere in a random or regular fashion, so generating weather or climate fields faces the problem of biased sampling. This becomes particularly clear when going back in time. The weather was observed on merchant ships, at strategic locations, trading posts, colonial settlements, or at missionary stations. The data coverage thus reflects world trade, the centres of enlightenment, nation-states, the Cold War, and colonialism, and becomes a mirror of world history.
Gap-free, spatially and temporally complete data sets are generated from such biased data using a variety of approaches, ranging from geostatistical methods to data assimilation with numerical models and machine learning. Estimates of spatial covariance or numerical models (weather forecast models) are used to export the information to non-observed regions. The bias is then reflected in a statistical form, such as an uncertainty or an ensemble spread that is larger over regions with sparse coverage.
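As an illustration only (an assumed toy example, not the actual reconstruction pipeline), the sketch below shows how a simple geostatistical, Gaussian-process style interpolation turns sparse station observations into a gap-free field together with a per-location uncertainty that grows over regions with few stations. Station positions and values are invented.

# Minimal sketch (assumed illustration): Gaussian-process interpolation of
# station observations, where the predictive standard deviation grows in
# regions with sparse coverage.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(42)

# Hypothetical station coordinates (1-D for simplicity), clustered in one region.
stations = np.array([[0.10], [0.15], [0.20], [0.25], [0.90]])
obs = np.sin(2 * np.pi * stations[:, 0]) + 0.05 * rng.normal(size=len(stations))

# Fit a GP: the kernel encodes the assumed spatial covariance of the field.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2) + WhiteKernel(1e-3))
gp.fit(stations, obs)

# Predict a gap-free field on a regular grid, with an uncertainty per grid point.
grid = np.linspace(0, 1, 11).reshape(-1, 1)
mean, std = gp.predict(grid, return_std=True)

for x, m, s in zip(grid[:, 0], mean, std):
    # The standard deviation is larger far away from the observing stations.
    print(f"x={x:.1f}  estimate={m:+.2f}  uncertainty={s:.2f}")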
The problems underlying the bias are generally not communicated. The focus is on using the information to reduce errors, but the information on the bias is not passed on to the user. This is a pity, because seeing climate data as a societal product could help to better understand the problem at hand (for which climate data products are often only an intermediate step). For instance, if climate data are used for adaptation strategies in agriculture, it might be relevant to know that a station was abandoned because farming was abandoned; this might be even more relevant than the data subsequently interpolated for that location.
The bias of climate data is increasingly discussed in the context of questions of justice, and it is also sometimes used in the context of human-environment interaction.

Biased Data, Models, and Algorithms. Perspectives from Data Science (Morning)

Talk by Prof. Dr. Christiane Tretter (BeDSI): Introduction to the Bern Data Science Initiative

Keynote by Dr. Krikamol Muandet: Towards Collective Intelligence in Heterogeneous Learning

Workshop Part 1: Show your Data (Prof. Dr. Stefan Brönnimann and Ismail Prada)

Biased Data, Models, and Algorithms. Perspectives from Data Science (Afternoon)

Workshop Part 2, led by Gero Schreier and Sumi Suntharam (UB): Open Research Data Infrastructure at UNIBE