Women Data Scientists + Columbia Lecturer Host NYC Coding ‘Sprint’

Event Aims to Boost Women’s Participation in Open-Source Software Development

Andreas Mueller is a lecturer at the Data Science Institute and lead software maintainer for Python's scikit-learn toolbox.

Women are notably underrepresented in software development, and this is especially true of open-source software. A 2013 survey found that only 11 percent of open-source contributors were women.

To increase participation, the NYC chapter of Women in Machine Learning and Data Science (WiMLDS) has partnered with Columbia Data Science Institute lecturer Andreas Mueller to host a ‘sprint’ on Saturday for volunteer programmers to contribute to Python’s data-science toolkit, scikit-learn. Mueller handles the day-to-day upkeep of scikit-learn, an unpaid job that he himself landed by volunteering at a sprint.

Mueller was recently interviewed by Reshama Shaikh, a data scientist organizing the March 4 event. Questions and answers have been edited from an earlier version for length and clarity.

How did you get started in open-source and scikit-learn?

It all started in 2011 at the NIPS (Neural Information Processing Systems) conference in Granada, Spain, where I attended a scikit-learn sprint. The scikit-learn release manager at the time had to leave, and I was asked to become release manager.

You reached out to WiMLDS to invite our group to a sprint and have applied to the National Science Foundation for a grant. Why?

There are very few women in computer science and it’s clear that gender bias is part of the problem. There is only one woman among the top 100 contributors to the scikit-learn library. Fortunately, many funding agencies are willing to fund diversity and research.

What do women bring to open source that’s missing?

This is a complicated question, and I want to avoid generalizations, such as saying that one gender does something that another doesn’t. My goal is to make sure that everyone participates. Since both men and women use open source, it would benefit the entire ecosystem if both contributed.

Why do you think women are less involved?

The gender disparity is a substantial problem in other places in tech. It could be that women don’t get the opportunity to start being involved. A female friend, a high-profile machine learning researcher, told me she felt anxious about posting in the scikit-learn issue tracker for fear of a mistake being seen by everyone.

Why is contributing to open source so important?

Basically the whole internet works on Linux, and that is open source. Some software projects receive corporate funding; this is true for the Apache ecosystem. Most scientific packages (Python, R and Julia) don’t have support from industry at all. Many people, including students and self-learners, would not be able to do their work without open-source software. Access to open source is fundamental for education and research. The startup community has flourished as a result of this access.

How does one get involved in contributing to open source?

People can reach out to a project on its mailing list, which anyone can sign up for. Projects have guidelines on how to contribute and how to get started. There is an issue tracker on GitHub that lists things people can work on, such as fixing a bug or making a small addition. My advice: start with something small and go on to more interesting stuff.

Are there other open source projects in need of contributors?

NumPy, Matplotlib, Jupyter, pandas and SciPy. More details can be found at scikit-learn.org.
