Solving the World's Problems with Data Science
Data science is helping companies maximize profits, but it can also be a powerful tool for solving social problems.
The Data Science for Social Good conference held last month in Chicago highlighted new work in this field, with two Data Science Institute Master’s students, Carlos Espino and Amir Imani, presenting research of their own. Imani’s project used natural language processing techniques to show that the ethics rules governing chemists vary less from country to country than many politicians realized. Espino used crowdsourced data, including cellphone records, to infer urban poverty levels in neighborhoods across Milan and Mexico City.
Predicting Poverty Levels with Crowdsourced Data
Most countries have some version of the U.S. Census Bureau to collect demographic data to guide policymaking. But the high cost of data collection limits how often countries can survey their citizens. This is especially problematic in rapidly growing cities. The needs of the poor can go unmet for years if governments and aid organizations are unaware of their plight.
Espino, a Columbia Master’s student who graduates in December, spent the summer exploring ways to lower the cost of gathering demographic data. As a Data Science for Social Good fellow at the University of Washington, he and Columbia classmate Rachael Dottle paired mobile phone records with OpenStreetMap data to infer poverty levels in Milan and Mexico City, Espino’s hometown.
If a neighborhood had more bars, restaurants and people chatting on mobile phones, they reasoned, its residents were likely better off. They used this information to build a poverty prediction model that significantly outperformed baseline models, then embedded the model in a dashboard that lets UN and nonprofit aid workers identify the poorest neighborhoods.
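The researchers’ model and data are not public, but the core idea, predicting a poverty measure from crowdsourced signals such as counts of mapped amenities, can be sketched with a simple least-squares fit. All numbers below are invented for illustration; the actual model and features are assumptions here.

```python
# Minimal sketch (not the authors' model): predict a neighborhood
# poverty index from one crowdsourced signal -- the count of mapped
# amenities such as bars and restaurants -- via ordinary least squares.
# All data below are made up for illustration.

def fit_ols(xs, ys):
    """Return (intercept, slope) minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope

# Hypothetical training neighborhoods: (amenity count, poverty index 0-1).
amenities = [5, 12, 30, 45, 60]
poverty = [0.80, 0.65, 0.40, 0.25, 0.10]

intercept, slope = fit_ols(amenities, poverty)

def predict(count):
    """Predicted poverty index for a neighborhood with `count` amenities."""
    return intercept + slope * count

# Fewer mapped amenities -> higher predicted poverty.
print(round(predict(20), 2))
```

A real pipeline would add many more features (call volumes, land use, road density) and validate against census ground truth, but the fitted direction here mirrors the intuition in the article: more visible economic activity predicts lower poverty.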
Espino says he hopes to build out the dashboard so that users can upload data in any city to get predicted poverty levels and other information, including “street network centrality” or how connected city regions are (with poorer regions showing less connectivity). The research is part of a larger effort to identify pockets of extreme poverty from infrastructure sensors, social media and other platforms generating massive data.
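“Street network centrality” can be made concrete with a toy example. The sketch below (intersections and streets are invented; this is not Espino’s code) computes closeness centrality, one standard connectivity measure, by breadth-first search over a small street graph:

```python
from collections import deque

# Toy street network: intersections as nodes, street segments as edges.
# Node "A" is well connected; node "E" is peripheral (data invented).
streets = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["A", "C", "E"],
    "E": ["D"],  # reachable only through D
}

def closeness(graph, source):
    """Closeness centrality: (n - 1) / sum of shortest-path hop counts."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search from `source`
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

print(closeness(streets, "A"))  # central intersection scores higher
print(closeness(streets, "E"))  # peripheral intersection scores lower
```

On this toy graph the central intersection scores higher than the peripheral one, matching the article’s observation that poorer regions tend to show less connectivity.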
Topic Modeling to Identify Shared Political Ground
The vast majority of countries in the world have agreed to a ban on producing, using and stockpiling chemical weapons under the 1993 Chemical Weapons Convention. But many continue to resist enacting a universal set of ethical guidelines for chemists in industry and academia. The watchdog overseeing the weapons ban, the Organisation for the Prohibition of Chemical Weapons (OPCW), has been pushing for such an agreement.
As a consultant to the OPCW’s Office of Strategy and Policy, Imani, a native of Iran and now a Master’s student at Columbia, used natural language processing tools to analyze more than 140 codes of conduct filling more than a thousand pages of text.
Treating each document as a node in a network, he looked at similarities by country of origin, type of institution (university, scientific society, industry or government) and type of document (code of conduct, code of ethics). By visualizing the nodes in his semantic-similarity analysis, he found that documents clustered by institution, not country of origin, suggesting that ethical standards varied very little across political and cultural boundaries. The analysis helped the OPCW formulate The Hague Ethical Guidelines establishing a common code of conduct to discourage chemists from misusing their knowledge.
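Imani’s pipeline is not published, but the idea of treating documents as nodes linked by semantic similarity can be sketched with a simple bag-of-words cosine measure. The snippets and labels below are invented stand-ins for the real codes of conduct:

```python
import math
from collections import Counter

# Minimal sketch (not the OPCW analysis): pairwise similarity of short,
# invented "code of conduct" snippets using bag-of-words cosine similarity.
docs = {
    "university_1": "chemists shall pursue research with honesty and peer review",
    "university_2": "researchers shall publish with honesty and open peer review",
    "industry_1": "employees shall handle hazardous chemicals safely and report incidents",
}

def vectorize(text):
    """Word-count vector for a document (a crude semantic representation)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

vecs = {name: vectorize(text) for name, text in docs.items()}

# Edges of the document network: similarity between every pair of nodes.
for x in sorted(vecs):
    for y in sorted(vecs):
        if x < y:
            print(x, y, round(cosine(vecs[x], vecs[y]), 2))
```

In this toy example the two university documents score as more similar to each other than either is to the industry document, the same kind of by-institution clustering the article describes. A production analysis would use richer representations (TF-IDF weighting or word embeddings) before visualizing the network.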
“The result provided a common ground for a highly political discussion to move forward,” he said.
Read Imani’s related blog piece, “Can ethics be learned?”
— Kim Martineau