Projects

Personal projects

: a web app for exploring and visualizing the world's quotations. I scraped and parsed the corpus of en.wikiquote.org using BeautifulSoup, then built a validated semantic vector embedding of all 200,000 quotes using gensim's Doc2Vec. The app runs on AWS with a Flask+MySQL backend; it lets users find quotes related to any quote or set of keywords, and renders interactive visualizations using principal component analysis and D3.
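As a rough illustration of the "related quotes" lookup, here is a minimal sketch of nearest-neighbor search by cosine similarity. The vectors, quote IDs, and the `most_similar` helper are hypothetical stand-ins; the real app used Doc2Vec embeddings of the Wikiquote corpus, which are not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(query_id, vectors, k=2):
    """Return the IDs of the k quotes whose vectors are closest to the query."""
    query = vectors[query_id]
    scored = [
        (cosine(query, vec), quote_id)
        for quote_id, vec in vectors.items()
        if quote_id != query_id
    ]
    scored.sort(reverse=True)
    return [quote_id for _, quote_id in scored[:k]]

# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
vectors = {
    "courage": [0.9, 0.1, 0.0],
    "bravery": [0.8, 0.2, 0.1],
    "cooking": [0.0, 0.1, 0.9],
    "baking": [0.1, 0.0, 0.8],
}

print(most_similar("courage", vectors, k=1))
```

With real embeddings, a brute-force scan like this would likely be replaced by a vectorized or indexed search, but the ranking logic is the same.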

: an interactive visualization of the distribution of coffee chains around New York. I used open data from the New York City Department of Health to obtain records on every coffee place in the city, and built the visualization using D3 and leaflet.js.

: I placed 23rd out of 2,623 in Kaggle's Instacart Market Basket Analysis competition. My submission used extensive feature engineering and gradient-boosted classification trees to predict future grocery purchases from historical data on over 200,000 customers. It was implemented in Python using pandas, scikit-learn, xgboost and lightgbm.

Collaborative projects

: a Python-based framework for conducting replicable behavioral experiments online using Amazon Mechanical Turk. psiTurk is used by researchers at over a dozen universities. I've contributed over 100 commits to the project, including implementing the framework's command line interface, and have led tutorials on psiTurk at several conferences and workshops.

: in this ongoing project, we are building a dynamic topic model of the Proceedings of the Cognitive Science Society, to measure how the field and its sub-areas have evolved over the last two decades.

: under the advisement of Daniel L. Chen at the Toulouse School of Economics, we used data from the US General Social Survey to test for an effect of US Circuit Court rulings on social attitudes. By cross-validating Lasso regression models with and without court-ruling predictors, we found no clear evidence that Circuit Court rulings affect attitudes.
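The with-and-without comparison can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual analysis: the variable names, the `cv_r2` helper, the regularization strength, and the data-generating process are all assumptions, and the "ruling" predictors are pure noise by construction.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins: 4 baseline covariates that drive the outcome,
# and 3 "ruling" predictors that are unrelated to it.
baseline = rng.normal(size=(n, 4))
rulings = rng.normal(size=(n, 3))
attitude = baseline @ np.array([1.0, -0.5, 0.25, 0.0]) + rng.normal(size=n)

def cv_r2(X, y, alpha=0.05, folds=5):
    """Mean out-of-sample R^2 of a Lasso model across k folds."""
    scores = cross_val_score(Lasso(alpha=alpha), X, y, cv=folds, scoring="r2")
    return scores.mean()

r2_without = cv_r2(baseline, attitude)
r2_with = cv_r2(np.hstack([baseline, rulings]), attitude)
print(f"without rulings: {r2_without:.3f}, with rulings: {r2_with:.3f}")
```

If adding a group of predictors does not improve held-out fit, that is consistent with those predictors carrying no detectable signal, which is the logic behind the comparison described above.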

I have also contributed to a variety of other open source projects, including Data For Democracy's winning submission to the UN's Internal Displacement Event Tagging data challenge, DrivenData's Concept to Clinic lung nodule detection project, and the Columbia Blei Lab's Dynamic Topic Model implementation.