How I Learned Data Science Skills as a Math PhD
This post is a review on how I picked up data science skills when I was a PhD student in mathematics. The time span was from January 2016 to May 2017, before I got my job. I break it into three periods: before, during, and after my summer internship. Here, I will talk about what I did, and not what I think I should have done — that’s for another post.
The Transition to Industry
Three and half years into graduate school in the United States, I wanted to leave academia more than ever. In December 2015, I decided to look into my employment options seriously. I had no idea, however, on how to start this transition into the industry. People said math skills are valuable, but the context was never about PhDs. I didn’t see why a company would hire me. The recent graduates from our department went for actuary or software engineering jobs, but I didn’t know how they made it. So, I started my first job search.
I looked into three areas: management consulting, software engineering, and data analytics. First thing I noticed was that companies love STEM majors, as long as it’s not pure Math. They wanted Comp-Sci, Physics, Biology, and all sorts of engineering degrees, but no one wanted pure Math. “Strong analytical skill” by itself meant nothing. The reality was that I didn’t have a marketable skill set. It seemed to me the most concrete skills I could put on my resume was programming.
But I knew only one language back then: MATLAB. It was for homework on partial differential equations. Plus, there were close to zero MATLAB jobs. I had to learn another language, but what could I learn in less than a year?
I was lost. I turned to another student of my advisor, Jiayi Jiang, for help. Jiayi was six months away from graduation but already went through several job interviews. He told me that my best career bet was going into software engineering or machine learning. I asked for more detail about the machine learning route. He then told me that all the interview questions he got were within the scope of a single book — Elements of Statistical Learning (ESL), which might be a good start for me. So ESL became my entry point of data science.
Before the Internship: R and SQL (Jan-May 2016)
I quit ESL by chapter 2. It was a great read, but it was all math and no programming. My real need now was learning to program. After more research, I found that the authors of ESL had put out another book called An Introduction to Statistical Learning (ISLR) with R exercises. I started going through the chapters with all the R exercises, and it was wonderful. The math was simpler than ESL, but I got to learn R.
At the same time, I started applying for internships. Something called SQL was required on every single one of the job postings, so I looked into how to learn SQL by myself. I bought the kindle version of Head First SQL based on Reddit recommendations. (Warning: the kindle version was horrible, don’t buy it.) The learning process was dry and painful. I had to force myself through the chapters because the value of SQL wasn’t clear to me. I stopped after I finished select and where clauses, right before the chapter on joins.
I spent 30 minutes a day to learn besides writing my resume and applying for jobs. In late February, I got my first response when I was at a math conference in Maryland. The email contained a few questions that I answered next day. The day after, they wanted me to take a data challenge. I almost told them I wasn’t ready for that. Instead, I told myself to give it a shot. I managed to scramble together a data report with the help of ISLR, Google, and caffeine.
The offer came in mid-March and I took it. Now that I had an internship lined up, I decided to pause my learning plan so I can get as much research done as possible for my dissertation before the internship starts in May.
During the Internship: R, SQL, and Statistics (May-Aug 2016)
On my first day as an intern, I read a Tufte book. When I was waiting for a work laptop from IT, my boss asked: “Have you read Tufte?” and put a copy of The Visual Display of Quantitative Information on my desk. He was cleaning the team library that day. I had never heard of Tufte, but it was a great read and an eye-opening exposition to data visualization for me.
Over the next two months, this sort of exchange happened again and again. Many of my discussions in the office pointed me to a resource in statistics that I had never heard of. The terms that my colleagues used became little conceptual gold nuggets that I could plug into the search bar to expand the boundary of my knowledge comfort zone. It was a lot of fun.
The analytics team I was in was an R and SQL stack. I spent 5-6 hours a day on R and SQL for my project, and my coding skills improved. But the real value came from the help and code review I got from my mentors — they gave me a better idea on which learning goals to set and how to achieve them. As a graduate student, I was isolated in the world of mathematics research. It was through the internship that I re-discovered the importance of peers in the learning process.
After the internship: Python, R (Aug 2016-May 2017)
After I came back from the internship, I got to teach the statistics lab in the Fall. After two months of work, I wanted to show my students why R was awesome for data analysis. To prepare my lecture notes, I read several books like R for Data Science and Applied Predictive Modeling. My students also came to me with all kinds of questions, project ideas, and bugs.
Besides R, I started learning Python. Back in March, I tried to get into Python with the tutorial Learning Python the Hard Way. It was so boring that in two weeks I stopped and never went back. The isolation and uselessness of material killed my desire to learn. Having said unpleasant experience, I changed my approach this time. I talked my friend Armenak into learning Python with me using another book that I found on Reddit, Automate the Boring Stuff with Python. It turned out to be the best intro-level programming book I had ever used. Armenak and I met at the Starbucks twice a week, where we followed the instructions from the book closely. I learned to make all kinds of throwaway scripts, including my first web scraper. It was fun and useful. Knowing how to write simple Python scripts became a great asset to my current job.
The internship experience made me realize that the best way for me to learn was getting constant feedback on things that I did. My students gave me feedback in R, and my friend gave me feedback in Python. It worked.
So, is that it?
No, I learned more than R and Python after the summer internship, but I can’t go into detail without first talking about meetups. Looking back, I was lucky to find two awesome Nashville communities: Developer Launchpad and Penny University, that helped me build, filter, and prioritize the list of things to learn at a faster rate than anything I had before. I will explain why meetups are awesome in the next post.