Advice to Future Data Scientists: Write Code, Any Code
Aug. 24, 2015
The Georgetown Data Science Certificate program sets ambitious goals for students, cramming software and statistics into 108 hours over a single semester while also requiring them to produce innovative capstone projects. Most students graduate from the program realizing that data science is a team sport, and that effective members of data teams are “jacks-of-all-trades” who can adapt and do a variety of jobs at any given time.
To succeed in the hypothesis-driven world of data science, students and future data scientists should consider the following advice:
Write code, any code. The best way to learn is by doing. Your code doesn’t have to be pretty; just make it do something, anything. As with writing, you have to write bad stuff in order to refine it into the good stuff. A lack of ideas should not be a blocker: on the forums we have many simple suggestions for small software projects to write. Better yet, go to CodeEval for small programming challenges.
Note what’s challenging for you. You have a lot to explore; mark down the interesting stuff that you’re struggling with, and then back off of it for a bit. Return when you’re ready to explore the topic in depth.
Focus on what you’re good at. Not everyone is a programmer. Not everyone is a statistician. Not everyone can be a domain expert. Whatever interests you, whatever talent you have, augment your assignments with that. One clear example: data narratives are of increasing importance, so folks with a journalism background can focus on these types of projects and apply the data science pipeline to the act of creating data stories.
Find your own workflow. Many users will find that combining a graphical file browser (Windows Explorer, or Finder on Mac) with the Terminal helps them better understand what they’re doing. Use an IDE like PyCharm or Spyder, or just a simple text editor. If you don’t understand command-line git, get a GUI version. Combine the tools you know with the ones you don’t to get the best results.
Relax into it and absorb. When large amounts of information are coming at you, don’t get lost in the details. Instead, try to take in the whole picture and note topics, terms, or anecdotes that you either find interesting or don’t understand completely. Come back to them later when you have some free time.
Follow up. Finally, follow up with your instructors. Ask for pointers to more material, or ask specific questions from your notes. Use forums to discuss topics with classmates and experts. Work collaboratively on the material, software, or machine learning topics. Don’t be an island!
Remember, data science is exploratory in nature: innovation happens when data inference techniques are applied to new data sets, or new methods are applied to older data sets. Future data scientists should keep in mind that they are embarking on a career whose predominant feature is continuing education, and should fall back on the advice above whenever challenges arise.
Benjamin Bengfort is a faculty member and advisor for the Data Analytics Certificate program. His background includes significant professional and military experience beyond his academic work. He has been a professional Python software developer for the past eight years and a professional data scientist since earning his Master of Science in Computer Science from North Dakota State University in 2008. Benjamin is also a board member of Data Community DC and a faculty member of District Data Labs.