Benjamin Bengfort has been a faculty member and advisor of the Data Analytics Certificate program since it began in early 2014. His background includes significant professional and military experience beyond his academic work. He has been a professional Python software developer for the past eight years and a professional data scientist since earning his Master of Science in Computer Science from North Dakota State University in 2008. He is a board member of Data Community DC and a faculty member of District Data Labs. His main research interests include mixed-initiative machine learning, natural language processing, goal-driven autonomy, collaborative filtering, and multi-agent systems. He is currently pursuing his Ph.D. in Computer Science at the University of Maryland at College Park, working on natural language question-and-answer systems.
Which courses do you teach for the Center for Continuing and Professional Education? What do you cover?
I teach several courses in the Data Analytics Certificate program, including Foundations of Data Analytics and Data Science, Software Engineering for Data, Machine Learning, and Applied Data Analytics. The Foundations course offers an economic perspective on why data science is trending right now, what it is, and what it means to be a data scientist (or to do data science). It also covers the data science pipeline, a pedagogical tool that informs the structure of the rest of the course. In the Software course, I go over best practices for data engineering with Python, and talk about how to write software in a data team composed of statisticians, engineers, and domain experts. In Machine Learning, I cover classification, clustering, and regression algorithms that are used to fit models to existing data sets in order to make predictions. Finally, in Applied, I go over a complete, end-to-end data project, building a recommender system before showcasing student capstone projects.
What is your career story? How did you get to be where you are now?
It was not a straight path. I started out majoring in English at the U.S. Naval Academy (much to the chagrin of my mother, who always thought my strengths were in maths and sciences!), then switched to economics at the University of Maryland, College Park, after an injury. It wasn't until I was living in England with my wife, who was pursuing a master's degree at the University of Oxford, that I began to teach myself programming, even though at that time I was toying with the idea of doing graduate work in theology. I got a job working on the electronic version of the Oxford English Dictionary at Oxford University Press, and that experience ultimately led me to apply to graduate programs in computer science upon our return to the U.S. in 2008.
I completed my master's degree at North Dakota State University in 2010. My thesis advisor left Fargo for Philadelphia at that point, and so I went into industry, working first in cybersecurity, then for a time at an educational technology start-up I co-founded. It was during that time that I began to morph from "computer scientist" to "data scientist." I also decided to return once more to school to complete my Ph.D. and, after time at another start-up developing a data-based consumer recommendation engine, I began work full-time as a researcher and Ph.D. candidate in the artificial intelligence area of computer science at the University of Maryland, College Park.
Data science is almost as much a creative field as it is a technical field, and that's why I love what I do. It has let me take all of these diverse experiences and use them to come up with novel solutions to real problems.
What has opened up doors and opportunities for you professionally over the years?
I started as a pure programmer, working in C and Python to develop distributed software systems, particularly using wireless networks. Wireless networks are everywhere, and pretty quickly I became buried in heaps of data. The only way to do anything meaningful with it was to use statistical approaches and leverage big data technologies like Hadoop. Once I started moving in that direction, I became a data scientist almost by default. It was, in some ways, an accident of affinity and ability. The transformation was complete once I joined Data Community DC (DC2), an organization that connects data professionals and promotes their work in DC, Maryland, and Virginia. Through connections made in DC2, I was able to co-write books for O’Reilly and Packt, speak at Strata and other events, and grow in many more fields that interest data scientists, all outside of my formal education.
What trends are you most excited about in your field?
I’m most excited about open data initiatives that make data sources available; the trickiest part of data science is getting and wrangling the data, and open data sources are making it easier. I often say that the innovation happening in data science right now is the smashing together of novel data sets and trying them against different algorithms, and we are nowhere near exhausting all the possibilities.
I’m also thrilled to see people from diverse professional backgrounds trying to become data scientists; it used to be the domain solely of programmers, statisticians, or Ph.D. experts. Now, more people are getting into it, and when they realize they have to do programming or math, they don’t back off. That’s really exciting. A lot of it has to do with data celebrities, but it is also due in part to data communities all over the country, as well as increasingly spectacular (and increasingly widely reported) results for machine learning applications.
Do you have any advice for professionals in your field? What about those looking to find jobs in your field?
Start with the basics. Learn to program—and become fluent in more than one programming language—so you can pick the most suitable tool to apply to your problem. And if you haven’t looked at statistics in a while, dust off that old high school or college textbook. Without a solid handle on statistics and programming, you can’t really begin to tackle the work of data science.
Even when you’ve become adept at the skills you need, don't be afraid to collaborate and work with others. I might spend several hours working alone at my computer, but then it's time to stand up and walk over to the coffee shop with one of my classmates or colleagues. We get much further by talking to each other and offering each other a different perspective. And don't neglect the many free or inexpensive options to enhance your skills. The DC area has a thriving Meetup and workshop ecosystem. Take advantage of that.
What are the most challenging and rewarding parts of your job?
The most challenging aspect for me is balancing “fast and dirty research-style code” with building well-designed data systems. There is a trade-off between time and effort when it comes to data engineering, and often we want answers fast, no matter how we get them. I love designing well-built novel systems, however. When it comes to data ingestion, wrangling, and computational systems, I hope I’ve shown over and over again in my career that a well-designed data product (a software application that not only derives its value from data, but also generates new data in return) is precisely what drives the data economy. Taking care to enforce good design in data systems is extremely important.
As for the most rewarding, there is a joke about electrical engineers: they will work hours and weeks to get a little LED light blinking, and that tiny innocuous blink represents significant innovation, even though the outside world only sees a blinking light. My little red light is incremental improvement in the precision and recall of statistical models. Having a cross-validation score move up from 91.2% to 91.8% is an indication of success in data science. But the most rewarding part is when my students, interns, or colleagues finally get how significant that is. It means that they’ve understood something fundamental about programming, statistics, and data that drives the data science economy.
Who is your greatest inspiration?
My entire family inspires me in different ways. My mother home-schooled my sisters and me for ten years, then returned to the workforce by building and running a hugely successful tech start-up. My father completed his own doctoral degree just a few years ago, in higher education administration. Each of them combines their own ambition with a sincere desire to help others in their respective fields succeed.
I also have two sisters who are doing incredible things. One of them traveled and lived in China for three years before returning to DC to do policy work on international semiconductor trade. The other is quickly racking up degrees and fluencies in foreign languages, and has finally relaxed from her international explorations to settle down as a law student at Stanford. Both of them combine courage with an open mind to explore new ideas and new cultures in a way that is very meaningful to me.
Finally, my wife, a military veteran and a Rhodes scholar, has put down her sword in favor of a much more powerful pen. She writes captivating and unsettling fiction and emotional poetry, and works to perfect her craft like the master she is. Her unique talent is to find art in the mundane and to expose what could be in the things we typically ignore. My family’s diverse talents and interests expose me to new ideas every day, and all of us love learning. Their inspiration drives my research and my work.
What do you do that creates a strong learning environment for your students?
I’m well known for setting an extremely fast classroom pace that doesn’t let up and borders on uncomfortable. But I don’t do this because I want to “weed out” students or to prove how much smarter I am than they are. In fact, at the very beginning, I let them know that this is going to be a firehose for precisely the same reason a firehose works: there is little time and a lot of information, and that creates the pressure of my classroom environment.
In foreign language learning, this kind of environment is called “immersive,” and that’s the environment I’m trying to create. Data science covers a lot of ground, from programming, software development, and distributed systems to statistical models and validation, regressions, classifications, machine learning, deployment, visualization, and even more. My students generally fall into three categories: programmers, statisticians, or domain experts. In an immersive environment, students start to blend together into data scientists and to rely on the expertise of their classmates, and this is truly what it’s about.
When my students relax and focus on absorbing and understanding rather than on details, they find themselves on a broad path, on which I’ve posted signage to routes of deeper inquiry. And of course, when they are ready to explore deeper, I’m available outside of the classroom to answer questions and provide as much advice as I can.
We strive to infuse values of The Spirit of Georgetown into everything we do. Which Jesuit value speaks to you most?
The Jesuit value that speaks to me most is “Community in Diversity.” I firmly believe that through diversity comes innovation, and that homogeneous teams can do nothing but repeat what they know, including the mistakes that they know how to make. There is, of course, a data model and a visualization that show how we must all consciously work towards diversity, or we will not achieve it: The Parable of the Polygons. Statistical models benefit from diverse data sources, and so do data teams.