My name is Ben Cook and I lead the Applied Machine Learning team at Hudl. We’re a group of 12 engineers and data scientists focused on supercharging Hudl Assist and making Hudl Focus cameras smarter.
This is pretty much my dream job. I’m very fortunate to be able to work on hard, interesting problems that impact the sports industry in a big way. But the route I took to get here was indirect, to say the least.
Even though I was pretty good at math as a kid, by the end of high school I had decided that it wasn’t for me. As an undergrad, I studied sociology without taking a single course in statistics.
As a first-year grad student at the University of Nebraska-Lincoln (UNL), I took a stats course that was taught in Stata. Because I had started reading about all the cool things you can do in R and wanted to learn, I did all the programming assignments twice: once in Stata and once in R. In this way, I was able to prove to myself that I was using R correctly.
Pretty quickly, I realized that I was more interested in statistics and programming than social theory. So much so that I decided to switch to a more quantitative graduate program. This was right when people started talking about “data science” and I wanted to study something as close to this new field as possible. The only problem was that I didn’t have the necessary background.
To address my deficiency, I took all my math prerequisites through a distance learning program on the side while adding a statistics minor to my master’s degree.
At this point, I was technically qualified for most of the graduate programs I was interested in, but my background was non-traditional. After completing several applications and getting almost as many rejections, I finally got an acceptance letter, from Harvard! The program was a one-year master’s in computational science and engineering: basically applied math and computer science.
As I was finishing my degree, I reached out to Hudl. My pitch: you all have a lot of data and you should be doing analytics on it. I’d love to come help you figure that out!
After a series of conversations with the CTO, I started as Hudl’s first data scientist. What I pictured myself doing was building a series of epic sports analytics models, basically Moneyball for high school football. What I actually did was mostly pull data from microservice Mongo clusters to answer basic questions about metrics like video views… not exactly what I had envisioned.
It didn’t take very long for me to realize that we needed a data warehouse, so I formed a small data engineering team and we spun up an MVP.
Once we had a functioning data warehouse, I helped start Hudl’s first R&D team — a marriage between the computer vision and data science teams. This team did a lot of cool research, but the biggest impact we ever had was building and shipping a player tracking system, including a UI for manually correcting the output of our player tracking algorithm. And what we found out (the hard way) was that developing a killer machine learning algorithm is a small percentage of the overall problem if you want to create a real-world system.
In early 2020, we transitioned the R&D team to become the Applied Machine Learning team. We don’t need to be industry leaders in state-of-the-art object detection, but we are damn sure going to be world-class at building intelligent systems at massive scale: systems that automate data generation when it makes sense and augment human effort when that’s more feasible.
For this to work, data scientists on my team have to write high-quality code! They also need to understand how Docker works and be able to use an IDE efficiently. Sometimes, working through a problem in a Jupyter notebook makes sense, but sometimes you need to write good unit tests!
The takeaway is that the most effective data scientists are also pretty decent software engineers. You can’t be an expert in everything, but the better you are at writing code and the better you understand how software tools work, the more you will be able to get your work into production so that it can impact real people. That’s very motivating for me, so I am constantly working hard to become a better programmer.
Don’t get me wrong, there is definitely a place for people who want to develop algorithms for a living, doing research and writing papers. We need those people to keep making progress! But that skill set is overemphasized in the data science and machine learning communities. New state-of-the-art models are being open-sourced every day. The APIs are getting better and the barrier to entry is getting lower.
The goal of this website is to help people who want to become better data scientists become better programmers — anyone on a journey like mine. I mostly do this with bite-sized tricks, solutions and quick starts. I hope you like it!