ML pipelines should be Python packages: Transfer learning

When machine learning pipelines are well-formed Python packages, transfer learning is much easier!

This post is my second stab at convincing people that ML pipelines should be Python packages. A previous post argued (among other things) that Python packages make it easier to develop and understand an ML pipeline. Here I want to make the case that Python packages make it easier to develop and understand future ML pipelines. That is, Python packages dramatically simplify transfer learning because they’re composable. This may seem obvious if you’ve used something like Keras Applications, but are you actually writing Python packages when you build machine learning models…?

The test case for this argument is an ASCII letter classifier that starts from some MNIST feature weights. In this admittedly contrived example, starting from feature weights is important because I have far fewer labeled examples of ASCII letters (100 per class, to be precise) than of MNIST digits.
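The post's actual pipeline isn't shown in this excerpt, but the core idea can be sketched in plain NumPy. Everything below is a hypothetical stand-in: a frozen random projection plays the role of pretrained MNIST conv features, and random arrays play the role of the small ASCII letter dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pretrained MNIST feature weights: a frozen linear
# feature extractor (in a real pipeline, trained conv layers).
W_features = rng.normal(size=(784, 64))

def extract_features(x):
    # Frozen forward pass: W_features is never updated during transfer.
    return np.maximum(x @ W_features, 0.0)  # ReLU features

# Tiny labeled set for the new task (the post uses 100 per class).
x_new = rng.normal(size=(200, 784))
y_new = (rng.random(200) > 0.5).astype(float)

# Transfer learning: keep the feature weights fixed and fit only a
# new classification head (here via ridge-regularized least squares).
feats = extract_features(x_new)
head = np.linalg.solve(
    feats.T @ feats + 1e-3 * np.eye(feats.shape[1]),
    feats.T @ y_new,
)

preds = extract_features(x_new) @ head
```

The point is the split: the expensive, data-hungry feature extractor is reused as-is, and only the small new head is fit on the scarce labels. A well-formed Python package makes that split easy to express, because the feature extractor can simply be imported.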

[Read More]

Object spread operator for Python

Say you have a dictionary that you want to both copy and update. In JavaScript, this is a common pattern that gets its own syntax, called the object spread operator:

const oldObject = { hello: 'world', foo: 'bar' }
const newObject = { ...oldObject, foo: 'baz' }

After running this snippet, newObject will be an updated copy of oldObject: { hello: 'world', foo: 'baz' }. It turns out you can do the same thing in Python as of version 3.5, thanks to PEP 448:

old_dict = {'hello': 'world', 'foo': 'bar'}
new_dict = {**old_dict, 'foo': 'baz'}
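Just like in JavaScript, later entries win on key collisions, and you can spread more than one dictionary at once. A quick illustration (the dictionary names here are made up for the example):

```python
defaults = {'host': 'localhost', 'port': 8080}
overrides = {'port': 9000}

# Spread both dicts into a new one; later entries win on key
# collisions, and the originals are left untouched.
config = {**defaults, **overrides, 'debug': True}
# config == {'host': 'localhost', 'port': 9000, 'debug': True}
```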
[Read More]

Source code layout for ML pipelines

When I look at a new open source deep learning project, I start with several questions.

  • What’s the structure of the model?
  • How’s the model trained?
  • How’s the training data formatted? How’s it preprocessed?
  • How can I do inference with a trained model?

But for a machine learning pipeline to be useful to me in a real-world scenario, all of the above are table stakes. There's no way to make progress on model architecture or hyperparameter optimization until these questions are well understood.

And although a ton of machine learning progress is being made in a transparent way, many research-focused repositories obfuscate the answers to these basic questions.

[Read More]

Changing the Python Version in Conda

The latest version of Anaconda comes with Python 3.7. But sometimes you need to use an earlier release. For example, as of today (2019-02-28), TensorFlow does not yet work with the latest release of Python. The preferred way to use a previous version is to create a separate conda environment for each project.

To create a fresh conda environment called tensorflow with Python 3.6 and its own pip, run the following:

conda create --name tensorflow python=3.6 pip

From there you can activate the tensorflow environment and then pip or conda install whatever you need. For example:

conda activate tensorflow
conda install tensorflow
pip install ipython matplotlib

Then to return to the base environment, just run conda deactivate.

[Read More]