Google’s Colab was greeted with all sorts of hype when it was first publicly released in early 2018. After originally being quite excited about it, I wrote a short post with a few tips for new users, which covered taking advantage of the free GPU runtime, installing additional third-party Python libraries, and uploading data files to your Colab environment and using them there.
Well, as with every novelty, the excitement around Colab wore off a bit after the initial euphoria. However, after dipping back into the books and needing a stable notebook environment which I could access and share seamlessly between my laptop, workstation, and Chromebook, I decided to give Colab another look. It turned out to be a good decision; I have been regularly using Colab for the past few months for all of my learning-related coding.
This post is the second entry in the short-but-hopefully-useful Google Colab environment tips series, and includes 3 more things I’ve learned while managing my own Colab coding environment. I stress that this is what I am using for my learning, not for mission-critical projects, and I am primarily using Colab because I can switch between my various machines seamlessly while still having access to GPUs (and even TPUs) for training.
Note that some of these are plain vanilla Jupyter tricks, so don’t @ me.
0. Get Colab out of the browser
OK, this isn’t really a Colab tip, but first get Colab out of the browser. If you’re like me, your tab situation is sub-optimal. Sullying that up any further with both the Colab management interface and a bunch of notebooks won’t help, so run Colab as its own standalone app. This is OS-dependent, but involves having the Colab “app” installed in your Chrome browser, and selecting both “Open as window” and “Create shortcuts…” from the app context menu, after which you need to find the shortcut and use it to open the app in its own window.
That’s it; now you open Colab in its very own window, like in the post header image, from its very own icon. I know, not really on topic, but still useful.
1. Download a file to local computer
This is another simple one, but important enough to mention. A use case: you have created a Keras model and want to visualize the model. You call plot_model to create a PNG file, but since Colab virtual machines don’t give you permanency in file storage, you want to download the image. The following excerpt does it:
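A minimal sketch, assuming the image was saved as model.png (a made-up filename) and wrapping the call in a small helper of my own, download_from_colab, so a missing file fails early:

```python
from pathlib import Path


def download_from_colab(path):
    """Ask the browser to download a file that lives on the Colab VM."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file on the VM: {path}")
    # google.colab only exists inside a Colab runtime, so import it lazily.
    from google.colab import files
    files.download(str(path))


# In a Colab cell, after something like plot_model(model, to_file="model.png"):
# download_from_colab("model.png")
```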
Execute the cell and a pop-up dialog prompts you for a download location. Which leads us to…
1(b). Display an image inline
Yeah, it’s elementary, but it took me a few to remember that I was working with more or less a regular Jupyter environment with Colab. So, to display the above image inline, use:
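Something along these lines works in any Jupyter-style kernel; a sketch only, where show_inline is a helper name I made up and model.png is the assumed filename from the previous tip:

```python
from pathlib import Path


def show_inline(path):
    """Return an IPython Image object; a notebook renders it inline."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such image: {path}")
    # IPython ships with Jupyter and Colab kernels; import it lazily.
    from IPython.display import Image
    return Image(filename=str(path))


# As the last line of a notebook cell:
# show_inline("model.png")
```

The image only renders when the call is the last expression in the cell; anywhere else, pass the result to IPython.display.display.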
A quick intuitive modification will allow you to inline all sorts of other files as you would expect. Lesson: remember you are using a plain old Jupyter notebook, more or less. OK, on to Colab stuff.
2. Access your Google Drive filesystem
Let’s say I want to save that model image file to my Google Drive instead of my local computer. There are all sorts of ways to get files in and out of Google Drive. I find that mounting Drive as a filesystem inside the Colab VM is the most straightforward way of getting CSV data, for example, out of Google Drive.
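A minimal sketch of the mount step, assuming /content/gdrive as the mount point (my choice; any empty path on the VM should do). The import guard is my own addition so the cell fails softly outside Colab:

```python
def mount_drive(mount_point="/content/gdrive"):
    """Mount Google Drive on the Colab VM; return the mount point or None."""
    try:
        # Only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        print("Not running inside Colab; nothing to mount.")
        return None
    drive.mount(mount_point)  # starts the authorization prompt
    return mount_point


mount_drive()
```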
After clicking the link and entering the authorization code, you can access your drive as follows:
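A rough sketch of poking around the mounted drive, assuming the same /content/gdrive mount point. The name of the top-level folder is also an assumption: depending on the Colab version it shows up as "My Drive" or "MyDrive", so the helper (find_drive_root, my own name) checks both:

```python
from pathlib import Path


def find_drive_root(mount_point="/content/gdrive"):
    """Return the Drive folder under the mount point, or None if absent."""
    # Assumption: the folder is called "MyDrive" or "My Drive".
    for name in ("MyDrive", "My Drive"):
        candidate = Path(mount_point) / name
        if candidate.is_dir():
            return candidate
    return None


root = find_drive_root()
if root is None:
    print("Drive is not mounted here.")
else:
    # List the top level of the drive, like `ls` would.
    for entry in sorted(root.iterdir()):
        print(entry.name)
```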
Sure, it isn’t permanent, but it isn’t much work, and seems more straightforward than using any of the other options, in my view (and is no less permanent). If you use something like AutoKey, a desktop automation and text expansion tool, then often-used code excerpts and commands like this become even more trivial anyhow.
Back on topic: now you can save files to (or get files from) Google Drive. Easy, so long as you are comfortable in the terminal… which you should be anyways. Either work with the data file where it is, or move or copy it up a few directory levels to the Colab VM root. Since it disappears from there once your instance is dropped, however, it makes more sense to me to just read the CSV file from where it sits in the Drive filesystem:
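A small sketch of reading the file in place, using only the standard library. The path in the comment is hypothetical, and read_csv_rows is a helper name of my own; if you prefer pandas, pandas.read_csv accepts the same path:

```python
import csv
from pathlib import Path


def read_csv_rows(path):
    """Read a CSV file where it sits and return its rows as dicts."""
    with Path(path).open(newline="") as f:
        return list(csv.DictReader(f))


# Hypothetical location on a mounted drive:
# rows = read_csv_rows("/content/gdrive/My Drive/data/train.csv")
```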
3. Use custom libraries and modules stored in Google Drive
So, what if you have custom Python libraries or modules you want to import into Colab projects?
For example, I have a folder I called ‘my_modules’ in my Colab directory, where I store common .py files I want access to in Colab. I don’t want to store them on GitHub, and they aren’t files I want to have to clean up and share with others. Let’s say they are simply a collection of helper modules I have gotten used to using myself: data loader functions, data cleaning functions, and the like.
Any files like this I store in a Dropbox folder with the same name, which I sync with the Google Drive version. This way I can use the files both inside Colab and outside. I have direct OS access to Google Drive contents (natively on Chrome OS, and via ocamlfuse on Ubuntu) and can make use of them starting with the same Google Drive filesystem access trick from the previous tip.
This snippet is useful. Say I have a module called naive_sharding.py in my my_modules directory. Since it is a number of directory levels down, here is the easiest way for me to leave the file where it is and import it within Colab:
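A sketch of one way to do it: put the folder on sys.path and import by name. The full path below is an assumption about where my_modules lives on the mounted drive, and add_module_dir is a helper name of my own. Note that spaces in the path need no escaping inside a Python string:

```python
import sys


def add_module_dir(directory):
    """Append a folder to sys.path (once) so its .py files import by name."""
    directory = str(directory)
    if directory not in sys.path:
        sys.path.append(directory)
    return directory


# Hypothetical location of the folder on the mounted drive.
add_module_dir("/content/gdrive/My Drive/Colab Notebooks/my_modules")

# With the drive mounted and the file in place:
# import naive_sharding
```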
That’s it; the naive_sharding.py module has been imported, and is ready to use.
Modifying some of the above code snippets, you can see how easy it would be to get, say, model weights into and out of a Colab environment. And so with the short notes above, and some outside-the-box thinking, you can accomplish quite a bit with Google Colab, despite what numerous naysayers might have you believe. Given there is zero setup, and Chromebook access is likewise trivial, I have recently found Colab to be an ideal coding tool.
Don’t forget to read the first 3 tips and tricks found here.