As a newbie, I’ve been receiving files via email, copying them to my Jupyter Notebook folder, running my script, and emailing the resulting outputs back to my customer. As a prospective data scientist I’ve been feeling positively embarrassed about this ridiculously low-tech process! Thanks to my colleagues Shaun and Christine, I’ve been set onto the path of automation – thanks guys :). Here’s what I learned about this setup:
Resources
- How job scheduling works
- Making sure your bash_profile is correct
- Using runipy for command line execution
My crontab recipe (macOS)
- Make sure the runipy package is installed and updated
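If you're on a pip-based setup (adjust accordingly if you use conda), something along these lines should do it – just make sure pip points at the same Python your notebooks use:
pip install --upgrade runipy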
- From the Terminal command line, change to the directory where your .ipynb notebook is located and then check that you can run it successfully via the command line:
runipy MyNotebook.ipynb OutputNotebook.ipynb
The above will run the original notebook and write the results to the new one (I just like to make sure my original is definitely not affected in any way – this is probably paranoid!)
- Now let’s fire up Crontab. From the command line:
env EDITOR=nano crontab -e
Your crontab file opens in the nano editor.
- Use the following format, described in detail here, to construct the commands required:
* * * * * command1 && command2 && command3, etc.
My example was as follows:
14 8 * * * . ~/.bash_profile && cd ~ && cd Docs && runipy viz.ipynb viz_out.ipynb
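In case the five leading fields are as cryptic to you as they were to me, here's the same entry annotated – the # lines are comments, which cron ignores, so you can paste them in too:
# fields: minute hour day-of-month month day-of-week
# so "14 8 * * *" means run at 08:14, every day of every month
14 8 * * * . ~/.bash_profile && cd ~ && cd Docs && runipy viz.ipynb viz_out.ipynb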
- Sourcing your regular bash_profile (. ~/.bash_profile) before anything else is crucial, otherwise you will find that commands that normally work suddenly don’t, for reasons explained in detail here.
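The short version is that cron runs your job with a very minimal environment (and PATH), so tools that work fine in your interactive shell can simply be missing. If you're curious what cron actually sees, a throwaway entry like the one below dumps its environment to a file – /tmp/cron_env.txt is just an arbitrary name I've picked, and you should remove the entry again once you've had a look:
* * * * * env > /tmp/cron_env.txt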
- Use Ctrl + O to save, and Ctrl + X to exit.
- Cron will send you mail via the command line! Just type mail, pick the latest message, and read what the problems were, if any.
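A minimal mail session looks roughly like this (the # comments are just my notes, not something you type):
mail    # opens your local mailbox and lists any messages
1       # typing a message number displays that message
q       # quit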
NOTE! If you can’t run your .ipynb file via the command line then it definitely won’t run in Crontab, so make sure you resolve any command line issues first and only then attempt scheduling… I know, I know – it’s obvious now… 🙂
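Once the notebook runs cleanly by hand, you can also double-check what cron actually has scheduled for you with:
crontab -l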