October 3, 2021 paperless-ng raspberry-pi backups
I’ve recently done some work setting myself up a Paperless-NG instance on a Raspberry Pi at home. Since I don’t have a lot of resources (i.e. money) to throw at it, all my data resides on a single portable hard drive; not ideal. However, as I’m a PhD student, my university gives me 1TB of OneDrive cloud storage, which I’m putting to good use for cheap, off-site data storage. Whilst this guide is for OneDrive, rclone supports 43 different remotes, so you can adapt this guide to your needs.
Note: Paperless-NG is now being maintained in a new fork. See this post for more information on upgrading.
Edit 04/10/2021: Some good comments were made that other services exist that solve this issue out-of-the-box. I like rclone_jobber as it seamlessly does snapshots. It’s low cost and easy to set up. Please do consider other methods before mindlessly committing to mine.
First of all, install rclone with whatever package manager you’re using. On a Raspberry Pi, sudo apt-get install rclone will suffice.
Next we’ll have a quick look at how the aforementioned remotes work.
What we’re going to do is create the first “remote” within OneDrive (in green on the right), which is unencrypted. We can sync data to this folder via the rclone CLI, but nothing stored there is encrypted. I use paperless for some sensitive documents so this isn’t great. Therefore we put a “crypt” remote inside the OneDrive remote (as seen in red). Remotes within remotes.
Connecting to OneDrive via rclone is stupid easy. First run rclone config on the Raspberry Pi:
As you can see, this opens an empty menu for us with a couple of commands. Hit n for new and type in a descriptive name such as onedrive_uni. After this you will see 43 or so different remote types for you to choose from. At the time of writing, the OneDrive remote was number 26.
The next 2 options are client_id and client_secret, both of which we can leave blank. The next option is region, which will probably be “Microsoft Cloud Global” (i.e. option 1) for you.
The next option allows you to edit the advanced configuration (which I have never needed to), so leave that blank and hit enter.
This next step is different if you’re doing this on a headless machine (such as via SSH) or one with a monitor. I will go over how to do the headless setup, but this is very similar to the non-headless one. Hit n and press enter.
As you can see, the next step requires rclone on a non-headless machine. If you’re doing the setup via SSH, this will most likely be the machine you’re physically at.
Next up is the type of configuration. Mine is “OneDrive Personal or Business”, and yours (most likely) will be too.
The next two options are left on their defaults.
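Before moving on, it is worth checking that the new remote actually responds. A minimal sketch, assuming the remote was named onedrive_uni as above; the function name check_remote is made up for illustration, and nothing runs until you call it:

```shell
# Name given to the OneDrive remote in `rclone config`.
REMOTE="onedrive_uni"

# List the top-level folders on the remote, then print its quota usage.
check_remote() {
    rclone lsd "${REMOTE}:" && rclone about "${REMOTE}:"
}
```

Calling check_remote should list your OneDrive folders followed by the storage figures. If it errors out, the authorisation step most likely needs repeating.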
After the previous steps, we will be at the home page of rclone config. Once again hit n for new and enter a descriptive name for the remote - I went for
As before, select the correct remote type which this time is “Encrypt/Decrypt a remote” and at the time of writing, was number 11.
The next step will ask for the location of your remote. For this guide we are focusing on OneDrive but you could choose any remote. To keep my OneDrive organised, I chose
The next two steps focus on the level of encryption. I usually go for “standard” for filename encryption, and “true” for directory name encryption.
Be very careful in these next steps: once you’ve entered your password and salt, you cannot retrieve them again without repeating the whole process. Furthermore, if you do not back up your keys and then lose the config through hard drive failure or whatnot, all your backed-up data is GONE. I keep mine in Bitwarden’s secure notes.
My configuration for the password and salt can be seen below. I chose 128-bit keys but you can go for whatever you like.
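rclone can generate a random password and salt for you during configuration. If you would rather create them yourself and paste them in, here is a sketch that assumes openssl is installed; the variable names are purely illustrative:

```shell
# 16 random bytes = 128 bits, base64-encoded so the value can be pasted
# into the rclone prompt and into a password manager.
CRYPT_PASSWORD="$(openssl rand -base64 16)"
CRYPT_SALT="$(openssl rand -base64 16)"

# Print both once so they can be copied somewhere safe.
printf 'password: %s\nsalt:     %s\n' "$CRYPT_PASSWORD" "$CRYPT_SALT"
```

Whichever way you generate them, store both values outside the Pi before continuing.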
Once again, leave the final two options as default.
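A quick way to convince yourself the crypt remote works is to push a throwaway file through it and look at the same folder from both sides. This is only a sketch: paperless_crypt and onedrive_uni:backups are hypothetical names standing in for whatever you entered above, and verify_crypt is a made-up helper.

```shell
CRYPT_REMOTE="paperless_crypt"       # hypothetical name of the crypt remote
PLAIN_PATH="onedrive_uni:backups"    # hypothetical folder the crypt remote wraps

verify_crypt() {
    tmp="$(mktemp)" || return 1
    echo "hello" > "$tmp"

    # Upload through the crypt remote into a scratch folder.
    rclone copy "$tmp" "${CRYPT_REMOTE}:selftest" || return 1

    # Seen through the crypt remote: the original file name.
    rclone ls "${CRYPT_REMOTE}:selftest"
    # Seen through the plain remote: encrypted names only.
    rclone ls "$PLAIN_PATH"

    # Clean up the scratch folder and the local temp file.
    rclone purge "${CRYPT_REMOTE}:selftest"
    rm -f "$tmp"
}
```

If the second listing shows readable file names, the crypt remote is not pointing where you think it is.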
rclone_jobber is not required. A bash one-liner using rclone sync would be enough. However, cloning the repo and using the script means consistent results across any device, plus I like how it does the backups for me.
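For reference, the one-liner route might look roughly like this. Both the source path and the remote name are hypothetical placeholders, and the command is wrapped in a made-up function so nothing runs until you call it:

```shell
SOURCE="$HOME/paperless-ng/data"      # hypothetical: where your paperless data lives
DEST="paperless_crypt:paperless"      # hypothetical: crypt remote and target folder

# --dry-run only reports what would be transferred or deleted.
plain_sync() {
    rclone sync "$SOURCE" "$DEST" --dry-run
}
```

Drop --dry-run once the output looks right. Bear in mind that rclone sync makes the destination match the source, so files deleted locally are also deleted from the backup.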
Execute the following 3 lines
The original script uses environment variables, but I prefer hard-coding these values. Skip these steps if you would rather not.
Change source="$HOME/test_rclone_data" to wherever your paperless data is kept such as
Set monitoring_URL equal to this URL
job_backup_to_remote.sh looks like this:
We then need to create the filter_rules file. My current setup just excludes the export directories and looks like:
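As a rough illustration of the rclone filter syntax, the snippet below writes a filter file that excludes only a top-level export directory. The file location ($HOME/filter_rules) and the single rule are assumptions made for the sketch; adjust both to match your own layout and wherever your job script's --filter-from option points.

```shell
# Hypothetical location for the filter file.
FILTER_FILE="${FILTER_FILE:-$HOME/filter_rules}"

cat > "$FILTER_FILE" <<'EOF'
# Lines starting with "- " exclude, lines starting with "+ " include.
# Exclude everything under the export directory at the root of the source.
- /export/**
EOF
```

Anything not matched by a rule is still backed up, so the file only needs to name what you want left out.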
To automate this process we need to use a scheduler, such as cron. cron is great as it comes pre-installed on every Linux OS, as far as I am aware.
Begin editing the crontab with crontab -e. Add the following to the end of it:
This reads as “run /opt/rclone_jobber/job_backup_to_remote.sh at 2:00 am every day”. You may edit this to suit your needs. Note, however, that rclone is smart enough to transfer only new or changed data on each run.
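In cron's five time fields, “2:00 am every day” is written 0 2 * * *. As a sketch, the entry can also be added without opening an editor; install_cron_entry is a made-up helper name, and it assumes a crontab implementation that reads a new table from standard input (as the one on Raspberry Pi OS does):

```shell
# Fields: minute hour day-of-month month day-of-week command
CRON_ENTRY='0 2 * * * /opt/rclone_jobber/job_backup_to_remote.sh'

# Append the entry to the current user's crontab, keeping existing lines.
install_cron_entry() {
    { crontab -l 2>/dev/null; printf '%s\n' "$CRON_ENTRY"; } | crontab -
}
```

Call install_cron_entry once only, since running it again would add a duplicate line. crontab -l shows the result.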
Using FOSS tools like rclone and cron enables us to use any off-site data storage we desire, from Google Drive to OneDrive, SSH to FTP. Paperless-NG is super powerful, but what is all that worth if you lose your data? ALWAYS BACK UP YOUR FILES.