October 3, 2021 paperless-ng raspberry-pi backups
I’ve recently done some work setting up a Paperless-NG instance on a Raspberry Pi at home. Since I don’t have a lot of resources (i.e. money) to throw at it, all my data resides on a single portable hard drive; not ideal. However, as a PhD student I get 1TB of OneDrive cloud storage from my university, which I’m putting to good use as cheap, off-site data storage. Whilst this guide is for OneDrive, rclone supports 43 different remotes, so you can adapt it to your needs.
In this guide we will set up two “remotes” with rclone and then use rclone_jobber to easily back up our data.
Note: Paperless-NG is now being maintained in a new fork. See this post for more information on upgrading.
Edit 04/10/2021: Some commenters rightly pointed out that other services exist which solve this problem out of the box. I like rclone with rclone_jobber as it seamlessly takes snapshots, is low cost, and is easy to set up. Please do consider other methods before mindlessly committing to mine.
First of all, install rclone with whatever package manager you’re using. On a Raspberry Pi, sudo apt-get install rclone will suffice.
Next we’ll have a quick look at how the aforementioned remotes work.
What we’re going to do is first create an unencrypted “remote” for OneDrive (in green on the right). We could sync data to this folder directly with the rclone CLI, but it would be stored unencrypted, and since I use Paperless for some sensitive documents, that isn’t great. Therefore we create a “crypt” remote inside the OneDrive remote (as seen in red), which encrypts files before they are uploaded. Remotes within remotes.
Connecting to OneDrive via rclone is stupid easy. First, run rclone config on the Raspberry Pi.
As you can see, this opens an empty menu with a couple of commands. Hit n for new and type in a descriptive name such as onedrive_uni. You will then see a list of 43 or so remote types to choose from; at the time of writing, OneDrive was number 26.
The next two options are client_id and client_secret, which we can leave blank. The next option is region, which will probably be “Microsoft Cloud Global” (i.e. option 1) for you.
The next option lets you edit the advanced configuration (which I have never needed to do), so leave it blank and hit enter.
This next step differs depending on whether you’re working on a headless machine (such as via SSH) or one with a monitor. I will go over the headless setup, but it is very similar to the non-headless one. Hit n and press enter.
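As you can see, the next step requires rclone on a non-headless machine. If you’re doing the setup via SSH, this will most likely be the machine you’re physically at. On that machine, run the rclone authorize command shown in the prompt, sign in through the browser window it opens, and paste the token it prints back into the SSH session.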
Next up is the type of configuration, which for me is “OneDrive Personal or Business” and will most likely be the same for you.
The next two options are left on their defaults.
After the previous steps, we will be back at the main menu of rclone config. Once again, hit n for new and enter a descriptive name for the remote; I went for onedrive_uni_enc.
As before, select the correct remote type, which this time is “Encrypt/Decrypt a remote” (number 11 at the time of writing).
The next step asks for the remote (and path) to encrypt. For this guide we are focusing on OneDrive, but you could choose any remote. To keep my OneDrive organised, I chose onedrive_uni:backups/, i.e. a backups folder inside the OneDrive remote created earlier.
The next two steps set the level of encryption. I usually go for “standard” for filename encryption and “true” for directory name encryption.
Be very careful in these next steps: once you’ve entered your password and salt, you cannot view them again without repeating the whole process. Furthermore, if you do not back up your keys and then lose the config through hard drive failure or whatnot, all your backed-up data is GONE. I keep mine in Bitwarden’s secure notes.
My configuration for the password and salt can be seen below. I chose 128-bit keys, but you can go for whatever you like.
Once again, leave the final two options as default.
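Pointing a backup job at onedrive_uni instead of onedrive_uni_enc would upload everything unencrypted, so it can be worth checking that a job’s destination really is the crypt remote before anything is sent. The sketch below does this by reading rclone’s config file, where each remote is stored as an INI-style section such as [onedrive_uni_enc] with a type = crypt line. The helper names remote_type and require_crypt are hypothetical, and the default path ~/.config/rclone/rclone.conf is an assumption (rclone config file prints the real location on your machine).

```shell
#!/usr/bin/env bash
# Sketch: refuse to back up unless the destination is a crypt remote.
# Assumptions: rclone's usual Linux config path (override with a second
# argument) and "key = value" lines as written by rclone config.
# remote_type and require_crypt are hypothetical helper names.

# Print the "type" of the [REMOTE] section in an rclone config file,
# or nothing if the section (or the file) does not exist.
remote_type() {
    local remote="$1"
    local conf="${2:-$HOME/.config/rclone/rclone.conf}"
    awk -v section="[$remote]" '
        /^\[.*\]$/ { in_section = ($0 == section); next }
        in_section && $1 == "type" && $2 == "=" { print $3; exit }
    ' "$conf" 2>/dev/null
}

# Succeed only if the remote part of DEST (the text before the first ":",
# e.g. onedrive_uni_enc in onedrive_uni_enc:paperless) has type = crypt.
require_crypt() {
    local dest="$1"
    local conf="${2:-$HOME/.config/rclone/rclone.conf}"
    local remote="${dest%%:*}"
    local rtype
    rtype="$(remote_type "$remote" "$conf")"
    if [ "$rtype" != "crypt" ]; then
        echo "refusing to back up: '$remote' has type '${rtype:-not configured}', not crypt" >&2
        return 1
    fi
}
```

A job script could then run require_crypt "$dest" || exit 1 before doing any work, so a destination on the plain onedrive_uni remote, or a misspelled remote name, stops the job instead of uploading.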
rclone_jobber
Strictly speaking, rclone_jobber is not required; a bash one-liner using rclone sync would be enough. However, cloning the repo and using its script gives consistent results across any device, plus I like how it handles the backups for me.
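For comparison, here is a minimal sketch of that one-liner, wrapped in a function. It mirrors the data into a last_snapshot folder and uses rclone’s --backup-dir flag to move anything that would be overwritten or deleted into a dated archive folder, which is similar in spirit to the snapshots rclone_jobber manages for you (the folder names here are assumptions, not necessarily rclone_jobber’s layout). The default paths and remote name are the ones used in this guide; backup_paperless and the RCLONE override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the "plain rclone sync" alternative to rclone_jobber.
# Assumptions (from this guide, adjust to taste): data in
# /media/PORTABLE_UN/paperless, crypt remote onedrive_uni_enc, filter file
# /opt/rclone_jobber/filter_rules. backup_paperless is a hypothetical name.
# Set RCLONE to another command (e.g. echo) to preview the call.

backup_paperless() {
    local src="${1:-/media/PORTABLE_UN/paperless}"
    local dest="${2:-onedrive_uni_enc:paperless}"
    local filters="${3:-/opt/rclone_jobber/filter_rules}"
    local stamp
    stamp="$(date +%F_%H-%M-%S)"
    # Mirror src into last_snapshot; files that would be overwritten or
    # deleted there are moved into a dated archive folder instead.
    "${RCLONE:-rclone}" sync "$src" "$dest/last_snapshot" \
        --filter-from "$filters" \
        --backup-dir "$dest/archive/$stamp"
}
```

Running RCLONE=echo backup_paperless prints the rclone command instead of executing it, which is a cheap way to check the paths; a plain backup_paperless performs the sync. rclone requires the --backup-dir location to be on the same remote as the destination and not to overlap it, which is why the snapshot and archive live in sibling folders.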
Execute the following three lines:
The original job_backup_to_remote.sh script uses environment variables, but I prefer hard-coding these values. Skip these steps if you would rather not.
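Each change listed below swaps one assignment in the script for a hard-coded value, so the edits can also be scripted, which suits the goal of getting the same result on every device. The sketch assumes GNU sed (the default on Raspberry Pi OS) for in-place editing, that each assignment starts at the beginning of its line as in the lines quoted below, and the crypt remote name chosen earlier, onedrive_uni_enc; customise_job is a hypothetical helper. monitoring_URL is left for you to set by hand.

```shell
#!/usr/bin/env bash
# Sketch: hard-code this guide's values into job_backup_to_remote.sh
# with sed instead of editing it by hand.
# Assumptions: GNU sed (for -i) and assignments at the start of a line.
# customise_job is a hypothetical helper name.

customise_job() {
    local job="$1"
    # "|" is used as the sed delimiter because the values contain "/".
    # "\$" matches a literal dollar sign in the search patterns.
    sed -i \
        -e 's|^rclone_jobber=\$rclone_jobber|rclone_jobber=/opt/rclone_jobber|' \
        -e 's|^source="\$HOME/test_rclone_data"|source=/media/PORTABLE_UN/paperless|' \
        -e 's|^dest="\${remote}:"|dest=onedrive_uni_enc:paperless|' \
        -e 's|\$rclone_jobber/examples/filter_rules|$rclone_jobber/filter_rules|' \
        "$job"
}
```

For example, customise_job /opt/rclone_jobber/job_backup_to_remote.sh edits the copy that the cron entry later in this guide runs. Lines that don’t match a pattern are left untouched, so it’s worth reading the file afterwards.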
- Change rclone_jobber=$rclone_jobber to rclone_jobber=/opt/rclone_jobber
- Change source="$HOME/test_rclone_data" to wherever your Paperless data is kept, such as source=/media/PORTABLE_UN/paperless
- Change dest="${remote}:" to dest=onedrive_uni_enc:paperless
- Change options="--filter-from=$rclone_jobber/examples/filter_rules" to options="--filter-from=$rclone_jobber/filter_rules"
- Set monitoring_URL equal to this URL

My final job_backup_to_remote.sh looks like this:
We then need to create the filter_rules file. My current setup just excludes the consume and export directories and looks like this:
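A filter file matching that description could be written by the sketch below. It assumes consume and export sit directly under the backup source (e.g. /media/PORTABLE_UN/paperless/consume). In rclone’s filter syntax a leading - excludes, a leading / anchors the pattern to the root of the source, ** matches everything beneath it, lines starting with # are comments, and anything not matched by a rule is backed up as normal. write_filter_rules is a hypothetical helper, and its default path is where the options= line above points once rclone_jobber=/opt/rclone_jobber.

```shell
#!/usr/bin/env bash
# Sketch: write an rclone filter file that skips Paperless' consume and
# export directories. Assumption: both sit directly under the backup source.
# write_filter_rules is a hypothetical helper name.

write_filter_rules() {
    local file="${1:-/opt/rclone_jobber/filter_rules}"
    # Quoted heredoc: the ** patterns are written literally, not expanded.
    cat > "$file" <<'EOF'
# Exclude the inbox and export folders; everything else is backed up.
- /consume/**
- /export/**
EOF
}
```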
To automate this process we need a scheduler such as cron, which is great as it comes pre-installed on Raspberry Pi OS and many other Linux distributions.
Begin editing the crontab with crontab -e and add the following line to the end of it:
0 2 * * * /opt/rclone_jobber/job_backup_to_remote.sh
This reads as “run /opt/rclone_jobber/job_backup_to_remote.sh at 2:00 am every day”. You may edit the schedule to suit your needs. Note, however, that rclone is smart enough to only transfer new or changed files rather than re-uploading everything.
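The same entry can also be added without opening an editor, which is handy when setting up a new Pi from a script. The sketch below reads an existing crontab on stdin and appends the 2:00 am job only if an identical line isn’t already there, so running it twice doesn’t schedule two backups. The five leading fields are minute, hour, day of month, month and day of week; with_backup_job and BACKUP_JOB are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: append the nightly backup job to a crontab read on stdin,
# skipping it if an identical line is already present.
# with_backup_job and BACKUP_JOB are hypothetical names.

# minute hour day-of-month month day-of-week command
BACKUP_JOB='0 2 * * * /opt/rclone_jobber/job_backup_to_remote.sh'

with_backup_job() {
    local current
    current="$(cat)"
    # Keep any existing entries unchanged.
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    # -F: treat the * fields literally; -x: match whole lines only.
    if ! printf '%s\n' "$current" | grep -Fqx -- "$BACKUP_JOB"; then
        printf '%s\n' "$BACKUP_JOB"
    fi
}
```

crontab -l 2>/dev/null | with_backup_job | crontab - installs the result for the current user (the 2>/dev/null hides the “no crontab” message on a fresh system). Cron runs jobs with a minimal environment, so the job script should be executable and rely on absolute paths, as the hard-coded values above do.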
Using FOSS tools like rclone, rclone_jobber, and cron lets us use whichever off-site data storage we like, from Google Drive to OneDrive, SSH to FTP. Paperless-ng is super powerful, but what is all that worth if you lose your data? ALWAYS BACK UP YOUR FILES.