Using rsync for backups, because it's not shiny and new

mesa@piefed.social · 10 months ago

NuXCOM_90Percent@lemmy.zip · 10 months ago

I would generally argue that rsync is not a backup solution. But it is one of the best transfer/archiving solutions.

Yes, it is INCREDIBLY powerful and is often 90% of what people actually want/need. But to be an actual backup solution you still need infrastructure around that. Bare minimum is a crontab. But if you are actually backing something up (not just copying it to a local directory) then you need some logging/retry logic on top of that.
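To make "logging/retry logic" concrete, here is a minimal sketch of what that layer could look like in bash. The function name `retry`, the log format, and the paths in the trailing comments are assumptions made for illustration, not part of rsync or any existing tool.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# writing one timestamped line per outcome (success to stdout, failure to stderr).
retry() {
    local max=$1 delay=$2
    shift 2
    local attempt=1
    while true; do
        if "$@"; then
            echo "$(date '+%F %T') ok after $attempt attempt(s): $*"
            return 0
        fi
        echo "$(date '+%F %T') attempt $attempt/$max failed: $*" >&2
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Possible use inside a backup script (host and paths are made up):
#   retry 3 60 rsync -a /home/me/ backuphost:/backups/me/
# Possible crontab entry that runs the script nightly and keeps a log:
#   0 3 * * * /home/me/bin/backup.sh >> /home/me/backup.log 2>&1
```

Even this much still leaves out alerting, retention, and restore testing, which is the point about ending up building your own borg.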

At which point you are building your own borg, as it were. Which, to be clear, is a great thing to do. But… backups are incredibly important, and it is very important to understand what a backup actually needs to be.

tal@olio.cafe · 10 months ago

I would generally argue that rsync is not a backup solution.

Yeah, if you want to use rsync specifically for backups, you’re probably better-off using something like rdiff-backup, which makes use of rsync to generate backups and store them efficiently, and drive it from something like backupninja, which will run the task periodically and notify you if it fails.

rsync: one-way synchronization

unison: bidirectional synchronization

git: synchronization of text files with good interactive merging.

rdiff-backup: rsync-based backups. I used to use this and moved to restic, as the backupninja target for rdiff-backup has kind of fallen into disrepair.

That doesn’t mean “don’t use rsync”. I mean, rsync’s a fine tool. It’s just…not really a backup program on its own.

neidu3@sh.itjust.works · edit-2 10 months ago

+1 for rdiff-backup. Been using it for 20 years or so, and I love it.

melfie@lemy.lol · 10 months ago

Having a synced copy elsewhere is not an adequate backup and snapshots are pretty important. I recently had RAM go bad and my most recent backups had corrupt data, but having previous snapshots saved the day.

melfie@lemy.lol · 10 months ago

Don’t understand the downvotes. This is the type of lesson people have learned from losing data, and there’s no sense in learning it the hard way yourself.

tomenzgg@midwest.social · 10 months ago

How would you pin down something like this? If it happened to me, I expect I just wouldn’t understand what’s going on.

melfie@lemy.lol · edit-2 10 months ago

I originally thought it was one of the drives in my RAID0 array that was failing, but I noticed copying data was yielding btrfs corruption errors on both drives that could not be fixed with a scrub, and I was also getting btrfs corruption errors on the root volume. I figured it would be quite an odd coincidence if my main SSD and 2 hard disks all went bad, and I happened upon an article talking about how corrupt data can also occur if the RAM is bad. So, I installed and booted into Memtest86+ and it immediately started showing errors on the single 16 GiB stick I was using. I happened to have a spare stick that was a different brand, and that one passed the memory test with flying colors. After that, all the corruption errors went away and everything has been working perfectly ever since.

I will also say that legacy file systems like ext4 with no checksums wouldn’t even complain about corrupt data. I originally had ext4 on my main drive and at one point thought my OS install went bad, so I reinstalled with btrfs on top of LUKS and saw I was getting corruption errors on the main drive at that point, so it occurred to me that 3 different drives could not have possibly had a hardware failure and something else must be going on.

So, I’m quite convinced at this point that RAID is not a backup, even with the abilities of btrfs to self-heal, and simply copying data elsewhere is not a backup, because something like bad RAM in both cases can destroy data during the copying process, whereas older snapshots in the cloud will survive such a hardware failure. Older data backed up that wasn’t copied with faulty RAM may be fine as well, but you’re taking a chance that a recent update may overwrite good data with bad data. I was previously using Rclone for most backups while testing Restic with daily, weekly, and monthly snapshots for a small subset of important data the last few months. After finding some data that was only recoverable in a previous Restic snapshot, I’ve since switched to using Restic exclusively. I was mainly concerned about the space requirements of keeping historical snapshots, and I’m still working on tweaking retention policies and taking separate snapshots of different directories with different retention policies according to the risk tolerance for each directory I’m backing up. For some things, I think even btrfs local snapshots would suffice.

koala@programming.dev · 10 months ago

Beware rdiff-backup. It certainly does turn rsync (not a backup program) into a backup program.

However, I used rdiff-backup in the past and it can be a bit problematic. If I remember correctly, every “snapshot” you keep in rdiff-backup uses as many inodes as the thing you are backing up. (Because every “file” in the snapshot is either a file or a hard link to an identical version of that file in another snapshot.) So this can be a problem if you store many snapshots of many files.

But it does make rsync a backup solution; a snapshot or a redundant copy is very useful, but it’s not a backup.

(OTOH, rsync is still wonderful for large transfers.)

tal@olio.cafe · 10 months ago

Because every “file” in the snapshot is either a file or a hard link to an identical version of that file in another snapshot.) So this can be a problem if you store many snapshots of many files.

I think you may be thinking of rsnapshot, which has that behavior, rather than rdiff-backup; both use rsync.

But I’m not sure why you’d be concerned about this behavior.

Are you worried about inode exhaustion on the destination filesystem?

koala@programming.dev · 10 months ago

Huh, I think you’re right.

Before discovering ZFS, my previous backup solution was rdiff-backup. I have memories of it being problematic for me, but I may be wrong in my remembering of why it caused problems.

non_burglar@lemmy.world · 10 months ago

I use rsync and a pruning script in crontab on my NFS mounts. I’ve tested it numerous times by breaking containers and restoring them from backup. It works great for me at home because I don’t need anything older than 4 monthly, 4 weekly, and 7 daily backups.
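As a rough idea of what such a pruning script can look like (this is a sketch, not the script described above): it assumes one directory per snapshot, named by date like `2024-05-31`, grouped under hypothetical `daily`, `weekly`, and `monthly` folders. The function name `prune_keep_newest` is made up for the example.

```shell
#!/usr/bin/env bash
# prune_keep_newest DIR KEEP
# Hypothetical helper: deletes all but the KEEP newest snapshot directories in DIR.
# Assumes snapshot names sort chronologically as plain strings (e.g. YYYY-MM-DD).
prune_keep_newest() {
    local dir=$1 keep=$2
    local snapshots=()
    local path
    # Glob results come back sorted, so the oldest names are first.
    for path in "$dir"/*/; do
        [ -d "$path" ] && snapshots+=("${path%/}")
    done
    local excess=$(( ${#snapshots[@]} - keep ))
    local i
    for (( i = 0; i < excess; i++ )); do
        rm -rf -- "${snapshots[i]}"
    done
}

# Possible use matching a 7 daily / 4 weekly / 4 monthly policy (paths are made up):
#   prune_keep_newest /mnt/backup/daily 7
#   prune_keep_newest /mnt/backup/weekly 4
#   prune_keep_newest /mnt/backup/monthly 4
```

Since it uses `rm -rf`, it is worth pointing it at a scratch directory first.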

However, in my job I prefer something like Bacula. The extra features and granularity of restore options make a world of difference when someone calls because they deleted prod files.

tal@olio.cafe · 10 months ago

I don’t know if there’s a term for them, but Bacula (and I think AMANDA might fall into this camp, but I haven’t looked at it in ages) is oriented more towards…“institutional” backup. Like, there’s a dedicated backup server, maybe dedicated offline media like tapes, the backup server needs to drive the backup, etc.

There are some things that rsnapshot, rdiff-backup, duplicity, and so forth won’t do.

Cross-file dedup. At least some of them (rdiff-backup, for one) won’t dedup files with different names. If a file is unchanged, it won’t use extra storage, but they won’t identify identical files at different locations. This usually isn’t all that important for a single host, other than maybe if you rename files, but if you’re backing up many different hosts, as in an institutional setting, they likely have files in common. These tools aren’t intended to back up multiple hosts to a single, shared repository.
Pull-only. I think that it might be possible to run some of the above three in “pull” mode, where the backup server connects and gets the backup, and where the hosts don’t have the ability to write to the backup server. This may be desirable if you’re concerned about a host being compromised, but not the backup server, since it means that an attacker can’t go dick with your backups. Think of those cybercriminals who encrypt data at a company, wipe other copies, and then demand a ransom for an unlock key. But the “institutional” backup systems are going to be aimed at having the backup server drive all this, and at having the backup server able to log into the individual hosts and pull the backups over.
Dedup for non-identical files. Note that restic can do this. While files might not be identical, they might share some common elements, and one might want to try to take advantage of that in backup storage.
Encryption. rdiff-backup and rsnapshot don’t do encryption (though duplicity does). If one intends to use storage not under one’s physical control (e.g. “cloud backup”), this might be a concern.
No “full” backups. Some backup programs follow a scheme where one periodically does a backup that stores a full copy of the data, and then stores “incremental” backups from the last full backup. rsnapshot, rdiff-backup, and duplicity are all always-incremental, and are aimed at storing their backups on a single destination filesystem. A split between “full” and “incremental” is probably something you want if you’re using, say, tape storage and having backups that span multiple tapes, since it controls how many pieces of media you have to dig up to perform a restore.
Live databases. I don’t know how Bacula or AMANDA handle it, if at all, but if you have a DBMS like PostgreSQL or MySQL or the like, it may be constantly receiving writes. This means that a plain file copy can’t get an atomic snapshot of the database, which is critical if you want to be reliably backing up the storage. I don’t know what the convention is here, but I’d guess either using filesystem-level atomic snapshot support (e.g. btrfs) or requiring the backup system to be aware of the DBMS and instructing it to suspend modification while it does the backup. rsnapshot, rdiff-backup, and duplicity aren’t going to do anything like that.

I’d agree that using the more-heavyweight, “institutional” backup programs can make sense for some use cases, like if you’re backing up many workstations or something.

mesa@piefed.social · 10 months ago

I’ve personally used rsync for backups for about…15 years or so? It’s worked out great. This is an awesome video going over all the basics and what you can do with it.

Eldritch@piefed.world · 10 months ago

And I generally enjoy Veronica’s presentation. Knowledgeable and simple.

mesa@piefed.social · 10 months ago

Her https://tinkerbetter.tube/w/ffhBwuXDg7ZuPPFcqR93Bd made me learn a new way of looking at data. There were some tricks I haven’t done before. She has such good videos.

Eldritch@piefed.world · 10 months ago

Yep, I found her through YouTube. Her and Action Retro’s content is always great, with some Adrian Black on the side.

overload@sopuli.xyz · 10 months ago

Veronica is fantastic. Love her video editing, it reminds me more of the early days of YouTube.

Eager Eagle@lemmy.world · edit-2 10 months ago

It works fine if all you need is transfer; my issue with it is that it’s just not efficient. If you want a “time travel” feature, your only option is to duplicate data. Differential backups, compression, and encryption for off-site ones are where other tools shine.

suicidaleggroll@lemmy.world · 10 months ago

If you want a “time travel” feature, your only option is to duplicate data.

Not true. Look at the --link-dest flag. Encryption, sure, rsync can’t do that, but incremental backups work fine and compression is better handled at the filesystem level anyway IMO.
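A minimal sketch of how `--link-dest` gives you that, assuming a layout where each run gets its own directory and a `latest` symlink points at the previous one. The `snapshot` function name, the `latest` convention, and the paths are my own choices for the example; rsync does not maintain any of that itself.

```shell
#!/usr/bin/env bash
# snapshot SRC DEST_ROOT NAME
# Hypothetical helper: copies SRC into DEST_ROOT/NAME. Files unchanged since the
# previous snapshot are hard-linked to it instead of being stored a second time.
snapshot() {
    local src=$1 root=$2 name=$3
    mkdir -p "$root"
    local link_args=()
    # A relative --link-dest is resolved against the destination directory,
    # so ../latest means DEST_ROOT/latest.
    if [ -d "$root/latest" ]; then
        link_args=(--link-dest=../latest)
    fi
    rsync -a "${link_args[@]}" "$src/" "$root/$name/" || return
    # Point "latest" at the snapshot that just finished.
    ln -sfn "$name" "$root/latest"
}

# Possible use (paths are made up):
#   snapshot /home/me /mnt/backup/home "$(date +%F)"
```

Every dated directory then looks like a full copy you can browse, but only changed files take up new space.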

Eager Eagle@lemmy.world · edit-2 10 months ago

Isn’t that creating hardlinks between source and dest? Hard links only work on the same drive. And I’m not sure how that gives you “time travel”, as in, browsing snapshots or file states at the different times you ran rsync.

Edit: ah the hard link is between dest and the link-dest argument, makes more sense.

I wouldn’t bundle fs and backup compression in the same bucket, because they have vastly different reqs. Backup compression doesn’t need to be optimized for fast decompression.

state_electrician@discuss.tchncs.de · 10 months ago

Why videos? I feel like an old man yelling at clouds every time something that sounds interesting is presented in a fucking video. Videos are so damn awful. They take time, I need audio and I can’t copy&paste. Why have they become the default for things that should’ve been a blog post?

czardestructo@lemmy.world · 10 months ago

Thank you for putting into words what I’ve subconsciously been thinking for years. Every search result prioritizes videos at the top and I’m still annoyed every time. Or even worse, I have to hunt through a 10 minute video for the 30 seconds of info I needed. Stoohhhhpppp internet of new! Make it good again!

Wawe@lemmy.world · edit-2 10 months ago

They linked a blog post with the video: https://vkc.sh/everyday-rsync/

vga@sopuli.xyz · 10 months ago

Ad money.

kchr@lemmy.sdf.org · 10 months ago

Hear hear. Knowledge should be communicated in an easily shareable way that can also be archived as easily, in contrast to a video requiring hundreds of MBs.

northernlights@lemmy.today · 10 months ago

Especially for a command line tool

Matthew@midwest.social · 10 months ago

man rsync

sugar_in_your_tea@sh.itjust.works · 10 months ago

Yeah it’s slow

What’s slow about rsync? If you have a reasonably fast CPU and are merely syncing differences, it’s pretty quick.

pathief@lemmy.world · 10 months ago

It’s single-threaded, one file at a time.

sugar_in_your_tea@sh.itjust.works · 10 months ago

That would only matter if it’s lots of small files, right? And after the initial sync, you’d have very few files, no?

Rsync is designed for incremental syncs, which is exactly what you want in a backup solution. If your multithreaded alternative doesn’t do a diff, rsync will win on larger data sets that don’t have rapid changes.

quick_snail@feddit.nl · 10 months ago

It’s slow?!?

okamiueru@lemmy.world · 10 months ago

That part threw me off. Last time I used it, I did incremental backups of a 500 gig disk once a week or so, and it took 20 seconds max.

HereIAm@lemmy.world · 10 months ago

Compared to something multi threaded, yes. But there are obviously a number of bottlenecks that might diminish the gains of a multi threaded program.

Tja@programming.dev · 10 months ago

With xargs everything is multithreaded.

ominous ocelot@leminal.space · 10 months ago

rsnapshot is a script for the purpose of repeatedly creating deduplicated copies (hardlinks) for one or more directories. You can choose how many hourly, daily, weekly,… copies you’d like to keep and it removes outdated copies automatically. It wraps rsync and ssh (public key auth), which need to be configured beforehand.

SayCyberOnceMore@feddit.uk · 10 months ago

Hardlinks need to be on the same filesystem, don’t they? I don’t see how that would work with a remote backup…?

solrize@lemmy.ml · 10 months ago

I’ve been using borg because of the backend encryption and because the deduplication and snapshot features are really nice. It could be interesting to have cross-archive deduplication but maybe I can get something like that by reorganizing my backups. I do use rsync for mirroring and organizing downloads, but not really for backups. It’s a synchronization program as the name implies, not really intended for backups.

cmgvd3lw@discuss.tchncs.de · 10 months ago

I think the Arch wiki recommends rsync for backups

surph_ninja@lemmy.world · 10 months ago

Use borg/borgmatic for your backups. Use rsync to send your differentials to your secondary & offsite backup storage.

calliope@retrolemmy.com · 10 months ago

Tangentially, I don’t see people talk about rclone a lot, which is like rsync for cloud storage.

It’s awesome for moving things from one provider to another, for example.

David Vasandani@social.coop · 10 months ago

@calliope It’s also great for local or remote backups over ssh, smb, etc.

calliope@retrolemmy.com · 10 months ago

It has been remarkably useful! I keep trying to tell people about it but apparently I am just their main use case or something.

I would have loved it when I was using Samba to share files on my local network decades ago. It’s like a Swiss Army knife!

Landless2029@lemmy.world · 10 months ago

I tried rclone once because I wanted to sync a single folder from documents and freaked out when it looked like it was going to purge all documents except for my targeted folder.

Then I just did it via the portal…

calliope@retrolemmy.com · 10 months ago

rsync can sometimes look similarly scary! I very clearly remember triple-checking what it’s doing.

rclone works amazingly well if you have hundreds of folders or thousands of files and you can’t be bothered to babysit a portal.

Eldritch@piefed.world · 10 months ago

It’s fine. But yes, in the Linux space we tend to want to host things ourselves, and not have to trust some administrator of some cloud we don’t know/trust.

TehNomad@piefed.social · 10 months ago

rclone does support other protocols besides S3. You can also selfhost your own S3 storage.

i_stole_ur_taco@lemmy.ca · 10 months ago

The thing I hate most about rsync is that I always fumble to get the right syntax and flags.

This is a problem because once it’s working I never have to touch it ever again because it just works and keeps working. There’s not enough time to memorize the usage.

NuXCOM_90Percent@lemmy.zip · 10 months ago

One trick that one of my students taught me a decade or so ago is to actually make an alias to list the useful flags.

Yes, a lot of us think we are smart and set up aliases/functions and have a huge list of them that we never remember or, even worse, ONLY remember. What I noticed her doing was having something like goodman-rsync that would just echo out a list of the most useful flags and what they actually do.

So nine times out of 10 I just want rsync -azvh --progress ${SRC} ${DEST} but when I am doing something funky and am thinking “I vaguely recall how to do this”? dumbman rsync and I get a quick cheat sheet of what flags I have found REALLY useful in the past or even just explaining what azvh actually does without grepping past all the crap I don’t care about in the man page. And I just keep that in the repo of dotfiles I copy to machines I work on regularly.
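A minimal sketch of that kind of cheat-sheet function, as it might sit in a dotfiles repo. The function body and the selection of flags are my guess at the idea rather than the student's actual setup; the flags listed are standard rsync options.

```shell
# dumbman TOOL
# Hypothetical helper: prints a short personal cheat sheet instead of the full man page.
dumbman() {
    case "$1" in
        rsync)
            cat <<'EOF'
rsync -azvh --progress SRC DEST
  -a          archive: recurse and keep permissions, times, symlinks, owner, group
  -z          compress data in transit
  -v          verbose
  -h          human-readable sizes
  --progress  show per-file transfer progress
  -n          dry run: show what would change without doing it
  --delete    remove files in DEST that no longer exist in SRC
EOF
            ;;
        *)
            echo "no cheat sheet for '$1'" >&2
            return 1
            ;;
    esac
}
```

Adding another tool is just another `case` branch with its own heredoc.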

muix@lemmy.sdf.org · 10 months ago

tldr and atuin have been my main way of remembering complex but frequent flag combinations

oddlyqueer@lemmy.ml · 10 months ago

This is why I still don’t know sed and awk syntax lol. I eventually get the data in the shape I need and then move on, and never imprint how they actually work. Still feel like a script kiddie every time I use them (so once every few years).

tal@olio.cafe · 10 months ago

sed can do a bunch of things, but I overwhelmingly use it for a single operation in a pipeline: the s// operation. I think that that’s worth knowing.

sed 's/foo/bar/'

will replace the first text in each line matching the regex “foo” with “bar”.

That’ll already handle a lot of cases, but a few other helpful sub-uses:

sed 's/foo/bar/g'

will replace all text matching regex “foo” with “bar”, even if there is more than one match per line.

sed 's/\([0-9a-f][0-9a-f]*\)/0x\1/g'

will take the text inside the backslash-escaped parens and put that matched text back in the replacement text, where one has ‘\1’. In the above example, that’s finding all hexadecimal strings and prefixing them with ‘0x’

If you want to match a literal “/”, the easiest way to do it is to just use a different separator; if you use something other than a “/” as separator after the “s”, sed will expect that later in the expression too, like this:

sed 's%/%SLASH%g'

will replace all instances of a “/” in the text with “SLASH”.
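To see these in action, here is a small runnable sketch with made-up sample input. For the hex case it uses `[0-9a-f][0-9a-f]*` so that the pattern needs at least one hex digit; with a bare `*` the pattern can also match the empty string.

```shell
# First match on each line only
echo 'foo foo' | sed 's/foo/bar/'                       # prints: bar foo

# Every match on the line
echo 'foo foo' | sed 's/foo/bar/g'                      # prints: bar bar

# Capture group reused in the replacement as \1
echo 'ff 10' | sed 's/\([0-9a-f][0-9a-f]*\)/0x\1/g'     # prints: 0xff 0x10

# Alternate separator, so "/" needs no escaping
echo '/usr/bin' | sed 's%/%SLASH%g'                     # prints: SLASHusrSLASHbin
```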

ryper@lemmy.ca · 10 months ago

I was planning to use rsync to ship several TB of stuff from my old NAS to my new one soon. Since we’re already talking about rsync, I guess I may as well ask if this is the right way to go?

Suburbanl3g3nd@lemmings.world · 10 months ago

I couldn’t tell you if it’s the right way, but I used it on my Rpi4 to sync 4 TB of stuff from my Plex drive to a backup and set a script up to have it check/mirror daily. Took a day and a half to copy, and now it syncs in minutes tops when there’s new data.

GreenKnight23@lemmy.world · 10 months ago

yes, it’s the right way to go.

rsync over ssh is the best, and works as long as rsync is installed on both systems.

qjkxbmwvz@startrek.website · 10 months ago

On low end CPUs you can max out the CPU before maxing out network—if you want to get fancy, you can use rsync over an unencrypted remote shell like rsh, but I would only do this if the computers were directly connected to each other by one Ethernet cable.

SayCyberOnceMore@feddit.uk · 10 months ago

It depends

rsync is fine, but to clarify a little further…

If you think you’ll stop the transfer and want it to resume (and some data might have changed), then yep, rsync is best.

But, if you’re just doing a 1-off bulk transfer in a single run, then you could use other tools like xcopy / scp or - if you’ve mounted the remote NAS at a local mount point - just plain old cp

The reason for that is that rsync has to work out what’s at the other end for each file, so it’s doing some back-and-forth communication each time, which, as someone else pointed out, can load the CPU and reduce throughput.

(From memory, I think Raspberry Pis don’t handle large transfers over scp well… I seem to recall a buffer gets saturated and the throughput drops off after a minute or so)

Also, on a local network, there’s probably no point in using encryption or compression options - esp. for photos / videos / music… you’re just loading the CPU again to work out that it can’t compress any further.

atk007@lemmy.world · 10 months ago

Rsnapshot. It uses rsync, but provides snapshot management and multiple backup versioning.

Tja@programming.dev · 10 months ago

Yes, but a few hours writing my own scripts will save me from several minutes of reading its documentation…

1984@lemmy.today · 10 months ago

I never thought of it as slow. More like very reliable. I don’t need my data to move fast, I need it to be copied with 100% reliability.

vext01@lemmy.sdf.org · 10 months ago

I used to use rsnapshot, which is a thin wrapper around rsync to make it incremental, but moved to restic and never looked back. Much easier and encrypted by default.

Jessica@discuss.tchncs.de · 10 months ago

If you’re trying to back up Windows OS drives for some reason, robocopy works quite similarly to rsync.

SayCyberOnceMore@feddit.uk · 10 months ago

Ah… robocopy… that’s a great tool