• 0 Posts
  • 46 Comments
Joined 11 months ago
cake
Cake day: July 30th, 2023

help-circle


  • Apologies for being late, I wanted to be as correct as I could be.

    So, straight to the point: Nextcloud by default uses plain files if you don’t configure the primary storage to be an S3/object store. As far as I can tell, this is not automatic and is an intentional change at system creation by the original admin. There is a third-party migration script, but there does not appear to be a first-party method of converting between the two. That’s very good news for you! (I think/hope)

    My instance was set up as a standalone, so I cannot speak for the all-in-one image. Poking around the root data directory (datadirectory in the config.php), I was able to locate my user account by internal username - which if you do not use LDAP will be the shortened login name. On default LDAP configs, this internal username may be a GUID, but that can be changed during the LDAP enablement process by overriding the Internal Username field in the Expert LDAP settings.

    Once in the user’s home folder in the root data directory, my subdirectory options are cache, files, files_trashbin, files_versions, uploads.

    • files contains the “live” structure of how I perceive my Nextcloud home folder in the Web UI and the Nextcloud Desktop sync engine
    • files_trashbin is an unstructured data folder containing every file that was deleted by this user and kept per the trash folder’s retention policy (this can be configured at the site level). Files retain their original name, but have a suffix added which takes the form .d######... where the numbers appear to be a Unix timestamp, likely the deletion date. A quick scan of these with the file command in Linux showed that each one had an expected file header based on its extension (i.e., a .png showed as a PNG image with an expected resolution). In the Web UI, there is metadata about which folder the file originally resided in, but I was not able to quickly identify this in the file structure. I believe this info is coming in from the SQL database.
    • files_version are how Nextcloud is storing its file version history (if enabled). Old versions are cleaned up per a set of default behaviors to keep more copies of more recent changes, up to a maximum age deletion threshold set at the site level. This folder is stored in approximately the same structure as the main files live structure, however each copy of each version is appended a suffix .v######... where the number appears to be the Unix timestamp the version was taken (*I have not verified that this exactly matches what the UI shows, nor have I read the source code that generates this). I’ve spot checked via the Linux file command and sha256 that the files in this versions structure appear to be real data - tested one Excel doc and one plain text doc.

    I think that should get a fairly rough answer to your original question, but if I left something out you’re curious about, let me know.


    Finally, I wanted to thank you for making me actually take a look at how I had decided to configure and back up my Nextcloud instance and ngl it was kind of a mess. The trash bin and versions can both get out of hand if you have frequently changing or deleting/recreating files (I have network synchronization glued onto some of my games that do not have good remote save support). Retention policy on trash and versions cleaned up extraneous data a lot, as only one of those was partially configured.

    I can see a lot of room for improvements… just gotta rip the band-aid off and make intelligent decisions rather than just slapping an rsync job that connects to the Nextcloud instance and replicates down the files and backend database. Not terrible, but not great.

    In the backend I’m already using ZFS for my files and Redis database, but my core SQL database was located on the server’s root partition (which is XFS - I’d rather not mess with a DKMS module from a boot CD if something happens and upstream borks the compile, which is precisely what happened when I upgraded to OpenZFS 2.1.15).

    I do not have automatic ZFS snapshots configured at this time, but based on the above, I’m reasonably confident that I could get data back from a ZFS snapshot if any of the normal guardrails within Nextcloud failed or did not work as intended (trash bin and internal version history). Plus, the data in that cursed rsync backup should be at least 90% functional.



  • I don’t have a full answer to snapshots right now, but I can confirm Nextcloud has VFS support on Windows. I’ve been working on a project to move myself over to it from Syno drive. Client wise, the two have fairly similar features with one exception - Nextcloud generates one Explorer sidebar object per connection, which I think Synology handles as shortcuts in the one directory. If prefer if NC did the later or allowed me to choose, but I’m happier with what I got for now.

    As for the snapshotting, you should be able to snapshot the underlying FS/DB at the same time, but I haven’t poked deeply at that. Files I believe are plain (I will disassemble my nextcloud server to confirm this tonight and update my comment), but some do preserve version history so I want to be sure before I give you final confirmation. The Nextcloud root data directory is broken up by internal user ID, which is an immutable field (you cannot change your username even in LDAP), probably because of this filesystem.

    One thing that may interest you is the external storage feature, which I’ve been working on migrating a large data set I have to:

    • can be configured per-user or system-wide
    • password can be per-user, system-wide, or re-use the login password on the fly
    • data is stored raw on an external file server - supports a bunch of protocols, off hand SMB, S3, WebDAV, FTP
    • shows up as a normal-ish folder in the base user folder
    • can template names, such as including your username as part of the share name
    • Nextcloud does not independently contribute versioning data to the backend file server, so the only version control is what your backing server natively implements

    Admin docs for reference: https://docs.nextcloud.com/server/latest/admin_manual/configuration_files/external_storage_configuration_gui.html

    I use LDAP user auth to my nextcloud, with two external shares to my NAS using a pass-through session password (the NAS is AD joined to the same domain as Nextcloud uses for LDAPS). I don’t know if/how the “store password in database” option is encrypted, but if anyone knows I would be curious, because using session passwords prevents the user from sharing the folder to at least a federated destination (I tried with my friend’s NC server, haven’t tried with a local user yet but I assume the same limitations apply). If that’s your vibe, then this is a feature XD.

    One of my two external storage mounts is a “common” share with multiple users accessing the same directory, and the second share is \\nas.example.com\home\nextcloud. Internally, these I believe is handled by PHP spawning smbclient subprocesses, so if you have lots of remote files and don’t want to nuke your Nextcloud, you will probably need to increase the PHP child limits (that too me too long to solve lol)

    That funny sub-mount name above handles an edge case where Nextcloud/DAV can’t handle directories with certain characters - notably the # that Synology uses to expose their #recycle and #snapshot structures. This means that remote mount to SMB has a limitation at the moment where you can’t mount the base share of a Synology NAS that has this feature enabled. I tried a server-side Nextcloud plugin to try to filter this out before it exposed to DAV, but it was glitchy. Unsure if this was because I just had too many files for it to handle thanks to the way Synology snapshots are exposed or if it actually was something else - either way I worked around the problem for now by not ever mounting a base share of my Synology NAS. Other snapshot exposure methods may be affected - I have a ZFS TrueNAS Core, so maybe I’ll throw that at it and see if I can break Nextcloud again :P

    Edit addon: OP just so I answer your real question when I get to this again this evening - when you said that Nextcloud might not meet your needs, was your concern specifically the server-side data format? I assume from the rest of your questions that you’re concerned with data resilience and the ability to get your data back without any vendor tools - that it will just be there when you need it.



  • As others said, depends on your use case. There are lots of good discussions here about mirroring vs single disks, different vendors, etc. Some backup systems may want you to have a large filesystem available that would not be otherwise attainable without a RAID 5/6.

    Enterprise backups tend to fall along the recommendation called 3-2-1:

    • 3 copies of the data, of which
    • 2 are backups, and
    • 1 is off-site (and preferably offline)

    On my home system, I have 3-2-0 for most data and 4-3-0 for my most important virtual machines. My home system doesn’t have an off-site, but I do have two external hard drives connected to my NAS.

    • All devices are backed up to the NAS for fast recovery access between 1w and 24h RPO
    • The NAS backs up various parts of itself to the external hard drives every 24h
      • Data is split up by role and convenience factor - just putting stuff together like Tetris pieces, spreading out the NAS between the two drives
      • The most critical data for me to have first during a recovery is backed up to BOTH external disks
    • Coincidentally, both drives happen to be from different vendors, but I didn’t initially plan it that way, the Seagate drive was a gift and the WD drive was on sale

    Story time

    I had one of my two backup drives fail a few months ago. Literally actually nothing of value was lost, just went down to the electronics shop and bought a bigger drive from the same vendor (preserving the one on each vendor approach). Reformatted the disk, recreated the backup job, then ran the first transfer. Pretty much not a big deal, all the data was still in 2 other places - the source itself, and the NAS primary array.

    The most important thing to determine about a backup when you plan one - think about how much the data is valuable to you. That’s how much you might be willing to spend on keeping that data safe.


  • Running nextcloud (non docker version) and I don’t see near so many client updates - usually once every few weeks, which would be a reasonable expected pace. Server updates are less frequent.

    On Windows (all of my primary devices), I just install the NC client update and skip the explorer restart, pending full reboot later. Tis the nature of literally anything that deeply integrates with Explorer. I’ve seen explorer “death” during updates from several vendors that have similar explorer plugins, not just NC. Explorer sometimes just decides to nope out even without NC updating.

    Now on one device I hadn’t opened for a while, I saw NC run two updates in a row, but that was my fault for procrastinating the first one.

    Here’s the desktop release history: https://github.com/nextcloud/desktop/releases
    I don’t see a “one every day” within the block of time between Dec 6 and today, unless you had the release candidate builds which may have been more frequent in a few spots.


  • In the IT world, we just call that a server. The usual golden rule for backups is 3-2-1:

    • 3 copies of the data total, of which
    • 2 are backups (not the primary access), and
    • 1 of the backups is off-site.

    So, if the data is only server side, it’s just data. If the data is only client side, it’s just data. But if the data is fully replicated on both sides, now you have a backup.

    There’s a related adage regarding backups: “if there’s two copies of the data, you effectively have one. If there’s only one copy of the data, you can never guarantee it’s there”. Basically, it means you should always assume one copy somewhere will fail and you will be left with n-1 copies. In your example, if your server failed or got ransomwared, you wouldn’t have a complete dataset since the local computer doesn’t have a full replica.

    I recently had a a backup drive fail on me, and all I had to do was just buy a new one. No data loss, I just regenerated the backup as soon as the drive was spun up. I’ve also had to restore entire servers that have failed. Minimal data loss since the last backup, but nothing I couldn’t rebuild.

    Edit: I’m not saying what your asking for is wrong or bad, I’m just saying “backup” isn’t the right word to ask about. It’ll muddy some of the answers as to what you’re really looking for.



  • I don’t have an immediate answer for you on encryption. I know most of the communication is encrypted in flight for AD, and on disk passwords are stored hashed unless the “use reversible encryption field is checked”. There are (in Microsoft terms) gMSAs (group-managed service accounts) but other than using one for ADFS (their oath provider), I have little knowledge of how it actually works on the inside.

    AD also provides encryption key backup services for Bitlocker (MS full-partition encryption for NTFS) and the local account manager I mentioned, LAPS. Recovering those keys requires either a global admin account or specific permission delegation. On disk, I know MS has an encryption provider that works with the TPM, but I don’t have any data about whether that system is used (or where the decryptor is located) for these accounts types with recoverable credentials.

    I did read a story recently about a cyber security firm working with an org who had gotten their way all the way down to domain admin, but needed a biometric unlocked Bitwarden to pop the final backup server to “own” the org. They indicated that there was native windows encryption going on, and managed to break in using a now-patched vulnerability in Bitwarden to recover a decryption key achievable by resetting the domain admin’s password and doing some windows magic. On my DC at home, all I know is it doesn’t need my password to reboot so there’s credentials recovery somewhere.

    Directly to your question about short term use passwords: I’m not sure there’s a way to do it out of the box in MS AD without getting into some overcomplicated process. Accounts themselves can have per-OU password expiration policies that are nanosecond accurate (I know because I once accidentally set a password policy to 365 nanoseconds instead of a year), and you can even set whole account expiry (which would prevent the user from unlocking their expired password with a changed one). Theoretically, you could design/find a system that interacts with your domain to set, impound/encrypt, and manage the account and password expiration of a given set of users, but that would likely be add on software.


    1. Yes I do - MS AD DC

    2. I don’t have a ton of users, but I have a ton of computers. AD keeps them in sync. Plus I can point services like gitea and vCenter at it for even more. Guacamole highly benefits from this arrangement since I can set the password to match the AD password, and all users on all devices subsequently auto-login, even after a password change.

    3. Used to run single domain controller, now I have two (leftover free forever licenses from college). I plan to upgrade them as a tick/tock so I’m not spending a fortune on licensing frequently

    4. With native Windows clients and I believe sssd realmd joins, the default config is to cache the last hash you used to log in. So if you log in regularly to a server it should have an up to date cache should your DC cluster become unavailable. This feature is also used on corporate laptops that need to roam from the building without an always-on VPN. Enterprises will generally also ensure a backup local account is set up (and optionally auto-rotated) in case the domain becomes unavailable in a bad way so that IT can recover your computer.

    5. I used to run in homemade a Free IPA and a MS AD in a cross forest trust when I started ~5-6y ago on the directory stuff. Windows and Mac were joined to AD, Linux was joined to IPA. (I tried to join Mac to IPA but there was only a limited LDAP connector and AD was more painless and less maintenance). One user to rule them all still. IPA has loads of great features - I especially enjoyed setting my shell, sudoers rules, and ssh keys from the directory to be available everywhere instantly.

    But, I had some reliability problems (which may be resolved, I have not followed up) with the update system of IPA at the time, so I ended up burning it down and rejoining all the Linux servers to AD. Since then, the only feature I’ve lost is centralized sudo and ssh keys (shell can be set in AD if you’re clever). sssd handles six key MS group policies using libini, mapping them into relevant PAM policies so you even have some authorization that can be pushed from the DC like in Windows, with some relatively sane defaults.

    I will warn - some MS group policies violate Linux INI spec (especially service definitions and firewall rules) can coredump libini, so you should put your Linux servers in a dedicated OU with their own group policies and limited settings in the default domain policy.



  • As an IT Engineer this concept frankly terrified me and feels like your opening yourself up to a potential zero click attack - such as https://threatpost.com/apple-mail-zero-click-security-vulnerability/165238/

    So my initial answer is an emphatic “please do not the ZIP”. It could be as mundane as a ZIP bomb, or it could explain a vulnerability in the operating system or automatic extraction program. Having a human required to open the ZIP prior to its expansion reduces its attack surface area somewhat (but not eliminates it) because it allows the human to go “huh this ZIP looks funny” if something is off, rather than just dispatching an automated task.

    With that out of the way - what’s your use case with this? There has to be a specific reason your interested in saving a few clips here on one highly specific archive format, but not others like the tar unix archive, 7z, or RAR.





  • I’m probably the overkill case because I have AD+vC and a ton of VMs.

    RPO 24H for main desktop and critical VMs like vCenter, domain controllers, DHCP, DNS, Unifi controller, etc.

    Twice a week for laptops and remote desktop target VMs

    Once a week for everything else.

    Backups are kept: (may be plus or minus a bit)

    • Daily backups for a week
    • Weekly backups for a month
    • Monthly backups for a year
    • Yearly backups for 2-3y

    The software I have (Synology Active Backup) captures data using incremental backups where possible, but if it loses its incremental marker (system restore in windows, change-block tracking in VMware, rsync for file servers), it will generate a full backup and deduplicate (iirc).

    From the many times this has saved me from various bad things happening for various reasons, I want to say the RTO is about 2-6h for a VM to restore and 18 for a desktop to restore from the point at which I decide to go back to a backup.

    Right now my main limitation is my poor quad core Synology is running a little hot on the CPU front, so some of those have farther apart RPOs than I’d like.