@imaybemistakenbut

imaybemistakenbut@alien.top · 11 months ago

Cool! Traditionally ML datasets tend to compress and dedupe very well, so depending on the budget I would probably look at an appliance with a compatible software stack that handles compression and dedupe extremely well, then offload to object storage as you scale out.

What you are looking for is a scalable appliance, and I would start by building out a requirements document covering the basic questions: speed, capacity, data delta (growth over time), redundancy and uptime.

Once you dig into these questions, you’ll be asking the right questions about how and what in relation to data flow. The answers will then build out the baseline requirements for the technology stack you need.

When I’m scoping solutions, the destination hardware is always the last question answered. If it is the first question, the solution is doomed from the start.

There is no “cheap” way to get petabyte-level storage. What you would spend on HDDs without dedupe and compression would cover the cost of an appliance that does dedupe and compression. So a mixture of the two is probably the best approach, provided the data can be reduced by a dedupe appliance before it is offloaded to object storage, which keeps the growth rate down.
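To put rough numbers on capacity, data delta and redundancy, here is a minimal Python sketch. The function name, the compound monthly growth model and every parameter are my own assumptions for illustration, not figures from any vendor; plug in whatever your requirements document says.

```python
def projected_capacity_tb(initial_tb, monthly_growth_rate, months,
                          reduction_ratio=1.0, redundancy_overhead=1.0):
    """Estimate raw capacity (TB) needed after `months`.

    Assumptions (hypothetical model, adjust to your environment):
    - data grows by a fixed compound rate each month
    - reduction_ratio is the combined dedupe + compression ratio (2.0 means 2:1)
    - redundancy_overhead is the raw/usable multiplier (1.5 means 50% extra)
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    # Logical (pre-reduction) data after compound growth
    logical_tb = initial_tb * (1 + monthly_growth_rate) ** months
    # What actually lands on disk after dedupe/compression
    stored_tb = logical_tb / reduction_ratio
    # Raw capacity once redundancy is added on top
    return stored_tb * redundancy_overhead


if __name__ == "__main__":
    # Example inputs only: 500 TB today, 3% monthly growth, 3 years out
    no_reduction = projected_capacity_tb(500, 0.03, 36, redundancy_overhead=1.25)
    with_reduction = projected_capacity_tb(500, 0.03, 36, reduction_ratio=3.0,
                                           redundancy_overhead=1.25)
    print(f"raw TB without reduction: {no_reduction:.0f}")
    print(f"raw TB with 3:1 reduction: {with_reduction:.0f}")
```

Comparing the two outputs against current HDD pricing is a quick way to sanity-check whether the appliance pays for itself.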

imaybemistakenbut@alien.top · 11 months ago

What’s the data you’re backing up, and will it dedupe/compress well? How are you backing up software-wise: Veeam/Commvault, or will you be doing standard baseline rsync etc.? Does it need to be an enterprise system that you can fold into the existing backup strategy of the location, or is it going to be separate and utilise secondhand equipment? Is a self-hosted object storage system out of the equation, or are you looking for best bang for buck on HDD purchases?
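If you’re not sure how well the data will dedupe/compress, a rough way to check a sample is sketched below in Python. This is only an approximation under my own assumptions: it uses fixed-size 4 KiB chunks with SHA-256 and zlib from the standard library, whereas real appliances generally use their own chunking and compression, so treat the ratios as a ballpark. The function name and chunk size are hypothetical.

```python
import hashlib
import zlib


def estimate_reduction(data: bytes, chunk_size: int = 4096) -> dict:
    """Rough dedupe/compression estimate for a sample of data.

    Assumption: fixed-size chunking; not how any particular product works.
    """
    if not data:
        return {"dedupe_ratio": 1.0, "compress_ratio": 1.0, "combined_ratio": 1.0}

    # Keep one copy of each distinct chunk, keyed by its SHA-256 digest
    unique = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        unique.setdefault(hashlib.sha256(chunk).digest(), chunk)

    unique_bytes = sum(len(c) for c in unique.values())
    # Compress each unique chunk on its own, as a simple stand-in
    compressed_bytes = sum(len(zlib.compress(c)) for c in unique.values())

    return {
        "dedupe_ratio": len(data) / unique_bytes,
        "compress_ratio": unique_bytes / compressed_bytes,
        "combined_ratio": len(data) / compressed_bytes,
    }


if __name__ == "__main__":
    # Highly repetitive sample: ten identical 4 KiB blocks
    sample = b"A" * 4096 * 10
    print(estimate_reduction(sample))
```

Run it over a representative sample file read in binary mode; data that is already compressed or encrypted will show ratios close to 1.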

imaybemistakenbut@alien.top · 1 year ago

Honestly, for my home lab I’ve gone down the MikroTik path with a CRS309-1G-8S+IN as my core, a CRS326-24G-2S+RM for my edge and a couple of unmanaged 2.5Gb switches with 10Gb uplinks. I would warn against MikroTik if you are looking at layer 3 routing on the switch, though.

I say this from experience. They don’t have enough compute power for layer 3 to really work well at high data rates. Fine for hosting external web services etc., but if you’re looking for layer 3 with performance in mind they aren’t a good option.

Map out your plan from a network architecture perspective and work out whether you want layer 3 routing to happen on the switch or whether you want to offload it to your firewall.

After I realised the layer 3 performance was a bit lousy, I redesigned my network with a virtual Sophos firewall that has 10Gb downlinks and uplinks plus a couple of 1Gb links, and threw a lot of resources at the VM so it could perform layer 3 routing duties throughout my home lab.

It’s also a better design (having your firewall do your routing) from a control and inspection perspective, as the firewall is inexpensive and gives quite a lot of flexibility with regard to traffic overview.