Major IT outage affecting banks, airlines, media outlets across the world

rxxrc@lemmy.ml · edit-2 2 months ago

catch22@programming.dev · 2 months ago

Interesting how ARPANET (the internet) was built to withstand these issues, but companies like Microsoft and Amazon (and no regulation) have completely reversed its original intent. I actually didn’t even notice this since I use Lemmy, and have my own internal network running Home Assistant, Synology, Emby, etc.

UncleArthur@lemmy.world · 2 months ago

Annoyingly, my laptop seems to be working perfectly.

Valmond@lemmy.world · 2 months ago

That’s the burden when you run Arch, right?

Damage@slrpnk.net · 2 months ago

lol he said it’s working

jaybone@lemmy.world · 2 months ago

He said it’s working annoyingly.

sasquash@sopuli.xyz · 2 months ago

never do updates on a Friday.

spyd3r@sh.itjust.works · 2 months ago

Never update unless something is broken.

Toribor@corndog.social · 2 months ago

This is fine as long as you politely ask everyone on the Internet to slow down and stop exploiting new vulnerabilities.

Ookami38@sh.itjust.works · 2 months ago

I think vulnerabilities found count as “something broken” and the chap you replied to simply did not think that far ahead hahah

huginn@feddit.it · 2 months ago

For real - A cyber security company should basically always be pushing out updates.

sugar_in_your_tea@sh.itjust.works · 2 months ago

Exactly. You don’t know what the vulnerabilities are, but the vendors pushing out updates typically do. So stay on top of updates to limit the attack surface.

Major releases can wait, security updates should be pushed as soon as they can be proven to not break prod.

wreckedcarzz@lemmy.world · 2 months ago

always pushing out updates

Notes: Version bump: Eric is a twat so I removed his name from the listed coder team members on the about window.

git push --force

leans back in chair productive day, productive day indeed

Passerby6497@lemmy.world · 2 months ago

That’s advice so smart you’re guaranteed to have massive security holes.

iknowitwheniseeit@lemmynsfw.com · 2 months ago

BTW, I use Arch.

Nachorella@lemmy.sdf.org · 2 months ago

If it was Arch you’d update once every 15 minutes whether anything’s broken or not.

sugar_in_your_tea@sh.itjust.works · 2 months ago

I use Tumbleweed, so I only get updates once/day, twice if something explodes. I used to use Arch, so my update cycle has lengthened from 1-2x/day to 1-2x/week, which is so much better.

wreckedcarzz@lemmy.world · 2 months ago

gets two update notifications

Ah, must be explosion Wednesday

Nachorella@lemmy.sdf.org · 2 months ago

I really like the tumbleweed method, seems like the best compromise between arch and debian style updates.

Hotzilla@sopuli.xyz · 2 months ago

This is AV, and it’s even possible that this was part of the definitions (for example, some Windows file deleted as a false positive). You update those daily.

rozodru@lemmy.ca · 2 months ago

yeah someone fucked up here. I mean I know you’re joking but I’ve been in tech for like 20+ years at this point and it was always, always, ALWAYS, drilled into me to never do updates on Friday, never roll anything out to production on Friday. Fridays were generally meant for code reviews, refactoring in test, work on personal projects, raid the company fridge for beer, play CS at the office, whatever just don’t push anything live or update anything.

And especially now the work week has slimmed down where no one works on Friday anymore so you 100% don’t roll anything out, hell it’s getting to the point now where you just don’t roll anything out on a Thursday afternoon.

Trailblazing Braille Taser@lemmy.dbzer0.com · 2 months ago

And especially now the work week has slimmed down where no one works on Friday anymore

Excuse me, what now? I didn’t get that memo.

meanmon13@lemmy.zip · 2 months ago

Yeah it’s great :-) 4 10hr shifts and every weekend is a 3 day weekend

jedibob5@lemmy.world · edit-2 2 months ago

Is the 4x10 really worth the extra day off? Tbh I’m not sure it would work very well for me… I find just one 10-hour day to be kinda draining, so doing that 4 times a week every week feels like it might just cancel out any benefits of the extra day off.

meanmon13@lemmy.zip · 2 months ago

I am very used to it so I don’t find it draining. I tried 5x8 once and it felt more like working an extra day than getting more time in the afternoon. If that makes sense. I also start early around 7am, so I am only staying a little later than other people

rozodru@lemmy.ca · 2 months ago

sorry :( yeah I, at most, do 3 days in the office now. Fridays are a day off and Mondays mostly everyone just works from home if at all. downtown Toronto on Mondays and Fridays is pretty much dead.

Blackmist@feddit.uk · 2 months ago

Yep, anything done on Friday can enter the world on a Monday.

I don’t really have any plans most weekends, but I sure as shit don’t plan on spending it fixing Friday’s fuckups.

sugar_in_your_tea@sh.itjust.works · 2 months ago

And honestly, anything that can be done Monday is probably better done on Tuesday. Why start off your week by screwing stuff up?

We have a team policy to never do externally facing updates on Fridays, and we generally avoid Mondays as well unless it’s urgent. Here’s roughly what each day is for:

Monday - urgent patches that were ready on Friday; everyone WFH
Tuesday - most releases; work in-office
Wed - fixing stuff we broke on Tuesday/planning the next release; work in-office
Thu - fixing stuff we broke on Tuesday, closing things out for the week; WFH
Fri - documentation, reviews, etc; WFH

If things go sideways, we come in on Thu to straighten it out, but that almost never happens.
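
For anyone who wants to enforce something like this in a deploy script, here is a minimal sketch of a release-day gate. The policy table and the names `RELEASE_POLICY` and `may_release` are made up for illustration, and I'm assuming that "fixing stuff we broke" on Wed/Thu means fix releases are allowed on those days. It is not our actual tooling.

```python
from datetime import date

# Hypothetical policy table loosely based on the schedule above.
# date.weekday() numbering: Monday=0 ... Sunday=6.
RELEASE_POLICY = {
    0: "urgent-only",  # Monday: only urgent patches
    1: "open",         # Tuesday: most releases
    2: "fixes-only",   # Wednesday: fixes for Tuesday's release (assumption)
    3: "fixes-only",   # Thursday: fixes for Tuesday's release (assumption)
    4: "closed",       # Friday: no externally facing updates
}


def may_release(day: date, kind: str = "normal") -> bool:
    """Return True if a release of the given kind is allowed on `day`.

    kind is one of "normal", "fix" or "urgent" (hypothetical labels).
    Days missing from the table (the weekend) are treated as closed.
    """
    policy = RELEASE_POLICY.get(day.weekday(), "closed")
    if policy == "open":
        return True
    if policy == "urgent-only":
        return kind == "urgent"
    if policy == "fixes-only":
        return kind in ("fix", "urgent")
    return False
```

A CI job could call `may_release(date.today(), kind)` and refuse to continue when it returns False.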

sasquash@sopuli.xyz · 2 months ago

Actually I was not even joking. I also work in IT and have exactly the same opinion. Friday is for easy stuff!

merc@sh.itjust.works · 2 months ago

You posted this 14 hours ago, which would have made it 4:30 am in Austin, Texas where CrowdStrike is based. You may have felt the effect on Friday, but it’s extremely likely that the person who made the change did it late on a Thursday.

ari_verse@lemm.ee · 2 months ago

A few years ago, when my org got the ask to deploy the CS agent on Linux production servers and I also saw it getting deployed on thousands of Windows and Mac desktops all across, the first thought that came to mind was “massive single point of failure and security threat”, as we were putting all the trust in a single relatively small company that will (has?) become the favorite target of all the bad actors across the planet. How long before it gets into trouble, either because of its own doing or due to others?

I guess that we now know

ansiz@lemmy.world · 2 months ago

All of the security vendors do it over enough time. McAfee used to be the king of them.

https://www.zdnet.com/article/defective-mcafee-update-causes-worldwide-meltdown-of-xp-pcs/

https://www.bleepingcomputer.com/news/security/trend-micro-antivirus-modified-windows-registry-by-mistake-how-to-fix/

https://www.techradar.com/news/microsoft-releases-fix-for-botched-windows-defender-update-but-its-still-facing-problems

SupraMario@lemmy.world · 2 months ago

No bad actors did this, and security goes in fads. Crowdstrike is king right now, just as McAfee/Trellix was in the past. If you want to run around without edr/xdr software be my guest.

Saik0@lemmy.saik0.com · 2 months ago

If you want to run around without edr/xdr software be my guest.

I don’t think anyone is saying that… But picking programs that your company has visibility into is a good idea. We use Wazuh. I get to control when updates are rolled out. It’s not a massive shit show when the vendor rolls out the update globally without sufficient internal testing. I can stagger the rollout as I see fit.

SupraMario@lemmy.world · 2 months ago

You can do this with CS as well, but the dumbasses were pushing major updates via channel files, which aren’t for that. They tried to squeak by without putting out a major update via the sensor updates, which you can control. Basically they fucked up their own structure because a bunch of people were complaining and more than likely management got involved and overrode best practices.

YTG123@sopuli.xyz · 2 months ago

>Make a kernel-level antivirus
>Make it proprietary
>Don’t test updates… for some reason??

boaratio@lemmy.world · 2 months ago

CrowdStrike: It’s Friday, let’s throw it over the wall to production. See you all on Monday!

misk@sopuli.xyz · 2 months ago

My work PC is affected. Nice!

wreckedcarzz@lemmy.world · 2 months ago

Plot twist: you’re head of IT

R00bot@lemmy.blahaj.zone · 2 months ago

Same! Got to log off early 😎

Munkisquisher@lemmy.nz · 2 months ago

Dammit, hit us at 5pm on Friday in NZ

BigRedUndead@sh.itjust.works · 2 months ago

4:00PM here in Aus. Absolutely perfect for an early Friday knockoff.

Magnolia_@lemmy.ca · 2 months ago

Noice!

r00ty@kbin.life · 2 months ago

My favourite thing has been watching sky news (UK) operate without graphics, trailers, adverts or autocue. Back to basics.

jedibob5@lemmy.world · 2 months ago

Reading into the updates some more… I’m starting to think this might just destroy CrowdStrike as a company altogether. Between the mountain of lawsuits almost certainly incoming and the total destruction of any public trust in the company, I don’t see how they survive this. Just absolutely catastrophic on all fronts.

rozodru@lemmy.ca · 2 months ago

It’s just amateur hour across the board. Were they testing in production? No code review or even a peer review? They roll out on a Friday? It’s like basic level start up company “here’s what not to do” type shit that a junior dev fresh out of university would know. It’s like “explain to the project manager with crayons why you shouldn’t do this” type of shit.

It just boggles my mind that if you’re rolling out an update to production that there was clearly no testing. There was no review of code cause experts are saying it was the result of poorly written code.

Regardless, if you’re low level security then apparently you can just boot into safe mode and rename the CrowdStrike folder and that should fix it. Higher level, not so much, cause you’re likely on BitLocker which… yeah, don’t get me started on that bullshit.

regardless I called out of work today. no point. it’s friday, generally nothing gets done on fridays (cause we know better) and especially today nothing is going to get done.

skittle07crusher@sh.itjust.works · 2 months ago

Was it not possible for MS to design their safe mode to still “work” when Bitlocker was enabled? Seems strange.

catloaf@lemm.ee · 2 months ago

I’m not sure what you’d expect to be able to do in a safe mode with no disk access.

candybrie@lemmy.world · 2 months ago

Why is it bad to do on a Friday? Based on your last paragraph, I would have thought Friday is probably the best week day to do it.

Lightor@lemmy.world · 2 months ago

Most companies, mine included, try to roll out updates during the middle or start of the week. That way if there are issues the full team is available to address them.

rozodru@lemmy.ca · 2 months ago

Because if you roll out something to production on a Friday, who’s there to fix it on the Saturday and Sunday if it breaks? Friday is the WORST day of the week to roll anything out. You roll out on Tuesday or Wednesday; that way if something breaks you’ve got people around to jump in and fix it.

debil@lemmy.world · 2 months ago

And hence the term read-only Friday.

Revan343@lemmy.ca · 2 months ago

explain to the project manager with crayons why you shouldn’t do this

Can’t; the project manager ate all the crayons

RegalPotoo@lemmy.world · 2 months ago

Agreed, this will probably kill them over the next few years unless they can really magic up something.

They probably don’t get sued - their contracts will have indemnity clauses against exactly this kind of thing, so unless they seriously misrepresented what their product does, this probably isn’t a contract breach.

If you are running crowdstrike, it’s probably because you have some regulatory obligations and an auditor to appease - you aren’t going to be able to just turn it off overnight, but I’m sure there are going to be some pretty awkward meetings when it comes to contract renewals in the next year, and I can’t imagine them seeing much growth

Skydancer@pawb.social · 2 months ago

Nah. This has happened with every major corporate antivirus product. Multiple times. And the top IT people advising on purchasing decisions know this.

SupraMario@lemmy.world · 2 months ago

Yep. This is just uninformed people thinking this doesn’t happen. It’s been happening since av was born. It’s not new and this will not kill CS they’re still king.

jedibob5@lemmy.world · edit-2 2 months ago

Don’t most indemnity clauses have exceptions for gross negligence? Pushing out an update this destructive without it getting caught by any quality control checks sure seems grossly negligent.

IsThisAnAI@lemmy.world · 2 months ago

What lawsuits do you think are going to happen?

Nachorella@lemmy.sdf.org · 2 months ago

They can have all the clauses they like but pulling something like this off requires a certain amount of gross negligence that they can almost certainly be held liable for.

IsThisAnAI@lemmy.world · 2 months ago

Whatever you say my man. It’s not like they go through very specific SLA conversations and negotiations to cover this or anything like that.

Nachorella@lemmy.sdf.org · 2 months ago

I forgot that only people you have agreements with can sue you. This is why Boeing hasn’t been sued once recently for their own criminal negligence.

IsThisAnAI@lemmy.world · 2 months ago

👌👍

Cryophilia@lemmy.world · 2 months ago

Forget lawsuits, they’re going to be in front of congress for this one

IsThisAnAI@lemmy.world · 2 months ago

For what? At best it would be a hearing on the challenges of national security with industry.

ThrowawaySobriquet@lemmy.world · 2 months ago

I think you’re on the nose, here. I laughed at the headline, but the more I read the more I see how fucked they are. Airlines. Industrial plants. Fucking governments. This one is big in a way that will likely get used as a case study.

Cryophilia@lemmy.world · 2 months ago

The London Stock Exchange went down. They’re fukd.

Wooki@lemmy.world · 2 months ago

Testing in production will do that

This is fine🔥🐶☕🔥@lemmy.world · 2 months ago

Not everyone is fortunate enough to have a separate testing environment, you know? Manglement has to cut cost somewhere.

Blisterexe@lemmy.zip · 2 months ago

Manglement is the good term lmao

Bell@lemmy.world · 2 months ago

Don’t we blame MS at least as much? How does MS let an update like this push through their Windows Update system? How does an application update make the whole OS unable to boot? Blue screens on Windows have been around for decades, why don’t we have a better recovery system?

wizardbeard@lemmy.dbzer0.com · edit-2 2 months ago

This didn’t go through Windows Update. It went through the CrowdStrike software directly.

sandalbucket@lemmy.world · 2 months ago

Crowdstrike runs at ring 0, effectively as part of the kernel. Like a device driver. There are no safeguards at that level. Extreme testing and diligence is required, because these are the consequences for getting it wrong. This is entirely on crowdstrike.

NaibofTabr@infosec.pub · 2 months ago

If all the computers stuck in boot loop can’t be recovered… yeah, that’s a lot of cost for a lot of businesses. Add to that all the immediate impact of missed flights and who knows what happening at the hospitals. Nightmare scenario if you’re responsible for it.

This sort of thing is exactly why you push updates to groups in stages, not to everything all at once.
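
To make the staged idea concrete, here is a rough sketch, not anything CrowdStrike or any other vendor actually runs. The function name `staged_rollout`, the stage sizes and the failure threshold are all assumptions picked for illustration; `deploy` stands in for whatever pushes the update to one host and reports whether it came back healthy.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),
    max_failure_rate: float = 0.02,
) -> list[str]:
    """Deploy to progressively larger groups and halt if a stage fails too often.

    Returns the hosts that were updated successfully before any halt.
    """
    updated: list[str] = []
    done = 0  # number of hosts already attempted
    for fraction in stage_fractions:
        # Cumulative number of hosts that should have been attempted by now.
        target = max(1, int(len(hosts) * fraction))
        batch = hosts[done:target]
        if not batch:
            continue
        failures = 0
        for host in batch:
            if deploy(host):
                updated.append(host)
            else:
                failures += 1
        done = target
        if failures / len(batch) > max_failure_rate:
            # Too many failures in this stage: stop before touching more hosts.
            break
    return updated
```

With 100 hosts and the default stages, a bad update that breaks every machine is stopped after the first host instead of reaching all 100.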

rxxrc@lemmy.ml · 2 months ago

Looks like the laptops are able to be recovered with a bit of finagling, so fortunately they haven’t bricked everything.

And yeah staged updates or even just… some testing? Not sure how this one slipped through.

dactylotheca@suppo.fi · 2 months ago

Not sure how this one slipped through.

I’d bet my ass this was caused by terrible practices brought on by suits demanding more “efficient” releases.

“Why do we do so much testing before releases? Have we ever had any problems before? We’re wasting so much time that I might not even be able to buy another yacht this year”

GoofSchmoofer@lemmy.world · 2 months ago

At least nothing like this happens in the airline industry

dactylotheca@suppo.fi · 2 months ago

Certainly not! Or other industries for that matter. It’s a good thing executives everywhere aren’t just concentrating on squeezing the maximum amount of money out of their companies and funneling it to themselves and their buddies on the board.

Sure, let’s “rightsize” the company by firing 20% of our workforce (but not management!) and raise prices 30%, and demand that the remaining employees maintain productivity at the level it used to be before we fucked things up. Oh and no raises for the plebs, we can’t afford it. Maybe a pizza party? One slice per employee though.

Confused_Emus@lemmy.dbzer0.com · 2 months ago

One of my coworkers, while waiting on hold for 3+ hours with our company’s outsourced helpdesk, noticed after booting into safe mode that the Crowdstrike update had triggered a snapshot that she was able to roll back to and get back on her laptop. So at least that’s a potential solution.

Munkisquisher@lemmy.nz · 2 months ago

Yeah saw that several steel mills have been bricked by this, that’s months and millions to restart

gazter@aussie.zone · 2 months ago

Got a link? I find it hard to believe that a process like that would stop because of a few windows machines not booting.

This is fine🔥🐶☕🔥@lemmy.world · 2 months ago

a few windows machines with controller application installed

That’s the real kicker.

drspod@lemmy.ml · 2 months ago

Those machines should be airgapped and no need to run Crowdstrike on them. If the process controller machines of a steel mill are connected to the internet and installing auto updates then there really is no hope for this world.

This is fine🔥🐶☕🔥@lemmy.world · 2 months ago

But daddy microshoft says i gotta connect the system to the internet uwu

wizardbeard@lemmy.dbzer0.com · 2 months ago

No, regulatory auditors have boxes that need checking, regardless of the reality of the technical infrastructure.

Munkisquisher@lemmy.nz · 2 months ago

I work in an environment where the workstations aren’t on the Internet; there’s a separate network. There’s still a need for antivirus, and we were hit with BSODs yesterday.

conciselyverbose@sh.itjust.works · 2 months ago

There are a lot of heavy manufacturing tools that are controlled and have their interface handled by Windows under the hood.

They’re not all networked, and some are super old, but a more modernized facility could easily be using a more modern version of Windows and be networked to have flow of materials, etc more tightly integrated into their systems.

The higher precision your operation, the more useful having much more advanced logs, networked to a central system, becomes in tracking quality control. Imagine after the fact, you can track some .1% of batches that are failing more often and look at the per second logs of temperature they were at during the process, and see that there’s 1° temperature variance between the 30th and 40th minute that wasn’t experienced by the rest of your batches. (Obviously that’s nonsense because I don’t know anything about the actual process of steel manufacturing. But I do know that there’s a lot of industrial manufacturing tooling that’s an application on top of Windows, and the higher precision your output needs to be, the more useful it is to have high quality data every step of the way.)

ᕙ(⇀‸↼‶)ᕗ@lemm.ee · 2 months ago

best day ever. the digitards get a wakeup call. how often have i been lectured by imbeciles about how great whatever dumbo closed source is. “i need photoshop”, “windows powershell and i get work done”, “azure and onedrive and teams…best shit ever”, “go use NT, nobody will use a GNU”.

yeah well, i hope every windows user would be kept off the interwebs for a year and mac users just burn in hell right away. lazy scum that justifies shitting on society for their own comfort. while everyone needs a drivers license, dumb fucking parents give tiktok to their kids…idiocracy will have a great election this winter.

rottingleaf@lemmy.world · 2 months ago

Servers on Windows? Even domain controllers can be Linux-based.

bdonvr@thelemmy.club · 2 months ago

Huh, so that’s why the office couldn’t order pizza last night lmfao

kamenoko@sh.itjust.works · 2 months ago

AWS No!!!

Oh wait it’s not them for once.

uuhhhhmmmm@sh.itjust.works · 2 months ago

I think we’re getting a lot of pictures for !pbsod@lemmy.ohaa.xyz

2 months ago

And subscribed!

solomon42069@lemmy.world · 2 months ago

Why is no one blaming Microsoft? It’s their non resilient OS that crashed.

blackn1ght@feddit.uk · 2 months ago

Probably because it’s a Crowdstrike issue, they’ve pushed a bad update.

solomon42069@lemmy.world · 2 months ago

OK, but people aren’t running Crowdstrike OS. They’re running Microsoft Windows.

I think that some responsibility should lie with Microsoft - to create an OS that

Recovers gracefully from third party code that bugs out
Doesn’t allow third party software updates to break boot

I get that there can be unforeseeable bugs, I’m a programmer of over two decades myself. But there are also steps you can take to strengthen your code, and as a Windows user it feels more like their resources are focused on random new shit no one wants instead of on the core stability and reliability of the system.

It seems to me like third party updates have a lot of control/influence over the OS and that’s all well and good, but the equivalent of a “Try and Catch” is what they needed here and yet nothing seems to be in place. The OS just boot loops.
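
Just to sketch what I mean by a “Try and Catch” at boot: count consecutive failed boots and fall back to the last version that worked. This is a toy model and not how Windows or any real boot loader is implemented; `BootState`, `choose_driver`, `record_boot_result` and the threshold of two failed boots are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BootState:
    """Hypothetical state that would be persisted across reboots."""
    failed_boots: int = 0      # consecutive boots that did not complete
    quarantined: bool = False  # True once the latest driver has been set aside


def choose_driver(state: BootState, max_failed_boots: int = 2) -> str:
    """Decide which third party driver version to load on this boot."""
    if state.failed_boots >= max_failed_boots:
        # The "catch": stop retrying the version that keeps crashing.
        state.quarantined = True
    return "last_known_good" if state.quarantined else "latest"


def record_boot_result(state: BootState, succeeded: bool) -> None:
    """Update the failure counter once the boot has finished or crashed."""
    state.failed_boots = 0 if succeeded else state.failed_boots + 1
```

After two crashed boots in a row, the third boot loads the last known good version instead of looping forever, and the bad update stays quarantined until someone clears the flag.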

EnderMB@lemmy.world · edit-2 2 months ago

It’s not just Windows, it’s affecting services that people that primarily use other OS’s rely on, like Outlook or Federated login.

In these situations, blame isn’t a thing, because everyone knows that an LSE can happen to anyone at any time. The second you start to throw stones, people will throw them back when something inevitably goes wrong.

While I do fundamentally agree with you, and believe that the correct outcome should be “how do we improve things so that this never happens again”, it’s hard to attach blame to Microsoft when they’re the ones that have to triage and ensure that communication is met.

solomon42069@lemmy.world · edit-2 2 months ago

I reckon it’s hard to attach blame to Microsoft because of the culture of corporate governance and how decisions are made (without experts).

Tech has become a bunch of walled gardens with absolute secrecy over minor nothings. After 1-2 decades of that, we have a generation of professionals who have no idea how anything works and need to sign up for $5 a month phone app / cloud services just to do basic stuff they could normally do on their own on a PC - they just don’t know how or how to put the pieces together due to inexperience / lack of exposure.

Whether it’s corporate or government leadership, the lack of understanding of basics in tech is now a liability. It’s allowed corporations like Microsoft to set their own quality standards without any outside regulation while they are entrusted with vital infrastructure and to provide technical advisory, even though they have a clear vested interest there.

lanolinoil@lemmy.world · 2 months ago

banks wouldn’t use something that black box. just trust me bro wouldn’t be a good pitch

catloaf@lemm.ee · 2 months ago

If you trust banks that much, I have very bad news for you.

barsquid@lemmy.world · 2 months ago

AFAICT Microsoft is busy placing ads on everything and screen logging user activity instead of making a resilient foundation.

For contrast: I’ve been running Fedora Atomic. I’m sure it is possible to add some kernel mod that completely breaks the system. But if there was a crash on boot, in most situations, I’d be able to roll back to the last working version of everything.
