All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It’s all very exciting, personally, as someone not responsible for fixing it.

Apparently caused by a bad CrowdStrike update.

Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We’ll see if that changes over the weekend…

  • catch22@programming.dev
    2 months ago

    Interesting how ARPANET (the precursor to the internet) was built to withstand these issues, but companies like Microsoft and Amazon (and a lack of regulation) have completely reversed its original intent. I actually didn’t even notice this since I use Lemmy and have my own internal network running Home Assistant, Synology, Emby, etc…

      • Toribor@corndog.social
        2 months ago

        This is fine as long as you politely ask everyone on the Internet to slow down and stop exploiting new vulnerabilities.

        • Ookami38@sh.itjust.works
          2 months ago

          I think vulnerabilities found count as “something broken”, and the chap you replied to simply did not think that far ahead hahah

          • huginn@feddit.it
            2 months ago

            For real - A cyber security company should basically always be pushing out updates.

            • sugar_in_your_tea@sh.itjust.works
              2 months ago

              Exactly. You don’t know what the vulnerabilities are, but the vendors pushing out updates typically do. So stay on top of updates to limit the attack surface.

              Major releases can wait, security updates should be pushed as soon as they can be proven to not break prod.

            • wreckedcarzz@lemmy.world
              2 months ago

              > always pushing out updates

              Notes: Version bump: Eric is a twat so I removed his name from the listed coder team members on the about window.

              git push --force

              leans back in chair productive day, productive day indeed

      • Hotzilla@sopuli.xyz
        2 months ago

        This is AV, and it’s even possible that this was part of a definitions update (for example, some Windows file deleted as a false positive). You update those daily.

    • rozodru@lemmy.ca
      2 months ago

      yeah someone fucked up here. I mean I know you’re joking but I’ve been in tech for like 20+ years at this point and it was always, always, ALWAYS, drilled into me to never do updates on Friday, never roll anything out to production on Friday. Fridays were generally meant for code reviews, refactoring in test, work on personal projects, raid the company fridge for beer, play CS at the office, whatever just don’t push anything live or update anything.

      And especially now the work week has slimmed down where no one works on Friday anymore so you 100% don’t roll anything out, hell it’s getting to the point now where you just don’t roll anything out on a Thursday afternoon.

          • jedibob5@lemmy.world
            2 months ago

            Is the 4x10 really worth the extra day off? Tbh I’m not sure it would work very well for me… I find just one 10-hour day to be kinda draining, so doing that 4 times a week every week feels like it might just cancel out any benefits of the extra day off.

            • meanmon13@lemmy.zip
              2 months ago

              I am very used to it so I don’t find it draining. I tried 5x8 once and it felt more like working an extra day than getting more time in the afternoon. If that makes sense. I also start early around 7am, so I am only staying a little later than other people

        • rozodru@lemmy.ca
          2 months ago

          sorry :( yeah I, at most, do 3 days in the office now. Fridays are a day off and Mondays mostly everyone just works from home if at all. downtown Toronto on Mondays and Fridays is pretty much dead.

      • Blackmist@feddit.uk
        2 months ago

        Yep, anything done on Friday can enter the world on a Monday.

        I don’t really have any plans most weekends, but I sure as shit don’t plan on spending it fixing Friday’s fuckups.

        • sugar_in_your_tea@sh.itjust.works
          2 months ago

          And honestly, anything that can be done Monday is probably better done on Tuesday. Why start off your week by screwing stuff up?

          We have a team policy to never do externally facing updates on Fridays, and we generally avoid Mondays as well unless it’s urgent. Here’s roughly what each day is for:

          • Monday - urgent patches that were ready on Friday; everyone WFH
          • Tuesday - most releases; work in-office
          • Wed - fixing stuff we broke on Tuesday/planning the next release; work in-office
          • Thu - fixing stuff we broke on Tuesday, closing things out for the week; WFH
          • Fri - documentation, reviews, etc; WFH

          If things go sideways, we come in on Thu to straighten it out, but that almost never happens.
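A schedule like the one above can be enforced by a script instead of by memory. Below is a minimal sketch, assuming a deploy pipeline that consults a policy table before shipping. `ALLOWED`, `may_deploy`, and the three change kinds are hypothetical names made up for illustration, and the mapping of days to kinds is just one reading of the list above, not anyone's actual tooling.

```python
from datetime import date

# Hypothetical policy table: which kinds of change may ship on which weekday.
# date.weekday() numbers Monday as 0 and Sunday as 6.
ALLOWED = {
    0: {"urgent"},                    # Monday: urgent patches only
    1: {"release", "fix", "urgent"},  # Tuesday: most releases
    2: {"fix", "urgent"},             # Wednesday: fix what Tuesday broke
    3: {"fix", "urgent"},             # Thursday: fixes, closing things out
    # Friday, Saturday, Sunday are absent on purpose: nothing ships.
}


def may_deploy(day: date, kind: str = "release") -> bool:
    """Return True if a change of the given kind may ship on that day."""
    return kind in ALLOWED.get(day.weekday(), set())


if __name__ == "__main__":
    today = date.today()
    for kind in ("release", "fix", "urgent"):
        print(f"{today:%A}: {kind} allowed = {may_deploy(today, kind)}")
```

Leaving Friday and the weekend out of the table entirely means the default answer is "no", so a new kind of change has to be allowed explicitly rather than blocked explicitly.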

      • sasquash@sopuli.xyz
        2 months ago

        Actually I was not even joking. I also work in IT and have exactly the same opinion. Friday is for easy stuff!

    • merc@sh.itjust.works
      2 months ago

      You posted this 14 hours ago, which would have made it 4:30 am in Austin, Texas where CrowdStrike is based. You may have felt the effect on Friday, but it’s extremely likely that the person who made the change did it late on a Thursday.

  • ari_verse@lemm.ee
    2 months ago

    A few years ago, when my org got the ask to deploy the CS agent on Linux production servers and I also saw it getting deployed on thousands of Windows and Mac desktops all across, the first thought that came to mind was “massive single point of failure and security threat”, as we were putting all the trust in a single relatively small company that will (has?) become the favorite target of all the bad actors across the planet. How long before it gets into trouble, either because of its own doing or due to others?

    I guess that we now know

  • YTG123@sopuli.xyz
    2 months ago

    >Make a kernel-level antivirus
    >Make it proprietary
    >Don’t test updates… for some reason??

  • boaratio@lemmy.world
    2 months ago

    CrowdStrike: It’s Friday, let’s throw it over the wall to production. See you all on Monday!

  • r00ty@kbin.life
    2 months ago

    My favourite thing has been watching sky news (UK) operate without graphics, trailers, adverts or autocue. Back to basics.

  • jedibob5@lemmy.world
    2 months ago

    Reading into the updates some more… I’m starting to think this might just destroy CrowdStrike as a company altogether. Between the mountain of lawsuits almost certainly incoming and the total destruction of any public trust in the company, I don’t see how they survive this. Just absolutely catastrophic on all fronts.

    • rozodru@lemmy.ca
      2 months ago

      It’s just amateur hour across the board. Were they testing in production? No code review or even a peer review? They roll out on a Friday? It’s like basic level start up company “here’s what not to do” type shit that a junior dev fresh out of university would know. It’s like “explain to the project manager with crayons why you shouldn’t do this” type of shit.

      It just boggles my mind that if you’re rolling out an update to production that there was clearly no testing. There was no review of code cause experts are saying it was the result of poorly written code.

      Regardless, if you’re low level security then apparently you can just boot into safe mode and rename the CrowdStrike folder and that should fix it. Higher level not so much, cause you’re likely on BitLocker which… yeah, don’t get me started on that bullshit.

      regardless I called out of work today. no point. it’s friday, generally nothing gets done on fridays (cause we know better) and especially today nothing is going to get done.

        • catloaf@lemm.ee
          2 months ago

          I’m not sure what you’d expect to be able to do in a safe mode with no disk access.

      • candybrie@lemmy.world
        2 months ago

        Why is it bad to do on a Friday? Based on your last paragraph, I would have thought Friday is probably the best week day to do it.

        • Lightor@lemmy.world
          2 months ago

          Most companies, mine included, try to roll out updates during the middle or start of a week. That way if there are issues the full team is available to address them.

        • rozodru@lemmy.ca
          2 months ago

          Because if you roll out something to production on a Friday, who’s there to fix it on the Saturday and Sunday if it breaks? Friday is the WORST day of the week to roll anything out. You roll out on Tuesday or Wednesday; that way if something breaks you’ve got people around to jump in and fix it.

      • Revan343@lemmy.ca
        2 months ago

        > explain to the project manager with crayons why you shouldn’t do this

        Can’t; the project manager ate all the crayons

    • RegalPotoo@lemmy.world
      2 months ago

      Agreed, this will probably kill them over the next few years unless they can really magic up something.

      They probably don’t get sued - their contracts will have indemnity clauses against exactly this kind of thing, so unless they seriously misrepresented what their product does, this probably isn’t a contract breach.

      If you are running crowdstrike, it’s probably because you have some regulatory obligations and an auditor to appease - you aren’t going to be able to just turn it off overnight, but I’m sure there are going to be some pretty awkward meetings when it comes to contract renewals in the next year, and I can’t imagine them seeing much growth

      • Skydancer@pawb.social
        2 months ago

        Nah. This has happened with every major corporate antivirus product. Multiple times. And the top IT people advising on purchasing decisions know this.

        • SupraMario@lemmy.world
          2 months ago

          Yep. This is just uninformed people thinking this doesn’t happen. It’s been happening since AV was born. It’s not new and this will not kill CS; they’re still king.

      • jedibob5@lemmy.world
        2 months ago

        Don’t most indemnity clauses have exceptions for gross negligence? Pushing out an update this destructive without it getting caught by any quality control checks sure seems grossly negligent.

      • Nachorella@lemmy.sdf.org
        2 months ago

        They can have all the clauses they like but pulling something like this off requires a certain amount of gross negligence that they can almost certainly be held liable for.

        • IsThisAnAI@lemmy.world
          2 months ago

          Whatever you say my man. It’s not like they go through very specific SLA conversations and negotiations to cover this or anything like that.

        • IsThisAnAI@lemmy.world
          2 months ago

          For what? At best it would be a hearing on the challenges of national security with industry.

    • ThrowawaySobriquet@lemmy.world
      2 months ago

      I think you’re on the nose, here. I laughed at the headline, but the more I read the more I see how fucked they are. Airlines. Industrial plants. Fucking governments. This one is big in a way that will likely get used as a case study.

    • Bell@lemmy.world
      2 months ago

      Don’t we blame MS at least as much? How does MS let an update like this push through their Windows Update system? How does an application update make the whole OS unable to boot? Blue screens on Windows have been around for decades, why don’t we have a better recovery system?

      • sandalbucket@lemmy.world
        2 months ago

        CrowdStrike runs at ring 0, effectively as part of the kernel. Like a device driver. There are no safeguards at that level. Extreme testing and diligence are required, because these are the consequences for getting it wrong. This is entirely on CrowdStrike.

    • NaibofTabr@infosec.pub
      2 months ago

      If all the computers stuck in boot loop can’t be recovered… yeah, that’s a lot of cost for a lot of businesses. Add to that all the immediate impact of missed flights and who knows what happening at the hospitals. Nightmare scenario if you’re responsible for it.

      This sort of thing is exactly why you push updates to groups in stages, not to everything all at once.
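For anyone who hasn't seen staged rollouts in practice, here is a minimal sketch of the idea, assuming hosts are grouped into "rings" (a canary group first, then progressively larger groups). `staged_rollout`, `deploy`, and `healthy` are hypothetical names for illustration; nothing here reflects how CrowdStrike or any particular vendor actually ships updates.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    rings: Sequence[Sequence[str]],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> Tuple[List[str], bool]:
    """Deploy ring by ring and halt as soon as a ring looks unhealthy.

    Returns the hosts that received the update and whether the rollout
    reached every ring.
    """
    updated: List[str] = []
    for ring in rings:
        failures = 0
        for host in ring:
            deploy(host)
            updated.append(host)
            if not healthy(host):
                failures += 1
        # Stop before touching the next (larger) ring if this one failed.
        if ring and failures / len(ring) > max_failure_rate:
            return updated, False
    return updated, True
```

With a broken update and a single canary host in the first ring, only that one host is affected and the remaining rings are never touched. The cost is a slower rollout, which is the trade-off a security vendor has to weigh against getting new detections out quickly.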

      • rxxrc@lemmy.ml (OP)
        2 months ago

        Looks like the laptops are able to be recovered with a bit of finagling, so fortunately they haven’t bricked everything.

        And yeah staged updates or even just… some testing? Not sure how this one slipped through.

        • dactylotheca@suppo.fi
          2 months ago

          > Not sure how this one slipped through.

          I’d bet my ass this was caused by terrible practices brought on by suits demanding more “efficient” releases.

          “Why do we do so much testing before releases? Have we ever had any problems before? We’re wasting so much time that I might not even be able to buy another yacht this year”

            • dactylotheca@suppo.fi
              2 months ago

              Certainly not! Or other industries for that matter. It’s a good thing executives everywhere aren’t just concentrating on squeezing the maximum amount of money out of their companies and funneling it to themselves and their buddies on the board.

              Sure, let’s “rightsize” the company by firing 20% of our workforce (but not management!) and raise prices 30%, and demand that the remaining employees maintain productivity at the level it used to be before we fucked things up. Oh and no raises for the plebs, we can’t afford it. Maybe a pizza party? One slice per employee though.

        • Confused_Emus@lemmy.dbzer0.com
          2 months ago

          One of my coworkers, while waiting on hold for 3+ hours with our company’s outsourced helpdesk, noticed after booting into safe mode that the Crowdstrike update had triggered a snapshot that she was able to roll back to and get back on her laptop. So at least that’s a potential solution.

    • Munkisquisher@lemmy.nz
      2 months ago

      Yeah saw that several steel mills have been bricked by this, that’s months and millions to restart

      • gazter@aussie.zone
        2 months ago

        Got a link? I find it hard to believe that a process like that would stop because of a few windows machines not booting.

          • drspod@lemmy.ml
            2 months ago

            Those machines should be airgapped and have no need to run CrowdStrike. If the process controller machines of a steel mill are connected to the internet and installing auto updates then there really is no hope for this world.

        • conciselyverbose@sh.itjust.works
          2 months ago

          There are a lot of heavy manufacturing tools that are controlled and have their interface handled by Windows under the hood.

          They’re not all networked, and some are super old, but a more modernized facility could easily be using a more modern version of Windows and be networked to have flow of materials, etc more tightly integrated into their systems.

          The higher precision your operation, the more useful having much more advanced logs, networked to a central system, becomes in tracking quality control. Imagine after the fact, you can track some 0.1% of batches that are failing more often and look at the per second logs of temperature they were at during the process, and see that there’s a 1° temperature variance between the 30th and 40th minute that wasn’t experienced by the rest of your batches. (Obviously that’s nonsense because I don’t know anything about the actual process of steel manufacturing. But I do know that there’s a lot of industrial manufacturing tooling that’s an application on top of Windows, and the higher precision your output needs to be, the more useful it is to have high quality data every step of the way.)
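To make the log-comparison idea concrete, here is a rough sketch, assuming each batch is recorded as a list of per-minute temperature readings of equal length. The function names and the 1-degree threshold are made up for illustration, and this says nothing about how real steel plants analyse their data.

```python
from statistics import mean
from typing import List, Sequence


def minute_means(batches: Sequence[Sequence[float]]) -> List[float]:
    """Average temperature at each minute across a group of batches."""
    return [mean(readings) for readings in zip(*batches)]


def divergent_minutes(
    failing: Sequence[Sequence[float]],
    passing: Sequence[Sequence[float]],
    threshold: float = 1.0,
) -> List[int]:
    """Minutes where failing batches differ from passing ones by >= threshold."""
    failing_avg = minute_means(failing)
    passing_avg = minute_means(passing)
    return [
        minute
        for minute, (f, p) in enumerate(zip(failing_avg, passing_avg))
        if abs(f - p) >= threshold
    ]
```

The comparison only works if the logs for every batch end up in one place, which is the argument for networking these machines in the first place.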

  • ᕙ(⇀‸↼‶)ᕗ@lemm.ee
    2 months ago

    best day ever. the digitards get a wakeup call. how often have I been lectured by imbeciles on how great whatever dumbo closed source is. “i need photoshop”, “windows powershell and i get work done”, “azure and onedrive and teams…best shit ever”, “go use NT, nobody will use a GNU”.

    yeah well, i hope every windows user would be kept off the interwebs for a year and mac users just burn in hell right away. lazy scum that justifies shitting on society for their own comfort. while everyone needs a driver’s license, dumb fucking parents give tiktok to their kids…idiocracy will have a great election this winter.

      • solomon42069@lemmy.world
        2 months ago

        OK, but people aren’t running Crowdstrike OS. They’re running Microsoft Windows.

        I think that some responsibility should lie with Microsoft - to create an OS that

        1. Recovers gracefully from third party code that bugs out
        2. Doesn’t allow third party software updates to break boot

        I get that there can be unforeseeable bugs, I’m a programmer of over two decades myself. But there are also steps you can take to strengthen your code, and as a Windows user it feels more like their resources are focused on random new shit no one wants instead of on the core stability and reliability of the system.

        It seems to me like third party updates have a lot of control/influence over the OS and that’s all well and good, but the equivalent of a “try/catch” is what they needed here and yet nothing seems to be in place. The OS just boot loops.
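A kernel crash can't literally be caught with an exception handler, so the closest "try/catch" equivalent is a failure counter that survives reboots. The toy model below simulates that idea under loud assumptions: `BootGuard`, its threshold, and the quarantine behaviour are all hypothetical, and this is not a description of how Windows or any real boot loader works.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence, Set


@dataclass
class BootGuard:
    """Toy model of a boot-loop breaker (state would live on disk in reality)."""

    max_failures: int = 3
    failed_boots: int = 0
    quarantined: Set[str] = field(default_factory=set)

    def boot(self, drivers: Sequence[str], load: Callable[[str], None]) -> bool:
        """Attempt one boot; load(driver) raising stands in for a crash."""
        for driver in drivers:
            if driver in self.quarantined:
                continue  # skipped after too many crashes
            try:
                load(driver)
            except Exception:
                self.failed_boots += 1
                if self.failed_boots >= self.max_failures:
                    # Give up on the driver that keeps crashing the boot.
                    self.quarantined.add(driver)
                    self.failed_boots = 0
                return False
        self.failed_boots = 0
        return True
```

The obvious catch is that automatically disabling a security product after a few crashes is itself something an attacker might try to trigger, so a real design would need more thought than this sketch gives it.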

        • EnderMB@lemmy.world
          2 months ago

          It’s not just Windows, it’s affecting services that people that primarily use other OS’s rely on, like Outlook or Federated login.

          In these situations, blame isn’t a thing, because everyone knows that an LSE can happen to anyone at any time. The second you start to throw stones, people will throw them back when something inevitably goes wrong.

          While I do fundamentally agree with you, and believe that the correct outcome should be “how do we improve things so that this never happens again”, it’s hard to attach blame to Microsoft when they’re the ones that have to triage and ensure that communication is met.

          • solomon42069@lemmy.world
            2 months ago

            I reckon it’s hard to attach blame to Microsoft because of the culture of corporate governance and how decisions are made (without experts).

            Tech has become a bunch of walled gardens with absolute secrecy over minor nothings. After 1-2 decades of that, we have a generation of professionals who have no idea how anything works and need to sign up for $5 a month phone app / cloud services just to do basic stuff they could normally do on their own on a PC - they just don’t know how or how to put the pieces together due to inexperience / lack of exposure.

            Whether it’s corporate or government leadership, the lack of understanding of basics in tech is now a liability. It’s allowed corporations like Microsoft to set their own quality standards without any outside regulation while they are entrusted with vital infrastructure and to provide technical advisory, even though they have a clear vested interest there.

        • barsquid@lemmy.world
          2 months ago

          AFAICT Microsoft is busy placing ads on everything and screen logging user activity instead of making a resilient foundation.

          For contrast: I’ve been running Fedora Atomic. I’m sure it is possible to add some kernel mod that completely breaks the system. But if there was a crash on boot, in most situations, I’d be able to roll back to the last working version of everything.
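The rollback idea can be sketched as keeping the previous known-good system image around next to the new one. This is a toy model only: the `Deployments` class and its methods are hypothetical and are not meant to mirror how Fedora Atomic or rpm-ostree are implemented.

```python
from typing import List


class Deployments:
    """Toy model of an image-based system that keeps older deployments."""

    def __init__(self, initial: str, keep: int = 2) -> None:
        self._history: List[str] = [initial]  # oldest first, newest last
        self._keep = keep

    @property
    def current(self) -> str:
        """The deployment that would be booted by default."""
        return self._history[-1]

    def upgrade(self, new: str) -> None:
        """Stage a new deployment, retaining at most `keep` in total."""
        self._history.append(new)
        self._history = self._history[-self._keep:]

    def rollback(self) -> str:
        """Drop the newest deployment and return to the one before it."""
        if len(self._history) < 2:
            raise RuntimeError("no previous deployment to roll back to")
        self._history.pop()
        return self.current
```

Because the old deployment is never modified in place, a bad upgrade costs one reboot into the previous entry rather than a manual repair.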