Google researchers have published a new paper warning that generative AI is ruining vast swaths of the internet with fake content, which is painfully ironic because Google has been hard at work pushing the same technology to its enormous user base.
An LLM is an insanely productive content creator. We can’t say how much of the web is generated by LLMs at any given moment (and that’s ignoring older copy-paste articles), but the share of organic material one wants to prioritise in machine learning shrinks significantly. If this tech isn’t kept from learning on its own output, it predictably falls into a feedback loop, and with each cycle it gets worse.
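To make the feedback loop concrete, here is a toy sketch. It is not a model of how any real LLM is trained, just a resampling experiment under loud assumptions: the "corpus" is a list of distinct items, and "training then generating" is approximated by drawing a same-sized corpus with replacement from the previous one. The function name `simulate_feedback_loop` and its parameters are made up for illustration.

```python
import random


def simulate_feedback_loop(vocab_size=1000, generations=10, seed=0):
    """Return the number of distinct items in the corpus at each generation.

    Assumption: each generation learns only from the previous generation's
    output, modelled here as sampling with replacement.
    """
    rng = random.Random(seed)
    # Generation 0: purely "human" data, every item distinct.
    corpus = list(range(vocab_size))
    diversity = [len(set(corpus))]
    for _ in range(generations):
        # The next corpus is generated entirely from the current one,
        # so anything that is not sampled is lost for good.
        corpus = rng.choices(corpus, k=len(corpus))
        diversity.append(len(set(corpus)))
    return diversity


if __name__ == "__main__":
    for generation, distinct in enumerate(simulate_feedback_loop()):
        print(f"generation {generation}: {distinct} distinct items")
```

In this setup the count of distinct items can never go up, because a resampled corpus only contains items that were already there. That is the "garbage in, garbage out" loop in miniature: rare material drops out first and nothing brings it back unless fresh outside data is mixed in.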
Surprisingly, pre-LLM-boom datasets may well become more valuable than contemporary ones.
I remember reading that from 2021 to 2023, LLMs generated more text than all humans had published combined, so arguably, genuinely human-generated text is going to be a rarity.
Garbage in, garbage out