• 1 Post
  • 11 Comments
Joined 1 year ago
Cake day: June 13th, 2023

  • The project creator doesn’t mince words:

    wordfreq was built by collecting a whole lot of text in a lot of languages. That used to be a pretty reasonable thing to do, and not the kind of thing someone would be likely to object to. Now, the text-slurping tools are mostly used for training generative AI, and people are quite rightly on the defensive. If someone is collecting all the text from your books, articles, Web site, or public posts, it’s very likely because they are creating a plagiarism machine that will claim your words as its own.

    So I don’t want to work on anything that could be confused with generative AI, or that could benefit generative AI.

    OpenAI and Google can collect their own damn data. I hope they have to pay a very high price for it, and I hope they’re constantly cursing the mess that they made themselves.


  • Okay, but (as per the article) the allegedly-“top” court that made the ruling, the European Union’s General Court (EGC), is not the same as the court that the lawsuit would be appealed to, the European Court of Justice (ECJ). How can the EGC be the “top” court if the ECJ is above it?

    Besides, the bottom line is that saying “the top court ruled on this” strongly implies that it’s a final decision, but that’s not the case here. Regardless of the details of which court does what, that’s misleading and therefore clickbait. Don’t write headlines telling me it’s hopeless when there’s actually hope!



  • In general, you’re not wrong in your summary of how the Web developed. The problem, though, is that you seem to be assuming that since the Web did develop that way, it had to develop that way. I disagree: I think other possibilities existed and might have been viable or even dominant if the dice of fate/random chance had happened to land differently. (And I think they would’ve been much more likely to be viable or even dominant if some of the regulatory environment had been different, e.g. if residential ISPs hadn’t been allowed to get away with things like drastically asymmetric connections and prohibiting users from running servers. More enforcement of accessibility and standards compliance, instead of tolerating companies deliberately abusing things like Flash and JavaScript to unduly restrict users, would’ve also gone a long way.)

    and make it look/function the same across different screens and different brands of computers.

    That was not only totally optional, but also arguably considered harmful. HTML was intended to leave presentation up to the client to a certain extent, by design. Megalomaniacal marketers and graphic designers demanding to have pixel-perfect control and doing a bunch of dirty hacks (e.g. abusing <table> for page layout instead of tabular data) to achieve it were fundamentally Doing It Wrong.

    But I do wonder if anyone is thinking about how foss replacements and competition will gain any ground because honestly they either pay the bills with donations and ads, or they charge a subscription fee because these things cost money to run.

    Or they implement a distributed architecture that offloads the bandwidth and storage costs to users directly, à la BitTorrent, IPFS, Freenet, etc.