I’ve been looking into self-hosting LLMs or Stable Diffusion models using something like LocalAI and/or Ollama and LibreChat.

Some questions to get a nice discussion going:

  • Any of you have experience with this?
  • What are your motivations?
  • What are you using in terms of hardware?
  • Considerations regarding energy efficiency and associated costs?
  • What about renting a GPU? Privacy implications?
  • rufus@discuss.tchncs.de · 1 month ago

    Quite a few AI questions have come up in selfhosted over the last few days…

    Here are some more communities I’m subscribed to:

    And a few inactive ones on lemmy.intai.tech

    I’m using KoboldCpp and Ollama. KoboldCpp is really awesome. In terms of hardware it’s an old PC with lots of RAM but no graphics card, so it’s quite slow for me. I occasionally rent a cloud GPU instance on runpod.io. I’m not doing anything fancy: mainly role play and recreational stuff, and now and then I ask it for creative ideas, a translation, or to re-word or draft an unimportant text/email.
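    For anyone curious, scripting against Ollama’s local HTTP API boils down to something like this (a rough sketch; the model name is just a placeholder for whatever you’ve pulled, and 11434 is Ollama’s default port):

    ```python
    # Minimal sketch: ask a locally running Ollama server for a completion.
    # Assumes a model has been pulled first, e.g. `ollama pull llama3`
    # ("llama3" here is just a placeholder).
    import json
    import urllib.request

    def ask_ollama(prompt: str, model: str = "llama3") -> str:
        payload = {"model": model, "prompt": prompt, "stream": False}
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",  # Ollama's default endpoint
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

    print(ask_ollama("Re-word this politely: I need the report today."))
    ```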

    I’ve tried coding, summarizing and other stuff, but the performance of current AI isn’t good enough for my everyday tasks.

    • Unforeseen@sh.itjust.works · 1 month ago

      Thanks for the post, and I really appreciate you sharing the other communities. I think this is a great way to grow Lemmy and create discoverability for niche communities; I’ll keep that in mind for future opportunities.

  • Audalin@lemmy.world · 1 month ago

    I’ve been using llama.cpp, whisper.cpp and Stable Diffusion for a long while (most often the first one). My “hub” is a collection of bash scripts and a running SSH server.
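    If anyone wants a picture of what those scripts reduce to: llama.cpp ships a small HTTP server (llama-server) with an OpenAI-compatible endpoint, so the LLM part can be driven like this (a sketch, not my actual setup; 8080 is llama-server’s default port and the prompt is arbitrary):

    ```python
    # Minimal sketch: talk to a local llama-server via the OpenAI client.
    # Assumes `llama-server -m some-model.gguf` is already running; it serves
    # an OpenAI-compatible API on localhost:8080 by default, no real key needed.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    reply = client.chat.completions.create(
        model="local",  # llama-server answers with whatever model it was started with
        messages=[{"role": "user", "content": "Translate 'guten Morgen' to English."}],
    )
    print(reply.choices[0].message.content)
    ```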

    I typically use LLMs for translation, interactive technical troubleshooting, advice on obscure topics, sometimes coding, sometimes mathematics (though local models are mostly terrible for this), sometimes just talking. Also music generation with ChatMusician.

    I use the hardware I already have - a 16GB AMD card (using ROCm) and some DDR5 RAM. ROCm can be tricky to set up for various libraries and inference engines, but once it’s configured it just works. I don’t rent hardware - I don’t want any data to leave my machine.
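    For reference, a quick sanity check that a ROCm build of PyTorch actually sees the card (ROCm builds reuse the “cuda” device name, so the usual queries apply):

    ```python
    # Quick sanity check for a ROCm PyTorch install.
    # ROCm builds of PyTorch reuse the "cuda" device name, so standard
    # CUDA queries work; torch.version.hip is only set on ROCm builds.
    import torch

    print("HIP/ROCm build:", torch.version.hip)      # None on CUDA-only builds
    print("GPU visible:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))
        x = torch.randn(1024, 1024, device="cuda")   # tiny smoke test
        print("Matmul OK:", (x @ x).shape)
    ```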

    My use isn’t intensive enough to warrant measuring energy costs.