This blog post is also available in German

With llama.cpp, Georgi Gerganov has greatly simplified the process of running LLMs locally. There are also a number of other tools and frameworks that make this possible, such as gpt4all. The barrier to entry has been lowered significantly in other areas as well, for instance with whisper.cpp, also from Georgi. If you want to hear about it in his own words, have a listen to this Changelog episode.

llama.cpp

Now let’s get started with the guide to trying out an LLM locally:

git clone [email protected]:ggerganov/llama.cpp.git
cd llama.cpp
make
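As an aside, if a BLAS library is installed, llama.cpp can (at the time of writing) also be built with OpenBLAS support, which can speed up prompt processing somewhat. The flag name below is taken from my reading of the project's Makefile, so double-check it against your checkout:

```shell
# optional: rebuild with OpenBLAS support for faster prompt processing
# (requires an OpenBLAS development package, e.g. libopenblas-dev)
make clean
make LLAMA_OPENBLAS=1
```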

First, you need a suitable model, ideally in ggml format. ggml is a tensor library written in C that makes it possible to run LLMs on the CPU alone. On Intel and AMD processors this is relatively slow, however. Too slow for my taste, but it can be done with some patience. If I have understood correctly, it runs considerably faster on M1 Macs because the CPU's AI acceleration can be used there.
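If a model is only available as original PyTorch checkpoints, it can be converted and quantized with the tooling that ships with llama.cpp. A rough sketch follows; the script name, quantize invocation, and type code reflect the llama.cpp version current at the time of writing, and the paths are placeholders:

```shell
# convert the PyTorch checkpoint to ggml format (f16)
python3 convert-pth-to-ggml.py /path/to/model/7B/ 1

# quantize to 4 bits to cut memory usage drastically
# (the trailing 2 selects the q4_0 quantization type)
./quantize /path/to/model/7B/ggml-model-f16.bin \
           /path/to/model/7B/ggml-model-q4_0.bin 2
```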

Anyway, back to the model. I used llama.cpp to give gpt4-x-alpaca-13b a try. After downloading, the following line can be entered in the checked-out directory; the path to the model must be adjusted as needed. I basically snagged the parameters from somewhere without giving them much thought:

./main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.2 --instruct -m /path/to/model/file/ggml-model-q4_1.bin
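For the record, here is my understanding of what these parameters do, pieced together from the program's --help output; it is worth double-checking against the version you built:

```shell
# -i --interactive-first  interactive mode, with the user's turn first
# -r "### Human:"         reverse prompt: hand control back when this appears
# --temp 0                temperature 0, i.e. (near-)deterministic sampling
# -c 2048                 context window of 2048 tokens
# -n -1                   no limit on the number of generated tokens
# --ignore-eos            do not stop at end-of-text tokens
# --repeat_penalty 1.2    penalize repetition of recently generated tokens
# --instruct              instruction mode (Alpaca-style "### Instruction:" prompts)
# -m <file>               path to the ggml model file
```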

A test output looks like this:

main: seed = 1681807028
llama.cpp: loading model from /media/davidk/big/ai/gpt-x-alpaca-13b-ggml/ggml-model-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 3 (mostly Q4_1)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  73.73 KB
llama_model_load_internal: mem required  = 11359.03 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size  = 1600.00 MB

system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
main: interactive mode on.
Reverse prompt: '### Human:'
Reverse prompt: '### Instruction:

'
sampling: temp = 0.000000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.200000
generate: n_ctx = 2048, n_batch = 8, n_predict = -1, n_keep = 2

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.

> I need a basic HTML-page with menu, footer, etc. Please generate HTML and CSS for me.
<!DOCTYPE html>
<html>
<head>
    <title>Page Title</title>
    <link rel="stylesheet" href="style.css">
</head>
<body>
    <nav>
        <ul>
            <li>Home</li>
            <li>About Us</li>
            <li>Products</li>
            <li>Contact Us</li>
        </ul>
    </nav>
    <div>
        <h1>Welcome to Our Site!</h1>
        <p>Learn more about our products and services.</p>
    </div>
    <footer>
            <p>Copyright &copy; 2020. All Rights Reserved.</p>
    </footer>
</body>
</html>

This is followed by a bunch of blank lines. Whatever that means. :) But you can interrupt with Ctrl+C and enter a new prompt:

> Please generate a corresponding CSS-File for this page!
body {
    background-color: white;
    font-family: 'Open Sans', sans-serif;
    font-size: 16px;
    color: #3333;
    line-height: 1.5;
    padding: 32px;
}

nav {
    float: left;
    width: 100%;
    list-style: none;
    padding: 16px;
}

.footer {
    padding: 16px;
    color: #3333;
    font-size: 12px;
}

.copyright {
    float: right;
}

.copyright a {
    color: #3333;
    text-decoration: none;
    font-weight: bold;
    font-size: 12px;
}
....

A model with 30 billion parameters generates:

<!DOCTYPE html>
<html>
<head>
    <title>My Page</title>
    <link rel="stylesheet" href="style.css">
</head>
<body>
    <header>
        <h1>My Page</h1>
    </header>
    <nav>
        <ul>
            <li><a href="#">Home</a></li>
            <li><a href="#">About</a></li>
            <li><a href="#">Contact</a></li>
        </ul>
    </nav>
    <main>
        <article>
            <h2>Main Content</h2>
            <p>This is the main content of the page.</p>
        </article>
    </main>
    <footer>
        <p>Copyright &copy; 2020</p>
    </footer>
</body>
</html>

and

body {
    font-family: 'Open Sans', sans-serif;
    font-size: 16px;
    color: #333;
    background-color: #fff;
    margin: 0;
    padding: 0;
}

header {
    background-color: #333;
    padding: 10px;
    color: #fff;
}

nav {
    float: left;
    width: 100%;
    padding: 10px;
    background-color: #333;
    color: #fff;
}

nav ul {
    list-style-type: none;
    margin: 0;
    padding: 0;
}

nav ul li {
    display: inline-block;
    margin-right: 10px;
}

nav ul li a {
    color: #fff;
    text-decoration: none;
}

main {
    padding: 10px;
    background-color: #fff;
}

article {
    padding: 10px;
    margin: 0;
}

article h2 {
    font-size: 24px;
    font-weight: bold;
    margin: 0;
}

article p {
    margin: 0;
}

footer {
    background-color: #333;
    padding: 10px;
    color: #fff;
}

footer p {
    margin: 0;
}

That does look quite convincing, but it also took a seriously long time.

gpt4all

gpt4all is based on llama.cpp: its developers created a fork and have been building on it from there. In other words, the two programs are no longer compatible, at least at the moment.

gpt4all also links to models that are available in a format similar to ggml but unfortunately incompatible with it. My first attempt was with their patched version of llama.cpp, which also runs in the terminal. I tried it out like so:

git clone [email protected]:nomic-ai/gpt4all.git
cd gpt4all/chat
./gpt4all-lora-quantized-linux-x86 -m /path/to/model/gpt4all-lora-quantized.bin
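Depending on how the repository was checked out or the archive unpacked, the prebuilt binary may not have its execute bit set; if the shell refuses to run it, this should help:

```shell
chmod +x ./gpt4all-lora-quantized-linux-x86
```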

The file gpt4all-lora-quantized.bin can be found on this page or obtained directly from here. This is a model with 6 billion parameters, which makes it significantly smaller than the one above, and the difference is easy to see: it runs much faster, but the quality is also considerably worse.

> I need a basic HTML-page with menu, footer, etc. Please generate HTML and CSS for me.
Here's your requested page! The code is below in plain text format so you can easily copy it into an editor or CMS of your choice to customize as needed.
> Please output the code!
<html><head>
  <title></title>
</head>
<body>
	<!-- Menu -->
    <!-- Footer -->
</body>
</html>.

This response is generated by a simple Python script that reads in an HTML file and generates the corresponding plain text code. The resulting output can then be copied into any editor or CMS for further customization as needed!

However, gpt4all also offers an installer with another model, presumably trained on different sources. In this case, there is also a graphical interface.

The prompt from above produced very nice results:

> I need a basic HTML-page with menu, footer, etc. Please generate HTML and CSS for me.

Sure, I can generate an HTML-page with menu, footer, etc. Here's the code:
<!DOCTYPE html>
<html>
<head>
  <title>My HTML Page</title>
  <style>
    /* Style for menu */
    #menu {
      position: absolute;
      top: 0;
      left: 0;
      width: 100%;
      height: 100%;
      background-color: #333;
      color: #fff;
    }
    #menu ul {
      list-style: none;
      margin: 0;
      padding: 0;
    }
    #menu ul li {
      float: left;
      padding: 10px;
    }
    #menu ul li a {
      display: block;
      color: #fff;
      text-decoration: none;
      font-size: 20px;
      font-family: Arial, sans-serif;
    }
    #menu ul li a:hover {
      background-color: #555;
    }
    /* Style for footer */
    #footer {
      position: absolute;
      bottom: 0;
      left: 0;
      width: 100%;
      height: 50px;
      background-color: #555;
      color: #fff;
    }
    #footer p {
      margin: 0;
      padding: 10px;
    }
  </style>
</head>
<body>
  <div id="menu">
    <ul>
      <li><a href="#">Home</a></li>
      <li><a href="#">About</a></li>
      <li><a href="#">Contact</a></li>
    </ul>
  </div>
  <div id="footer">
    <p>&copy; My Website</p>
  </div>
</body>
</html>

Opinions on style may differ, of course.

dalai

Here is another frontend, this one from the JavaScript world, but I haven't tried it out yet.

Oobabooga

The text-generation-webui is supposed to be a hassle-free tool similar to AUTOMATIC1111 for Stable Diffusion. Unfortunately, I can't run it (yet), since I don't have an Nvidia card, only an AMD one, which doesn't support CUDA.

Licenses / rights

Unfortunately, the license and rights situation is generally a bit murky, as far as I can tell. There are tons of models lying around on Hugging Face, some of which claim very permissive licenses. I have a hard time believing that it's all above board.

I was pleased to see that a developer of AI platforms has trained a GPT LLM himself, which can be found here. However, it is still missing the "Alpaca"/ChatGPT features. In other words, it is "only" a text completer. But I'd bet there will be more in store here in the future.