Most programs in the AI field are based on Python and PyTorch. So the question arises why I’ve chosen Node.js. The answer is simple: because most people working on the web can use it. In my perception, the learning curve is lower than learning Python and a Python-based web framework at the same time.

LLaMA-Node provides an easy-to-use API for running llama.cpp directly within your Node process. All you have to do is install two dependencies, so you can begin by creating a package.json file:

{
  "type": "module",
  "dependencies": {
    "@llama-node/llama-cpp": "^0.1.6",
    "llama-node": "^0.1.6"
  }
}

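Now run npm i in your console to install these dependencies. Since the script below uses import statements, package.json needs "type": "module" (alternatively, give the script an .mjs extension). Please note that I’m using Node 18, which provides the new promise-based readline module (node:readline/promises).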

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

import * as readline from 'node:readline/promises';
import { stdin as input, stdout as output } from 'node:process';

const rl = readline.createInterface({ input, output });

const model = path.resolve(process.cwd(), "/path/to/ggml-vic13b-uncensored-q8_0.bin");

const llama = new LLM(LLamaCpp);
const config = {
  modelPath: model,
  enableLogging: true,
  nCtx: 2048,
  nParts: -1,
  seed: 0,
  f16Kv: false,
  logitsAll: false,
  vocabOnly: false,
  useMlock: false,
  embedding: false,
  useMmap: true,
};

function create_prompt(question) {
  return `Convert the query-text below to a machine readable json-request. Valid requests are:
{"type": "add", "x": x, "y": y} 
for the question to add 2 numbers. Always use x and y as names for the parameters. The values have to be decimal-values.

### User: Convert the query to a machine readable json-request.
Text: ${question}
### Assistant:`;
}

console.log("What shall be computed?");

const run = async () => {
  let question = await rl.question('Me: ');
  await llama.load(config);

  let result = "";

  process.stdout.write("Bot: ");
  let promise = llama.createCompletion({
    nThreads: 8,
    nTokPredict: 512,
    topK: 40,
    topP: 0.1,
    temp: 0.2,
    repeatPenalty: 1,
    prompt: create_prompt(question)
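  }, (response) => {
    // Stream each generated token to the console and collect it for parsing.
    process.stdout.write(response.token);
    result += response.token;
  });

  await promise.finally(() => {
    // Strip the "<end>" marker the model emits before parsing the JSON.
    let query = JSON.parse(result.replace("<end>", ""));
    let response = "";
    switch (query.type) {
      case "add": { response = query.x + query.y; break; }
      case "divide": { response = query.x / query.y; break; }
      case "multiply": { response = query.x * query.y; break; }
      case "subtract": { response = query.x - query.y; break; }
      default: { response = "Unknown request type: " + query.type; }
    }
    console.log("Result: " + response);
  });

  // Close the readline interface so it no longer waits for input.
  rl.close();
};

run();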

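The program trusts the model to answer with nothing but the JSON object followed by the <end> marker; a stray sentence around it makes JSON.parse throw. A minimal sketch of a more defensive post-processing step, assuming the answer contains exactly one top-level {...} object: it cuts out that object, checks the request, and dispatches through a lookup table instead of a switch. extractRequest, compute and OPERATIONS are hypothetical names, not part of llama-node.

```javascript
// Hypothetical helpers (not part of llama-node): defensive parsing and
// table-based dispatch of the JSON request produced by the model.

// Each request type maps to the arithmetic it stands for.
const OPERATIONS = {
  add: (x, y) => x + y,
  subtract: (x, y) => x - y,
  multiply: (x, y) => x * y,
  divide: (x, y) => x / y,
};

// Assumes the output contains a single top-level JSON object. Text before the
// first "{" and after the last "}" (such as the "<end>" marker) is ignored.
function extractRequest(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("No JSON object found in model output: " + text);
  }
  return JSON.parse(text.slice(start, end + 1));
}

// Validates the request and applies the operation; x and y must be plain numbers.
function compute({ type, x, y }) {
  if (!Object.hasOwn(OPERATIONS, type)) {
    throw new Error("Unknown request type: " + type);
  }
  if (typeof x !== "number" || typeof y !== "number") {
    throw new Error("x and y must be numbers, got " + JSON.stringify({ x, y }));
  }
  return OPERATIONS[type](x, y);
}

// Example with output shaped like the streamed completion.
const request = extractRequest(' {"type": "divide", "x": 1000000, "y": 3}<end>');
console.log("Result: " + compute(request)); // Result: 333333.3333333333
```

Throwing on unknown types or non-numeric values lets the caller decide whether to re-prompt the model instead of printing a meaningless result.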

This time I used a Vicuna model with 13B parameters. In my opinion it works pretty well, and it handled my test requests. All of these models are based on LLaMA and may inherit its license, so be careful what you do with them. Wikipedia already lists a lot of open-source models, and at some point these will become good enough to use as well.

> Me: Please divide a million by three.
> Bot: {"type": "divide", "x": 1000000, "y": 3}
> Result: 333333.3333333333
> Me: 400 + 200?
> Bot: {"type": "add", "x": 400, "y": 200}
> Result: 600

Using German :)

> Me: eins plus zwei?
> Bot: {"type": "add", "x": 1, "y": 2}
> Result: 3

Let’s make it a little more difficult:

> Me: what's 2 meter plus 20cm?
> Bot: {"type": "add", "x": 2, "y": 20}
> Result: 22

OK, that last one was a pipe dream: the model ignored the units and simply added 2 and 20.

I also tried the 7B variant of the Vicuna model, but it was not able to follow my prompts.

But we don’t have to stop here. Let’s improve our prompt a little. I tried adding the following:

If units are given to the numbers, please provide these like this:
{"x": {"value": <x-value>, "unit": <x-unit>}}
where <unit> is the unit of x.

Prompting again, the model now outputs:

> Me: what's 2 meter plus 20cm?
> Bot: {"type": "add", "x": {"value": 2, "unit": "m"}, "y": {"value": 20, "unit": "cm"}}
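Where exactly the unit instruction goes inside create_prompt is not shown. A minimal sketch, assuming it is simply appended after the list of valid requests; createPromptWithUnits is a hypothetical name, and the instruction text is copied verbatim from above.

```javascript
// Hypothetical variant of create_prompt: the original instructions with the
// unit instruction appended before the chat turns (placement is an assumption).
const UNIT_INSTRUCTION = `If units are given to the numbers, please provide these like this:
{"x": {"value": <x-value>, "unit": <x-unit>}}
where <unit> is the unit of x.`;

function createPromptWithUnits(question) {
  return `Convert the query-text below to a machine readable json-request. Valid requests are:
{"type": "add", "x": x, "y": y}
for the question to add 2 numbers. Always use x and y as names for the parameters. The values have to be decimal-values.
${UNIT_INSTRUCTION}

### User: Convert the query to a machine readable json-request.
Text: ${question}
### Assistant:`;
}

console.log(createPromptWithUnits("what's 2 meter plus 20cm?"));
```

Keeping the instruction in its own constant makes it easy to compare prompt variants without touching the rest of the template.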

With a little more JavaScript, the calculation works as well:

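function scaleOf(x) {
    // Factor that converts the given unit into meters.
    switch (x) {
      case "m": return 1;
      case "cm": return 0.01;
    }
    // Fallback for a missing or unknown unit: use the value unscaled.
    return 1;
}

function postProcess(x) {
    // Values with units arrive as {"value": ..., "unit": ...}; plain numbers pass through.
    if (x instanceof Object) {
        return x.value * scaleOf(x.unit);
    }
    return x;
}

let query = JSON.parse(result.replace("<end>", ""));
query.x = postProcess(query.x);
query.y = postProcess(query.y);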

The output becomes 2.2. The return 1; at the end of scaleOf keeps the example working if the model returns a value object without a known unit; plain numbers without units are passed through unchanged by postProcess anyway.

Of course, this example is not very impressive, but it shows that integrating an LLM with “normal software” is not that complicated. Keep in mind that all of this is still in development, but I find it very interesting that you can already do a lot with very little effort.

I also tried to improve the prompt so that the model directly returns the correct factor for the metric prefix, but I could not get consistent output containing both the unit and the factor.
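Since the model could not deliver the factor reliably, one alternative is to leave that part to code: the model only returns the unit symbol, and the factor is derived from the SI prefix. A minimal sketch, assuming lengths given as SI symbols such as "km", "m", "cm" or "mm"; PREFIX_FACTORS, scaleOfSI and toBaseUnit are hypothetical names, and only a few prefixes are included. Unknown units fall back to a factor of 1, as in scaleOf.

```javascript
// Hypothetical generalization of scaleOf: derive the factor from the SI prefix
// instead of listing every unit. Only a subset of prefixes is included.
const PREFIX_FACTORS = {
  k: 1e3,  // kilo
  h: 1e2,  // hecto
  d: 1e-1, // deci
  c: 1e-2, // centi
  m: 1e-3, // milli
};

// Factor that converts a value in `unit` into the base unit (meters by default).
// Returns 1 for the base unit itself and for anything it does not recognize.
function scaleOfSI(unit, baseSymbol = "m") {
  if (typeof unit !== "string" || !unit.endsWith(baseSymbol)) return 1;
  const prefix = unit.slice(0, unit.length - baseSymbol.length);
  if (prefix === "") return 1; // e.g. "m"
  return PREFIX_FACTORS[prefix] ?? 1;
}

// Same role as postProcess: unit objects are scaled, plain numbers pass through.
function toBaseUnit(x) {
  if (x !== null && typeof x === "object") {
    return x.value * scaleOfSI(x.unit);
  }
  return x;
}

const query = { type: "add", x: { value: 2, unit: "m" }, y: { value: 20, unit: "cm" } };
console.log(toBaseUnit(query.x) + toBaseUnit(query.y)); // 2.2
```

The model then only has to copy the unit from the question, which it already did reliably in the example above, while the arithmetic stays deterministic.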

LangChain provides another solution for JavaScript and TypeScript, but I’m not sure whether it already supports llama.cpp.