How to use Ollama LLM with AMB82 mini

// Captures a JPEG frame from the camera, base64-encodes it into a data URL,
// POSTs it as an OpenAI-style chat-completions request to an Ollama server on
// the local network, then waits and reads the response over the same TCP
// connection.
//
// Relies on globals defined elsewhere in the sketch: client2 (TCP client),
// Camera, img_addr / img_len, message (the text prompt), ollama_key,
// startTime and waitTime.
//
// NOTE(review): this paste is truncated — the closing braces for the
// if (client2.connect(...)) block and for the function itself are missing.
void sendollama() {

 // String model = "moondream:1.8b";

    String model = "llava:34b";

  // IP address of the machine running Ollama (port 11434, used below).
  const char *myDomain = "10.0.0.190";

  // getResponse: current HTTP line being assembled; Feedback: response body.
  String getResponse = "", Feedback = "";

  Serial.println("Connect to " + String(myDomain));


  if (client2.connect(myDomain, 11434)) {

    Serial.println("Connection successful");

     // Fills img_addr / img_len with the address and length of the JPEG frame.
     Camera.getImage(0, &img_addr, &img_len);

       Serial.println("Image Capture");

    uint8_t *fbBuf = (uint8_t *)img_addr;

    uint32_t fbLen = img_len;

    char *input = (char *)fbBuf;

    // Holds one encoded 3-byte group (base64_enc_len(3) bytes, incl. terminator).
    char output[base64_enc_len(3)];

    // Data-URL prefix; the base64-encoded JPEG is appended below.
    String imageFile = "data:image/jpeg;base64,";

    // NOTE(review): input advances by 1 per iteration but 3 bytes are encoded
    // each call, and only every third result (i % 3 == 0, when input sits at
    // offset i) is kept — so the concatenation appears correct but does 3x the
    // necessary work. The final group when fbLen is not a multiple of 3 also
    // reads past the end of the frame buffer — worth confirming/fixing.
    for (uint32_t i = 0; i < fbLen; i++) {

      base64_encode(output, (input++), 3);

      if (i % 3 == 0) {

        imageFile += String(output);

      }

    }

    // OpenAI-compatible chat payload: one user message containing the text
    // prompt plus the image as a base64 data URL.
    String Data = "{\"model\": \"" + model + "\", \"messages\": [{\"role\": \"user\",\"content\": [{ \"type\": \"text\", \"text\": \"" + message + "\"},{\"type\": \"image_url\", \"image_url\": {\"url\": \"" + imageFile + "\"}}]}]}";

    Serial.println("POST");

    // Hand-rolled HTTP/1.1 request headers.
    client2.println("POST /v1/chat/completions HTTP/1.1");

    client2.println("Host: " + String(myDomain));

    client2.println("Authorization: Bearer " + ollama_key);

    client2.println("Content-Type: application/json; charset=utf-8");

    client2.println("Content-Length: " + String(Data.length()));

    client2.println("Connection: close");

    Serial.println("Close");

    // Blank line terminates the header section; the body follows.
    client2.println();




    unsigned int Index;

    // Send the JSON body in 1024-byte slices — presumably to stay within the
    // client's write buffer; TODO confirm against the SDK's limits.
    for (Index = 0; Index < Data.length(); Index = Index + 1024) {

      client2.print(Data.substring(Index, Index + 1024));

    }




    Serial.println("Receive");

    startTime = millis();

    // state: set once an empty HTTP line (end of headers) has been seen.
    boolean state = false;

    // markState: set once the first '{' of the JSON body has been seen.
    boolean markState = false;





    // Fixed wait for the model to respond. NOTE(review): if waitTime elapses
    // before the server has finished, the read loop below finds no data and
    // Feedback stays empty — this is the "NULL response" issue discussed below.
    while ((startTime + waitTime) > millis()) {

      Serial.print(".");

        delay(1000);

    }

    // Drain whatever has arrived. Characters are accumulated into Feedback
    // only after both the header/body separator and the first '{' were seen.
    while (client2.available()) {

      char c = client2.read();

      if (String(c) == "{") {

        markState = true;

      }

      if (state == true && markState == true) {

        Feedback += String(c);

      }

      // Line bookkeeping: an empty line ('\n' with nothing buffered) marks the
      // end of the HTTP headers; '\r' is ignored, everything else is buffered.
      if (c == '\n') {

        if (getResponse.length() == 0) {

          state = true;

        }

        getResponse = "";

      } else if (c != '\r') {

        getResponse += String(c);

      }

      // startTime = millis();

    }

    if (Feedback.length() > 0) {

      // break;

    }

    // }

    Serial.println();

    // Close the TCP connection; the server was asked to close too ("Connection: close").
    client2.stop();

It works, but there are still some improvements I would like to make. The main one is a way of detecting if the reply is ready — at the moment it's just a fixed delay, and if you don't wait long enough you just get a NULL response.

1 Like

Hi @geofrancis ,

Thanks for your feedback. We encourage you to contribute your example code into our Arduino SDK via pull request. Pull requests · Ameba-AIoT/ameba-arduino-pro2 · GitHub Do let us know if you require any assistance. :smiley:

Hi @geofrancis,

Interesting project! Using AMB82-Mini to capture images and feed them to LLaVA - basically a poor man’s GPT-4 Vision on an edge device :grinning_face_with_smiling_eyes:

Saw your question about detecting when the response is ready, here are some thoughts:

1. Use Streaming Mode

Ollama API supports streaming. Add "stream": true to your request:

```cpp
String Data = "{\"model\": \"" + model + "\", \"messages\": [...], \"stream\": true}";
```

Then read character by character - when you see `"done": true`, you know it's finished.

**2. Check HTTP Response Header**

Parse the `Content-Length` header before reading the body:

```cpp
while (client2.connected()) {
    String line = client2.readStringUntil('\n');
    if (line.startsWith("Content-Length:")) {
        int contentLength = line.substring(16).toInt();
        // Now you know how much to read
    }
    if (line == "\r") break;  // End of headers
}
```

**3. Use Timeout Instead of Fixed Delay**

Instead of fixed `delay()`, try dynamic timeout:

```cpp
unsigned long timeout = millis() + 30000;  // 30 second timeout
while (millis() < timeout) {
    if (client2.available()) {
        char c = client2.read();
        // Process data...
        timeout = millis() + 5000;  // Reset timeout when data received
    }
    delay(10);
}
```

**4. Try a Lighter Model**

`llava:34b` responds slowly. For faster responses:
- `llava:7b` - Good balance between speed and quality
- `moondream:1.8b` - The one you commented out, fast but less accurate
- `bakllava` - Optimized for vision tasks

**5. JSON End Detection**

You can also use brace matching to detect when JSON is complete:

```cpp
int braceCount = 0;
bool inString = false;
while (client2.available()) {
    char c = client2.read();
    if (c == '"') inString = !inString;
    if (!inString) {
        if (c == '{') braceCount++;
        if (c == '}') braceCount--;
    }
    response += c;
    if (braceCount == 0 && response.length() > 0) break;
}
```

---

Also, Base64 encoding increases data size by ~33%, so keep an eye on image resolution given the limited RAM.

Nice experiment, good luck!
1 Like

This is what I'm sending — where do I add streaming?

String Data = "{\"model\": \"" + model + "\",\"stream\": true, \"messages\": [{\"role\": \"user\",\"content\": [{ \"type\": \"text\", \"text\": \"" + message + "\"},{\"type\": \"image_url\", \"image_url\": {\"url\": \"" + imageFile + "\"}}]}]}";

  

Hi @geofrancis,

It’s actually just a few small changes:

1. Add "stream": true to your JSON:

```cpp
String Data = "{\"model\": \"" + model + "\", \"stream\": true, \"messages\": ...
```

**2. Replace your fixed delay loop:**
```cpp
// Remove this:
while ((startTime + waitTime) > millis()) {
    Serial.print(".");
    delay(1000);
}

// Replace with:
unsigned long timeout = millis() + 60000;
while (millis() < timeout) {
    while (client2.available()) {
        char c = client2.read();
        Feedback += c;
        timeout = millis() + 5000;  // Reset when data received

        if (Feedback.indexOf("[DONE]") >= 0) {
            break;
        }
    }
    delay(10);
}
```

Since you're using `/v1/chat/completions`, the stream ends with `data: [DONE]`.
1 Like

That didn't work — it just sends far too much data. It takes minutes per message to get through, as it sends one letter at a time per message.




e1


data: {"id":"chatcmpl-864","object":"chat.completion.chunk","created":1764765477,"model":"gemma3:12b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"'"},"finish_reason":null}]}






e1


data: {"id":"chatcmpl-864","object":"chat.completion.chunk","created":1764765477,"model":"gemma3:12b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"s"},"finish_reason":null}]}






eb


data: {"id":"chatcmpl-864","object":"chat.completion.chunk","created":1764765477,"model":"gemma3:12b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" visibility"},"finish_reason":null}]}






e1


data: {"id":"chatcmpl-864","object":"chat.completion.chunk","created":1764765477,"model":"gemma3:12b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"."},"finish_reason":null}]}






e4


data: {"id":"chatcmpl-864","object":"chat.completion.chunk","created":1764765477,"model":"gemma3:12b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" The"},"finish_reason":null}]}






e4


data: {"id":"chatcmpl-864","object":"chat.completion.chunk","created":1764765477,"model":"gemma3:12b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" top"},"finish_reason":null}]}






e8


data: {"id":"chatcmpl-864","object":"chat.completion.chunk","created":1764765477,"model":"gemma3:12b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" portion"},"finish_reason":null}]}






e8


data: {"id":"chatcmpl-864","object":"chat.completion.chunk","created":1764765477,"model":"gemma3:12b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" appears"},"finish_reason":null}]}






e7


data: {"id":"chatcmpl-864","object":"chat.completion.chunk","created":1764765477,"model":"gemma3:12b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" darker"},"finish_reason":null}]}






e1


data: {"id":"chatcmpl-864","object":"chat.completion.chunk","created":1764765477,"model":"gemma3:12b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"."},"finish_reason":null}]}






e1


data: {"id":"chatcmpl-864","object":"chat.completion.chunk","created":1764765477,"model":"gemma3:12b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" "},"finish_reason":null}]}






e2


data: {"id":"chatcmpl-864","object":"chat.completion.chunk","created":1764765477,"model":"gemma3:12b","system_fi