AmebaD + RTL8720DN SSL socket data receive problem

Hi,
I use the AmebaD SDK with SSL sockets enabled and control them via AT commands. An external HTTP library on my host downloads a file from an HTTP/HTTPS server. When I download over a TCP socket, everything works. When I download over an SSL socket (the request and download flow is the same from the host's point of view, and I use the same AT commands), the file downloads to about 97% and then the ATPR command reports that there is no more data.

When I download files from the same server with a browser or my own Python script, everything works perfectly. I also tested files of different sizes, and whether the file is 300 kB or 500 kB, the download stops at about 97%.

I ran some tests with different ATPR recv sizes, and here is what I found:

			dl_test.txt - Content Length: 457800 B, ATPR buffer size: 256 B
			dl_test: No more data! Remaining length: 15583, bytes received: 442217, missing: 3,40%.
			
			dl_test.txt - Content Length: 457800 B, ATPR buffer size: 2048 B
			dl_test: No more data! Remaining length: 13791, bytes received: 444009, missing: 3,01%.
			
			dl_test.txt - Content Length: 457800 B, ATPR buffer size: 3072 B
			dl_test: No more data! Remaining length: 12767, bytes received: 445033, missing: 2,78%.
			
			dl_test.txt - Content Length: 457800 B, ATPR buffer size: 3996 B
			dl_test: No more data! Remaining length: 11843, bytes received: 445957, missing: 2,58%.

The download seems to fail at the same point each time: the difference in remaining length between any two cases equals the difference in buffer size (e.g. 15583 - 13791 == |256 - 2048|). The file contains only text data (“DownloadTest1234\r\n” repeated throughout the file).
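The arithmetic behind that observation can be checked with a few lines of Python. This is only a sketch over the four log lines above; the variable names are made up for illustration. It shows that the remaining length plus the ATPR buffer size comes out the same in every run.

```python
# (ATPR buffer size, remaining length reported when the download stalled),
# copied from the four test runs above.
runs = [(256, 15583), (2048, 13791), (3072, 12767), (3996, 11843)]
content_length = 457800

# Bytes received in each run, derived from the Content-Length.
received = [content_length - remaining for _, remaining in runs]

# Remaining length plus buffer size; a single value means the stall
# point moves by exactly the change in buffer size.
totals = {remaining + buf for buf, remaining in runs}

print(received)        # [442217, 444009, 445033, 445957]
print(sorted(totals))  # [15839]
```

The constant sum says nothing by itself about where the data is held back; it only confirms that the four runs are consistent with one another.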

For SSL support I use MBEDTLS.

Any hints on what the reason might be, or what I should check or test?

Hi @kilimanjaro please share your device info and software info first, so we can better assist you, for example,

  • Board model
  • SDK version and where it’s from

@xidameng
I use a BW16 module and the release branch (or master, as it is named now) that was recommended to me in the thread “RTL8720DN/BW16 firmware boot problem”. The Atcmd version is v2.2.1 and the SDK version is v3.5.

I made some modifications to this SDK including:

  • Support for UART ATcmds and SSL via ATcmds (defines and pin changes only)

  • Support for SSL client example (defines only) and provided my own certificate

  • Extended LOG_SERVICE_BUFLEN to 4200 and STACKSIZE to 4800 of log_service to support larger buffers

  • Changes in ATSO command to support HTTPS OTA update

That’s everything that was changed, and the only place I touched SSL was providing my own cert for the MQTT connection, which shouldn’t affect the download process.

  • Extended LOG_SERVICE_BUFLEN to 4200 and STACKSIZE to 4800 of log_service to support larger buffers
  • Is LOG_SERVICE_BUFLEN the one defined in platform_opts.h?
  • In which file did you change the stack size of log_service?
  • Can you provide more details on the commands you run during testing? In my testing, ATPR receives data only once, up to the buffer limit. Are you using the auto receive mode ATPK to download the file?
  • Yes, it’s in platform_opts.h
  • \component\common\api\at_cmd\log_service.c line 467
  • I run ATPC to open an SSL socket, then ATPT for HTTP requests and ATPR to poll for data. I don’t use ATPK at all.
    When I download files this way through a TCP socket, everything works perfectly.

Unfortunately I can’t use ATPK, as my host is multi-threaded, with an MQTT connection on one thread and an HTTP download on another, both using Realtek sockets. As the AN0075 document says about ATPK, received data will return to host without any information in the head, so I wouldn’t be able to distinguish which connection returned the data.

@kilimanjaro

It looks like the error is caused by some MbedTLS behaviour. I am testing with ATPC to open an SSL socket and ATPK to receive a large data file sent from the server side, and I am seeing that it stops before the file contents are completely transferred. I guess this is the same thing you are seeing.

I am not sure what is happening. I plan to enable TLS debug, then capture and decrypt the TLS traffic in Wireshark to see the details.

Seems like the same thing. Looking forward to hearing about the test results.

@kilimanjaro

After some testing, this is what I have found:

  • A file of size 1501 bytes will be received entirely, using either ATPR or ATPK
  • A file exceeding 1501 bytes will also be received entirely, but the data beyond 1501 bytes seems to be stuck in a buffer somewhere until another transmission occurs from the server. After this transmission, using ATPR once will first clear out the remaining data from the previous file, and using ATPR a second time will retrieve the data from the new transmission. If the data sent in the new transmission also exceeds 1501 bytes, the process repeats.

The value of 1501 may be specific only to my build, but I think the overall behavior should be the same as yours. My understanding is that repeated ATPR calls should clear out the buffers, so this is definitely wrong and should be investigated.

As a stop-gap measure, perhaps you could try sending the server a dummy request after the download, just to get the server to send a short response, and see if this gets you the missing data.
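To make the stop-gap concrete, here is a small host-side simulation in Python. It is only a toy model of the behaviour described above, not the module's actual implementation: `FakeTlsSocket`, `download`, `recv` and `nudge` are hypothetical names, no real AT commands are issued, and the 1501-byte threshold is simply the value observed in my build. In a real host, `recv` would stand in for an ATPR poll and `nudge` for a short dummy request sent with ATPT.

```python
class FakeTlsSocket:
    """Toy model (hypothetical): data past `threshold` bytes of a
    transmission is withheld until the server transmits again."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.ready = b""  # data a receive call can return right now
        self.stuck = b""  # data held back until the next transmission

    def server_sends(self, payload: bytes) -> None:
        # A new transmission first releases whatever was stuck.
        self.ready += self.stuck + payload[: self.threshold]
        self.stuck = payload[self.threshold :]

    def recv(self, max_len: int) -> bytes:
        chunk, self.ready = self.ready[:max_len], self.ready[max_len:]
        return chunk


def download(recv, nudge, expected_len: int, bufsize: int = 2048) -> bytes:
    """Poll until `expected_len` bytes arrive; on the first stall,
    call `nudge` once (the dummy request) and keep polling."""
    body = b""
    nudged = False
    while len(body) < expected_len:
        chunk = recv(min(bufsize, expected_len - len(body)))
        if chunk:
            body += chunk
        elif nudged:
            break  # still nothing after the dummy request: give up
        else:
            nudge()
            nudged = True
    return body


payload = b"DownloadTest1234\r\n" * 200  # 3600 bytes of test data

# Without the workaround: the download stalls at the threshold.
plain = FakeTlsSocket(threshold=1501)
plain.server_sends(payload)
stalled = download(plain.recv, lambda: None, len(payload))

# With the workaround: a short dummy response releases the stuck data.
nudged_sock = FakeTlsSocket(threshold=1501)
nudged_sock.server_sends(payload)
complete = download(
    nudged_sock.recv,
    lambda: nudged_sock.server_sends(b"HTTP/1.1 200 OK\r\n\r\n"),
    len(payload),
)

print(len(stalled), len(complete))  # 1501 3600
```

Note that the dummy response itself is left in the receive queue afterwards, so the host has to read and discard it before the next real request.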

Thanks, I’ll test that workaround. Will you investigate further to find the root cause of this problem?

Yes, the data is getting through to the TCP layer. It is getting stuck somewhere in the TLS layer.

@wyy
I tested your workaround and it works, thanks for this suggestion. Any updates on TLS layer investigation?

@kilimanjaro

Thanks for letting me know it works. That would confirm that we are seeing the same issue, and that a solution would probably work for both of us.

I am trying to enable LwIP debugging to compare the debug messages between the TCP layer and the TLS layer, to narrow down where the data is stuck.

@wyy any updates, are you still investigating?

When I last looked at it, the LwIP debug messages suggested that the IP layer showed the same phenomenon, so I have no idea where the data is getting stuck, since LwIP also reported no missing data.
Unfortunately, that is the limit of the source code that is accessible to me. I have forwarded the issue to internal developers for investigation, but any progress on the issue is now out of my control. I apologize that I will probably be unable to provide any detailed updates.

Ok, thanks for passing the subject further on. Where will I be able to find information about the prepared solution when it is available?

I will post updates as I receive them.
