Bug 39332 - Using Socket.ReceiveFrom() with UDP hangs and receiving socket gets packet endlessly
Status: IN_PROGRESS
Alias: None
Product: Class Libraries
Classification: Mono
Component: System
Version: 4.2.0 (C6)
Hardware: PC Linux
Importance: --- normal
Target Milestone: Untriaged
Assignee: Alexis Christoforides
URL:
Depends on:
Blocks:
 
Reported: 2016-03-03 20:18 UTC by David Evans
Modified: 2016-05-23 23:09 UTC
CC List: 2 users

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Simple command line app to reproduce the problem (12.29 KB, text/plain)
2016-03-03 20:22 UTC, David Evans

Description David Evans 2016-03-03 20:18:30 UTC
Versions: 

Ubuntu 14.04.1
Mono versions 4.0.4.1, 4.2.2.30, and 4.3.2.467

Summary:

We use some SimpleSNMP code to talk to an uninterruptible power supply. In automated testing we have a simulated SNMP server class that listens on a UDP socket and answers requests with simulated SNMP responses. Rarely, but reproducibly, the client-side SimpleSNMP code hangs in socket.ReceiveFrom(): the receive timeout is not respected and the client thread hangs.

Workaround:

Use socket.Connect(), then socket.Send() and socket.Receive() instead. With those variants I cannot reproduce the problem. So I have a workaround for now, but I bet most people writing or porting UDP code will default to SendTo/ReceiveFrom for those kinds of sockets, since Connect() is more of a TCP pattern.
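
For reference, the workaround pattern described above can be sketched roughly as follows. This is not the code from the attached test application. The class and method names, the 2000 ms timeout, and the one-shot echo responder standing in for the simulated SNMP server are all assumptions made for illustration.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class ConnectedUdpSketch
{
    // Sends one datagram on a connected UDP socket and waits for the reply.
    // Returns the number of bytes received.
    public static int RoundTrip(EndPoint server, byte[] request, int length,
                                byte[] reply, int timeoutMs)
    {
        using (var sock = new Socket(AddressFamily.InterNetwork,
                                     SocketType.Dgram, ProtocolType.Udp))
        {
            sock.SendTimeout = timeoutMs;
            sock.ReceiveTimeout = timeoutMs;
            // For UDP, Connect() only fixes the default remote endpoint;
            // no handshake takes place.
            sock.Connect(server);
            sock.Send(request, length, SocketFlags.None);
            // Throws SocketException if no reply arrives within the timeout.
            return sock.Receive(reply);
        }
    }

    public static void Main()
    {
        // Hypothetical stand-in for the simulated SNMP server:
        // echoes a single datagram back to its sender.
        using (var echo = new UdpClient(new IPEndPoint(IPAddress.Loopback, 0)))
        {
            Task serving = Task.Run(async () =>
            {
                UdpReceiveResult req = await echo.ReceiveAsync();
                echo.Send(req.Buffer, req.Buffer.Length, req.RemoteEndPoint);
            });

            var reply = new byte[1024];
            int n = RoundTrip(echo.Client.LocalEndPoint, new byte[1024], 42,
                              reply, 2000);
            serving.Wait();
            Console.WriteLine("received {0} bytes", n);
        }
    }
}
```

A connected UDP socket only accepts datagrams from the endpoint passed to Connect(), which is one behavioral difference from the SendTo/ReceiveFrom pattern to keep in mind when porting.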

Details:

The client thread is basically:
- create socket and set Send/Receive timeouts
- sock.SendTo() to send a 42-byte SNMP request packet
- sock.ReceiveFrom() to get the response

The server thread is basically:
- server = new UdpClient() and set up send/receive timeouts on the underlying Client socket
- await UdpClient.ReceiveAsync();
- for each request received, send the response with server.Send()

They communicate over a localhost port.
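
The client and server loops described above might look roughly like the sketch below. It is not the attached test application: the class and method names, the 2000 ms timeouts, the iteration count, and the echo response used in place of a simulated SNMP response are assumptions. The 1024-byte buffer and the length of 42 follow the sizes mentioned in this report.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class SendToReceiveFromSketch
{
    // Server side: answer each datagram received on a UdpClient.
    public static async Task ServeAsync(UdpClient server, int count)
    {
        for (int i = 0; i < count; i++)
        {
            UdpReceiveResult req = await server.ReceiveAsync();
            // Placeholder: echo the request instead of building an SNMP response.
            byte[] response = req.Buffer;
            server.Send(response, response.Length, req.RemoteEndPoint);
        }
    }

    // Client side: SendTo() with an explicit length, then ReceiveFrom().
    public static int RoundTrip(Socket sock, EndPoint server, byte[] buffer, int length)
    {
        // Only the first 'length' bytes of the buffer should go on the wire.
        sock.SendTo(buffer, length, SocketFlags.None, server);
        EndPoint from = new IPEndPoint(IPAddress.Any, 0);
        // Expected to throw SocketException once ReceiveTimeout elapses.
        return sock.ReceiveFrom(buffer, ref from);
    }

    public static void Main()
    {
        using (var server = new UdpClient(new IPEndPoint(IPAddress.Loopback, 0)))
        using (var client = new Socket(AddressFamily.InterNetwork,
                                       SocketType.Dgram, ProtocolType.Udp))
        {
            server.Client.SendTimeout = 2000;
            server.Client.ReceiveTimeout = 2000;
            client.SendTimeout = 2000;
            client.ReceiveTimeout = 2000;

            const int iterations = 1000;
            Task serving = ServeAsync(server, iterations);
            EndPoint target = server.Client.LocalEndPoint;
            var buffer = new byte[1024];

            for (int i = 0; i < iterations; i++)
            {
                int received = RoundTrip(client, target, buffer, 42);
                if (received != 42)
                    Console.WriteLine("iteration {0}: got {1} bytes", i, received);
            }
            serving.Wait();
        }
    }
}
```

The loop prints a line only when a reply has an unexpected size, so a healthy run is silent.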

Sometimes I can loop this for 100,000 packets without issue, but more often we get the hang. Sometimes the hang occurs within just a few thousand iterations. It doesn't seem to become more likely as the iteration count grows; it feels racy, as if the point at which it hangs is randomly distributed.

Also oddly, when the problem occurs the server thread sees that same request packet every time it calls await UdpClient.ReceiveAsync(), so it effectively spins, receiving the same packet over and over. So when the client wedges in this way, the server wedges in a different way.

The packets received on the server side differ in one way once the problem occurs. Normally the server sees 42-byte packets, even though the buffer passed to SendTo() was larger, because SendTo() was called with a length parameter of 42. Once the problem occurs, the server starts receiving packets the same size as the buffer passed to SendTo() (e.g. 1024 bytes), in spite of the length parameter of 42. Changing the sent packet size to 42 to match the length parameter doesn't affect the problem, however. Just mentioning it in case that is a clue.

I'll attach a test application that reproduces the problem. Running the command line app with a parameter of 100000, to send that many packets round trip, usually reproduces the problem for us. The test application also accepts an optional parameter that introduces a delay before each packet response, and that doesn't seem to affect the problem: with small delays in the server thread's responses, the problem still reproduces. Long delays successfully trigger the client-side receive timeout on the socket, so I can see that the timeout works, except when the problem occurs and the client-side receive hangs.
Comment 1 David Evans 2016-03-03 20:22:06 UTC
Created attachment 15241
Simple command line app to reproduce the problem

Usage for this command line app:
UdpMonoTest.exe <connect/sendto> <iterations> <optional delay_ms>

Calling it with "sendto" uses the SendTo/ReceiveFrom variants that reproduce the problem for me. Calling it with "connect" uses the alternative Connect pattern. I generally see the issue within 100000 iterations. The delay_ms is applied on the server side, introducing a small delay between receiving a request packet and responding to the client.
