Communication types and protocols for use in modern world mobile applications

In the modern world, mobile devices make up a large part of our lifestyle. Today, using our smartphones, we can read news articles, send messages, make voice and video calls, share files and media, and play games with other people. There’s a plethora of apps and games across all mobile platforms that leverage modern technology to let us do all this. However, even though there’s already a ton of apps for the things we do on the go, there’s still room for more. When it comes to creating another connected app, we have to choose wisely the communication type that suits the app’s needs. Communication can make or break an app. Let’s have a look at the different types so that, when the time comes, we don’t hesitate to choose the one that suits us best.

Connection types

Our apps can use at least two types of connection: a local area network (LAN) connection and a wide area network (WAN) connection. The distinguishing factor is where the peers are. The first type applies when the devices are in the same network and use that network’s infrastructure to communicate. The second one kicks in when you connect to a public server on the internet, such as a website or an internet multiplayer game server.

In the first case, there may be no dedicated server on the network for the apps to connect to. Instead, one of the devices takes the role of the server while still being a client to it, and all the others connect to that server. A good example is a multiplayer network game like Counter-Strike. One peer starts a server and serves incoming connections. The server peer often takes part in the same game session while also acting as a broker between clients, relaying the messages that materialize as bullets and other in-game projectiles. The server is also the one that decides who wins and who loses. This is possible because all nodes inside a LAN can reach each other directly: every peer can send packets to any other peer on the same network.

LAN is also typically the fastest connection type. Because all peers are on one network, packets spend next to no time being routed. This type of connection lives on the bare bones of the TCP and UDP protocols, so you don’t need to roll your own HTTP server, although no one keeps you from doing so. For more information about LAN networking between mobile devices, have a look at our previous articles on this topic.

Whatever the setup, a LAN connection means working with sockets. Whether it’s TCP or UDP networking, over Wi-Fi or Bluetooth, with a dedicated server or without one, it’s sockets all the way.
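To make the "sockets all the way" point concrete, here is a minimal sketch of one peer acting as a server while another one connects to it over TCP. Both ends run in a single process on the loopback address purely for illustration. The LanEcho class and its upper-casing "protocol" are assumptions made up for this example and are not taken from any real game or library.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

public static class LanEcho
{
    // Starts a one-shot TCP server on a free loopback port, sends it one line
    // from a client socket and returns the server's reply.
    public static async Task<string> RoundTripAsync(string message)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: let the OS pick a free port
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        // "Server" peer: accept one connection, read one line, reply in upper case.
        Task serverTask = Task.Run(async () =>
        {
            using (TcpClient peer = await listener.AcceptTcpClientAsync())
            using (NetworkStream stream = peer.GetStream())
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            using (var writer = new StreamWriter(stream, new UTF8Encoding(false)) { AutoFlush = true })
            {
                string line = await reader.ReadLineAsync();
                await writer.WriteLineAsync(line.ToUpperInvariant());
            }
        });

        // "Client" peer: connect, send one line, wait for the reply.
        string reply;
        using (var client = new TcpClient())
        {
            await client.ConnectAsync(IPAddress.Loopback, port);
            using (NetworkStream stream = client.GetStream())
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            using (var writer = new StreamWriter(stream, new UTF8Encoding(false)) { AutoFlush = true })
            {
                await writer.WriteLineAsync(message);
                reply = await reader.ReadLineAsync();
            }
        }

        await serverTask;
        listener.Stop();
        return reply;
    }
}
```

Calling await LanEcho.RoundTripAsync("hello") should return the same text in upper case. On a real LAN the server peer would listen on its network address instead of the loopback one, and the clients would need some way to learn that address.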

The second type is networking over the web, or a WAN connection. In this case, one peer is usually not directly reachable by the other peers. Packets travel a long way through numerous routers, and the devices typically sit behind NAT, so direct discovery is not possible this way. However, we can roll out a server somewhere in the world and tie it to a shiny new DNS name that every client can resolve. This server can be a web server such as a REST Web API, a dedicated TCP server, or a combination of both, namely a Web API with WebSockets support. Let’s take a closer look at each of these types and at the approaches we can take to leverage this tech.

REST Web API

This one is the easiest and most straightforward way of communicating between the server and the client. Every platform has its own way of dealing with this type of connection. On Windows-based platforms, including early versions of Windows Phone (up to 8.1 Silverlight), and on Xamarin platforms, we can use HttpWebRequest and HttpWebResponse, including with the modern async pattern. However, when it comes to newer platforms like Windows 8/8.1, Windows 10 UWP and Windows Phone 8.1 RT, we’re no longer able to use HttpWebRequest/HttpWebResponse. Instead, we can use the brilliant Microsoft-provided HttpClient and still have a single code base across all .NET platforms.

The code will look like this:

// Requires: using System.Net.Http;
HttpClient client = new HttpClient();
HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get, requestUri);

// SendAsync already returns Task<HttpResponseMessage>, so no cast is needed.
HttpResponseMessage response = await client.SendAsync(request);
string result = await response.Content.ReadAsStringAsync();

…and it will work on all platforms the way you would expect it to.

This code looks good and works well, but it has its downsides in certain circumstances:

Due to the nature of HTTP, this code uses too many resources when real-time or close to real-time communication with the server is needed. In this case, the client has to poll the server at some interval. Each time the client requests info from the server, it opens a socket session to the server, sends some bits of information, waits for the response and closes the connection. This means that the client cannot just sit there and wait for the server to push some data down the pipe. The client comes for data, and once it has whatever data is available, it goes away.
As a consequence of the first downside, the connection is nothing close to real-time.
To deliver information on its own initiative, the server has to use some other kind of data delivery. In theory, push notifications could be this kind of data delivery. However, no provider guarantees that pushes are delivered. We have three major mobile platforms from Apple, Google and Microsoft with their own push servers (APNS, GCM and WNS respectively), and none of them guarantees delivery. To add even more salt to this problem, all three push servers allow a different amount of data to be sent in a single push notification. For Apple, the payload of a push notification was originally limited to just 256 bytes, and even the current APNs limit of 4 KB leaves little room for real data. That’s right, you’ve read it correctly: bytes, not megabytes.
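Because the limits differ between providers and are expressed in bytes rather than characters, it can help to check a payload before handing it to a push service. The sketch below is a hypothetical helper (PushPayload is not part of any push SDK). The limit is passed in by the caller, since the actual number has to come from the provider’s documentation.

```csharp
using System;
using System.Text;

public static class PushPayload
{
    // Returns true when the UTF-8 encoded payload fits into maxBytes.
    // maxBytes is supplied by the caller because every provider has its own limit.
    public static bool Fits(string jsonPayload, int maxBytes)
    {
        if (jsonPayload == null) throw new ArgumentNullException(nameof(jsonPayload));

        // Count bytes, not characters: non-ASCII text takes more than one byte per character.
        return Encoding.UTF8.GetByteCount(jsonPayload) <= maxBytes;
    }
}
```

For example, PushPayload.Fits("é", 1) is false even though the string is a single character, because "é" takes two bytes in UTF-8.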

There are several ways to overcome these drawbacks. For example, we could use long polling (one of the techniques collectively known as Comet) with the same HttpClient by Microsoft.

Long Polling and Streaming

The basic idea behind long polling is that the server doesn’t answer right away with a complete response. Instead, it keeps the connection open, returns a response stream and flushes data to that stream whenever data becomes available. The client, in turn, asks for the response stream rather than the finished content, sets the maximum timeout to infinity and waits for new bits of data to appear on the stream. This technique gets close to real-time performance while still using the common ways of handling data sent over HTTP (JSON or XML serialization, or binary data), and you can still use the same HttpClient as your REST service gateway.

The code would look like this:

Client

The code is taken directly from a WinForms application and can be pasted into another WinForms/WPF app, a UAP/UWP app or a Xamarin Android or iOS app, except for the mechanics of running something on the UI thread from a background thread.

delegate void OnUpdateUI(string message);
OnUpdateUI updateUIDelegate;

private async void startBtn_Click(object sender, EventArgs e)
{
    await Task.Run(() => ReadStream());
}

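private void ReadStream()
{
    updateUIDelegate = new OnUpdateUI(UpdateUI);

    var baseUrl = @"http://localhost:45955/api/ChatAsync?accountId=1";
    var client = new HttpClient();

    // Blocking on .Result is acceptable here because we are already on a background thread.
    var stream = client.GetStreamAsync(baseUrl).Result;

    byte[] buffer = new byte[1024];
    int readBytes;

    // Read returns 0 once the server closes the stream, which ends the loop.
    while ((readBytes = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Decode only the bytes actually read and drop the zero padding added by the server.
        string message = Encoding.UTF8.GetString(buffer, 0, readBytes).TrimEnd('\0');

        if (!string.IsNullOrEmpty(message))
            this.Invoke(updateUIDelegate, message);
    }
}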

private void UpdateUI(string message)
{
    this.outputTxt.Text += string.Format("\r\n{0}", message);
}

Server

Everything listed below happens inside an ApiController of ASP.NET Web API 2.x.

public class Record 
{
    public int Id { get; set; }
    public string StringTime { get; set; }

    public Record(int id, string time)
    {
        this.Id = id;
        StringTime = time;
    }
}

[RoutePrefix("api/ChatAsync")]
public class StreamingController : ApiControllerBase
{
    // RoutePrefix alone does not register a route. The empty template maps
    // GET api/ChatAsync to this action (assumes attribute routing is enabled).
    [Route("")]
    public HttpResponseMessage Get([FromUri] int accountId)
    {
        HttpResponseMessage response = Request.CreateResponse();
        FeedWriter feedWriter = new FeedWriter();

        // PushStreamContent calls feedWriter.WriteToStream once the response stream is ready
        // and sets the Content-Type header from the media type passed here.
        response.Content = new PushStreamContent
        (
            new Action<Stream, HttpContent, TransportContext>(feedWriter.WriteToStream),
            new MediaTypeHeaderValue("application/octet-stream")
        );

        return response;
    }
}

The controller is a generic Web API controller with a single method. This method constructs the response on its own instead of using the default Web API 2.0 mechanics of IHttpActionResult. This way we bypass the content negotiation and serialization parts of Web API so that we can stream our content in chunks.

The one missing bit is the FeedWriter class. Here it is:

class FeedWriter
{
    public void WriteToStream(Stream stream, HttpContent content, TransportContext context)
    {
        int i = 0;

        try
        {
            // simulate long running i/o bound task
            while (i < 10)
            {
                Thread.Sleep(1000); // give no more than 1 Json object per second
                var record = new Record(1, DateTime.Now.TimeOfDay.ToString());
                var str = JsonConvert.SerializeObject(record);
                var buffer = UTF8Encoding.UTF8.GetBytes(str);

                // pad the buffer to a fixed size so that it is easier for the client to read
                if (buffer.Length < 1024)
                    Array.Resize(ref buffer, 1024);

                stream.Write(buffer, 0, buffer.Length);
                i++;
            }
        }
        catch (HttpException ex)
        {
            if (ex.ErrorCode == -2147023667) // The remote host closed the connection.
                return;
        }
        finally
        {
            stream.Close();
        }
    }
}

If you compile the above code into a Web API project and a WinForms app and try it out, you’ll see the content appear on the client right away, with one-second intervals between JSON objects. However, if you try the same call from the Swagger Web API documentation pages (I used Swashbuckle when writing this), you’ll see a completely different picture. Swagger waits for the entire 10 seconds before it shows the response content. You won’t see the content arrive in chunks as in the WinForms sample client. Instead, the complete content is shown at once.
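One caveat with the fixed-size frames used in this sample: Stream.Read is allowed to return fewer bytes than requested, so a single read does not necessarily contain a whole 1024-byte frame. A more defensive client could collect a full frame before decoding it. The sketch below shows one way to do that. FrameReader is a hypothetical helper written for this article, and it assumes that frames are padded with zero bytes as in the FeedWriter class.

```csharp
using System;
using System.IO;
using System.Text;

public static class FrameReader
{
    // Reads exactly frameSize bytes and returns the text without the zero padding.
    // Returns null when the stream ends before a full frame has arrived.
    public static string ReadFrame(Stream stream, int frameSize)
    {
        var buffer = new byte[frameSize];
        int total = 0;

        // Keep reading until the frame is complete; each Read may deliver only part of it.
        while (total < frameSize)
        {
            int read = stream.Read(buffer, total, frameSize - total);
            if (read == 0)
                return null; // end of stream

            total += read;
        }

        return Encoding.UTF8.GetString(buffer, 0, total).TrimEnd('\0');
    }
}
```

In the client loop this would be called as FrameReader.ReadFrame(stream, 1024) until it returns null.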

To sum up long polling and streaming Web API methods: your method can easily support streaming and long polling while serving both types of client requests, the common requests for response content and the requests for response streams. This should work in most situations. In addition, this technique allows streaming all kinds of media, from simple binary representations of JSON or XML all the way up to audio/video streaming and file downloads.

The downside of this kind of communication is that you cannot use full-duplex mode. You’re already connected to the server with one web session, and you cannot write to this session anymore. To push data to the server, you have to open another connection. But will the server know that the new request belongs to your already established streaming connection? In most cases, the answer is no. Of course, the server could be developed with this in mind, for example with some kind of caching of incoming requests. However, this would add another layer of unneeded complexity to the server.

“Well, how can we achieve full-duplex communication with the server while keeping our server a REST API?” you may ask. The answer is WebSockets.

WebSockets

WebSockets were part of the initial HTML5 push and were designed to be implemented in browsers and web servers, but they can be used by any client or server application. The WebSocket protocol is a standalone protocol built on top of TCP. Its only relation to HTTP is the initial handshake request to the server, also known as the protocol upgrade request. Unlike HTTP, WebSocket provides full-duplex communication, which is not achievable with a single HTTP connection. In addition, the protocol adds the notion of messages on top of TCP, which by itself deals only with streams of bytes and has no inherent concept of a message.
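To give a feel for what happens during the handshake: the client sends an HTTP GET request with the Upgrade: websocket and Connection: Upgrade headers plus a random Sec-WebSocket-Key, and the server proves that it understood the request by answering with a Sec-WebSocket-Accept header derived from that key, as defined in RFC 6455. Frameworks do this for you, so the sketch below is for illustration only. The WebSocketHandshake class name is made up for this example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class WebSocketHandshake
{
    // Fixed GUID defined by RFC 6455 for computing the accept value.
    private const string MagicGuid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Computes the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key:
    // Base64(SHA-1(key + GUID)).
    public static string ComputeAcceptKey(string secWebSocketKey)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(Encoding.ASCII.GetBytes(secWebSocketKey + MagicGuid));
            return Convert.ToBase64String(hash);
        }
    }
}
```

With the sample key from the RFC, dGhlIHNhbXBsZSBub25jZQ==, the result is s3pPLMBiTxaQ9kYGzzhZRbK+xOo=.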

The WebSocket handshake resembles an HTTP request so that servers can handle HTTP as well as WebSocket connections on the same port. Moreover, WebSockets can coexist with HTTP on the same server and even in the same Web API controller. The protocol uses its own URI schemes, ws and wss, in place of http and https, which sets WebSocket endpoints apart from all the rest. Apart from that, a WebSocket URI doesn’t allow a fragment, the part introduced by # (hash or number sign). Everything else follows the HTTP URI scheme.
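These URI rules are easy to check before trying to connect. Here is a small sketch of such a check. WebSocketUri is a hypothetical helper, not a framework class, and it only covers the two rules mentioned here (the scheme and the absence of #).

```csharp
using System;

public static class WebSocketUri
{
    // Returns true for absolute ws:// or wss:// addresses that contain no fragment.
    public static bool IsValid(string address)
    {
        Uri uri;
        if (!Uri.TryCreate(address, UriKind.Absolute, out uri))
            return false;

        // Uri.Scheme is always reported in lower case.
        bool schemeOk = uri.Scheme == "ws" || uri.Scheme == "wss";

        // A literal '#' always starts a fragment, which WebSocket URIs must not have.
        bool hasFragment = address.IndexOf('#') >= 0;

        return schemeOk && !hasFragment;
    }
}
```

For example, ws://example.com/chat and wss://example.com/chat?user=bob pass the check, while http://example.com/chat and ws://example.com/chat#room do not.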

“So how do I implement WebSockets?” Well, that’s an easy task. We roll out a standard ApiController with a controller method. The only difference is that, for a WebSocket request, we accept it using our server abstraction for a group of clients (a chat in the following sample) and return status code 101 (SwitchingProtocols) instead of the usual 200 (OK). If the request is not a WebSocket request, we return an error. The controller could look like this:

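Server

[RoutePrefix("api/Chat")]
public class ChatController : ApiController
{
    // RoutePrefix alone does not register a route. The empty template maps
    // GET api/Chat to this action (assumes attribute routing is enabled).
    [Route("")]
    public HttpResponseMessage Get(string username)
    {
        HttpContext context = HttpContext.Current;

        if (context.IsWebSocketRequest)
        {
            // Hand the connection over to our handler and tell the client we are switching protocols.
            context.AcceptWebSocketRequest(new ChatWebSocketHandler(username));
            return Request.CreateResponse(HttpStatusCode.SwitchingProtocols);
        }

        return Request.CreateErrorResponse(HttpStatusCode.NotAcceptable, "not a ws request");
    }
}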

The chat abstraction could look like this:

class ChatWebSocketHandler : WebSocketHandler
{
    private static WebSocketCollection _chatClients = new WebSocketCollection();
    private string _username;
    public ChatWebSocketHandler(string username)
    {
        _username = username;
    }
    public override void OnOpen()
    {
        _chatClients.Add(this);
    }

    public override void OnMessage(string message)
    {
        _chatClients.Broadcast(_username + ": " + message);
    }

    public override void OnClose()
    {
        // Remove only the client that disconnected. Clear() would drop every connected client.
        _chatClients.Remove(this);
        base.OnClose();
    }
}

WebSocketHandler can be found in the Microsoft.Web.WebSockets namespace, which lives in Microsoft.WebSockets.dll obtained through NuGet. Of course, I could have used SignalR, for example, but in this case I am interested in the bare bones of the implementation, and SignalR hides all the guts behind its abstractions and API interfaces. This completes the server part of the WebSockets implementation.

One note on _chatClients before we move on to the mobile client: we add every incoming client connection to this collection of WebSocketHandler instances. Since our ChatWebSocketHandler derives from WebSocketHandler, we can add it to the collection and, because the collection is enumerable, later find any particular client if needed. But hey, this is just a sample chat for WebSockets, so let’s leave it the way it is now!
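To make the role of such a client collection more tangible, here is a simplified, framework-free sketch of the idea. ClientRegistry is a hypothetical class written for this article and is not how WebSocketCollection is implemented. Each client is represented only by a name and a callback that delivers a message to it.

```csharp
using System;
using System.Collections.Generic;

public sealed class ClientRegistry
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, Action<string>> _clients =
        new Dictionary<string, Action<string>>();

    // Registers a client, or replaces the callback of an existing one.
    public void Add(string username, Action<string> send)
    {
        lock (_gate) { _clients[username] = send; }
    }

    // Removes a single client and leaves everyone else connected.
    public bool Remove(string username)
    {
        lock (_gate) { return _clients.Remove(username); }
    }

    // Sends the message to every registered client and returns how many received it.
    public int Broadcast(string message)
    {
        Action<string>[] targets;
        lock (_gate)
        {
            // Copy under the lock so that callbacks run without holding it.
            targets = new Action<string>[_clients.Count];
            _clients.Values.CopyTo(targets, 0);
        }

        foreach (Action<string> send in targets)
            send(message);

        return targets.Length;
    }
}
```

The important detail is that a disconnecting client removes only itself, so a broadcast that follows still reaches everyone who is left.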

Client

Windows UAP and Windows UWP

Now with our socket server in place, let’s see the options for the client code. For Windows 8, 8.1, Phone 8.1 RT and Windows 10 UWP we can use the MessageWebSocket class for message frames or StreamWebSocket class for binary data from Windows.Networking.Sockets namespace. Let’s take a look at MessageWebSocket.

private MessageWebSocket messageWebSocket;
private DataWriter messageWriter; // needed for writing data to the socket
// assume we have Start button in our xaml
private async void Start_Click(object sender, RoutedEventArgs e)
{
    bool connecting = true;
    try
    {
        // Have we connected yet?
        if (messageWebSocket == null)
        {
            Uri server;
            if (!rootPage.TryGetUri(ServerAddressField.Text, out server))
            {
                return;
            }

            rootPage.NotifyUser("Connecting to: " + server, NotifyType.StatusMessage);

            messageWebSocket = new MessageWebSocket();
            messageWebSocket.Control.MessageType = SocketMessageType.Utf8;
            messageWebSocket.MessageReceived += MessageReceived;

            // Dispatch close event on UI thread. This allows us to avoid synchronizing access to messageWebSocket.
            messageWebSocket.Closed += async (senderSocket, args) =>
            {
                await Dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
                    Closed(senderSocket, args));
            };

            await messageWebSocket.ConnectAsync(server);
            messageWriter = new DataWriter(messageWebSocket.OutputStream);

            rootPage.NotifyUser("Connected", NotifyType.StatusMessage);
        }
        else
        {
            rootPage.NotifyUser("Already connected", NotifyType.StatusMessage);
        }

        connecting = false;
        string message = InputField.Text;
        OutputField.Text += "Sending Message:\r\n" + message + "\r\n";

        // Buffer any data we want to send.
        messageWriter.WriteString(message);

        // Send the data as one complete message.
        await messageWriter.StoreAsync();

        rootPage.NotifyUser("Send Complete", NotifyType.StatusMessage);
    }
    catch (Exception ex) // For debugging
    {
        // Error happened during connect operation.
        if (connecting && messageWebSocket != null)
        {
            messageWebSocket.Dispose();
            messageWebSocket = null;
        }

        WebErrorStatus status = WebSocketError.GetStatus(ex.GetBaseException().HResult);

        switch (status)
        {
            case WebErrorStatus.CannotConnect:
            case WebErrorStatus.NotFound:
            case WebErrorStatus.RequestTimeout:
                rootPage.NotifyUser("Cannot connect to the server. Please make sure " +
                    "to run the server setup script before running the sample.",
                    NotifyType.ErrorMessage);
                break;

            case WebErrorStatus.Unknown:
                throw;

            default:
                rootPage.NotifyUser("Error: " + status, NotifyType.ErrorMessage);
                break;
        }

        OutputField.Text += ex.Message + "\r\n";
    }
}

private void MessageReceived(MessageWebSocket sender,
    MessageWebSocketMessageReceivedEventArgs args)
{
    try
    {
        MarshalText(OutputField, "Message Received; Type: " + args.MessageType + "\r\n");
        using (DataReader reader = args.GetDataReader())
        {
            reader.UnicodeEncoding = Windows.Storage.Streams.UnicodeEncoding.Utf8;

            string read = reader.ReadString(reader.UnconsumedBufferLength);
            MarshalText(OutputField, read + "\r\n");
        }
    }
    catch (Exception ex) // For debugging
    {
        WebErrorStatus status = WebSocketError.GetStatus(ex.GetBaseException().HResult);

        if (status == WebErrorStatus.Unknown)
        {
            throw;
        }

        MarshalText(OutputField, "Error: " + status + "\r\n");
        MarshalText(OutputField, ex.Message + "\r\n");
    }
}
private void MarshalText(TextBox output, string value)
{
    MarshalText(output, value, true);
}
private void MarshalText(TextBox output, string value, bool append)
{
    var ignore = output.Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
    {
        if (append)
        {
            output.Text += value;
        }
        else
        {
            output.Text = value;
        }
    });
}

The code above shows two main points. We have a separate method, Start_Click, that sends data to the socket, and an event handler, MessageReceived, that fires when data is received through the socket. The first one should be obvious: we instantiate the socket and connect it to the server if that hasn’t been done already, then write the message to the socket’s OutputStream using our data writer. This is simple. The MessageReceived event, however, fires on another thread, so we need to get from that thread to our UI thread to deliver the message to the user. We do that with the two MarshalText methods, which forward the text through the CoreDispatcher. That’s it for Windows UAP and Windows UWP. For Windows desktop, the code looks a bit different though.

Windows desktop / WinForms / WPF / Xamarin

For Windows desktop and Xamarin platforms, the Windows.Networking.Sockets namespace doesn’t exist, so we have to find another approach. Luckily, starting with .NET Framework 4.5 Microsoft added WebSockets functionality under the System.Net namespace (note that on the .NET Framework the ClientWebSocket implementation requires Windows 8 or later). Adding a using System.Net.WebSockets; statement to the code allows us to instantiate the ClientWebSocket class, which was built for this exact purpose.

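// Requires: using System.Diagnostics; using System.Net.WebSockets; using System.Text; using System.Threading;
private ClientWebSocket socket;
private Uri serverUri;
private Thread t;        // background thread that listens for incoming messages
private bool listening;

private async void startBtn_Click(object sender, EventArgs e)
{
    if (socket == null)
    {
        statusTxt.Text = "Status: connecting...";

        serverUri = new Uri(addressTxt.Text);
        socket = new ClientWebSocket();

        try
        {
            await socket.ConnectAsync(serverUri, CancellationToken.None);
        }
        catch (Exception ex)
        {
            // Connecting failed: drop the socket so that the next click starts from scratch.
            Debug.WriteLine(ex.Message);
            statusTxt.Text = "Status: disconnected";
            socket.Dispose();
            socket = null;
            return;
        }

        if (socket.State == WebSocketState.Open)
            statusTxt.Text = "Status: connected";
    }

    if (socket.State != WebSocketState.Open)
    {
        // The connection is gone. The receive loop stops on its own once the socket
        // is no longer open, so we only release the socket and reset our state.
        statusTxt.Text = "Status: disconnected";
        socket.Dispose();
        socket = null;
        t = null;
        listening = false;
        return;
    }

    string msg = messageTxt.Text;

    if (string.IsNullOrEmpty(msg))
    {
        msg = "<empty>";
    }

    ArraySegment<byte> bytesToSend =
        new ArraySegment<byte>(Encoding.UTF8.GetBytes(msg));

    await socket.SendAsync(bytesToSend,
        WebSocketMessageType.Text, true, CancellationToken.None);

    if (socket.State == WebSocketState.Open && !listening)
    {
        t = new Thread(new ThreadStart(Receive));
        t.IsBackground = true; // do not keep the process alive after the window closes
        t.Start();
        listening = true;
    }
}

private void Receive()
{
    // Keep a local reference: the UI thread may reset the field after a disconnect.
    ClientWebSocket ws = socket;
    ArraySegment<byte> bytesReceived = new ArraySegment<byte>(new byte[1024]);

    while (ws.State == WebSocketState.Open)
    {
        try
        {
            // Blocks this background thread until a frame arrives.
            WebSocketReceiveResult result =
                ws.ReceiveAsync(bytesReceived, CancellationToken.None).Result;

            if (result.MessageType == WebSocketMessageType.Close)
                break; // the server asked to close the connection

            string resultStr =
                Encoding.UTF8.GetString(bytesReceived.Array, 0, result.Count);

            // A method group cannot be passed to Invoke directly, so wrap it in a delegate.
            this.Invoke(new Action<string>(UpdateUI), resultStr);
        }
        catch (Exception ex)
        {
            // A failed receive leaves the socket unusable, so stop listening.
            Debug.WriteLine(ex.Message);
            break;
        }
    }
}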

We have the same start button with its event handler, and we have a Receive method. However, this method is not an event handler anymore. The main difference between the desktop/Xamarin and UAP/UWP models, apart from the different classes and namespaces used for the WebSockets implementation, is that ClientWebSocket doesn’t raise an event when data is received. Instead, it follows the programming model common to similar classes in the System.Net namespace. We create a socket, connect it to the server and start a thread that listens for incoming messages. When a message is received, we invoke a simple delegate to update the UI of the program. The Xamarin Android and Xamarin iOS code would be the same except for the UI updates: instead of a delegate and Control.Invoke, we should use the platform-specific way to run something on the UI thread.
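One thing to keep in mind with this model is that a single receive call is not guaranteed to deliver a whole message: longer messages arrive in several parts, and WebSocketReceiveResult.EndOfMessage tells us when the last part has arrived. Here is a sketch of a helper that collects all the parts before decoding. WebSocketText is a hypothetical class for this article. To keep it independent of a live connection, it takes the receive operation as a delegate, which is an assumption of this example and not a requirement of ClientWebSocket.

```csharp
using System;
using System.IO;
using System.Net.WebSockets;
using System.Text;
using System.Threading.Tasks;

public static class WebSocketText
{
    // Receives frames until EndOfMessage is set and returns the whole text message.
    // Returns null when the other side sends a close frame instead.
    public static async Task<string> ReceiveMessageAsync(
        Func<ArraySegment<byte>, Task<WebSocketReceiveResult>> receive,
        int bufferSize = 1024)
    {
        var buffer = new ArraySegment<byte>(new byte[bufferSize]);

        using (var message = new MemoryStream())
        {
            WebSocketReceiveResult result;
            do
            {
                result = await receive(buffer);

                if (result.MessageType == WebSocketMessageType.Close)
                    return null;

                // Append only the bytes that were actually received in this part.
                message.Write(buffer.Array, buffer.Offset, result.Count);
            }
            while (!result.EndOfMessage);

            // Decode once at the end so that multi-byte characters split across parts survive.
            return Encoding.UTF8.GetString(message.ToArray());
        }
    }
}
```

With a connected ClientWebSocket named socket, the call could look like await WebSocketText.ReceiveMessageAsync(buf => socket.ReceiveAsync(buf, CancellationToken.None)).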

That concludes our journey through the world of communication techniques that can be used in modern application development routines. Every approach has its own strengths and weaknesses, and it’s crucial to select one that will suit you most.

I hope you’ve enjoyed reading this and are excited for more. ☺ Cheers!