Client-server communication
Client-server applications are configured on a network (internet/intranet). Clients send requests to the server for a resource and, in turn, receive responses from the server. A computer that can send such requests for a resource/service is called a client, and the computer that runs the program providing the requested resource/service to more than one client is called a server. Clients and servers can be connected through a wired or wireless network:
In the preceding figure, client-server communication can be visualized as a program running on the client machine interacting with another program running on the server machine. This communication through the network involves networking services offered by diverse communication protocols.
In a single-processor system, applications can talk to each other through shared memory. The producer process writes data to a buffer or filesystem, and the consumer process reads the data from there. In distributed systems, there is no shared memory. Applications in these systems communicate intensively because they have to coordinate with each other and produce output in the shortest possible time. As a result, distributed applications, which may be remotely located from each other, rely on a variety of inter-process communication methods.
To address the issue, communication systems are commonly described in terms of the Open Systems Interconnection model (OSI model), which is a conceptual model that characterizes and standardizes the communication functions of a telecommunication or computing system, irrespective of their underlying internal structure and technology. This model partitions a communication system into seven layers: each layer serves the layer above it and is served by the layer below it. Among these, the data link layer is one of the most important layers as it serves the purpose of error detection and correction, thereby ensuring data quality. It groups bits into frames and ensures that each frame is received correctly. It puts a special pattern at the start and end of each frame to mark it; also, it computes a checksum by adding all the bytes in the frame in a particular order. To ensure defect-free transmission, one of the following two strategies is used:
- Error detecting strategy: In this strategy, the receiver receives only enough redundant information to detect that some error has occurred. It can then reject the message, but it cannot correct it.
- Error correction strategy: In this method, the receiver receives redundant information along with each block. If any error occurs, this redundant information can be used to correct the error at the receiver end itself. For example, if a frame consists of i data bits and d redundant (check) bits, then the total length would be i + d = n. An n-bit unit containing both the data and the check bits is called a code word. Code words are compared to find the number of bits they differ by. The number of bit positions in which two code words differ is called the Hamming distance.
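The checksum and Hamming distance ideas above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `FrameCheck` and both method names are hypothetical, and the checksum is a simple 8-bit additive sum as described in the text, not the stronger codes that real data link layers typically use.

```java
/** Hypothetical helper illustrating an additive checksum and Hamming distance. */
class FrameCheck {

    /** Adds all bytes of the frame in order, keeping only the low 8 bits of the sum. */
    static int checksum(byte[] frame) {
        int sum = 0;
        for (byte b : frame) {
            sum = (sum + (b & 0xFF)) & 0xFF; // treat each byte as unsigned
        }
        return sum;
    }

    /** Counts the bit positions in which two code words differ. */
    static int hammingDistance(int codeWordA, int codeWordB) {
        // XOR leaves a 1 in every position where the two words disagree.
        return Integer.bitCount(codeWordA ^ codeWordB);
    }

    public static void main(String[] args) {
        byte[] frame = {0x01, 0x02, 0x03};
        System.out.println(checksum(frame));                   // prints 6
        System.out.println(hammingDistance(0b10110, 0b10011)); // prints 2
    }
}
```

A receiver using the detection strategy would recompute the checksum and reject the frame on a mismatch. The larger the minimum Hamming distance between valid code words, the more bit errors can be detected or corrected.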
Several methods of remote application communication are available over a network, including the following:
- Network protocol stack
- Remote Procedure Call (RPC)
- Remote Method Invocation (RMI)
- Message queuing services (sockets)
- Stream-oriented services
We will discuss these in detail in subsequent sections. In this chapter, we will try to understand the basics of networking technology, how it evolved over a period of time, and how Java provides support for it. We will follow this up with different practical examples. Now let's look into the basics of networking:
As depicted in the preceding diagram, TCP and UDP are part of the transport layer. Java supports programming with these protocols through its socket API. When we write a Java program, we do programming at the application layer. We do not worry about the TCP and UDP layers as they are internally taken care of by the java.net package, irrespective of the platform. However, java.net is an extensive package that contains many classes. In order to decide which ones to use, you need a basic understanding of networking and the difference between these two protocols.
When you speak over your phone, whatever you speak is delivered to the receiver in the same order without any damage. If any issue occurs during this transmission, the other person experiences voice disturbance. The TCP protocol is analogous to this kind of telephonic communication. This connection-oriented protocol ensures that data is delivered back and forth without any damage and in the same order. TCP is a two-way (full-duplex) protocol; hence, data can be sent in both directions at the same time. TCP provides a point-to-point channel for reliable communication between two endpoints. Well-known, high-level (application) protocols such as Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), and Telnet are some examples that are built on TCP. While browsing internet sites, if data were not delivered in the proper sequence, the user would see scrambled pages. Thus, TCP can be defined as a connection-based protocol that provides a reliable flow of data between two computers.
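To make the connection-based, two-way nature of TCP concrete, here is a minimal sketch that opens a server socket on the loopback interface, connects a client to it, and sends one line in each direction. The class name `TcpEchoDemo` and the method `echoOnce` are hypothetical names chosen for illustration; the choice of port 0 (let the operating system pick a free port) and the single-message exchange are assumptions of this sketch, not something the text prescribes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class: one TCP round trip over the loopback interface. */
class TcpEchoDemo {

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true so that println() sends each line immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Starts a one-shot echo server, sends one line to it, and returns the reply. */
    static String echoOnce(String message) throws IOException, InterruptedException {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket peer = server.accept();
                     BufferedReader in = reader(peer);
                     PrintWriter out = writer(peer)) {
                    out.println(in.readLine()); // echo the line back unchanged
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();

            // The client connects first, then both sides exchange data on the same connection.
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
                 PrintWriter out = writer(client);
                 BufferedReader in = reader(client)) {
                out.println(message);
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoOnce("hello over TCP"));
    }
}
```

Note that the client must establish a connection before any data flows, and that the same socket carries traffic in both directions.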
As with our postal system, we don't require assured delivery every time. For example, if you are sending a business contract to your customer, you may look for assured delivery, but if you are sending flyers, you may not. Coming to computers, let's say a customer sends a request for the price of a stock, but due to a link failure, you are unable to deliver the stock price at that moment. In this case, it is not necessary to retry sending the same stock price message; by the time the link is established again, the stock price could have changed. So, in this scenario, the customer will place a request again and the system will deliver the latest stock price. In these kinds of scenarios, the system does not need to bear the overhead of assured message delivery; it is simpler to execute the process again from scratch and send a new message.
In these scenarios, the UDP protocol comes in handy. It sends data in packets, and these packets are independent of each other and do not follow any sequence. If applications do not need the ordering and delivery guarantees that TCP enforces, they can interact using the UDP protocol instead.
This way, the UDP protocol can be defined as a connectionless protocol that sends independent packets of data, called datagrams, from one computer to another with no guarantee about their arrival.
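The following sketch shows what sending a single datagram looks like in Java. It is a minimal illustration, assuming both endpoints live in the same process and talk over the loopback interface; the class name `UdpQuoteDemo`, the method `sendOnce`, the 2-second timeout, and the 512-byte buffer are all hypothetical choices made for this example.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class: one UDP datagram sent over the loopback interface. */
class UdpQuoteDemo {

    /** Sends one datagram and returns the payload as seen by the receiver. */
    static String sendOnce(String message) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback); // port 0: any free port
             DatagramSocket sender = new DatagramSocket()) {
            // UDP gives no delivery guarantee, so the receiver must not wait forever.
            receiver.setSoTimeout(2000);

            // No connection is set up: the destination travels with each packet.
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length, loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[512];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            receiver.receive(packet); // throws SocketTimeoutException if nothing arrives in time
            return new String(packet.getData(), packet.getOffset(), packet.getLength(),
                    StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sendOnce("ACME 101.5"));
    }
}
```

Unlike the TCP case, there is no connect or accept step here. If the datagram were lost, the receiver would simply time out, and it would be up to the application to decide whether to ask again.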
The TCP and UDP protocols use ports to map incoming data to a particular process running on a computer. Each computer typically has a single physical connection to a network through which it sends/receives data to/from the network. To send data to a particular computer, its IP address (32 bits long in IPv4) is used. But once the data is received, how will the computer identify which application it pertains to? The answer is through ports. Thus, a port is the end point of communication in distributed systems. Though the term is also used for hardware devices, here we are referring to a software construct that identifies specific processes or network services.
A port is never used on its own; it is always associated with an IP address and the protocol type used for establishing communication. This way, it completes the destination or source address of a communication session. A port is identified by a 16-bit number ranging from 0 to 65,535; out of this range, 0 to 1,023 are reserved for HTTP, FTP, and other system services.
Transport protocols, such as TCP and UDP, provide the source and destination port numbers in their headers. The process of associating a socket's input and output channels with a transport protocol, a port number, and an IP address is called binding.
This way, a port can be defined as a channel through which inbound data is passed to a specific process on the host computer using either the TCP or UDP protocol.
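Binding, as described above, can be shown explicitly in Java by creating an unbound server socket and then attaching it to an address and port. This is a small sketch under assumptions: the class name `BindingDemo` and the method `bindAndReport` are hypothetical, and the loopback address is used only so that the example runs on any machine.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical demo class: binds a TCP socket to an IP address and a port. */
class BindingDemo {

    /** Binds to the requested loopback port and returns the port that was actually bound. */
    static int bindAndReport(int requestedPort) throws IOException {
        try (ServerSocket server = new ServerSocket()) { // created unbound
            // Binding ties the socket to (protocol = TCP, IP address, port number).
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), requestedPort));
            return server.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the operating system to choose any free port.
        System.out.println("Bound to port " + bindAndReport(0));
    }
}
```

Passing a port outside the 0 to 65,535 range to `InetSocketAddress` raises an `IllegalArgumentException`, and binding to a port that another process already holds typically fails with a `BindException`.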
A port number is a 16-bit unsigned integer. The following table shows a list of ports along with the processes and services they are associated with:
Apart from these, any application- or user-defined service can use a port numbered 1024 or higher.
When we send an e-mail to someone, how do we ensure that the e-mail will reach the recipient correctly? The answer is through the e-mail address, right? On the same lines, when a computer wants to talk to another computer, how does it ensure it's talking to the right one? The answer is through the IP address. Once the right computer is located, then which process to connect to is decided at the port number level.
An IP address can be defined as a numerical label assigned to every device (for example, a computer or printer) on a computer network that makes use of IP for communication. It serves two purposes: host or network interface identification and location addressing. Refer to the following figure:
Two versions of IP are in use currently: IP Version 4 and IP Version 6. The IPv4 assignment started in 1980. IPv4 addresses are 32 bits long, which allows at most 4,294,967,296 (2^32) addresses. However, due to heavy demand, the pool of unallocated IPv4 addresses ran out on February 3, 2011, except for some small amounts of address space reserved for the transition to another system.
In 1995, while looking for improved mechanisms to generate internet addresses, the Internet Engineering Task Force (IETF) devised IPv6, the newer system. The address size was increased from 32 bits to 128 bits in IPv6, and this seemed sufficient for the foreseeable future.
As more networks emerged that did not depend on preassigned network numbers, the early methods of host numbering proved to be insufficient for internet addresses. In 1981, the classful network architecture was introduced, which changed the way addresses were interpreted. This network design permitted a larger number of separate network numbers in addition to improved subnet design.
The separation between the network and host parts of an IP address is expressed through either a subnet mask or a CIDR prefix. IPv4 uses the subnet mask terminology, whereas the CIDR representation is used by both IPv4 and IPv6. In CIDR notation, the address is followed by a slash and a decimal number that gives the count of leading bits belonging to the network part. This network part is called the routing prefix. A sample IPv4 address and its corresponding subnet mask are 192.0.2.1 and 255.255.255.0, respectively; the same pair is written as 192.0.2.1/24 in CIDR notation.
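The relationship between a CIDR prefix length and a subnet mask is plain bit arithmetic, which the following sketch works through for the sample address above. The class `CidrDemo` and its methods are hypothetical helpers written for this illustration; they handle IPv4 only.

```java
/** Hypothetical helper: converts between CIDR prefix lengths and IPv4 subnet masks. */
class CidrDemo {

    /** Returns the 32-bit mask with the given number of leading 1 bits. */
    static int maskBits(int prefixLength) {
        if (prefixLength < 0 || prefixLength > 32) {
            throw new IllegalArgumentException("prefix length must be 0-32: " + prefixLength);
        }
        // Java shifts by (count mod 32), so a prefix of 0 needs a special case.
        return prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
    }

    /** Parses a dotted-decimal IPv4 address into a 32-bit value. */
    static int parse(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four octets: " + dotted);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Formats a 32-bit value as a dotted-decimal IPv4 address. */
    static String toDotted(int value) {
        return ((value >>> 24) & 0xFF) + "." + ((value >>> 16) & 0xFF) + "."
                + ((value >>> 8) & 0xFF) + "." + (value & 0xFF);
    }

    /** Converts a prefix length such as 24 into a mask such as 255.255.255.0. */
    static String prefixToMask(int prefixLength) {
        return toDotted(maskBits(prefixLength));
    }

    /** Returns the network part (routing prefix) of an address. */
    static String network(String address, int prefixLength) {
        return toDotted(parse(address) & maskBits(prefixLength));
    }

    public static void main(String[] args) {
        System.out.println(prefixToMask(24));         // prints 255.255.255.0
        System.out.println(network("192.0.2.1", 24)); // prints 192.0.2.0
    }
}
```

ANDing the address with the mask keeps the network bits and clears the host bits, which is why 192.0.2.1 with a 24-bit prefix belongs to the network 192.0.2.0.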
You may have noticed that every computer generally has a hostname and a numeric IP address. Either of these can identify that computer on the network. For example, the hostname of Google's home page is www.google.com and its IP address is 74.125.200.147, so you can access Google using either of these over an IP network. The translation of a hostname into an IP address is handled by the Domain Name System (DNS); the name acts as an alias for the IP address. In Java, an IP address is represented by the InetAddress class. Using the getHostName() method of this class, one can get the hostname; the getAddress() method provides the numeric address as raw bytes, and getHostAddress() provides it in textual form.
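As a brief illustration of the InetAddress class, the sketch below builds an address object and reads it back in both raw and textual form. The class name `InetAddressDemo` and the method `describe` are hypothetical. The example passes a literal IP address so that no DNS lookup is needed; it also uses `getHostAddress()`, which returns the address as a string, alongside `getAddress()`, which returns the raw bytes.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical demo class: inspecting an address with java.net.InetAddress. */
class InetAddressDemo {

    /** Returns the textual address and the number of raw address bytes. */
    static String describe(String host) throws UnknownHostException {
        // For a literal IP address, getByName() only validates the format.
        InetAddress address = InetAddress.getByName(host);
        return address.getHostAddress() + " (" + address.getAddress().length + " bytes)";
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(describe("192.0.2.1")); // prints 192.0.2.1 (4 bytes)

        // With a hostname, getByName() resolves the name to an address.
        // The result depends on the local resolver configuration.
        InetAddress local = InetAddress.getByName("localhost");
        System.out.println(local.getHostName() + " -> " + local.getHostAddress());
    }
}
```

An IPv4 address yields 4 raw bytes and an IPv6 address yields 16, matching the 32-bit and 128-bit sizes discussed earlier.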