NNTP over MBone

Ville Öhman
Helsinki University of Technology
Telecommunications Software and Multimedia Laboratory
Ville.Ohman@iki.fi

Abstract

Internet news articles are distributed using NNTP. Due to the huge growth of the Internet and of its number of users, NNTP has turned out to be far from optimal for large-scale article distribution. MBone multicasting might be a more efficient solution for article distribution than NNTP. Using it, however, raises several serious problems, and this paper discusses these problems and their possible solutions.


1. Introduction
1.1. Internet News
1.2. Multicasting on the Internet
2. NNTP Protocol
2.1. Structure of the Usenet
2.2. NNTP Commands and Responses
2.3. Problems with Network Utilization
3. MBone
3.1. Structure of the MBone Network
3.2. Routers and Tunnels
3.3. Applications
4. Solution
4.1. Requirements
4.2. Problems
4.3. Topology
4.4. Lost Messages: Observing
4.5. Lost Datagrams: Observing
4.6. Lost Messages / Datagrams: Requesting
4.7. Result
5. Conclusions and Discussion
References


1. Introduction

1.1. Internet News

Internet News is one of the most widespread Internet services. It is implemented using NNTP (Network News Transfer Protocol), which specifies a protocol for the distribution, inquiry, retrieval, and posting of news articles using reliable stream-based transmission of news among the Internet community [ 2 ]. As with the WWW (World Wide Web) and other Internet services, the number of news users has grown rapidly for several years and is still growing fast. NNTP does not scale well to this kind of growth: it has been observed that NNTP traffic between Usenet hosts causes much overhead. To make news distribution work better, NNTP should either be changed or be replaced with an entirely new protocol. One proposal for a new NNTP is NNTP Version 2, an Internet Draft intended to obsolete RFC-977.

1.2. Multicasting on the Internet

Multicasting on the Internet means group-oriented communication between hosts willing to participate in the group. Sending an IP (Internet Protocol) packet to a multicast group is logically the same as sending the packet to every participant using unicast. The advantage of multicast, however, is that it scales to large networks with a large or even huge number of users: each packet reaches every participant with minimal network overhead, because only one copy of the packet travels over any single network link.

Currently the most popular multicasting technology is the MBone. It does not require every router to be multicast-capable, which means that applications anywhere on the Internet can take part in MBone communication [ 1 ]. This makes it possible to write cost-effective multiparty applications that take advantage of this technology. One such application could be a new network news transport protocol, but this raises a number of problems.

2. NNTP Protocol

For many years, the Internet community has supported the distribution of network news to participants all over the world. This has been done by Usenet hosts that are logically connected together to form a network. The NNTP protocol [ 2 ] is used between these hosts, and between the hosts and the users reading Internet news. The standard format of Usenet messages is defined in [ 3 ]. This section gives a brief overview of NNTP for readers who are not familiar with it.

2.1. Structure of the Usenet

The Usenet host network consists of hosts that serve news users and exchange news articles and newly created groups with each other. The user interface is an interactive TCP-based (Transmission Control Protocol) [ 5 ] stream between the user's host and the NNTP server. Typically a news reading program connects to port 119 of the server, and the session is maintained until the program disconnects.

A Usenet host exchanges news articles using an interactive mechanism for deciding which articles are to be transmitted. A host that wants new news or newsgroups, or that has new news or newsgroups to send, typically contacts one or more of its neighbours using NNTP. This is how articles are distributed on the Internet. The topology of Usenet is hierarchical: hosts at the highest level communicate with their neighbours, whereas hosts at lower levels of the hierarchy communicate only along the hierarchy, according to their service contracts.

2.2. NNTP Commands and Responses

Commands are ASCII (American Standard Code for Information Interchange) words, such as GROUP, ARTICLE or IHAVE, in some cases followed by one or more parameters. Responses are of two kinds: status and textual. A status response begins with a three-digit numeric code represented in ASCII. After certain status codes, a textual response, such as an article body, may follow; it is terminated by a line containing a single period.
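As a small illustration of the response format described above, the following Python sketch splits a status line into its code and text and reads a period-terminated text block. It assumes the RFC 977 conventions (three-digit codes, a line holding a single period ends the text, and a period at the start of a text line is doubled by the sender); the function names are hypothetical and this is not a complete client.

```python
# Status codes after which RFC 977 says a textual response follows.
TEXT_FOLLOWS = {100, 215, 220, 221, 222, 230, 231}


def parse_status(line: str) -> tuple[int, str]:
    """Split a status line such as '211 1234 3000234 3002322 misc.test'."""
    code, _, text = line.rstrip("\r\n").partition(" ")
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"malformed status line: {line!r}")
    return int(code), text


def read_text_block(lines) -> list[str]:
    """Collect lines up to the lone '.', removing the doubled leading period."""
    body = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == ".":
            return body
        if line.startswith(".."):
            line = line[1:]
        body.append(line)
    raise ValueError("text block ended without a terminating '.'")
```

A client would call `read_text_block` only when the code returned by `parse_status` is in `TEXT_FOLLOWS`, for example after a 220 response to ARTICLE.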

2.3. Problems with Network Utilization

Internet-wide exchange of articles consumes much network bandwidth no matter what protocol is used. NNTP is not, however, a very efficient protocol. At least the following factors make NNTP inefficient:

  1. IHAVE messages are sent to neighbours regardless of whether the neighbour host already has the article. In a mesh topology this can cause many unnecessary IHAVE messages.
  2. Every message must be acknowledged before the next one can be sent (idle RQ) [ 7 ]. This becomes inefficient as network delay increases.
  3. Articles are transferred one at a time. A host with more than one article to send could save bandwidth by encapsulating them into one message.
  4. The use of TCP [ 5 ] causes many acknowledgements to be sent, which also consume some bandwidth.
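The cost of idle RQ can be estimated with a rough model. The sketch below assumes that each article offered with IHAVE needs two request/response exchanges (IHAVE answered by 335, then the article answered by 235) and compares this with a hypothetical scheme in which articles are sent back to back and acknowledged once. The parameter values are arbitrary assumptions, not measurements, and TCP windowing is ignored.

```python
def idle_rq_seconds(n_articles: int, article_bytes: int, rate_bps: float,
                    rtt_s: float, exchanges_per_article: int = 2) -> float:
    """Stop-and-wait: each article waits for its replies before the next one is offered."""
    transmit_s = article_bytes * 8 / rate_bps
    return n_articles * (transmit_s + exchanges_per_article * rtt_s)


def batched_seconds(n_articles: int, article_bytes: int, rate_bps: float,
                    rtt_s: float) -> float:
    """Hypothetical alternative: articles sent back to back and acknowledged once."""
    transmit_s = article_bytes * 8 / rate_bps
    return n_articles * transmit_s + rtt_s


if __name__ == "__main__":
    # Arbitrary example values: 1000 articles of 3000 bytes, 2 Mbit/s, 100 ms round trip.
    slow = idle_rq_seconds(1000, 3000, 2_000_000, 0.1)
    fast = batched_seconds(1000, 3000, 2_000_000, 0.1)
    print(f"idle RQ: {slow:.1f} s, batched: {fast:.1f} s")
```

With these assumed values the idle RQ model gives about 212 seconds, almost all of it spent waiting for replies, whereas the batched model gives about 12 seconds. The gap grows linearly with the round-trip time.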

3. MBone

MBone stands for the Virtual Multicast Backbone On the interNEt [ 1 ]. MBone is a technology that enables the distribution of, and access to, real-time interactive multimedia on the Internet. Distributing such isochronous media on a large scale over the Internet was not feasible before the MBone was invented and deployed. The MBone was first deployed on a few routers in 1992 and has grown rapidly since then.

3.1. Structure of the MBone Network

Figure 1 shows the difference between traditional unicast IP routing and multicast IP routing. Traditional routers are marked r-n and multicast-capable routers mr-n. A traditional router such as r-1 sees the class D addressed multicast packet as an ordinary IP packet and routes it accordingly.

Figure 1. Internet routing

Class D addresses (first octet between 224 and 239) are used for IP multicasting. The address space assigned for use by the MBone includes the range 224.2.*.* [ 1 ].
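The address ranges above can be checked with Python's standard `ipaddress` module. The sketch below is only an illustration; the function name `classify` and its label strings are arbitrary choices.

```python
import ipaddress

CLASS_D = ipaddress.ip_network("224.0.0.0/4")        # first octet 224..239
MBONE_RANGE = ipaddress.ip_network("224.2.0.0/16")   # 224.2.*.*


def classify(address: str) -> str:
    """Report whether an IPv4 address lies in the MBone range or elsewhere in class D."""
    ip = ipaddress.ip_address(address)
    if ip in MBONE_RANGE:
        return "MBone range"
    if ip in CLASS_D:
        return "class D"
    return "not class D"


if __name__ == "__main__":
    for addr in ("224.2.127.254", "239.255.255.250", "192.0.2.1"):
        print(addr, classify(addr))
```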

3.2. Routers and Tunnels

Some newer routers support native multicast routing, but many routers cannot route multicast packets correctly. To use MBone technology everywhere on the Internet today, tunnels are needed between the multicast-capable islands of the Internet. Most MBone tunnels on the Internet today are encapsulated tunnels. IP multicast packets traversing an encapsulated tunnel carry an outer header whose source and destination addresses are the IP addresses of the tunnel endpoint multicast routers [ 1 ].
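Encapsulation can be illustrated by building the outer header by hand. The sketch below assumes IP-in-IP encapsulation (IP protocol number 4) and prepends a minimal 20-byte IPv4 header, whose source and destination are the tunnel endpoints, to an already built multicast packet. Options, fragmentation and the identification field are ignored, and the function names are hypothetical; in practice the multicast router's software does this work.

```python
import ipaddress
import struct

IPPROTO_IPIP = 4  # IP-in-IP encapsulation


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement sum of 16-bit words, as used in the IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(inner_packet: bytes, tunnel_src: str, tunnel_dst: str, ttl: int = 64) -> bytes:
    """Wrap a multicast packet in an outer IPv4 header addressed to the tunnel endpoint."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,              # version 4, header length 5 words
        0,                         # type of service
        20 + len(inner_packet),    # total length
        0,                         # identification (unused here)
        0,                         # flags and fragment offset
        ttl,
        IPPROTO_IPIP,
        0,                         # checksum placeholder
        ipaddress.IPv4Address(tunnel_src).packed,
        ipaddress.IPv4Address(tunnel_dst).packed,
    )
    checksum = ipv4_checksum(header)
    header = header[:10] + struct.pack("!H", checksum) + header[12:]
    return header + inner_packet
```

Routers between the endpoints see only the unicast outer header; the receiving endpoint strips it and forwards the inner multicast packet.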

3.3. Applications

A typical MBone application is not very sensitive to network errors. A network TV or radio user can usually live with a short break in a media stream. Such breaks happen occasionally because UDP (User Datagram Protocol) [ 6 ] is unreliable. TCP [ 5 ] and other reliable protocols cannot be used, because multicast offers no mechanism for handling acknowledgements from the receivers. Applications that cannot tolerate missing packets must either handle these situations themselves or use some other technology. One possibility is to combine an unreliable but efficient MBone UDP stream with one or more reliable but not multicast-capable TCP streams, forming a system that has both important properties: efficiency and reliability. Such a solution is presented in the next section.

4. Solution

To take advantage of MBone properties, a new news interchange protocol must be developed. This new, as yet non-existent protocol is referred to here as mnews; if it were specified, it might replace NNTP. The existing standard format for Usenet messages [ 3 ] could still be used; only the Path line would no longer have the same meaning as before.

4.1. Requirements

An mnews protocol must meet the following requirements:

  1. Efficiency. All packets that are sent to every mnews host are sent using the MBone.
  2. Efficiency. If no errors are detected, communication between hosts other than over the MBone is kept to a minimum.
  3. Locality. All errors are handled as locally as possible.
  4. Reliability. A mechanism ensures that all messages are delivered correctly.
  5. Usefulness. The mnews protocol must use less bandwidth than NNTP to make its use reasonable.
  6. Limited distribution. There should be some way to limit article distribution.

4.2. Problems

It is not possible to use a reliable transport protocol such as TCP with IP multicasting. Therefore the unreliable, datagram-based UDP protocol must be used. This raises several problems, which are discussed in Sections 4.3 to 4.6.

4.3. Topology

Today MBone administration is not necessarily easy enough for everyone, and not every operating system platform supports MBone. Therefore it is better to leave the administration to professionals and to connect client machines to the servers using NNTP over ordinary TCP streams. This also makes it possible to use existing news reading software as before.

All the servers, referred to here as mnews hosts or servers, are connected to the MBone. A class D multicast address should be reserved for global news traffic. The same address can also be used for local news systems, but the ttl (time-to-live) value must then be set low enough to prevent local articles and groups from flowing out. Private (for example, company-wide) news systems may not need to change at all, because of their light usage and fast local area networks. They could continue to use NNTP internally and communicate with other news servers through an mnews gateway server.
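The ttl limit is set per socket. The Python sketch below creates a UDP socket whose multicast datagrams carry a given ttl. `NEWS_GROUP` and `NEWS_PORT` are hypothetical placeholders, since no address has actually been reserved for news traffic, and the ttl values that keep traffic local depend on the thresholds configured on the surrounding multicast routers.

```python
import socket
import struct

NEWS_GROUP = "224.2.0.1"   # hypothetical: no address is actually reserved for news
NEWS_PORT = 5119           # hypothetical port number


def open_mnews_sender(ttl: int) -> socket.socket:
    """Create a UDP socket whose multicast datagrams are limited to the given ttl."""
    if not 0 <= ttl <= 255:
        raise ValueError("ttl must fit in one byte")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    return sock


def multicast_article(sock: socket.socket, article: bytes) -> None:
    """Send one article (assumed to fit in a single datagram) to the news group."""
    sock.sendto(article, (NEWS_GROUP, NEWS_PORT))
```

A server would typically keep two sockets: one with a small ttl for local groups and one with a large ttl for global groups, both sending to the same group address.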

To meet the requirements listed in Section 4.1, a pure MBone connection is not sufficient for the mnews protocol. Additional connections are needed for error recovery. These connections cannot be carried over the MBone, because this would waste capacity in networks all over the Internet. One possibility would be for a receiving host to open a channel dynamically to the sending host and request the missing data. This might work well if the error occurred near the receiving party; otherwise too many hosts would try to connect, seriously overloading the network near the sending party. The optimal solution might be a hierarchy of mnews hosts connected together with TCP streams. This would require the hosts to be able to route retransmission requests.

4.4. Lost Messages: Observing

One way to detect lost messages is to label each message with a sender host identity and a per-host sequence number. A receiving host can then detect that one or more messages are missing if it keeps the last sequence number for each sending host. This means that every mnews host must keep a database of all other mnews hosts (those that have sent something during their lifetime). The size of the database would not be a problem; a database with tens or even hundreds of thousands of rows is easily handled today.

A lost message is detected only when the next message from the same host arrives. This is undesirable when a host sends messages rarely: if such a message is dropped, some hosts will not receive the article for an arbitrarily long time. The problem can be solved by periodically sending (with a sufficiently long period) empty timeout messages that contain only the current sequence number. The number of such messages should be limited: enough of them should be sent to make the probability of an undelivered message anywhere on the Internet sufficiently small, but they should not be sent without reason during long periods of inactivity.

Figure 2. Lost message detection

Figure 2 shows how a lost message is detected. Black hosts are mnews hosts connected to the MBone. White hosts are all other hosts, e.g. user hosts or old NNTP hosts. The phases are briefly described here:

  1. A host sends an article to an mnews host.
  2. The mnews host multicasts it, and for one reason or another the network drops it.
  3. The same mnews host sends another article or a timeout message carrying the current sequence number.
  4. This message is delivered successfully. The hosts receiving it detect the missing article by comparing its sequence number with the previously received sequence number stored in the database.

Messages can be dropped not only because of ordinary, infrequent network errors but also because of serious, long-lasting network failures. Such failures can prevent an mnews host from receiving a large number of messages that are delivered elsewhere on the Internet during the failure. These messages are, however, detected in the same way.
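The detection rule described above can be sketched as follows. The sketch assumes that sequence numbers are consecutive integers per sender and that tracking of a host begins with the first message heard from it; `LossDetector` and its methods are hypothetical names. An article numbered s reveals the gap before s, while a timeout message carrying the current number s also reveals that s itself was lost.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LossDetector:
    # Highest sequence number known to exist for each sending mnews host.
    highest: dict[str, int] = field(default_factory=dict)

    def on_article(self, sender: str, seq: int) -> list[int]:
        """An article numbered seq arrived; numbers between the last known one and seq are missing."""
        return self._advance(sender, seq, seq_arrived=True)

    def on_timeout(self, sender: str, current_seq: int) -> list[int]:
        """A timeout message names the sender's current number, which itself may be missing."""
        return self._advance(sender, current_seq, seq_arrived=False)

    def _advance(self, sender: str, seq: int, seq_arrived: bool) -> list[int]:
        previous = self.highest.get(sender)
        if previous is None:
            # Assumption: start tracking a host at the first message heard from it.
            self.highest[sender] = seq
            return []
        if seq <= previous:
            return []  # late, duplicate or retransmitted message: no new gap
        self.highest[sender] = seq
        last_missing = seq - 1 if seq_arrived else seq
        return list(range(previous + 1, last_missing + 1))
```

The returned sequence numbers are what a host would then request from its neighbours, as described in Section 4.6.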

4.5. Lost Datagrams: Observing

A long message may consist of several UDP datagrams. Therefore each datagram must itself contain a sequence number stating its position in the message, together with the total number of datagrams in the message. This also allows the receiving host to reorder datagrams that arrive out of order. When the first datagram of a message is received, a timer may be started to detect dropped datagrams.
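A possible datagram layout and reassembly procedure is sketched below. The header fields (sender identity, message sequence number, datagram index and datagram count), their sizes and the payload size are assumptions made for illustration. A message whose timer expires before all of its datagrams have arrived is reported as lost as a whole.

```python
import struct
import time

# Hypothetical header: sender id, message sequence number, datagram index, datagram count.
HEADER = struct.Struct("!IIHH")
MAX_PAYLOAD = 1024  # assumed payload bytes per datagram


def fragment(sender_id: int, msg_seq: int, message: bytes) -> list[bytes]:
    """Split a message into datagrams that each carry their position and the total count."""
    chunks = [message[i:i + MAX_PAYLOAD] for i in range(0, len(message), MAX_PAYLOAD)] or [b""]
    total = len(chunks)
    return [HEADER.pack(sender_id, msg_seq, index, total) + chunk
            for index, chunk in enumerate(chunks)]


class Reassembler:
    def __init__(self, timeout_s: float = 5.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        # (sender, seq) -> (deadline, datagram count, {index: payload})
        self.pending = {}

    def receive(self, datagram: bytes):
        """Store one datagram; return ((sender, seq), message) once the message is complete."""
        sender, seq, index, total = HEADER.unpack_from(datagram)
        if index >= total:
            return None  # malformed datagram
        key = (sender, seq)
        if key not in self.pending:
            # The first datagram of a message starts its timer.
            self.pending[key] = (self.clock() + self.timeout_s, total, {})
        _, _, parts = self.pending[key]
        parts[index] = datagram[HEADER.size:]  # order of arrival does not matter
        if len(parts) == total:
            del self.pending[key]
            return key, b"".join(parts[i] for i in range(total))
        return None

    def expired(self) -> list[tuple[int, int]]:
        """Drop incomplete messages whose timer has run out; the caller re-requests them whole."""
        now = self.clock()
        lost = [key for key, (deadline, _, _) in self.pending.items() if deadline <= now]
        for key in lost:
            del self.pending[key]
        return lost
```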

4.6. Lost Messages / Datagrams: Requesting

A host that detects a lost message or datagram should have a mechanism for requesting a copy of the lost data. This should be done as locally as possible to keep Internet-wide traffic to a minimum. Therefore an mnews host's configuration file or similar must specify its nearest mnews hosts. These hosts are used for requesting missing data. Missing data is requested transitively from mnews host to mnews host until it is found, and it is then transported to all mnews hosts along the path. Individual datagrams need not be requested: a missing datagram can simply cause the whole message to be requested and retransmitted, because the sender might not want to keep track of datagram boundaries.

Figure 3. mnews hosts request missing message

Figure 3 shows what happens after the scenario described in Figure 2 has occurred. The mnews hosts that detect the missing message(s) request them (1). The mnews host they ask does not have the message either, so it forwards the request recursively (2) and responds once its own request has been answered.

Requesting data is possible only if mnews hosts can route requests, i.e. each host must know which neighbouring mnews host is probably the best one to send a request to. This is not trivial, because the article header cannot carry a correct Path line as Usenet [ 3 ] messages do: a multicast article reaches every mnews host directly instead of being relayed from host to host.
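The recursive forwarding of Figure 3 can be sketched under a strong simplifying assumption: each mnews host has one statically configured upstream neighbour, so requests travel along a tree and the open routing question above is sidestepped. `MnewsHost`, `upstream` and `request` are hypothetical names, and the TCP connections between neighbours are replaced by ordinary method calls.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MnewsHost:
    name: str
    upstream: MnewsHost | None = None  # hypothetical static request route
    store: dict = field(default_factory=dict)  # (sender, seq) -> article bytes

    def request(self, msg_id: tuple[str, int], path: list[str] | None = None):
        """Return (article or None, hosts visited), caching the article on the way back."""
        path = [] if path is None else path
        path.append(self.name)
        if msg_id in self.store:
            return self.store[msg_id], path
        if self.upstream is None:
            return None, path  # reached the top of the tree without finding it
        article, path = self.upstream.request(msg_id, path)
        if article is not None:
            self.store[msg_id] = article  # every host on the path keeps a copy
        return article, path
```

Because each host on the path stores the article before answering, a later request from another host below it is answered locally, in line with the locality requirement.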

4.7. Result

The requirements listed in Section 4.1 and how they are met are discussed here one by one.

Requirement 1 is fully satisfied: nothing is ever distributed everywhere inefficiently (i.e. without the MBone). If no errors occur, which is probably the usual case, no other communication (such as acknowledgements or IHAVE messages) is needed between mnews hosts, which is a major improvement over NNTP.

Whether requirement 2 is satisfied depends on how mnews hosts exchange the routing information for request messages. If that routing information is static, the requirement is fully satisfied (no overhead); otherwise some algorithm for dynamic routing table construction must exist between the mnews hosts, and it uses some (probably little) bandwidth.

Requirement 3 is satisfied, because retransmission requests are sent using TCP only to one of the neighbouring mnews hosts. That host acts in the same way, keeping the problem as local as possible.

Reliable delivery of messages is achieved and requirement 4 is satisfied, because missing data is always requested for retransmission. Even when a large amount of data is missing (due to a serious network failure or the setting up of a new server), retransmission is requested in the same manner.

Whether requirement 5 is satisfied is unknown, because bandwidth usage depends on the average error rate of the Internet and on the topology of the mnews host network. The mnews protocol might use much less bandwidth than NNTP, because Internet error rates are usually quite low.

Requirement 6 can be satisfied if setting low ttl values is acceptable, which depends on the topology of the local network.

5. Conclusions and Discussion

MBone technology has become a de facto standard for Internet multicasting. To take advantage of it, applications receiving data must tolerate unreliable data transfer. Interactive voice or video streams, for example, can usually do without full reliability.

The idea of multicasting that is both efficient and reliable might be useful in more advanced applications as well. This paper discussed the possibility of news distribution using the MBone and what could make it reliable. Reliability requires a topology of reliable channels between neighbouring hosts and routing of retransmission requests. Request routing is not discussed here; it would be the major problem to be solved next if the ideas presented here were used as the basis of a new news distribution hierarchy.

Another question worth considering is whether other Internet services use much bandwidth, distribute data globally and require reliable streams. If so, the same kind of composite technology of MBone and reliable streams might be used there as well. One example of such a service might be IRC (Internet Relay Chat).

A network simulation would be a good way to test the ideas of this paper. This would require network simulation software and an alpha implementation of the mnews protocol. Traditional NNTP could be simulated in the same way. After such simulations, more could be said about performance and other differences.

References

[1]
Vinay Kumar. MBone: Interactive Multimedia on the Internet. New Riders Publishing, 1996.
[2]
Brian Kantor and Phil Lapsley. Network News Transfer Protocol. RFC-977, UCSD & UCB, February 1986.
[3]
M. Horton and R. Adams. Standard for Interchange of Usenet Messages. RFC-1036, AT&T Bell Laboratories, December 1987.
[4]
Douglas E. Comer. Internetworking with TCP/IP. Volume I. Prentice-Hall, 1995.
[5]
Defence Advanced Research Projects Agency. Transmission Control Protocol. RFC-793, Information Sciences Institute, University of Southern California, September 1981.
[6]
J. Postel. User Datagram Protocol. RFC-768, August 1980.
[7]
Fred Halsall. Data Communications, Computer Networks and Open Systems. Addison-Wesley, 1996.