In Search of Reliable Internet Measurement Data

Newspapers and magazines frequently report growth rates of Internet usage and of the numbers of users, hosts, and domains that seem to be beyond all expectations. Growth rates are expected to accelerate exponentially. However, Internet measurement data are anything but reliable and are often quite fantastic constructs. Many media and decision makers nevertheless jump upon them, because the technical difficulties in measuring Internet growth or usage make reliable measurement impossible.

Equally, predictions that the Internet is about to collapse lack any foundation whatsoever. The researchers at the Internet Performance Measurement and Analysis Project (IPMA) compiled a list of news items about Internet performance and statistics and a few responses to them by engineers.

Size and Growth

In fact, "today's Internet industry lacks any ability to evaluate trends, identify performance problems beyond the boundary of a single ISP (Internet service provider, M. S.), or prepare systematically for the growing expectations of its users. Historic or current data about traffic on the Internet infrastructure, maps depicting ... there is plenty of measurement occurring, albeit of questionable quality", says K. C. Claffy in her paper Internet measurement and data analysis: topology, workload, performance and routing statistics (http://www.caida.org/Papers/Nae/, Dec 6, 1999). Claffy is not an average researcher; she founded the well-known Cooperative Association for Internet Data Analysis (CAIDA).

So her statement is a slap in the face of all market researchers claiming otherwise.
In a certain sense this is ridiculous, because network measurement has been an important task ever since the inception of the ARPANet, the forerunner of the Internet. The very first ARPANet site was established at the University of California, Los Angeles, and was intended to be the measurement site. There, Leonard Kleinrock went on to work on the development of measurement techniques used to monitor the performance of the ARPANet (cf. Michael and Ronda Hauben, Netizens: On the History and Impact of the Net). And in October 1991, on behalf of the Internet Activities Board, Vinton Cerf proposed guidelines for researchers considering measurement experiments on the Internet. These guidelines were motivated by two considerations. First, measurement would be critical for future development, evolution and deployment planning. Second, Internet-wide activities have the potential to interfere with normal operation and must be planned with care and made widely known beforehand.
So what are the reasons for this inability to evaluate trends and identify performance problems beyond the boundary of a single ISP? First, in early 1995, almost simultaneously with the worldwide introduction of the World Wide Web, the National Science Foundation handed over its stewardship role for the Internet to a competitive industry (bluntly put: the Internet was privatized), and this transition left no framework for adequate tracking and monitoring of the Internet. The early ISPs were not very interested in gathering and analyzing network performance data; they were struggling to meet the demands of their rapidly growing customer base. Second, we are only beginning to develop reliable tools for quality measurement and analysis of bandwidth or performance. CAIDA aims at developing such tools.
"There are many estimates of the size and growth rate of the Internet that are either implausible, or inconsistent, or even clearly wrong", write K. G. Coffman and Andrew Odlyzko, both members of different departments of AT&T Labs-Research, making a similar point in their paper The Size and Growth Rate of the Internet, published in First Monday. Some sources contain seemingly contradictory information on the size and growth rate of the Internet, but "there is no comprehensive source for information". The authors take a well-informed and refreshing look at efforts undertaken to measure the Internet and dismantle several misunderstandings that lead to incorrect measurements and estimates. Some measurements have such large error margins that they might better be called estimates, to say the least. This is partly because not every carrier discloses its data, so the data are only fragmentarily available.
What is measured and what methods are used? Many studies are devoted to the number of users; others look at the number of computers connected to the Internet or count IP addresses. Coffman and Odlyzko focus on the sizes of networks and the traffic they carry to answer questions about the size and the growth of the Internet.
Their focus makes sense when you bear in mind that the Internet is just one of many networks of networks; it is only a part of the universe of computer networks. Additionally, the Internet has public (unrestricted) and private (restricted) areas. Most studies consider only the public Internet; Coffman and Odlyzko consider the long-distance private line networks too, that is, the corporate networks and Intranets, because they are convinced (an assertion they put forward without supporting empirical data) that "the evolution of the Internet in the next few years is likely to be determined by those private networks, especially by the rate at which they are replaced by VPNs (Virtual Private Networks) running over the public Internet. Thus it is important to understand how large they are and how they behave." Coffman and Odlyzko check other estimates by considering the traffic generated by residential users accessing the Internet with a modem and the traffic through public peering points (statistics for them are available through CAIDA and the National Laboratory for Applied Network Research), and by calculating the bandwidth capacity of each of the major US providers of backbone services. They compare the public Internet to private line networks and offer interesting findings. The public Internet (with an effective bandwidth of 75 Gbps at December 1997) is currently far smaller, in both capacity and traffic, than the switched voice network. The private line networks, however, are considerably larger in aggregate capacity than the Internet, about as large as the voice network in the U. S. (with an effective bandwidth of about 330 Gbps at December 1997), but they carry less traffic. Yet the growth rate of traffic on the public Internet, while lower than is often cited, is still about 100% per year, much higher than for traffic on other networks. Hence, if present growth trends continue, data traffic in the U. S. will overtake voice traffic around the year 2002 and will be dominated by the Internet. In the future, growth in Internet traffic will derive predominantly from people staying online longer and from multimedia applications, which consume more bandwidth; both account for unanticipated amounts of data traffic.
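The projection that data traffic will overtake voice traffic is compound-growth arithmetic: if data traffic D0 grows by a fraction g_d per year and voice traffic V0 by g_v, the two are equal after t = ln(V0/D0) / ln((1 + g_d)/(1 + g_v)) years. The following is a minimal Python sketch, assuming constant growth rates and no saturation. The function name is hypothetical, and the starting ratio and the 10% voice growth rate in the usage lines are placeholders, not figures from Coffman and Odlyzko.

```python
import math


def years_until_overtake(data_now: float, voice_now: float,
                         data_growth: float, voice_growth: float) -> float:
    """Years until data traffic exceeds voice traffic under constant
    annual growth (compound growth, no saturation).

    Growth rates are fractions per year: 1.0 means 100 % (doubling).
    """
    if data_now >= voice_now:
        return 0.0  # data traffic is already at least as large
    if data_growth <= voice_growth:
        raise ValueError("data traffic never catches up at these rates")
    # Solve data_now * (1 + g_d)**t == voice_now * (1 + g_v)**t for t.
    ratio = voice_now / data_now
    return math.log(ratio) / math.log((1 + data_growth) / (1 + voice_growth))


# Hypothetical inputs: data traffic is one eighth of voice traffic,
# data doubles every year, voice grows by 10 % a year.
t = years_until_overtake(1.0, 8.0, data_growth=1.0, voice_growth=0.10)
print(f"data overtakes voice after about {t:.1f} years")
```

With zero voice growth, the same starting ratio gives exactly three doublings. Even modest changes to the assumed starting ratio shift the crossover date, which helps explain why published forecasts differ.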

Hosts

The Internet Software Consortium's Internet Domain Survey is one of the best-known efforts to count the number of hosts on the Internet. Happily, the ISC informs us extensively about the methods used for its measurements, a policy quite rare on the Web. For the most recent survey, the number of IP addresses that have been assigned a name was counted. At first sight it looks simple to get the accurate number of hosts, but in practice an assigned IP address does not automatically correspond to an existing host. In order to find out, you have to send a kind of message to the host in question and wait for a reply. You do this with the PING utility. (For further explanations look here: Art. PING, in: Connected: An Internet Encyclopaedia) But doing this for every registered IP address is an arduous task, so ISC pings only a 1% sample of all hosts found and projects the result onto all hosts to estimate the number of pingable ones. That is ISC's new method; its old method, still used by RIPE, was to count the number of domain names that had IP addresses assigned to them, a method that proved not to be very useful because a significant number of hosts restrict download access to their domain data.
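ISC's procedure is described here only as pinging a 1% sample and projecting the result. The following is therefore a generic sampling sketch, not ISC's code: ping a random sample, measure the share that replies, and scale that share up to all named hosts. The probe function, the toy host names, the seed and the normal-approximation margin of error are all assumptions added for illustration.

```python
import math
import random
from collections.abc import Callable, Sequence


def estimate_pingable_hosts(named_hosts: Sequence[str],
                            is_pingable: Callable[[str], bool],
                            sample_fraction: float = 0.01,
                            seed: int = 0) -> tuple[float, float]:
    """Project the number of pingable hosts from a random sample.

    `is_pingable` is a caller-supplied probe (hypothetical here); in a real
    survey it would send an ICMP echo request and wait for a reply.
    Returns (estimate, approximate 95 % margin of error).
    """
    total = len(named_hosts)
    n = max(1, round(total * sample_fraction))   # e.g. 1 % of all names
    sample = random.Random(seed).sample(list(named_hosts), n)
    answered = sum(1 for host in sample if is_pingable(host))
    share = answered / n                          # replying share in sample
    estimate = share * total                      # scale up to all names
    # Normal approximation for the standard error of a proportion
    # (ignores the finite-population correction).
    margin = 1.96 * math.sqrt(share * (1 - share) / n) * total
    return estimate, margin


def every_fourth_silent(name: str) -> bool:
    """Toy probe: hosts whose number is divisible by 4 never answer."""
    return int(name[4:].split(".")[0]) % 4 != 0


hosts = [f"host{i}.example" for i in range(10_000)]
est, err = estimate_pingable_hosts(hosts, every_fourth_silent)
print(f"about {est:.0f} pingable hosts (+/- {err:.0f})")
```

A 1% sample of 10,000 names means only 100 probes, so the margin of error is in the hundreds of hosts. This illustrates why such projections are estimates rather than counts.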
Apart from the small sample, ISC's method has at least one flaw: its researchers take into account only those network numbers that have been entered into the tables of the IN-ADDR.ARPA domain, and it is possible that not all providers know of these tables. A similar method is used by Telcordia's Netsizer.
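The IN-ADDR.ARPA domain maps addresses back to names: the octets of an IPv4 address are reversed and ".in-addr.arpa" is appended. A survey that walks this tree therefore never sees networks whose providers did not register them there. The following sketch uses only Python's standard library; the function names are illustrative, and 192.0.2.25 is a reserved documentation address. Unlike reverse_name, lookup_hostname performs a live DNS query, so its result depends on the resolver.

```python
import ipaddress
import socket
from typing import Optional


def reverse_name(address: str) -> str:
    """Name under which an address's PTR record lives in the reverse tree
    (IN-ADDR.ARPA for IPv4, IP6.ARPA for IPv6)."""
    return ipaddress.ip_address(address).reverse_pointer


def lookup_hostname(address: str) -> Optional[str]:
    """Ask the system resolver for the PTR record; None if the lookup fails,
    e.g. because nobody registered the address in the reverse tree."""
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return None


print(reverse_name("192.0.2.25"))  # 25.2.0.192.in-addr.arpa
```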

Internet Weather

Like the daily weather, traffic on the Internet, that is, the conditions for data flows, is monitored too; hence the term Internet weather. One of the most famous Internet weather reports is from The Matrix, Inc. Another one is the Internet Traffic Report, which displays traffic as values between 0 and 100 (high values indicate fast and reliable connections). Internet weather monitoring relies on response ratings from servers all over the world. The method used is to "ping" servers (as for host counts) and to compare their response times with past ones and with the response times of servers in the same region.
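The exact formula behind the Internet Traffic Report's 0 to 100 values is not given here. As an illustration only, a hypothetical score could compare a server's current ping round-trip times with its historical baseline, penalize lost probes, and average the scores of servers in the same region. The weighting below is an assumption, not the report's method.

```python
from collections.abc import Sequence
from statistics import mean


def weather_index(current_rtts_ms: Sequence[float],
                  past_rtts_ms: Sequence[float],
                  lost: int, sent: int) -> int:
    """Hypothetical 0-100 score for one server: 100 means replies are at
    least as fast as the historical baseline and no probe was lost."""
    if sent <= 0:
        raise ValueError("no probes sent")
    if not current_rtts_ms:          # no reply at all: worst weather
        return 0
    loss = lost / sent
    slowdown = mean(current_rtts_ms) / mean(past_rtts_ms)  # 1.0 = usual
    speed = min(1.0, 1.0 / slowdown)                       # cap at 100 %
    return round(100 * speed * (1 - loss))


def regional_index(scores: Sequence[int]) -> int:
    """Average the scores of servers in one region, so that a single slow
    server can be compared with its neighbours."""
    return round(mean(scores))
```

For example, a server that now answers in twice its usual time and loses half of its probes would score 25 under this scheme.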

Hits, Page Views, Visits, and Users

Let us take a look at how the hot lists of the most visited Web sites may be compiled. I say "may be" because the methods used for data retrieval are mostly not fully disclosed.
For some years it was seemingly common practice to report the number of files requested from a Web site, so-called "hits". This method is not very useful, because a document can consist of several files: graphics, text, etc. Just compile a document from some text and some twenty flashy graphics files, put it on the Web, and you get twenty-one hits per visit; the more graphics you add, the more hits and traffic (not necessarily to your own Web site) you generate.
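The difference between hits and page views is easy to see in a server log. The following is a minimal sketch, assuming logs in the Common Log Format and a simple rule (an assumption, not a standard) that only HTML documents and directory indexes count as pages. It reproduces the example above: one article with twenty images yields twenty-one hits but one page view.

```python
import re

# Common Log Format: host ident user [time] "METHOD PATH PROTOCOL" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

# Assumption: only HTML documents and directory indexes count as "pages".
PAGE_SUFFIXES = (".html", ".htm", "/")


def count_hits_and_page_views(lines):
    """Return (hits, page_views) for an iterable of log lines."""
    hits = page_views = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue                      # skip malformed lines
        _client, method, path, status = match.groups()
        hits += 1                         # every requested file is a hit
        is_page = path.split("?", 1)[0].endswith(PAGE_SUFFIXES)
        if method == "GET" and status.startswith("2") and is_page:
            page_views += 1
    return hits, page_views


page = '203.0.113.5 - - [06/Dec/1999:13:55:36 -0800] "GET /article.html HTTP/1.0" 200 5120'
images = [f'203.0.113.5 - - [06/Dec/1999:13:55:37 -0800] "GET /img/pic{i}.gif HTTP/1.0" 200 2048'
          for i in range(20)]
print(count_hits_and_page_views([page] + images))  # (21, 1)
```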
Meanwhile, page views, also called page impressions, are preferred, since they are said to avoid these flaws. But even page views are not reliable. Users might share computers, and with them IP addresses and host names, with others, and they might access not the site itself but a cached copy from the Web browser or from the ISP's proxy server. So the server might receive just one page request although several users viewed a document.

The editors of some electronic journals (e-journals) in particular rely on page views as a kind of ratings or circulation measure, Rick Marin reports in the New York Times. Click-through rates, a quantitative measure, are used as a substitute for something intrinsically qualitative, for example the importance of a column to its readers. Readers may visit a journal just for one particular column and not care about the journal's other contents. Dropping this column because it does not receive enough visits may cause these readers to turn their backs on the journal.
Counting visits, that is, accesses to several pages of a Web site during one session, is more advanced but at best only slightly better. The problems already mentioned apply here too. To avoid them, newspapers, for example, set up registration services, which require password authentication and therefore act as a kind of access barrier.
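Visits are usually derived from page views by grouping requests from the same client and starting a new visit after a period of inactivity. The following is a minimal sketch, assuming a 30-minute timeout (a common convention, not something the text prescribes) and the IP address as the client key. The function and data are illustrative. The sketch inherits exactly the problems described above: users sharing an address are merged into one visitor, and cached views never reach the log.

```python
from datetime import datetime, timedelta

# Assumption: 30 minutes of inactivity ends a visit (a common convention,
# not something the text prescribes).
VISIT_TIMEOUT = timedelta(minutes=30)


def count_visits(page_views):
    """Count visits in an iterable of (visitor_key, timestamp) pairs.

    visitor_key is whatever identifies a client, here an IP address, so
    users sharing an address are merged and cached views never show up.
    """
    last_seen = {}
    visits = 0
    for key, ts in sorted(page_views, key=lambda pv: pv[1]):
        previous = last_seen.get(key)
        if previous is None or ts - previous > VISIT_TIMEOUT:
            visits += 1                  # new client or gap too long
        last_seen[key] = ts
    return visits


t0 = datetime(1999, 12, 6, 12, 0)
views = [
    ("203.0.113.5", t0),
    ("198.51.100.7", t0 + timedelta(minutes=1)),
    ("203.0.113.5", t0 + timedelta(minutes=5)),   # same visit
    ("203.0.113.5", t0 + timedelta(minutes=50)),  # 45-minute gap: new visit
]
print(count_visits(views))  # 3
```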
But there is another reason for such registration services. For content providers, users are virtual users, not unique persons, because, as already mentioned, computers and IP addresses can be shared, and the Internet is a client-server system; in a certain sense, it is computers that communicate with each other. Therefore many content providers are eager to learn more about the users accessing their sites. On-line registration forms or WWW user surveys are obvious methods of collecting additional data. But you cannot be sure that the information given by users is reliable; you can only rely on the fact that somebody visited your Web site. Despite these obstacles, companies increasingly use data capturing. As with registration services, cookies come into play here.
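A cookie lets a server recognize a returning browser without a registration form. On the first request, the server sends a random identifier, and the browser returns it with later requests. The following sketch uses Python's standard http.cookies module; the cookie name, lifetime and function are hypothetical. Note that the identifier marks a browser on one computer, not a person, so the "virtual user" problem remains.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "visitor_id"          # hypothetical cookie name
ONE_YEAR = 365 * 24 * 3600          # cookie lifetime in seconds


def identify_visitor(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (visitor_id, set_cookie_header or None).

    Reuse the ID if the browser sent our cookie back; otherwise mint a new
    random ID and build the Set-Cookie header to send with the response.
    """
    if cookie_header:
        jar = SimpleCookie()
        jar.load(cookie_header)
        if COOKIE_NAME in jar:
            return jar[COOKIE_NAME].value, None
    new_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[COOKIE_NAME] = new_id
    out[COOKIE_NAME]["max-age"] = ONE_YEAR
    out[COOKIE_NAME]["path"] = "/"
    return new_id, out.output()         # "Set-Cookie: visitor_id=...; ..."


first_id, header = identify_visitor(None)                     # first request
again_id, _ = identify_visitor(f"{COOKIE_NAME}={first_id}")   # next request
print(first_id == again_id)  # True
```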

For

If you like to play around with Internet statistics instead, you can use Robert Orenstein's Web Statistics Generator to make irresponsible predictions or visit the Internet Index, an occasional collection of seemingly statistical facts about the Internet.

Measuring the Density of IP Addresses

Measuring the density of IP addresses or domain names makes the geography of the Internet visible. So where on earth are IP addresses or domain names most densely concentrated? No global study of the Internet's geographical patterns is available yet, but some regional studies can be found. The Urban Research Initiative and Martin Dodge and Narushige Shiode from the Centre for Advanced Spatial Analysis at University College London have mapped the Internet address space of New York, Los Angeles and the United Kingdom (http://www.geog.ucl.ac.uk/casa/martin/internetspace/paper/telecom.html and http://www.geog.ucl.ac.uk/casa/martin/internetspace/paper/gisruk98.html).
Dodge and Shiode used data on the ownership of IP addresses from RIPE, Europe's most important registry for Internet numbers.
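Once addresses have been linked to locations (the hard part, here via registry ownership records), a density map is essentially a count per map cell. The following is a minimal sketch, assuming addresses have already been geocoded to latitude/longitude pairs; the coordinates are hypothetical toy points, and this is not Dodge and Shiode's actual procedure. Cells of equal size in degrees shrink in area toward the poles, so a density per square kilometre would require dividing each count by its cell's area.

```python
import math
from collections import Counter


def density_grid(points, cell_deg=0.1):
    """Count geolocated addresses per grid cell of cell_deg x cell_deg
    degrees. Cells are keyed by (row, column) indices."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Hypothetical geocoded address blocks: three in central London,
# one in Manchester.
points = [(51.52, -0.13), (51.53, -0.12), (51.51, -0.14), (53.48, -2.24)]
print(density_grid(points).most_common(1))
```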





TEXTBLOCK 1/6 // URL: http://world-information.org/wio/infostructure/100437611791/100438658352
 
Economic structure; transparent customers

Following the dynamics of informatised economies, the consumption habits and lifestyles of customers are of great interest. New technologies make it possible to store and combine data collected on an enormous number of people.

User profiling helps companies understand what potential customers might want. Often enough, such data collecting takes place without the customer's knowledge and amounts to spying.

"Much of the information collection that occurs on the Internet is invisible to the consumer, which raises serious questions of fairness and informed consent."

(David Sobel, Electronic Privacy Information Center)

TEXTBLOCK 2/6 // URL: http://world-information.org/wio/infostructure/100437611726/100438658925
 
"Stealth Sites"

"Stealth sites" are a particular form of hidden advertising. Stealth sites look like magazines, nicely designed and featuring articles on different topics, but in reality are set up for the sole purpose of featuring a certain company's products and services. "About Wines", for example, is a well-made online magazine featuring articles on food and travel; it also publishes articles on wine, which, surprisingly, all happen to be from Seagram.

TEXTBLOCK 3/6 // URL: http://world-information.org/wio/infostructure/100437611652/100438657995
 
Economic structure; introduction



"Globalization is to no small extent based upon the rise of rapid global communication networks. Some even go so far as to argue that "information has replaced manufacturing as the foundation of the economy". Indeed, global media and communication are in some respects the advancing armies of global capitalism."

(Robert McChesney, author of "Rich Media, Poor Democracy")

"Information flow is your lifeblood."

(Bill Gates, founder of Microsoft)

The usefulness of information and communication technologies increases with the number of people who use them. The more people form part of communication networks, the greater the amount of information that is produced. Microsoft founder Bill Gates dreams of "friction free capitalism", a new stage of capitalism in which perfect information becomes the basis for the perfection of the markets.

But exploitative practices have not disappeared. Instead, they have colonised the digital arena where effective protective regulation is still largely absent.

TEXTBLOCK 4/6 // URL: http://world-information.org/wio/infostructure/100437611726/100438658916
 
Advertising and the Content Industry - The Coca-Cola Case

Attempts to dictate their rules to the media have become a common practice among marketers and the advertising industry. As in the Chrysler case, where the company demanded that magazines give advance notice about controversial articles, the Coca-Cola Company has recently tried to put pressure on content providers.

According to a memo published by the New York Post, Coca-Cola demands a free ad from any publication that publishes a Coke ad adjacent to stories on religion, politics, disease, sex, food, drugs, environmental issues, health, or stories that employ vulgar language. "Inappropriate editorial matter" will result in the publisher being liable for a "full make good," said the memo by Coke advertising agency McCann-Erickson. Asked about this practice, a Coke spokesperson said the policy has long been in effect.

(Source: Odwyerpr.com: Coke Dictates nearby Editorial. http://www.odwyerpr.com)

TEXTBLOCK 5/6 // URL: http://world-information.org/wio/infostructure/100437611652/100438657998
 
Definition

During the last 20 years, Immanuel Wallerstein's old paradigm of center, periphery and semi-periphery has found a new costume: ICTs (information and communication technologies). After colonialism, neo-colonialism and neoliberalism, a new method of marginalization is emerging: the digital divide.

"Digital divide" describes the fact that the world can be divided into people who do and people who do not have access to (or the education to handle) modern information technologies, e.g. cellular telephones, television, the Internet. This digital divide affects people all over the world, but, as usual, those who suffer most are people in the formerly so-called Third World countries and in rural areas, as well as the poor and less educated.
More than 80% of all computers with access to the Internet are situated in larger cities.

"The cost of the information today consists not so much of the creation of content, which should be the real value, but of the storage and efficient delivery of information, that is in essence the cost of paper, printing, transporting, warehousing and other physical distribution means, plus the cost of the personnel manpower needed to run these `extra' services ....Realizing an autonomous distributed networked society, which is the real essence of the Internet, will be the most critical issue for the success of the information and communication revolution of the coming century of millennium."
(Izumi Aizu)

for more information see:
http://www.whatis.com/digital_divide.htm

TEXTBLOCK 6/6 // URL: http://world-information.org/wio/infostructure/100437611730/100438659300
 
Internet Software Consortium

The Internet Software Consortium (ISC) is a nonprofit corporation dedicated to the production of high-quality reference implementations of Internet standards that meet production standards. Its goal is to ensure that those reference implementations are properly supported and made freely available to the Internet community.

http://www.isc.org

INDEXCARD, 1/8
 
McCann Erickson

Alfred W. Erickson founded the advertising agency McCann Erickson in 1902. In 1913 McCann opened a San Francisco office and a Detroit office that moved to Cleveland in 1915. With operations in 127 countries, McCann reaches across the globe and continues to expand its capabilities through start-up units and acquisitions. McCann has recently added creative resources in the local, pan-regional and global arenas and also extended its expertise in specialized marketing categories, such as business-to-business and high-tech communications.

INDEXCARD, 2/8
 
Integrated circuit

Also called microcircuit, the integrated circuit is an assembly of electronic components, fabricated as a single unit, in which active semiconductor devices (transistors and diodes) and passive devices (capacitors and resistors) and their interconnections are built up on a chip of material called a substrate (most commonly made of silicon). The circuit thus consists of a unitary structure with no connecting wires. The individual circuit elements are microscopic in size.

INDEXCARD, 3/8
 
Telephone

The telephone was not invented by Alexander Graham Bell, as is widely held to be true, but by Philipp Reis, a German teacher. When he demonstrated his invention to important German professors in 1861, it was not enthusiastically greeted. Because of this dismissal, he received no financial support for further development.

And here Bell comes in: In 1876 he successfully filed a patent for the telephone. Soon afterwards he established the first telephone company.

INDEXCARD, 4/8
 
United Brands Company

American corporation formed in 1970 through the merger of United Fruit Company and AMK Corporation. United Fruit Company, the main company, was founded in 1899 to produce and market bananas grown in the Caribbean islands, Central America, and Colombia. The principal founder was Minor C. Keith, who had begun to acquire banana plantations and to build a railroad in Costa Rica as early as 1872. In 1884 he contracted with the Costa Rican government to fund the national debt and to lay about 50 more miles of track. In return he received, for 99 years, full rights to these rail lines and 800,000 acres of virgin land, tax exempt for 20 years. By 1930 the company had absorbed 20 rival firms and had become the largest employer in Central America. As a foreign corporation of conspicuous size, United Fruit sometimes became the target of popular attacks. The Latin-American press often referred to it as el pulpo ("the octopus"), accusing it of exploiting labourers, bribing officials, and influencing governments during the period of Yankee "dollar diplomacy" in the first decades of the 20th century.

INDEXCARD, 5/8
 
Punch card, 1801

Invented by Joseph Marie Jacquard, an engineer and architect in Lyon, France, the punch cards laid the ground for automatic information processing. For the first time, information was stored in binary format on perforated cardboard cards. In 1890 Hermann Hollerith used Jacquard's punch card technology to process the statistical data collected in the US census of 1890, thus reducing the time needed for data analysis from eight years to three. His application of Jacquard's invention was also used for programming computers and for data processing until electronic data processing was introduced in the 1960s. As with writing and calculating, administrative purposes account for the beginning of modern automatic data processing.

Paper tapes are a medium similar to Jacquard's punch cards. In 1857 Sir Charles Wheatstone applied them as a medium for the preparation, storage, and transmission of data for the first time. By their means, telegraph messages could be prepared off-line, sent ten times quicker (up to 400 words per minute), and stored. Later similar paper tapes were used for programming computers.

INDEXCARD, 6/8
 
Intelsat

Intelsat, the world's biggest communication satellite services provider, is still mainly owned by governments, but will be privatised during 2001, like Eutelsat, a measure already discussed at an OECD competition policy roundtable in 1996. The signatory of the Intelsat treaty for the United States of America is Comsat, a private company listed on the New York Stock Exchange. Additionally, Comsat is one of the United Kingdom's signatories. In aggregate, Comsat already owns about 20.5% of Intelsat and is Intelsat's biggest shareholder. In September 1998 Comsat agreed to merge with Lockheed Martin. After the merger, Lockheed Martin will hold at least 49% of Comsat's share capital.

http://www.intelsat.int/index.htm

http://www.eutelsat.org/
http://www.oecd.org//daf/clp/roundtables/SATS...
http://www.comsat.com/
http://www.nyse.com/
INDEXCARD, 7/8
 
NSFNet

Developed under the auspices of the National Science Foundation (NSF), NSFnet served as the successor of the ARPAnet as the main network linking universities and research facilities until 1995, when it was replaced by a commercial backbone network. Being research networks, ARPAnet and NSFnet served as testing grounds for future networks.

INDEXCARD, 8/8