In Search of Reliable Internet Measurement Data

Newspapers and magazines frequently report growth rates of Internet usage, numbers of users, hosts, and domains that seem to be beyond all expectations, and growth is expected to keep accelerating exponentially. However, Internet measurement data are anything but reliable; they are often quite fantastic constructs that are nevertheless seized upon by many media and decision makers, because the technical difficulties of measuring Internet growth and usage make reliable measurement techniques nearly impossible.

Equally, predictions that the Internet is about to collapse lack any foundation whatsoever. The researchers at the Internet Performance Measurement and Analysis Project (IPMA) compiled a list of news items about Internet performance and statistics and a few responses to them by engineers.

Size and Growth

In fact, "today's Internet industry lacks any ability to evaluate trends, identity performance problems beyond the boundary of a single ISP (Internet service provider, M. S.), or prepare systematically for the growing expectations of its users. Historic or current data about traffic on the Internet infrastructure, maps depicting ... there is plenty of measurement occurring, albeit of questionable quality", says K. C. Claffy in his paper Internet measurement and data analysis: topology, workload, performance and routing statistics (http://www.caida.org/Papers/Nae/, Dec 6, 1999). Claffy is not an average researcher; he founded the well-known Cooperative Association for Internet Data Analysis (CAIDA).

So her statement is a slap in the face of all market researchers who claim otherwise.
In a certain sense this is ridiculous, because network measurement has been an important task since the inception of the ARPANet, the predecessor of the Internet. The very first ARPANet site was established at the University of California, Los Angeles, and was intended to be the measurement site. There, Leonard Kleinrock worked on the development of measurement techniques used to monitor the performance of the ARPANet (cf. Michael and Ronda Hauben, Netizens: On the History and Impact of the Net). And in October 1991, Vinton Cerf, in the name of the Internet Activities Board, proposed guidelines for researchers considering measurement experiments on the Internet. He did so for two reasons. First, measurement would be critical for future development, evolution and deployment planning. Second, Internet-wide activities have the potential to interfere with normal operation and must be planned with care and made widely known beforehand.
So what are the reasons for this inability to evaluate trends or identify performance problems beyond the boundary of a single ISP? First, in early 1995, almost simultaneously with the worldwide introduction of the World Wide Web, the transition of the National Science Foundation's stewardship role over the Internet into a competitive industry (bluntly put: its privatization) left no framework for adequate tracking and monitoring of the Internet. The early ISPs were not very interested in gathering and analyzing network performance data; they were struggling to meet the demands of their rapidly growing customer base. Secondly, we are just beginning to develop reliable tools for quality measurement and analysis of bandwidth and performance. CAIDA aims at developing such tools.
"There are many estimates of the size and growth rate of the Internet that are either implausible, or inconsistent, or even clearly wrong", K. G. Coffman and Andrew, both members of different departments of AT & T Labs-Research, state something similar in their paper The Size and Growth Rate of the Internet, published in First Monday. There are some sources containing seemingly contradictory information on the size and growth rate of the Internet, but "there is no comprehensive source for information". They take a well-informed and refreshing look at efforts undertaken for measuring the Internet and dismantle several misunderstandings leading to incorrect measurements and estimations. Some measurements have such large error margins that you might better call them estimations, to say the least. This is partly due to the fact that data are not disclosed by every carrier and only fragmentarily available.
What is measured and what methods are used? Many studies are devoted to the number of users; others look at the number of computers connected to the Internet or count IP addresses. Coffman and Odlyzko focus on the sizes of networks and the traffic they carry to answer questions about the size and the growth of the Internet.
Their focus becomes clear when you bear in mind that the Internet is just one of many networks of networks; it is only a part of the universe of computer networks. Additionally, the Internet has public (unrestricted) and private (restricted) areas. Most studies consider only the public Internet; Coffman and Odlyzko consider the long-distance private line networks too: the corporate networks, the Intranets, because they are convinced (that means their assertion is put forward, but not accompanied by empirical data) that "the evolution of the Internet in the next few years is likely to be determined by those private networks, especially by the rate at which they are replaced by VPNs (Virtual Private Networks) running over the public Internet. Thus it is important to understand how large they are and how they behave." Coffman and Odlyzko check other estimates by considering the traffic generated by residential users accessing the Internet with a modem and the traffic through public peering points (statistics for which are available through CAIDA and the National Laboratory for Applied Network Research), and by calculating the bandwidth capacity of each of the major US providers of backbone services. They compare the public Internet to private line networks and offer interesting findings. The public Internet (with an effective bandwidth of about 75 Gbps in December 1997) is currently far smaller, in both capacity and traffic, than the switched voice network, and the private line networks are considerably larger in aggregate capacity than the Internet: about as large as the voice network in the U.S. (with an effective bandwidth of about 330 Gbps in December 1997), although they carry less traffic. On the other hand, the growth rate of traffic on the public Internet, while lower than is often cited, is still about 100% per year, much higher than for traffic on other networks. Hence, if present growth trends continue, data traffic in the U.S. will overtake voice traffic around the year 2002 and will be dominated by the Internet. In the future, growth in Internet traffic will derive predominantly from people staying online longer and from multimedia applications, both of which consume more bandwidth and account for unanticipated amounts of data traffic.
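The arithmetic behind the crossover claim can be reproduced in a few lines. In the sketch below the starting volumes and the voice growth rate are illustrative placeholders, not figures from Coffman and Odlyzko's paper; only the roughly 100% annual growth of Internet traffic is taken from the text above.

    def crossover_year(start_year, data, voice, data_growth=1.0, voice_growth=0.1):
        # Compound both traffic volumes year by year until data overtakes voice.
        year = start_year
        while data <= voice:
            year += 1
            data *= (1 + data_growth)    # ~100% per year, as cited in the text
            voice *= (1 + voice_growth)  # assumed modest growth for voice traffic
        return year

    # Illustration: data traffic starting at a fraction of voice traffic in 1997
    # and doubling every year overtakes voice within a few years.
    print(crossover_year(1997, data=1.0, voice=18.0))   # prints 2002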

Hosts

The Internet Software Consortium's Internet Domain Survey is one of the best-known efforts to count the number of hosts on the Internet. Happily, the ISC informs us extensively about the methods used for its measurements, a policy quite rare on the Web. For the most recent survey the number of IP addresses that have been assigned a name was counted. At first sight it looks simple to get an accurate number of hosts, but in practice an assigned IP address does not automatically correspond to an existing host. In order to find out, you have to send a kind of message to the host in question and wait for a reply. You do this with the PING utility. (For further explanations see the article PING in Connected: An Internet Encyclopaedia.) But doing this for every registered IP address is an arduous task, so the ISC just pings a 1% sample of all hosts found and makes a projection to all pingable hosts. That is ISC's new method; its old method, still used by RIPE, was to count the number of domain names that had IP addresses assigned to them, a method that proved to be not very useful because a significant number of hosts restrict download access to their domain data.
Even apart from the small sample, this method has at least one flaw: ISC's researchers only take network numbers into account that have been entered into the tables of the IN-ADDR.ARPA domain, and it is possible that not all providers know of these tables. A similar method is used by Telcordia's Netsizer.
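A minimal sketch of the sampling-and-projection step described above, assuming a Unix-like ping command and an already prepared list of name-assigned IP addresses; this is not ISC's actual code, merely an illustration of the idea.

    import random
    import subprocess

    def ping(ip, timeout=2):
        # Send one ICMP echo request via the Unix ping utility; True if the host answers.
        result = subprocess.run(["ping", "-c", "1", "-W", str(timeout), ip],
                                capture_output=True)
        return result.returncode == 0

    def estimate_host_count(named_addresses, sample_fraction=0.01):
        # Ping a random sample (1% by default) and project the response rate
        # onto the full list of name-assigned addresses.
        sample_size = max(1, int(len(named_addresses) * sample_fraction))
        sample = random.sample(named_addresses, sample_size)
        responding = sum(1 for ip in sample if ping(ip))
        return round(len(named_addresses) * responding / len(sample))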

Internet Weather

Like the daily weather, traffic on the Internet, that is, the conditions for data flows, is monitored too, hence the term Internet weather. One of the most famous Internet weather reports comes from The Matrix, Inc. Another one is the Internet Traffic Report, which displays traffic as values between 0 and 100 (high values indicate fast and reliable connections). Weather monitoring relies on response ratings from servers all over the world. The method is to "ping" servers (as with host counts) and to compare their response times both to past values and to the response times of servers in the same region.
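How such response ratings might be turned into a 0-100 value can be sketched as follows; the formula is a plain illustration, not the one actually used by The Matrix, Inc. or the Internet Traffic Report, which is not disclosed in the text.

    from statistics import mean

    def weather_index(current_rtt_ms, past_rtts_ms):
        # 0 = unreachable; 100 = as fast as (or faster than) the historical average.
        if current_rtt_ms is None:
            return 0
        baseline = mean(past_rtts_ms)
        return round(min(100, max(0, 100 * baseline / current_rtt_ms)))

    # A server that usually answers in about 80 ms but now needs 200 ms scores 40.
    print(weather_index(200, [75, 80, 85]))   # prints 40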

Hits, Page Views, Visits, and Users

Let us take a look at how these hot lists of the most visited Web sites may be compiled. I say "may be" because the methods used for data retrieval are mostly not fully disclosed.
For some years it was seemingly common sense to report the number of files requested from a Web site, so-called "hits". This method is not very useful, because a document can consist of several files: graphics, text, etc. Just compile a document from some text and some twenty flashy graphics files, put it on the Web, and you get twenty-one hits per visit; the more graphics you add, the more hits and traffic you generate (though not automatically more visitors to your Web site).
In the meantime page views, also called page impressions, have come to be preferred, since they are said to avoid these flaws. But even page views are not reliable. Users might share computers, and the corresponding IP addresses and host names, with others, or they might access not the site itself but a cached copy from their Web browser or from the ISP's proxy server. So the server might receive just one page request although several users viewed a document.
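The difference between hits and page views is easy to see in a Web server access log. The sketch below assumes a Common Log Format file named access.log (a hypothetical name) and uses one simple convention, assumed here rather than prescribed anywhere: every requested file counts as a hit, but only HTML documents count as page views.

    import re

    # The request path is the second field of the quoted request line in a
    # Common Log Format entry, e.g. "GET /index.html HTTP/1.0".
    request_re = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')

    hits = 0
    page_views = 0
    with open("access.log") as log:              # hypothetical log file name
        for line in log:
            match = request_re.search(line)
            if not match:
                continue
            hits += 1                            # every requested file is a hit
            path = match.group(1).split("?")[0]
            if path.endswith((".html", ".htm")) or path.endswith("/"):
                page_views += 1                  # only whole documents count

    print(hits, page_views)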

Especially the editors of some electronic journals (e-journals) rely on page views as a kind of ratings or circulation measure, Rick Marin reports in the New York Times. Click-through rates, a quantitative measure, are used as a substitute for something of an intrinsically qualitative nature: the importance of a column to its readers, for example. Readers may follow a journal just for a particular column and not care about the journal's other contents. Deleting this column because it does not receive enough visits may cause these readers to turn their backs on the journal.
More advanced, but only slightly better at best, is counting visits: the access of several pages of a Web site during one session. The problems already mentioned apply here too. To avoid them, newspapers, for example, establish registration services, which require password authentication and therefore prove to be a kind of access obstacle.
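How visits might be reconstructed from a server log can be sketched as follows: group each host's requests into sessions separated by an inactivity timeout. The sketch assumes (host, timestamp) pairs and a 30-minute timeout, a common but arbitrary convention that is not prescribed by any of the services mentioned here.

    from collections import defaultdict

    SESSION_TIMEOUT = 30 * 60   # 30 minutes of inactivity ends a visit (assumed convention)

    def count_visits(requests):
        # requests: iterable of (host, unix_timestamp) pairs taken from a log.
        last_seen = {}
        visits = defaultdict(int)
        for host, ts in sorted(requests, key=lambda r: r[1]):
            if host not in last_seen or ts - last_seen[host] > SESSION_TIMEOUT:
                visits[host] += 1                # a new visit begins for this host
            last_seen[host] = ts
        return sum(visits.values())

    # Two hosts, one returning after a long pause: three visits in total.
    print(count_visits([("a", 0), ("b", 50), ("a", 100), ("a", 10000)]))   # prints 3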
But there is a different reason for these services. For content providers, users are virtual users, not unique persons, because, as already mentioned, computers and IP addresses can be shared and the Internet is a client-server system; in a certain sense it is, in fact, computers that communicate with each other. Therefore many content providers are eager to learn more about the users accessing their sites. On-line registration forms or WWW user surveys are obvious methods of collecting additional data, sure. But you cannot be sure that the information given by users is reliable; you can only rely on the fact that somebody visited your Web site. Despite these obstacles, companies increasingly capture such data. As with registration services, cookies come into play here.

For Fun

If you like to play around with Internet statistics instead, you can use Robert Orenstein's Web Statistics Generator to make irresponsible predictions or visit the Internet Index, an occasional collection of seemingly statistical facts about the Internet.

Measuring the Density of IP Addresses

Measuring the density of IP addresses or domain names makes the geography of the Internet visible. So where on earth is the density of IP addresses or domain names highest? There is no global study of the Internet's geographical patterns available yet, but some regional studies can be found. The Urban Research Initiative, and Martin Dodge and Narushige Shiode from the Centre for Advanced Spatial Analysis at University College London, have mapped the Internet address space of New York, Los Angeles and the United Kingdom (http://www.geog.ucl.ac.uk/casa/martin/internetspace/paper/telecom.html and http://www.geog.ucl.ac.uk/casa/martin/internetspace/paper/gisruk98.html).
Dodge and Shiode used data on the ownership of IP addresses from RIPE, Europe's most important registry for Internet numbers.
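The counting step behind such maps can be illustrated with a few lines that aggregate hypothetical allocation records by the registered owner's location; Dodge and Shiode's actual processing of the RIPE data is of course far more involved.

    from collections import Counter

    # Hypothetical allocation records: (location of the registered owner, block size).
    allocations = [
        ("London", 65536),
        ("London", 256),
        ("Manchester", 1024),
    ]

    density = Counter()
    for location, addresses in allocations:
        density[location] += addresses

    for location, addresses in density.most_common():
        print(location, addresses)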





TEXTBLOCK 1/5 // URL: http://world-information.org/wio/infostructure/100437611791/100438658352
 
Challenges for Copyright by ICT: Digital Content Providers

Providers of digital information might be confronted with copyright-related problems when they use some of the special features of hypertext media like frames and hyperlinks (both of which draw on third-party content available on the Internet to enhance a webpage or CD-ROM), or when they operate a search engine or online directory on their website.

Framing

Frames are often used to help define, and navigate within, a content provider's website. Still, when they are used to present (copyrighted) third-party material from other sites, issues of passing off and misleading or deceptive conduct, as well as copyright infringement, immediately arise.

Hyperlinking

It is generally held that the mere creation of a hyperlink does not, of itself, infringe copyright, as the words indicating a link or the displayed URL are usually unlikely to be considered a "work". Nevertheless, if a link is clicked on, the user's browser will download a full copy of the material at the linked address, creating a copy in the RAM of the user's computer courtesy of the address supplied by the party that published the link. Although it is widely agreed that the permission to download material over the link must be part of an implied license granted by the person who made the material available on the Web in the first place, the scope of this implied license is still the subject of debate. Another option that has been discussed is to consider linking fair use.

Furthermore, hyperlinks and other "information location tools", like online directories or search engines, could cause their operators trouble if they refer or link users to a site that contains infringing material. In this case it is as yet unclear whether providers can be held liable for infringement.

TEXTBLOCK 2/5 // URL: http://world-information.org/wio/infostructure/100437611725/100438659590
 
Timeline 1970-2000 AD

1971 IBM's work on the Lucifer cipher and the work of the NSA lead to the U.S. Data Encryption Standard (= DES)

1976 Whitfield Diffie and Martin Hellman publish their paper New Directions in Cryptography, introducing the idea of public key cryptography

1977/78 the RSA algorithm is developed by Ron Rivest, Adi Shamir and Leonard M. Adleman and is published

1984 Congress passes Comprehensive Crime Control Act

- The Hacker Quarterly is founded

1986 Computer Fraud and Abuse Act is passed in the USA

- Electronic Communications Privacy Act

1987 Chicago prosecutors found Computer Fraud and Abuse Task Force

1988 U.S. Secret Service covertly videotapes a hacker convention

1989 NuPrometheus League distributes Apple Computer software

1990 - IDEA, using a 128-bit key, is supposed to replace DES

- Charles H. Bennett and Gilles Brassard publish their work on Quantum Cryptography

- Martin Luther King Day Crash strikes AT&T long-distance network nationwide


1991 PGP (= Pretty Good Privacy) is released as freeware on the Internet, soon becoming worldwide state of the art; its creator is Phil Zimmermann

- one of the first conferences for Computers, Freedom and Privacy takes place in San Francisco

- AT&T phone crash; New York City and various airports get affected

1993 the U.S. government announces the introduction of the Clipper Chip, an idea that provokes many political discussions during the following years

1994 Ron Rivest releases another algorithm, the RC5, on the Internet

- the blowfish encryption algorithm, a 64-bit block cipher with a key-length up to 448 bits, is designed by Bruce Schneier

1990s work on quantum computers and quantum cryptography

- work on biometrics for authentication (finger prints, the iris, smells, etc.)

1996 France liberalizes its cryptography law: one can now use cryptography if registered

- OECD issues Cryptography Policy Guidelines, a paper calling for encryption export standards and unrestricted access to encryption products

1997 April European Commission issues Electronic Commerce Initiative, in favor of strong encryption

1997 June PGP 5.0 Freeware widely available for non-commercial use

1997 June 56-bit DES code cracked by a network of 14,000 computers

1997 August a U.S. judge rules that encryption export regulations violate the First Amendment

1998 February foundation of Americans for Computer Privacy, a broad coalition in opposition to the U.S. cryptography policy

1998 March PGP announces plans to sell encryption products outside the USA

1998 April NSA issues a report about the risks of key recovery systems

1998 July DES code cracked in 56 hours by researchers in Silicon Valley

1998 October Finnish government agrees to unrestricted export of strong encryption

1999 January RSA Data Security establishes worldwide distribution of its encryption products outside the USA

- the National Institute of Standards and Technology announces that 56-bit DES is not safe compared to Triple DES

- 56-bit DES code is cracked in 22 hours and 15 minutes

1999 May 27 United Kingdom speaks out against key recovery

1999 September the USA announces that it will lift restrictions on cryptography exports

2000 as the German government plans to draft a cryptography law, various organizations start a campaign against that law

- computer hackers no longer merely visit websites and change small details there, but cause breakdowns of entire systems, producing large economic losses

for further information about the history of cryptography see:
http://www.clark.net/pub/cme/html/timeline.html
http://www.math.nmsu.edu/~crypto/Timeline.html
http://fly.hiwaay.net/~paul/cryptology/history.html
http://www.achiever.com/freehmpg/cryptology/hocryp.html
http://all.net/books/ip/Chap2-1.html
http://cryptome.org/ukpk-alt.htm
http://www.iwm.org.uk/online/enigma/eni-intro.htm
http://www.achiever.com/freehmpg/cryptology/cryptofr.html
http://www.cdt.org/crypto/milestones.shtml

for information about hacker's history see:
http://www.farcaster.com/sterling/chronology.htm

TEXTBLOCK 3/5 // URL: http://world-information.org/wio/infostructure/100437611776/100438658960
 
Copyright Management and Control Systems: Metering

Metering systems allow copyright owners to ensure payment at, or prior to, the time of a consumer's use of the work. These technologies include:

Hardware Devices

These have to be acquired and installed by the user. For example, under a debit card approach, the user purchases a debit card that is pre-loaded with a certain amount of value. After installation, the debit card is debited automatically as the user consumes copyrighted works.

Digital Certificates

Here a certification authority issues to a user an electronic file that identifies the user as the owner of a public key. These digital certificates, besides information on the identity of the holder, can also include rights associated with a particular person. Vendors can thus control access to system resources, including copyrighted files, by making them available only to users who can provide a digital certificate with the specified rights (e.g. access, use, downloading).
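How such a rights check might look can be sketched as follows, assuming the certificate has already been validated and parsed into a simple record; a real vendor would rely on a full public key infrastructure rather than this illustration.

    from dataclasses import dataclass, field

    @dataclass
    class DigitalCertificate:
        holder: str
        rights: set = field(default_factory=set)   # e.g. {"access", "use", "download"}

    def may_download(cert, required_right="download"):
        # Release the copyrighted file only if the certificate carries the required
        # right; verification of the certificate's signature by the certification
        # authority is assumed to have happened before this check.
        return required_right in cert.rights

    cert = DigitalCertificate(holder="alice", rights={"access", "use"})
    print(may_download(cert))   # prints False: the certificate grants no download right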

Centralized Computing

Under this approach all of the executables remain at the server. Each time the executable is used, the user's computer must establish contact with the server, allowing the central computer to meter access.

Access Codes

Access code devices permit users to "unlock" protective mechanisms (e.g. date bombs or functional limitations) embedded in copyrighted works. Copyright owners can meter the usage of their works, either by unlocking the intellectual property for a one-time license fee or by requiring periodic procurement of access codes.
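A date bomb unlocked by an access code might, purely as an illustration, look like the following sketch; the unlock rule is hypothetical and stands in for whatever verification scheme a vendor would actually deploy.

    import datetime
    import hashlib

    TRIAL_END = datetime.date(2000, 12, 31)        # the embedded "date bomb" (assumed)

    def check_code(code, work_id):
        # Hypothetical rule: the code is the first 8 hex digits of a hash of the
        # work's identifier; a real vendor would use its own scheme.
        return code == hashlib.sha256(work_id.encode()).hexdigest()[:8]

    def may_open(work_id, access_code, today):
        if today <= TRIAL_END:                     # before the date bomb goes off
            return True
        return access_code is not None and check_code(access_code, work_id)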

Copyright Clearinghouses

Under this approach copyright owners would commission "clearinghouses" with the ability to license the use of their works. A user would pay a license fee to obtain rights concerning the intellectual property.


TEXTBLOCK 4/5 // URL: http://world-information.org/wio/infostructure/100437611725/100438659615
 
Implant technology

Kevin Warwick at the University of Reading works on implant technologies which could enhance or modify functions of the limbs and the brain, or bring back functionalities lost, for example, in an accident or as a consequence of a stroke. Implants are also used for identification in "intelligent buildings", where they serve to control "personnel flows". However, the real potential of electronic implants seems to lie in the field of electronic drugs. The basics of the brain-computer interface have already been explored, and there are now efforts to electronically modify the function of the mind. Large software and IT companies are sponsoring this research, which could result in the commercialisation of electronic drugs functioning as anti-depressants, painkillers and the like. Evidently, the same technologies can also be used as narcotic drugs or to modify people's behaviour. The functioning of body and mind can be adapted to pre-defined principles and ideals, their autonomous existence reduced and subjected to direct outside control.

TEXTBLOCK 5/5 // URL: http://world-information.org/wio/infostructure/100437611777/100438658731
 
Cooperative Association for Internet Data Analysis (CAIDA)

Based at the University of California's San Diego Supercomputer Center, CAIDA supports cooperative efforts among the commercial, government and research communities aimed at promoting a scalable, robust Internet infrastructure. It is sponsored by the Defense Advanced Research Projects Agency (DARPA) through its Next Generation Internet program, by the National Science Foundation, Cisco, Inc., and Above.net.

INDEXCARD, 1/4
 
PGP

A cryptographic software application developed by Phil Zimmermann and later distributed as freeware through the Massachusetts Institute of Technology. Pretty Good Privacy (PGP) is a cryptographic product family that enables people to securely exchange messages and to secure files, disk volumes and network connections with both privacy and strong authentication.

INDEXCARD, 2/4
 
AT&T

AT&T Corporation provides voice, data and video communications services to large and small businesses, consumers and government entities. AT&T and its subsidiaries furnish domestic and international long distance, regional, local and wireless communications services, cable television and Internet communications services. AT&T also provides billing, directory and calling card services to support its communications business. AT&T's primary lines of business are business services, consumer services, broadband services and wireless services. In addition, AT&T's other lines of business include network management and professional services through AT&T Solutions and international operations and ventures. In June 2000, AT&T completed the acquisition of MediaOne Group. With the addition of MediaOne's 5 million cable subscribers, AT&T becomes the country's largest cable operator, with about 16 million customers on the systems it owns and operates, which pass nearly 28 million American homes. (source: Yahoo)

Slogan: "It's all within your reach"

Business indicators:

Sales 1999: $ 62.391 bn (+ 17.2 % from 1998)

Market capitalization: $ 104 bn

Employees: 107,800

Corporate website: http://www.att.com/
INDEXCARD, 3/4
 
DMCA

The DMCA (Digital Millennium Copyright Act) was signed into law by U.S. President Clinton in 1998 and implements the two 1996 WIPO treaties (WIPO Performances and Phonograms Treaty and WIPO Copyright Treaty). Besides other issues, the DMCA addresses the influence of new technologies on traditional copyright. Of special interest in the context of the digitalization of intellectual property are title no. 2, which refers to the limitation of the liability of online service providers for copyright infringement (when certain conditions are met), no. 3, which creates an exemption for making a copy of a computer program in the case of maintenance and repair, and no. 4, which is concerned with the status of libraries and webcasting. The DMCA has been widely criticized for giving copyright holders even more power and for damaging the rights and freedom of consumers, technological innovation, and the free market for information.

INDEXCARD, 4/4