Online data capturing

Hardly a firm today can afford not to engage in electronic commerce if it does not want to be swept out of business by competitors. "Information is everything" has become something like the Lord's prayer of the New Economy. But how do you get information about your customer online? Who are the people who visit a website, where do they come from, what are they looking for? How much money do they have, what might they want to buy? These are key questions for a company doing electronic business. Obviously not all of this information can be obtained by monitoring the online behaviour of web users, but there are always little gimmicks that, when combined with common tracking technologies, can help to get more detailed information about a potential customer. These are usually online registration forms, either required for entry to a site, or competitions, sometimes a combination of the two. Obviously, if you want to win that weekend trip to New York, you want to provide your contact details.

The most common way of obtaining information about a user online is a cookie. However, a cookie by itself is not sufficient to identify a user personally. It merely lets the server recognize a returning browser, and thus the computer, by sending back an identifier that the server stored there on an earlier visit. Only when combined with other data extraction techniques, such as online registration, can a user be identified personally ("Register now to get the full benefit of xy.com. It's free!").
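As a minimal sketch of why a cookie alone does not reveal who a visitor is, the following Python fragment uses the standard library's http.cookies module to issue and read back an opaque identifier. The cookie name visitor_id and the two helper functions are assumptions made for illustration; they are not taken from any site discussed here.

```python
import uuid
from http.cookies import SimpleCookie


def issue_cookie() -> str:
    """Build a Set-Cookie header value carrying a random, opaque visitor ID."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = uuid.uuid4().hex  # hypothetical cookie name
    cookie["visitor_id"]["path"] = "/"
    return cookie["visitor_id"].OutputString()


def read_visitor_id(cookie_header: str):
    """Extract the visitor ID from a Cookie request header, or None if absent."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel is not None else None


header_value = issue_cookie()
# The browser sends only the name=value pair back on later requests.
returned = header_value.split(";")[0]
print(read_visitor_id(returned))
```

The value that comes back is a random string. It says that the same browser has returned, but nothing about the person using it until it is linked to a registration form or a payment.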

But cookies record enough information to fine-tune advertising strategies according to a user's preferences and interests, e.g. by displaying certain commercial banners rather than others. For example, if a user is found to respond to a banner of a particular kind, he / she may find two of them at the next visit. Customizing the offers on a website to the particular user is part of one-to-one marketing, a type of direct marketing. But one-to-one marketing can go further than this. It can also offer different prices to different users. This was done by Amazon.com in September 2000, when first-time visitors were offered cheaper prices than regular customers.

One-to-one marketing can create very different realities that undermine traditional concepts of demand and supply. The ideal is a "frictionless market", where the differential between demand and supply is progressively eliminated. If a market is considered a structure within which demand / supply differentials are negotiated, this amounts to the abolition of the established notion of the nature of a market. Demand and supply converge, desire and its fulfilment coincide. In the end, there is profit without labour. However, such a structure is a hermetic structure of unfreedom.

It can only function when payment is substituted by credit, and the exploitation of work power by the exploitation of data. In fact, in modern economies there is great pressure to increase spending on credit. Using credit cards and taking up loans generates a lot of data around a person's economic behaviour, while at the same time restricting the scope of social activity and increasing dependence. On the global level, the consequences of credit spirals can be observed in many of the developing countries that have had to abandon most of their political autonomy. As the data body economy advances, this is also the fate of people in western societies when they are structurally driven into credit spending. It shows that data bodies are not politically neutral.

The interrelation between data, profit and unfreedom is frequently overlooked by citizens and customers. Any company in a modern economy will apply data collecting strategies for profit, with dependence and unfreedom as a "secondary effect". The hunger for data has made IT companies that are eager to profit from e-business rather resourceful. "Getting to know the customer" - this is a catchphrase that is heard frequently, and which suggests that there are no limits to what a company may want to know about a customer. In large online shops such as amazon.com, the customer's identity is accurately established by the practice of paying with credit cards, and all business happens online, making it easy for the company to profile its customers accurately.

But there are more advanced and effective ways of identification. The German company Sevenval has developed a new way of customer tracking which works with "virtual domains". Every visitor of a website is assigned a 33-character identification number which the browser understands as part of the www address, which will then read something like http://XCF49BEB7E97C00A328BF562BAAC75FB2.sevenval.com. Because the identifier is part of the address itself, this tracking method, which is advertised by Sevenval as a revolutionary method capable of tracking the exact and complete path of a user on a website, cannot simply be switched off. In addition, the method makes it possible for the identity of a user to travel with him / her when he / she visits one of the other companies linked to the site in question. As in the case of cookies, this tracking method by itself is not sufficient to identify a user personally. Such an identification only occurs once a customer pays with a credit card, or decides to participate in a draw, or voluntarily completes a registration form.
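How a server might read such an identifier back out of the address can be sketched in a few lines of Python. This is an illustration only: the function name and the rule that the identifier is the single leftmost host label are assumptions, not a description of Sevenval's actual implementation.

```python
from urllib.parse import urlsplit


def visitor_id_from_url(url: str, base_domain: str):
    """Return the leftmost host label if the host is <id>.<base_domain>, else None."""
    host = urlsplit(url).hostname or ""  # hostname is returned in lower case
    suffix = "." + base_domain.lower()
    if not host.endswith(suffix):
        return None
    label = host[: -len(suffix)]
    # Accept a single label only; nested subdomains are ignored in this sketch.
    return label.upper() if label and "." not in label else None


print(visitor_id_from_url(
    "http://XCF49BEB7E97C00A328BF562BAAC75FB2.sevenval.com/shop", "sevenval.com"))
```

Since host names are not case-sensitive, the sketch normalizes the label to upper case. Every link the visitor follows within the site carries the same host name, so the server sees the identifier on each request without relying on a cookie.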

But there are much less friendly ways of extracting data from a user and feeding the data body. Less friendly means: these methods monitor users in situations where the latter are likely not to want to be monitored. Monitoring therefore takes place in a concealed manner. One of these monitoring methods is the so-called web bug. Web bugs are tiny graphics, not more than 1 x 1 pixel in size, and therefore invisible on a screen, capable of monitoring an unsuspecting user's e-mails or movements on a website. Leading corporations such as Barnes and Noble, eToys, Cooking.com, and Microsoft have all used web bugs in advertising campaigns. Richard Smith has compiled a web bugs FAQ site that contains detailed information and examples of web bugs in use.
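A rough way to spot candidates for such bugs is to scan a page for images declared as 1 x 1 pixel or smaller. The following Python sketch uses the standard html.parser module; the class name, the sample page and the ads.example.com address are invented for illustration, and a real bug may hide its size in other ways that this check would miss.

```python
from html.parser import HTMLParser


class WebBugFinder(HTMLParser):
    """Collect the src of every <img> declared as 1 x 1 pixel or smaller."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        if a.get("width") in ("0", "1") and a.get("height") in ("0", "1"):
            self.suspects.append(a.get("src"))


# Hypothetical page: one visible logo and one invisible tracking image.
page = (
    '<p>Hello</p><img src="logo.gif" width="120" height="40">'
    '<img src="http://ads.example.com/t.gif?u=123" width="1" height="1">'
)
finder = WebBugFinder()
finder.feed(page)
print(finder.suspects)
```

The request for the tiny image is what does the monitoring: when the browser or mail program fetches it, the remote server learns that the page or message was opened, along with whatever the query string encodes.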

Bugs monitoring users have also been packaged in seemingly harmless toys made available on the Internet. For example, Comet Systems offers cursor images which have been shown to collect user data and send them back to the company's server. These little images replace the customary white arrow of a mouse with a little image of a baseball, a cat, a UFO, etc., large enough to carry a bug collecting user information. The technology is offered as a marketing tool to companies looking for a "fun, new way to interact with their audience".

The cursor image technology relies on what is called a GUID (global unique identifier). This is an identification number which is assigned to a customer at the time of registration, or when downloading a product. Many among the online community were alarmed when in 1999 it was discovered that Microsoft assigned GUIDs without its customers' knowledge. Following protests, the company was forced to change the registration procedure, assuring that under no circumstances would these identification numbers be used for tracking or marketing.

However, in the meantime, another possible infringement on user anonymity by Microsoft was discovered, when it was found out that MS Office documents, such as Word, Excel or Powerpoint files, can contain a bug that is capable of tracking the documents as they are sent through the net. The bug sends information about the user who opens the document back to the originating server. A document that contains the bug can be tracked across the globe, through thousands of stopovers. A detailed description of the bug and how it works can be found at the Privacy Foundation's website. Also, there is an example of such a bug at the Privacy Center of the University of Denver.

Of course there are many other ways of collecting users' data and creating appropriate data bodies which can then be used for economic purposes. Indeed, as Bill Gates commented, "information is the lifeblood of business". The electronic information networks are becoming the new frontier of capitalism.

TEXTBLOCK 1/4 // URL: http://world-information.org/wio/infostructure/100437611761/100438659686
 
In Search of Reliable Internet Measurement Data

Newspapers and magazines frequently report growth rates of Internet usage, number of users, hosts, and domains that seem to be beyond all expectations. Growth rates are expected to accelerate exponentially. However, Internet measurement data are anything but reliable and are often quite fantastic constructs, which are nevertheless jumped upon by many media and decision makers because the technical difficulties in measuring Internet growth or usage make reliable measurement techniques impossible.

Equally, predictions that the Internet is about to collapse lack any foundation whatsoever. The researchers at the Internet Performance Measurement and Analysis Project (IPMA) compiled a list of news items about Internet performance and statistics and a few responses to them by engineers.

Size and Growth

In fact, "today's Internet industry lacks any ability to evaluate trends, identify performance problems beyond the boundary of a single ISP (Internet service provider, M. S.), or prepare systematically for the growing expectations of its users. Historic or current data about traffic on the Internet infrastructure, maps depicting ... there is plenty of measurement occurring, albeit of questionable quality", says K. C. Claffy in her paper Internet measurement and data analysis: topology, workload, performance and routing statistics (http://www.caida.org/Papers/Nae/, Dec 6, 1999). Claffy is not an average researcher; she founded the well-known Cooperative Association for Internet Data Analysis (CAIDA).

So her statement is a slap in the face of all market researchers stating otherwise.
In a certain sense this is absurd, because ever since the inception of the ARPANet, the forerunner of the Internet, network measurement has been an important task. The very first ARPANet site was established at the University of California, Los Angeles, and was intended to be the measurement site. There, Leonard Kleinrock went on to work on the development of measurement techniques used to monitor the performance of the ARPANet (cf. Michael and Ronda Hauben, Netizens: On the History and Impact of the Net). And in October 1991, in the name of the Internet Activities Board, Vinton Cerf proposed guidelines for researchers considering measurement experiments on the Internet. There were two reasons for this. First, measurement would be critical for future development, evolution and deployment planning. Second, Internet-wide activities have the potential to interfere with normal operation and must be planned with care and made widely known beforehand.
So what are the reasons for this inability to evaluate trends and identify performance problems beyond the boundary of a single ISP? First, in early 1995, almost simultaneously with the worldwide introduction of the World Wide Web, the transition of the stewardship role of the National Science Foundation over the Internet into a competitive industry (bluntly spoken: its privatization) left no framework for adequate tracking and monitoring of the Internet. The early ISPs were not very interested in gathering and analyzing network performance data; they were struggling to meet the demands of their rapidly growing customer base. Secondly, we are just beginning to develop reliable tools for quality measurement and analysis of bandwidth or performance. CAIDA aims at developing such tools.
"There are many estimates of the size and growth rate of the Internet that are either implausible, or inconsistent, or even clearly wrong": K. G. Coffman and Andrew Odlyzko, members of different departments of AT & T Labs-Research, make a similar point in their paper The Size and Growth Rate of the Internet, published in First Monday. There are some sources containing seemingly contradictory information on the size and growth rate of the Internet, but "there is no comprehensive source for information". They take a well-informed and refreshing look at efforts undertaken for measuring the Internet and dismantle several misunderstandings leading to incorrect measurements and estimations. Some measurements have such large error margins that you might better call them estimations, to say the least. This is partly due to the fact that data are not disclosed by every carrier and are only fragmentarily available.
What is measured and what methods are used? Many studies are devoted to the number of users; others look at the number of computers connected to the Internet or count IP addresses. Coffman and Odlyzko focus on the sizes of networks and the traffic they carry to answer questions about the size and the growth of the Internet.
The reason for their focus becomes clear when you bear in mind that the Internet is just one of many networks of networks; it is only a part of the universe of computer networks. Additionally, the Internet has public (unrestricted) and private (restricted) areas. Most studies consider only the public Internet; Coffman and Odlyzko consider the long-distance private line networks too: the corporate networks, the Intranets, because they are convinced (that means their assertion is put forward, but not accompanied by empirical data) that "the evolution of the Internet in the next few years is likely to be determined by those private networks, especially by the rate at which they are replaced by VPNs (Virtual Private Networks) running over the public Internet. Thus it is important to understand how large they are and how they behave." Coffman and Odlyzko check other estimates by considering the traffic generated by residential users accessing the Internet with a modem and the traffic through public peering points (statistics for them are available through CAIDA and the National Laboratory for Applied Network Research), and by calculating the bandwidth capacity for each of the major US providers of backbone services. They compare the public Internet to private line networks and offer interesting findings. The public Internet (with an effective bandwidth of 75 Gbps at December 1997) is currently far smaller, in both capacity and traffic, than the switched voice network, but the private line networks are considerably larger in aggregate capacity than the Internet: about as large as the voice network in the U. S. (with an effective bandwidth of about 330 Gbps at December 1997), although they carry less traffic. On the other hand, the growth rate of traffic on the public Internet, while lower than is often cited, is still about 100% per year, much higher than for traffic on other networks. Hence, if present growth trends continue, data traffic in the U. S. will overtake voice traffic around the year 2002 and will be dominated by the Internet. In the future, growth in Internet traffic will predominantly derive from people staying online longer and from multimedia applications; because both consume more bandwidth, they are the reason for unanticipated amounts of data traffic.
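The arithmetic behind such a crossover prediction is plain compound growth, and can be sketched as follows. The function and its inputs are illustrative assumptions; the example ratio and growth rates below are hypothetical placeholders, not figures from Coffman and Odlyzko.

```python
import math


def years_to_overtake(ratio: float, fast_growth: float, slow_growth: float) -> float:
    """Years until the faster-growing quantity catches up with the slower one.

    ratio: current size of the slower-growing quantity divided by the faster one.
    fast_growth, slow_growth: annual growth rates as fractions (1.0 means 100%).
    """
    if ratio <= 1:
        return 0.0
    if fast_growth <= slow_growth:
        raise ValueError("the smaller quantity must grow faster to catch up")
    return math.log(ratio) / math.log((1 + fast_growth) / (1 + slow_growth))


# Hypothetical inputs: the larger network carries four times the traffic and
# does not grow at all, while the smaller one doubles every year.
print(years_to_overtake(4.0, 1.0, 0.0))
```

At 100% growth per year a quantity doubles annually, so even a large initial gap closes within a few years. This is why the prediction depends so strongly on whether the growth rate itself has been measured reliably.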

Hosts

The Internet Software Consortium's Internet Domain Survey is one of the best-known efforts to count the number of hosts on the Internet. Happily, the ISC informs us extensively about the methods used for measurements, a policy quite rare on the Web. For the most recent survey, the number of IP addresses that have been assigned a name was counted. At first sight it looks simple to get the accurate number of hosts, but in practice an assigned IP address does not automatically correspond to an existing host. In order to find out, you have to send a kind of message to the host in question and wait for a reply. You do this with the PING utility. (For further explanations look here: Art. PING, in: Connected: An Internet Encyclopaedia.) But to do this for every registered IP address is an arduous task, so ISC just pings a 1% sample of all hosts found and makes a projection to all pingable hosts. That is ISC's new method; its old method, still used by RIPE, was to count the number of domain names that had IP addresses assigned to them, a method that proved to be not very useful because a significant number of hosts restrict download access to their domain data.
Apart from the small sample, this method has at least one flaw: ISC's researchers only take into account network numbers that have been entered into the tables of the IN-ADDR.ARPA domain, and it is possible that not all providers know of these tables. A similar method is used for Telcordia's Netsizer.
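The sampling step described above can be sketched in Python. This is not ISC's code: the function name is invented, the address list is simulated, and the reachability check is a stand-in for a real ping so that the example runs without network access.

```python
import random


def estimate_responding_hosts(addresses, is_reachable, sample_fraction=0.01, seed=0):
    """Probe a random sample of addresses and project the hit rate onto all of them."""
    if not addresses:
        return 0
    rng = random.Random(seed)
    sample_size = max(1, round(len(addresses) * sample_fraction))
    sample = rng.sample(addresses, sample_size)
    responding = sum(1 for addr in sample if is_reachable(addr))
    return round(len(addresses) * responding / sample_size)


# Simulated address list: every fourth address answers (a stand-in for a real ping).
addresses = [f"10.0.{i // 256}.{i % 256}" for i in range(20000)]
alive = {addr for i, addr in enumerate(addresses) if i % 4 == 0}
print(estimate_responding_hosts(addresses, alive.__contains__))
```

With 20,000 addresses a 1% sample means only 200 probes, so the projection can easily miss the true figure of 5,000 by several hundred hosts. The same sampling error applies, on a larger scale, to any survey built this way.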

Internet Weather

Like the daily weather, traffic on the Internet, that is, the conditions for data flows, is monitored too, hence the name Internet weather. One of the most famous Internet weather reports is from The Matrix, Inc. Another one is the Internet Traffic Report, which displays traffic in values between 0 and 100 (high values indicate fast and reliable connections). For weather monitoring, response ratings from servers all over the world are used. The method is to "ping" servers (as for host counts, for example) and to compare response times to past ones and to the response times of servers in the same region.

Hits, Page Views, Visits, and Users

Let us take a look at how these hot lists of most visited Web sites may be compiled. I say, may be, because the methods used for data retrieval are mostly not fully disclosed.
For some years it was seemingly common sense to report the files requested from a Web site, so-called "hits". This method is not very useful, because a document can consist of several files: graphics, text, etc. Just compile a document from some text and some twenty flashy graphical files, put it on the Web and you get twenty-one hits per visit; the more graphics you add, the more hits and traffic (not automatically to your Web site) you generate.
In the meantime page views, also called page impressions, are preferred, which are said to avoid these flaws. But even page views are not reliable. Users might share computers and the corresponding IP addresses and host names with others, or they might access not the site itself, but a cached copy from the Web browser or from the ISP's proxy server. So the server might receive just one page request although several users viewed a document.

The editors of some electronic journals (e-journals) in particular rely on page views as a kind of ratings or circulation measure, Rick Marin reports in the New York Times. Click-through rates - a quantitative measure - are used as a substitute for something of an intrinsically qualitative nature: the importance of a column to its readers, for example. They may read a journal just for one particular column and not care about the journal's other contents. Deleting this column because it does not receive enough visits may cause these readers to turn their backs on the journal.
More advanced, but just slightly better at best, is counting visits, the access of several pages of a Web site during one session. The problems already mentioned apply here too. To avoid them, newspapers, e.g., establish registration services, which require password authentication and therefore prove to be a kind of access obstacle.
But there is a different reason for these services. For content providers, users are virtual users, not unique persons, because, as already mentioned, computers and IP addresses can be shared and the Internet is a client-server system; in a certain sense, it is in fact computers that communicate with each other. Therefore many content providers are eager to get to know more about the users accessing their sites. On-line registration forms or WWW user surveys are obvious methods of collecting additional data. But you cannot be sure that the information given by users is reliable; you can just rely on the fact that somebody visited your Web site. Despite these obstacles, companies increasingly use data capturing. As with registration services, cookies come into play here.
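The difference between the counting units discussed above (hits, page views and visits) can be made concrete on a toy access log. The log entries, the rule for what counts as a "page" and the 30-minute session timeout are all assumptions chosen for this sketch; real log analysers differ in each of these choices.

```python
PAGE_SUFFIXES = (".html", ".htm", "/")   # assumption: what counts as a "page"
SESSION_GAP = 30 * 60                    # assumption: 30 idle minutes end a visit


def summarize(log):
    """log: iterable of (client, unix_time, path) tuples, in any order."""
    hits = 0
    page_views = 0
    visits = 0
    last_seen = {}
    for client, ts, path in sorted(log, key=lambda r: r[1]):
        hits += 1                                  # every requested file is a hit
        if path.endswith(PAGE_SUFFIXES):
            page_views += 1                        # only documents count as pages
        if client not in last_seen or ts - last_seen[client] > SESSION_GAP:
            visits += 1                            # new client or long pause
        last_seen[client] = ts
    return {"hits": hits, "page_views": page_views, "visits": visits}


# Hypothetical log: one client loads a page with two graphics, reads a second
# page, and returns two hours later; a second client loads one page.
log = [
    ("192.0.2.1", 0, "/index.html"),
    ("192.0.2.1", 1, "/logo.gif"),
    ("192.0.2.1", 2, "/banner.gif"),
    ("192.0.2.1", 60, "/news.html"),
    ("192.0.2.1", 7200, "/index.html"),
    ("192.0.2.7", 30, "/index.html"),
]
print(summarize(log))
```

Note that the "client" here is only an IP address. Since several people may share one address, and cached copies never reach the server at all, none of the three figures counts persons.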

If you like to play around with Internet statistics instead, you can use Robert Orenstein's Web Statistics Generator to make irresponsible predictions or visit the Internet Index, an occasional collection of seemingly statistical facts about the Internet.

Measuring the Density of IP Addresses

Measuring the density of IP addresses or domain names makes the geography of the Internet visible. So where on earth is the highest density of IP addresses or domain names? There is no global study about the Internet's geographical patterns available yet, but some regional studies can be found. The Urban Research Initiative and Martin Dodge and Narushige Shiode from the Centre for Advanced Spatial Analysis at University College London have mapped the Internet address space of New York, Los Angeles and the United Kingdom (http://www.geog.ucl.ac.uk/casa/martin/internetspace/paper/telecom.html and http://www.geog.ucl.ac.uk/casa/martin/internetspace/paper/gisruk98.html).
Dodge and Shiode used data on the ownership of IP addresses from RIPE, Europe's most important registry for Internet numbers.





TEXTBLOCK 2/4 // URL: http://world-information.org/wio/infostructure/100437611791/100438658352
 
Who owns the Internet and who is in charge?

The Internet/Matrix still depends heavily on public infrastructure and there is no dedicated owner of the whole Internet/Matrix, but the networks it consists of are run and owned by corporations and institutions. Access to the Internet is usually provided by Internet Service Providers (ISPs) for a monthly fee. Each network is owned by someone and has a network operation center from where it is centrally controlled, but the Internet/Matrix is not owned by any single authority and has no network operation center of its own. No legal authority determines how and where networks can be connected together; this is something the managers of networks have to agree about. So there is no way to ever gain ultimate control of the Matrix/Internet.
The fact that the Matrix/Internet architecture and administration are in some respects decentralized does not imply that there are no authorities for oversight and no common standards for sustaining basic operations and administration: there are authorities for IP number and domain name registrations, for example.
The organizational structures for Internet administration have always changed according to the needs to be addressed. Up to now, administration of the Internet has been a collaborative undertaking of several loose cooperative bodies with no strict hierarchy of authority. These bodies make decisions on common guidelines, such as communication protocols, cooperatively, so that compatibility of software is guaranteed. But they have no binding legal authority, nor can they enforce the standards they have agreed upon, nor are they wholly representative of the community of Internet users. The Internet has no official governing body or organization; most parts are still administered by volunteers.
Amazingly, there seems to be an unspoken and uncodified consensus on what is allowed and what is forbidden on the Internet that is widely accepted. Codifications, such as the so-called Netiquette, are due to individual efforts and mostly just state the prevailing consensus explicitly. Violations of accepted standards are fiercely rejected, as reactions to misbehavior in mailing lists and newsgroups prove daily.
Sometimes violations not already subject to law become part of governmental regulations, as was the case with spamming, the unsolicited sending of advertising mail messages. But engineers proved to be quicker and developed software against spamming. So, in some respects, the Internet is indeed self-regulating.
For a detailed report on Internet governance, click here.

TEXTBLOCK 3/4 // URL: http://world-information.org/wio/infostructure/100437611791/100438658447
 
Virtual cartels; mergers

In parallel to the deregulation of markets, there has been a trend towards large-scale mergers which makes a mockery of dreams of increased competition.

Recent mega-mergers and acquisitions include

SBC Communications - Ameritech, $72.3 bn

Bell Atlantic - GTE, $71.3 bn

AT&T - Media One, $63.1 bn

AOL - Time Warner, $165 bn

MCI Worldcom - Sprint, $129 bn

The total value of all major mergers since the beginning of the 1990s has been 20 trillion dollars, 2.5 times the size of the USA's GDP.

The AOL - Time Warner merger reflects a trend which can be observed everywhere: the convergence of the ICT and the content industries. This represents the ultimate advance in complete market domination, and an alarming threat to independent content.

"Is TIME going to write something negative about AOL? Will AOL be able to offer anything other than CNN sources? Is the Net becoming as silly and unbearable as television?"

(Detlev Borchers, journalist)

TEXTBLOCK 4/4 // URL: http://world-information.org/wio/infostructure/100437611709/100438658959
 
Division of labor

The term refers to the separation of a work process into a number of tasks, with each task performed by a separate person or group of persons. It is most often applied to mass production systems, where it is one of the basic organizing principles of the assembly line. Breaking down work into simple, repetitive tasks eliminates unnecessary motion and limits the handling of tools and parts. The consequent reduction in production time and the ability to replace craftsmen with lower-paid, unskilled workers result in lower production costs and a less expensive final product. The Scottish economist Adam Smith saw in this splitting of tasks a key to economic progress by providing a cheaper and more efficient means of producing economic goods.

INDEXCARD, 1/2
 
Writing

Writing and calculating came into being at about the same time. The first pictographs carved into clay tablets were used for administrative purposes. As an instrument for the administrative bodies of early empires, which began to rely on the collection, storage, processing and transmission of data, the skill of writing was restricted to a few. Having long been more or less separate tasks, writing and calculating converge in today's computers.

Letters were invented so that we might be able to converse even with the absent, says Saint Augustine. The invention of writing made it possible to transmit and store information. The ear no longer predominates; face-to-face communication becomes more and more obsolete for administration and bureaucracy. Standardization and centralization become the constituents of high culture and of vast empires such as Sumer and China.

INDEXCARD, 2/2