1950s: The Beginnings of Artificial Intelligence (AI) Research

With the development of the electronic computer in 1941 and the stored program computer in 1949 the conditions for research in artificial intelligence (AI) were given. Still, the observation of a link between human intelligence and machines was not widely observed until the late 1950s.

A discovery that influenced much of the early development of AI was made by Norbert Wiener. He was one of the first to theorize that all intelligent behavior was the result of feedback mechanisms. Mechanisms that could possibly be simulated by machines. A further step towards the development of modern AI was the creation of The Logic Theorist. Designed by Newell and Simon in 1955 it may be considered the first AI program.

The person who finally coined the term artificial intelligence and is regarded as the father of AI is John McCarthy. In 1956 he organized a conference "The Dartmouth summer research project on artificial intelligence" to draw the talent and expertise of others interested in machine intelligence for a month of brainstorming. In the following years AI research centers began forming at the Carnegie Mellon University as well as the Massachusetts Institute of Technology (MIT) and new challenges were faced: 1) the creation of systems that could efficiently solve problems by limiting the search and 2) the construction of systems that could learn by themselves.

One of the results of the intensified research in AI was a novel program called The General Problem Solver, developed by Newell and Simon in 1957 (the same people who had created The Logic Theorist). It was an extension of Wiener's feedback principle and capable of solving a greater extent of common sense problems. While more programs were developed a major breakthrough in AI history was the creation of the LISP (LISt Processing) language by John McCarthy in 1958. It was soon adopted by many AI researchers and is still in use today.


TEXTBLOCK 1/3 // URL: http://world-information.org/wio/infostructure/100437611663/100438659360
 
In Search of Reliable Internet Measurement Data

Newspapers and magazines frequently report growth rates of Internet usage, number of users, hosts, and domains that seem to be beyond all expectations. Growth rates are expected to accelerate exponentially. However, Internet measurement data are anything thant reliable and often quite fantastic constructs, that are nevertheless jumped upon by many media and decision makers because the technical difficulties in measuring Internet growth or usage are make reliable measurement techniques impossible.

Equally, predictions that the Internet is about to collapse lack any foundation whatsoever. The researchers at the Internet Performance Measurement and Analysis Project (IPMA) compiled a list of news items about Internet performance and statistics and a few responses to them by engineers.

Size and Growth

In fact, "today's Internet industry lacks any ability to evaluate trends, identity performance problems beyond the boundary of a single ISP (Internet service provider, M. S.), or prepare systematically for the growing expectations of its users. Historic or current data about traffic on the Internet infrastructure, maps depicting ... there is plenty of measurement occurring, albeit of questionable quality", says K. C. Claffy in his paper Internet measurement and data analysis: topology, workload, performance and routing statistics (http://www.caida.org/Papers/Nae/, Dec 6, 1999). Claffy is not an average researcher; he founded the well-known Cooperative Association for Internet Data Analysis (CAIDA).

So his statement is a slap in the face of all market researchers stating otherwise.
In a certain sense this is ridiculous, because since the inception of the ARPANet, the offspring of the Internet, network measurement was an important task. The very first ARPANet site was established at the University of California, Los Angeles, and intended to be the measurement site. There, Leonard Kleinrock further on worked on the development of measurement techniques used to monitor the performance of the ARPANet (cf. Michael and Ronda Hauben, Netizens: On the History and Impact of the Net). And in October 1991, in the name of the Internet Activities Board Vinton Cerf proposed guidelines for researchers considering measurement experiments on the Internet stated that the measurement of the Internet. This was due to two reasons. First, measurement would be critical for future development, evolution and deployment planning. Second, Internet-wide activities have the potential to interfere with normal operation and must be planned with care and made widely known beforehand.
So what are the reasons for this inability to evaluate trends, identity performance problems beyond the boundary of a single ISP? First, in early 1995, almost simultaneously with the worldwide introduction of the World Wide Web, the transition of the stewardship role of the National Science Foundation over the Internet into a competitive industry (bluntly spoken: its privatization) left no framework for adequate tracking and monitoring of the Internet. The early ISPs were not very interested in gathering and analyzing network performance data, they were struggling to meet demands of their rapidly increasing customers. Secondly, we are just beginning to develop reliable tools for quality measurement and analysis of bandwidth or performance. CAIDA aims at developing such tools.
"There are many estimates of the size and growth rate of the Internet that are either implausible, or inconsistent, or even clearly wrong", K. G. Coffman and Andrew, both members of different departments of AT & T Labs-Research, state something similar in their paper The Size and Growth Rate of the Internet, published in First Monday. There are some sources containing seemingly contradictory information on the size and growth rate of the Internet, but "there is no comprehensive source for information". They take a well-informed and refreshing look at efforts undertaken for measuring the Internet and dismantle several misunderstandings leading to incorrect measurements and estimations. Some measurements have such large error margins that you might better call them estimations, to say the least. This is partly due to the fact that data are not disclosed by every carrier and only fragmentarily available.
What is measured and what methods are used? Many studies are devoted to the number of users; others look at the number of computers connected to the Internet or count IP addresses. Coffman and Odlyzko focus on the sizes of networks and the traffic they carry to answer questions about the size and the growth of the Internet.
You get the clue of their focus when you bear in mind that the Internet is just one of many networks of networks; it is only a part of the universe of computer networks. Additionally, the Internet has public (unrestricted) and private (restricted) areas. Most studies consider only the public Internet, Coffman and Odlyzko consider the long-distance private line networks too: the corporate networks, the Intranets, because they are convinced (that means their assertion is put forward, but not accompanied by empirical data) that "the evolution of the Internet in the next few years is likely to be determined by those private networks, especially by the rate at which they are replaced by VPNs (Virtual Private Networks) running over the public Internet. Thus it is important to understand how large they are and how they behave." Coffman and Odlyzko check other estimates by considering the traffic generated by residential users accessing the Internet with a modem, traffic through public peering points (statistics for them are available through CAIDA and the National Laboratory for Applied Network Research), and calculating the bandwidth capacity for each of the major US providers of backbone services. They compare the public Internet to private line networks and offer interesting findings. The public Internet is currently far smaller, in both capacity and traffic, than the switched voice network (with an effective bandwidth of 75 Gbps at December 1997), but the private line networks are considerably larger in aggregate capacity than the Internet: about as large as the voice network in the U. S. (with an effective bandwidth of about 330 Gbps at December 1997), they carry less traffic. On the other hand, the growth rate of traffic on the public Internet, while lower than is often cited, is still about 100% per year, much higher than for traffic on other networks. Hence, if present growth trends continue, data traffic in the U. S. will overtake voice traffic around the year 2002 and will be dominated by the Internet. In the future, growth in Internet traffic will predominantly derive from people staying longer and from multimedia applications, because they consume more bandwidth, both are the reason for unanticipated amounts of data traffic.

Hosts

The Internet Software Consortium's Internet Domain Survey is one of the most known efforts to count the number of hosts on the Internet. Happily the ISC informs us extensively about the methods used for measurements, a policy quite rare on the Web. For the most recent survey the number of IP addresses that have been assigned a name were counted. At first sight it looks simple to get the accurate number of hosts, but practically an assigned IP address does not automatically correspond an existing host. In order to find out, you have to send a kind of message to the host in question and wait for a reply. You do this with the PING utility. (For further explanations look here: Art. PING, in: Connected: An Internet Encyclopaedia) But to do this for every registered IP address is an arduous task, so ISC just pings a 1% sample of all hosts found and make a projection to all pingable hosts. That is ISC's new method; its old method, still used by RIPE, has been to count the number of domain names that had IP addresses assigned to them, a method that proved to be not very useful because a significant number of hosts restricts download access to their domain data.
Despite the small sample, this method has at least one flaw: ISC's researchers just take network numbers into account that have been entered into the tables of the IN-ADDR.ARPA domain, and it is possible that not all providers know of these tables. A similar method is used for Telcordia's Netsizer.

Internet Weather

Like daily weather, traffic on the Internet, the conditions for data flows, are monitored too, hence called Internet weather. One of the most famous Internet weather report is from The Matrix, Inc. Another one is the Internet Traffic Report displaying traffic in values between 0 and 100 (high values indicate fast and reliable connections). For weather monitoring response ratings from servers all over the world are used. The method used is to "ping" servers (as for host counts, e. g.) and to compare response times to past ones and to response times of servers in the same reach.

Hits, Page Views, Visits, and Users

Let us take a look at how these hot lists of most visited Web sites may be compiled. I say, may be, because the methods used for data retrieval are mostly not fully disclosed.
For some years it was seemingly common sense to report requested files from a Web site, so called "hits". A method not very useful, because a document can consist of several files: graphics, text, etc. Just compile a document from some text and some twenty flashy graphical files, put it on the Web and you get twenty-one hits per visit; the more graphics you add, the more hits and traffic (not automatically to your Web site) you generate.
In the meantime page views, also called page impressions are preferred, which are said to avoid these flaws. But even page views are not reliable. Users might share computers and corresponding IP addresses and host names with others, she/he might access not the site, but a cached copy from the Web browser or from the ISP's proxy server. So the server might receive just one page request although several users viewed a document.

Especially the editors of some electronic journals (e-journals) rely on page views as a kind of ratings or circulation measure, Rick Marin reports in the New York Times. Click-through rates - a quantitative measure - are used as a substitute for something of intrinsically qualitative nature: the importance of a column to its readers, e. g. They may read a journal just for a special column and not mind about the journal's other contents. Deleting this column because of not receiving enough visits may cause these readers to turn their backs on their journal.
More advanced, but just slightly better at best, is counting visits, the access of several pages of a Web site during one session. The problems already mentioned apply here too. To avoid them, newspapers, e.g., establish registration services, which require password authentication and therefore prove to be a kind of access obstacle.
But there is a different reason for these services. For content providers users are virtual users, not unique persons, because, as already mentioned, computers and IP addresses can be shared and the Internet is a client-server system; in a certain sense, in fact computers communicate with each other. Therefore many content providers are eager to get to know more about users accessing their sites. On-line registration forms or WWW user surveys are obvious methods of collecting additional data, sure. But you cannot be sure that information given by users is reliable, you can just rely on the fact that somebody visited your Web site. Despite these obstacles, companies increasingly use data capturing. As with registration services cookies come here into play.

For

If you like to play around with Internet statistics instead, you can use Robert Orenstein's Web Statistics Generator to make irresponsible predictions or visit the Internet Index, an occasional collection of seemingly statistical facts about the Internet.

Measuring the Density of IP Addresses

Measuring the Density of IP Addresses or domain names makes the geography of the Internet visible. So where on earth is the most density of IP addresses or domain names? There is no global study about the Internet's geographical patterns available yet, but some regional studies can be found. The Urban Research Initiative and Martin Dodge and Narushige Shiode from the Centre for Advanced Spatial Analysis at the University College London have mapped the Internet address space of New York, Los Angeles and the United Kingdom (http://www.geog.ucl.ac.uk/casa/martin/internetspace/paper/telecom.html and http://www.geog.ucl.ac.uk/casa/martin/internetspace/paper/gisruk98.html).
Dodge and Shiode used data on the ownership of IP addresses from RIPE, Europe's most important registry for Internet numbers.





TEXTBLOCK 2/3 // URL: http://world-information.org/wio/infostructure/100437611791/100438658352
 
1940s - Early 1950s: First Generation Computers

Probably the most important contributor concerning the theoretical basis for the digital computers that were developed in the 1940s was Alan Turing, an English mathematician and logician. In 1936 he created the Turing machine, which was originally conceived as a mathematical tool that could infallibly recognize undecidable propositions. Although he instead proved that there cannot exist any universal method of determination, Turing's machine represented an idealized mathematical model that reduced the logical structure of any computing device to its essentials. His basic scheme of an input/output device, memory, and central processing unit became the basis for all subsequent digital computers.

The onset of the Second World War led to an increased funding for computer projects, which hastened technical progress, as governments sought to develop computers to exploit their potential strategic importance.

By 1941 the German engineer Konrad Zuse had developed a computer, the Z3, to design airplanes and missiles. Two years later the British completed a secret code-breaking computer called Colossus to decode German messages and by 1944 the Harvard engineer Howard H. Aiken had produced an all-electronic calculator, whose purpose was to create ballistic charts for the U.S. Navy.

Also spurred by the war the Electronic Numerical Integrator and Computer (ENIAC), a general-purpose computer, was produced by a partnership between the U.S. government and the University of Pennsylvania (1943). Consisting of 18.000 vacuum tubes, 70.000 resistors and 5 million soldered joints, the computer was such a massive piece of machinery (floor space: 1,000 square feet) that it consumed 160 kilowatts of electrical power, enough energy to dim lights in an entire section of a bigger town.

Concepts in computer design that remained central to computer engineering for the next 40 years were developed by the Hungarian-American mathematician John von Neumann in the mid-1940s. By 1945 he created the Electronic Discrete Variable Automatic Computer (EDVAC) with a memory to hold both a stored program as well as data. The key element of the Neumann architecture was the central processing unit (CPU), which allowed all computer functions to be coordinated through a single source. One of the first commercially available computers to take advantage of the development of the CPU was the UNIVAC I (1951). Both the U.S. Census bureau and General Electric owned UNIVACs (Universal Automatic Computer).

Characteristic for first generation computers was the fact, that instructions were made-to-order for the specific task for which the computer was to be used. Each computer had a different binary-coded program called a machine language that told it how to operate. Therefore computers were difficult to program and limited in versatility and speed. Another feature of early computers was that they used vacuum tubes and magnetic drums for storage.

TEXTBLOCK 3/3 // URL: http://world-information.org/wio/infostructure/100437611663/100438659338
 
Feedback

In biology, feedback refers to a response within a system (molecule, cell, organism, or population) that influences the continued activity or productivity of that system. In essence, it is the control of a biological reaction by the end products of that reaction. Similar usage prevails in mathematics, particularly in several areas of communication theory. In every instance, part of the output is fed back as new input to modify and improve the subsequent output of a system. The importance of feedback mechanisms has also been pointed out by Norbert Wiener. He theorized that all intelligent behavior is the result of feedback mechanisms and that they could therefore contribute to the creation artificial intelligence.

INDEXCARD, 1/3
 
World Wide Web (WWW)

Probably the most significant Internet service, the World Wide Web is not the essence of the Internet, but a subset of it. It is constituted by documents that are linked together in a way you can switch from one document to another by simply clicking on the link connecting these documents. This is made possible by the Hypertext Mark-up Language (HTML), the authoring language used in creating World Wide Web-based documents. These so-called hypertexts can combine text documents, graphics, videos, sounds, and Java applets, so making multimedia content possible.

Especially on the World Wide Web, documents are often retrieved by entering keywords into so-called search engines, sets of programs that fetch documents from as many servers as possible and index the stored information. (For regularly updated lists of the 100 most popular words that people are entering into search engines, click here). No search engine can retrieve all information on the whole World Wide Web; every search engine covers just a small part of it.

Among other things that is the reason why the World Wide Web is not simply a very huge database, as is sometimes said, because it lacks consistency. There is virtually almost infinite storage capacity on the Internet, that is true, a capacity, which might become an almost everlasting too, a prospect, which is sometimes consoling, but threatening too.

According to the Internet domain survey of the Internet Software Consortium the number of Internet host computers is growing rapidly. In October 1969 the first two computers were connected; this number grows to 376.000 in January 1991 and 72,398.092 in January 2000.

World Wide Web History Project, http://www.webhistory.org/home.html

http://www.searchwords.com/
http://www.islandnet.com/deathnet/
http://www.salonmagazine.com/21st/feature/199...
INDEXCARD, 2/3
 
Expert system

Expert systems are advanced computer programs that mimic the knowledge and reasoning capabilities of an expert in a particular discipline. Their creators strive to clone the expertise of one or several human specialists to develop a tool that can be used by the layman to solve difficult or ambiguous problems. Expert systems differ from conventional computer programs as they combine facts with rules that state relations between the facts to achieve a crude form of reasoning analogous to artificial intelligence. The three main elements of expert systems are: (1) an interface which allows interaction between the system and the user, (2) a database (also called the knowledge base) which consists of axioms and rules, and (3) the inference engine, a computer program that executes the inference-making process. The disadvantage of rule-based expert systems is that they cannot handle unanticipated events, as every condition that may be encountered must be described by a rule. They also remain limited to narrow problem domains such as troubleshooting malfunctioning equipment or medical image interpretation, but still have the advantage of being much lower in costs compared with paying an expert or a team of specialists.

INDEXCARD, 3/3