
Foundations of Computer Science

Computer Science Tripos Part IA

Lawrence C Paulson Computer Laboratory University of Cambridge

Copyright © 2000 by Lawrence C. Paulson


1 Introduction


2 Recursive Functions


3 O Notation: Estimating Costs in the Limit


4 Lists


5 More on Lists


6 Sorting


7 Datatypes and Trees


8 Dictionaries and Functional Arrays


9 Queues and Search Strategies


10 Functions as Values


11 List Functionals


12 Polynomial Arithmetic


13 Sequences, or Lazy Lists


14 Elements of Procedural Programming


15 Linked Data Structures





This course has two objectives. First (and obvious) is to teach programming. Second is to present some fundamental principles of computer science, especially algorithm design. Most students will have some programming experience already, but there are few people whose programming cannot be improved through greater knowledge of basic principles. Please bear this point in mind if you have extensive experience and find parts of the course rather slow.

The programming in this course is based on the language ML and mostly concerns the functional programming style. Functional programs tend to be shorter and easier to understand than their counterparts in conventional languages such as C. In the space of a few weeks, we shall be able to cover most of the forms of data structures seen in programming. The course also covers basic methods for estimating efficiency.

Courses in the Computer Laboratory are now expected to supply a Learning Guide to suggest extra reading, discussion topics, exercises and past exam questions. For this course, such material is attached at the end of each lecture. Extra reading is mostly drawn from my book ML for the Working Programmer (second edition), which also contains many exercises. The only relevant exam questions are from the June 1998 papers for Part IA.

Thanks to Stuart Becker, Silas Brown, Frank King, Joseph Lord, James Margetson and Frank Stajano for pointing out errors in these notes. Please inform me of further errors and of passages that are particularly hard to understand. If I use your suggestion, I'll acknowledge it in the next printing.

Suggested Reading List

My own book is, naturally, closest in style to these notes. Ullman's book is another general introduction to ML. The Little MLer is a rather quirky tutorial on recursion and types. Harrison is of less direct relevance, but worth considering. See Introduction to Algorithms for O-notation.

• Paulson, Lawrence C. (1996). ML for the Working Programmer, 2nd edition. Cambridge University Press.

• Ullman, Jeffrey D. (1993). Elements of ML Programming. Prentice Hall.

• Felleisen, Matthias and Friedman, Daniel P. (1998). The Little MLer. MIT Press.

• Harrison, Rachel (1993). Abstract Data Types in Standard ML. Wiley.

• Cormen, Thomas H., Leiserson, Charles E. and Rivest, Ronald L. (1990). Introduction to Algorithms. MIT Press.




Slide 101

Computers: a child can use them; NOBODY can fully understand them
Master complexity through levels of abstraction
Focus on 2 or 3 levels at most!

Recurring issues:

• what services to provide at each level
• how to implement them using lower-level services
• the interface: how the two levels should communicate

A basic concept in computer science is that large systems can only be understood in levels, with each level further subdivided into functions or services of some sort. The interface to the higher level should supply the advertised services. Just as important, it should block access to the means by which those services are implemented. This abstraction barrier allows one level to be changed without affecting levels above. For example, when a manufacturer designs a faster version of a processor, it is essential that existing programs continue to run on it. Any differences between the old and new processors should be invisible to the program.
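In ML, an abstraction barrier of this kind can be expressed directly: a signature advertises the services, and an opaque structure supplies them while hiding the representation. The following sketch is purely illustrative (the names DATE, today, add and the days-since-an-epoch representation are invented for this example); we shall meet these constructs properly later in the course.

```sml
(* A hypothetical abstraction barrier.  Callers see only the signature,
   so the internal representation (here, a count of days since some
   epoch) can later be changed without affecting the levels above. *)
signature DATE =
  sig
    type date                        (* representation is hidden *)
    val today    : date
    val add      : date * int -> date  (* shift a date by some days *)
    val toString : date -> string
  end;

structure Date :> DATE =
  struct
    type date = int                  (* days since some epoch *)
    val today = 11000
    fun add (d, n) = d + n
    fun toString d = "day " ^ Int.toString d
  end;
```

Because the ascription `:>` is opaque, a program that tried to treat a `Date.date` as an integer would be rejected by the compiler: the only access is through the advertised operations.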




Example I: Dates

Slide 102

Abstract level: names for dates over a certain range
Concrete level: typically 6 characters: YYMMDD

Date crises caused by INADEQUATE internal formats:

• Digital's PDP-10: using 12-bit dates (good for at most 11 years)
• 2000 crisis: 48 bits could be good for lifetime of universe!

• information can be represented in many ways
• get it wrong, and you will pay

Digital Equipment Corporation's date crisis occurred in 1975. The PDP-10 was a 36-bit mainframe computer. It represented dates using a 12-bit format designed for the tiny PDP-8. With 12 bits, one can distinguish 2^12 = 4096 days, or about 11 years.

The most common industry format for dates uses six characters: two for the year, two for the month and two for the day. The most common "solution" to the year 2000 crisis is to add two further characters, thereby altering file sizes. Others have noticed that the existing six characters consist of 48 bits, already sufficient to represent all dates over the projected lifetime of the universe:

2^48 = 2.8 × 10^14 days = 7.7 × 10^11 years!

Mathematicians think in terms of unbounded ranges, but the representation we choose for the computer usually imposes hard limits. A good programming language like ML lets one easily change the representation used in the program. But if files in the old representation exist all over the place, there will still be conversion problems. The need for compatibility with older systems causes problems across the computer industry.
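The two figures quoted above are easily checked in ML itself (the variable names below are invented for this illustration):

```sml
(* Sanity-checking the date ranges: 2^12 days is scarcely 11 years,
   while 2^48 days outlasts the projected life of the universe. *)
val smallRange = Math.pow (2.0, 12.0) / 365.25;  (* about 11.2 years    *)
val bigRange   = Math.pow (2.0, 48.0) / 365.25;  (* about 7.7E11 years  *)
```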




Example II: Floating-Point Numbers

Slide 103

Computers have integers like 1066 and reals like 1.066 × 10^3

A floating-point number is represented by two integers
For either sort of number, there could be different precisions

The concept of DATA TYPE:

• how a value is represented
• the suite of available operations

Floating point numbers are what you get on any pocket calculator. Internally, a float consists of two integers: the mantissa (fractional part) and the exponent. Complex numbers, consisting of two reals, might be provided. We have three levels of numbers already!

Most computers give us a choice of precisions, too. In 32-bit precision, integers typically range from 2^31 - 1 (namely 2,147,483,647) to -2^31; reals are accurate to about six decimal places and can get as large as 10^35 or so. For reals, 64-bit precision is often preferred. How do we keep track of so many kinds of numbers? If we apply floating-point arithmetic to an integer, the result is undefined and might even vary from one version of a chip to another.

Early languages like Fortran required variables to be declared as integer or real and prevented programmers from mixing both kinds of number in a computation. Nowadays, programs handle many different kinds of data, including text and symbols. Modern languages use the concept of data type to ensure that a datum undergoes only those operations that are meaningful for it.

Inside the computer, all data are stored as bits. Determining which type a particular bit pattern belongs to is impossible unless some bits have been set aside for that very purpose (as in languages like Lisp and Prolog). In most languages, the compiler uses types to generate correct machine code, and types are not stored during program execution.
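ML enforces the data-type discipline described above at compile time: integers and reals are distinct types with distinct operations, and mixing them requires an explicit conversion. A small illustration (the variable names are invented for this example):

```sml
(* ML keeps integers and reals apart; the compiler rejects mixtures. *)
val n = 1066;           (* n : int  *)
val x = 1.066E3;        (* x : real *)
(* val bad = n + x;        rejected: cannot add an int to a real     *)
val ok = real n + x;    (* real : int -> real converts explicitly    *)
```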
