Monday, October 15, 2007

Static typing
In computer science, a type system defines how a programming language classifies values and expressions into types, how it can manipulate those types and how they interact. A type indicates a set of values that have the same sort of generic meaning or intended purpose (although some types, such as abstract types and function types, might not be represented as values in the running computer program). Type systems vary significantly between languages with, perhaps, the most important variations being their compile-time syntactic and run-time operational implementations.
A compiler may use the static type of a value to optimize the storage it needs and the choice of algorithms for operations on the value. For example, in many C compilers the "float" data type is represented in 32 bits, in accordance with the IEEE specification for single-precision floating point numbers. Thus, C uses floating-point-specific operations on those values (floating-point addition, multiplication, etc.).
The depth of type constraints and the manner of their evaluation affects the typing of the language. Further, a programming language may associate an operation with varying concrete algorithms on each type in the case of type polymorphism. Type theory is the study of type systems, although the concrete type systems of programming languages originate from practical issues of computer architecture, compiler implementation, and language design.

Basis
The process of verifying and enforcing the constraints of types – type checking – may occur either at compile-time (a static check) or run-time (a dynamic check). If a language enforces type rules strongly (that is, generally allowing only those automatic type conversions which do not lose information), one can refer to the process as strongly typed, if not, as weakly typed.

Type checking
In most compiled computer languages, like C, C++ and Pascal, the data type of every variable, parameter and function return value is known at compile time. The programmer has to provide the type information through declarations. This is known as static typing (but some statically typed languages may include implicit declarations). A programming language is statically typed if type checking may be performed without testing equivalence of run-time expressions. A statically typed programming language respects the distinction between run-time and compile-time phases of processing. A language has a compile-time phase if separate modules of a program can be type checked separately (separate compilation), without information about all modules that exist at run time. Static type-checking becomes a primary task of the semantic analysis carried out by a compiler.

Static typing
In languages with dynamic data typing, like Lisp, Perl, and JavaScript, the data types are not declared. They are not known until execution time. The advantage of dynamic data typing is more flexibility and less work for the programmer. But often the data type declarations help organizing and understanding a program. A programming language is dynamically typed if the language supports run-time (dynamic) dispatch on tagged data. A programming language is dependently typed, if the phase distinction is violated and consequently the type checking requires testing equivalence of run-time expressions.
To see how type tag checking works, consider the following pseudocode example:
In this example, (1) declares the name x; (2) associates the integer value 5 to the name x; and (3) associates the string value "hi" to the name x. In most statically typed systems, this code fragment would be illegal, because (2) and (3) bind x to values of inconsistent type.
By contrast, a purely dynamically typed system would permit the above program to execute, since type tags are attached to values, not variables. The implementation of a dynamically typed language catches programmer errors related to the misuse of values—type errors—at the time of the computation of the erroneous statement or expression. In other words, dynamic typing catches errors during program execution.
A typical implementation of dynamic typing keeps all program values "tagged" with a type tag, and checks the tag before using any value in an operation. For example:
In this code fragment, (1) binds the value 5 to x; (2) binds the value "hi" to y; and (3) attempts to add x to y. In a dynamically typed language, the value bound to x might be a pair (integer, 5), and the value bound to y might be a pair (string, "hi"). When the program attempts to execute line 3, the language implementation checks the type tags integer and string, and if the operation + (addition) is not defined over these two types it signals an error.

Dynamic typing
Some statically typed languages have a "back door" in the language that enables programmers to write code that does not statically type check. For example, Java and C-style languages have "casting".
The presence of static typing in a programming language does not necessarily imply the absence of dynamic typing mechanisms. For example, Java uses static typing, but certain operations require the support of runtime type tests, which are a form of dynamic typing. See programming language for more discussion of the interactions between static and dynamic typing.

Combinations of dynamic and static typing
The choice between static and dynamic typing requires some trade-offs.
Static typing finds type errors reliably at compile time. This should increase the reliability of the delivered program. However, programmers disagree over how commonly type errors occur, and thus what proportion of those bugs which are written would be caught by static typing. Static typing advocates believe programs are more reliable when they have been type-checked, while dynamic typing advocates point to distributed code that has proven reliable and to small bug databases. The value of static typing, then, presumably increases as the strength of the type system is increased. Advocates of strongly typed languages such as ML and Haskell have suggested that almost all bugs can be considered type errors, if the types used in a program are sufficiently well declared by the programmer or inferred by the compiler.
Static typing usually results in compiled code that executes more quickly. When the compiler knows the exact data types that are in use, it can produce optimized machine code. Further, compilers in statically typed languages can find shortcuts more easily. Some dynamically-typed languages such as Common Lisp allow optional type declarations for optimization for this very reason. Static typing makes this pervasive. See optimization.
By contrast, dynamic typing may allow compilers and interpreters to run more quickly, since changes to source code in dynamically-typed languages may result in less checking to perform and less code to revisit. This too may reduce the edit-compile-test-debug cycle.
Statically-typed languages which lack type inference (such as Java) require that programmers declare the types they intend a method or function to use. This can serve as additional documentation for the program, which the compiler will not permit the programmer to ignore or drift out of synchronization. However, a language can be statically typed without requiring type declarations, so this is not a consequence of static typing.
Static typing allows construction of libraries which are less likely to be accidentally misused by their users. This can be used as an additional mechanism for communicating the intentions of the library developer.
Dynamic typing allows constructs that some static type systems would reject as illegal. For example, eval functions, which execute arbitrary data as code, become possible (however, the typing within that evaluated code might remain static). Furthermore, dynamic typing accommodates transitional code and prototyping, such as allowing a string to be used in place of a data structure. Recent enhancements (e.g. Haskell Generalized algebraic data types) to statically typed languages have allowed eval functions to be written in a type-safe way.
Dynamic typing typically makes metaprogramming more powerful and easier to use. For example, C++ templates are typically more cumbersome to write than the equivalent Ruby or Python code. More advanced run-time constructs such as metaclasses and introspection are often more difficult to use in statically-typed languages.

Static and dynamic type checking in practice

Main article: strongly-typed programming language Strong and weak typing

Main article: Type safety Safely and unsafely typed systems

Main article: Polymorphism (computer science) Polymorphism and types

Main article: Duck typing Duck typing
Many type systems have been created that are specialized for use in certain environments, with certain types of data, or for out-of-band static program analysis. Frequently these are based on ideas from formal type theory and are only available as part of prototype research systems.

Specialized type systems
Dependent types are based on the idea of using scalars or values to more precisely describe the type of some other value. For example, "matrix(3,3)" might be the type of a 3×3 matrix. We can then define typing rules such as the following rule for matrix multiplication:
matrix_multiply : matrix(k,m) × matrix(m,n) → matrix(k,n)
where k, m, n are arbitrary positive integer values. A variant of ML called Dependent ML has been created based on this type system, but because type-checking conventional dependent types is undecidable, not all programs using them can be type-checked without some kind of limitations.

Dependent types
Linear types, based on the theory of linear logic, and also known as uniqueness types, are types assigned to values having the property that they have one and only one reference to them at all times. These are valuable for describing large immutable values such as strings, files, and so on, because any operation that simultaneously destroys a linear object and creates a similar object (such as 'str = str + "a"') can be optimized "under the hood" into an in-place mutation. Normally this is not possible because such mutations could cause side effects on parts of the program holding other references to the object, violating referential transparency. They're also used in the prototype operating system Singularity for interprocess communication, statically ensuring that processes cannot share objects in shared memory in order to prevent race conditions. The Clean language (a Haskell like language) uses this type system in order to gain a lot of speed while remaining safe.

Linear types
Intersection types are types describing values that belong to both of two other given types with overlapping value sets. For example, in C the signed char has range -128 to 127 and the unsigned char has range 0 to 255, so the intersection type of these two types would have range 0 to 127. Such an intersection type could be safely passed into functions expecting either signed or unsigned chars, because it is compatible with both types.
Intersection types are useful for describing overloaded function types: for example, if "int → int" is the type of functions taking an integer argument and returning an integer, and "float → float" is the type of functions taking a float argument and returning a float, then the intersection of these two types can be used to describe functions that do one or the other, based on what type of input they're given. Such a function could be passed into another function expecting an "int → int" function safely; it simply wouldn't use the "float → float" functionality.
In a subclassing hierarchy, the intersection of a type and an ancestor type (such as its parent) is the most derived type. The intersection of sibling types is empty.
The Forsythe language includes a general implementation of intersection types. A restricted form is refinement types.

Intersection types
Union types are types describing values that belong to either of two types. For example, in C the signed char has range -128 to 127 and the unsigned char has range 0 to 255, so the union of these two types would have range -128 to 255. Any function handling this union type would have to deal with integers in this complete range. More generally, the only valid operations on a union type are operations that are valid on both types being unioned. C's "union" concept is similar to union types, but is not typesafe because it permits operations that are valid on either type, rather than both. Union types are important in program analysis, where they are used to represent symbolic values whose exact type is not known.
In a subclassing hierarchy, the union of a type and an ancestor type (such as its parent) is the ancestor type. The union of sibling types is a subtype of their common ancestor (that is, all operations permitted on their common ancestor are permitted on the union type, but they may also have other valid operations in common).

Union types
Existential types are frequently used to represent modules and abstract data types because of their ability to separate implementation from interface. For example, in C pseudocode, the type "T = ∃X { X a; int f(X); }" describes a module interface that has a data member of type X and a function that takes a parameter of the same type X and returns an integer. This could be implemented in different ways, for example:
These types are both subtypes of the more general existential type T and correspond to concrete implementation types, so any value of one of these types is a value of type T. Given a value "t" of type "T", we know that "t.f(t.a)" is well-typed, regardless of what the abstract type X is. This gives flexibility for choosing types suited to a particular implementation while clients that use only values of the interface type — the existential type — are isolated from these choices.

intT { int a; int f(int); }
floatT { float a; int f(float); } Existential types
Many static type systems, such as C's and Java's, require type declarations: the programmer must explicitly associate each variable with a particular type. Others, such as Haskell's, perform type inference: the compiler draws conclusions about the types of variables based on how programmers use those variables. For example, given a function f(x,y) which adds x and y together, the compiler can infer that x and y must be numbers -- since addition is only defined for numbers. Therefore, any call to f elsewhere in the program that specifies a non-numeric type (such as a string or list) as an argument would signal an error.
Numerical and string constants and expressions in code can and often do imply type in a particular context. For example, an expression 3.14 might imply a type of floating-point; while [1, 2, 3] might imply a list of integers; typically an array.

Explicit or implicit declaration and inference
A type of types is a kind. Kinds appear explicitly in typeful programming, such as a type constructor in the Haskell programming language, which returns a simple type after being applied to enough simple types. For example, the type constructor Either has the kind * -> * -> * and its application Either String Integer is a simple type (kind *). However, in most programming languages type construction is implicit and hard coded in the grammar, there is no notion of kind as a first class entity.
Types fall into several broad categories:

primitive types — the simplest kind of type, e.g. integer and floating-point number
integral types — types of whole numbers, e.g. integers and natural numbers
floating point types — types of numbers in floating-point representation
composite types — types composed of basic types, e.g. arrays or records. Abstract data types have attributes of both composite types and interfaces, depending on context.
subtype
derived type
object types, e.g. type variable
partial type
recursive type
function types, e.g. binary functions
universally quantified types, such as parameterized types
existentially quantified types, such as modules
refinement types, types which identify subsets of other types
dependent types, types which depend on run-time values
ownership types, types which describe or constrain the structure of object-oriented systems Types of types
A type-checker for a statically typed language must verify that the type of any expression is consistent with the type expected by the context in which that expression appears. For instance, in an assignment statement of the form x := e, the inferred type of the expression e must be consistent with the declared or inferred type of the variable x. This notion of consistency, called compatibility, is specific to each programming language.
Clearly, if the type of e and the type of x are the same and assignment is allowed for that type, then this is a valid expression. In the simplest type systems, therefore, the question of whether two types are compatible reduces to that of whether they are equal (or equivalent). Different languages, however, have different criteria for when two type expressions are understood to denote the same type. These different equational theories of types vary widely, two extreme cases being structural type systems, in which any two types are equivalent that describe values with the same structure, and nominative type systems, in which no two syntactically distinct type expressions denote the same type (i.e., types must have the same "name" in order to be equal).
In languages with subtyping, the compatibility relation is more complex. In particular, if A is a subtype of B, then a value of type A can be used in a context where one of type B is expected, even if the reverse is not true. Like equivalence, the subtype relation is defined differently for each programming language, with many variations possible. The presence of parametric or ad hoc polymorphism in a language may also have implications for type compatibility.

No comments: