Python Essentials

Simple assignment and variables

We've seen a few examples of the essential Python assignment statement in previous chapters. The statement includes a variable, =, and an expression. Since a single object is an expression, we can write:

>>> pi = 3.14

This creates a floating-point object from the literal 3.14 and assigns that object to a variable named pi.

Variable names must follow the rules in section 2.3, Identifiers and Keywords, of the Python Language Reference. The reference manual uses the Unicode character class definitions provided in the unicodedata module.

Interesting background information on the problem of programming language identifiers is available in Unicode Standard Annex 31, Unicode Identifier and Pattern Syntax. This shows how Python's version of the question "what is an identifier?" fits into the larger context of other programming languages and the variety of natural languages used around the world.

In Python, identifiers have a small set of start characters; these are chosen to allow a lexical scanner to determine what kinds of characters can follow. If identifiers began with digits, it would be rather complex to distinguish identifiers from numbers. Consequently, identifiers must begin with a letter or _. After the initial character, Python allows an identifier to continue with characters that may come from a larger set of characters: letters, digits, and _.
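The start-and-continue rule can be checked from within Python. The following is a minimal sketch that uses the built-in str.isidentifier() method and the standard keyword module; the helper name is_usable_name and the sample names are hypothetical, chosen only for illustration.

```python
import keyword


def is_usable_name(name: str) -> bool:
    """Hypothetical helper: True if name is a valid identifier and not a keyword."""
    # str.isidentifier() applies the lexical rules; it also accepts
    # reserved words such as "class", so keywords are filtered separately.
    return name.isidentifier() and not keyword.iskeyword(name)


samples = ["pi", "_total", "p_2", "2p", "my-var", "class"]
results = {name: is_usable_name(name) for name in samples}

for name, ok in results.items():
    print(f"{name!r}: {ok}")
```

Names that start with a digit ("2p") or contain punctuation other than _ ("my-var") are rejected, as is the reserved word "class".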

What do we really mean by "letter" or "digit"? In earlier versions of Python, these terms were defined by the Latin-based ASCII alphabet. Using Unicode means that the terms now have more inclusive definitions.

Python defines the identifier starting character as belonging to the following Unicode categories: uppercase letters (Lu), lowercase letters (Ll), title case letters (Lt), modifier letters (Lm), other letters (Lo), and letter numbers (Nl). Python also includes the small set of characters in the Other_ID_Start category. The set of characters defined by these categories is large. Latin letters in the ranges a-z and A-Z, for example, are in this set. When writing more mathematically oriented programs, the Greek letters α-ω and Α-Ω can also be used as identifier start characters. We can write this:

>>> π = 355/113

This assigns the result of the expression to the variable, π. Some programmers find that their OS keyboard interface makes letters outside a single national alphabet awkward to use; consequently, they suggest focusing on Latin letters for programming.
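To see which category a given character falls into, we can ask the unicodedata module. This is a small sketch, not a full implementation of the language rules: the set start_categories lists only the six general categories named above and omits the Other_ID_Start characters, and the sample characters are arbitrary.

```python
import unicodedata

# The six general categories named in the text for identifier start characters.
start_categories = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

# Latin a and Z, Greek small pi, Greek capital omega, underscore, digit seven.
samples = ["a", "Z", "\u03c0", "\u03a9", "_", "7"]
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, cat in categories.items():
    print(f"{ch!r}: {cat} start-category={cat in start_categories}")
```

The underscore reports category Pc, which is not one of the letter categories; it is allowed as a start character because the language rules name it explicitly. The digit reports Nd, which is only allowed after the first character.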

Identifiers can continue with any of the letters defined in the previous paragraph, the _ character, and characters from the following categories: nonspacing marks (Mn), spacing combining marks (Mc), decimal numbers (Nd), and connector punctuations (Pc). This allows us to include ordinary decimal digits as well as other "combining" marks that modify the previous character. For example, the character GREEK SMALL LETTER PI followed by COMBINING CIRCUMFLEX ACCENT creates a "pi-hat" variable, π̂.

It may be awkward for some developers to type, but it may also fit nicely with population genomics formulae which use this symbol combination. The Inheritance By Descent estimator, for example, uses π̂. Such an expression can also involve other variables, for instance p_2 and p_1, which use only the more common Latin letters, _, and digits.
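The two code points behind a "pi-hat" name can be inspected directly. The sketch below builds the name from escape sequences so that it does not depend on keyboard or editor support; the variable names pi_hat, names, and categories are illustrative only.

```python
import unicodedata

# GREEK SMALL LETTER PI (U+03C0) followed by the combining circumflex (U+0302).
pi_hat = "\u03c0\u0302"

names = [unicodedata.name(ch) for ch in pi_hat]
categories = [unicodedata.category(ch) for ch in pi_hat]

print(names)
print(categories)

# A combining mark may continue an identifier but may not start one.
print(pi_hat.isidentifier())
print("\u0302\u03c0".isidentifier())
```

The first character is a lowercase letter (Ll) and the second is a nonspacing mark (Mn), so the pair is a valid identifier only in that order.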

Note that variable names that begin and end with __ (two underscores) are reserved by Python for special purposes. For example, we have global variables such as __name__, __debug__, and __file__ which are set when our script starts running.

There's no reason for our application to ever create new names which begin and end with __. We're not prohibited from creating such variables, but any name that we might adopt could be used by some internal feature of Python.
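A code review script or linter could flag such names. The following is one possible sketch; the helper is_dunder is hypothetical and is not part of Python or its standard library.

```python
def is_dunder(name: str) -> bool:
    """Hypothetical helper: True if name begins and ends with two underscores."""
    # Require at least one character between the underscore pairs,
    # so "__" and "____" alone do not count.
    return len(name) > 4 and name.startswith("__") and name.endswith("__")


candidates = ["__name__", "__debug__", "__file__", "_private", "__mangled", "total"]
checks = {name: is_dunder(name) for name in candidates}

for name, reserved in checks.items():
    print(f"{name}: {reserved}")

# __debug__ is a built-in constant; it is True unless Python runs with -O.
print(__debug__)
```

Names with only a leading underscore, or with leading double underscores but no trailing pair, are not treated as reserved by this check.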

Tip

It's best to assume that all names beginning and ending with __ (double underscore) are reserved by Python and do something special. Even if a name is not used in the current release, that doesn't mean it won't be used in a future release.