
PostgreSQL database components
A PostgreSQL database can be considered a container for database schemas; a database must contain at least one schema. A database schema is used to organize database objects in a manner similar to namespaces in high-level programming languages.
Schema
Object names can be reused in different schemas without conflict. A schema contains all the named database objects, including tables, views, functions, aggregates, indexes, sequences, triggers, data types, domains, and ranges.

PostgreSQL schema as a database object container
By default, there is a schema called public in the template databases, which means that all newly created databases also contain this schema. All users, by default, can access this schema implicitly; again, this is inherited from the template databases. Allowing this access pattern simulates a situation where the server is not schema-aware. This is useful in small companies where there is no need for complex security, and it also enables a smooth transition from non-schema-aware databases.
When a user wants to access a certain object, the schema name and the object name need to be specified, separated by a period (.). This is required if the database search_path setting does not contain this schema, or if the developer prefers to use fully qualified names. For example, to select all the entries in pg_database in the pg_catalog schema, one needs to write the following command:
SELECT * FROM pg_catalog.pg_database;
Alternatively you can also use the following command:
TABLE pg_catalog.pg_database;
Qualified database object names are sometimes tedious to write, so many developers prefer to use unqualified object names, which are composed of the object name only, without the schema. PostgreSQL provides the search_path setting, which is similar to the using directive in the C++ language. The search path is composed of the schemas that the server uses to search for an object. The default search path, as shown in the following code, is $user, public. If there is a schema with the same name as the user, it is used first to search for objects and to create new objects. If the object is not found in any of the schemas specified in the search_path, an error will be thrown:
SHOW search_path;
 search_path
--------------
 $user,public
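The search path can also be changed for the current session with the SET command. The following is a minimal sketch, assuming a schema named car_portal_app exists (it is created later in this section):
SET search_path TO car_portal_app, public;
SHOW search_path;
      search_path
------------------------
 car_portal_app, public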
Schemas are used for the following reasons:
- Control authorization: In a multi-user database environment, one can use schemas to group objects based on roles.
- Organize database objects: One can organize the database objects in groups based on the business logic. For example, historical and auditing data could be logically grouped and organized in a specific schema.
- Maintain third-party SQL code: The extensions available in the contribution package can be used with several applications. Maintaining these extensions in separate schemas enables the developer to reuse these extensions, and to update them easily.
In the car web portal, let us assume that we would like to create a schema named car_portal_app, owned by the car_portal_app role. This can be done as follows:
CREATE SCHEMA car_portal_app AUTHORIZATION car_portal_app;
-- The schema name defaults to the owning role name if not given
CREATE SCHEMA AUTHORIZATION car_portal_app;
For more information about the syntax of the CREATE SCHEMA command, one can use the psql \h meta-command, which displays the psql client tool inline help, or take a look at the PostgreSQL manual at http://www.postgresql.org/docs/current/static/sql-createschema.html.
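A schema can also be made the default for a role, so that unqualified object names resolve to it automatically. The following is a sketch under the assumption that the car_portal_app role and schema already exist; the table name is hypothetical:
-- Make car_portal_app the first schema searched for this role
ALTER ROLE car_portal_app SET search_path = car_portal_app, public;
-- Objects can still be created using fully qualified names
CREATE TABLE car_portal_app.test_table (id INT);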
Table
The CREATE TABLE SQL statement is very rich. It can be used for several purposes, such as cloning a table, which is handy during database refactoring to create an uninstallation script that rolls back changes. It can also be used to materialize the result of a SELECT SQL statement to boost performance, or to temporarily store data for later use.
In PostgreSQL, tables can be of different types:
- Permanent table: The table life cycle starts with table creation and ends with table dropping.
- Temporary table: The table life cycle is the user session. This is used often with procedural languages to model some business logic.
- Unlogged table: Operations on unlogged tables are much faster than on permanent tables, because data is not written to the WAL files. Unlogged tables are not crash-safe. Also, since streaming replication is based on shipping the log files, unlogged tables cannot be replicated to the slave node.
- Child table: A child table is a table that inherits from one or more tables. Inheritance is often used with constraint exclusion to physically partition the data on the hard disk and to gain performance when retrieving a subset of data that has a certain value.
The CREATE TABLE syntax is quite long; the full syntax can be found at http://www.postgresql.org/docs/current/static/sql-createtable.html. The CREATE TABLE SQL command normally requires the following inputs (a short sketch illustrating the different table types and the cloning options follows this list):
- The name of the created table.
- The table type.
- The table storage parameters. These parameters are used to control the table storage allocation and several other administrative tasks.
- The table columns, including the data types, default values, and constraints.
- The cloned table name and the options to clone the table.
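For illustration, the following minimal sketch shows the different table types and the cloning options mentioned above; all table names are hypothetical:
-- A permanent table
CREATE TABLE permanent_test (id INT);
-- A temporary table, dropped automatically at the end of the session
CREATE TEMP TABLE temp_test (id INT);
-- An unlogged table: faster writes, but not crash-safe and not replicated
CREATE UNLOGGED TABLE unlogged_test (id INT);
-- A child table inheriting all columns of permanent_test
CREATE TABLE child_test () INHERITS (permanent_test);
-- Cloning a table structure, including defaults and constraints, without data
CREATE TABLE cloned_test (LIKE permanent_test INCLUDING ALL);
-- Materializing the result of a SELECT statement into a new table
CREATE TABLE materialized_test AS SELECT * FROM permanent_test;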
When designing a database table, one should take care in picking the appropriate data type. When the database goes to production, changing the data type of a column might become a very costly operation, especially for heavily loaded tables. The cost often comes from locking the table, and in some cases, rewriting it. When picking a data type, consider a balance between the following factors:
- Extensibility: Can the maximum length of a type be increased or decreased without a full table rewrite and a full table scan?
- Data type size: Going for a safe option, such as choosing bigint instead of int, will cause more storage consumption.
PostgreSQL provides a very extensive set of data types. Some of the native data type categories are:
- Numeric type
- Character type
- Date and time types
These data types are common to almost all relational databases. Moreover, they are often sufficient for modeling traditional applications.
The following table shows various numeric types:

Numeric data types
PostgreSQL supports various mathematical operators and functions, such as geometric functions and bitwise operations. The smallint data type can be used to save disk space, while bigint can be used if the integer range is not sufficient.
Serial types, namely smallserial, serial, and bigserial, are wrappers on top of smallint, int, and bigint respectively. Serial types are often used as surrogate keys, and by default, they are not allowed to have a null value. The serial types use sequences behind the scenes. A sequence is a database object that is used to generate sequential numbers by specifying the minimum, maximum, and increment values.
For example, the following code creates a table customer with a column customer_id:
CREATE TABLE customer (
  customer_id SERIAL
);
This will generate the following code behind the scenes:
CREATE SEQUENCE customer_customer_id_seq;
CREATE TABLE customer (
  customer_id integer NOT NULL DEFAULT nextval('customer_customer_id_seq')
);
ALTER SEQUENCE customer_customer_id_seq OWNED BY customer.customer_id;
When creating a column with type serial, remember the following things:
- A sequence will be created with the name tableName_columnName_seq. In the preceding example, the sequence name is customer_customer_id_seq.
- The column will have a NOT NULL constraint.
- The column will have a default value generated by the function nextval().
- The sequence will be owned by the column, which means that the sequence will be dropped automatically if the column is dropped.
Tip
The preceding example shows how PostgreSQL names an object if the object name is not specified explicitly; PostgreSQL names objects using the {tablename}_{columnname(s)}_{suffix} pattern, where the suffixes pkey, key, excl, idx, fkey, check, and seq stand for a primary key constraint, a unique constraint, an exclusion constraint, an index, a foreign key constraint, a check constraint, and a sequence respectively.
A common mistake when using the serial type is forgetting to grant proper permissions to the generated sequence.
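For example, a role that is allowed to insert into the table also needs privileges on the underlying sequence in order to obtain new key values; a minimal sketch, with a hypothetical role name:
-- Allow the role to call nextval() on the sequence behind customer_id
GRANT USAGE ON SEQUENCE customer_customer_id_seq TO app_user;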
Similar to the C language, the result of an integer expression is also an integer, so the results of the mathematical operations 3/2 and 1/3 are 1 and 0 respectively; the fractional part is always truncated. Unlike the C language, PostgreSQL rounds the number when casting a double value to int:
postgres=# SELECT CAST (5.9 AS INT) AS rounded_up, CAST(5.1 AS INTEGER) AS rounded_down, 5.5::INT AS another_syntax;
 rounded_up | rounded_down | another_syntax
------------+--------------+----------------
          6 |            5 |              6
(1 row)

postgres=# SELECT 2/3 AS "2/3", 1/3 AS "1/3", 3/2 AS "3/2";
 2/3 | 1/3 | 3/2
-----+-----+-----
   0 |   0 |   1
(1 row)
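If a fractional result is needed, at least one operand should be cast to, or written as, a non-integer type; a minimal sketch:
SELECT 1/3          AS truncated,      -- 0
       1/3.0        AS fractional,     -- fractional result, no truncation
       1::numeric/3 AS explicit_cast;  -- fractional result, no truncation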
The numeric and decimal types are recommended for storing monetary and other amounts where precision is required. There are three forms for defining a numeric or a decimal value:
- Numeric (precision, scale)
- Numeric (precision)
- Numeric
Precision is the total number of digits, while scale is the number of digits of the fraction part. For example, the number 12.344 has a precision of five and a scale of three. If a numeric type is used to define a column type without precision or scale, then the column can store any number with any precision and scale.
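For example, a column of type numeric(5,3) can hold at most two digits before the decimal point and three after it; input with more fractional digits is rounded to the declared scale, while input exceeding the precision is rejected. A minimal sketch with a hypothetical table:
CREATE TABLE numeric_test (amount NUMERIC(5,3));
INSERT INTO numeric_test VALUES (12.3456); -- stored as 12.346, rounded to scale 3
INSERT INTO numeric_test VALUES (123.45);  -- ERROR: numeric field overflow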
Floating point and double precision values are inexact; this means that some values cannot be represented exactly in the internal binary format and are stored as approximations.
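A classic consequence is that equality comparisons on floating point values may fail even when they look correct on paper, as in the following sketch:
SELECT 0.1::double precision + 0.2::double precision
       = 0.3::double precision AS exact_match;              -- returns f (false)
SELECT 0.1::numeric + 0.2::numeric = 0.3::numeric AS exact_match; -- returns t (true)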
The following table shows various character types:

Character data types
PostgreSQL provides two general character types, char(n) and varchar(n), where n is the number of characters allowed. In the char data type, if a value is shorter than the specified length, trailing spaces are padded at the end of the value. Operations on char values ignore the trailing spaces. Take a look at the following example:
postgres=# SELECT 'a'::CHAR(2) = 'a '::CHAR(2);
 ?column?
----------
 t
(1 row)

postgres=# SELECT length('a '::CHAR(10));
 length
--------
      1
(1 row)
Due to the trailing space behavior, it is not recommended to perform binary operations mixing varchar or text values with char values.
For both the char and varchar data types, if the string is longer than the maximum allowed length, then:
- An error will be raised in the case of insert or update unless the extra characters are all spaces. In the latter case, the string will be truncated.
- In the case of casting, extra characters will be truncated automatically without raising an error.
The following example shows how mixing different data types might cause problems:
postgres=# SELECT 'a '::VARCHAR(2) = 'a '::text;
 ?column?
----------
 t
(1 row)

postgres=# SELECT 'a '::CHAR(2) = 'a '::text;
 ?column?
----------
 f
(1 row)

postgres=# SELECT 'a '::CHAR(2) = 'a '::VARCHAR(2);
 ?column?
----------
 t
(1 row)

postgres=# SELECT length('a '::CHAR(2));
 length
--------
      1
(1 row)

postgres=# SELECT length('a '::VARCHAR(2));
 length
--------
      2
(1 row)
The preceding example shows that 'a '::CHAR(2) is equal to 'a '::VARCHAR(2), even though they have different lengths, which is not logical. It also shows that 'a '::CHAR(2) is not equal to 'a '::text, while 'a '::VARCHAR(2) is equal to 'a '::text. This is confusing because, mathematically, equality is transitive: if a is equal to b and b is equal to c, then a should be equal to c.
The PostgreSQL text storage size depends on several factors, namely the length of the text value and the text encoding and compression. The text data type can be considered an unlimited varchar type. The maximum text size that can be stored is 1 GB, which is the maximum column size.
For fixed length strings, the character and character varying data types consume the same amount of hard disk space. For variable length strings, the character varying data type consumes less space, because the character type pads values with spaces. The following code shows the storage consumption for fixed and variable length texts for the character and character varying data types. It simply creates two tables, populates them with fictional data using fixed and variable length strings, and finally gets the table sizes in a human readable form:
-- Create tables
CREATE TABLE char_size_test (
  size CHAR(10)
);
CREATE TABLE varchar_size_test (
  size varchar(10)
);
-- Insert data with fixed length
WITH test_data AS (
  SELECT substring(md5(random()::text), 1, 10) FROM generate_series(1, 1000000)
), char_data_insert AS (
  INSERT INTO char_size_test SELECT * FROM test_data
)
INSERT INTO varchar_size_test SELECT * FROM test_data;
-- Get the table size in human readable form
SELECT pg_size_pretty(pg_relation_size('char_size_test')) AS char_size_test,
       pg_size_pretty(pg_relation_size('varchar_size_test')) AS varchar_size_test;
-- Delete the tables data
TRUNCATE char_size_test;
TRUNCATE varchar_size_test;
-- Insert data with variable length
WITH test_data AS (
  SELECT substring(md5(random()::text), 1, (random()*10)::int) FROM generate_series(1, 1000000)
), char_data_insert AS (
  INSERT INTO char_size_test SELECT * FROM test_data
)
INSERT INTO varchar_size_test SELECT * FROM test_data;
SELECT pg_size_pretty(pg_relation_size('char_size_test')) AS char_size_test,
       pg_size_pretty(pg_relation_size('varchar_size_test')) AS varchar_size_test;
The varchar data type can be emulated by the text data type and a check constraint on the text length. For example, the following code snippets are semantically equivalent:
CREATE TABLE emulate_varchar (
  test VARCHAR(4)
);
-- semantically equivalent to
CREATE TABLE emulate_varchar (
  test TEXT,
  CONSTRAINT test_length CHECK (length(test) <= 4)
);
In PostgreSQL, there is no performance difference between the character types, so it is recommended to use the text data type. It allows the developer to react quickly to changes in business requirements. For example, a common business case is changing the text length, such as changing the length of a customer ticket number from six to eight characters due to a length limitation, or changing how certain information is stored in the database. In such a scenario, if the data type is text, this can be done by amending the check constraint without altering the table structure.
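For example, relaxing the length limit of the emulate_varchar table shown above from four to eight characters only requires swapping the check constraint; a minimal sketch:
-- Drop the old length constraint and add a more permissive one;
-- the table itself is not rewritten
ALTER TABLE emulate_varchar DROP CONSTRAINT test_length;
ALTER TABLE emulate_varchar ADD CONSTRAINT test_length CHECK (length(test) <= 8);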
The date and time data types are commonly used to describe the occurrence of events, such as birth dates. PostgreSQL supports the following date and time types:

Date/time data types
PostgreSQL stores timestamp with time zone values internally in Coordinated Universal Time (UTC); only the timestamp is stored, not the time zone itself. This explains the identical storage size for timestamp with time zone and timestamp without time zone.
There are two approaches for handling timestamps correctly. The first approach is to use timestamp without time zone and let the client side handle the time zone differences. This is useful for in-house development, for applications with only one time zone, and when the clients know the time zone differences.
The other approach is to use timestamp with time zone. The following are some of the best practices to avoid common pitfalls when using timestamptz:
- Make sure to set the default time zone for all connections. This is done by setting the time zone configuration parameter in the postgresql.conf file. Since PostgreSQL stores timestamps with time zone internally in UTC, it is a good practice to set the default connection time zone to UTC as well; UTC also helps in overcoming the potential problems caused by Daylight Saving Time (DST). A minimal sketch follows this list.
- The time zone should be specified in each CRUD operation.
- Do not mix timestamp without time zone and timestamp with time zone in the same operation; this will normally lead to wrong results due to implicit conversion.
- Do not invent your own conversion; instead, use the database server to convert between the different time zones.
- Investigate the data types of high level languages to determine which type could be used with PostgreSQL without extra handling.
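The following is a minimal sketch of the first two practices: the default connection time zone is set to UTC, and the time zone is given explicitly in the inserted literal (the table and values are hypothetical):
-- In postgresql.conf: timezone = 'UTC'; per session it can be set as follows
SET TIME ZONE 'UTC';
CREATE TABLE meeting (starts_at TIMESTAMPTZ);
-- The offset +02 is part of the literal, so the value is unambiguous
INSERT INTO meeting VALUES ('2014-08-27 10:00:00+02');
SELECT starts_at FROM meeting; -- 2014-08-27 08:00:00+00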
PostgreSQL has two important settings: timezone and datestyle. The datestyle setting has two purposes:
- Setting the display format: The datestyle setting specifies the timestamp and timestamptz rendering style.
- Interpreting ambiguous data: The datestyle setting specifies how to interpret ambiguous timestamp and timestamptz input, as shown in the sketch after this list.
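The following sketch shows how the same ambiguous literal is interpreted differently depending on the datestyle setting:
SET datestyle TO 'ISO, DMY';
SELECT '01/02/2014'::date; -- 2014-02-01 (day first)
SET datestyle TO 'ISO, MDY';
SELECT '01/02/2014'::date; -- 2014-01-02 (month first)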
The views pg_timezone_names and pg_timezone_abbrevs provide a list of the time zone names and abbreviations respectively. They also provide information about the offset from UTC and whether the time zone observes daylight saving time (DST). For example, the following code snippet sets the timezone setting to Jerusalem, and then retrieves the local date and time in Jerusalem:
postgres=> SET timezone TO 'Asia/jerusalem';
SET
postgres=> SELECT now();
              now
-------------------------------
 2014-08-27 23:49:49.611633+03
(1 row)
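The time zone metadata itself can be inspected by querying the views mentioned above, for example:
SELECT name, abbrev, utc_offset, is_dst
FROM pg_timezone_names
WHERE name = 'Asia/Jerusalem';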
The PostgreSQL AT TIME ZONE construct converts a timestamp with or without time zone to a timestamp in a specified time zone; its behavior depends on the type being converted. The following example clarifies this construct:
postgres=> SHOW timezone;
 TimeZone
----------
 UTC
(1 row)

postgres=> \x
Expanded display is on.
postgres=> SELECT now(), now()::timestamp, now() AT TIME ZONE 'CST', now()::timestamp AT TIME ZONE 'CST';
-[ RECORD 1 ]--------------------------
now      | 2014-08-27 21:00:18.36009+00
now      | 2014-08-27 21:00:18.36009
timezone | 2014-08-27 15:00:18.36009
timezone | 2014-08-28 03:00:18.36009+00
The function now() returns the current timestamp with time zone in UTC; notice that the time zone offset is +00. When casting the timestamp with time zone to timestamp, as in now()::timestamp, the time zone offset is truncated. The now() AT TIME ZONE 'CST' expression converts the UTC timestamp with time zone to a timestamp in the specified time zone, CST; since the Central Standard Time offset is -6, six hours are subtracted. The last expression, now()::timestamp AT TIME ZONE 'CST', reinterprets the timestamp as being in the CST time zone and converts it to the connection's default time zone, UTC. So, the last expression is equivalent to the following:
postgres=> SELECT ('2014-08-27 21:00:18.36009'::timestamp AT TIME ZONE 'CST' AT TIME ZONE 'UTC')::timestamptz;
-[ RECORD 1 ]--------------------------
timezone | 2014-08-28 03:00:18.36009+00
One can summarize the conversion between the timestamp with and without the time zone, as follows:
- The expression timestamp without time zone AT TIME ZONE x is interpreted as follows: the timestamp will be converted from the time zone x to the session time zone, so the result is a timestamp with time zone.
- The expression timestamp with time zone AT TIME ZONE x converts a timestamptz into a timestamp at the specified time zone x. The final result type is timestamp.
The date data type is recommended when there is no need to specify the time, such as for birth dates, holidays, absence days, and so on.
Time with time zone storage is 12 bytes: 8 bytes are used to store the time, and 4 bytes are used to store the time zone. Time without time zone consumes only 8 bytes. Conversions between time zones can be made using the AT TIME ZONE construct.
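For example, the following sketch converts a time with time zone value from a +02 offset to UTC:
SELECT '10:00:00+02'::time with time zone AT TIME ZONE 'UTC'; -- 08:00:00+00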
Finally, the interval data type is very important for handling timestamp operations as well as for describing some business cases. From the point of view of functional requirements, the interval data type can represent a period of time, such as the estimated time for completing a certain task. The result of subtracting two values of type timestamp, timestamptz, time, or time with time zone is of type interval, while the result of subtracting two date values is an integer. The following example shows timestamptz and date subtraction; notice the format used to specify intervals:
postgres=> SELECT '2014-09-01 23:30:00.000000+00'::timestamptz - '2014-09-01 22:00:00.000000+00'::timestamptz = INTERVAL '1 hour, 30 minutes';
 ?column?
----------
 t
(1 row)

postgres=> SELECT '11-10-2014'::date - '10-10-2014'::date = 1;
 ?column?
----------
 t
(1 row)
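Intervals can also be added to or subtracted from timestamps, and multiplied by numbers; a short sketch (the first result is shown with the session time zone set to UTC):
SELECT '2014-09-01 22:00:00+00'::timestamptz + INTERVAL '1 hour 30 minutes';
-- 2014-09-01 23:30:00+00
SELECT INTERVAL '1 hour' * 3; -- 03:00:00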
At this stage, one can convert the logical model of the car web portal presented in Chapter 1, Relational Databases, to a physical model. To help the developer create a table, one can follow this minimal checklist:
- What is the primary key?
- What is the default value for each column?
- What is the type of each column?
- What are the constraints on each column or set of columns?
- Are permissions set correctly on tables, sequences, and schemas?
- Are foreign keys specified with the proper actions?
- What is the data life cycle?
- What are the operations allowed on the data?
For creating the car web portal schema, the formal relational model will not be applied strictly. Also, surrogate keys will be used instead of natural keys for the following reasons:
- Natural keys can change; one can change the current e-mail address to another one. Using a surrogate key guarantees that if a row is referenced by another row, then this reference is not lost, because the surrogate key has not changed.
- Incorrect assumptions about natural keys: Let us take the e-mail address as an example. The general assumption about an e-mail address is that it identifies a person uniquely. This is not true; some e-mail service providers set policies such as e-mail expiration based on activity, and private companies might have general e-mail addresses such as contact@..., support@..., and so on. The same is applicable to phone and mobile numbers.
- Surrogate keys often use compact data types such as integers. This allows for better performance than composite natural keys.
- Surrogate keys can be used in PostgreSQL to mitigate the effect of the cross-column statistics limitation. PostgreSQL collects statistics per single column. In some cases this is not convenient, because columns might be correlated; in this case, PostgreSQL gives wrong estimations to the planner, and thus imperfect execution plans are generated.
- Surrogate keys are better supported than natural keys by object-relational mappers such as Hibernate.
Despite all these advantages, surrogate keys also have a few disadvantages:
- A surrogate key is auto-generated, and the generation might give different results in different environments. For example, after inserting the same data into a test database and a staging database, a comparison might show that the data is not identical, because the generated key values differ.
- A surrogate key is not descriptive. From the communication point of view, it is easier to refer to a person by a name instead of an auto generated number.
In the car web portal ER diagram, there is an entity with the name user. Since user is a reserved keyword, the name account will be used when creating the table. Note that to create a database object using a PostgreSQL keyword as its name, the name should be quoted. The following example shows what happens when trying to create a table named user:
postgres=# \set VERBOSITY 'verbose'
postgres=# CREATE TABLE user AS SELECT 1;
ERROR:  42601: syntax error at or near "user"
LINE 1: CREATE TABLE user AS SELECT 1;
                     ^
LOCATION:  scanner_yyerror, src\backend\parser\scan.l:1053
postgres=# CREATE TABLE "user" AS SELECT 1;
SELECT 1
postgres=#
You can find the full list of reserved words at http://www.postgresql.org/docs/current/static/sql-keywords-appendix.html.
To create a table account, one can execute the following command:
CREATE TABLE account (
  account_id SERIAL PRIMARY KEY,
  first_name TEXT NOT NULL,
  last_name TEXT NOT NULL,
  email TEXT NOT NULL UNIQUE,
  password TEXT NOT NULL,
  CHECK (first_name !~ '\s' AND last_name !~ '\s'),
  CHECK (email ~* '^\w+@\w+[.]\w+$'),
  CHECK (char_length(password) >= 8)
);
To summarize the account table:
- The account_id is defined as the primary key with the type serial; it is therefore unique and not null.
- The attributes first_name, last_name, email, and password are not allowed to have null values.
- The first_name and last_name attributes are not allowed to contain spaces.
- The password should be at least eight characters in length. In reality, the password length is handled in the business logic, since passwords should not be stored in plain text format in the database. For more information about securing the data, have a look at Chapter 8, PostgreSQL Security.
- The e-mail should match a certain regular expression. Note that the e-mail regular expression used here is really simplistic.
Behind the scenes, the following objects are created: the account_account_id_seq sequence, which feeds the account_id column; the account_pkey index, which backs the primary key; the account_email_key index, which backs the unique constraint on email; and the three check constraints.
To create the seller account, one can execute the following statement:
CREATE TABLE seller_account (
  seller_account_id SERIAL PRIMARY KEY,
  account_id INT UNIQUE NOT NULL REFERENCES account(account_id),
  number_of_advertisement INT DEFAULT 0,
  user_ranking float,
  total_rank float
);
As we can see, the seller account has a one-to-one relationship with the account. This is enforced by the NOT NULL and UNIQUE constraints on account_id. Alternatively, one can model the seller account by making account_id the primary key, as follows:
CREATE TABLE seller_account (
  account_id INT PRIMARY KEY REFERENCES account(account_id)
  ...
);
The first design is more flexible and less ambiguous. First of all, the requirement might change, and the user account and the seller account relation might change from one-to-one to one-to-many. For example, the user concept might be generalized to handle companies where the company has several seller accounts.
Also, if a table references both the account and seller_account tables using a common column name such as account_id, then it will be difficult to distinguish which table is being referenced.