psycopg2 - Documentation

What is psycopg2?

psycopg2 is a popular PostgreSQL adapter for Python. It’s a fully compliant, mature, and feature-rich library that allows Python programs to interact with PostgreSQL databases. It provides a simple and efficient way to execute SQL queries, fetch results, and manage database connections. psycopg2 is known for its performance, stability, and extensive support for PostgreSQL features, including advanced features like transactions, prepared statements, and notifications.

Why use psycopg2?

Several factors make psycopg2 a preferred choice for PostgreSQL interaction in Python:

Installation and Setup

psycopg2 is typically installed using pip, the Python package installer:

pip install psycopg2-binary

The psycopg2-binary package is recommended as it includes pre-built wheels for faster installation. If you encounter issues or require compiling from source, you might need to install the psycopg2 package instead (this requires development tools including PostgreSQL’s development libraries). Refer to the psycopg2 documentation for platform-specific instructions if needed. Ensure you have the correct PostgreSQL client libraries installed for your operating system.

System Requirements

The core requirement is Python 3.7 or higher. To use psycopg2, you’ll also need to have PostgreSQL installed on your system, along with the appropriate client libraries. These libraries depend on your operating system:

Connecting to PostgreSQL

Connecting to your PostgreSQL database using psycopg2 involves creating a connection object using the connect() function:

import psycopg2

try:
    conn = psycopg2.connect(
        host="your_db_host",
        database="your_db_name",
        user="your_db_user",
        password="your_db_password",
        port="your_db_port"  # Optional, defaults to 5432
    )
    print("Connected to PostgreSQL successfully!")

    # ... further database operations ...

    conn.close()

except psycopg2.Error as e:
    print(f"PostgreSQL error: {e}")

Replace the placeholders (your_db_host, your_db_name, etc.) with your actual database credentials. The port parameter is optional and defaults to 5432, the standard PostgreSQL port. Error handling is crucial; the try...except block catches potential psycopg2.Error exceptions during the connection process. Always close the connection using conn.close() when finished.

Basic Usage

Connecting to a Database

Connecting to a PostgreSQL database is the first step in any psycopg2 application. This is done using the psycopg2.connect() function, which takes several keyword arguments specifying the database connection parameters:

import psycopg2

try:
    conn = psycopg2.connect(
        host="your_db_host",
        database="your_db_name",
        user="your_db_user",
        password="your_db_password",
        port=5432  # Optional, defaults to 5432
    )
    print("Connected to PostgreSQL successfully!")
except psycopg2.Error as e:
    print(f"Error connecting to PostgreSQL: {e}")
    exit(1) #Exit with error code

Replace the placeholder values with your database server’s host, database name, username, and password. The port argument specifies the port number the database server listens on; it defaults to 5432. Robust error handling is essential; the try...except block catches potential connection errors.

Creating a Cursor

Before executing SQL queries, you need to create a cursor object. A cursor acts as an interface to the database, allowing you to execute queries and fetch results. You create a cursor using the cursor() method of the connection object:

cur = conn.cursor()

This creates a standard cursor. For enhanced performance with multiple queries, consider using conn.cursor('server_side') for server-side cursors. See the psycopg2 documentation for details on cursor types.

Executing Queries

To execute an SQL query, use the execute() method of the cursor object.

cur.execute("SELECT version()")

This executes the SELECT version() query. For queries with parameters (to prevent SQL injection vulnerabilities), use parameterized queries:

cur.execute("SELECT * FROM users WHERE username = %s", ("johndoe",))

Note the use of a tuple, even for a single parameter.

Fetching Data

After executing a query that returns data, you can fetch the results using methods like fetchone(), fetchmany(), or fetchall():

#Fetch one row
row = cur.fetchone()
print(row)

#Fetch multiple rows (e.g., 100 rows at a time)
rows = cur.fetchmany(100)
for row in rows:
    print(row)

#Fetch all remaining rows
rows = cur.fetchall()
for row in rows:
    print(row)

Remember that fetchone() returns a single row (or None if no more rows exist), fetchmany() returns a list of rows (up to the specified number), and fetchall() retrieves all remaining rows at once.

Handling Errors

psycopg2 raises exceptions when errors occur. Use try...except blocks to gracefully handle these errors:

try:
    cur.execute("INVALID SQL QUERY")
except psycopg2.Error as e:
    print(f"Database error: {e}")
    conn.rollback() # Rollback transaction if necessary

The conn.rollback() method is important if an error occurs within a transaction, ensuring data consistency. Inspect the exception object (e) for details about the error.

Closing Connections and Cursors

It’s crucial to close both the cursor and the connection when you’re finished with them to release resources:

cur.close()
conn.close()

Failure to close connections can lead to resource leaks and database connection problems. The with statement provides a convenient way to ensure proper closure even if exceptions occur:

with psycopg2.connect(...) as conn:
    with conn.cursor() as cur:
        # Database operations here...
        # conn and cur are automatically closed when exiting the 'with' block

This approach guarantees that the connection and cursor are always closed, regardless of whether exceptions are raised during database operations.

Working with Data

Inserting Data

Inserting data into a PostgreSQL table involves using an INSERT INTO statement. Parameterization is crucial to prevent SQL injection vulnerabilities:

import psycopg2

with psycopg2.connect(...) as conn:
    with conn.cursor() as cur:
        data = ("Jane Doe", 30, "jane.doe@example.com")
        cur.execute("INSERT INTO users (name, age, email) VALUES (%s, %s, %s)", data)
        conn.commit() # Commit changes to the database

The conn.commit() method saves the changes to the database. Without commit(), the insertion is only temporary within the current transaction. If an error occurs, conn.rollback() will undo any changes.

Updating Data

Updating existing data uses the UPDATE statement:

with psycopg2.connect(...) as conn:
    with conn.cursor() as cur:
        cur.execute("UPDATE users SET age = %s WHERE email = %s", (35, "jane.doe@example.com"))
        conn.commit()

This updates the age of the user with the email address “jane.doe@example.com”. Always include a WHERE clause to specify which rows to update to prevent accidental modifications of unintended data.

Deleting Data

Deleting data uses the DELETE statement:

with psycopg2.connect(...) as conn:
    with conn.cursor() as cur:
        cur.execute("DELETE FROM users WHERE email = %s", ("jane.doe@example.com",))
        conn.commit()

This deletes the user with the specified email address. Again, a WHERE clause is essential for precise deletion.

Retrieving Data with Different Methods

psycopg2 offers various ways to retrieve data, depending on your needs:

Example using fetchmany() for efficient processing of a large table:

with psycopg2.connect(...) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM large_table")
        while True:
            rows = cur.fetchmany(1000)  # Fetch 1000 rows at a time
            if not rows:
                break  # Exit the loop when no more rows are available
            for row in rows:
                # Process each row
                print(row)

Data Types and Mapping

psycopg2 automatically handles type mapping between Python and PostgreSQL data types. Generally, it maps Python types to their PostgreSQL equivalents. For example, Python int maps to PostgreSQL INTEGER, Python str maps to PostgreSQL TEXT, etc. Consult the psycopg2 documentation for a complete mapping. For custom mappings or complex types, you might need to use adapt() and register_adapter() functions.

Handling NULL values

PostgreSQL’s NULL values are represented as None in Python when fetched using psycopg2.

with psycopg2.connect(...) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT name, age FROM users WHERE id = 1")
        row = cur.fetchone()
        name = row[0]
        age = row[1] #age will be None if the age is NULL in database

        if age is None:
            print("Age is not specified for this user.")
        else:
            print(f"User's age: {age}")

Always check for None when dealing with database fields that can potentially have NULL values to avoid errors. Remember to handle None appropriately in your application logic.

Advanced Techniques

Transactions

Transactions ensure data integrity by grouping multiple database operations into a single unit of work. If all operations succeed, the changes are committed; otherwise, they’re rolled back, maintaining consistency. Psycopg2 supports transactions using the conn.commit() and conn.rollback() methods:

with psycopg2.connect(...) as conn:
    cur = conn.cursor()
    try:
        cur.execute("BEGIN") # Start a transaction

        cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
        cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")

        conn.commit() # Commit the transaction if successful
    except psycopg2.Error as e:
        conn.rollback() # Rollback if any error occurs
        print(f"Transaction failed: {e}")
    finally:
        cur.close()

The BEGIN statement starts a transaction. commit() saves changes; rollback() undoes them. The try...except...finally block ensures that the cursor is always closed, regardless of success or failure. AUTOCOMMIT is off by default in psycopg2; you must explicitly manage transactions.

Prepared Statements

Prepared statements enhance performance and security by pre-compiling SQL queries. They prevent SQL injection and reduce parsing overhead for repeated queries with varying parameters:

with psycopg2.connect(...) as conn:
    cur = conn.cursor()
    cur.prepare("select_user", "SELECT * FROM users WHERE id = %s")
    cur.execute("select_user", (1,))
    user = cur.fetchone()
    print(user)

Here, “select_user” is the prepared statement name, and %s acts as a placeholder. psycopg2 handles parameter binding securely and efficiently. Consider prepared statements for queries executed repeatedly with different inputs.

Working with Binary Data

psycopg2 efficiently handles binary data (e.g., images, files). Use the psycopg2.Binary adapter to send binary data to the database:

with psycopg2.connect(...) as conn:
    cur = conn.cursor()
    with open("image.jpg", "rb") as f:
        binary_data = f.read()
    cur.execute("INSERT INTO images (image_data) VALUES (%s)", (psycopg2.Binary(binary_data),))
    conn.commit()

psycopg2.Binary() wraps the binary data, ensuring correct transmission to PostgreSQL. Retrieving binary data involves reading it from the result set and converting it back as needed.

Using COPY

The COPY command provides a highly efficient way to import/export large amounts of data to/from PostgreSQL. Psycopg2 offers support for it through the copy_expert() method:

with psycopg2.connect(...) as conn:
    cur = conn.cursor()
    with open("data.csv", "r") as f:
        cur.copy_expert("COPY my_table FROM STDIN WITH CSV HEADER DELIMITER AS ','", f)
    conn.commit()

This copies data from data.csv into my_table. copy_expert allows fine-grained control over the COPY command options. Refer to PostgreSQL documentation for options like DELIMITER, HEADER, etc.

Asynchronous Operations

For high-throughput applications requiring concurrent database access, consider using asyncpg, a fully asynchronous PostgreSQL driver built on top of asyncio. While not directly part of psycopg2, asyncpg provides asynchronous operations that avoid blocking the event loop, leading to improved performance in asynchronous environments.

Connection Pooling

Connection pooling is crucial for efficient database access in multi-threaded or multi-process applications. It reduces the overhead of establishing new connections for each request by reusing connections from a pool. Psycopg2 itself doesn’t include a built-in connection pool, but you can use third-party libraries like psycopg2-pool or sqlalchemy’s connection pooling features for managing a pool of connections. This can significantly improve performance, especially in high-concurrency scenarios.

Error Handling and Debugging

Common Errors

Several common errors can arise when using psycopg2. Understanding these helps in quicker debugging:

Debugging Techniques

Effective debugging strategies for psycopg2 applications include:

Exception Handling

Robust exception handling is vital in psycopg2 applications. Use try...except blocks to gracefully handle errors and prevent application crashes:

try:
    with psycopg2.connect(...) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM non_existent_table")  # Potential error
            rows = cur.fetchall()
except psycopg2.ProgrammingError as e:
    print(f"SQL error: {e}")
except psycopg2.OperationalError as e:
    print(f"Connection error: {e}")
except Exception as e: #Catch any other exceptions
    print(f"An unexpected error occurred: {e}")
finally:
  # Always close the connection and cursor.
  if 'conn' in locals() and conn: conn.close()
  if 'cur' in locals() and cur: cur.close()

This example demonstrates handling specific psycopg2 exceptions and a general Exception to catch other potential issues. The finally block guarantees resource cleanup.

Logging

The logging module provides a structured way to record events. This is especially valuable for tracking errors, debugging, and monitoring in production environments:

import logging

logging.basicConfig(filename='app.log', level=logging.ERROR, 
                    format='%(asctime)s - %(levelname)s - %(message)s')

try:
    # ... your database interaction code ...
except psycopg2.Error as e:
    logging.error(f"Database error: {e}")

This configures logging to write error messages to app.log. Adjust the level and format parameters as needed. Logging facilitates post-mortem analysis and troubleshooting.

Security Considerations

Password Management

Never hardcode database passwords directly into your Python code. This poses a significant security risk. Use environment variables or a secure configuration management system to store and retrieve passwords. Here’s how to use environment variables:

import os
import psycopg2

db_password = os.environ.get("POSTGRES_PASSWORD")

if db_password is None:
    print("POSTGRES_PASSWORD environment variable not set!")
    exit(1)

try:
    conn = psycopg2.connect(
        host="your_db_host",
        database="your_db_name",
        user="your_db_user",
        password=db_password,
        port=5432
    )
    # ... your code ...
except psycopg2.Error as e:
    print(f"Database connection error: {e}")

This example retrieves the password from the POSTGRES_PASSWORD environment variable. Ensure the environment variable is properly set before running the application. Consider using a dedicated secrets management solution for enhanced security in production environments.

SQL Injection Prevention

SQL injection is a serious vulnerability. Always use parameterized queries or prepared statements to prevent attackers from injecting malicious SQL code into your queries:

Incorrect (vulnerable to SQL injection):

username = input("Enter username: ")
query = f"SELECT * FROM users WHERE username = '{username}'"  # VERY DANGEROUS!
cur.execute(query)

Correct (using parameterized queries):

username = input("Enter username: ")
cur.execute("SELECT * FROM users WHERE username = %s", (username,))

Parameterized queries treat user inputs as data, not as executable code. Psycopg2 handles the proper escaping and sanitization, eliminating the risk of SQL injection.

Secure Connection Configuration

Secure database connection configuration involves several aspects:

conn = psycopg2.connect(
    ...,
    sslmode="require", #Enforce SSL connection
    sslcert='client-cert.pem', # Path to client certificate
    sslkey='client-key.pem', # Path to client key
    sslrootcert='server-ca.pem' # Path to server CA certificate
)

By following these security best practices, you can significantly reduce the risk of vulnerabilities in your psycopg2 applications. Remember that security is a continuous process, requiring regular review and updates.

Extending psycopg2

Extending the library

While psycopg2 is a feature-rich library, there might be situations where you need to extend its functionality. This can involve adding support for custom data types, creating new connection features, or integrating with other systems. Extending psycopg2 typically requires a deeper understanding of its internal workings and involves modifying its source code or creating wrapper functions. The psycopg2 source code is available on GitHub, and the documentation includes information on contributing to the project. Note that extending the core library requires careful testing to ensure compatibility and stability. It’s often simpler to achieve desired functionality through custom adapters or wrapper functions than directly modifying psycopg2’s core code.

Creating custom adapters

psycopg2 provides mechanisms for handling custom data types. If you’re working with a data type not natively supported by psycopg2, you can create a custom adapter. This adapter defines how the custom type is converted between its Python representation and its PostgreSQL equivalent. This is often the preferred method for extending data handling within psycopg2 without directly modifying its source.

Example of creating a custom adapter for a Point object:

import psycopg2
from psycopg2.extensions import register_adapter, AsIs

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return f"({self.x}, {self.y})"


def adapt_point(point):
    return AsIs(f"POINT({point.x} {point.y})")

register_adapter(Point, adapt_point)

# Usage:
with psycopg2.connect(...) as conn:
    cur = conn.cursor()
    point = Point(10, 20)
    cur.execute("INSERT INTO points (location) VALUES (%s)", (point,))
    conn.commit()

This code defines a Point class and an adapt_point function that converts it into the PostgreSQL POINT type. register_adapter registers the adapter with psycopg2.

Using third-party libraries

Several third-party libraries enhance or extend the capabilities of psycopg2. These libraries often provide additional features, such as connection pooling, ORM integration, or improved asynchronous operations:

Using third-party libraries often simplifies development and improves application performance. Choose libraries that align with your application’s needs and architectural considerations. Always carefully review the documentation and security implications of external libraries before integrating them into your projects.

Appendix

Glossary of Terms

List of Exceptions

psycopg2 raises various exceptions to indicate errors. The most common include:

For a complete list and detailed descriptions of exceptions, refer to the official psycopg2 documentation.

Frequently Asked Questions (FAQ)

This FAQ covers common questions. For more detailed information, consult the psycopg2 documentation.