xlrd - Documentation

What is xlrd?

xlrd is a pure Python library for reading data from Microsoft Excel (®) spreadsheet files (.xls). It supports both older .xls files (using the BIFF 8-12 formats) and does not support newer .xlsx, .xlsm, or .xltx files (which use the OOXML format). xlrd is designed to be reliable and efficient, providing a straightforward interface for accessing spreadsheet data. It’s particularly useful for tasks such as data extraction, analysis, and migration from older Excel files. It focuses solely on reading data; it does not offer any capabilities for writing or modifying Excel files.

Installation and Setup

xlrd is readily available through the Python Package Index (PyPI). The easiest way to install it is using pip:

pip install xlrd

This command will download and install xlrd and its dependencies. No additional configuration is typically required. If you encounter issues, ensure that you have a working Python installation and that pip is correctly configured. You may need administrator or root privileges for the installation to succeed.

Basic Usage Example

This example demonstrates how to open a workbook, access a specific sheet, and read cell values:

import xlrd

# Open the workbook
workbook = xlrd.open_workbook('example.xls')

# Get the first sheet (index 0)
sheet = workbook.sheet_by_index(0)  # or sheet = workbook.sheet_by_name('Sheet1')

# Get the number of rows and columns
nrows = sheet.nrows
ncols = sheet.ncols

# Print the values of the first few cells
for row_idx in range(min(nrows, 5)):  # Limit to 5 rows for brevity
    for col_idx in range(min(ncols, 5)):  # Limit to 5 columns for brevity
        cell_value = sheet.cell_value(row_idx, col_idx)
        print(f"Cell ({row_idx}, {col_idx}): {cell_value}")

workbook.release_resources() #Good practice to release resources.
del workbook # Good practice to release resources.

Remember to replace 'example.xls' with the actual path to your Excel file. This code opens the workbook, selects the first sheet, and iterates through a small portion of the data to print the cell values. More sophisticated data access methods are available, as described in the full xlrd documentation.

Supported Excel Versions

xlrd supports Microsoft Excel versions that use the Binary File Format (BIFF) versions 8, 7, 5, 4, 3, 2, and 1. It does not support the newer .xlsx, .xlsm, or .xltx file formats based on the Open XML Spreadsheet format. If you need to work with these newer file types, consider using libraries like openpyxl.

Opening and Reading Workbooks

Opening an Excel File

The primary function for interacting with xlrd is xlrd.open_workbook(). This function takes the file path to your Excel file as its main argument. Several optional arguments allow for customization:

import xlrd

# Basic usage:
workbook = xlrd.open_workbook('my_excel_file.xls')

# Specifying encoding (if necessary):
workbook = xlrd.open_workbook('my_excel_file.xls', encoding_override="latin-1")

# Opening a file-like object:
with open('my_excel_file.xls', 'rb') as f:
    workbook = xlrd.open_workbook(file_contents=f.read())

#  Using on_demand to load sheets lazily:
workbook = xlrd.open_workbook('my_excel_file.xls', on_demand=True)

The encoding_override argument is crucial if your Excel file uses a character encoding other than the default (usually UTF-8). The file_contents argument lets you read from an in-memory buffer instead of a file path. on_demand=True is useful for very large files where you might not need to load all sheets immediately; sheets will be loaded only when accessed. Refer to the full xlrd documentation for a complete list of all possible arguments. Note that failure to specify the correct encoding may lead to garbled characters in your output.

Accessing Workbook Properties

Once a workbook is open, you can access various properties:

import xlrd

workbook = xlrd.open_workbook('my_excel_file.xls')

# Get the number of sheets:
num_sheets = workbook.nsheets

# Get a sheet by index (0-based):
sheet1 = workbook.sheet_by_index(0)

# Get a sheet by name:
sheet2 = workbook.sheet_by_name('Sheet2') #raises error if sheet is not found.

# Get a list of sheet names:
sheet_names = workbook.sheet_names()

# Get workbook information (e.g., author, creation date)

# This requires checking if they are available in your workbook version.
try:
    info = workbook.info()
    print(f"Author: {info.author}")
    print(f"Creation Date: {info.create_ascii}")

except AttributeError:
    print ("This workbook version does not contain author or create_ascii information.")

workbook.release_resources()
del workbook

Error handling is critical, as not all workbooks will contain all properties.

Handling Errors

Various errors can occur when working with xlrd. The most common include:

It’s essential to use try-except blocks to handle these exceptions gracefully:

import xlrd

try:
    workbook = xlrd.open_workbook('my_excel_file.xls')
    sheet = workbook.sheet_by_name('Sheet1')
    # ... process the sheet ...
except xlrd.XLRDError as e:
    print(f"Error reading Excel file: {e}")
except xlrd.SheetNotFoundError:
    print("Sheet 'Sheet1' not found in the workbook.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
finally:
    if 'workbook' in locals() and workbook:
        workbook.release_resources()
        del workbook

Always include proper error handling to prevent your program from crashing due to unexpected file formats or missing sheets.

Closing Workbooks

While Python’s garbage collection will eventually reclaim resources, it’s good practice to explicitly close workbooks when finished, especially when dealing with large files:

import xlrd

workbook = xlrd.open_workbook('my_excel_file.xls')
# ... process the workbook ...
workbook.release_resources()
del workbook

Calling workbook.release_resources() releases the file handles and other resources held by the workbook object, improving efficiency and reducing the risk of resource leaks. del workbook helps ensure the object is removed from memory. This is a key best practice.

Working with Worksheets

Accessing Worksheets by Name and Index

After opening a workbook using xlrd.open_workbook(), you access individual worksheets using either their index (starting from 0) or their name. Both methods are demonstrated below:

import xlrd

workbook = xlrd.open_workbook('my_excel_file.xls')

# Accessing by index:
sheet_by_index = workbook.sheet_by_index(0)  # Gets the first sheet

# Accessing by name:
try:
    sheet_by_name = workbook.sheet_by_name('Sheet1') # Gets the sheet named 'Sheet1'
except xlrd.XLRDError as e:
    print(f"Error: {e}") #Handles the case where the sheet does not exist.


#It is good practice to check if sheets exist before accessing.
sheet_names = workbook.sheet_names()
if "Sheet1" in sheet_names:
    sheet = workbook.sheet_by_name("Sheet1")
    #Do something with the sheet.
else:
    print("Sheet named 'Sheet1' not found.")


workbook.release_resources()
del workbook

Remember that sheet_by_name() will raise a xlrd.XLRDError if the specified sheet name doesn’t exist. Using sheet_names() to check for existence is a safer approach.

Retrieving Worksheet Dimensions

Once you have a worksheet object, you can determine its dimensions using the nrows and ncols attributes:

import xlrd

workbook = xlrd.open_workbook('my_excel_file.xls')
sheet = workbook.sheet_by_index(0)

num_rows = sheet.nrows
num_cols = sheet.ncols

print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_cols}")

workbook.release_resources()
del workbook

These attributes provide the total number of rows and columns in the worksheet, including any empty rows or columns at the end.

Iterating through Rows and Cells

The most common way to process data in a worksheet is to iterate through its rows and cells. Here’s how you can do it:

import xlrd

workbook = xlrd.open_workbook('my_excel_file.xls')
sheet = workbook.sheet_by_index(0)

for row_index in range(sheet.nrows):
    row = sheet.row(row_index) # Get a list of cell objects in current row.
    for cell_index, cell_obj in enumerate(row):
        cell_value = cell_obj.value
        cell_type = cell_obj.ctype #  Get cell type (xlrd.XL_CELL_NUMBER, xlrd.XL_CELL_TEXT, etc.)

        print(f"Row {row_index + 1}, Column {cell_index + 1}: Value = {cell_value}, Type = {cell_type}")

workbook.release_resources()
del workbook

This code iterates through each row and then through each cell within that row. cell_obj.value provides the cell’s value, and cell_obj.ctype provides its type (e.g., number, text, date, boolean, error, blank). Understanding cell types is crucial for handling data correctly, as you might need to perform type conversions for certain operations. Remember that row and column indices are zero-based. Alternative methods for accessing individual cells directly using sheet.cell_value(rowx, colx) are also available for more efficient random access.

Reading Cell Values

Accessing Cell Values

The core of using xlrd is retrieving the values from individual cells. There are several ways to achieve this:

Method 1: Using sheet.cell_value(rowx, colx): This is the most direct method. rowx and colx are the row and column indices (both zero-based).

import xlrd

workbook = xlrd.open_workbook('example.xls')
sheet = workbook.sheet_by_index(0)

cell_value = sheet.cell_value(0, 0)  # Value of cell A1
print(f"Cell A1: {cell_value}")

cell_value = sheet.cell_value(2, 1)  # Value of cell B3
print(f"Cell B3: {cell_value}")

workbook.release_resources()
del workbook

Method 2: Iterating through rows: This is useful for processing entire rows or sheets.

import xlrd

workbook = xlrd.open_workbook('example.xls')
sheet = workbook.sheet_by_index(0)

for row_index in range(sheet.nrows):
    for col_index in range(sheet.ncols):
        cell_value = sheet.cell_value(row_index, col_index)
        # ... process cell_value ...
workbook.release_resources()
del workbook

Method 3: Accessing cells via sheet.row(rowx): This returns a list of Cell objects for a given row.

import xlrd

workbook = xlrd.open_workbook('example.xls')
sheet = workbook.sheet_by_index(0)

row = sheet.row(0) #Get the first row.
for cell in row:
    print(cell.value)

workbook.release_resources()
del workbook

Data Types and Handling

xlrd represents cell values using several data types. The cell.ctype attribute indicates the cell type:

You should always check cell.ctype before using the cell.value to avoid type errors.

Dealing with Different Cell Formats

Excel allows for various cell formats (number formats, date formats, etc.). xlrd provides access to these formats through the xf_index attribute of the Cell object and the workbook’s xf_list. However, cell.value gives you the value as Python understands it, so direct formatting is generally unnecessary unless you need to format the output string differently than Python’s default behavior. To get a cell’s formatting information, use sheet.cell_xf_index(rowx, colx) to get the index into workbook.xf_list, then examine the relevant XF object.

Handling Errors in Cell Access

Accessing cells outside the bounds of the sheet or attempting to access invalid cells will raise IndexError. Always check the sheet’s dimensions (sheet.nrows, sheet.ncols) before accessing cells to avoid these errors.

import xlrd

try:
    workbook = xlrd.open_workbook('example.xls')
    sheet = workbook.sheet_by_index(0)
    value = sheet.cell_value(sheet.nrows, 0) #Accessing a row beyond the limit.
except IndexError:
    print("IndexError: Tried to access a cell outside the sheet bounds.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
finally:
    if 'workbook' in locals() and workbook:
        workbook.release_resources()
        del workbook

Working with Formulas and Dates

xlrd does not evaluate formulas; it returns the calculated result of formulas as they appear in the Excel file, or an error indicator if there was an error in evaluating the formula in the source file. For date values (xlrd.XL_CELL_DATE), the cell.value is a floating-point number representing the number of days since the Excel epoch (often December 30, 1899, or January 1, 1900, depending on the Excel version and settings). You need to convert this using the xldate_as_tuple() function from xlrd:

import xlrd
import datetime

workbook = xlrd.open_workbook('example.xls')
sheet = workbook.sheet_by_index(0)
cell = sheet.cell(0, 0)

if cell.ctype == xlrd.XL_CELL_DATE:
    date_tuple = xlrd.xldate_as_tuple(cell.value, workbook.datemode)
    date_value = datetime.datetime(*date_tuple)
    print(f"Date: {date_value}")

workbook.release_resources()
del workbook

Remember to handle cases where the cell does not contain a date value to avoid errors. workbook.datemode indicates whether the workbook is using the 1900 or 1904 date system.

Advanced Usage

Working with Styles

xlrd provides access to cell styles, allowing you to retrieve formatting information beyond the basic cell type. Each cell has an xf_index attribute pointing to an entry in the workbook’s xf_list. This list contains XF objects, which represent extended formats.

import xlrd

workbook = xlrd.open_workbook('example.xls')
sheet = workbook.sheet_by_index(0)

cell = sheet.cell(0, 0)
xf_index = cell.xf_index
xf = workbook.xf_list[xf_index]

# Access style properties (these may vary depending on Excel version and style):
# font index, alignment, border, pattern, protection, etc.
print(f"Font index: {xf.font_index}")
# ...access other properties of xf...

workbook.release_resources()
del workbook

You will need to refer to the Excel file format specification to fully understand the meaning of each attribute within the XF object. The information available varies slightly depending on the Excel version used to create the spreadsheet.

Accessing Cell Formatting Information

While cell.value provides the cell’s content, understanding formatting is crucial for accurate data interpretation. You can access formatting details through the XF object (as shown in the previous section) and also by using the sheet.cell_xf_index(rowx, colx) method which gives you the index of the Extended Format record associated with the cell at that location. This index then lets you look up the formatting details in workbook.xf_list. The formatting information includes font properties, number formats, alignment, borders, and more. Decoding number formats usually requires additional external resources (not provided directly by xlrd) or careful study of how Excel stores formatting information.

Using Named Ranges

xlrd supports accessing named ranges defined within an Excel workbook. Use workbook.name_map which is a dictionary mapping name strings to a tuple of sheet index and cell range.

import xlrd

workbook = xlrd.open_workbook('example.xls')
named_ranges = workbook.name_map

for name, (sheet_index, (row_start, col_start, row_end, col_end)) in named_ranges.items():
    print(f"Named range: {name}, Sheet index: {sheet_index}, Range: ({row_start}, {col_start}, {row_end}, {col_end})")

sheet = workbook.sheet_by_index(sheet_index)
# Access cells within the named range using row_start, col_start, row_end, col_end


workbook.release_resources()
del workbook

Remember that named ranges are not always present, so you should handle the case where workbook.name_map might be empty.

xlrd can extract hyperlink information from cells that contain them. Each cell object has a hyperlink attribute. If a cell contains a hyperlink, this attribute will be a tuple of (URL, label, target sheet, target cell). If there’s no hyperlink, this attribute will be None.

import xlrd

workbook = xlrd.open_workbook('example.xls')
sheet = workbook.sheet_by_index(0)

for row_index in range(sheet.nrows):
    for col_index in range(sheet.ncols):
        cell = sheet.cell(row_index, col_index)
        if cell.hyperlink:
            url, label, target_sheet, target_cell = cell.hyperlink
            print(f"Hyperlink found at ({row_index}, {col_index}): URL = {url}, Label = {label}")

workbook.release_resources()
del workbook

Working with Charts (if supported)

xlrd’s support for charts is extremely limited or nonexistent in many versions. Older versions might offer minimal access to chart information, such as the existence of a chart within a sheet, but the details are not typically well documented or consistently available across different spreadsheet file versions. For robust chart processing you would need a different library. Attempting to access chart data might not produce any meaningful results, and the specific attributes available and their reliability would need careful testing with different Excel files. Directly accessing and parsing the underlying BIFF structures may be required, but this is complex and not recommended unless you have an exceptional need and are deeply familiar with the BIFF format.

Best Practices and Performance

Efficiently Reading Large Files

When working with large Excel files, optimizing your code for performance is crucial. Here are some key strategies:

Memory Management

import xlrd

with open('my_excel_file.xls', 'rb') as f:
    workbook = xlrd.open_workbook(file_contents=f.read())
    # ... process the workbook ...

#The file will be automatically closed and resources released even if exceptions are raised.

Error Handling Strategies

Robust error handling is essential when working with xlrd, especially for production systems. Here’s how to enhance error handling:

By following these best practices and error handling strategies, you can make your xlrd applications more robust, efficient, and reliable when dealing with a wide range of Excel files, including large or complex ones.

Troubleshooting and Common Issues

Error Messages and Solutions

xlrd provides informative error messages, but understanding them requires familiarity with the library and potential issues in Excel files. Here are some common error messages and solutions:

Debugging Techniques

Compatibility Issues

File Corruption Handling

Remember to thoroughly test your code with various Excel files (including ones that might be slightly corrupted) to ensure robustness. Implementing these troubleshooting strategies will make your applications more robust and resilient to errors encountered while reading Excel files.

Appendix: xlrd API Reference

This section provides a concise overview of key classes and methods in the xlrd API. For complete and up-to-date information, refer to the official xlrd documentation.

open_workbook() method

The open_workbook() function is the entry point for working with xlrd. It opens an Excel (.xls) file and returns a Book object.

Signature:

xlrd.open_workbook(filename=None, logfile=None, verbosity=0, use_mmap=True, file_contents=None, encoding_override=None, formatting_info=False, ragged_rows=False, on_demand=False, ignore_workbook_corruption=False)

Parameters:

Returns:

A Book object representing the opened workbook.

Raises:

xlrd.XLRDError if the file cannot be opened or parsed. Other exceptions may be raised depending on the specific problem encountered.

Book Class

The Book class represents an Excel workbook. It provides methods to access sheets, workbook properties, and other information.

Key Attributes and Methods:

Sheet Class

The Sheet class represents a worksheet within a workbook.

Key Attributes and Methods:

Cell Class

The Cell class represents a single cell within a worksheet.

Key Attributes:

This API reference provides a starting point. The full xlrd documentation should be consulted for a complete and detailed description of all classes, methods, and attributes. Remember that the specifics of certain attributes (like those within XF objects) may depend on the version of Excel that created the file.