reader - Data readers for formats which are no longer supported

Purpose:

This module contains purpose-built data readers for formats which are no longer supported, namely:

  • .xls: pre-Excel 5.0/95 Workbook

Platform:

Linux/Windows | Python 3.6+

Developer:

J Berendt

Email:

development@s3dev.uk

Comments:

n/a

Example:

Example for reading an old-style .xls (pre-Excel 5.0/95) Workbook into a DataFrame:

>>> from utils4.reader import reader
>>> df = reader.read_xls('/path/to/file.xls')

class reader.Reader

Class wrapper for various data reading methods.

For details on each reader, refer to the docstring for that reader.

read_xls(filepath: str, encoding: str = None, sheet_index: int = 0, skiprows: int = 0, skipcols: int = 0, chopcols: int = 0, date_formats: dict = None, errors: str = 'coerce', fill_date_errors: bool = False) → pandas.DataFrame

Read a pre-Excel 5.0/95 .XLS file into a DataFrame.

This function is designed to deal with old XLS files which the pandas.read_excel function does not support.

Parameters:
  • filepath (str) – Full path to the file to be read.

  • encoding (str, optional) – Encoding used to read the XLS file. Defaults to None.

  • sheet_index (int, optional) – Index of the sheet to be read, zero-based. Defaults to 0.

  • skiprows (int, optional) – Number of rows to skip (from the beginning of the file). Defaults to 0.

  • skipcols (int, optional) – Number of columns to skip (from the left). Defaults to 0.

  • chopcols (int, optional) – Number of columns to skip/chop (from the right). Defaults to 0.

  • date_formats (dict, optional) – Dictionary of {col_name: strftime_mask}. Defaults to None.

  • errors (str, optional) – Method used by read_csv() to resolve date parsing errors. Defaults to 'coerce'.

  • fill_date_errors (bool, optional) – Replace dates which were coerced to NaT with '1900-01-01'. Defaults to False.
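
The trimming and date parameters above can be illustrated with a small standard-library sketch. This is not the module's implementation: the helper names trim_grid and parse_date are hypothetical, None stands in for pandas' NaT, and the exact point at which the reader applies each parameter is an assumption.

```python
from datetime import datetime


def trim_grid(rows, skiprows=0, skipcols=0, chopcols=0):
    """Drop leading rows, leading columns and trailing columns (hypothetical helper)."""
    trimmed = []
    for row in rows[skiprows:]:
        end = len(row) - chopcols  # chopcols counts columns from the right
        trimmed.append(row[skipcols:end])
    return trimmed


def parse_date(value, mask, fill_date_errors=False):
    """Parse a value with a strftime mask, coercing failures (hypothetical helper)."""
    try:
        return datetime.strptime(value, mask)
    except (TypeError, ValueError):
        # Analogue of errors='coerce': no exception is raised, the value
        # becomes missing instead (None here, NaT in pandas).
        return datetime(1900, 1, 1) if fill_date_errors else None


grid = [
    ['report', '', '', ''],
    ['id', 'date', 'value', 'junk'],
    ['1', '2020-01-31', '10', 'x'],
    ['2', 'n/a', '20', 'y'],
]
# Skip the title row and chop the right-most column.
table = trim_grid(grid, skiprows=1, chopcols=1)
# A date_formats-style mapping of {col_name: strftime_mask}.
date_formats = {'date': '%Y-%m-%d'}
dates = [parse_date(row[1], date_formats['date']) for row in table[1:]]
```

Here the second date cannot be parsed, so it is coerced to a missing value; passing fill_date_errors=True to parse_date would return 1900-01-01 in its place.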

Logic:

The passed XLS file is opened and parsed by the xlrd library, then read into an in-memory stream buffer, which is passed to the pandas.read_csv function for conversion to a DataFrame.
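
The flow described above might be sketched roughly as follows. The function names rows_to_csv_buffer and sheet_to_dataframe are hypothetical, and the use of CSV text as the intermediate format is an assumption inferred from the read_csv call; the module's actual code may differ.

```python
import csv
import io


def rows_to_csv_buffer(rows):
    """Write rows of cell values into an in-memory CSV stream (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerows(rows)
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf


def sheet_to_dataframe(filepath, sheet_index=0, encoding=None):
    """Sketch of the xlrd -> buffer -> pandas.read_csv pipeline (an assumption)."""
    # Imported lazily so the buffer helper can be used without these packages.
    import pandas as pd
    import xlrd

    book = xlrd.open_workbook(filepath, encoding_override=encoding)
    sheet = book.sheet_by_index(sheet_index)
    rows = (sheet.row_values(i) for i in range(sheet.nrows))
    return pd.read_csv(rows_to_csv_buffer(rows))


# The buffering step on its own, with plain lists standing in for sheet rows.
buf = rows_to_csv_buffer([['id', 'name'], [1, 'a'], [2, 'b, c']])
csv_text = buf.getvalue()
```

Routing the cell values through a text buffer lets pandas.read_csv do the type inference and date handling, rather than re-implementing that logic for the old XLS format.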

Raises:
  • ValueError – If the file extension is not .xls.

  • IOError – If the workbook does not contain any rows of data.
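
A caller could guard against both documented exceptions along these lines. This is a minimal sketch: read_or_none is a hypothetical helper, and it takes the reader function as an argument so that it does not depend on any particular import.

```python
def read_or_none(read_func, filepath):
    """Call read_func(filepath); return None if a documented error is raised."""
    try:
        return read_func(filepath)
    except ValueError as err:
        # Documented for a file extension other than .xls.
        print(f'Unsupported file: {err}')
    except IOError as err:
        # Documented for a workbook without any rows of data.
        # In Python 3, IOError is an alias of OSError, so a missing file
        # (FileNotFoundError) would also be caught here.
        print(f'Unreadable or empty workbook: {err}')
    return None


# Usage, assuming utils4 is installed:
#   from utils4.reader import reader
#   df = read_or_none(reader.read_xls, '/path/to/file.xls')
```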

Returns:

A DataFrame containing the contents of the XLS file.

Return type:

df (pandas.DataFrame)