reader - Data readers for formats which are no longer supported
- Purpose:
This module contains purpose-built data readers for formats which are no longer supported, namely:
.xls: pre-Excel 5.0/95 Workbook
- Platform:
Linux/Windows | Python 3.6+
- Developer:
J Berendt
- Email:
- Comments:
n/a
- Example:
Example for reading an old-style .xls (pre-Excel 5.0/95) Workbook into a DataFrame:
>>> from utils4.reader import reader
>>> df = reader.read_xls('/path/to/file.xls')
- class reader.Reader
Class wrapper for various data reading methods.
For details on each reader, refer to the docstring for that reader.
- read_xls(filepath: str, encoding: str = None, sheet_index: int = 0, skiprows: int = 0, skipcols: int = 0, chopcols: int = 0, date_formats: dict = None, errors: str = 'coerce', fill_date_errors: bool = False) -> pandas.DataFrame
Read a pre-Excel 5.0/95 .XLS file into a DataFrame.
This function is designed to deal with old XLS files which the pandas.read_excel function does not support.
- Parameters:
filepath (str) – Full path to the file to be read.
encoding (str, optional) – Encoding used to read the XLS file. Defaults to None.
sheet_index (int, optional) – Index of the sheet to be read, zero-based. Defaults to 0.
skiprows (int, optional) – Number of rows to skip (from the beginning of the file). Defaults to 0.
skipcols (int, optional) – Number of columns to skip (from the left). Defaults to 0.
chopcols (int, optional) – Number of columns to skip/chop (from the right). Defaults to 0.
date_formats (dict, optional) – Dictionary of {col_name: strftime_mask}. Defaults to None.
errors (str, optional) – Method used by read_csv() to resolve date parsing errors. Defaults to 'coerce'.
fill_date_errors (bool, optional) – Fill coerced NaT date errors with '1900-01-01'. Defaults to False.
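The interplay of date_formats, errors='coerce' and fill_date_errors can be illustrated with a small plain-Python sketch. This is not the library's implementation: the helper parse_dates, the column name 'created' and the sample values are hypothetical, and None stands in for pandas' NaT.

```python
from datetime import date, datetime

# Fill value named in the fill_date_errors description.
FILL_DATE = date(1900, 1, 1)


def parse_dates(values, mask, fill_date_errors=False):
    """Parse strings with a strftime mask, coercing failures (hypothetical helper)."""
    parsed = []
    for value in values:
        try:
            parsed.append(datetime.strptime(value, mask).date())
        except (TypeError, ValueError):
            # 'coerce'-style handling: do not raise, mark the value as
            # missing (None here), or substitute the fill date if requested.
            parsed.append(FILL_DATE if fill_date_errors else None)
    return parsed


# Same shape as the date_formats parameter: {col_name: strftime_mask}.
date_formats = {'created': '%d/%m/%Y'}
column = ['31/01/1994', 'n/a']

print(parse_dates(column, date_formats['created']))
print(parse_dates(column, date_formats['created'], fill_date_errors=True))
```

With fill_date_errors left at False, the unparseable value stays missing; with True, it is replaced by the 1900-01-01 placeholder, which keeps the column fully populated at the cost of introducing a sentinel date.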
- Logic:
The passed XLS file is opened and parsed by the xlrd library, then read into an in-memory stream buffer, which is passed into the pandas.read_csv function for conversion to a DataFrame.
- Raises:
ValueError – If the file extension is not .xls.
IOError – If the workbook does not contain any rows of data.
- Returns:
A DataFrame containing the contents of the XLS file.
- Return type:
df (pd.DataFrame)
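The flow described under Logic (xlrd rows, then an in-memory buffer, then pandas.read_csv) might look roughly like the sketch below. It is an approximation under stated assumptions, not the module's actual code: the helper names rows_to_csv_buffer and xls_to_dataframe are hypothetical, and the trimming semantics for skiprows, skipcols and chopcols are inferred from the parameter descriptions. The demonstration at the bottom uses a plain list of rows so that it runs without xlrd or pandas installed.

```python
import csv
import io


def rows_to_csv_buffer(rows, skiprows=0, skipcols=0, chopcols=0):
    """Write rows to an in-memory CSV buffer, trimming rows and columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    for row in rows[skiprows:]:
        # Drop `skipcols` cells from the left and `chopcols` from the right.
        writer.writerow(row[skipcols:len(row) - chopcols])
    buf.seek(0)  # Rewind so a reader starts at the first line.
    return buf


def xls_to_dataframe(filepath, encoding=None, sheet_index=0, **trim):
    """Hypothetical end-to-end helper; requires xlrd and pandas."""
    import pandas as pd
    import xlrd

    book = xlrd.open_workbook(filepath, encoding_override=encoding)
    sheet = book.sheet_by_index(sheet_index)
    rows = [sheet.row_values(i) for i in range(sheet.nrows)]
    return pd.read_csv(rows_to_csv_buffer(rows, **trim))


# Stand-in for rows read from a worksheet: one junk row, a header, one record.
rows = [
    ['report', '', '', ''],
    ['id', 'name', 'date', 'note'],
    [1, 'a', '2020-01-01', 'x'],
]
print(rows_to_csv_buffer(rows, skiprows=1, chopcols=1).getvalue())
```

Routing the data through a CSV buffer lets pandas.read_csv handle type inference and header detection, so the reader itself only needs to extract raw cell values.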