utils - Collection of general utility-based functions
- Purpose:
Central library for general utility-based methods.
This
utils
module was the starting place of the originalutils
library. Therefore, it’s historically been a ‘dumping-ground’ for general S3DEV utilities and function wrappers specialised to the needs of S3DEV projects, which did not seem to fit in anywhere else. So we’ll be honest, it’s a bit of a melting pot of functions.With the overhaul of the
utils3
library intoutils4
, many of the original functions, which were no longer being used, have been removed in an effort to clean the module’s code base.If you are looking for a function which used to be here, please refer to the last
utils3
release, which is v0.15.1.- Platform:
Linux/Windows | Python 3.6+
- Developer:
J Berendt
- Email:
Note
Any libraries which are not built-in, are imported only if/when the function which uses them is called.
This helps to reduce the packages required by utils4
.
- Example:
For usage examples, please refer to the docstring for each method.
- utils.clean_dataframe(df: pandas.DataFrame)[source]
Clean a
pandas.DataFrame
data structure.- Parameters:
df (pd.DataFrame) – DataFrame to be cleaned.
- Design:
The DataFrame is cleaned in-place. An object is not returned by this function.
The following cleaning tasks are performed:
Column names:
All punctuation characters are removed, with the exception of three characters. See next bullet point.
The
-
,[space]
and_
characters are replaced with an underscore.All column names are converted to lower case.
Data:
All
object
(string) fields, are stripped of leading and trailing whitespace.
- Example:
Example for cleaning a DataFrame:
>>> import pandas as pd # For demonstration only. >>> from utils4 import utils >>> # Define a dirty testing dataset. >>> df = pd.DataFrame({'Column #1': [' Text field 1.', ' Text field 2.', ' Text field 3. ', ' Text field 4. ', ' Text field 5. '], ' COLUmn (2)': [1.0, 2.0, 3.0, '4', '5.0'], 'COLUMN 3 ': [1, 2, 3.0, 4, 5.0]}) >>> utils.clean_dataframe(df) >>> df column_1 column_2 column_3 0 Text field 1. 1.0 1.0 1 Text field 2. 2.0 2.0 2 Text field 3. 3.0 3.0 3 Text field 4. 4 4.0 4 Text field 5. 5.0 5.0
- utils.direxists(path: str, create_path: bool = False) bool [source]
Test if a directory exists. If not, create it, if instructed.
- Parameters:
path (str) – The directory path to be tested.
create_path (bool, optional) – Create the path if it doesn’t exist. Defaults to False.
- Design:
Function designed to test if a directory path exists. If the path does not exist, the path can be created; as determined by the
create_path
parameter.This function extends the built-in
os.path.exists()
function in that the path can be created if it doesn’t already exist, by passing thecreate_path
parameter asTrue
.If the path is created by this function, the function is recursively called to test if the path exists, and will return
True
.If a filename is passed with the path, the filename is automatically stripped from the path before the test begins.
- Example:
Test if a directory exists, and create it if it does not exist:
>>> from utils4 import utils >>> utils.direxists(path='/tmp/path/to_create/file.csv', create_path=True)
- Returns:
True if the directory exists (or was created), otherwise False.
- Return type:
bool
- utils.fileexists(filepath: str, error: str = 'ignore') bool [source]
Test if a file exists. If not, notify the user or raise an error.
- Parameters:
filepath (str) – Full file path to test.
error (bool, optional) –
Action to be taken if the file does not exist. Defaults to ‘ignore’. Options:
'ignore'
: Take no action.'alert'
: Alert the user the filepath does not exist via a simple message to the terminal.'raise'
: Raise aFileNotFoundError
. This will abort all subsequent processing.
- Design:
Function designed check if a file exists. A boolean value is returned to the calling program.
This function extends the built-in
os.path.isfile()
function in that the user can be notified if the path does not exist, or an error can be raised.- Example:
Test if a file exists, using
'ignore'
, the default action:>>> from utils4 import utils >>> if utils.fileexists(filepath='/tmp/path/to/file.csv'): >>> ... >>> else: >>> ...
Test if a file exists, using
'alert'
:>>> from utils4 import utils >>> if utils.fileexists(filepath='/tmp/path/to/file.csv', error='alert'): >>> ... >>> else: >>> ... File not found: /tmp/path/to/file.csv
Test if a file exists, using
'raise'
:>>> from utils4 import utils >>> if utils.fileexists(filepath='/tmp/path/to/file.csv', error='raise'): >>> ... >>> else: >>> ... FileNotFoundError: File not found: /tmp/path/to/file.csv
- Raises:
FileNotFoundError – If the filepath does not exist and the
error
parameter is'raise'
.- Returns:
True if the file exists, otherwise False.
- Return type:
bool
- utils.format_exif_date(datestring: str, input_format: str = '%Y:%m:%d %H:%M:%S', output_format: str = '%Y%m%d%H%M%S', return_datetime: bool = False) datetime | str [source]
Format an exif timestamp.
This function is useful for storing an exif date as a datetime string. For example, extracting the exif data from an image to be stored into a database.
- Parameters:
datestring (str) – The datetime string to be formatted. A typical exif date format is: yyyy:mm:dd hh:mi:ss
input_format (str, optional) – Format mask for the input datetime value. Defaults to ‘%Y:%m:%d %H:%M:%S’.
output_format (str, optional) – Format mask for the output datetime, if returned as a string. Defaults to ‘%Y%m%d%H%M%S’.
return_datetime (bool, optional) – Return a
datetime
object, rather than a formatted string.
- Design:
Function designed to convert the exif date/timestamp from ‘2010:01:31 12:31:18’ (or a caller specified format) to a format specified by the caller.
The default input mask is the standard exif capture datetime format.
- Example:
Convert the exif datetime to the default output string format:
>>> from utils4 import utils >>> formatted = utils.format_exif_date('2010:01:31 12:31:18') >>> formatted '20100131123118'
Convert the exif datetime to a datetime object:
>>> from utils4 import utils >>> formatted = utils.format_exif_date('2010:01:31 12:31:18', return_datetime=True) >>> formatted datetime.datetime(2010, 1, 31, 12, 31, 18)
- Returns:
A formatted datetime string, if the
return_datetime
parameter isFalse
, otherwise adatetime.datetime
object.- Return type:
Union[str, datetime.datetime]
- utils.get_os() str [source]
Get the platform’s OS.
This method is a very thin wrapper around the
platform.system()
function.- Example:
>>> from utils4 import utils >>> myos = utils.get_os() >>> myos 'linux'
- Returns:
A string of the platform’s operating system, in lower case.
- Return type:
str
- utils.getdrivername(driver: str, return_all: bool = False) list [source]
Return a list of ODBC driver names, matching the regex pattern.
- Parameters:
driver (str) – A regex pattern for the ODBC driver you’re searching.
return_all (bool, optional) – If True, all drivers matching the pattern are returned. Defaults to False, which returns only the first driver name.
- Design:
This is a helper function designed to get and return the names of ODBC drivers.
The
driver
parameter should be formatted as a regex pattern. If multiple drivers are found, by default, only the first driver in the list is returned. However, thereturn_all
parameter adjusts this action to return all driver names.This function has a dependency on the
pyodbc
library. Therefore, thetestimport()
function is called beforepyodbc
is imported. If thepyodbc
library is not installed, the user is notified.- Dependencies:
pyodbc
library
- Example:
Get the driver name for the SQL Server ODBC driver:
>>> from utils4 import utils >>> driver = utils.getdrivername(driver='SQL Server.*')
- Troubleshooting:
On Unix-like systems, the following error message:
ImportError: libodbc.so.2: cannot open shared object file: No such file or directory
can be resolved by installing the
unixodbc-dev
package as:$ sudo apt install unixodbc-dev
- Returns:
A list of ODBC drivers, if any were found.
- Return type:
list
- utils.getsitepackages() str [source]
Return the Python installation’s site packages directory.
- Design:
The function first uses the local
get_os()
function to get the system’s OS. The OS is then tested and the site-packages location is returned using the OS-appropriate element from the list returned by the built-insite.getsitepackages()
function.If the OS is not accounted for, or fails the test, a value of ‘unknown’ is returned.
- Rationale:
The need for this function comes out of the observation there are many (many!) different ways on stackoverflow (and other sites) to get the location to which
pip
will install a package, and many of the answers contradict each other. Also, thesite.getsitepackages()
function returns a list of options (in all tested cases); and the Linux / Windows paths are in different locations in this list.- Example:
Get the location of the
site-packages
directory:>>> from utils4 import utils
>>> utils.getsitepackages() '/home/<username>/venvs/py38/lib/python3.8/site-packages'
- Returns:
Full path to the
site-packages
directory.- Return type:
str
- utils.gzip_compress(in_path: str, out_path: str = None, size: int = None) str [source]
Compress a file using
gzip
.- Parameters:
in_path (str) – Full path to the file to be compressed. If the file does not exist, a
FileNotFoundError
is raised.out_path (str, optional) – Full path to the compressed output file. Defaults to None. If this value is
None
a'.gz'
file extension is appended to the path provided to thein_path
parameter.size (int, optional) – Size of the chunk to be read / written during compression. Defaults to 10MiB.
- Example:
Compress a text file:
>>> from utils4 import utils >>> utils.gzip_compress(in_path='/tmp/rand.txt') '/tmp/rand.txt.gz'
Compress a text file, specifying the output path:
>>> from utils4 import utils >>> utils.gzip_compress(in_path='/tmp/rand.txt', out_path='/tmp/rand2.txt.gz') '/tmp/rand2.txt.gz'
- Returns:
Full path to the output file.
- Return type:
str
- utils.gzip_decompress(path: str, encoding: str = 'utf-8', size: int = None) bool [source]
Decompress a
.gz
file usinggzip
.- Parameters:
path (str) – Full path to the file to be decompressed. If the file does not exist, a
FileNotFoundError
is raised.encoding (str, optional) – Encoding to be used to decode the decompressed binary data. Defaults to ‘utf-8’.
size (int, optional) – Size of the chunk to be read / written during decompression. Defaults to 1MiB.
Note
The output path is simply the
path
value with last file extension removed.In general cases, a file compressed using gzip will have a
.gz
extension appended onto the existing filename and extension. For example:data.txt.gz
.Note
Newline Characters:
When the decompressed file is written, the
newline
character is specified as''
, which enables ‘universal newline mode’, whereby the system’s newline character is used. However, the original line endings - those used in the compressed file - are written back to the decompressed file.This method is used to ensure the checksum hash on the original (unzipped) and decompressed file can be compared.
- Example:
Decompress a text file:
>>> from utils4 import utils >>> utils.gzip_decompress(path='/tmp/rand.txt.gz') True
- Returns:
True if the decompression was successful, otherwise False.
- Return type:
bool
- utils.ping(server: str, count: int = 1, timeout: int = 5, verbose: bool = False) bool [source]
Ping an IP address, server or web address.
- Parameters:
server (str) – IP address, server name or web address.
count (int, optional) – The number of ping attempts. Defaults to 1.
timeout (int, optional) – Number of seconds to wait for response. Defaults to 5.
verbose (bool, optional) – Display all stdout and/or stderr output, if the returned status code is non-zero. Defaults to False.
- Design:
Using the platform’s native
ping
command (via asubprocess
call) the host is pinged, and a boolean value is returned to the caller to indicate if the ping was successful.A ping status:
0 returns True
Non-zero returns False
If the server name is preceeded by
\\
or//
, these are stripped out using the built-inos.path.basename()
function.- Example:
Ping the local PC at 127.0.0.1:
>>> from utils4 import utils >>> utils.ping(server='127.0.0.1') True
Ping an unknown server:
>>> from utils4 import utils >>> utils.ping(server='//S3DHOST01', verbose=True) [PingError]: ping: S3DHOST01: Temporary failure in name resolution False
Ping an unreachable IP address:
>>> from utils4 import utils >>> utils.ping(server='192.168.0.99', count=3, verbose=True) [PingError]: PING 192.168.0.99 (192.168.0.99) 56(84) bytes of data. From 192.168.0.XX icmp_seq=1 Destination Host Unreachable From 192.168.0.XX icmp_seq=2 Destination Host Unreachable From 192.168.0.XX icmp_seq=3 Destination Host Unreachable --- 192.168.0.99 ping statistics --- 3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2037ms False
- Returns:
True if the ping was successful, otherwise False.
- Return type:
bool
- utils.testimport(module_name: str, verbose: bool = True) bool [source]
Test if a Python library is installed.
- Parameters:
module_name (str) – Exact name of the module to be found.
verbose (bool, optional) – Notify if the library is not installed. Defaults to True.
- Design:
This is a small helper function designed to test if a library is installed before trying to import it.
If the library is not intalled the user is notified, if the
verbose
argument is True.- Internal Use:
For example, the
getdrivername()
function uses this function before attempting to import thepyodbc
library.- Example:
Execute a path only if
mymodule
is installed:>>> from utils4 import utils >>> if utils.testimport('mymodule', verbose=True): >>> import mymodule >>> ... >>> else: >>> ...
- Returns:
True if the library is installed, otherwise False.
- Return type:
bool
- utils.unidecode(string: str, **kwargs) str [source]
Attempt to convert a Unicode string object into a 7-bit ASCII string.
- Parameters:
string (str) – The string to be decoded.
**kwargs (dict) – Keyword arguments passed directly into the underlying
unidecode.unidecode()
function.
- Design:
This function is a light wrapper around the
unidecode.unidecode()
function.Per the
unicode
docstring:“Transliterate an Unicode object into an ASCII string.”
Example:
>>> unidecode(u"北亰") "Bei Jing "
“This function first tries to convert the string using ASCII codec. If it fails (because of non-ASCII characters), it falls back to transliteration using the character tables.”
“This is approx. five times faster if the string only contains ASCII characters, but slightly slower than
unidecode.unicode_expect_nonascii()
if non-ASCII characters are present.”- Dependencies:
unidecode
library
- Example:
Convert a Polish address into pure ASCII:
>>> from utils4 import utils >>> addr = 'ul. Bałtów 8a 27-423 Bałtów, woj. świętokrzyskie' >>> utils.unidecode(addr) 'ul. Baltow 8a 27-423 Baltow, woj. swietokrzyskie'
Convert the first line of ‘The Seventh Letter’, by Plato:
>>> from utils4 import utils >>> text = 'Πλάτων τοῖς Δίωνος οἰκείοις τε καὶ ἑταίροις εὖ πράττειν.' >>> utils.unidecode(text) 'Platon tois Dionos oikeiois te kai etairois eu prattein.'
- Returns:
If the
unidecode
library is installed and the passedstring
value is astr
data type, the decoded string is returned, otherwise the original value is returned.- Return type:
str