dpipe.factories

class dpipe.factories.AugmentedDataset(dataset, gen_object=None, length=None, training=True)

Bases: object

Augments tf.data.Dataset to handle custom configurations

Parameters
  • dataset (tf.data.Dataset) – Instance of a tf.data.Dataset

  • length (int, optional) – length of the dataset, defaults to None

  • training (bool, optional) – defines training/validation flag. If True then the augmented dataset handles training configurations, and if False the augmented dataset handles validation configurations, defaults to True
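A minimal construction sketch, assuming a plain tf.data.Dataset can be wrapped directly; the keyword names follow the signature documented above:

    import tensorflow as tf
    from dpipe.factories import AugmentedDataset

    # A plain tf.data.Dataset with 100 scalar samples.
    base = tf.data.Dataset.range(100)

    # Wrap it for training; passing length avoids a later recompute_length().
    aug = AugmentedDataset(base, length=100, training=True)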

batch(batch_size)

Makes batches of the dataset with the specified batch size

Parameters

batch_size (int) – size of the batches

build()

Creates an augmented dataset that contains the arguments to be used in tf.keras.Model.fit()
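A hedged sketch of the intended call order: configure the wrapper, then call build() and hand the result to Model.fit(). The exact return type of build() is not documented here, so the last step is an assumption:

    import tensorflow as tf
    from dpipe.factories import AugmentedDataset

    aug = AugmentedDataset(tf.data.Dataset.range(100), length=100, training=True)
    aug.batch(32)       # see batch() above
    aug.prefetch(2)     # see prefetch() below

    fit_input = aug.build()
    # Assumption: the result of build() can be passed to tf.keras.Model.fit(),
    # e.g. model.fit(fit_input, ...), depending on what build() returns.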

cache(filename='')

Defines the cache file to store previously loaded samples

Parameters

filename (str, optional) – Name of the file where loaded samples are stored; subsequent accesses are served from the cache, defaults to ‘’

enumerate(start=0)

Like the built-in enumerate function, creates an index alongside each sample.

Parameters

start (int, optional) – start count of the enumeration, defaults to 0

filter(filter_fcn)

Applies a filter function to all samples of the dataset. The filter is applied lazily.

Parameters

filter_fcn – function reference
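A hedged sketch, assuming the filter function receives one sample and returns a boolean tensor, as with tf.data.Dataset.filter:

    import tensorflow as tf
    from dpipe.factories import AugmentedDataset

    aug = AugmentedDataset(tf.data.Dataset.range(10), length=10)

    def keep_even(x):
        # Predicate over one sample; the f(x) signature is an assumption.
        return tf.equal(x % 2, 0)

    aug.filter(keep_even)  # applied lazily; nothing is read yet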

map(map_func, num_parallel_calls=None)

Maps every sample in the dataset by a map function.

Parameters

map_func – function reference
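A short sketch of map(), assuming the map function follows the tf.data.Dataset.map contract (one input element, one output) and that num_parallel_calls is forwarded to TensorFlow:

    import tensorflow as tf
    from dpipe.factories import AugmentedDataset

    aug = AugmentedDataset(tf.data.Dataset.range(256), length=256)

    def normalize(x):
        # Cast to float and scale to [0, 1]; single-element signature assumed.
        return tf.cast(x, tf.float32) / 255.0

    aug.map(normalize, num_parallel_calls=tf.data.AUTOTUNE)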

parallelize_extraction(cycle_length=4, block_length=16, num_parallel_calls=- 1, read_fcn=None)

Consumes items from the original object’s list in parallel, using the given reading function, e.g. for reading files or images (see the sketch below).

Parameters
  • cycle_length – defaults to 4, read TF docs for more details.

  • block_length – defaults to 16, read TF docs for more details.

  • num_parallel_calls – defaults to Autotune, read TF docs for more details.

  • read_fcn – defaults to None; function that processes one item in the dataset, with f(x) or f(x, y) signature options.

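A hedged sketch, assuming the wrapped dataset yields file paths and read_fcn loads one item per path (the f(x) option); decode_png_file and the paths are hypothetical:

    import tensorflow as tf
    from dpipe.factories import AugmentedDataset

    paths = tf.data.Dataset.from_tensor_slices(["a.png", "b.png"])
    aug = AugmentedDataset(paths, length=2)

    def decode_png_file(path):
        # Hypothetical reader: path -> image tensor.
        return tf.io.decode_png(tf.io.read_file(path), channels=3)

    aug.parallelize_extraction(
        cycle_length=4,
        block_length=16,
        num_parallel_calls=tf.data.AUTOTUNE,
        read_fcn=decode_png_file,
    )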

prefetch(buffer_size)

Preloads samples into memory (the TensorFlow session) so they are ready to be processed.

Parameters

buffer_size (int) – number of samples to preload. If batch() is specified, then buffer_size batches are loaded instead; for example, buffer_size=2 with batches of 100 will load 200 samples into memory.
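The example from the description, written out (the order of batch() and prefetch() calls is an assumption):

    import tensorflow as tf
    from dpipe.factories import AugmentedDataset

    aug = AugmentedDataset(tf.data.Dataset.range(1000), length=1000)
    aug.batch(100)    # batches of 100 samples
    aug.prefetch(2)   # keeps 2 batches (200 samples) preloaded in memory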

recompute_length()

Recomputes the length of the dataset.

This may take a long time, since all samples must be accessed.

repeat(count=None)

Creates a repeated (concatenated) dataset

Parameters

count (int) – Number of repetitions

shuffle(buffer_size, seed=None, reshuffle_each_iteration=None)
class dpipe.factories.GeneratorBase(obj, getitem_fcn=None, itemlist_name=None, length=None)

Bases: object

Wraps an object that exposes a getitem-style method to make it iterable

Parameters
  • obj (object) – Instance of the object that accesses the data

  • getitem_fcn (str) – Name of the method in the object to access data

  • length (int) – length of the dataset; if None it is inferred from the len() function, defaults to None
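A minimal sketch of wrapping a custom container, assuming the default __getitem__ access method and an inferable __len__; MyStore is a hypothetical class:

    from dpipe.factories import GeneratorBase

    class MyStore:
        # Hypothetical data holder with index-based access.
        def __init__(self, items):
            self.items = items

        def __getitem__(self, idx):
            return self.items[idx]

        def __len__(self):
            return len(self.items)

    gen = GeneratorBase(MyStore([1, 2, 3, 4]))  # uses __getitem__ and len() by default
    # The wrapper can then be iterated to yield samples, as the description implies.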

send(ignored_arg)
throw(type=None, value=None, traceback=None)

Raises a StopIteration

dpipe.factories.from_function(read_fcn, list, training=True, undetermined_shape=None)
dpipe.factories.from_object(obj, getitem_fcn=None, itemlist_name=None, training=True, undetermined_shape=None)

Creates a tf.data.Dataset object with configuration parameters for fitting

Parameters
  • obj – Object instance holding the data, with a ‘getitem_fcn’ method to access the dataset

  • getitem_fcn (str, optional) – Name of the method used to access data. getitem_fcn can be any method name defined in the class of ‘obj’. If not specified, ‘__getitem__’ is assumed as the access function

  • itemlist_name (str, optional) – name of the list attribute containing the samples on the object; if None, the name “list” is used

  • training (bool, optional) – Specify training/validation flag

  • undetermined_shape (iterable, optional) – defines positions in the shape vector where dimensions are undetermined

Returns

A tf.data.Dataset object built from the obj dataset
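A hedged end-to-end sketch of from_object(), assuming a custom dataset class whose access method is named read_sample and whose sample list attribute is named file_list (both names are hypothetical):

    from dpipe import factories

    class ImageFolder:
        # Hypothetical dataset object.
        def __init__(self, files):
            self.file_list = files

        def read_sample(self, idx):
            # Load and return one (input, target) pair; dummy values here.
            return [0.0, 0.0], 0

    data = ImageFolder(["a.png", "b.png"])
    ds = factories.from_object(
        data,
        getitem_fcn="read_sample",
        itemlist_name="file_list",
        training=True,
    )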

dpipe.datasets

dpipe.datasets.make_dataset(x_type, y_type, x_path=None, y_path=None, x_size=None, y_size=None, training=True, video_frames=None, video_cropping=None, one_hot_encoding=False)

Create custom dataset from a path list

Parameters
  • x_type – Defines the type of the input data to the model. It can be: label, video or image. The proper reading is generated accordingly.

  • y_type – Defines the type of the target data to the model; same options as x_type

  • x_path – Path to the dataset of inputs. The path is expected to contain images or videos organized so that the name of the parent folder is the label (when labels are relevant), e.g. cat/image1.png and dog/image2.png. All files are indexed as individual samples.

  • y_path – Path to the dataset of targets; same layout as x_path.

  • x_size – Size of the image or video for the input to the model

  • y_size – Size of the image or video for the target to the model

  • training (bool, optional) – Specify training/validation flag

  • video_frames – number of frames of the output video if data type is video

  • video_cropping – video cropping method; creates a crop of the video with a length defined by video_frames. Working modes are single and multi: in single mode the crop spans from the first frame up to the defined number of video_frames; in multi mode the video is cropped into sequential clips, each with the number of frames defined by video_frames.

  • one_hot_encoding – Activates one-hot encoding for the label input

Returns

The created tf.data.Dataset with input/target pairs (x, y)
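A hedged sketch of an image-classification setup; the directory layout follows the cat/... dog/... convention described above, the path is a placeholder, the x_size format is an assumption, and whether y_path can be omitted when y_type is label is also an assumption:

    from dpipe import datasets

    # Hypothetical layout: data/train/cat/*.png, data/train/dog/*.png
    train_ds = datasets.make_dataset(
        x_type="image",
        y_type="label",
        x_path="data/train",
        x_size=(224, 224, 3),
        training=True,
        one_hot_encoding=True,
    )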

dpipe.utils

dpipe.utils.create_label_dict(paths, one_hot_encoding=False)

Creates a label dictionary from a list of paths

Parameters

paths (list) – Paths from which the label dictionary is created. The parent folder is taken as the label for each image.

Returns

Dictionary of labels sorted alphabetically

Return type

dict
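A short sketch; the exact values of the returned dictionary (indices vs. one-hot vectors) depend on one_hot_encoding and are an assumption here:

    from dpipe import utils

    paths = ["data/train/cat/img1.png", "data/train/dog/img2.png"]
    labels = utils.create_label_dict(paths)
    # Expected form (values are an assumption): {"cat": 0, "dog": 1}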

dpipe.utils.get_parent_path(path)
dpipe.utils.get_read_fcn(data_type, label_dict=None)
dpipe.utils.get_single_value(value, counter=0)

Recursively extracts a single element from value. It assumes all elements are of the same kind

Parameters

value – Value to get a single element from.

dpipe.utils.get_tf_dtype(value)

Obtains the TensorFlow data type of a value

The available transformations are:
  • tf.float16: 16-bit half-precision floating-point.

  • tf.float32: 32-bit single-precision floating-point.

  • tf.float64: 64-bit double-precision floating-point.

  • tf.complex64: 64-bit single-precision complex.

  • tf.complex128: 128-bit double-precision complex.

  • tf.int8: 8-bit signed integer.

  • tf.uint8: 8-bit unsigned integer.

  • tf.uint16: 16-bit unsigned integer.

  • tf.uint32: 32-bit unsigned integer.

  • tf.uint64: 64-bit unsigned integer.

  • tf.int16: 16-bit signed integer.

  • tf.int32: 32-bit signed integer.

  • tf.int64: 64-bit signed integer.

  • tf.bool: Boolean.

  • tf.string: String.

Parameters

value – Value whose data type is identified.
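A brief sketch of the mapping; the expected return values in the comments are inferred from the list above and are not verified:

    import numpy as np
    from dpipe import utils

    utils.get_tf_dtype(np.float32(1.0))  # expected: tf.float32
    utils.get_tf_dtype(True)             # expected: tf.bool
    utils.get_tf_dtype("cat")            # expected: tf.string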

dpipe.utils.get_tf_shape(value)

Obtains the shape of a variable

Parameters

value – input value; can be a numpy.ndarray, a numeric, or a string. Lists of numerics or strings are supported, but not nested lists.

dpipe.utils.get_video_length(path)

Reads the number of frames from the metadata of a video file

Parameters

path – path to the video file.

Returns

Number of frames extracted with ffprobe.

Return type

float64

dpipe.utils.is_iterable(value)

Verifies whether the value is iterable

Parameters

value – value to check for iterability.

dpipe.utils.is_supported_format(filename)
