dpipe.factories¶
-
class
dpipe.factories.
AugmentedDataset
(dataset, gen_object=None, length=None, training=True)¶ Bases:
object
Augments
tf.data.Dataset
to handle custom configurations
- Parameters
dataset (
tf.data.Dataset
) – Instance of a tf.data.Dataset
length (int, optional) – length of the dataset, defaults to None
training (bool, optional) – defines training/validation flag. If True then the augmented dataset handles training configurations, and if False the augmented dataset handles validation configurations, defaults to True
-
batch
(batch_size)¶ Groups the samples of the dataset into batches of the given size
- Parameters
batch_size (int) – size of the batches
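As a rough illustration of what batching does to a stream of samples, the pure-Python sketch below groups an iterable into chunks. It is not dpipe's implementation: the helper name `batch_samples` is hypothetical, and keeping the final partial batch (the default behaviour of `tf.data.Dataset.batch`) is an assumption.

```python
from itertools import islice


def batch_samples(samples, batch_size):
    """Yield lists of up to batch_size consecutive samples."""
    iterator = iter(samples)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


# Seven samples in batches of three: the last batch is smaller.
batches = list(batch_samples(range(7), batch_size=3))
```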
-
build
()¶ Creates an augmented dataset that contains the arguments to be used in the method
tf.keras.Model.fit()
-
cache
(filename='')¶ Defines the cache file to store previously loaded samples
- Parameters
filename (str, optional) – Name of the file where the loaded samples are stored. From the second access onward, samples are read from the cache, defaults to ‘’
-
enumerate
(start=0)¶ Like the built-in enumerate function, pairs each sample with an index.
- Parameters
start (int, optional) – start count of the enumeration, defaults to 0
-
filter
(filter_fcn)¶ Applies a filter function to all the samples of the dataset. The filter is applied lazily.
- Parameters
filter_fcn – function reference
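Lazy application means the filter function only runs when samples are actually consumed. The sketch below shows the same idea with Python's built-in `filter`; the predicate `is_even` and the `calls` log are hypothetical names used only for illustration and say nothing about dpipe's internals.

```python
calls = []


def is_even(sample):
    calls.append(sample)  # record when the predicate actually runs
    return sample % 2 == 0


lazy = filter(is_even, range(6))  # nothing has been evaluated yet
calls_before = list(calls)
kept = list(lazy)                 # iterating triggers the predicate
```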
-
map
(map_func, num_parallel_calls=None)¶ Applies a map function to every sample in the dataset.
- Parameters
map_func – function reference
-
parallelize_extraction
(cycle_length=4, block_length=16, num_parallel_calls=-1, read_fcn=None)¶ Consumes the items in the list of the original object in parallel using the given reading function, for example to read files or images.
- Parameters
cycle_length – defaults to 4, read TF docs for more details.
block_length – defaults to 16, read TF docs for more details.
num_parallel_calls – defaults to Autotune, read TF docs for more details.
read_fcn – defaults to None, function that processes one item of the dataset; it can take the form f(x) or f(x, y).
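The parameter names cycle_length and block_length match those of `tf.data.Dataset.interleave`. Assuming that is the operation being wrapped, the sequential pure-Python sketch below approximates the order in which records are drawn: up to cycle_length sources are open at once, and block_length consecutive records are taken from each in turn. The names `interleave` and `read_lines` are hypothetical, and the sketch does not model parallelism.

```python
from itertools import islice


def interleave(items, read_fcn, cycle_length=4, block_length=16):
    """Sequentially mimic interleaved reading of several sources."""
    pending = iter(items)
    # Open up to cycle_length sources at once.
    slots = [iter(read_fcn(item)) for item in islice(pending, cycle_length)]
    while slots:
        next_round = []
        for source in slots:
            # Take up to block_length consecutive records from this source.
            block = list(islice(source, block_length))
            yield from block
            if len(block) == block_length:
                next_round.append(source)  # may still hold records
            else:
                # Source exhausted: open the next pending item in its slot.
                for item in islice(pending, 1):
                    next_round.append(iter(read_fcn(item)))
        slots = next_round


def read_lines(name):
    """Hypothetical reader that returns three records per item."""
    return [f"{name}{i}" for i in range(3)]


order = list(interleave(["a", "b", "c"], read_lines,
                        cycle_length=2, block_length=2))
```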
-
prefetch
(buffer_size)¶ Preloads samples into memory (the TensorFlow session) so that they are ready to be processed.
- Parameters
buffer_size (int) – number of elements to preload. If batching is specified, buffer_size batches are loaded. For example, buffer_size=2 with batches of 100 loads 200 samples into memory.
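The arithmetic in the buffer_size description can be written out as a small helper. This is only a sketch of the calculation; the function name `prefetched_samples` is hypothetical and is not part of dpipe.

```python
def prefetched_samples(buffer_size, batch_size=None):
    """Return how many samples a full prefetch buffer holds."""
    if batch_size is None:
        return buffer_size           # unbatched: one element is one sample
    return buffer_size * batch_size  # batched: one element is one batch


in_memory = prefetched_samples(buffer_size=2, batch_size=100)
```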
-
recompute_length
()¶ Recomputes the length of the dataset.
This may take long since all the samples must be accessed.
-
repeat
(count=None)¶ Creates a dataset that repeats this dataset, concatenating the repetitions
- Parameters
count (int) – Number of repetitions
-
shuffle
(buffer_size, seed=None, reshuffle_each_iteration=None)¶
-
class
dpipe.factories.
GeneratorBase
(obj, getitem_fcn=None, itemlist_name=None, length=None)¶ Bases:
object
Wraps an object that has an item-access (getitem) method to make it an iterable class
- Parameters
obj (
object
) – Instance of the object that accesses the data
getitem_fcn (str) – Name of the method in the object to access data
length (int) – length of the dataset, if None then infers from len() function, defaults to None
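The `send` and `throw` methods listed for this class suggest that it follows the `collections.abc.Generator` protocol. Under that assumption, the sketch below shows how an object with an item-access method could be wrapped to become iterable. `ItemGenerator` and `Squares` are hypothetical names, and the details are not taken from dpipe's source.

```python
from collections.abc import Generator


class ItemGenerator(Generator):
    """Iterate over obj by calling its item-access method with 0, 1, 2, ..."""

    def __init__(self, obj, getitem_fcn="__getitem__", length=None):
        self._getitem = getattr(obj, getitem_fcn)
        # Fall back to len(obj) when no explicit length is given.
        self._length = len(obj) if length is None else length
        self._index = 0

    def send(self, ignored_arg):
        if self._index >= self._length:
            self.throw()
        item = self._getitem(self._index)
        self._index += 1
        return item

    def throw(self, type=None, value=None, traceback=None):
        raise StopIteration


class Squares:
    """Hypothetical data object whose access method is called 'read'."""

    def __len__(self):
        return 4

    def read(self, index):
        return index * index


items = list(ItemGenerator(Squares(), getitem_fcn="read"))
```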
-
send
(ignored_arg)¶
-
throw
(type=None, value=None, traceback=None)¶ Raise a
StopIteration
-
dpipe.factories.
from_function
(read_fcn, list, training=True, undetermined_shape=None)¶
-
dpipe.factories.
from_object
(obj, getitem_fcn=None, itemlist_name=None, training=True, undetermined_shape=None)¶ Creates a tf.data.Dataset object with configuration parameters for fitting
- Parameters
obj – Object instance holding the data, with a ‘getitem_fcn’ method to access the dataset
getitem_fcn (str, optional) – Name of the method used to access data. It can be any method name defined in the class of ‘obj’. If not specified, ‘__getitem__’ is used as the name of the access function
itemlist_name (str, optional) – name of the list containing samples on the object; if None, the name “list” is used
training (bool, optional) – Specify training/validation flag
undetermined_shape (iterable, optional) – defines positions in the shape vector where dimensions are undetermined
- Returns
An object
tf.data.Dataset
from the obj dataset
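One way to read the undetermined_shape parameter is as a list of zero-based positions whose dimension is replaced by None, the marker TensorFlow uses for an unknown dimension. The sketch below shows that interpretation only; the helper `relax_shape` is hypothetical and the zero-based indexing is an assumption.

```python
def relax_shape(shape, undetermined_shape=None):
    """Replace the listed positions of a shape with None (unknown size)."""
    positions = set(undetermined_shape or ())
    return tuple(None if index in positions else dim
                 for index, dim in enumerate(shape))


# Images of varying height and width but always three channels.
relaxed = relax_shape((480, 640, 3), undetermined_shape=[0, 1])
```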
dpipe.datasets¶
-
dpipe.datasets.
make_dataset
(x_type, y_type, x_path=None, y_path=None, x_size=None, y_size=None, training=True, video_frames=None, video_cropping=None, one_hot_encoding=False)¶ Create custom dataset from a path list
- Parameters
x_type – Defines the type of the input data to the model. It can be: label, video or image. The proper reading is generated accordingly.
y_type – Defines the type of the target data to the model; same options as x_type.
x_path – Path to the dataset of inputs. The path is expected to contain images or videos organized so that the name of the parent folder is the label, if a label is relevant. For example cat/image1.png and dog/image2.png. All the files are indexed as individual samples.
y_path – Path to the dataset of targets; same layout as x_path.
x_size – Size of the image or video for the input to the model
y_size – Size of the image or video for the target to the model
training (bool, optional) – Specify training/validation flag
video_frames – number of frames of the output video if data type is video
video_cropping – video cropping method; creates a crop of the video with a length defined by video_frames. Working modes are single and multi. In single mode the video runs from the first frame up to the number of frames defined by video_frames; in multi mode the video is cropped into a sequence of clips, each with the number of frames defined by video_frames.
one_hot_encoding – Activate one hot encoding for the label input
- Returns
Created dataset
tf.data.Dataset
with the pairs input and target (x,y)
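The two video_cropping modes can be pictured as frame index ranges. The sketch below is one possible reading, not dpipe's code: the helper `crop_ranges` is hypothetical, and it assumes that multi mode produces non-overlapping clips and drops a trailing clip shorter than video_frames.

```python
def crop_ranges(total_frames, video_frames, video_cropping="single"):
    """Return (start, stop) frame ranges for each clip of a video."""
    if video_cropping == "single":
        # One clip: from the first frame up to video_frames.
        return [(0, video_frames)]
    if video_cropping == "multi":
        # Consecutive clips of video_frames frames each.
        last_start = total_frames - video_frames
        return [(start, start + video_frames)
                for start in range(0, last_start + 1, video_frames)]
    raise ValueError(f"unknown cropping mode: {video_cropping!r}")


single = crop_ranges(10, 4, "single")
multi = crop_ranges(10, 4, "multi")
```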
dpipe.utils¶
-
dpipe.utils.
create_label_dict
(paths, one_hot_encoding=False)¶ Creates a label dictionary from a list of paths
- Parameters
paths (list) – Paths from which the label dictionary is created. The parent folder of each file is taken as the label for the images.
- Returns
Dictionary of labels sorted alphabetically
- Return type
dict
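A minimal sketch of building such a dictionary from parent folder names follows. It is illustrative only: the helper `label_dict_from_paths` is hypothetical, and mapping each label to an integer index (or to a one-hot list when one_hot_encoding is set) is an assumption about the value format.

```python
import os


def label_dict_from_paths(paths, one_hot_encoding=False):
    """Map each parent-folder name to an index or a one-hot list."""
    names = sorted({os.path.basename(os.path.dirname(p)) for p in paths})
    if not one_hot_encoding:
        return {name: index for index, name in enumerate(names)}
    return {name: [1 if i == index else 0 for i in range(len(names))]
            for index, name in enumerate(names)}


paths = ["dog/image2.png", "cat/image1.png", "cat/image3.png"]
labels = label_dict_from_paths(paths)
one_hot = label_dict_from_paths(paths, one_hot_encoding=True)
```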
-
dpipe.utils.
get_parent_path
(path)¶
-
dpipe.utils.
get_read_fcn
(data_type, label_dict=None)¶
-
dpipe.utils.
get_single_value
(value, counter=0)¶ Recursively tracks down a single element of value. It assumes all elements are of the same kind
- Parameters
value – Value to get a single element from.
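The recursive descent can be sketched as follows, assuming it always follows the first element until a non-container is reached. The helper `first_element` is hypothetical, and the role of the counter argument is not covered because the source does not describe it.

```python
def first_element(value):
    """Descend into the first element until a non-container is reached."""
    if isinstance(value, (str, bytes)):
        return value  # treat text as a single value, not a sequence
    try:
        iterator = iter(value)
    except TypeError:
        return value  # not iterable: this is the single element
    for element in iterator:
        return first_element(element)
    return value  # empty container: nothing to descend into


leaf = first_element([[1.5, 2.5], [3.5]])
```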
-
dpipe.utils.
get_tf_dtype
(value)¶ Obtains the TensorFlow data type of a value
- The available transformations are:
tf.float16: 16-bit half-precision floating-point.
tf.float32: 32-bit single-precision floating-point.
tf.float64: 64-bit double-precision floating-point.
tf.complex64: 64-bit single-precision complex.
tf.complex128: 128-bit double-precision complex.
tf.int8: 8-bit signed integer.
tf.uint8: 8-bit unsigned integer.
tf.uint16: 16-bit unsigned integer.
tf.uint32: 32-bit unsigned integer.
tf.uint64: 64-bit unsigned integer.
tf.int16: 16-bit signed integer.
tf.int32: 32-bit signed integer.
tf.int64: 64-bit signed integer.
tf.bool: Boolean.
tf.string: String.
- Parameters
value – Value to identify class from.
-
dpipe.utils.
get_tf_shape
(value)¶ Obtains the shape of a variable
- Parameters
value – input value, which can be a
numpy.ndarray
, a numeric or a string. Lists of numerics or strings are supported, but nested lists are not.
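The supported inputs can be illustrated with a small sketch. The helper `shape_of` is hypothetical, and the returned values (an empty tuple for scalars and strings, a one-element tuple for flat lists, and the `shape` attribute for array-like objects) are assumptions rather than dpipe's documented output.

```python
def shape_of(value):
    """Return a shape tuple for arrays, flat lists, numerics and strings."""
    if hasattr(value, "shape"):
        return tuple(value.shape)  # e.g. a numpy.ndarray
    if isinstance(value, (list, tuple)):
        if any(isinstance(item, (list, tuple)) for item in value):
            raise ValueError("nested lists are not supported")
        return (len(value),)
    return ()  # numeric or string scalar


flat = shape_of([1, 2, 3])
scalar = shape_of(3.0)
```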
-
dpipe.utils.
get_video_length
(path)¶ Reads the number of frames from the metadata of a video file
- Parameters
path – path to the video file.
- Returns
Number of frames extracted with ffprobe.
- Return type
float64
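For reference, a frame count can be read from container metadata with ffprobe along the following lines. This is a sketch under assumptions: the exact ffprobe arguments dpipe uses are not given in the source, the helper names are hypothetical, and some containers report `N/A` for `nb_frames`, in which case the conversion to float fails.

```python
import subprocess


def frame_count_command(path):
    """Build an ffprobe command that prints the nb_frames metadata field."""
    return ["ffprobe", "-v", "error",
            "-select_streams", "v:0",             # first video stream only
            "-show_entries", "stream=nb_frames",  # frame count from metadata
            "-of", "default=nokey=1:noprint_wrappers=1",
            path]


def read_frame_count(path):
    """Run ffprobe (must be installed) and return the count as a float."""
    result = subprocess.run(frame_count_command(path),
                            capture_output=True, text=True, check=True)
    return float(result.stdout.strip())


command = frame_count_command("clip.mp4")  # hypothetical file name
```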
-
dpipe.utils.
is_iterable
(value)¶ Checks whether the value is iterable
- Parameters
value – value to identify if iterable or not.
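A common way to perform such a check in Python is to attempt `iter()` on the value, as sketched below. The helper `looks_iterable` is hypothetical. Under this definition strings count as iterable; whether dpipe treats them that way is not stated in the source.

```python
def looks_iterable(value):
    """Return True if iter(value) succeeds, False otherwise."""
    try:
        iter(value)
    except TypeError:
        return False
    return True


checks = [looks_iterable([1, 2]), looks_iterable(3), looks_iterable("ab")]
```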
-
dpipe.utils.
is_supported_format
(filename)¶