TorchQL’s Table

class torchql.Table(samples, id_index: dict | None = None, transform=None, disable=False)

A Table is a collection of samples. It is specifically designed to be queried from within a database. It is a subclass of torch.utils.data.Dataset, so it can be used as a dataset.

batch(size, shuffle, batch_size=0, disable=False) -> Table

Batch this table.

Args:

size (int): The size of the batches.

shuffle (bool): Whether to shuffle the records before batching.

Returns:

A new table with the batches.
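The batching described above can be pictured with plain Python lists. This is only a sketch of the semantics, not TorchQL's implementation: `batch_records` and its `seed` parameter are hypothetical, and keeping the trailing partial batch is an assumption.

```python
import random


def batch_records(records, size, shuffle, seed=None):
    """Split records into consecutive chunks of at most `size` items.

    Hypothetical helper that only illustrates the described behaviour.
    Assumption: a trailing partial batch is kept rather than dropped.
    """
    items = list(records)
    if shuffle:
        # Shuffle a copy so the caller's sequence is left untouched.
        random.Random(seed).shuffle(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


batches = batch_records(range(7), size=3, shuffle=False)
shuffled = batch_records(range(7), size=3, shuffle=True, seed=0)
```

Each element of `batches` is itself a list of records, which matches the idea of "a new table with the batches".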

filter(cond: Callable[[...], bool], batch_size=0, disable=False) -> Table

Filter this table by a condition.

Args:

cond (Callable): The condition to filter by.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (bool): A flag that disables progress bars if set to True.

Returns:

A new table with the filtered records.
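The difference between per-record and batched conditions can be sketched in plain Python. Nothing here calls TorchQL: `filter_records` is a hypothetical helper, and both calling conventions (columns unpacked as arguments in per-record mode, a list of records in and a list of booleans out in batch mode) are assumptions made for illustration.

```python
def filter_records(records, cond, batch_size=0):
    """Keep the records for which `cond` holds (illustrative only)."""
    if batch_size < 1:
        # Per-record mode: cond sees one record's columns, returns a bool.
        return [r for r in records if cond(*r)]
    kept = []
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        # Batch mode (assumed shape): cond sees a list of records and
        # returns one bool per record in that list.
        mask = cond(chunk)
        kept.extend(r for r, keep in zip(chunk, mask) if keep)
    return kept


rows = [("cat", 0.9), ("dog", 0.4), ("cat", 0.2), ("bird", 0.7)]

per_record = filter_records(rows, lambda label, score: score > 0.5)
batched = filter_records(
    rows,
    lambda chunk: [score > 0.5 for _, score in chunk],
    batch_size=2,
)
```

Both calls keep the same records; only the shape of the function you supply changes.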

flatten(batch_size=0, disable=False) -> Table

Flatten this table. If the records of this table are lists, the records of the new table will be the elements of the lists.

Args:

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (bool): A flag that disables progress bars if set to True.

Returns:

A new table with the flattened records.
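As a rough picture of the flattening described above, here is the same operation on a plain list of lists. `flatten_records` is a hypothetical helper, not part of TorchQL, and it assumes every record is a list.

```python
from itertools import chain


def flatten_records(records):
    """Replace each list-valued record by its elements (one level deep)."""
    return list(chain.from_iterable(records))


nested = [[1, 2], [3], [], [4, 5]]
flat = flatten_records(nested)
```

An empty record contributes nothing to the flattened result.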

group_by(key: Callable[[...], Any], batch_size=0, disable=False) -> Table

Group this table by a key. Records are grouped by the key function. The key function should return a hashable value. The records of the new table will be tuples of the key and a list of the records that have that key.

Args:

key (Callable): The key to group by. Must return a hashable value.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (bool): A flag that disables progress bars if set to True.

Returns:

A new table with the grouped records.
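The shape of the grouped records, tuples of a key and the list of records sharing that key, can be sketched without TorchQL. `group_records` is a hypothetical helper; passing the columns as unpacked arguments and returning groups in first-seen order are both assumptions.

```python
from collections import defaultdict


def group_records(records, key):
    """Return (key, [records with that key]) tuples (illustrative only)."""
    groups = defaultdict(list)
    for record in records:
        # The key must be hashable because it is used as a dict key.
        groups[key(*record)].append(record)
    return list(groups.items())


rows = [("cat", 0.9), ("dog", 0.4), ("cat", 0.2)]
grouped = group_records(rows, key=lambda label, score: label)
```

This is also why the key function must return a hashable value: the sketch, like any hash-based grouping, uses it as a dictionary key.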

group_by_with_index(key: Callable[[...], Any], batch_size=0, disable=False) -> Table

Group this table by a key. Records are grouped by the key function. The key function should return a hashable value. The records of the new table will be tuples of the key and a list of the records that have that key.

Args:

key (Callable): The key to group by. Must return a hashable value.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (bool): A flag that disables progress bars if set to True.

Returns:

A new table with the grouped records.

group_reduce(key: Callable[[...], Any], reduction: Callable[[...], Any], batch_size=0, disable=False) -> Table

Group this table by a key and reduce the records of each group using a reduction function.

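Args:

key (Callable): The key to group by. Must return a hashable value.

reduction (Callable): The reduction function that takes in the records of each group.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (bool): A flag that disables progress bars if set to True.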

Returns:

A new table with the grouped and reduced records.
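Grouping followed by a per-group reduction can be sketched in plain Python as follows. `group_reduce_records` is a hypothetical helper, not TorchQL code. The output shape, a tuple of key and reduced value, is an assumption, as is passing the whole list of group members to the reduction.

```python
def group_reduce_records(records, key, reduction):
    """Group by key, then collapse each group with `reduction`."""
    groups = {}
    for record in records:
        groups.setdefault(key(*record), []).append(record)
    # Assumed output shape: one (key, reduced_value) tuple per group.
    return [(k, reduction(members)) for k, members in groups.items()]


rows = [("cat", 2), ("dog", 5), ("cat", 4)]
totals = group_reduce_records(
    rows,
    key=lambda label, count: label,
    reduction=lambda members: sum(count for _, count in members),
)
```

Here the reduction sums the second column within each group.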

head(n=10, print_id=False)

Get the first n records of this table.

Args:

n (int, optional): The number of records to get. Defaults to 10.

print_id (bool, optional): Whether to print the id of each row. Defaults to False.

Returns:

A table with the first n records of this table.

intersect(table: Dataset, batch_size=0, disable=False) -> Table

Intersect this table with another table. Records common to this table and the other table are used to create a new table. Common records are identified by the id of the row. The new table uses the columns of the other table.

Args:

table (Dataset): The table to intersect with.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (bool): A flag that disables progress bars if set to True.

Returns:

A new table with the common records.
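The id-based intersection described above can be pictured with two dictionaries that map row ids to records. This is a sketch under assumptions, not TorchQL's implementation: `intersect_by_id` is hypothetical, and so is the choice to keep the row order of the first table.

```python
def intersect_by_id(this_rows, other_rows):
    """Keep ids present in both mappings, taking columns from `other_rows`."""
    return {rid: other_rows[rid] for rid in this_rows if rid in other_rows}


this_table = {"a": ("img_a",), "b": ("img_b",), "c": ("img_c",)}
other_table = {
    "b": ("img_b", "dog"),
    "c": ("img_c", "cat"),
    "d": ("img_d", "bird"),
}
common = intersect_by_id(this_table, other_table)
```

Note that the surviving records carry the other table's columns, as the description states.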

join(table: Dataset, key=None, fkey=None, batch_size=0, disable=False) -> Table

Join this table with another table. Records are joined if the key function returns the same value for both tables. If no key function is provided, the index is used.

Args:

table (Dataset): The table to join with.

key (Callable, optional): The key function to use for this table. Must take a set of columns from a row as input and return a hashable value that serves as a key. Defaults to None.

fkey (Callable, optional): The foreign key function to use for the other table. Must take a set of columns from a row as input and return a hashable value that serves as a key. Defaults to None.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (bool): A flag that disables progress bars if set to True.

Returns:

A new table with the joined records.
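A key-based join of this kind is commonly realised as a hash join, sketched below in plain Python. `join_records` is a hypothetical helper and does not reflect TorchQL internals. Two things are assumptions: that a joined record is the concatenation of the two rows, and that unmatched rows are dropped.

```python
from collections import defaultdict


def join_records(left, right, key, fkey):
    """Pair rows whose key(left_row) equals fkey(right_row)."""
    # Index the other table by its foreign key for constant-time lookup.
    index = defaultdict(list)
    for row in right:
        index[fkey(*row)].append(row)
    joined = []
    for row in left:
        for match in index.get(key(*row), []):
            # Assumed output shape: columns of both rows, side by side.
            joined.append(row + match)
    return joined


images = [(1, "img_a"), (2, "img_b"), (3, "img_c")]
labels = [(1, "cat"), (3, "dog"), (4, "bird")]
joined = join_records(
    images,
    labels,
    key=lambda image_id, name: image_id,
    fkey=lambda image_id, label: image_id,
)
```

Both key functions must return hashable values because they are used as dictionary keys in the index.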

order_by(key: Callable[[...], Any], reverse=False, batch_size=0, disable=False) -> Table

Order this table by a key.

Args:

key (Callable): The key to order by.

reverse (bool, optional): Whether to reverse the order. Defaults to False.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (bool): A flag that disables progress bars if set to True.

Returns:

A new table with the ordered records.
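The effect of `key` and `reverse` mirrors Python's built-in `sorted`, as this plain-Python sketch shows. `order_records` is a hypothetical helper, and passing the columns as unpacked arguments to the key function is an assumption.

```python
def order_records(records, key, reverse=False):
    """Sort records by the value the key function returns for each one."""
    return sorted(records, key=lambda record: key(*record), reverse=reverse)


rows = [("a", 3), ("b", 1), ("c", 2)]
ascending = order_records(rows, key=lambda name, rank: rank)
descending = order_records(rows, key=lambda name, rank: rank, reverse=True)
```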

project(cols: Callable[[...], list], batch_size=0, disable=False) -> Table

Select or perform an operation on the columns of this table.

Args:

cols (Callable): A function that takes the columns of this table as arguments and returns a list of the projected columns.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (bool): A flag that disables progress bars if set to True.

Returns:

A new table with the projected columns.
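A projection function receives the columns of a row as arguments and returns the list of columns to keep or compute. The plain-Python sketch below illustrates that contract; `project_records` is a hypothetical helper, not TorchQL code.

```python
def project_records(records, cols):
    """Apply `cols` to each record's columns and collect the results."""
    return [cols(*record) for record in records]


rows = [("img_a", 0.9, "cat"), ("img_b", 0.4, "dog")]

# Drop the score column and derive an upper-cased label column.
projected = project_records(
    rows,
    cols=lambda name, score, label: [name, label.upper()],
)
```

The same function can therefore both select columns and compute new ones.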

reduce(reduction: Callable[[...], Any], batch_size=0, disable=False) -> Table

Reduce the records of this table using a reduction function. This function operates over all the records of the table as opposed to each row individually.

Args:

reduction (Callable): The reduction function that takes in the records of the table.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (bool): A flag that disables progress bars if set to True.

Returns:

A new table with the reduced records.

sample(print_id=False)

Get a random row of this table.

Args:

print_id (bool, optional): Whether to print the id of the row. Defaults to False.

Returns:

A random row of this table.

sample_many(n=10, print_id=False)

Get n random records of this table.

Args:

n (int, optional): The number of records to get. Defaults to 10.

print_id (bool, optional): Whether to print the id of each row. Defaults to False.

Returns:

A list of n random records of this table.

transform(transform) -> Table

Register a PyTorch Transform to apply to this table.

Args:

transform (Callable): The PyTorch transform to apply to the records of this table.

Returns:

A new table with the transformed records.

union(table: Dataset, batch_size=0, disable=False) -> Table

Union this table with another table. Records of the other table are appended to the bottom of this table.

Args:

table (Dataset): The table to union with.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (bool): A flag that disables progress bars if set to True.

Returns:

A new table with the combined records.

unique(batch_size=0, disable=False) -> Table

Select the unique records of this table.

Args:

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (bool): A flag that disables progress bars if set to True.

Returns:

A new table with the unique records.
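Selecting unique records can be pictured as an order-preserving de-duplication. The sketch below is plain Python, not TorchQL's implementation. `unique_records` is a hypothetical helper, and it assumes that records are hashable and that the first occurrence of each record is the one kept.

```python
def unique_records(records):
    """Drop repeated records, keeping the first occurrence of each."""
    # dict preserves insertion order, so this keeps first-seen order.
    return list(dict.fromkeys(records))


rows = [("cat", 1), ("dog", 2), ("cat", 1), ("cat", 3)]
distinct = unique_records(rows)
```

Records that differ in any column, such as `("cat", 1)` and `("cat", 3)`, count as distinct.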