TorchQL’s Query

class torchql.Query(name, base=None)

A query object that can be used to build a query pipeline and execute it over a database.

base(tablename: str) → Query

Set the base table over which the query pipeline will operate. This resets only the base and keeps the rest of the operations in the pipeline, which is useful for running the same query over different tables.

Args:

tablename (str): The name of the table to set as the base.

Returns:

A new query object with the base table set.
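The "same pipeline, different base table" idea can be sketched in plain Python. This is not TorchQL code: MiniQuery, its then method, and the dict-based database are hypothetical stand-ins, used only to show that changing the base returns a new object and leaves the registered steps untouched.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MiniQuery:
    """Hypothetical stand-in for a query: a base table name plus pipeline steps."""
    name: str
    base_table: Optional[str] = None
    steps: Tuple[Callable[[List], List], ...] = ()

    def base(self, tablename: str) -> "MiniQuery":
        # Only the base changes; the registered steps are carried over.
        return replace(self, base_table=tablename)

    def then(self, step: Callable[[List], List]) -> "MiniQuery":
        # Register one more step and return a new object.
        return replace(self, steps=self.steps + (step,))

    def run(self, database: Dict[str, List]) -> List:
        rows = list(database[self.base_table])
        for step in self.steps:
            rows = step(rows)
        return rows


db = {"train": [1, 2, 3, 4], "test": [10, 11]}  # made-up tables
evens = MiniQuery("evens", base_table="train").then(
    lambda rows: [r for r in rows if r % 2 == 0]
)
on_train = evens.run(db)
on_test = evens.base("test").run(db)  # same steps, different table
```

The original object still points at its first base after the call, which is what makes it safe to reuse one pipeline across several tables.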

cols(cols: Callable[[...], List], batch_size=0, disable=False) → Query

Select columns or perform a function on columns of the Table. This will register a projection operation to the pipeline.

Args:

cols (Callable[…, List]): A function that takes a row as input and returns a list of columns to select.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the projection operation registered.

Deprecated since version 0.0.1: Use Query.project instead

filter(cond: Callable[[...], bool], batch_size=0, disable=False) → Query

Filter Records based on a condition. This will register a filter operation to the pipeline.

Args:

cond (Callable[…, bool]): The condition to filter records on. Must take a set of columns from a row as input and return a boolean.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the filter operation registered.
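As a rough illustration of the semantics described above, the following plain-Python sketch applies a condition to the columns of each row, first record by record and then in batches. It does not call TorchQL. The row layout and function names are made up, and the way a batch is handed to the function (a list of rows) is an assumption, since the calling convention for batches is not specified here.

```python
# Plain-Python sketch of the filter semantics; this is not TorchQL code.
rows = [("img_a", 7), ("img_b", 3), ("img_c", 7)]  # hypothetical (image, label) rows


def is_seven(image, label):
    # Receives the columns of one row and returns a boolean.
    return label == 7


kept = [row for row in rows if is_seven(*row)]


def is_seven_batched(batch):
    # Assumed batched form: receives a list of rows, returns one boolean per row.
    return [label == 7 for _, label in batch]


batch_size = 2
kept_batched = []
for start in range(0, len(rows), batch_size):
    batch = rows[start:start + batch_size]
    mask = is_seven_batched(batch)
    kept_batched.extend(row for row, keep in zip(batch, mask) if keep)
```

Both variants keep the same records; the batched one only changes what the supplied function is expected to accept.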

flatten(batch_size=0, disable=False) → Query

Flatten the result of the query. This will register a flatten operation to the pipeline. It applies when a Table’s records are iterables: each iterable is flattened into a set of records for the table.

Args:

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the flatten operation registered.
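A minimal plain-Python sketch of what flattening means for a table whose records are iterables. It uses made-up data and the standard library only, and is not TorchQL's implementation.

```python
from itertools import chain

# Each record is itself an iterable (hypothetical data).
records = [[1, 2], [3], [4, 5, 6]]

# Every inner element becomes its own record in the flattened table.
flat = list(chain.from_iterable(records))
```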

forward(database: Database, **kwargs) → Table

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

group_by(key: Callable[[...], Any], batch_size=0, disable=False) → Query

Group the records of the table by a key. Results in a Table with two columns: the key and a list of records that share the key.

Args:

key (Callable[…, Any]): The key to group records by. Must take a set of columns from a row as input and return a value that serves as a key. Records are grouped if the result of the key function is equal.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the group_by operation registered.
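The two-column result described above (key, list of records sharing that key) can be sketched in plain Python as follows. The rows and the key function are hypothetical, and the order in which groups appear is simply the order of first occurrence in this sketch, which is not something this reference specifies for TorchQL.

```python
from collections import defaultdict

rows = [("a.png", "cat"), ("b.png", "dog"), ("c.png", "cat")]  # made-up rows


def key(path, label):
    # Receives the columns of one row and returns the grouping key.
    return label


groups = defaultdict(list)
for row in rows:
    groups[key(*row)].append(row)

# Two columns per output record: the key and the records that share it.
grouped = [(k, members) for k, members in groups.items()]
```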

group_by_with_index(key: Callable[[...], Any], batch_size=0, disable=False) → Query

Group the records of the table by a key. Results in a Table with two columns: the key and a list of records that share the key.

Args:

key (Callable[…, Any]): The key to group records by. Must take a set of columns from a row as input and return a value that serves as a key. Records are grouped if the result of the key function is equal.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the group_by_with_index operation registered.

group_reduce(key: Callable[[...], Any], reduction: Callable[[...], Any], batch_size=0, disable=False) → Query

Group the records of the table by a key and reduce each group using a reduction function. Results in a Table with two columns: the key and the result of the reduction function on the group.

Args:

key (Callable[…, Any]): The key to group records by. Must take a set of columns from a row as input and return a value that serves as a key. Records are grouped if the result of the key function is equal.

reduction (Callable[…, Any]): The reduction to apply to the records. Must take a set of columns from a row as input and return a value.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the group_reduce operation registered.
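A plain-Python sketch of grouping followed by a per-group reduction, producing (key, reduced value) records. The data is made up, and the sketch assumes the reduction receives the list of rows in one group; the exact arguments TorchQL passes to the reduction are not spelled out here, so treat that as an assumption.

```python
from collections import defaultdict

rows = [("cat", 0.9), ("dog", 0.4), ("cat", 0.7)]  # hypothetical (label, score) rows


def key(label, score):
    return label


def reduction(group):
    # Assumed form: receives all rows of one group, returns a single value.
    return max(score for _, score in group)


groups = defaultdict(list)
for row in rows:
    groups[key(*row)].append(row)

# Two columns per output record: the key and the reduced value for its group.
reduced = [(k, reduction(members)) for k, members in groups.items()]
```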

intersect(tablename: str, batch_size=0, disable=False) → Query

Intersect a table. This will register an intersect operation to the pipeline. Records that are common to both tables will be the result.

Args:

tablename (str): The name of the table to intersect

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the intersect operation registered.
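The "records common to both tables" behaviour can be sketched in plain Python as below. This assumes hashable records and keeps the order of the first table; both are choices of this sketch, not documented TorchQL behaviour.

```python
left = [(1, "a"), (2, "b"), (3, "c")]   # hypothetical query table
right = [(3, "c"), (1, "a"), (4, "d")]  # hypothetical table to intersect with

right_set = set(right)  # requires hashable records
common = [record for record in left if record in right_set]
```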

join(tablename: str, key=None, fkey=None, batch_size=0, disable=False) → Query

Join a table. This will register a join operation to the pipeline. Records from the query table and the table to join will be joined on the key and foreign key, respectively. If no key or foreign key is provided, the index is used. Otherwise, the key and foreign key must be functions that take a set of columns from a row as input and return hashable values that serve as a key. Records are joined if the results of the key and foreign key functions are equal.

Args:

tablename (str): The name of the table to join

key (Callable[…, Any]): The key to join on. Defaults to None, in which case the index is used. Must take a set of columns from a row as input and return a hashable value that serves as a key.

fkey (Callable[…, Any]): The foreign key to join on. Defaults to None, in which case the index is used. Must take a set of columns from a row as input and return a hashable value that serves as a key.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the join operation registered.
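Joining on a key and a foreign key can be pictured as a hash join. The plain-Python sketch below uses made-up tables and is not TorchQL's implementation. In particular, representing a joined record as the concatenation of the two rows is an assumption of this sketch, and the default index-based join is not shown.

```python
from collections import defaultdict

images = [("img0", 7), ("img1", 3)]  # hypothetical (image_id, label) rows
preds = [("img1", 3), ("img0", 1)]   # hypothetical (image_id, predicted) rows


def key(image_id, label):
    # Key for rows of the query table; must return a hashable value.
    return image_id


def fkey(image_id, predicted):
    # Foreign key for rows of the table being joined.
    return image_id


# Index the table to join by its foreign key.
index = defaultdict(list)
for row in preds:
    index[fkey(*row)].append(row)

# Pair up rows whose key and foreign key are equal.
joined = [
    left + right
    for left in images
    for right in index.get(key(*left), [])
]
```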

order_by(key: Callable[[...], Any], reverse: bool = False, batch_size=0, disable=False) → Query

Order the records of the table by a key.

Args:

key (Callable[…, Any]): The key to order records by.

reverse (bool): Whether to order the records in descending order. Defaults to False.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the order_by operation registered.
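The ordering semantics correspond closely to Python's built-in sorted with a key and a reverse flag. The sketch below uses made-up rows and assumes the key function receives the columns of one row, like the other row-wise functions in this reference.

```python
rows = [("a", 0.2), ("b", 0.9), ("c", 0.5)]  # hypothetical (name, score) rows


def key(name, score):
    # Receives the columns of one row and returns the value to sort by.
    return score


ascending = sorted(rows, key=lambda row: key(*row))
descending = sorted(rows, key=lambda row: key(*row), reverse=True)
```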

project(cols: Callable[[...], List], batch_size=0, disable=False) → Query

Select columns or perform a function on columns of the Table. This will register a projection operation to the pipeline.

Args:

cols (Callable[…, List]): A function that takes a row as input and returns a list of columns to select.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the projection operation registered.
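A projection function can both select columns and compute new ones. The plain-Python sketch below shows that idea on made-up rows; the function name and row layout are hypothetical, and the code does not call TorchQL.

```python
rows = [("img0", 7, 0.91), ("img1", 3, 0.40)]  # hypothetical (image_id, label, score) rows


def select_columns(image_id, label, score):
    # Keep one column as-is and derive a second one from another column.
    return [image_id, score >= 0.5]


projected = [select_columns(*row) for row in rows]
```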

reduce(reduction: Callable[[...], Any]) → Query

Reduce the records of the table using a reduction function.

Args:

reduction (Callable[…, Any]): The reduction to apply to the records. Must take a set of columns from a row as input and return a value.

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the reduction operation registered.

register(tablename: str) → Query

Deprecated since version 0.0.1: Use Query.base instead

rename(name: str) → Query

Clone the query object under a new name.

Args:

name (str): The name of the new query object.

Returns:

A new query object with the same pipeline as the original.

run(database: Database, **kwargs) → Table

Execute the query pipeline over a database.

Args:

database (Database): The database to execute the query over.

**kwargs: Keyword arguments to be passed to individual operations in the query. These keyword arguments override the options set while defining the query operations.

Returns:

The table object with the result of the query stored in the results attribute.

union(tablename: str, batch_size=0, disable=False) → Query

Union a table. This will register a union operation to the pipeline. Records of the other table will be added to the bottom of the table.

Args:

tablename (str): The name of the table to union

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the union operation registered.
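Following the "added to the bottom of the table" wording, a union can be pictured as concatenation. The plain-Python sketch below uses made-up tables and keeps duplicate records; whether TorchQL removes duplicates is not stated here, so that part is an assumption.

```python
table = [(1, "a"), (2, "b")]  # hypothetical query table
other = [(2, "b"), (3, "c")]  # hypothetical table to union with

# Records of the other table are appended after the existing ones.
unioned = table + other
```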

unique(batch_size=0, disable=False) → Query

Return the set of unique records of the table.

Args:

batch_size (int): The batch size to enable batch-processing of the query. Note that a batch size >= 1 assumes your supplied functions run on batches of records as opposed to a single record.

disable (boolean): A flag that disables progress bars if set to True.

Returns:

A new query object with the unique operation registered.
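Deduplicating records can be sketched in plain Python as below. The sketch assumes hashable records and keeps the first occurrence of each in its original position; neither point is documented TorchQL behaviour.

```python
rows = [(1, "a"), (2, "b"), (1, "a")]  # hypothetical rows with one duplicate

# dict.fromkeys keeps the first occurrence of each record, in order.
unique_rows = list(dict.fromkeys(rows))
```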