`module` `great_start`

`class` `GReaTStart`

Abstract superclass for all GReaT start strategies.

GReaT Start creates tokens to start the generation process.

Attributes:

tokenizer (AutoTokenizer): Tokenizer, automatically downloaded from the LLM checkpoint

`method` `GReaTStart.__init__`

__init__(tokenizer)

Initializes the super class.

Args:

tokenizer: Tokenizer from the HuggingFace library

`method` `GReaTStart.get_start_tokens`

get_start_tokens(n_samples: int) → List[List[int]]

Get Start Tokens

Creates starting points for the generation process

Args:

n_samples: Number of start prompts to create

Returns: List of n_samples lists of token IDs
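The interface described above can be sketched as follows. This is a minimal illustration, not the library's code: `StartSketch`, `ConstantStart`, and `toy_tokenizer` are hypothetical names, and whether the real `GReaTStart` enforces abstractness through `abc` is not stated in this reference.

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class StartSketch(ABC):
    """Hypothetical stand-in mirroring the documented GReaTStart interface."""

    def __init__(self, tokenizer: Callable[[str], List[int]]):
        self.tokenizer = tokenizer

    @abstractmethod
    def get_start_tokens(self, n_samples: int) -> List[List[int]]:
        """Return n_samples lists of token IDs."""


class ConstantStart(StartSketch):
    """Toy subclass: every sample starts from the same fixed prompt."""

    def __init__(self, tokenizer: Callable[[str], List[int]], prompt: str):
        super().__init__(tokenizer)
        self.prompt = prompt

    def get_start_tokens(self, n_samples: int) -> List[List[int]]:
        return [self.tokenizer(self.prompt) for _ in range(n_samples)]


def toy_tokenizer(text: str) -> List[int]:
    # Hypothetical tokenizer (assumption): one integer ID per character.
    return [ord(ch) for ch in text]


tokens = ConstantStart(toy_tokenizer, "age is ").get_start_tokens(3)
```

Each concrete start class documented here follows the same pattern: it stores the tokenizer through the superclass and overrides `get_start_tokens`.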

`class` `CategoricalStart`

Categorical Starting Feature

A categorical column and its categories are used as the starting point.

Attributes:

start_col (str): Name of the categorical column
population (list[str]): Possible values the column can take
weights (list[float]): Probabilities for the individual categories

`method` `CategoricalStart.__init__`

__init__(tokenizer, start_col: str, start_col_dist: dict)

Initializes the Categorical Start

Args:

tokenizer: Tokenizer from the HuggingFace library
start_col: Name of the categorical column
start_col_dist: Distribution of the categorical column (dict of form {"Cat A": 0.8, "Cat B": 0.2})

`method` `CategoricalStart.get_start_tokens`

get_start_tokens(n_samples)
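One way to picture the sampling step is the sketch below. It draws categories according to the weights in `start_col_dist` and turns each draw into a text prompt. The helper name `sample_categorical_prompts` and the `"<column> is <value>,"` prompt format are assumptions made for illustration; in the documented class the resulting strings would then be passed through the tokenizer to obtain token IDs.

```python
import random
from typing import Dict, List


def sample_categorical_prompts(
    start_col: str,
    start_col_dist: Dict[str, float],
    n_samples: int,
    rng: random.Random,
) -> List[str]:
    # Split the distribution dict into categories and their probabilities.
    population = list(start_col_dist.keys())
    weights = list(start_col_dist.values())
    # Weighted sampling with replacement.
    values = rng.choices(population, weights=weights, k=n_samples)
    # Assumed prompt format: "<column> is <value>,"
    return [f"{start_col} is {value}," for value in values]


rng = random.Random(0)  # seeded only to make the sketch reproducible
prompts = sample_categorical_prompts("group", {"Cat A": 0.8, "Cat B": 0.2}, 5, rng)
```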

`class` `ContinuousStart`

Continuous Starting Feature

A continuous column with some added noise is used as the starting point.

Attributes:

start_col (str): Name of the continuous column
start_col_dist (list[float]): Values of the continuous column from the training data set
noise (float): Magnitude of the noise added to each value
decimal_places (int): Number of decimal places the continuous values have

`method` `ContinuousStart.__init__`

__init__(
    tokenizer,
    start_col: str,
    start_col_dist: List[float],
    noise: float = 0.01,
    decimal_places: int = 5
)

Initializes the Continuous Start

Args:

tokenizer: Tokenizer from the HuggingFace library
start_col: Name of the continuous column
start_col_dist: Values of the continuous column from the training data set
noise: Magnitude of the noise added to each value
decimal_places: Number of decimal places the continuous values have

`method` `ContinuousStart.get_start_tokens`

get_start_tokens(n_samples)
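The sketch below shows how the documented parameters could fit together: values are resampled from the training column, perturbed, and formatted to a fixed number of decimal places. The helper name `sample_continuous_prompts`, the Gaussian form of the noise, and the `"<column> is <value>,"` prompt format are all assumptions; this reference only says that noise of a given size is added to each value.

```python
import random
from typing import List, Optional


def sample_continuous_prompts(
    start_col: str,
    start_col_dist: List[float],
    n_samples: int,
    noise: float = 0.01,
    decimal_places: int = 5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    rng = rng or random.Random()
    # Resample observed training values with replacement.
    values = rng.choices(start_col_dist, k=n_samples)
    # Assumption: zero-mean Gaussian noise with standard deviation `noise`.
    noisy = [v + rng.gauss(0.0, noise) for v in values]
    # Assumed prompt format, rounded to `decimal_places` digits.
    return [f"{start_col} is {v:.{decimal_places}f}," for v in noisy]


prompts = sample_continuous_prompts(
    "age", [20.0, 35.5, 41.25], 4, rng=random.Random(0)
)
```

Setting `noise=0.0` in this sketch reproduces the training values exactly, which is a convenient way to check the formatting on its own.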

`class` `RandomStart`

Random Starting Features

Randomly chosen column names are used as the starting point. Use this strategy when no column's distribution is known.

Attributes:

all_columns (List[str]): Names of all columns

`method` `RandomStart.__init__`

__init__(tokenizer, all_columns: List[str])

Initializes the Random Start

Args:

tokenizer: Tokenizer from the HuggingFace library
all_columns: Names of all columns

`method` `RandomStart.get_start_tokens`

get_start_tokens(n_samples)
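As a rough illustration, a random start only needs the column names: each sample picks one name uniformly and leaves the value for the model to generate. The helper name `sample_random_prompts` and the `"<column> is "` prompt format are assumptions, not taken from this reference.

```python
import random
from typing import List


def sample_random_prompts(
    all_columns: List[str], n_samples: int, rng: random.Random
) -> List[str]:
    # Pick column names uniformly at random, with replacement.
    columns = rng.choices(all_columns, k=n_samples)
    # Assumed prompt format: the value is left open for the model to complete.
    return [f"{col} is " for col in columns]


prompts = sample_random_prompts(["age", "income", "city"], 3, random.Random(0))
```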

This file was automatically generated via lazydocs.
