module great_start
class GReaTStart
Abstract super class GReaT Start
GReaT Start creates tokens to start the generation process.
Attributes:
tokenizer (AutoTokenizer): Tokenizer, automatically downloaded from llm-checkpoint
method GReaTStart.__init__
__init__(tokenizer)
Initializes the super class.
Args:
tokenizer: Tokenizer from the HuggingFace library
method GReaTStart.get_start_tokens
get_start_tokens(n_samples: int) → List[List[int]]
Get Start Tokens
Creates starting points for the generation process
Args:
n_samples: Number of start prompts to create
Returns: List of n_samples lists of tokens
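The contract above (a tokenizer stored at construction, and a `get_start_tokens` method that subclasses override) can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the library: `StartSketch`, `ConstantStart`, and `toy_tokenizer` are stand-ins, and the toy tokenizer maps each character to one integer instead of using a real HuggingFace tokenizer.

```python
from typing import Callable, List


class StartSketch:
    """Hypothetical stand-in for the abstract start class described above."""

    def __init__(self, tokenizer: Callable[[str], List[int]]):
        self.tokenizer = tokenizer

    def get_start_tokens(self, n_samples: int) -> List[List[int]]:
        # The base class only defines the interface.
        raise NotImplementedError("Subclasses create the start tokens.")


class ConstantStart(StartSketch):
    """Hypothetical subclass: every sample starts from the same prompt."""

    def __init__(self, tokenizer: Callable[[str], List[int]], prompt: str):
        super().__init__(tokenizer)
        self.prompt = prompt

    def get_start_tokens(self, n_samples: int) -> List[List[int]]:
        token_ids = self.tokenizer(self.prompt)
        # One independent copy of the token list per requested sample.
        return [list(token_ids) for _ in range(n_samples)]


def toy_tokenizer(text: str) -> List[int]:
    """Toy tokenizer (assumption): one integer per character."""
    return [ord(ch) for ch in text]


start = ConstantStart(toy_tokenizer, "age is ")
tokens = start.get_start_tokens(3)
print(len(tokens), tokens[0])
```

The return value has the documented shape: a list with `n_samples` entries, each of which is a list of integer token ids.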
class CategoricalStart
Categorical Starting Feature
A categorical column, together with its categories, is used as the starting point.
Attributes:
start_col (str): Name of the categorical column
population (list[str]): Possible values the column can take
weights (list[float]): Probabilities for the individual categories
method CategoricalStart.__init__
__init__(tokenizer, start_col: str, start_col_dist: dict)
Initializes the Categorical Start
Args:
tokenizer: Tokenizer from the HuggingFace library
start_col: Name of the categorical column
start_col_dist: Distribution of the categorical column (dict of form {"Cat A": 0.8, "Cat B": 0.2})
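A dictionary of the form expected by `start_col_dist` can be derived from the relative frequencies of a training column. The following is a minimal sketch using only the standard library; the helper name `category_distribution` and the example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def category_distribution(values: List[str]) -> Dict[str, float]:
    """Map each category to its relative frequency in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical training column with two categories.
column = ["Cat A", "Cat A", "Cat A", "Cat A", "Cat B"]
start_col_dist = category_distribution(column)
print(start_col_dist)  # {'Cat A': 0.8, 'Cat B': 0.2}
```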
method CategoricalStart.get_start_tokens
get_start_tokens(n_samples)
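As a rough illustration of how categorical start prompts might be drawn, the sketch below samples categories according to their weights and turns each one into a short text prompt. The prompt format `"<column> is <value>,"` and the function name `sample_categorical_prompts` are assumptions made for this example, and the tokenization step (turning each prompt into a list of token ids) is left out.

```python
import random
from typing import Dict, List, Optional


def sample_categorical_prompts(
    start_col: str,
    start_col_dist: Dict[str, float],
    n_samples: int,
    seed: Optional[int] = None,
) -> List[str]:
    """Draw n_samples categories by weight and format them as prompts."""
    rng = random.Random(seed)
    population = list(start_col_dist.keys())
    weights = list(start_col_dist.values())
    chosen = rng.choices(population, weights=weights, k=n_samples)
    # Assumed prompt format; the real library may use a different one.
    return [f"{start_col} is {value}," for value in chosen]


prompts = sample_categorical_prompts("sex", {"Male": 0.8, "Female": 0.2}, 5, seed=0)
print(prompts)
```

Passing each prompt through a tokenizer would then give the list of token lists that `get_start_tokens` returns.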
class ContinuousStart
Continuous Starting Feature
A continuous column with some added noise is used as the starting point.
Attributes:
start_col (str): Name of the continuous column
start_col_dist (list[float]): The continuous column from the training data set
noise (float): Size of the noise that is added to each value
decimal_places (int): Number of decimal places the continuous values have
method ContinuousStart.__init__
__init__(
tokenizer,
start_col: str,
start_col_dist: List[float],
noise: float = 0.01,
decimal_places: int = 5
)
Initializes the Continuous Start
Args:
tokenizer: Tokenizer from the HuggingFace library
start_col: Name of the continuous column
start_col_dist: The continuous column from the training data set
noise: Size of the noise that is added to each value
decimal_places: Number of decimal places the continuous values have
method ContinuousStart.get_start_tokens
get_start_tokens(n_samples)
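The description above (values drawn from the training column, perturbed by noise, and rendered with a fixed number of decimal places) could be sketched as follows. This is not the library's code: the Gaussian shape of the noise, the prompt format `"<column> is <value>,"`, and the name `sample_continuous_prompts` are assumptions, and tokenization is omitted.

```python
import random
from typing import List, Optional


def sample_continuous_prompts(
    start_col: str,
    start_col_dist: List[float],
    n_samples: int,
    noise: float = 0.01,
    decimal_places: int = 5,
    seed: Optional[int] = None,
) -> List[str]:
    """Resample observed values, add scaled noise, and format as prompts."""
    rng = random.Random(seed)
    base = rng.choices(start_col_dist, k=n_samples)
    # Assumption: standard normal noise scaled by `noise`.
    noisy = [value + rng.gauss(0.0, 1.0) * noise for value in base]
    # Assumed prompt format, with a fixed number of decimal places.
    return [f"{start_col} is {value:.{decimal_places}f}," for value in noisy]


ages = [23.0, 35.5, 41.0, 58.25]
print(sample_continuous_prompts("age", ages, 3, noise=0.01, decimal_places=2, seed=0))
```

Setting `noise=0.0` reproduces the observed values exactly, which is a quick way to check the formatting in isolation.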
class RandomStart
Random Starting Features
Random column names are used as the starting point. This is useful when the distribution of no column is known.
Attributes:
all_columns (List[str]): Names of all columns
method RandomStart.__init__
__init__(tokenizer, all_columns: List[str])
Initializes the Random Start
Args:
tokenizer: Tokenizer from the HuggingFace library
all_columns: Names of all columns
method RandomStart.get_start_tokens
get_start_tokens(n_samples)
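A random start needs only the column names. The sketch below picks a column name for each sample and leaves the value for the model to fill in. Uniform sampling with replacement, the prompt format `"<column> is "`, and the name `sample_random_prompts` are assumptions for illustration, and tokenization is again omitted.

```python
import random
from typing import List, Optional


def sample_random_prompts(
    all_columns: List[str],
    n_samples: int,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick a column name per sample and format it as an open prompt."""
    rng = random.Random(seed)
    # Assumption: columns are drawn uniformly, with replacement.
    chosen = rng.choices(all_columns, k=n_samples)
    # Assumed prompt format: the value is left for the model to generate.
    return [f"{column} is " for column in chosen]


columns = ["age", "sex", "income"]
print(sample_random_prompts(columns, 4, seed=0))
```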
This file was automatically generated via lazydocs.