This project originally started as a school assignment for a Big Data class. The notebook presented here demonstrates the use of a large language model (LLM) to tackle a binary classification problem. Specifically, our objective is to predict whether a comment will receive a response or not.
To achieve this, we use an enriched dataset compiled from comments on Le Soleil’s Facebook posts. I will also share a separate notebook detailing the process of building this dataset. Additionally, I plan to publish another post explaining how to utilize simple feedforward neural networks or statistical models based on various comment features or the comment text itself.
Let’s dive in!
I prefer to set aside the cells that install system dependencies. They always produce a lot of useless gx3di3ce… You get it, right?
!pip install torchsampler
!pip install sacremoses
Now, let’s import some packages and have some fun!
import transformers, torch
import pandas as pd
import matplotlib.pyplot as plt
import multiprocessing as mp
import time
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from torch.utils.data import Dataset
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix
from torchsampler import ImbalancedDatasetSampler
from sklearn.model_selection import train_test_split
from tqdm import tqdm
from torch.optim.lr_scheduler import ReduceLROnPlateau
from IPython.display import clear_output
# warnings.filterwarnings('ignore')
device = 'cuda' if torch.cuda.is_available() else 'cpu'
mp.cpu_count() # used to set num_workers for the DataLoader, usually half of this value
And since I use Google colab, I mount my drive to load the datasets later.
# only if you are using Google Colab of course...
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In this section, we will explore the dataset used for our binary classification problem. The dataset has been divided into training and testing sets, with 70% of the data allocated for training and the remaining 30% for testing. This split ensures that we have a robust training set to build our model while retaining a sufficient portion of the data for evaluating the model’s performance.
Let’s load the train and the test sets.
dirpath = '/content/drive/MyDrive/DataSets/big_data/datasets' # specify here the path to the dataset
train = pd.read_csv(dirpath + '/split/train_dataset.csv', index_col=0)
test = pd.read_csv(dirpath + '/split/valid_dataset.csv', index_col=0)
A few statistics about our dataset. First of all, it contains almost a million rows.
dataset = pd.concat([train, test])
print(f'Dataset shape: {dataset.shape}')
Dataset shape: (935698, 68)
Secondly, the dataset is highly unbalanced. The following graph shows that only ~13% of comments have replies, by which I mean that those comments received at least one response.
dataset['target'].value_counts(normalize=True)
target
False 0.876036
True 0.123964
Name: proportion, dtype: float64
dataset['target'].value_counts().plot(kind='bar')
We then have two significant issues. First, training on the entire dataset would be extremely time-consuming, even with substantial computational resources. Second, the dataset is unbalanced, which makes accurate model training harder.
To address these issues, I opted for undersampling, taking an equal number of items from each class (see the equal_class_sampling helper below). This lets us run our experiments more efficiently and mitigates the data imbalance. Later in the notebook, we will explore the impact of the amount of training data on the model’s performance.
I wrote the CommentDataset class as a custom dataset for handling text data. It inherits from the Dataset class provided by PyTorch and is tailored for tokenizing text and preparing it, along with the corresponding labels, for use in a model.
class CommentDataset(Dataset):
    def __init__(self, message, labels, tokenizer):
        self.message = message
        self.labels = labels
        self.tokenizer = tokenizer

    def get_labels(self):
        return self.labels

    def __len__(self):
        return len(self.message)

    def __getitem__(self, idx):
        text = self.message[idx]
        label = self.labels[idx]

        inputs = self.tokenizer.encode_plus(text, None, add_special_tokens=True, padding='max_length', return_token_type_ids=True, truncation=True)

        return {
            'input_ids': torch.tensor(inputs['input_ids'], dtype=torch.long),
            'attention_mask': torch.tensor(inputs['attention_mask'], dtype=torch.long),
            'token_type_ids': torch.tensor(inputs['token_type_ids'], dtype=torch.long),
            'labels': torch.tensor(label, dtype=torch.float)
        }
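Just to illustrate what this class produces, here is a quick sanity check that is not part of the training flow; it uses the DistilCamemBERT checkpoint we will settle on later.
# Quick illustrative check of a single CommentDataset item (not part of the training flow).
tokz_demo = AutoTokenizer.from_pretrained("cmarkea/distilcamembert-base")
demo = CommentDataset(["Bonjour tout le monde !"], [1], tokz_demo)
item = demo[0]
# input_ids and attention_mask are padded to the tokenizer's max length (512 for CamemBERT).
print(item['input_ids'].shape, item['attention_mask'].shape, item['labels'])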
The train_model function trains the model using the provided training and testing dataloaders, tracking loss, accuracy, precision, recall, and a custom F2 score across epochs, and implements early stopping based on validation performance. The test_model function evaluates the trained model on a validation dataset, computing and printing metrics to assess its performance.
def train_model(model, train_dataloader, test_dataloader, history={}, num_epochs=5, lr=5e-5, early_stopping_patience=3, weight_decay=0.01):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.1, patience=1)  # ReduceLROnPlateau scheduler
    loss_fn = torch.nn.BCEWithLogitsLoss()  # Binary Cross-Entropy Loss

    history['train_loss'] = []
    history['train_accuracy'] = []
    history['train_precision'] = []
    history['train_recall'] = []
    history['test_accuracy'] = []
    history['test_precision'] = []
    history['test_recall'] = []
    history['epochs'] = []
    history['test_loss'] = []
    history['valid_score'] = []

    best_valid_score = 0
    early_stopping_counter = 0

    for epoch in range(num_epochs):
        model.train()
        train_loss = 0.0
        train_preds = []
        train_labels = []

        # Training loop
        for _, batch in enumerate(tqdm(train_dataloader, desc=f'Epoch {epoch + 1}/{num_epochs}')):
            optimizer.zero_grad()

            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            token_type_ids = batch['token_type_ids'].to(device)
            labels = batch['labels'].to(device)

            outputs = model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
            logits = outputs.logits.squeeze(1)
            loss = loss_fn(logits, labels)

            loss.backward()
            optimizer.step()

            train_loss += loss.item()
            # Note: this thresholds the raw logits at 0.5, which corresponds to a
            # probability cutoff of sigmoid(0.5) ≈ 0.62 rather than 0.5.
            train_preds.extend((logits > 0.5).int().tolist())
            train_labels.extend(labels.tolist())

        # Calculate metrics on training set
        train_accuracy = accuracy_score(train_labels, train_preds)
        train_precision = precision_score(train_labels, train_preds, average='binary')
        train_recall = recall_score(train_labels, train_preds, average='binary')

        # Evaluation loop
        model.eval()
        test_preds = []
        test_labels = []
        test_loss = 0.0

        with torch.no_grad():
            for batch in test_dataloader:
                input_ids = batch['input_ids'].to(device)
                attention_mask = batch['attention_mask'].to(device)
                token_type_ids = batch['token_type_ids'].to(device)
                labels = batch['labels'].to(device)

                outputs = model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
                logits = outputs.logits.squeeze(1)
                loss = loss_fn(logits, labels)

                test_loss += loss.item()
                test_preds.extend((logits > 0.5).int().tolist())
                test_labels.extend(labels.tolist())

        # Calculate metrics on test set
        test_accuracy = accuracy_score(test_labels, test_preds)
        test_precision = precision_score(test_labels, test_preds, average='binary')
        test_recall = recall_score(test_labels, test_preds, average='binary')
        tn, fp, fn, tp = confusion_matrix(test_labels, test_preds).ravel()
        valid_score = (tp / (tp + fp + fn)) * 100

        # Update learning rate scheduler
        scheduler.step(valid_score)

        history['epochs'].append(epoch + 1)

        history['train_loss'].append(train_loss / len(train_dataloader))
        history['train_accuracy'].append(train_accuracy)
        history['train_precision'].append(train_precision)
        history['train_recall'].append(train_recall)

        history['test_loss'].append(test_loss / len(test_dataloader))
        history['test_accuracy'].append(test_accuracy)
        history['test_precision'].append(test_precision)
        history['test_recall'].append(test_recall)
        history['valid_score'].append(valid_score)

        print(f"Epoch {epoch + 1}/{num_epochs}:")
        print(f"  Train Loss: {train_loss / len(train_dataloader)}")
        print(f"  Test Loss: {test_loss / len(test_dataloader)}")
        print(f"  Train Accuracy: {train_accuracy}")
        print(f"  Train Precision: {train_precision}")
        print(f"  Train Recall: {train_recall}")
        print(f"  Test Accuracy: {test_accuracy}")
        print(f"  Test Precision: {test_precision}")
        print(f"  Test Recall: {test_recall}")
        print(f"  Test F2: {valid_score}")

        # Early stopping
        if valid_score > best_valid_score:
            best_valid_score = valid_score
            early_stopping_counter = 0
        else:
            early_stopping_counter += 1

        if early_stopping_counter >= early_stopping_patience:
            print("Early stopping triggered!")
            break
def test_model(tokz, model, valid_data, history, device, bs=16):
    model.eval()
    test_preds = []
    test_labels = []
    test_loss = 0.0

    valid_dataset = CommentDataset(valid_data[0].to_numpy(), valid_data[1].astype(int).to_numpy(), tokz)
    test_dataloader = torch.utils.data.DataLoader(valid_dataset, batch_size=bs, shuffle=True)
    loss_fn = torch.nn.BCEWithLogitsLoss()  # Binary Cross-Entropy Loss

    with torch.no_grad():
        for batch in test_dataloader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            token_type_ids = batch['token_type_ids'].to(device)
            labels = batch['labels'].to(device)

            outputs = model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
            logits = outputs.logits.squeeze(1)
            loss = loss_fn(logits, labels)

            test_loss += loss.item()
            test_preds.extend((logits > 0.5).int().tolist())
            test_labels.extend(labels.tolist())

    test_accuracy = accuracy_score(test_labels, test_preds)
    test_precision = precision_score(test_labels, test_preds)
    test_recall = recall_score(test_labels, test_preds)
    tn, fp, fn, tp = confusion_matrix(test_labels, test_preds).ravel()
    history['valid_score'] = (tp / (tp + fp + fn)) * 100

    print("Test Metrics:")
    print(f"  Eval Accuracy: {test_accuracy}")
    print(f"  Eval Precision: {test_precision}")
    print(f"  Eval Recall: {test_recall}")
    print(f"  Eval F2: {history['valid_score']}")
The plot_history function visualizes the training and testing metrics (loss, accuracy, precision, recall, and F2 score) over the epochs using matplotlib. The evaluate_model function assesses the model by optionally plotting the training history and running test_model for the evaluation metrics. The get_loader function prepares the data loaders for the training and testing datasets, with optional under-sampling, and sets up the tokenizer and model for sequence classification. The equal_class_sampling function ensures a balanced class distribution by sampling an equal number of instances from each class in the dataset.
def plot_history(history):
    plt.figure(figsize=(17, 6))

    epochs = history['epochs']
    train_losses = history['train_loss']
    test_loss = history['test_loss']
    train_accuracies = history['train_accuracy']
    test_accuracies = history['test_accuracy']
    train_precisions = history['train_precision']
    test_precisions = history['test_precision']
    train_recall = history['train_recall']
    test_recall = history['test_recall']
    valid_score = history['valid_score']

    plt.subplot(1, 5, 1)
    plt.plot(epochs, train_losses, label='Training Loss')
    plt.plot(epochs, test_loss, label='Test Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.title('Training Loss')

    plt.subplot(1, 5, 2)
    plt.plot(epochs, train_accuracies, label='Training Accuracy')
    plt.plot(epochs, test_accuracies, label='Test Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.title('Accuracy')
    plt.legend()

    plt.subplot(1, 5, 3)
    plt.plot(epochs, train_precisions, label='Training Precision')
    plt.plot(epochs, test_precisions, label='Test Precision')
    plt.xlabel('Epochs')
    plt.ylabel('Precision')
    plt.title('Precision')
    plt.legend()

    plt.subplot(1, 5, 4)
    plt.plot(epochs, train_recall, label='Training Recall')
    plt.plot(epochs, test_recall, label='Test Recall')
    plt.xlabel('Epochs')
    plt.ylabel('Recall')
    plt.title('Recall')
    plt.legend()

    plt.subplot(1, 5, 5)
    plt.plot(epochs, valid_score, label='Training F2')
    plt.xlabel('Epochs')
    plt.ylabel('F2')
    plt.title('F2')
    plt.legend()

    plt.tight_layout()
    plt.show()
def evaluate_model(tokz, model, valid_data, history, device, bs=16, plot_train=True):
    if plot_train:
        plot_history(history)
    test_model(tokz, model, valid_data, history, device, bs=bs)
def get_loader(model_nm, dataset, bs=100, under_sample=True, num_class=1, use_pad_token=True, use_special_pad_token=False, num_workers=2):
    X_train, y_train, X_test, y_test = dataset['X_train'], dataset['y_train'], dataset['X_test'], dataset['y_test']
    tokz = AutoTokenizer.from_pretrained(model_nm)
    model = AutoModelForSequenceClassification.from_pretrained(model_nm, num_labels=num_class)

    if len(X_train) == 0:
        return model, tokz, None, None

    if use_pad_token:
        tokz.pad_token = tokz.eos_token
    if use_special_pad_token:
        tokz.add_special_tokens({'pad_token': '[PAD]'})
        model.resize_token_embeddings(len(tokz))

    train_dataset = CommentDataset(X_train.to_numpy(), y_train.astype(int).to_numpy(), tokz)
    test_dataset = CommentDataset(X_test.to_numpy(), y_test.astype(int).to_numpy(), tokz)

    if under_sample:
        train_loader = torch.utils.data.DataLoader(train_dataset, sampler=ImbalancedDatasetSampler(train_dataset), batch_size=bs, num_workers=num_workers, pin_memory=True)
        test_loader = torch.utils.data.DataLoader(test_dataset, shuffle=True, batch_size=bs, num_workers=num_workers, pin_memory=True)
    else:
        train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=bs, shuffle=True, num_workers=num_workers, pin_memory=True)
        test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=bs, shuffle=True, num_workers=num_workers, pin_memory=True)

    return model, tokz, train_loader, test_loader
def equal_class_sampling(input_features, target_labels, num_samples):
    num_classes = len(target_labels.unique())
    num_samples_per_class = num_samples // num_classes
    dataset = pd.DataFrame({'input': input_features, 'target': target_labels})
    grouped = dataset.groupby(['target'])
    sampled_elements = grouped.apply(lambda x: x.sample(min(num_samples_per_class, len(x))))
    return sampled_elements['input'], sampled_elements['target']
To evaluate our model, we will primarily use recall and a custom metric that we will call F2. Recall measures the model’s ability to correctly identify all relevant instances (true positives) in the dataset and is calculated as TP / (TP + FN), where TP stands for true positives and FN for false negatives.
The custom F2 metric gives a more comprehensive evaluation by balancing detection of the positive class against both types of error. It is calculated as TP / (TP + FN + FP), where FP stands for false positives (this quantity is also known as the threat score, or the Jaccard index of the positive class, rather than the standard F-beta score). By penalizing false negatives and false positives alike, it checks that the model not only identifies positive instances accurately but also avoids classifying negative instances as positive, which gives a more nuanced assessment of the model’s overall effectiveness.
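To make this concrete, here is a tiny worked example with made-up counts:
# Worked example of recall vs. the custom F2, with illustrative (made-up) counts.
tp, fp, fn = 60, 25, 15

recall = tp / (tp + fn)   # 60 / 75  = 0.80: most positives are caught
f2 = tp / (tp + fn + fp)  # 60 / 100 = 0.60: the 25 false positives drag the score down

print(f"Recall: {recall:.2f}, F2: {f2:.2f}")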
For simplicity’s sake, I’ll use the distilled version of the CamemBERT model here, but you’re free to use any of the models below.
models = {
    'bert': "bert-base-uncased",
    'gpt': "distilgpt2",
    'flau': "flaubert/flaubert_base_uncased",
    'cmb': "cmarkea/distilcamembert-base",
}
The next code might look a bit confusing at first, but let’s break it down step by step.
What we’re doing here is creating smaller subsets from our original training and test sets to build training, validation and test samples.
# Grab the 'message' and 'target' columns from the training set and store them in X_train and y_train
X_train, y_train = train['message'], train['target']
Next, we split off a small portion of the original test set to use as our validation set. This is like keeping a small piece of pie aside before sharing the rest.
X_valid_sample, X_valid, y_valid_sample, y_valid = train_test_split(test['message'], test['target'], test_size=0.95, random_state=42)
X_valid_sample.shape, X_valid.shape, y_valid_sample.shape, y_valid.shape
((9357,), (177783,), (9357,), (177783,))
Then, we balance our training data. Imagine we have 6000 rows, and we want to make sure we have an equal number of positive and negative samples—3000 of each.
X_train_sample, y_train_sample = equal_class_sampling(X_train, y_train, 6000)
Finally, we take another small slice of the test set to build our final test sample. Think of this as taking a tiny bit more of that pie for a taste test.
# Split the remaining validation set to create a small test sample (2% of X_valid and y_valid)
_, X_test_sample, _, y_test_sample = train_test_split(X_valid, y_valid, test_size=0.02, random_state=42)
X_test_sample.shape
(3556,)
By doing this, we ensure our model has balanced and representative data for training, validation, and final testing.
Now, we train, validate, and evaluate the model on the test set.
history = {}  # Initialize an empty dictionary to store training and evaluation history.
BATCH_SIZE = 16  # Define the batch size for data loaders.
LEARNING_RATE = 1e-4  # Set the learning rate for the optimizer.
weight_decay = 1e-2  # Set the weight decay (L2 regularization) for the optimizer.
EPOCHS = 10  # Set the number of epochs for training.

# Prepare the dataset dictionary with training and testing samples.
data = {'X_train': X_train_sample, 'y_train': y_train_sample, 'X_test': X_test_sample, 'y_test': y_test_sample}

# Get the model, tokenizer, training data loader, and testing data loader.
model, tokz, train_loader, test_loader = get_loader(models['cmb'], data, bs=BATCH_SIZE, use_special_pad_token=True, num_workers=8)

# Move the model to the specified device (CPU or GPU).
model.to(device)

start_time = time.time()  # Record the start time for training.
train_model(model, train_loader, test_loader, history, num_epochs=EPOCHS, lr=LEARNING_RATE, early_stopping_patience=2, weight_decay=weight_decay)  # Train the model.
end_time = time.time()  # Record the end time for training.
execution_time = end_time - start_time  # Calculate the execution time for training.

# Clear the output (useful in Jupyter notebooks to clear previous outputs).
clear_output()
print("Execution time:", execution_time, "seconds")  # Print the execution time for training.

start_time = time.time()  # Record the start time for evaluation.
# Evaluate the model on the validation dataset and optionally plot the training history.
evaluate_model(tokz, model, (X_valid_sample, y_valid_sample), history, device, bs=BATCH_SIZE * 2, plot_train=True)
end_time = time.time()  # Record the end time for evaluation.
execution_time = end_time - start_time  # Calculate the execution time for evaluation.
print("Execution time:", execution_time, "seconds")  # Print the execution time for evaluation.
Execution time: 363.3319444656372 seconds
Test Metrics:
Eval Accuracy: 0.6475366036122688
Eval Precision: 0.23312101910828026
Eval Recall: 0.7605985037406484
Eval F2: 21.7184903868977
Execution time: 84.7105667591095 seconds
Based on the model’s performance over two epochs, we can make the following analysis: recall is high (~0.76) but precision is low (~0.23), so the model flags far more comments as likely to receive a reply than actually do.
This suggests a problem with our model, because we need a model that performs well on unseen data with few errors.
Well, imagine that we will deploy our model in a real-world application. We don’t want to miss comments that might receive a response because we could use them to increase traffic on our site or social media. In that case, a model that detects positive instances well with minimal false positives is acceptable. However, our model currently has many false positives, which can be problematic.
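As a quick sanity check (my own verification, reusing the numbers printed above), the custom F2 can be derived directly from precision and recall:
# F2 = TP / (TP + FP + FN) can be rewritten with precision p = TP / (TP + FP)
# and recall r = TP / (TP + FN) as: F2 = 1 / (1/p + 1/r - 1)
p, r = 0.23312101910828026, 0.7605985037406484  # printed by evaluate_model above
f2 = 1 / (1 / p + 1 / r - 1) * 100
print(f"F2 = {f2:.4f}%")  # ≈ 21.72, matching the reported Eval F2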
But let’s try with more data in our training set to see the impact on the model’s performance. We will initialize a new model and train it on 10,000 comments.
X_train, y_train = train['message'], train['target']
X_train_sample, y_train_sample = equal_class_sampling(X_train, y_train, 10000)
_, X_test_sample, _, y_test_sample = train_test_split(X_valid, y_valid, test_size=0.02, random_state=42)
history = {}
BATCH_SIZE = 16
LEARNING_RATE = 1e-4
weight_decay = 1e-2
EPOCHS = 10
data = {'X_train': X_train_sample, 'y_train': y_train_sample, 'X_test': X_test_sample, 'y_test': y_test_sample}
model, tokz, train_loader, test_loader = get_loader(models['cmb'], data, bs=BATCH_SIZE, use_special_pad_token=True, num_workers=8)

model.to(device)

start_time = time.time()
train_model(model, train_loader, test_loader, history, num_epochs=EPOCHS, lr=LEARNING_RATE, early_stopping_patience=2, weight_decay=weight_decay)
end_time = time.time()
execution_time = end_time - start_time

clear_output()
print("Execution time:", execution_time, "seconds")

start_time = time.time()
evaluate_model(tokz, model, (X_valid_sample, y_valid_sample), history, device, bs=BATCH_SIZE * 2, plot_train=True)
end_time = time.time()
execution_time = end_time - start_time
print("Execution time:", execution_time, "seconds")
Execution time: 1122.6102643013 seconds
Test Metrics:
Eval Accuracy: 0.6856898578604254
Eval Precision: 0.25793871866295265
Eval Recall: 0.769742310889443
Eval F2: 23.946211533488494
Execution time: 84.70709776878357 seconds
The F2 score improved from 21% to 23%. But can we conclude that the larger training set is what drove this improvement?
Let’s try it with a larger training set.
X_train, y_train = train['message'], train['target']
X_train_sample, y_train_sample = equal_class_sampling(X_train, y_train, 15000)
_, X_test_sample, _, y_test_sample = train_test_split(X_valid, y_valid, test_size=0.02, random_state=42)
history = {}
BATCH_SIZE = 32
LEARNING_RATE = 1e-4
weight_decay = 1e-2
EPOCHS = 10
data = {'X_train': X_train_sample, 'y_train': y_train_sample, 'X_test': X_test_sample, 'y_test': y_test_sample}
model, tokz, train_loader, test_loader = get_loader(models['cmb'], data, bs=BATCH_SIZE, use_special_pad_token=True, num_workers=8)

model.to(device)

start_time = time.time()
train_model(model, train_loader, test_loader, history, num_epochs=EPOCHS, lr=LEARNING_RATE, early_stopping_patience=2, weight_decay=weight_decay)
end_time = time.time()
execution_time = end_time - start_time

clear_output()
print("Execution time:", execution_time, "seconds")

start_time = time.time()
evaluate_model(tokz, model, (X_valid_sample, y_valid_sample), history, device, bs=BATCH_SIZE * 2, plot_train=True)
end_time = time.time()
execution_time = end_time - start_time
print("Execution time:", execution_time, "seconds")
Execution time: 1914.5363965034485 seconds
Test Metrics:
Eval Accuracy: 0.7079192048733568
Eval Precision: 0.26705237515225333
Eval Recall: 0.7290108063175395
Eval F2: 24.293628808864266
Execution time: 85.33442664146423 seconds
On the test set, the model’s F2 score improves again, from 23% to 24%.
The increase in training set size helps the model generalize better to the test data, reducing overfitting and improving its ability to balance precision and recall.
Dataset Analysis is Crucial: Always begin by analyzing your dataset. Understanding the distribution and characteristics of your data helps in making informed decisions about model training and evaluation. In scenarios with limited resources, generating more data for the underrepresented class might not be feasible. Instead, sampling equal numbers of comments from each class for the training set can help reduce bias towards the overrepresented class.
Balanced Training, Unbalanced Validation: While balancing the training set by equal sampling is important to reduce bias, the validation set should remain unbalanced. This approach ensures that the model’s performance is evaluated in a realistic manner, reflecting its ability to generalize to the true distribution of the data.
Resource-Based Training Strategy: Define your training strategy based on the available resources. When computational power or time is limited, working with a smaller, balanced sample of the dataset is a practical approach. This allows for iterative experimentation and tuning without the overhead of processing the entire dataset.
Problem-Specific Metrics: Choose evaluation metrics that align with your problem’s objectives. For instance, in this scenario, the custom F2 score (F2 = TP / (TP + FP + FN)) is used to evaluate model performance by balancing the detection of the positive class against both kinds of error.
Initial Model Performance: After the first round of training, the F2 score indicates that the model prioritizes recall over precision. This is evidenced by the high recall but low precision on the test set.
Impact of False Positives: High false positives can be problematic in real-world applications. They can lead to inefficient use of resources and missed opportunities to engage with truly relevant comments. This highlights the need for a balance between precision and recall.
Training Set Size and Generalization: Increasing the size of the training set helps the model generalize better to the test data. A larger training set reduces overfitting and enhances the model’s ability to balance precision and recall effectively. This results in improved overall performance and more reliable predictions.
Choosing the Right Model: Select a model that is suitable for your specific problem. For instance, since the dataset consists of French text, using CamemBERT, a model specifically designed for the French language, is an appropriate choice.
Hyperparameter Tuning: Finding the optimal hyperparameters for your model is crucial and often involves extensive experimentation. Before finalizing the model, numerous combinations were tested to identify the best-performing configuration. Hyperparameter tuning is more of an art than a strict recipe, requiring intuition and experience to achieve the best results.
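To give an idea of what that experimentation can look like in practice, here is a minimal sketch of a sweep built on the helpers above; the grid values are illustrative, not the combinations I actually tested:
# Illustrative hyperparameter sweep reusing this notebook's helpers.
# The grid values below are examples, not the combinations actually tested.
best_config, best_f2 = None, 0
for lr in [5e-5, 1e-4, 2e-4]:
    for bs in [16, 32]:
        history = {}
        model, tokz, train_loader, test_loader = get_loader(models['cmb'], data, bs=bs, use_special_pad_token=True)
        model.to(device)
        train_model(model, train_loader, test_loader, history, num_epochs=3, lr=lr, early_stopping_patience=2)
        score = max(history['valid_score'])
        if score > best_f2:
            best_config, best_f2 = (lr, bs), score
print("Best (lr, batch size):", best_config, "with F2:", best_f2)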