DEPOTx

Devulgarization of Polish Texts

DEPOTx is a text style transfer framework for replacing vulgar expressions in Polish utterances with their non-vulgar equivalents while preserving the main characteristics of the text. The framework contains three pre-trained language models (GPT-2, GPT-3 and T-5) trained on a newly created parallel corpus of sentences containing vulgar expressions and their equivalents. The resulting models are evaluated by checking style transfer accuracy, content preservation and language quality.

Download

Requirements

Required packages can be installed by running:

pip3 install -r requirements.txt

To run the evaluation scripts, also install Przetak and update the value of the GOPATH variable in the evaluation/__init__.py file.

Usage

The notebooks/ directory contains examples of inference and evaluation.

Evaluation can also be run from the command line:

python3 -m evaluation -o <original texts> -t <transferred texts>
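The same command can be driven from a Python script, for example when evaluating several model outputs in a loop. The following is a minimal sketch, not part of DEPOTx: the helper names, the file names and the `repo_root` parameter are hypothetical, and the expected format of the input files is not specified here. It only wraps the `-o` and `-t` options shown above.

```python
import subprocess
import sys
from pathlib import Path


def build_evaluation_command(original_path, transferred_path):
    """Build the argument list equivalent to `python3 -m evaluation -o ... -t ...`."""
    # sys.executable is used instead of a hard-coded "python3" so that the
    # evaluation runs in the same environment as the calling script.
    return [
        sys.executable, "-m", "evaluation",
        "-o", str(original_path),
        "-t", str(transferred_path),
    ]


def run_evaluation(original_path, transferred_path, repo_root="."):
    """Run the evaluation module from `repo_root` and return whatever it prints."""
    # Resolve to absolute paths first, because the subprocess runs in repo_root.
    original = Path(original_path).resolve()
    transferred = Path(transferred_path).resolve()
    for path in (original, transferred):
        if not path.is_file():
            raise FileNotFoundError(f"Input file not found: {path}")
    result = subprocess.run(
        build_evaluation_command(original, transferred),
        cwd=repo_root,        # ASSUMPTION: the `evaluation` package lives here
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

A call such as `run_evaluation("original.txt", "transferred.txt", repo_root="path/to/repository")` would then return the script's standard output as a string.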

Training details

All models were trained with the AdamW optimizer on an NVIDIA P100 GPU.

The following hyperparameter values were used to fine-tune the models:

GPT-2

GPT-3

T-5 base

1st step:

2nd step:

T-5 large

1st step:

2nd step:

Evaluation

The performance of the models was assessed with automatic metrics in three categories: style transfer accuracy, content preservation and language quality.

The models were evaluated against two baselines, Duplicate and Delete.

The overall performance of the models was assessed using the geometric mean (GM) of the CS, STA and PPL scores.
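As an illustration of how three scores can be combined with a geometric mean, here is a minimal sketch. It rests on an assumption that this page does not confirm: since lower perplexity is better, PPL is taken to enter the mean as 1/PPL. The function names are hypothetical, and the sketch is not claimed to reproduce the GM values reported in the results, whose exact normalization is not given here.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a non-empty sequence of positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    # Averaging logarithms is numerically safer than multiplying many factors.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def overall_score(sta, cs, ppl):
    """Combine STA, CS and PPL into one score.

    ASSUMPTION: perplexity is inverted (1 / ppl) so that, as with STA and CS,
    a larger value means a better result. The original scaling may differ.
    """
    return geometric_mean([sta, cs, 1.0 / ppl])
```

Because the mean is multiplicative, a low value in any single category pulls the overall score down sharply, which an arithmetic mean would not do to the same degree.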

Results

Method     STA   CS    WO    BLEU  PPL     GM
Duplicate  0.38  1.00  1.00  1.00  146.86  1.78
Delete     1.00  0.93  0.84  0.92  246.80  4.14
GPT-2      0.90  0.86  0.71  0.86  258.44  3.71
GPT-3      0.88  0.92  0.79  0.92  359.12  3.58
T-5 base   0.90  0.97  0.85  0.95  187.03  4.10
T-5 large  0.93  0.97  0.86  0.95  170.02  4.31
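For readers who want to compare the methods programmatically, the scores listed above can be held as plain Python data. The sketch below is only a convenience: the `RESULTS` dictionary and the `rank_by` helper are hypothetical names and not part of DEPOTx. The numbers are copied from the results above.

```python
# Scores copied from the results above; keys are the column names used there.
RESULTS = {
    "Duplicate": {"STA": 0.38, "CS": 1.00, "WO": 1.00, "BLEU": 1.00, "PPL": 146.86, "GM": 1.78},
    "Delete":    {"STA": 1.00, "CS": 0.93, "WO": 0.84, "BLEU": 0.92, "PPL": 246.80, "GM": 4.14},
    "GPT-2":     {"STA": 0.90, "CS": 0.86, "WO": 0.71, "BLEU": 0.86, "PPL": 258.44, "GM": 3.71},
    "GPT-3":     {"STA": 0.88, "CS": 0.92, "WO": 0.79, "BLEU": 0.92, "PPL": 359.12, "GM": 3.58},
    "T-5 base":  {"STA": 0.90, "CS": 0.97, "WO": 0.85, "BLEU": 0.95, "PPL": 187.03, "GM": 4.10},
    "T-5 large": {"STA": 0.93, "CS": 0.97, "WO": 0.86, "BLEU": 0.95, "PPL": 170.02, "GM": 4.31},
}


def rank_by(metric, higher_is_better=True):
    """Return method names ordered from best to worst on one metric."""
    return sorted(
        RESULTS,
        key=lambda method: RESULTS[method][metric],
        reverse=higher_is_better,
    )


if __name__ == "__main__":
    print("By GM: ", rank_by("GM"))
    # Perplexity is the one column where a lower value is better.
    print("By PPL:", rank_by("PPL", higher_is_better=False))
```

Ranking by GM puts T-5 large first, ahead of the Delete baseline and T-5 base.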

Licence

CC-BY-NC 4.0

Citation

Klamra C., Wojdyga G., Żurowski S., Rosalska P., Kozłowska M., Ogrodniczuk M. (2022). Devulgarization of Polish Texts Using Pre-trained Language Models. In: Groen, D., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2022. Lecture Notes in Computer Science, vol. 13351, pp. 49–55. Springer, Cham.

last edited 2022-07-15 20:30:32 by MaciejOgrodniczuk