StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples

25 Nov 2024

We introduce STYLE DISTANCE, a novel method for training style representations that effectively separate writing style from content by leveraging a synthetic dataset of paraphrases with controlled style variations. Our approach achieves superior content-independent embeddings, validated through human and automated evaluations, and outperforms existing methods on real-world benchmarks.

Addressing Content Leakage in Style Representations

Traditional methods for developing style representations often encounter the issue of content leakage, where embeddings inadvertently capture content-specific information, compromising their effectiveness. To tackle this challenge, we introduce StyleDistance, a novel approach that leverages large language models to generate synthetic datasets comprising near-exact paraphrases with controlled stylistic variations. This methodology enables precise contrastive learning across 40 distinct style features, significantly enhancing the content-independence of style embeddings.

Key Contributions

Synthetic Dataset Generation: We created a dataset of near-exact paraphrases, each exhibiting controlled stylistic variations, facilitating precise contrastive learning.
Enhanced Content-Independence: By focusing on style features while controlling for content, StyleDistance achieves superior separation between style and content in embeddings.
Robust Evaluation: Our approach was rigorously evaluated through both human assessments and automated benchmarks, demonstrating its effectiveness in real-world applications.

Access the Model

For those interested in exploring or utilizing our model, it is available at styledistance
Synthetic Dataset is available at synthstel
Paper is available at ArXiv Link

StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples

Addressing Content Leakage in Style Representations

Key Contributions

Access the Model

Jiacheng Zhu
朱家成

Error

Addressing Content Leakage in Style Representations

Key Contributions

Access the Model

Templates (for web app):

Error