
WordSignature

A personal stylometric project that analyzes my writing patterns through word frequencies and other linguistic features. By tracking these patterns over time, I aim to understand how my writing style evolves and, potentially, to build an authorship checker. The project was inspired by the authorship analysis of The Federalist Papers in Kosuke Imai's Quantitative Social Science.

Python NLTK Chart.js python-docx

Methodology

Data Collection

  • Process writing samples from .txt and .docx files
  • Text preprocessing: lowercase conversion, tokenization, and stop-word removal (NLTK's standard English stop-word list, extended with modal verbs, discourse markers, adverbs such as 'usually', 'generally' and 'typically', and other common function words)
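The loading and preprocessing steps above could look roughly like the sketch below. It is not the project's actual code. To keep it runnable without downloading NLTK data, a small hypothetical stop-word set stands in for NLTK's English list (`nltk.corpus.stopwords.words("english")`), and a regex tokenizer stands in for `nltk.word_tokenize`. The `.docx` branch uses python-docx's `Document(...).paragraphs`.

```python
import re
from pathlib import Path

# Hypothetical, abbreviated stand-in for NLTK's English stop-word list.
BASE_STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that", "this"}
# Hypothetical examples of the extra categories described above.
EXTRA_STOP_WORDS = {
    "can", "could", "may", "might", "must", "should", "would", "will",  # modal verbs
    "however", "therefore", "moreover", "furthermore",  # discourse markers
    "usually", "generally", "typically",  # hedging adverbs
}
STOP_WORDS = BASE_STOP_WORDS | EXTRA_STOP_WORDS


def load_text(path):
    """Return the plain text of a .txt or .docx file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")
    if suffix == ".docx":
        # Imported lazily so that .txt files work even without python-docx installed.
        from docx import Document
        return "\n".join(p.text for p in Document(str(path)).paragraphs)
    raise ValueError(f"Unsupported file type: {suffix}")


def preprocess(text):
    """Lowercase, tokenize into words (keeping apostrophes), and drop stop words."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("The results usually show that it can vary."))
# → ['results', 'show', 'vary']
```

Swapping in the real NLTK list is a one-line change to `BASE_STOP_WORDS`, once the `stopwords` corpus has been downloaded.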

Analysis Techniques

  • Word frequency analysis and distribution
  • Readability metrics (Flesch Reading Ease, Gunning Fog Index)
  • Style analysis (vocabulary sophistication, sentence variety)
  • Sentiment analysis (polarity and subjectivity)
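The two readability metrics are simple formulas over word, sentence and syllable counts. Flesch Reading Ease is 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words), and the Gunning Fog Index is 0.4 × ((words/sentences) + 100 × (complex words/words)), where a complex word has three or more syllables. The sketch below is an illustration, not the project's implementation. Its syllable counter is a rough vowel-group heuristic, its sentence splitter is naive, and it skips Gunning's exclusions for proper nouns and common suffixes, so its scores will differ somewhat from dedicated tools.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text):
    # Naive sentence split on ., ! or ?; nltk.sent_tokenize copes better with abbreviations.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = [count_syllables(w) for w in words]
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    complex_ratio = sum(1 for s in syllables if s >= 3) / len(words)
    return {
        # Higher scores mean easier text.
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Roughly the years of schooling needed to follow the text on first reading.
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_ratio),
    }


print(readability("The cat sat on the mat."))
```

For that one-sentence example (six one-syllable words), the formulas give a Flesch score of about 116.1 and a Fog index of 2.4, which is consistent with very easy text.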

Document Analysis

⚠️ Flash Attention

You can analyse your own documents by clicking the 'Quick Analysis' tab.

An analysis of assorted writing samples collected over time. These results may be updated periodically as new documents are added to the corpus. Notes written for a sociology exam feature prominently.

Total Words Analyzed: 15,432
Unique Words: 2,847
Avg. Words per Sentence: 18.6
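The three summary figures are straightforward counts. The sketch below shows one way to compute them. It assumes, which the page does not state, that words are counted as lowercased tokens before stop-word removal and that sentences end at ., ! or ?. The project's own counting rules may differ, so this would not necessarily reproduce the exact numbers.

```python
import re


def summary_stats(text):
    """Total words, unique words, and average words per sentence.

    Assumption: counts use lowercased tokens before stop-word removal.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    avg = round(len(words) / len(sentences), 1) if sentences else 0.0
    return {
        "total_words": len(words),
        "unique_words": len(set(words)),
        "avg_words_per_sentence": avg,  # rounded to one decimal place
    }


print(summary_stats("The cat sat. The cat ran away!"))
# → {'total_words': 7, 'unique_words': 5, 'avg_words_per_sentence': 3.5}
```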

Most Frequent Terms
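A most-frequent-terms list can be built by counting tokens that survive stop-word removal with `collections.Counter.most_common`. The sketch below also reports each term's rate per 1,000 kept words, which makes documents of different lengths comparable. The stop-word set is a hypothetical, abbreviated stand-in for the extended NLTK list described in the preprocessing notes.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word set for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that",
              "usually", "generally", "typically"}


def top_terms(text, n=10):
    """Return (term, count, rate per 1,000 kept words) for the n most frequent terms."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter(kept)
    total = len(kept)
    # most_common() returns [] for empty input, so total is never 0 inside the loop.
    return [(term, count, 1000 * count / total) for term, count in counts.most_common(n)]


print(top_terms("Society shapes the self, and the self shapes society. Society!", n=2))
```

The resulting (term, count) pairs are the kind of data a Chart.js bar chart could display.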

In future I may add more writing samples, more features, and possibly more analysis techniques.

In an ideal world, I would be able to:

  • Implement an authorship verification system
  • Add sentiment analysis across different writing contexts
  • Develop interactive visualizations for exploring the data