A personal stylometric project that analyzes my writing patterns through word frequencies and linguistic features. By tracking these patterns over time, I aim to understand how my writing style evolves and, potentially, to develop an authorship checker. Inspired by the authorship analysis of The Federalist Papers in Kosuke Imai's Quantitative Social Science.
Python · NLTK · Chart.js · python-docx
Methodology
Data Collection
Process writing samples from .txt and .docx files
Text preprocessing: lowercase conversion, tokenization, and stop word removal (NLTK's standard list plus modal verbs, discourse markers, adverbs such as 'usually', 'generally', and 'typically', and other common function words)
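A minimal sketch of that preprocessing step, assuming NLTK's 'punkt' tokenizer and 'stopwords' corpus; the extra stop words listed here are illustrative rather than the project's exact list:

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)      # tokenizer model (one-time download)
nltk.download("stopwords", quiet=True)  # standard English stop word list

# Illustrative extras: modal verbs, discourse markers, common adverbs
EXTRA_STOPS = {"would", "could", "should", "however", "therefore",
               "usually", "generally", "typically"}

def preprocess(text):
    tokens = word_tokenize(text.lower())                   # lowercase + tokenize
    stops = set(stopwords.words("english")) | EXTRA_STOPS  # NLTK standard + additions
    return [t for t in tokens if t.isalpha() and t not in stops]  # drop punctuation and stop words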
Analysis Techniques
Word frequency analysis and distribution
Readability metrics (Flesch Reading Ease, Gunning Fog Index)
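Both readability scores have standard published formulas; the sketch below computes them with a naive vowel-group syllable counter, which is only an approximation and not necessarily what this project uses:

import re

def count_syllables(word):
    # Rough heuristic: count vowel groups (approximation only)
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    w, s = len(words), max(1, len(sentences))
    if w == 0:
        raise ValueError("empty text")
    syllables = sum(count_syllables(t) for t in words)
    complex_words = sum(1 for t in words if count_syllables(t) >= 3)
    flesch = 206.835 - 1.015 * (w / s) - 84.6 * (syllables / w)  # Flesch Reading Ease
    fog = 0.4 * ((w / s) + 100 * (complex_words / w))            # Gunning Fog Index
    return flesch, fog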
You can analyze your own documents via the 'Quick Analysis' tab.
Below is an analysis of assorted writing samples collected over time; the results may be updated periodically as new documents are added to the corpus. Notes for a sociology exam feature prominently.
Total Words Analyzed: 15,432
Unique Words: 2,847
Avg. Words per Sentence: 18.6
Most Frequent Terms (chart)
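The page does not show how the 'Most Frequent Terms' chart is populated; one plausible approach is to count the preprocessed tokens and hand the top few to Chart.js as JSON labels and data (the helper below is hypothetical):

import json
from collections import Counter

def top_terms_json(tokens, n=10):
    counts = Counter(tokens).most_common(n)  # (term, frequency) pairs, highest first
    return json.dumps({"labels": [term for term, _ in counts],
                       "data": [freq for _, freq in counts]})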
Upload your own document for instant stylometric analysis. Supported formats: .docx, .txt, .pdf
Shorter texts provide less context for accurate sentiment analysis, and individual words or very short phrases can easily be misinterpreted. Longer texts let the analyzer better capture overall tone, context, nuanced expressions, and the balance of positive and negative elements. For my own analysis I have uploaded 28,000 words of written text.
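The Quick Analysis results include polarity and subjectivity scores, which match what TextBlob's sentiment API returns; the library actually used isn't named here, so treat this as one plausible way those numbers are produced:

from textblob import TextBlob

def sentiment_scores(text):
    sentiment = TextBlob(text).sentiment
    # polarity runs from -1 (negative) to +1 (positive);
    # subjectivity runs from 0 (objective) to 1 (subjective)
    return sentiment.polarity, sentiment.subjectivity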
Data Handling & Privacy Information
User uploads a file
File is temporarily saved
Analysis is performed
Results are sent back to the user
Temporary file is immediately deleted
No data persists on the server
# Technical implementation using Python's tempfile:
import os
import tempfile

# Save the uploaded file to a named temporary path
temp_file = tempfile.NamedTemporaryFile(delete=False)
file.save(temp_file.name)
temp_file.close()
try:
    # Analyze the document
    results = analyze_document(temp_file.name)
finally:
    # Clean up: delete the temporary file
    if os.path.exists(temp_file.name):
        os.unlink(temp_file.name)
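A note on the pattern above: NamedTemporaryFile(delete=False) keeps the file on disk after it is closed, so analyze_document can reopen it by path (some platforms will not let a second handle open a still-open temporary file), and the try/finally block guarantees the file is removed even if the analysis raises an exception.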
The Quick Analysis results panel reports:
Words, Unique Words, Avg. Sentence Length
Readability: Flesch Reading Ease, Gunning Fog Index
Style Characteristics: Vocabulary Sophistication, Sentence Variety
Sentiment Analysis: Polarity, Subjectivity
Metrics Guide
Flesch Reading Ease (0-100)
0-30: Very difficult (Academic/Scientific)
30-50: Difficult (College level)
50-60: Fairly difficult
60-70: Standard
70-80: Fairly easy
80-90: Easy
90-100: Very easy
Gunning Fog Index
17+: Post-graduate level
14-17: College/University level
12-14: High school level
10-12: General audience
8-10: Conversational English