Decisive Word Identification for Text Classification
Understanding why a machine learning model classifies a text the way it does remains a significant challenge, especially in domains such as law, medicine, and content moderation, where transparency is crucial. Existing methods often highlight influential words or features but do not show which specific words would actually change the classification if removed.
A More Direct Explanation Approach
One way to provide clearer explanations for text classification decisions is to identify decisive words: those whose removal alters the predicted class. For example, if a classifier labels a document as "sports-related," this approach would test whether removing the word "football" causes the document to be classified as "politics" instead. The process involves:
- Testing document variations with specific words or short phrases removed
- Tracking which changes make the prediction flip to another category
- Returning these minimal sets of words as the explanation
This gives users a concrete, testable account of which text segments drive the classification.
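The three steps above can be sketched for the single-word case as follows. This is a minimal illustration, not a definitive implementation: `toy_classifier` and its keyword weights are hypothetical stand-ins for a real model, and `decisive_words` is an assumed name for the search routine. Any function that maps a string to a label could be passed in its place.

```python
from typing import Callable, List, Tuple


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for a real model: sums keyword weights per class."""
    weights = {
        "sports": {"football": 2.0, "match": 1.0},
        "politics": {"election": 1.5, "vote": 1.0},
    }
    tokens = text.lower().split()
    scores = {
        label: sum(words.get(tok, 0.0) for tok in tokens)
        for label, words in weights.items()
    }
    # Return the label with the highest total weight.
    return max(scores, key=scores.get)


def decisive_words(
    text: str, predict: Callable[[str], str]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the original label and every single word whose removal flips it."""
    tokens = text.split()
    original = predict(text)
    flips = []
    for i, tok in enumerate(tokens):
        # Build the document variant with only token i removed.
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        new_label = predict(reduced)
        if new_label != original:
            flips.append((tok, new_label))
    return original, flips


label, flips = decisive_words(
    "The football match preceded the election", toy_classifier
)
print(label, flips)
```

With this toy model, removing "football" leaves the politics score higher than the sports score, so "football" is reported as decisive, while removing "match" alone does not change the label. The search costs one model call per token, which is why the single-word case is a reasonable starting point.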
Execution and Advantages
An initial version could be a Python library compatible with common models such as BERT or SVMs, focusing first on single-word removals. Later phases might add smarter sampling to reduce the number of model calls, along with visualization tools. The key advantage over alternatives like LIME or SHAP is that it does not just highlight important words: it demonstrates which ones are decisive by showing how their removal changes the outcome. This makes the explanations more actionable for debugging, compliance, or user trust.
Computational challenges exist, especially with long documents, because every candidate removal requires another model call and the number of multi-word sets grows combinatorially. Techniques such as prioritizing content words and parallel processing could help keep the search tractable. The focus on minimal decisive sets also differentiates it from methods that return feature weights without demonstrating concrete impact.
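As a rough sketch of how content-word prioritization, parallel evaluation, and minimal decisive sets might fit together, the example below skips a small stopword list, evaluates variants concurrently, and stops at the smallest set size that flips the prediction. Everything named here is an assumption made for illustration: `STOPWORDS` is a tiny hand-picked list, `toy_classifier` is a hypothetical keyword model, and `minimal_decisive_sets` and its `max_size` parameter are invented names. A thread pool is used only to show the shape of the idea; a real model may need batching or process-level parallelism instead.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Callable, List, Tuple

# Illustrative stopword list only; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in"}


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for a real model: sums keyword weights per class."""
    weights = {
        "sports": {"goal": 1.0, "football": 1.0, "match": 1.0},
        "politics": {"election": 1.5},
    }
    tokens = text.lower().split()
    scores = {
        label: sum(words.get(tok, 0.0) for tok in tokens)
        for label, words in weights.items()
    }
    return max(scores, key=scores.get)


def minimal_decisive_sets(
    text: str, predict: Callable[[str], str], max_size: int = 2
) -> Tuple[str, List[Tuple[str, ...]]]:
    """Return the original label and the smallest word sets whose removal flips it."""
    tokens = text.split()
    original = predict(text)
    # Prioritize content words by skipping stopword positions entirely.
    candidates = [i for i, t in enumerate(tokens) if t.lower() not in STOPWORDS]
    for size in range(1, max_size + 1):
        subsets = list(combinations(candidates, size))
        variants = [
            " ".join(t for i, t in enumerate(tokens) if i not in subset)
            for subset in subsets
        ]
        # Evaluate all variants of this size concurrently.
        with ThreadPoolExecutor() as pool:
            labels = list(pool.map(predict, variants))
        hits = [
            tuple(tokens[i] for i in subset)
            for subset, new_label in zip(subsets, labels)
            if new_label != original
        ]
        if hits:
            # Stop at the first size with a flip, so the sets are minimal.
            return original, hits
    return original, []


label, sets = minimal_decisive_sets(
    "A late goal decided the football match before the election", toy_classifier
)
print(label, sets)
```

In this toy case no single word is decisive, so the search moves on to pairs and reports the pairs of sports keywords whose joint removal changes the label. Searching by increasing set size guarantees minimality but is exhaustive within each size, so the smarter sampling mentioned above would be needed for long documents.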
Project Type: Research