Commit b039fa49 authored by weirdwizardthomas

Added a word pruner to filter out stop words and non-alphabetic tokens (e.g. punctuation)

parent 225b6076
import string
from nltk.corpus import stopwords


class WordPrunner:
    def __init__(self):
        # Build the stop-word set once so membership checks in prune() are fast
        self.stop_words = set(stopwords.words('english'))

    def prune(self, tokens: list) -> list:
        # Remove stop words and any token that is not purely alphabetic (punctuation, numbers)
        return [term for term in tokens if term.isalpha() and term not in self.stop_words]
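
A minimal usage sketch of the filter above, so its behaviour on mixed input is easier to see. To keep it runnable without downloading the NLTK corpus, it swaps in a small hypothetical `SAMPLE_STOP_WORDS` set and an optional `stop_words` constructor argument; neither is part of the commit. The comparison is case-sensitive, and the sample set here is all lowercase, so a capitalised stop word such as "The" passes through unless the tokens are lowercased first.

```python
# Hypothetical stand-in for set(stopwords.words('english')); not part of the commit.
SAMPLE_STOP_WORDS = {"the", "is", "on", "a"}


class WordPrunner:
    def __init__(self, stop_words=None):
        # The stop_words parameter is an assumption added for this sketch only.
        self.stop_words = set(stop_words) if stop_words is not None else set(SAMPLE_STOP_WORDS)

    def prune(self, tokens: list) -> list:
        # Same filter as in the commit: alphabetic tokens that are not stop words.
        return [term for term in tokens if term.isalpha() and term not in self.stop_words]


tokens = ["The", "cat", "is", "on", "the", "mat", ".", "42", "well-known"]

pruner = WordPrunner()

# Case-sensitive match: "The" is kept because only "the" is in the set.
pruned = pruner.prune(tokens)
print(pruned)  # ['The', 'cat', 'mat']

# Lowercasing the tokens first removes capitalised stop words as well.
pruned_lower = pruner.prune([t.lower() for t in tokens])
print(pruned_lower)  # ['cat', 'mat']
```

Note that `str.isalpha()` also rejects hyphenated tokens such as "well-known" and numeric tokens such as "42", so the filter drops more than punctuation.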