Commit 394e5b32 authored by Luca Morandini

- Corrected bug in the text pre-processing (lazy evaluation of RDDs version)

parent 413750b1
Tags v2021
@@ -151,11 +151,11 @@ tokens = sc.parallelize(documents, 12)\
.zipWithIndex()
```
-To show the kazy evaluation of RDDs, let's rewrite the above text processing as a sequence of steps:
+To show the lazy evaluation of RDDs, let's rewrite the above text processing as a sequence of steps:
```python
tokens0 = sc.parallelize(documents, 12)
print("tokens0: {}".format(tokens0))
-tokens1= tokens.map(lambda document: word_tokenize(document))
+tokens1= tokens0.map(lambda document: word_tokenize(document))
print("tokens1: {}".format(tokens))
tokens2= tokens1.map(lambda document: [x[0] for x in nltk.pos_tag(document) if x[1][0:1] == 'N'])
print("tokens2: {}".format(tokens))
......
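The changed lines rely on the fact that Spark transformations such as `map` only describe a computation, while actions such as `collect()` or `count()` actually run it. The sketch below is a plain-Python analogy of that behaviour, not Spark code: it uses the built-in `map` iterator so it runs without a `SparkContext`. The names `tokenize`, `calls` and `lazy_tokens`, and the sample documents, are hypothetical and not part of the commit.

```python
# Plain-Python analogy of lazy evaluation (no Spark involved).
calls = []  # records every document the mapped function actually processes


def tokenize(document):
    """Hypothetical stand-in for word_tokenize: split on whitespace."""
    calls.append(document)
    return document.split()


documents = ["spark is lazy", "actions trigger work"]

# Like an RDD transformation, map() only builds a recipe; tokenize has not run yet.
lazy_tokens = map(tokenize, documents)
calls_before = len(calls)

# Consuming the iterator plays the role of an action such as collect().
result = list(lazy_tokens)
calls_after = len(calls)

print("calls before consuming:", calls_before)
print("calls after consuming:", calls_after)
print("result:", result)
```

In the same way, printing an RDD right after a `map` call shows only a description of the RDD, not the tokens; the tokenizer runs only once an action is invoked on it.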