Penn Treebank Playground
On this page you can play around with the search tool for the Penn Treebank.
With the 'Filter' field you can search for words or even for specific tree structures. When searching for words, you must set the 'Filter application' field to 'Raw text' and enter your words in the 'Filter' field. These can be either single words or entire strings.
When searching for (sub)trees, you must set the 'Filter application' field to 'Tree representation'. You can then enter your tree in bracket notation in the 'Filter' field. If you have trouble writing your tree in bracket notation, consult the information shown when you click the explanation button.
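To make bracket notation more concrete, here is a minimal Python sketch that reads a well-formed bracketed tree into nested tuples. It is only an illustration: the function name `parse_tree` is hypothetical, and nothing here is taken from the search tool itself, whose internals are not described on this page.

```python
def parse_tree(text):
    """Parse a bracketed tree such as "(NP (DT the) (NN man))".

    Returns (label, children), where each child is either a nested
    (label, children) tuple or a plain word string.
    Hypothetical helper for illustration; not the search tool's own parser.
    """
    # Put spaces around brackets so a simple split yields the tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1  # consume "("
        if pos >= len(tokens) or tokens[pos] in "()":
            raise ValueError("expected a category label")
        label = tokens[pos]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested subtree
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing closing bracket")
        pos += 1  # consume ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected text after the tree")
    return tree


print(parse_tree("(NP (DT the) (NN man))"))
```

Each opening bracket is followed by a category label (such as NP or DT), then by the words or subtrees that category dominates, then by a closing bracket. This sketch is deliberately strict and rejects unbalanced input.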
The 'Filter' field supports the following operators:
- , (comma): functions as an AND operator. Entering several values separated by commas returns all sentences that contain all of these values.
- | (bar): functions as an OR operator. Entering several values separated by bars returns all sentences that contain at least one of these values.
- * (star): functions as a wildcard or placeholder. Entering a word with the star attached returns all sentences containing words that start with the string before the star. For instance, entering 'man*' will not only give you all sentences containing 'man', but also all sentences containing 'many', 'manner', etc. You may also use the star as a placeholder for any (single) word (or category in a tree structure). For instance, try entering the structure (NP (NP (DT *) (NN *)) and see what sentences it returns. Note that one closing bracket on the right is missing on purpose. This leaves the structure open to the right, allowing it to match any right-hand daughter of the top NP.
- - (minus): functions as a negation operator. A word or tag with a minus attached to its front must not occur in the sentences that are returned. For instance, entering 'man, -woman' will return all sentences containing the word 'man' but lacking the word 'woman'.
Note that you may also combine word search and categorial search in a single query.
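The behaviour of the four operators in a 'Raw text' search can be approximated with the short Python sketch below. It rests on several assumptions that this page does not confirm: commas are split before bars (so a query is an AND of OR-groups), matching is case-insensitive and works on whitespace-separated words only, and the names `term_matches` and `sentence_matches` are hypothetical. Multi-word strings and tree patterns are not handled.

```python
def term_matches(term, words):
    """Check one filter term against the words of a sentence."""
    negate = term.startswith("-")          # leading minus: term must be absent
    if negate:
        term = term[1:]
    if term.endswith("*"):                 # trailing star: prefix wildcard
        prefix = term[:-1]
        found = any(w.startswith(prefix) for w in words)
    else:
        found = term in words              # plain term: exact word match
    return found != negate


def sentence_matches(query, sentence):
    """Assumed semantics: comma = AND over groups, bar = OR within a group."""
    words = sentence.lower().split()
    for group in query.lower().split(","):
        alternatives = [t.strip() for t in group.split("|") if t.strip()]
        if alternatives and not any(term_matches(t, words) for t in alternatives):
            return False
    return True


sentences = ["the man saw a woman", "many men left", "a woman arrived"]

# Words starting with 'man', but no 'woman' anywhere in the sentence.
print([s for s in sentences if sentence_matches("man*, -woman", s)])

# Either the word 'man' or the word 'woman'.
print([s for s in sentences if sentence_matches("man | woman", s)])
```

In this sketch, 'man*, -woman' keeps only "many men left", while 'man | woman' keeps the first and third sentences. The real tool may resolve mixed comma and bar queries differently.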