Select folder

Select this button to choose the folder that contains the files in your corpus.
- Your files must be in plain-text format. They may not be in doc, docx, rtf, pdf, or any other proprietary/binary format. On Macs, an app such as TextWrangler [http://www.barebones.com/products/textwrangler/] can help you save your files as plain text. On Windows, Microsoft Notepad [https://en.wikipedia.org/wiki/Microsoft_Notepad] is a reliable app for creating text documents.
- Make sure your files end with the .txt suffix, e.g., 1001.txt, 4398.txt.
- Your corpus must be in a flat folder structure, meaning that files must all be in the same folder and the folder must not contain any subfolders.
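
The folder requirements above can be checked before you load a corpus. The following is a minimal sketch, not part of the tool itself: the function name `check_corpus_folder` is hypothetical, and skipping hidden files such as .DS_Store is an assumption about what is harmless to leave in the folder.

```python
from pathlib import Path


def check_corpus_folder(folder):
    """Return a list of problems found in a corpus folder (empty if none)."""
    problems = []
    for entry in sorted(Path(folder).iterdir()):
        if entry.name.startswith("."):
            # Assumption: hidden system files (e.g. .DS_Store) are ignored.
            continue
        if entry.is_dir():
            # The corpus must be flat: no subfolders.
            problems.append(f"subfolder not allowed: {entry.name}")
        elif not entry.name.endswith(".txt"):
            # Every file must carry the .txt suffix.
            problems.append(f"not a .txt file: {entry.name}")
    return problems
```

Calling `check_corpus_folder("path/to/corpus")` returns an empty list when the folder meets both requirements. It does not verify that a .txt file really contains plain text.
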
File list
You may select one of the files in your file list and preview its contents.

Encoding

The encoding is the scheme by which the characters in your text files are stored. If the characters you see as you preview the files in your corpus are rendered incorrectly, adjust the encoding setting until all previews render correctly. Choosing the right encoding matters: characters that are decoded incorrectly will not match your search terms, which affects both the results you see and the accuracy of your searches.
Preview rendered incorrectly with UTF8 encoding:

Corrected by switching to MacRoman:

Supported encodings:
| MacRoman | An encoding typically used on Macs. |
| UTF8 | An encoding used on most computers. It accommodates a wide range of writing systems, including Arabic, Chinese, Japanese, and Russian. |
| ISO Latin 9 | An encoding (ISO-8859-15) typically used on Windows machines. |
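
To see why the encoding setting changes the preview, the same bytes can be decoded under each of the encodings listed above. This is only an illustrative sketch: the `preview` function and the mapping from setting names to Python codec names are assumptions, not the tool's own code.

```python
# Assumed mapping from the setting names above to Python codec names.
CODECS = {
    "MacRoman": "mac_roman",
    "UTF8": "utf-8",
    "ISO Latin 9": "iso8859_15",
}


def preview(raw_bytes, encoding):
    """Decode bytes under the chosen encoding.

    Bytes that are invalid in that encoding become U+FFFD,
    the Unicode replacement character.
    """
    return raw_bytes.decode(CODECS[encoding], errors="replace")


# Bytes as a MacRoman editor would store the word "café".
raw = "café".encode("mac_roman")

print(preview(raw, "UTF8"))      # the é is replaced by U+FFFD
print(preview(raw, "MacRoman"))  # café
```

A search for "café" would fail against the UTF8 preview here, because the decoded text no longer contains the letter é.
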
Line ending
The line ending is an invisible character (or pair of characters) that marks where one line ends and the next begins in your corpus. If your corpus is not rendered correctly in the preview, for example if an entire file appears as a single line, adjust the line ending setting until all previews render correctly.
Supported line endings:
| unix | A line ending (LF) used on modern Macs (Mac OS X and later) and on Linux. |
| mac | A line ending (CR) typically used on older Macs (Mac OS 9 and earlier). |
| windows | A line ending (CRLF) typically used on Windows machines. |
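
If you are unsure which line ending your files use, you can count the candidates in a file's raw bytes. The sketch below assumes the conventional meaning of the three setting names (unix = LF, mac = CR, windows = CR followed by LF). The function `detect_line_ending` is hypothetical and not part of the tool.

```python
def detect_line_ending(raw_bytes):
    """Return 'windows', 'unix', 'mac', or None if no line ending is found.

    The most frequent line ending wins.
    """
    crlf = raw_bytes.count(b"\r\n")
    lf = raw_bytes.count(b"\n") - crlf  # LF not preceded by CR
    cr = raw_bytes.count(b"\r") - crlf  # CR not followed by LF
    counts = {"windows": crlf, "unix": lf, "mac": cr}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


print(detect_line_ending(b"one\r\ntwo\r\n"))  # windows
print(detect_line_ending(b"one\ntwo\n"))      # unix
print(detect_line_ending(b"one\rtwo\r"))      # mac
```

For a real file, pass in its bytes, for example `Path("1001.txt").read_bytes()` using `pathlib.Path`.
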