TOPic CATegoriser (TOPCAT) Support

Please note that TOPCAT only works with the Windows version of Microsoft Excel (2007 or later).

TOPCAT is designed to work with the English language only.

Installation

To install the TOPCAT add-in:

  1. Save the TOPCAT add-in file to a location on your computer e.g. your Desktop
  2. Start Excel and ensure that a blank workbook is open
  3. If you already have a version of the TOPCAT add-in installed, please uninstall (delete) this before proceeding
  4. Click the File tab, click Options, and then click the Add-Ins category
  5. In the Manage box, click Excel Add-ins, and then click Go. The Add-Ins dialog box appears
  6. Click Browse and locate the TOPCAT add-in file you saved in Step 1
  7. Click Open, and then click OK
  8. Excel may ask you if you want to copy the add-in to your add-ins folder. Click Yes

The first time you use TOPCAT after installation you will be asked to enter your product key:

  1. With any workbook open, click the TopCat tab, and then click the Extract Topics button on the ribbon
  2. Enter your email address / username and product activation key exactly as supplied in the email from us, and then click Activate
  3. The main TOPCAT dialog box will appear. Click Cancel

Congratulations, TOPCAT is now installed and ready to use!

Using TOPCAT

Before you can use TOPCAT you need to create a taxonomy file. This is a spreadsheet that contains all the lexical rules that TOPCAT uses to identify specific themes and topics within your target data. TOPCAT doesn't automatically identify topics for you. Instead, it lets you create multiple flexible rules that target the topics you want to identify in your own data.

Creating your first Taxonomy file

  1. Select the TopCat tab on the Excel Ribbon
  2. Click the New Taxonomy button
  3. A new blank workbook will appear with two worksheets called Taxonomy and Entities

Taxonomy Rules

On the Taxonomy worksheet there are two columns labelled Category and Rule. In the Category column you enter a name for the topic or theme that you want to identify. In the Rule column you enter the lexical rule or rules associated with the Category. Each Category can, and often will, have multiple rules.

For example, suppose you have some customer feedback data that is captured immediately after the customer has spoken to one of your contact centre agents and you want to know which customers mentioned being put on hold. On the Taxonomy worksheet you could enter the following:

Category    Rule
Hold        on hold
Hold        hanging on
Hold        kept me waiting

These rules will look for any instances of the phrases "on hold", "hanging on" or "kept me waiting" within your data and tag each matching item with the Category "Hold". The exact words and phrases you use to create your rules will depend on the type of data you have. It's always best to spend some time reading through some of your data to gain an understanding of the sorts of words and phrases that are being used to refer to specific topics.

Here is a more complex example:

Category    Rule
Staff       agent*
Staff       manager*
Staff       i spoke to
Staff       representative*

In this example, whenever a match for any of these four rules is found anywhere within an item of text, that item is tagged with the Category "Staff". The example also introduces wildcards (*). Wildcards enable you to create rules that match partial pieces of text. For example, the rule "agent*" will match the word "agent" and also the word "agents". Without a wildcard, the rule "agent" would not match text that contained only the word "agents". Wildcards can be used at the beginning or end of a rule but not in the middle.
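To make the wildcard behaviour concrete, here is a minimal Python sketch of whole-word matching with an optional leading or trailing wildcard. It is an illustration only and not TOPCAT's actual implementation: the function names `rule_to_regex` and `rule_matches` are hypothetical, and the sketch assumes that matching ignores case and that words are separated by spaces.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Turn a rule such as 'agent*' into a compiled regular expression."""
    rule = rule.strip().lower()
    leading = rule.startswith("*")
    trailing = rule.endswith("*")
    core = re.escape(rule.strip("*"))
    # Without a wildcard the rule must begin/end exactly on a word boundary
    # (here: start/end of text or a space). With a wildcard, any run of
    # non-space characters is allowed on that side.
    prefix = r"\S*" if leading else r"(?<!\S)"
    suffix = r"\S*" if trailing else r"(?!\S)"
    return re.compile(prefix + core + suffix)


def rule_matches(rule: str, text: str) -> bool:
    """Return True if the rule is found anywhere in the text (case ignored)."""
    return rule_to_regex(rule).search(text.lower()) is not None
```

With this sketch, `rule_matches("agent*", "The agents were great")` is true, while `rule_matches("agent", "The agents were great")` is false, mirroring the behaviour described above.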

Important

TOPCAT pre-processes your text to remove certain characters and symbols:

The following characters are replaced with a space: ,/.;?!

Only the following characters are then allowed to remain (all other characters are removed): @0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

You need to be aware of this when creating rules.

For example, if your text contains the string @Virgin_Atlantic, your rule should read *virginatlantic. The @ symbol remains, so you need to use a wildcard character '*' to ensure a match. The underscore is removed during pre-processing by TOPCAT, so you should also remove it from your rule, otherwise a match won't be found.
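The two pre-processing steps can be sketched in Python as follows. This is a hedged illustration rather than TOPCAT's own code: the function name `preprocess` is hypothetical, and the sketch assumes that spaces are kept even though they are not in the list of allowed characters, since phrase rules would otherwise be impossible to match.

```python
# Characters that are replaced with a space (from the list above).
REPLACED_WITH_SPACE = ",/.;?!"

# Characters that are allowed to remain (from the list above).
ALLOWED = set("@0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")


def preprocess(text: str) -> str:
    """Apply the two documented clean-up steps to one item of text."""
    # Step 1: replace the listed punctuation characters with spaces.
    for ch in REPLACED_WITH_SPACE:
        text = text.replace(ch, " ")
    # Step 2: drop everything that is not an allowed character.
    # Assumption: spaces are preserved so that words stay separated.
    return "".join(ch for ch in text if ch in ALLOWED or ch == " ")
```

For instance, `preprocess("@Virgin_Atlantic")` gives `@VirginAtlantic`: the @ survives and the underscore disappears, which is why the rule is written without the underscore.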

Keywords

In addition to using wildcards, there are three keywords that can be used within rules. These are:

  1. NEAR
  2. AND
  3. NOT

Keywords give you even more flexibility in creating rules to cover multiple lexical variations or to be very specific with a rule.

NEAR Keyword: Consider the rule "call back". This will match when the phrase "call back" appears within the target text, but will not match if the phrase "call me back" or "call us back" appears. By using the NEAR keyword you can create a single rule "call NEAR back" that matches all of these variations. The NEAR keyword works for words or phrases that are up to 5 words apart within a string of text.

AND Keyword: Using the AND keyword you can create a rule that matches two words or phrases that appear anywhere within the target text. For example the rule "handset AND complaint" will match any text item where the words "handset" and "complaint" appear anywhere within the text string.

NOT Keyword: The NOT keyword matches text that contains the first word or phrase but does not contain the second. For example, the rule "management NOT management information" will match text strings that contain the word "management" but not the phrase "management information". This can be a useful technique to avoid finding false positives in your data.

Some important points about Keywords:

  1. You can use phrases and wildcards with Keywords
  2. Keywords ignore punctuation, so they will match across sentences in the text string
  3. You cannot use more than one Keyword in any single rule
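The behaviour of the three keywords can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not TOPCAT's implementation: the names `find_phrase` and `keyword_rule_matches` are hypothetical, keywords are assumed to be written in upper case, "up to 5 words apart" is interpreted as at most 5 words between the two phrases in either order, the text is assumed to have been pre-processed already, and wildcards are left out to keep the example short.

```python
def find_phrase(tokens, phrase):
    """Return the start positions where phrase (a list of words) occurs."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def keyword_rule_matches(rule: str, text: str, max_gap: int = 5) -> bool:
    """Evaluate a rule containing at most one of NEAR, AND or NOT."""
    tokens = text.lower().split()
    for keyword in ("NEAR", "AND", "NOT"):
        separator = f" {keyword} "
        if separator not in rule:
            continue
        left_text, right_text = rule.split(separator, 1)
        left = left_text.lower().split()
        right = right_text.lower().split()
        left_hits = find_phrase(tokens, left)
        right_hits = find_phrase(tokens, right)
        if keyword == "AND":
            # Both phrases appear somewhere in the text.
            return bool(left_hits) and bool(right_hits)
        if keyword == "NOT":
            # First phrase appears and the second does not.
            return bool(left_hits) and not right_hits
        # NEAR: count the words between the two phrases (either order).
        for l in left_hits:
            for r in right_hits:
                if r >= l + len(left):
                    gap = r - (l + len(left))
                elif l >= r + len(right):
                    gap = l - (r + len(right))
                else:
                    continue  # the two phrases overlap
                if gap <= max_gap:
                    return True
        return False
    # No keyword: plain phrase match.
    return bool(find_phrase(tokens, rule.lower().split()))
```

Because the function returns as soon as it has handled the first keyword it finds, it reflects the restriction that a rule contains only one keyword.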

Entity Rules

Entities are things that can be referred to in different ways. For example the entity "Staff" can appear in your data as "agent", "manager", "supervisor", "checkout operator" and so on.

Entity rules allow you to create a single description "Staff" for all of these different ways that staff can be referred to. Entities are a useful way to reduce the number of taxonomy rules you need to create. For example, once you have created a single entity for "Staff", you can use the entity name "Staff" in your Taxonomy rules. This is particularly useful when using Keywords in your taxonomy rules.

Entity Rules

Entity      Rule
Staff       agent*
Staff       manager*
Staff       supervisor*
Staff       teller*
Attitude    rude
Attitude    pleasant
Attitude    abrupt
Attitude    off hand

Taxonomy Rules

Category          Rule
Staff Attitude    staff NEAR attitude

The above rules enable you to identify data items where 'staff attitude' specifically is mentioned.

You can also use Entities to exclude certain phrases that might give false positive matches in your rules. In the example above, the entity rule "agent*" looks for the word "agent" being used to describe staff. However, imagine that the phrase "estate agent" appears in your data. The rule would trigger, giving a false positive match. To avoid this, create an Entity called "Ignore" with the rule "estate agent", and enter it before any other entity rules. Now, any instances of the phrase "estate agent" will not trigger the "agent*" rule.

Using Entity rules isn't necessary to use TOPCAT, but in certain circumstances they can be extremely useful.
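One way to picture how entity rules and the "Ignore" entity could interact is the Python sketch below. How TOPCAT handles entities internally is not documented here, so this is purely an assumption: entity rules are applied in the order entered, and each match is replaced by the entity name in upper case so that later rules cannot match the same words again. The function name `apply_entities` is hypothetical.

```python
import re


def apply_entities(text: str, entity_rules) -> str:
    """Replace each entity match with the entity name, in rule order."""
    text = text.lower()
    for entity, rule in entity_rules:
        core = re.escape(rule.rstrip("*"))
        # A trailing wildcard lets the match run to the end of the word.
        tail = r"\S*" if rule.endswith("*") else r"(?!\S)"
        pattern = r"(?<!\S)" + core + tail
        placeholder = entity.upper()
        # Upper-case placeholders cannot be matched by later lower-case rules.
        text = re.sub(pattern, lambda match: placeholder, text)
    return text


# Entity rules in the order they would be entered, with "Ignore" first.
ENTITY_RULES = [
    ("Ignore", "estate agent"),
    ("Staff", "agent*"),
    ("Staff", "manager*"),
    ("Attitude", "rude"),
]
```

Under these assumptions, "The agent was rude" becomes "the STAFF was ATTITUDE", which a taxonomy rule such as "staff NEAR attitude" could then match, while "My estate agent called" becomes "my IGNORE called" and never reaches the "agent*" rule.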

Important points about creating rules

  1. Don't use punctuation or special characters in your rules. Only letters and numbers should be used
  2. The only exception to the above is the @ character
  3. Any http web addresses in your text will be converted to the string "URL" before the text is processed, so you can't create rules that look for specific text within these URLs.
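The URL substitution in point 3 can be pictured with a short Python sketch. The exact pattern TOPCAT uses to recognise a web address is not stated, so the pattern below is an assumption, and the function name `replace_urls` is hypothetical.

```python
import re

# Assumption: a web address starts with http:// or https:// and runs up to
# the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def replace_urls(text: str) -> str:
    """Replace every web address in the text with the literal string 'URL'."""
    return URL_PATTERN.sub("URL", text)
```

After this step the original address is gone, which is why a rule cannot target words that only appear inside a link.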

When you have created taxonomy rules (and entity rules if necessary) for your data it's time to run TOPCAT:

Running TOPCAT

  1. Highlight all the textual data that you want to process
  2. Select the TopCat Tab on the Excel Ribbon
  3. Click the Extract Topics button
  4. Select the taxonomy workbook to use (if you have more than one open). If you don't have a taxonomy workbook open you must open an existing taxonomy workbook or create a new one before you can continue
  5. Click the Start button

Output

When the TOPic CATegoriser has finished processing your data, two new worksheets will have been added to your original workbook:

Taxonomy - this worksheet repeats the taxonomy you have created and adds a count of the number of instances where each rule was matched. Note that a rule is only counted once in each item of data (individual cell containing text). A single item of data can be counted in more than one category, i.e. one data item can contain multiple topics.

Categories - this is a list of all the matches that were found in your original data. There can be multiple matches for a single data item.

On the worksheet containing your original data, two new columns are added immediately to the right of your data:

  1. The first new column contains a reference number that matches the reference numbers on the new 'categories' worksheet
  2. The second column contains a string that shows every category that exists within each data item
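The counting behaviour described above (each rule counted once per data item, and one item able to fall into several categories) can be sketched in Python. This is an illustration only: the function name `categorise` is hypothetical, rules are treated as plain phrases without wildcards or keywords, and items with no match are assumed to receive the 'UNCATEGORISED' category.

```python
from collections import Counter


def categorise(items, taxonomy):
    """Return (matches per rule, list of categories for each item)."""
    rule_counts = Counter()
    item_categories = []
    for text in items:
        # Normalise spacing and pad with spaces so only whole words match.
        padded = " " + " ".join(text.lower().split()) + " "
        found = []
        for category, rule in taxonomy:
            if f" {rule} " in padded:
                # Counted once per item, however often the phrase repeats.
                rule_counts[(category, rule)] += 1
                if category not in found:
                    found.append(category)
        # Assumption: items with no match are labelled UNCATEGORISED.
        item_categories.append(found or ["UNCATEGORISED"])
    return rule_counts, item_categories
```

An item such as "I was on hold and on hold again" adds only one to the count for the rule "on hold", even though the phrase occurs twice.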

How do I know what taxonomy rules to create?

Creating an accurate taxonomy is an iterative process that initially can take some time. You will need to manually read some of your data to establish what appear to be common terms and themes. This is easier if you already know what you are looking for, for example product names, but can take more time if you don't know exactly what the common topics and themes might be in your data.

We suggest that you start by manually reviewing a sample of your data and creating an initial taxonomy based on what you find. Then, run TOPCAT against your whole data set using this taxonomy. Next, filter the results on the 'UNCATEGORISED' category and review a sample of this data to see whether topics are being referred to in different ways or whether new themes appear. Update your taxonomy, re-run TOPCAT, and repeat this process until you reduce the proportion of 'UNCATEGORISED' items to a small amount (how small is up to you to decide based on your data and how much time you put into the initial taxonomy creation process).
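If you want to track progress between runs, the proportion of uncategorised items can be worked out with a few lines of Python. This is a sketch under assumptions: the function name `uncategorised_share` is hypothetical, and the input is assumed to be the list of category strings from the second new column, with unmatched items containing exactly 'UNCATEGORISED'.

```python
def uncategorised_share(category_strings):
    """Return the fraction of items whose category is UNCATEGORISED."""
    if not category_strings:
        return 0.0
    uncategorised = sum(
        1 for value in category_strings
        if value.strip().upper() == "UNCATEGORISED"
    )
    return uncategorised / len(category_strings)
```

Comparing this figure from one run to the next shows whether each round of taxonomy changes is actually reducing the uncategorised share.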

Once you are happy with your taxonomy, if your data is refreshed or updated, e.g. regular customer survey data that is updated every month, it's just a matter of doing a quick review of the 'UNCATEGORISED' items to see if any new trends are appearing.

Unlike manual topic categorisation often done by a team of humans, TOPCAT will always consistently apply the rules you create in your taxonomy file, meaning you can be confident that over time you are able to make a like-for-like comparison to accurately identify changes to themes and trends in your data. And of course, TOPCAT can apply categorisation rules in a fraction of the time it would take for a human to manually code a set of data!

TOPCAT doesn't automatically identify the topics in your data. There are products on the market that attempt to do this although even these (which cost considerably more than TOPCAT!) still need some manual fine tuning, and are often complex to install and use. We designed TOPCAT from the ground up to be an easy to install and use, cost effective alternative that gives everyone access to the power and benefits of a flexible topic categorisation tool at a low price point.