Manage duplicate entries

Who should use this feature?

The manage duplicate entries feature automatically identifies duplicate entries (based on a configurable similarity threshold) and supports manually setting entries as duplicates.

Once a primary entry is selected and its duplicates are confirmed, the duplicates are archived, but they can still be made visible as part of the judging process.

None of this changes the original entries (e.g. duplicate entries are not merged, nor is data moved between entries), so all actions taken when managing duplicates are non-destructive, reversible and changeable.

What is automatically identified as a duplicate entry?

Entries with the same or very similar entry name, in the same category.

Scanning for and confirming duplicates

If you are on a Pro plan, you'll find the Manage duplicates button at the bottom left of the Manage entries list view:

Scanning.png

The first step on the Manage duplicates page is to click the Scan for duplicates button. Although scanning is quite quick, it is a computationally intensive process, so it is only done on demand.

The scan process compares every entry with all other entries created before it for similarity, then displays a list of all identified duplicate entries.
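As a rough illustration of the comparison described above, the following Python sketch checks each entry against every earlier entry in the same category. The similarity metric is an assumption: this article does not state which algorithm the product uses, so Python's difflib.SequenceMatcher stands in for it here, and the Entry fields and function names are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Entry:
    id: int
    name: str
    category: str
    created: int  # creation order; a real system would likely use a timestamp


def similarity_percent(a: str, b: str) -> float:
    """Illustrative 0-100 similarity score (assumed metric, not the product's)."""
    return 100 * SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def scan_for_duplicates(entries, threshold=85.0):
    """Compare each entry with every earlier entry in the same category."""
    ordered = sorted(entries, key=lambda e: e.created)
    pairs = []
    for i, entry in enumerate(ordered):
        for earlier in ordered[:i]:
            if entry.category != earlier.category:
                continue  # only entries in the same category are compared
            if similarity_percent(entry.name, earlier.name) >= threshold:
                pairs.append((earlier.id, entry.id))
    return pairs


entries = [
    Entry(1, "Jane Smith", "Leadership", created=1),
    Entry(2, "Jane Smyth", "Leadership", created=2),
    Entry(3, "Jane Smith", "Innovation", created=3),  # same name, other category
    Entry(4, "Robert Brown", "Leadership", created=4),
]
duplicate_pairs = scan_for_duplicates(entries)
print(duplicate_pairs)
```

In this sketch only entries 1 and 2 are paired: entry 3 has an identical name but sits in a different category, and entry 4 is not similar enough to anything created before it.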

Within each identified duplicate set, the entry created and submitted first is treated as the "primary" entry, which is the one that will be the basis of judging. The primary can be changed.
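The default choice of primary can be sketched as follows. This is a minimal illustration, not the product's implementation: it assumes "created and submitted first" means the earliest-created entry among those that have been submitted, and the Entry fields and default_primary function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    id: int
    created: int     # creation order (hypothetical field; a timestamp in practice)
    submitted: bool  # only submitted entries can be judged


def default_primary(duplicate_set: List[Entry]) -> Optional[Entry]:
    """Pick the earliest-created submitted entry, or None if none are submitted."""
    submitted = [e for e in duplicate_set if e.submitted]
    if not submitted:
        return None  # no submitted entry means no primary for this set
    return min(submitted, key=lambda e: e.created)


duplicate_set = [
    Entry(id=10, created=1, submitted=False),  # created first, but never submitted
    Entry(id=11, created=2, submitted=True),
    Entry(id=12, created=3, submitted=True),
]
primary = default_primary(duplicate_set)
print(primary.id)
```

Here entry 11 becomes the default primary, because entry 10 was created earlier but was never submitted. A program manager could still pick entry 12 instead.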

The objective of a program manager is then to work through sets of duplicates to:

  • Compare entries if necessary (via the action overflow)
  • Select a different primary if preferred (with the radio button)
  • Set entries as Not a duplicate if that is the case (via the action overflow)
  • Confirm and archive duplicates when satisfied
  • ... and eventually empty the Manage duplicates page

Dashboard.png

Note:
  • Only a submitted entry can be set as primary (as that will be the judged entry, and only submitted entries can be judged)

  • A set of duplicates with no submitted entries (and therefore no primary) cannot be confirmed as duplicates

  • The scan process will handle a maximum of 3,000 entries in a batch to keep things running smoothly. After Confirm + archive is done, running the scan again will continue scanning more entries, and it will not rescan entries already handled

  • If an entry name or category is changed after a confirm is done, the entry will be flagged for scanning again with the next scan

  • Adjacent to the scan button is a summary of when the last scan was done, how many entries remain to be scanned (if any), and whether entries need to be rescanned (e.g. after a change has been made)
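The batching and rescanning behaviour in the notes above can be pictured with a small Python sketch. Everything here except the 3,000-entry limit is an assumption made for illustration: the needs_scan flag, the function names, and the use of mark_handled as a stand-in for the Confirm + archive step are all hypothetical.

```python
BATCH_LIMIT = 3000  # maximum entries handled per scan, per the note above


def next_batch(entries, limit=BATCH_LIMIT):
    """Return the entries for the next scan and the number left for later scans."""
    pending = [e for e in entries if e["needs_scan"]]
    batch = pending[:limit]
    return batch, len(pending) - len(batch)


def mark_handled(batch):
    """Stand-in for Confirm + archive: handled entries are not rescanned."""
    for entry in batch:
        entry["needs_scan"] = False


def rename(entry, new_name):
    """Changing the entry name flags the entry for the next scan."""
    entry["name"] = new_name
    entry["needs_scan"] = True


entries = [{"id": i, "name": f"Entry {i}", "needs_scan": True} for i in range(7000)]

batch_sizes = []
while True:
    batch, remaining = next_batch(entries)
    if not batch:
        break
    batch_sizes.append(len(batch))
    mark_handled(batch)

# Renaming an already-handled entry puts it back in the queue for the next scan.
rename(entries[0], "Entry zero")
rescan_batch, _ = next_batch(entries)
print(batch_sizes, [e["id"] for e in rescan_batch])
```

With 7,000 entries, this sketch needs three scans (3,000, 3,000 and 1,000 entries), and the renamed entry is the only one picked up by the scan after that.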

 

Setting similarity threshold

At Settings > General > Entries, there is a setting for the Minimum entry similarity percentage, which can be set between 50% and 100%.

The default is 85%, and the optimum seems to be around 80% to 85%. If the setting is changed after a scan has been done, the scan will need to be run again.

Any entries already confirmed as duplicates will remain so.
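To give a feel for how the threshold affects what is flagged, here is a small Python sketch. The 50% to 100% range and the 85% default come from this article. The scoring function is an assumption (difflib.SequenceMatcher stands in for the product's unspecified metric), and the function names are hypothetical.

```python
from difflib import SequenceMatcher

DEFAULT_THRESHOLD = 85
MIN_THRESHOLD, MAX_THRESHOLD = 50, 100


def validate_threshold(percent):
    """Reject values outside the allowed 50-100 range."""
    if not MIN_THRESHOLD <= percent <= MAX_THRESHOLD:
        raise ValueError(f"threshold must be between {MIN_THRESHOLD} and {MAX_THRESHOLD}")
    return percent


def is_flagged(name_a, name_b, threshold=DEFAULT_THRESHOLD):
    """Return True if the two names meet the similarity threshold (assumed metric)."""
    score = 100 * SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    return score >= validate_threshold(threshold)


# Under this assumed metric, "Jane Smith" and "Jane Smyth" score 90.
flagged_at_default = is_flagged("Jane Smith", "Jane Smyth")
flagged_at_95 = is_flagged("Jane Smith", "Jane Smyth", threshold=95)
print(flagged_at_default, flagged_at_95)
```

In this sketch the pair is flagged at the default of 85% but not at 95%, which shows why a higher setting surfaces fewer candidate duplicates and a lower one surfaces more.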

Similarity.png

Duplicates summary

The manager's view of each entry includes a summary of duplicates, where a program manager can:

  • see an overview of a duplicate set
  • observe or change the primary
  • see those that are confirmed as duplicates and archived
  • compare duplicate entries

Duplicates_summary.png

Manually setting duplicates

It may be that the automatic scanning process does not identify a duplicate (e.g. because a nominee's nickname is used, which substantially differs in spelling from their name used in other nominations), in which case a program manager can manually set entries as duplicates:

  • via the Manage entries list view
  • as a bulk action
  • or acting on a single entry via the action overflow

Manually_set.png

Judging duplicates

The number of duplicate entries and the content of those duplicates may have a bearing on judging, so there are two options for displaying this information to judges. These options are set on each score set, so they can be treated differently for different stages of judging.

Under Judging > Score sets, select the score set in question, then select the Display tab. You will see two check boxes to control what is shown to judges:

mceclip0.png

Based on these settings, a duplicates box is displayed on the associated judging view, with the entrant's / nominator's name linking to a preview of that entry, which opens in another tab. All judging, scoring and commenting is against the primary entry only.

 mceclip1.png