Document Type

Article

Disciplines

Engineering | Medicine and Health Sciences

Abstract

Background

Record linkage integrates records across multiple related data sources, identifying duplicates and accounting for possible errors. Real-life applications require efficient algorithms that merge these voluminous data sources and find all records belonging to the same individual. Our recently devised, highly efficient record linkage algorithms provide the best-known solutions to this challenging problem.

Method

We have developed RLT-S, a freely available web tool that implements our single linkage clustering algorithm for record linkage. To work efficiently, the tool requires the input data sets and a small set of configuration settings describing these files. RLT-S employs exact match clustering, blocking on a specified attribute, and single-linkage-based hierarchical clustering among these blocks.
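The three stages named above can be illustrated with a minimal Python sketch. It is not the RLT-S implementation: the function names (`edit_distance`, `find`, `link_records`), the use of Levenshtein distance with a fixed threshold, the union-find bookkeeping, and the choice to compare records only within a block are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def link_records(records, block_key, threshold):
    """Hypothetical helper: returns clusters as sorted lists of record indices.

    records   -- list of tuples of strings (one tuple per record)
    block_key -- function mapping a record to its blocking value (assumed)
    threshold -- maximum edit distance at which two records are linked (assumed)
    """
    # Stage 1: exact match clustering, collapsing identical records.
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[rec].append(idx)
    unique = list(groups)

    # Stage 2: blocking, so only records sharing a key are compared.
    blocks = defaultdict(list)
    for u, rec in enumerate(unique):
        blocks[block_key(rec)].append(u)

    # Stage 3: single linkage. Any pair within the threshold merges
    # the two clusters that contain the pair.
    parent = list(range(len(unique)))
    for members in blocks.values():
        for u, v in combinations(members, 2):
            if edit_distance(" ".join(unique[u]), " ".join(unique[v])) <= threshold:
                parent[find(parent, u)] = find(parent, v)

    clusters = defaultdict(list)
    for u, rec in enumerate(unique):
        clusters[find(parent, u)].extend(groups[rec])
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    sample = [
        ("john", "smith", "1980"),
        ("john", "smith", "1980"),  # exact duplicate of record 0
        ("mary", "jones", "1975"),
        ("jon", "smith", "1980"),   # one edit away from record 0
        ("john", "smyth", "1980"),  # one edit away from record 0
    ]
    # Block on the first two letters of the second field (an assumed choice).
    print(link_records(sample, lambda r: r[1][:2], threshold=1))
```

In the sample, records 3 and 4 are two edits apart, yet they end up in the same cluster because each is within one edit of record 0. This chaining behaviour is characteristic of single linkage.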

Results

RLT-S is an implementation package of our sequential record linkage algorithm. It outperforms previous best-known implementations by a large margin: on any dataset, the tool is at least twice as fast as the previous best-known tools.

Conclusions

The RLT-S tool implements our record linkage algorithm, which outperforms the previous best-known algorithms in this area. The website also provides supporting sections such as instructions, submission history, feedback, and publications to facilitate use of the tool.

Availability

RLT-S is integrated into http://www.rlatools.com, which currently serves only this tool. The tool is freely available and can be used without logging in. All data files used in this paper are stored at https://github.com/abdullah009/DataRLATools. For copies of the relevant programs, see https://github.com/abdullah009/RLATools.

Comments

Citation: Mamun A-A, Aseltine R, Rajasekaran S (2015) RLT-S: A Web System for Record Linkage. PLoS ONE 10(5): e0124449. doi:10.1371/journal.pone.0124449

Copyright: © 2015 Mamun et al.

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data are available through Figshare.com.

Dataset doi: http://dx.doi.org/10.6084/m9.figshare.1340114.

Source code doi: http://dx.doi.org/10.6084/m9.figshare.1340113.
