License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.ICCSW.2015.87
URN: urn:nbn:de:0030-drops-54850
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2015/5485/
Zhang, Jian
Automatic Transformation of Raw Clinical Data Into Clean Data Using Decision Tree Learning Combining with String Similarity Algorithm
Abstract
It is challenging to conduct statistical analyses of complex scientific datasets. Finding the relationships within the data is a time-consuming process for scientists and statisticians alike. The process involves preprocessing the raw data, selecting appropriate statistics, performing the analysis and providing correct interpretations; of these steps, data preprocessing is tedious and a particular time drain. In the large amounts of data provided for analysis there is no standard for recording information, and the data contain spelling, typing or transmission errors. As a result, the same meaning is expressed in many different ways, and an analysis system cannot deal with these inconsistencies automatically. What is needed is an automatic method for transforming raw clinical data into data that can be processed automatically. In this paper we propose a method combining decision tree learning with a string similarity algorithm that cleans clinical data quickly and accurately. Experimental results show that it outperforms individual string similarity algorithms and the traditional data cleaning process.
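As a rough illustration of the string-similarity half of the approach described in the abstract, the sketch below maps misspelled or inconsistently typed entries onto a fixed vocabulary. It is not the paper's implementation: it uses `difflib.SequenceMatcher` from Python's standard library as a stand-in similarity measure, the vocabulary, the 0.8 threshold, and the function names `similarity` and `normalise` are hypothetical, and the decision-tree component is not reproduced.

```python
from difflib import SequenceMatcher

# Hypothetical canonical vocabulary for one field; not taken from the paper.
CANONICAL = ["male", "female", "unknown"]


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1], ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def normalise(raw: str, vocabulary=CANONICAL, threshold: float = 0.8):
    """Map a raw entry to its closest canonical term.

    Returns None when no term reaches the (assumed) similarity threshold,
    so that unmatched entries can be reviewed instead of silently guessed.
    """
    best = max(vocabulary, key=lambda term: similarity(raw, term))
    return best if similarity(raw, best) >= threshold else None


if __name__ == "__main__":
    for entry in ["Femal", " MALE ", "xyz"]:
        print(repr(entry), "->", normalise(entry))
```

Returning `None` below the threshold keeps uncertain matches visible; how such cases would be resolved in the proposed method is not stated in the abstract.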
BibTeX - Entry
@InProceedings{zhang:OASIcs:2015:5485,
author = {Jian Zhang},
title = {{Automatic Transformation of Raw Clinical Data Into Clean Data Using Decision Tree Learning Combining with String Similarity Algorithm}},
booktitle = {2015 Imperial College Computing Student Workshop (ICCSW 2015)},
pages = {87--94},
series = {OpenAccess Series in Informatics (OASIcs)},
ISBN = {978-3-95977-000-2},
ISSN = {2190-6807},
year = {2015},
volume = {49},
editor = {Claudia Schulz and Daniel Liew},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2015/5485},
URN = {urn:nbn:de:0030-drops-54850},
doi = {10.4230/OASIcs.ICCSW.2015.87},
annote = {Keywords: Raw Clinical Data, Decision Tree Learning, String Similarity Algorithm}
}
Keywords: Raw Clinical Data, Decision Tree Learning, String Similarity Algorithm
Collection: 2015 Imperial College Computing Student Workshop (ICCSW 2015)
Issue Date: 2015
Date of publication: 23.09.2015