An Erroneous Word Locating Algorithm for Natural Language Text in Medical Scene Based on Clustering Analysis

Qiubo Huang, Xin Huang, Keyuan Jiang

Abstract


Natural language text processing is increasingly used in medical scenes. For example, speech recognition and intelligent customer service robots can help hospitals improve efficiency. However, the accuracy of speech recognition output remains unsatisfactory, especially in medical scenes, which makes detecting errors in such text an important problem. To address it, we construct a corpus, a knowledge base, a Pinyin library, and a word frequency library. We segment the text into words using the knowledge base and calculate word co-occurrence probabilities from the word frequency library. These probabilities determine whether erroneous words are present: if any word has a co-occurrence probability below a threshold, we conclude that the text contains erroneous words. Clustering analysis is then used to locate them. Experimental results show that the method locates erroneous words with high accuracy, indicating that it has practical value.
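The detection and locating steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the probability formula or the clustering procedure, so the sketch assumes a conditional estimate count(prev, next) / count(prev) over adjacent words and substitutes a simple grouping of consecutive low-probability pairs for the clustering analysis. The function names, the toy frequency tables, and the threshold of 0.1 are all hypothetical.

```python
def cooccurrence_probs(words, pair_counts, word_counts):
    """Estimate P(next | prev) for each adjacent word pair (assumed formula)."""
    probs = []
    for prev, nxt in zip(words, words[1:]):
        total = word_counts.get(prev, 0)
        # Unseen words or unseen pairs get probability 0.0.
        probs.append(pair_counts.get((prev, nxt), 0) / total if total else 0.0)
    return probs


def locate_suspects(probs, threshold):
    """Return indices of suspect words.

    Pair i links words i and i+1. Consecutive low-probability pairs are
    grouped into runs (a stand-in for the clustering step). A word shared
    by two low pairs in a run is reported; an isolated low pair is
    ambiguous, so both of its words are reported.
    """
    low = [i for i, p in enumerate(probs) if p < threshold]
    runs = []
    for i in low:
        if runs and i == runs[-1][1] + 1:
            runs[-1][1] = i          # extend the current run
        else:
            runs.append([i, i])      # start a new run
    suspects = []
    for first, last in runs:
        if last > first:
            suspects.extend(range(first + 1, last + 1))
        else:
            suspects.extend([first, first + 1])
    return suspects


# Toy frequency tables (hypothetical); "fewer" is a misrecognition of "fever".
word_counts = {"patient": 10, "has": 10, "fever": 8, "and": 10, "cough": 5}
pair_counts = {("patient", "has"): 8, ("has", "fever"): 6,
               ("fever", "and"): 4, ("and", "cough"): 5}

words = ["patient", "has", "fewer", "and", "cough"]
probs = cooccurrence_probs(words, pair_counts, word_counts)
suspects = locate_suspects(probs, threshold=0.1)
print([words[i] for i in suspects])
```

In this toy sentence both pairs that involve the misrecognized word fall below the threshold, so the grouping isolates that single word. A real system would draw the counts from the word frequency library and would presumably use the Pinyin library to propose corrections, a step the sketch does not cover.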

Keywords


Medical Scene, Natural Language Text, Erroneous Word Locating, Co-occurrence Probability, Clustering Analysis


DOI
10.12783/dtetr/eeec2018/26847
