An Erroneous Word Locating Algorithm for Natural Language Text in Medical Scene Based on Clustering Analysis

Qiubo Huang, Xin Huang, Keyuan Jiang


As technology advances, natural language text processing is increasingly used in medical scenes. For example, speech recognition and smart customer service robots can help hospitals improve efficiency. However, the accuracy of the text produced by speech recognition is unsatisfactory, especially in medical scenes, so detecting and locating the errors in such text is a meaningful problem. To address it, we construct a corpus, a knowledge base, a Pinyin library, and a word frequency library. We segment the text into words using the knowledge base and compute word co-occurrence probabilities from the word frequency library. These probabilities determine whether the text contains erroneous words: if any word has a co-occurrence probability below a threshold, we conclude that erroneous words exist. Clustering analysis is then used to locate them. Experimental results show that the method locates erroneous words with high accuracy, which suggests that it has practical value.
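The detection and locating steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the probability estimate, the threshold, or the clustering procedure, so the sketch estimates co-occurrence probability as a bigram ratio, uses an arbitrary threshold of 0.1, and substitutes a simple grouping of consecutive low-probability positions for the paper's clustering analysis. The function names and the toy English corpus are hypothetical, and word segmentation and the Pinyin library are not modeled.

```python
from collections import Counter

def build_counts(corpus):
    """Build unigram and adjacent-pair counts (a toy stand-in for the word frequency library)."""
    unigram, bigram = Counter(), Counter()
    for sentence in corpus:
        unigram.update(sentence)
        bigram.update(zip(sentence, sentence[1:]))
    return unigram, bigram

def cooccurrence_probs(words, unigram, bigram):
    """Assumed estimate: P(w2 | w1) = count(w1, w2) / count(w1) for each adjacent pair."""
    probs = []
    for w1, w2 in zip(words, words[1:]):
        denom = unigram.get(w1, 0)
        probs.append(bigram.get((w1, w2), 0) / denom if denom else 0.0)
    return probs

def locate_suspects(words, unigram, bigram, threshold=0.1):
    """Return words suspected to be erroneous, or [] if no pair falls below the threshold."""
    probs = cooccurrence_probs(words, unigram, bigram)
    # Pair i covers words[i] and words[i + 1].
    low = [i for i, p in enumerate(probs) if p < threshold]

    # Group consecutive low-probability pairs (a simple substitute for clustering).
    groups = []
    for i in low:
        if groups and i - groups[-1][-1] == 1:
            groups[-1].append(i)
        else:
            groups.append([i])

    suspects = []
    for g in groups:
        if len(g) >= 2:
            # Words shared by two low pairs fit poorly with both neighbors.
            suspects.extend(words[g[0] + 1 : g[-1] + 1])
        else:
            # A single low pair is ambiguous: report both of its words.
            suspects.extend(words[g[0] : g[0] + 2])
    return suspects

corpus = [
    ["patient", "has", "fever", "and", "cough"],
    ["patient", "has", "cough"],
    ["fever", "and", "cough"],
]
unigram, bigram = build_counts(corpus)
print(locate_suspects(["patient", "has", "beaver", "and", "cough"], unigram, bigram))
```

In this toy run the misrecognized word "beaver" breaks the co-occurrence pattern with both of its neighbors, so the two adjacent low-probability pairs fall into one group and the shared word is reported. A real system would need smoothing for unseen but valid word pairs and a threshold tuned on held-out data.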


Keywords: Medical Scene, Natural Language Text, Erroneous Word Locating, Co-occurrence Probability, Clustering Analysis

