A Double-layer Word Segmentation Combined with ...

URL: http://www.ivypub.org/cst/paperInfo.aspx?ID=2304

This paper presents a double-layer model for Chinese word segmentation that combines the Local Ambiguity Word Grid algorithm with Conditional Random Fields (CRF). In the lower layer, the Local Ambiguity Word Grid algorithm generates a rough segmentation. The text is then segmented again with a CRF, which takes the rough result as one of its features. The Local Ambiguity Word Grid algorithm is good at detecting ambiguity during segmentation, while the CRF handles in-vocabulary and out-of-vocabulary words equally well. The hybrid approach is therefore an effective way to resolve both ambiguity and out-of-vocabulary words. The system is evaluated in closed tests on the MSRA and PKU test sets provided by the SIGHAN 2005 Chinese Language Processing Bakeoff, and label sets of four and six tags are compared. In the closed test, the F-measures on the MSRA and PKU test sets reach 97.1% and 95.1%, respectively. In addition, open-test results demonstrate the practical applicability of the model.
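The second layer described above can be illustrated with a minimal sketch of how a rough segmentation might be turned into per-character tags and passed to a CRF as one feature. Everything named here is an assumption for illustration: the function names `to_tags` and `char_features` are hypothetical, the tag inventories (B/M/E/S for four tags, B/B2/B3/M/E/S for six) are common conventions that the abstract does not confirm, and the rough segmentation is hard-coded because the Local Ambiguity Word Grid algorithm itself is not described on this page.

```python
def to_tags(words, scheme=4):
    """Convert a list of words into per-character position tags.

    scheme=4 uses B/M/E/S; scheme=6 uses B/B2/B3/M/E/S.
    Both tag inventories are assumed conventions, not taken from the paper.
    """
    tags = []
    for w in words:
        n = len(w)
        if n == 1:
            tags.append("S")  # single-character word
        elif scheme == 4:
            tags.extend(["B"] + ["M"] * (n - 2) + ["E"])
        else:
            # Up to three distinct leading tags, then M for the rest, then E.
            head = ["B", "B2", "B3"][: n - 1]
            tags.extend(head + ["M"] * (n - 1 - len(head)) + ["E"])
    return tags


def char_features(chars, rough_tags, i):
    """Build a feature dict for character i (hypothetical feature template).

    The 'rough' entry carries the lower layer's tag, so the CRF can
    use the rough segmentation as one feature among several.
    """
    return {
        "char": chars[i],
        "prev": chars[i - 1] if i > 0 else "<s>",
        "next": chars[i + 1] if i < len(chars) - 1 else "</s>",
        "rough": rough_tags[i],
    }


# Hard-coded stand-in for the lower layer's rough segmentation.
rough_words = ["北京", "大学", "生"]
chars = "".join(rough_words)
rough_tags = to_tags(rough_words, scheme=4)
features = [char_features(chars, rough_tags, i) for i in range(len(chars))]
```

Each dict in `features` describes one character; a CRF toolkit that accepts per-token feature dicts could be trained on such sequences against gold-standard tags. Switching `scheme` to 6 changes only the tag inventory, which is the kind of comparison the abstract mentions.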


Additional Information

Field Value
Last updated May 15, 2013
Created unknown
Format aspx
License Other (Open)
Created over 12 years ago
Media type text/html
Size 28,702
format aspx
id 219ef929-5666-41f5-a66d-232aa62a7f70
last modified over 12 years ago
package id 0c5f392b-8666-4597-95df-7af548c44ed5
resource type file
revision id b7c3315c-d167-4e3c-acbd-9e3493e91bb2
state active