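ctbparser is a Chinese syntactic parsing toolkit implemented in C++. It follows the Chinese Penn Treebank standard (Chinese Tree Bank) and ships with source code. Given raw documents (GBK-encoded), it automatically performs traditional-to-simplified conversion, sentence splitting, word segmentation, part-of-speech (POS) tagging, and dependency parsing.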
The parser uses a standard graph-based algorithm, the projective Maximum Spanning Tree algorithm. It was proposed by Eisner in 1996, and its running time is cubic in the sentence length. See paper [1] for details.
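To make the cubic-time claim concrete, here is a minimal sketch of first-order projective decoding in the style of Eisner's algorithm. It is an illustration under stated assumptions, not ctbparser's actual code: the class name EisnerParser, the score matrix layout (score[h][m] for an arc from head h to modifier m, with token 0 as an artificial root), and the flat table indexing are all hypothetical, and the sketch does not restrict the root to a single child.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical first-order projective parser (illustration only).
// score[h][m]: score of the arc head h -> modifier m; token 0 is an artificial root.
// parse() returns head[m] for every token, with head[0] == -1.
class EisnerParser {
public:
    explicit EisnerParser(const Matrix& score)
        : score_(score), n_(static_cast<int>(score.size())),
          val_(static_cast<std::size_t>(n_) * n_ * 4, 0.0),
          bp_(static_cast<std::size_t>(n_) * n_ * 4, -1), head_(n_, -1) {
        assert(n_ > 0);
        for (const auto& row : score) assert(static_cast<int>(row.size()) == n_);
    }

    std::vector<int> parse() {
        for (int k = 1; k < n_; ++k) {            // span length
            for (int s = 0; s + k < n_; ++s) {    // span start
                const int t = s + k;              // span end
                // Incomplete spans: join two complete halves, then add an arc s <-> t.
                double best = kNegInf; int arg = -1;
                for (int r = s; r < t; ++r) {
                    const double v = val_[id(s, r, 1, 1)] + val_[id(r + 1, t, 0, 1)];
                    if (v > best) { best = v; arg = r; }
                }
                set(s, t, 0, 0, best + score_[t][s], arg);  // head is t
                set(s, t, 1, 0, best + score_[s][t], arg);  // head is s
                // Complete span headed at t (dir 0).
                best = kNegInf; arg = -1;
                for (int r = s; r < t; ++r) {
                    const double v = val_[id(s, r, 0, 1)] + val_[id(r, t, 0, 0)];
                    if (v > best) { best = v; arg = r; }
                }
                set(s, t, 0, 1, best, arg);
                // Complete span headed at s (dir 1).
                best = kNegInf; arg = -1;
                for (int r = s + 1; r <= t; ++r) {
                    const double v = val_[id(s, r, 1, 0)] + val_[id(r, t, 1, 1)];
                    if (v > best) { best = v; arg = r; }
                }
                set(s, t, 1, 1, best, arg);
            }
        }
        backtrack(0, n_ - 1, 1, 1);
        return head_;
    }

private:
    static constexpr double kNegInf = -std::numeric_limits<double>::infinity();

    // Flat index for (start, end, direction, complete?). dir 1: head is s; dir 0: head is t.
    std::size_t id(int s, int t, int dir, int complete) const {
        return ((static_cast<std::size_t>(s) * n_ + t) * 2 + dir) * 2 + complete;
    }
    void set(int s, int t, int dir, int complete, double v, int r) {
        val_[id(s, t, dir, complete)] = v;
        bp_[id(s, t, dir, complete)] = r;
    }
    // Follow the stored split points and record the head of each modifier.
    void backtrack(int s, int t, int dir, int complete) {
        if (s == t) return;
        const int r = bp_[id(s, t, dir, complete)];
        if (!complete) {
            if (dir == 1) head_[t] = s; else head_[s] = t;
            backtrack(s, r, 1, 1);
            backtrack(r + 1, t, 0, 1);
        } else if (dir == 0) {
            backtrack(s, r, 0, 1);
            backtrack(r, t, 0, 0);
        } else {
            backtrack(s, r, 1, 0);
            backtrack(r, t, 1, 1);
        }
    }

    const Matrix& score_;
    int n_;
    std::vector<double> val_;
    std::vector<int> bp_;
    std::vector<int> head_;
};
```

The three nested loops over span length, span start, and split point are where the cubic cost comes from. A caller would fill the score matrix from a trained model, construct EisnerParser on it, and read the head of each word from the vector returned by parse().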
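Syntactic parsing is still at the research stage and remains a long way from practical use. The key problem is speed: parsing is far slower than word segmentation and POS tagging, so it cannot handle massive data. For this reason, the toolkit does not provide the even more time-consuming higher-order decoding algorithms (higher-order projective parsing), and it adopts a new data structure [2] to speed up parsing without loss of accuracy.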
System framework:
The whole parsing process has 5 steps:
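1. Traditional-to-simplified conversion, half-width-to-full-width conversion, and sentence splitting: every traditional character is converted to its simplified form through a mapping table, and every half-width symbol is converted to full-width. For example, 'a' becomes 'ａ'. The converted text is then split into sentences with simple rules.
2. Named entity recognition: a conditional random field (CRF) model marks the person names and place names in each sentence.
3. Word segmentation: a CRF model segments the text, and person names and place names are forced to stand as separate words. In addition, a shortest-path method matches the words found in the dictionary.
4. POS tagging: a CRF model assigns POS tags. Words matched from the dictionary take the POS given in the dictionary.
5. Syntactic parsing: the optimal (maximum) spanning tree algorithm parses each sentence.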
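Step 1 can be sketched as follows. This is only a rough illustration under several assumptions: the text is held as UTF-32 (the toolkit itself reads GBK), the function names toFullWidth and splitSentences are hypothetical, spaces are left unchanged, and the splitting rule (break after a full-width period, exclamation mark, or question mark) is a stand-in for the toolkit's own simple rules, which are not described here. The traditional-to-simplified table is omitted.

```cpp
#include <string>
#include <vector>

// Map printable ASCII (U+0021..U+007E) to the full-width forms (U+FF01..U+FF5E).
// The two blocks are laid out in the same order, so a fixed offset of 0xFEE0 works.
std::u32string toFullWidth(const std::u32string& text) {
    std::u32string out;
    out.reserve(text.size());
    for (char32_t c : text) {
        if (c >= U'!' && c <= U'~') {
            out.push_back(static_cast<char32_t>(c + 0xFEE0));
        } else {
            out.push_back(c);  // spaces and non-ASCII characters pass through
        }
    }
    return out;
}

// Assumed splitting rule: end a sentence after a full-width period (U+3002),
// exclamation mark (U+FF01), or question mark (U+FF1F).
std::vector<std::u32string> splitSentences(const std::u32string& text) {
    std::vector<std::u32string> sentences;
    std::u32string current;
    for (char32_t c : text) {
        current.push_back(c);
        if (c == U'\u3002' || c == U'\uFF01' || c == U'\uFF1F') {
            sentences.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) sentences.push_back(current);  // trailing text without a terminator
    return sentences;
}
```

Running the width conversion first means the splitter only has to look for full-width terminators, because a half-width '!' or '?' has already been mapped to its full-width form.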
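Evaluation:
On the standard CTB6 test set, ctbparser reaches an F1 of 95.3% for word segmentation, an accuracy of 94.27% for POS tagging, and an accuracy of 81% for syntactic parsing. End-to-end throughput (word segmentation, POS tagging, and parsing combined) is 30 sentences per second, with a memory footprint of 270 MB. (Operating system: 64-bit CentOS 5; CPU: Intel(R) Xeon(R) E5405, 2.00GHz)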
Detailed usage instructions are in the readme_cn.html file included in the toolkit, so they are not repeated here.
References:
[1] Mark A. Paskin. "Cubic-time Parsing and Learning Algorithms for Grammatical Bigram Models", technical report, 2001
[2] Xian Qian, Qi Zhang, Xuanjing Huang and Lide Wu. "2D Trie for Fast Parsing", COLING 2010
Home page: http://code.google.com/p/ctbparser/
Download: http://code.google.com/p/ctbparser/downloads/list
Source: OSChina (Open Source China community)

