
ctbparser: a Chinese syntactic parser

Published: 2012-11-20 09:17:28 Source: Honglian Author: empast
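ctbparser is a Chinese syntactic parsing toolkit implemented in C++. It follows the Chinese Penn Treebank (Chinese Tree Bank, CTB) standard and ships with source code. Given raw documents (GBK-encoded), it automatically performs traditional-to-simplified conversion, sentence splitting, word segmentation, part-of-speech tagging, and dependency parsing.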

The parser uses a standard graph-based algorithm: projective maximum spanning tree parsing. The algorithm was proposed by Eisner in 1996, and its complexity is cubic in the sentence length. For details, see [1].
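The cubic-time dynamic program behind projective maximum spanning tree parsing can be sketched as follows. This is a minimal first-order illustration, not ctbparser's code: the function name `eisner`, the dense score matrix, and the toy scores are assumptions, and a real parser would take its arc scores from a trained model. The sketch also lets the artificial ROOT take more than one child.

```python
def eisner(scores):
    """First-order projective dependency decoding (Eisner-style chart).

    scores[h][m] is the score of an arc from head h to modifier m;
    index 0 is an artificial ROOT. Returns heads, where heads[m] is the
    head of token m and heads[0] == -1. Scores are assumed finite.
    """
    n = len(scores)
    neg = float("-inf")
    # Direction 0: head is the right end (t). Direction 1: head is the left end (s).
    comp = [[[0.0, 0.0] if s == t else [neg, neg] for t in range(n)] for s in range(n)]
    inc = [[[neg, neg] for _ in range(n)] for _ in range(n)]
    back_comp = [[[None, None] for _ in range(n)] for _ in range(n)]
    back_inc = [[[None, None] for _ in range(n)] for _ in range(n)]

    for width in range(1, n):            # span widths: O(n)
        for s in range(n - width):       # span starts: O(n)
            t = s + width
            # Incomplete spans: join two complete halves and add one arc.
            for r in range(s, t):        # split points: O(n)
                base = comp[s][r][1] + comp[r + 1][t][0]
                if base + scores[t][s] > inc[s][t][0]:
                    inc[s][t][0], back_inc[s][t][0] = base + scores[t][s], r
                if base + scores[s][t] > inc[s][t][1]:
                    inc[s][t][1], back_inc[s][t][1] = base + scores[s][t], r
            # Complete spans: extend an incomplete span with a complete one.
            for r in range(s, t):
                value = comp[s][r][0] + inc[r][t][0]
                if value > comp[s][t][0]:
                    comp[s][t][0], back_comp[s][t][0] = value, r
            for r in range(s + 1, t + 1):
                value = inc[s][r][1] + comp[r][t][1]
                if value > comp[s][t][1]:
                    comp[s][t][1], back_comp[s][t][1] = value, r

    heads = [-1] * n

    def backtrack(s, t, direction, complete):
        if s == t:
            return
        if complete:
            r = back_comp[s][t][direction]
            if direction == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = back_inc[s][t][direction]
            if direction == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return heads


# Toy scores (assumed) for ROOT + 3 tokens: favor ROOT->2, 2->1, 2->3.
toy = [[0.0] * 4 for _ in range(4)]
toy[0][2] = toy[2][1] = toy[2][3] = 10.0
print(eisner(toy))  # [-1, 2, 0, 2]
```

The three nested loops over span width, span start, and split point are what give the cubic cost in sentence length.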

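Syntactic parsing is still at the research stage and remains far from practical use. The key problem is speed: parsing is far slower than word segmentation and part-of-speech tagging, so it cannot handle massive amounts of data. For this reason, the toolkit does not provide the more time-consuming higher-order decoding algorithms (higher-order projective parsing), and it adopts a new data structure [2] to speed up parsing without loss of accuracy.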

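System framework:
The whole parsing process has 5 steps:
1. Traditional-to-simplified and half-width-to-full-width conversion, then sentence splitting: every traditional character is converted to its simplified form through a mapping table, and every half-width symbol is converted to full-width. For example, 'a' becomes 'ａ'. The converted text is then split into sentences with simple rules.
2. Named entity recognition: a conditional random field (CRF) model marks the person names and place names in each sentence.
3. Word segmentation: a CRF model segments the sentence, and person names and place names are forced to stand as separate words. In addition, a shortest-path method is used to match words from the dictionary.
4. Part-of-speech tagging: a CRF model assigns part-of-speech tags. Words matched from the dictionary keep the part of speech given in the dictionary.
5. Syntactic parsing: the maximum spanning tree algorithm is used to parse each sentence.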
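Step 1 and the dictionary-matching part of step 3 can be sketched as below. This is an illustration under stated assumptions, not ctbparser's implementation: all function names, the two-entry conversion table, the sentence-ending punctuation set, the toy dictionary, and the choice of "fewest words" as the shortest-path criterion are hypothetical. Spaces are left untouched for simplicity.

```python
# Hypothetical two-entry table; a real table covers thousands of characters.
TRAD_TO_SIMP = {"漢": "汉", "語": "语"}


def to_simplified(text, table=TRAD_TO_SIMP):
    """Replace each traditional character that appears in the table."""
    return "".join(table.get(ch, ch) for ch in text)


def to_fullwidth(text):
    """Map printable half-width ASCII (U+0021..U+007E) to full-width forms."""
    return "".join(
        chr(ord(ch) + 0xFEE0) if 0x21 <= ord(ch) <= 0x7E else ch for ch in text
    )


def split_sentences(text, enders="。！？"):
    """Simple rule (assumed): a sentence ends after 。, ！ or ？."""
    sentences, current = [], []
    for ch in text:
        current.append(ch)
        if ch in enders:
            sentences.append("".join(current))
            current = []
    if current:
        sentences.append("".join(current))
    return sentences


def shortest_path_segment(sentence, dictionary, max_len=4):
    """Cover the sentence with the fewest words.

    Each dictionary word or single character is an edge of cost 1 in a
    lattice over character positions; dynamic programming finds the
    shortest path from position 0 to the end.
    """
    n = len(sentence)
    best = [0] + [n + 1] * n      # best[i]: fewest words covering sentence[:i]
    prev = [0] * (n + 1)          # prev[i]: start of the last word ending at i
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = sentence[j:i]
            if (len(word) == 1 or word in dictionary) and best[j] + 1 < best[i]:
                best[i], prev[i] = best[j] + 1, j
    words, i = [], n
    while i > 0:
        words.append(sentence[prev[i]:i])
        i = prev[i]
    return words[::-1]


print(split_sentences(to_fullwidth(to_simplified("漢語!再見?"))))
print(shortest_path_segment("中文句法分析", {"中文", "句法", "分析", "句法分析"}))
```

Converting to full-width before splitting means the splitting rule only has to know the full-width punctuation marks.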

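Evaluation:
On the standard CTB6 test set, ctbparser achieves an F1 score of 95.3% for word segmentation, 94.27% accuracy for part-of-speech tagging, and 81% accuracy for syntactic parsing. Overall throughput (segmentation, tagging, and parsing together) is 30 sentences per second, with a memory footprint of 270 MB. (Operating system: 64-bit CentOS 5; CPU: Intel(R) Xeon(R) E5405, 2.00GHz)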
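For readers unfamiliar with the segmentation metric, the sketch below shows one common way to compute word-level F1: a predicted word counts as correct only when both of its boundaries match the gold segmentation. The function names and the toy sentences are assumptions, and this is not necessarily the exact scoring script behind the numbers above.

```python
def word_spans(words):
    """Turn a word list into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f1(gold, predicted):
    """Harmonic mean of precision and recall over matching word spans."""
    gold_spans, pred_spans = word_spans(gold), word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Toy example (assumed): one of two predicted words matches the gold standard.
gold = ["中文", "句法", "分析"]
predicted = ["中文", "句法分析"]
print(round(segmentation_f1(gold, predicted), 3))  # 0.4
```

Here precision is 1/2 and recall is 1/3, so F1 = 2 × (1/2 × 1/3) / (1/2 + 1/3) = 0.4.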

Detailed usage instructions are in the toolkit's readme_cn.html file and are not repeated here.

References:

[1] Mark A. Paskin, "Cubic-time Parsing and Learning Algorithms for Grammatical Bigram Models", technical report, 2001

[2] Xian Qian, Qi Zhang, Xuanjing Huang and Lide Wu, "2D Trie for Fast Parsing", COLING 2010

Homepage: http://code.google.com/p/ctbparser/

Download: http://code.google.com/p/ctbparser/downloads/list

From: OSChina open source community