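Genius word segmenter 3.1.4 released
Main updates in this release:
1. Refactored the conversion of Chinese numerals to Arabic numerals (for example, "一百一十二" is converted to "112").
2. Added a break regular expression for Chinese numerals.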
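To make the numeral conversion in item 1 concrete, here is a minimal Python 3 sketch of one common way to turn a well-formed Chinese numeral into an integer. It is not taken from the Genius source; the function name chinese_to_int and the lookup tables are assumptions made for illustration only.

```python
# Illustrative sketch only; NOT the actual Genius implementation.
# All names below (DIGITS, SMALL_UNITS, chinese_to_int) are hypothetical.
DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
WAN = 10 ** 4   # 万
YI = 10 ** 8    # 亿


def chinese_to_int(text):
    """Convert a well-formed Chinese numeral such as 一百一十二 to an int."""
    total = 0     # value of the sections already closed by 万 or 亿
    section = 0   # value accumulated inside the current section (< 10000)
    digit = 0     # most recent digit, still waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit, like the leading 十 in 十二, counts as 1 x unit.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch == "万":
            total += (section + digit) * WAN
            section = digit = 0
        elif ch == "亿":
            # 亿 scales everything read so far, so 一万亿 also works.
            total = (total + section + digit) * YI
            section = digit = 0
        else:
            raise ValueError("not a Chinese numeral character: %r" % ch)
    return total + section + digit


print(chinese_to_int("一百一十二"))  # 112
```

The sketch does not validate its input, so malformed numerals produce a number rather than an error; a real implementation would need stricter checks.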
Genius
Genius is an open-source Python component for Chinese word segmentation, based on the CRF (Conditional Random Field) algorithm.
Feature
Supports python2.x, python3.x, and pypy2.x
Supports simple pinyin segmentation
Supports user-defined break rules
Supports user-defined merge dictionaries
Supports part-of-speech tagging
Source Install
Install git: 1) ubuntu or debian: apt-get install git 2) fedora or redhat: yum install git
Download the code: git clone https://github.com/duanhongyi/genius.git
Install the code: python setup.py install
Pypi Install
Run: easy_install genius or pip install genius
Algorithm
Uses a trie for merge-dictionary lookup
Implements conditional random field segmentation on top of wapiti
The default dictionaries can be overridden through genius.loader.ResourceLoader
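As a rough illustration of the trie-based merge-dictionary lookup mentioned above, the following Python 3 sketch merges adjacent tokens whenever their concatenation is the longest entry found in a small dictionary. It is an assumption about how such a lookup could work, not the Genius code; the Trie class, the combine function, and the sample dictionary are all hypothetical.

```python
# Illustrative sketch only; NOT the actual Genius implementation.
_END = object()  # sentinel key marking "a dictionary word ends here"


class Trie:
    """Character trie stored as nested dicts (hypothetical helper)."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True

    def longest_match(self, tokens, start):
        """Return the exclusive end index of the longest dictionary word
        formed by joining tokens[start:end], or None if there is none."""
        node = self.root
        best = None
        for end in range(start, len(tokens)):
            for ch in tokens[end]:
                node = node.get(ch)
                if node is None:
                    return best
            # Only accept matches that end exactly on a token boundary.
            if _END in node:
                best = end + 1
        return best


def combine(tokens, trie):
    """Greedily merge adjacent tokens into the longest dictionary words."""
    merged = []
    i = 0
    while i < len(tokens):
        end = trie.longest_match(tokens, i) or i + 1
        merged.append("".join(tokens[i:end]))
        i = end
    return merged


trie = Trie(["南京市", "长江大桥"])
print(combine(["南京", "市", "长江", "大桥"], trie))
```

With this sample dictionary the four input tokens are merged into 南京市 and 长江大桥. A trie keeps each lookup proportional to the length of the candidate word rather than to the size of the dictionary.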
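Feature 1): Word segmentation with the genius.seg_text method
The genius.seg_text function accepts 5 parameters, of which only text is required:
text: the first parameter, the string to segment
use_break: whether to apply break processing to the segmentation result; default True
use_combine: whether to merge words using the dictionary; default False
use_tagging: whether to perform part-of-speech tagging; default True
use_pinyin_segment: whether to segment pinyin; default True
Code example (full-featured segmentation)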
#encoding=utf-8
import genius
text = u"""昨天,我和施瓦布先生一起与部分企业家进行了交流,大家对中国经济当前、未来发展的态势、走势都十分关心。"""
seg_list = genius.seg_text(
    text,
    use_combine=True,
    use_pinyin_segment=True,
    use_tagging=True,
    use_break=True
)
print('\n'.join(['%s\t%s' % (word.text, word.tagging) for word in seg_list]))
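Feature 2): Index-oriented segmentation
The genius.seg_keywords method is intended for building search engine indexes and retains ambiguous splits; text is the only required parameter.
text: the first parameter, the string to segment
use_break: whether to apply break processing to the segmentation result; default True
use_tagging: whether to perform part-of-speech tagging; default False
use_pinyin_segment: whether to segment pinyin; default False
Because merging conflicts with the purpose of this method, it does not provide a merge option. If you build an index with this method, using use_combine=True with genius.seg_text at query time is not recommended.
Code example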
#encoding=utf-8
import genius
seg_list = genius.seg_keywords(u'南京市长江大桥')
print('\n'.join([word.text for word in seg_list]))
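To show what "retaining ambiguous splits" can mean for an index, the Python 3 sketch below lists every substring of the sample text that appears in a small vocabulary, including overlapping candidates. It is only a conceptual illustration and does not reproduce how genius.seg_keywords works or what it returns; the function all_dictionary_words and the vocabulary are hypothetical.

```python
# Illustrative sketch only; NOT how genius.seg_keywords is implemented.
def all_dictionary_words(text, vocabulary, max_len=4):
    """Return every substring of text (up to max_len characters) that is
    in vocabulary, in order of start position, overlaps included."""
    found = []
    for start in range(len(text)):
        last_end = min(len(text), start + max_len)
        for end in range(start + 1, last_end + 1):
            piece = text[start:end]
            if piece in vocabulary:
                found.append(piece)
    return found


# Hypothetical vocabulary containing competing readings of the same text.
vocabulary = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
print("\n".join(all_dictionary_words("南京市长江大桥", vocabulary)))
```

Indexing all of these candidates lets a query for 南京, 南京市, or 长江大桥 find the document, at the cost of also indexing readings such as 市长 that the sentence may not intend.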
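Feature 3): Keyword extraction
The genius.extract_tag method is intended for extracting tag keywords; text is the only required parameter.
text: the first parameter, the string to segment
use_break: whether to apply break processing to the segmentation result; default True
use_combine: whether to merge words using the dictionary; default False
use_pinyin_segment: whether to segment pinyin; default False
Code example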
#encoding=utf-8
import genius
tag_list = genius.extract_tag(u'南京市长江大桥')
print('\n'.join(tag_list))
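Other notes 4):
The segmentation corpus currently comes from the People's Daily of January 1998, so segmentation is most accurate on news-style text.
CRF segmentation quality depends heavily on the genre and coverage of the training corpus; with a better corpus, there is still considerable room to improve both segmentation and tagging.
Project home page: https://github.com/duanhongyi/genius
Source: OSChina (开源中国社区)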