
Genius Chinese Word Segmenter 3.1.4 Released

Published: 2014-05-21 10:47:23  Source: Honglian (红联)  Author: empast
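Version 3.1.4 of the Genius word segmenter has been released.

Main changes in this release:

1. Refactored the conversion of Chinese numerals to Arabic numerals (for example, "一百一十二" is converted to "112").

2. Added a break regular expression for Chinese numerals.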
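To make the numeral conversion concrete, here is a minimal sketch of how a Chinese numeral string can be turned into an integer. It is not the code shipped in Genius: the function name chinese_to_int and the lookup tables are hypothetical, and nested large units such as "万亿" are deliberately not handled.

```python
# Illustrative sketch only; this is NOT the implementation used by Genius.
# All names below (chinese_to_int, DIGITS, ...) are hypothetical.
DIGITS = {'零': 0, '一': 1, '二': 2, '两': 2, '三': 3, '四': 4,
          '五': 5, '六': 6, '七': 7, '八': 8, '九': 9}
SMALL_UNITS = {'十': 10, '百': 100, '千': 1000}
BIG_UNITS = {'万': 10 ** 4, '亿': 10 ** 8}


def chinese_to_int(text):
    """Convert a Chinese numeral string such as '一百一十二' to an int."""
    total = 0    # sum of sections already closed by 万 or 亿
    section = 0  # value accumulated since the last big unit
    digit = 0    # most recent digit, still waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit (the leading '十' in '十二') counts as 1 x unit.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in BIG_UNITS:
            # Close the current section and scale it by the big unit.
            total += (section + digit) * BIG_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError('not a Chinese numeral character: %r' % ch)
    return total + section + digit


print(chinese_to_int('一百一十二'))
```

The three running values mirror how such numerals are read aloud: a digit attaches to the unit that follows it, and 万 or 亿 scales everything accumulated since the previous large unit.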

Genius

Genius is an open-source Chinese word segmentation component for Python, based on the CRF (Conditional Random Field) algorithm.
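For readers new to CRF-based segmentation: a common formulation labels every character with a position tag and then reads the words off the tag sequence. The sketch below shows only that last step. It assumes the widely used B/M/E/S scheme (begin, middle, end, single); the article does not say which tag set Genius uses, and the function name tags_to_words is hypothetical.

```python
# Illustrative sketch only. The B/M/E/S tag set is an assumption, not
# something the article confirms for Genius; tags_to_words is hypothetical.
def tags_to_words(chars, tags):
    """Rebuild words from per-character position tags (B, M, E, S)."""
    words = []
    current = ''
    for ch, tag in zip(chars, tags):
        # 'B' and 'S' both start a new word, so flush any unfinished one.
        if tag in ('B', 'S') and current:
            words.append(current)
            current = ''
        current += ch
        # 'E' and 'S' both close the current word.
        if tag in ('E', 'S'):
            words.append(current)
            current = ''
    if current:  # tolerate a sequence that ends without 'E' or 'S'
        words.append(current)
    return words


print(tags_to_words('南京市长江大桥', 'BMEBMME'))
```

In a real pipeline the tag sequence would come from the trained CRF model; here it is written by hand.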

Feature
Supports python2.x, python3.x and pypy2.x.

Supports simple pinyin segmentation

Supports user-defined break rules

Supports user-defined merge dictionaries

Supports part-of-speech tagging

Source Install

Install git: 1) Ubuntu or Debian: apt-get install git 2) Fedora or Red Hat: yum install git

Download the code: git clone https://github.com/duanhongyi/genius.git

Install the code: python setup.py install

Pypi Install

Run: easy_install genius or pip install genius

Algorithm

Uses a trie for merge-dictionary lookup

Implements CRF segmentation on top of wapiti

The default dictionaries can be overridden through genius.loader.ResourceLoader
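As a rough illustration of trie-based merge lookup, the sketch below stores dictionary words in a nested-dict trie and merges runs of adjacent tokens that together spell a dictionary word, preferring the longest match. It is not Genius's implementation: the Trie class, the combine function and the sample dictionary are all hypothetical.

```python
# Illustrative sketch only; Trie, combine and the sample words are
# hypothetical and not taken from the Genius source code.
_END = object()  # marker key: a dictionary word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True

    @staticmethod
    def walk(node, text):
        """Follow text from node; return the node reached, or None."""
        for ch in text:
            node = node.get(ch)
            if node is None:
                return None
        return node


def combine(tokens, trie):
    """Merge adjacent tokens that together form the longest dictionary word."""
    merged = []
    i = 0
    while i < len(tokens):
        node = trie.root
        best = i + 1  # default: keep token i unchanged
        j = i
        while j < len(tokens):
            node = Trie.walk(node, tokens[j])
            if node is None:
                break
            j += 1
            if _END in node:
                best = j  # tokens[i:j] spell a dictionary word
        merged.append(''.join(tokens[i:best]))
        i = best
    return merged


trie = Trie(['南京市', '长江大桥'])
print(combine(['南京', '市', '长江', '大桥'], trie))
```

Walking the trie one token at a time means the lookup stops as soon as no dictionary word can continue, so merged words always align with token boundaries.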

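Feature 1): word segmentation with genius.seg_text

genius.seg_text accepts five parameters, of which only text is required:

text: the first parameter, the string to segment

use_break: whether to apply break processing to the segmentation result; default True

use_combine: whether to merge words using the dictionary; default False

use_tagging: whether to perform part-of-speech tagging; default True

use_pinyin_segment: whether to segment pinyin; default True

Code example (full-featured segmentation)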
#encoding=utf-8
import genius
text = u"""昨天,我和施瓦布先生一起与部分企业家进行了交流,大家对中国经济当前、未来发展的态势、走势都十分关心。"""
seg_list = genius.seg_text(
    text,
    use_combine=True,
    use_pinyin_segment=True,
    use_tagging=True,
    use_break=True
)
print('\n'.join(['%s\t%s' % (word.text, word.tagging) for word in seg_list]))
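Feature 2): segmentation for indexing

genius.seg_keywords is intended for building search-engine indexes and keeps ambiguous segmentations; text is the only required parameter.

text: the first parameter, the string to segment

use_break: whether to apply break processing to the segmentation result; default True

use_tagging: whether to perform part-of-speech tagging; default False

use_pinyin_segment: whether to segment pinyin; default False

Because merging conflicts with the purpose of this method, seg_keywords does not offer merging. In addition, if the index is built with this method, calling genius.seg_text with use_combine=True at query time is not recommended.

Code example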
#encoding=utf-8
import genius

seg_list = genius.seg_keywords(u'南京市长江大桥')
print('\n'.join([word.text for word in seg_list]))
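Feature 3): keyword extraction

genius.extract_tag is intended for extracting tag keywords; text is the only required parameter.

text: the first parameter, the string to segment

use_break: whether to apply break processing to the segmentation result; default True

use_combine: whether to merge words using the dictionary; default False

use_pinyin_segment: whether to segment pinyin; default False

Code example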
#encoding=utf-8
import genius

tag_list = genius.extract_tag(u'南京市长江大桥')
print('\n'.join(tag_list))
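Other notes 4):

The segmentation corpus currently comes from the People's Daily of January 1998, so segmentation is most accurate on news-style text.

The quality of CRF segmentation depends heavily on the category and coverage of the training corpus; with a better corpus, both segmentation and tagging still have considerable room to improve.

Project home page: https://github.com/duanhongyi/genius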

From: OSChina (开源中国社区)