Features (the latest Sphinx releases exceed some of the figures below)
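1. Fast indexing: peak speeds of up to 10 MB/s on modern CPUs.
2. Fast searching: average query response under 0.1 s on 2–4 GB of text.
3. Scales to large data sets: known to handle over 100 GB of text, and up to 100 M documents on a single-CPU system.
4. Good relevance ranking: a composite ranking that combines phrase proximity with statistical (BM25) scoring.
5. Distributed search.
6. Can act as a MySQL storage engine (SphinxSE) to provide search.
7. Several query modes, including boolean, phrase and word-proximity matching.
8. Multiple full-text fields per document (up to 32).
9. Multiple extra attributes per document (for example group IDs and timestamps).
10. Single-byte encodings and UTF-8.
11. Native MySQL support (both MyISAM and InnoDB).
12. Native PostgreSQL support.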
Long story short: it is seriously powerful.
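Feature 4 mentions a composite of phrase proximity and BM25. For reference, the textbook Okapi BM25 score of a document $D$ for a query $Q$ is given below, where $f(t,D)$ is the frequency of term $t$ in $D$, $|D|$ is the document length, $\mathrm{avgdl}$ the average document length, $N$ the number of documents, $n(t)$ the number of documents containing $t$, and $k_1$, $b$ are tuning constants. Sphinx rescales its own bm25 value, so treat this as the general idea rather than Sphinx's exact arithmetic. The second line sketches how later Sphinx documentation describes the default proximity+BM25 ranker, with $\mathrm{lcs}_f$ the longest common subsequence of query words matched in field $f$ (the phrase-proximity factor) and $w_f$ the user-set weight of that field.

$$
\mathrm{BM25}(D,Q)=\sum_{t\in Q}\log\frac{N-n(t)+0.5}{n(t)+0.5}\cdot\frac{f(t,D)\,(k_1+1)}{f(t,D)+k_1\left(1-b+b\,\frac{|D|}{\mathrm{avgdl}}\right)}
$$

$$
W_{\text{proximity\_bm25}} = 1000\sum_{f}\mathrm{lcs}_f\,w_f + \mathrm{bm25}
$$

The factor of 1000 means phrase proximity dominates: a document matching the query words in order always outranks one that only scores higher on BM25.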
3. Installing and running Sphinx (this part is reposted from elsewhere)
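1. Software you need: the mmseg package from coreseek, the MySQL installation package, sphinx-0.9.8, Sphinx Chinese word-segmentation patch 1 and Sphinx Chinese word-segmentation patch 2.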
2. Install libmmseg
```bash
tar -zxvf mmseg-0.7.3.tar.gz
cd mmseg-0.7.3
./configure --prefix=/usr/local/mmseg
make
make install
```

If you run into problems, try the following commands:
```bash
echo '/usr/local/mmseg/lib' >> /etc/ld.so.conf
ldconfig -v
ln -s /usr/local/mmseg/bin/mmseg /bin/mmseg
```

3. Recompile MySQL

Before installing Sphinx, apply the two patches first.
```bash
tar -zxvf sphinx-0.9.8-rc2.tar.gz
cd sphinx-0.9.8-rc2
patch -p1 < ../sphinx-0.98rc2.zhcn-support.patch
patch -p1 < ../fix-crash-in-excerpts.patch
```

4. Install Sphinx
```bash
cd /root/lemp/sphinx-0.9.8-rc2
./configure --prefix=/usr/local/sphinx --with-mysql=/opt/mysql \
    --with-mysql-includes=/opt/mysql/include/mysql --with-mysql-libs=/opt/mysql/lib/mysql \
    --with-mmseg-includes=/usr/local/mmseg/include --with-mmseg-libs=/usr/local/mmseg/lib --with-mmseg
make
```

The build fails with:

```
tokenizer_zhcn.cpp:1:30: SegmenterManager.h: No such file or directory
tokenizer_zhcn.cpp:2:23: Segmenter.h: No such file or directory
```

The mmseg headers are in /usr/local/mmseg/include/mmseg, so point --with-mmseg-includes there and reconfigure:

```bash
make clean
./configure --prefix=/usr/local/sphinx --with-mysql=/opt/mysql \
    --with-mysql-includes=/usr/local/mysql/include/mysql --with-mysql-libs=/opt/mysql/lib/mysql \
    --with-mmseg-includes=/usr/local/mmseg/include/mmseg --with-mmseg-libs=/usr/local/mmseg/lib --with-mmseg
```

The link step then fails with:

```
/root/sphinx/sphinx-0.9.8-rc2/src/tokenizer_zhcn.cpp:34: undefined reference to `libiconv_close'
collect2: ld returned 1 exit status
```

The fix from the official site: "In the meantime I've changed the configuration file and set #define USE_LIBICONV 0 in line 8179."

In other words, edit the configure file and change the value at the end of #define USE_LIBICONV from 1 to 0: run vi configure, type /define USE_LIBICONV to find the line, press i, change 1 to 0, press Esc, then type :wq to save and quit. Then rebuild from scratch:

```bash
make clean
./configure --prefix=/usr/local/sphinx --with-mysql=/opt/mysql \
    --with-mysql-includes=/usr/local/mysql/include/mysql --with-mysql-libs=/usr/local/mysql/lib/mysql \
    --with-mmseg-includes=/usr/local/mmseg/include/mmseg --with-mmseg-libs=/usr/local/mmseg/lib --with-mmseg
```

(The commands above mix /opt/mysql and /usr/local/mysql; adjust the MySQL paths to match your own installation.)
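If you would rather not make the vi edit by hand, the same 1-to-0 change can be scripted before rerunning ./configure. This is a minimal sketch; `disable_libiconv` is a hypothetical helper name, and it assumes the define appears on its own line as `#define USE_LIBICONV 1`, which is what the vi search above suggests.

```bash
#!/bin/bash
# Sketch: script the manual "vi configure" edit that switches
# "#define USE_LIBICONV 1" to "#define USE_LIBICONV 0".
# Assumes the define sits on its own line, as the vi search suggests.
disable_libiconv() {
    local script=${1:-configure}

    # Refuse to touch the file if the expected line is not there.
    if ! grep -q '^[[:space:]]*#define USE_LIBICONV 1[[:space:]]*$' "$script"; then
        echo "no '#define USE_LIBICONV 1' line found in $script" >&2
        return 1
    fi

    # Edit in place, keeping the original file as "$script.bak".
    sed -i.bak 's/^\([[:space:]]*#define USE_LIBICONV\) 1\([[:space:]]*\)$/\1 0\2/' "$script"
}
```

Run it from the unpacked source directory, for example `disable_libiconv configure && make clean`, then rerun ./configure with the same options as before. The grep guard makes a second run fail loudly instead of silently doing nothing.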
```bash
make
make install
cd /usr/local/sphinx/etc
cp sphinx.conf.dist sphinx.conf
```
5. Configure Sphinx
```bash
vim /usr/local/sphinx/etc/sphinx.conf
```

In the source section:

```
type = mysql
# some straightforward parameters for SQL source types
sql_host = localhost
sql_user = root
sql_pass =
sql_db = test
sql_port = 3306 # optional, default is 3306
```

In the searchd section:

```
address = 127.0.0.1 # for safety, listen on the local machine only
```
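6. Building the index

After Sphinx is installed, its directory contains three subdirectories: bin, etc and var. bin holds the executables Sphinx uses: indexer (builds indexes), search (a command-line query tool) and searchd (the search server). Note: the latest version no longer ships the search tool.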
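```bash
/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf test1
```

While building the index, indexer may fail to find the shared library libmysqlclient.so.16 if your database is a different version. Copy /opt/mysql/lib/mysql/libmysqlclient.so.16.0.0 into /usr/lib, or create a symlink to it there.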
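To see which libraries are actually missing before copying or linking anything, you can filter `ldd` output. This is a minimal sketch with a hypothetical helper name, `missing_libs`; it only parses text, and the follow-up commands in the comments reuse the paths from the step above.

```bash
#!/bin/bash
# Sketch: list the shared libraries the dynamic loader cannot resolve,
# reading `ldd` output on stdin (such lines look like "libfoo.so.1 => not found").
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Example use (needs the installed indexer binary):
#   ldd /usr/local/sphinx/bin/indexer | missing_libs
# If libmysqlclient.so.16 is listed, either symlink it into /usr/lib:
#   ln -s /opt/mysql/lib/mysql/libmysqlclient.so.16.0.0 /usr/lib/libmysqlclient.so.16
# or register its directory the same way libmmseg was registered earlier:
#   echo '/opt/mysql/lib/mysql' >> /etc/ld.so.conf && ldconfig
```

Running the `ldd` check again afterwards should print nothing once every library resolves.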
7. The search server

To start it:

```bash
/usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf
```
To stop it:

```bash
/usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf --stop
```
Sphinx queries fall roughly into three kinds:
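7.1 Querying through the database storage engine (SphinxSE)

7.2 Querying with the search tool (no longer shipped in the latest version):

```bash
/usr/local/sphinx/bin/search --config /usr/local/sphinx/etc/sphinx.conf test
```

7.3 Querying through the PHP API; see sphinxapi.php for details.

8. Creating the Sphinx startup script and configuration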
Save the following script as /etc/init.d/sphinx:

```sh
#!/bin/sh
#
# sphinx: Startup script for Sphinx search
#
# chkconfig: 345 86 14
# description: This is a daemon for high performance full text \
#              search of MySQL and PostgreSQL databases. \
#              See http://www.sphinxsearch.com/ for more info.
#
# processname: searchd
# pidfile: $sphinxlocation/var/log/searchd.pid

# Source function library.
. /etc/rc.d/init.d/functions

processname=searchd
servicename=sphinx
username=sphinx
sphinxlocation=/usr/local/sphinx
pidfile=$sphinxlocation/var/log/searchd.pid
searchd=$sphinxlocation/bin/searchd
RETVAL=0

PATH=$PATH:$sphinxlocation/bin

start() {
    echo -n $"Starting Sphinx daemon: "
    daemon --user=$username --check $servicename $processname
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$servicename
}

stop() {
    echo -n $"Stopping Sphinx daemon: "
    $searchd --stop
    #killproc -p $pidfile $servicename -TERM
    RETVAL=$?
    echo
    if [ $RETVAL -eq 0 ]; then
        rm -f /var/lock/subsys/$servicename
        rm -f $pidfile
    fi
}

# See how we were called.
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status $processname
        RETVAL=$?
        ;;
    restart)
        stop
        sleep 3
        start
        ;;
    condrestart)
        if [ -f /var/lock/subsys/$servicename ]; then
            stop
            sleep 3
            start
        fi
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|condrestart}"
        ;;
esac

exit $RETVAL
```
```bash
chmod 755 /etc/init.d/sphinx
chkconfig --add sphinx
chkconfig --level 345 sphinx on
chkconfig --list | grep sphinx   # verify
service sphinx start             # start
service sphinx stop              # stop (the official script had problems on my AS4, so I crudely modified it)
service sphinx restart           # restart
service sphinx status            # check whether it is running
```

Check that searchd is running as the sphinx user:

```bash
ps aux | grep searchd
sphinx   24612  0.0  0.3  11376  6256 pts/1    S    14:07   0:00 searchd
```
4. Using Sphinx in projects with hundreds of millions of records
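Whether it is a website or an app, many products share similar design ideas and features to some degree, so here I will mainly cover the following scenarios: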
Searching descriptions and topics
Main approach: a full (main) index plus an incremental (delta) index, with scheduled jobs that rebuild the indexes at fixed times.
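A cron-driven main+delta setup might look like the sketch below. It assumes (hypothetically) that sphinx.conf defines an index named main for the full data set and one named delta for rows added since the last full build. The Sphinx documentation's delta-update example selects those rows by recording the current maximum id in a counter table via sql_query_pre. It also assumes searchd is running, so that --rotate can swap the freshly built files in without a restart.

```bash
#!/bin/bash
# Sketch of a cron-driven "main + delta" rebuild.
# Hypothetical assumptions: sphinx.conf defines indexes "main" (full data)
# and "delta" (recent rows); searchd is running so --rotate can hot-swap files.
INDEXER=${INDEXER:-/usr/local/sphinx/bin/indexer}
SPHINX_CONF=${SPHINX_CONF:-/usr/local/sphinx/etc/sphinx.conf}

# Small, frequent job: rebuild only the delta index.
reindex_delta() {
    "$INDEXER" --config "$SPHINX_CONF" delta --rotate
}

# Large, infrequent job: rebuild the whole main index.
reindex_main() {
    "$INDEXER" --config "$SPHINX_CONF" main --rotate
}

# Example crontab entries, assuming this file is saved as the
# hypothetical path /usr/local/sphinx/bin/reindex.sh:
#   */5 * * * * /usr/local/sphinx/bin/reindex.sh delta
#   30 3 * * *  /usr/local/sphinx/bin/reindex.sh main
case "${1:-}" in
    delta) reindex_delta ;;
    main)  reindex_main ;;
esac
```

Keeping the delta small is the point of the split. It can be rebuilt every few minutes cheaply, while the expensive full rebuild runs once a night when traffic is low.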
Searching user nicknames
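Main approach: a real-time index plus distributed indexes. Because there are so many users, new users are added through the real-time index, while existing data is re-read by a script and then written into it.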
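The "re-read old data and write it in" step could be sketched as below. Real-time indexes and SphinxQL come from later Sphinx releases (RT indexes appeared in 1.10), not the 0.9.8 build installed above. The index name rt_users, the field nickname, the source table users and the database mydb are all hypothetical. The sketch assumes searchd has a SphinxQL listener such as `listen = 9306:mysql41`. It turns `id<TAB>nickname` lines into batched REPLACE statements, escaping backslashes and single quotes for SphinxQL string literals.

```bash
#!/bin/bash
# Sketch: turn "id<TAB>nickname" lines on stdin into batched SphinxQL
# REPLACE statements for a real-time index (hypothetical names throughout).
to_sphinxql() {
    local index=$1 batch=${2:-1000}

    # Escape backslashes first, then single quotes.
    sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g" |
    awk -F '\t' -v idx="$index" -v n="$batch" '
        { vals = vals (c ? ", " : "") "(" $1 ", '\''" $2 "'\'')"; c++ }
        c == n { print "REPLACE INTO " idx " (id, nickname) VALUES " vals ";"; vals = ""; c = 0 }
        END { if (c) print "REPLACE INTO " idx " (id, nickname) VALUES " vals ";" }
    '
}

# Example (hypothetical source table; SphinxQL listener on 127.0.0.1:9306):
#   mysql -B -N -r -e 'SELECT id, nickname FROM users' mydb \
#     | to_sphinxql rt_users 1000 \
#     | mysql -h 127.0.0.1 -P 9306
```

REPLACE rather than INSERT makes the import safe to re-run: a user that is already in the real-time index is overwritten instead of causing a duplicate-id error.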
Search-box suggestions (autocomplete)
Main approach: distributed indexes, automatically suggesting terms that other people have typed before.
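One common way to match what the user has typed so far against previously entered terms is a prefix query. This is an assumption about technique, not necessarily the author's exact setup. The sketch below escapes the same special characters as EscapeString() in the bundled sphinxapi.php and appends `*` so that the last word matches as a prefix. It only works if the suggestion index enables prefix indexing (min_prefix_len, plus enable_star = 1 on older releases) and the query runs in extended match mode. `prefix_query` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: build a prefix ("starts with") query for search-box suggestions.
# Escapes Sphinx query operators (\ ( ) | - ! @ ~ " & / ^ $ = <) with a
# backslash, then appends "*" so the last word is matched as a prefix.
prefix_query() {
    local escaped
    escaped=$(printf '%s' "$1" | sed 's#[\\()|!@~"&/^$=<-]#\\&#g')
    printf '%s*\n' "$escaped"
}
```

For example, `prefix_query 'sphin'` produces `sphin*`, and `prefix_query 'c-lang'` produces `c\-lang*`, so a stray `-` typed by the user is not parsed as a NOT operator.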
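Tip: morphology = stem_en enables English stemming (reducing words to their stems). English searches then no longer match letter by letter, which makes Sphinx more efficient when searching English words.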