I simulated the table in MySQL, since Hive's syntax is much the same as SQL, and inserted three rows. a, b, and c stand for three airport names. The structure is as follows:
mysql> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `airport` varchar(10) DEFAULT NULL,
  `distant` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
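To reproduce this setup, statements along the following lines should work. This is a sketch reconstructed from the table definition and the three rows shown in this post; the original INSERT statements are not given, so the exact form is an assumption.

```sql
-- Recreate the demo table (definition taken from SHOW CREATE TABLE above).
CREATE TABLE t (
  airport VARCHAR(10) DEFAULT NULL,  -- airport name
  distant INT DEFAULT NULL           -- position of the airport
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Insert the three sample rows (assumed form; the values match the post's data).
INSERT INTO t (airport, distant) VALUES
  ('a', 130),
  ('b', 140),
  ('c', 150);
```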
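mysql> select * from t;
+---------+---------+
| airport | distant |
+---------+---------+
| a       |     130 |
| b       |     140 |
| c       |     150 |
+---------+---------+
3 rows in set (0.00 sec)

Use != to filter out comparisons of an airport with itself, and use the abs function to take the absolute value of the difference, which gives the pairs of airports whose positions are less than 100 apart: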
mysql> select t1.airport, t2.airport from t t1,t t2 where t1.airport != t2.airport and abs(t1.distant-t2.distant) < 100;
+---------+---------+
| airport | airport |
+---------+---------+
| b       | a       |
| c       | a       |
| a       | b       |
| c       | b       |
| a       | c       |
| b       | c       |
+---------+---------+
6 rows in set (0.00 sec)

But here is the problem: (b,a) and (a,b), (c,a) and (a,c), (c,b) and (b,c) count as duplicates for our purposes. One row from each pair is enough to tell us which two airports are involved. So how do we remove the duplicates?
mysql> select t1.airport,hex(t1.airport), t2.airport,hex(t2.airport) from t t1,t t2 where t1.airport != t2.airport and abs(t1.distant-t2.distant) < 100;
+---------+-----------------+---------+-----------------+
| airport | hex(t1.airport) | airport | hex(t2.airport) |
+---------+-----------------+---------+-----------------+
| b       | 62              | a       | 61              |
| c       | 63              | a       | 61              |
| a       | 61              | b       | 62              |
| c       | 63              | b       | 62              |
| a       | 61              | c       | 63              |
| b       | 62              | c       | 63              |
+---------+-----------------+---------+-----------------+
6 rows in set (0.00 sec)

This way we can compare the values of airport 1 and airport 2 and keep only the rows where the first is smaller, which removes the duplicates:
mysql> select t1.airport, t2.airport from t t1,t t2 where t1.airport != t2.airport and hex(t1.airport) < hex(t2.airport) and abs(t1.distant-t2.distant) < 100;
+---------+---------+
| airport | airport |
+---------+---------+
| a       | b       |
| a       | c       |
| b       | c       |
+---------+---------+
3 rows in set (0.00 sec)

Finally, one more optimization: the < comparison already excludes rows where both names are the same, so the != condition is redundant and can be dropped. The result is as follows:
mysql> select t1.airport, t2.airport from t t1,t t2 where hex(t1.airport) < hex(t2.airport) and abs(t1.distant-t2.distant) < 100;
+---------+---------+
| airport | airport |
+---------+---------+
| a       | b       |
| a       | c       |
| b       | c       |
+---------+---------+
3 rows in set (0.00 sec)
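As a possible variation, the same pairs can probably be obtained without hex() by comparing the airport names directly and writing the self-join explicitly. This is only a sketch, not the query used above: the aliases airport_1 and airport_2 are made up for readability, and the direct string comparison follows the column's collation, so under a case-insensitive collation two names that differ only in case would be treated as equal, whereas hex() compares the underlying bytes.

```sql
-- Self-join that keeps each unordered pair once by requiring name 1 < name 2.
-- airport_1 / airport_2 are illustrative aliases, not names from the post.
SELECT t1.airport AS airport_1,
       t2.airport AS airport_2
FROM t AS t1
JOIN t AS t2
  ON t1.airport < t2.airport                -- direct comparison instead of hex()
WHERE ABS(t1.distant - t2.distant) < 100    -- positions less than 100 apart
ORDER BY airport_1, airport_2;              -- make the row order deterministic
```

For the three lowercase sample names in this post, this should return the same three pairs: (a, b), (a, c), (b, c).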