Java 集合與隊列的插入、刪除在并發下的性能比較

2019-11-17 03:26:51

字體：大中小

來源：轉載

供稿：網友

這兩天在寫一個java多線程的爬蟲，以廣度優先爬取網頁，設置兩個緩存：
•　　一個保存已經訪問過的URL：vistedUrls
•　　一個保存沒有訪問過的URL：unVistedUrls

　　需要爬取的數據量不大，對URL壓縮后，可以把這兩個數據結構都放入內存，vistedUrls很顯然用HashSet<String>實現，因為已經訪問的URL只會添加，不會刪除和修改，使用HashSet可以高效判斷一個URL是否已經訪問。

　　糾結unVistedUrls該用什么數據結構，如果用隊列的話，并發情況下，隊列中可能會用重復的URL，比如一個線程A爬了CSDN的一個URL1，另一個線程B爬了博客園的一個URL2，URL1和URL2的頁面都有一個相同的出鏈URL3，線程A把URL3加入到unVistedUrls的隊尾，等待下次爬取，但在URL3被爬取之前，線程B也把URL3加到隊尾，這樣隊列中就有兩個相同的URL，可能會導致重復爬取網頁，當然可以通過其他方法來保證不會重復爬取。

　　然后就想能否也用Set來保存未訪問的URL，這樣在添加新的URL時，自動去重處理了，能夠有效保證不爬取重復網頁。但是unVistedUrls會有大量的插入和刪除操作，我認為對集合進行大量的插入刪除性能會比較低，為了測試集合的插入刪除性能對比隊列低多少，我寫了一個簡單的并發測試：

復制代碼
1 /**
2 * 測試集合與隊列的插入與讀寫性能
3 *
4 * @author jiqunpeng@Gmail.com
5 *
6 */
7 public class SetQueueTest {
8     // 隨即數構造器
9     PRivate static Random r = new Random(10);
10     // 控制測試線程停止的原子變量
11     private static AtomicBoolean stop = new AtomicBoolean(false);
12
13     /***
14      * 基類，供測試用
15      *
16      * @author [email protected]
17      *
18      */
19     static abstract class Service {
20         // 操作的計數器
21         protected long count = 0;
22
23         // 添加一堆元素，并去一個元素
24         public abstract String addAndPick(List<String> elements);
25
26         // 取一個元素
27         public abstract String pickOne();
28
29         /**
30          * 打印操作次數
31          */
32         public void tell() {
33             System.out.println(this + " :/t" + count);
34         }
35     }
36
37     /***
38      * 采用TreeSet的集合工具
39      *
40      * @author [email protected]
41      *
42      */
43     static class SetService extends Service {
44         private TreeSet<String> set = new TreeSet<String>();
45
46         @Override
47         public synchronized String addAndPick(List<String> elements) {
48             count++;
49             set.addAll(elements);
50             return set.pollFirst();
51         }
52
53         @Override
54         public synchronized String pickOne() {
55             count++;
56             return set.pollFirst();
57         }
58
59     }
60
61     /***
62      * 采用LinkedList的隊列工具
63      *
64      * @author [email protected]
65      *
66      */
67     static class QueueService extends Service {
68         private Queue<String> queue = new LinkedList<String>();
69
70         @Override
71         public synchronized String addAndPick(List<String> elements) {
72             count++;
73             queue.addAll(elements);
74             return queue.poll();
75         }
76
77         @Override
78         public synchronized String pickOne() {
79             count++;
80             return queue.poll();
81         }
82     }
83
84     /***
85      * 測試類
86      *
87      * @author [email protected]
88      *
89      */
90     static class Tester implements Runnable {
91         // 綁定要測試的工具對象
92         private Service service;
93
94         Tester(Service s) {
95             this.service = s;
96         }
97
98         @Override
99         public void run() {
100             while (stop.get() == false) {
101                 List<String> elements = new ArrayList<String>();
102                 int len = r.nextInt(200) + 8;
103                 for (int i = 0; i < len; i++) {
104                     elements.add(String.valueOf(r.nextInt()));
105                 }
106                 service.addAndPick(elements);
107                 for (int i = 0; i < 104; i++)
108                     service.pickOne();
109             }
110         }
111     }
112
113     /***
114      * 多線程方式，測試一個插入、刪除工具
115      *
116      * @param service
117      * @param time
118      * @param unit
119      * @throws InterruptedException
120      */
121     private static void test(Service service, int time, TimeUnit unit)
122             throws InterruptedException {
123         ExecutorService execs = Executors.newCachedThreadPool();
124         for (int i = 0; i < 20; i++) {
125             execs.execute(new Tester(service));
126         }
127         execs.shutdown();
128         unit.sleep(time);
129         stop.compareAndSet(false, true);
130         service.tell();
131     }
132
133     public static void main(String[] args) throws InterruptedException {
134         Service setService = new SetService();
135         test(setService, 5, TimeUnit.SECONDS);
136         stop.compareAndSet(true, false);// 重置終止條件
137         Service queueService = new QueueService();
138         test(queueService, 5, TimeUnit.SECONDS);
139     }
140 }

復制代碼

　　輸出的結果如下：

SetQueueTest$SetService@5e9de959 : 7149859
SetQueueTest$QueueService@11b343e0 : 24303408

　　測試結果讓我感到吃驚，TreeSet的插入刪除效率確實比LinkedList低，20個線程跑了10秒，使用隊列，插入刪除24303408次，使用集合，插入刪除7149859次。它們之間差距并不大，隊列只比集合快2~3倍。屬于同一個數量級。于是我這個小型的爬蟲應該放心的選擇用Set作為unVistedUrls的實現。

轉載請注明出處：www.companysz.com/fengfenggirl

上一篇：webApp開發幾點體會

下一篇：C#4.0從入門到精通_第22章_酒店管理系統