My professor assigned me to translate an experimental article on information retrieval. I tried using a translator plus my own limited language skills, rearranging the sentences back and forth, but the result still doesn't read right; it feels choppy. Please help me out. The article is below. __/\__
4.3 Experiment Results The performance evaluation considers the main operations: complete index creation, simultaneous full text search over single terms under various workloads, and, in parallel, performing index updates as product data change. The experiments are conducted for the file system index and the database index. We drop the RAM directory from our consideration, since the index under investigation is too large to fit into the 1.5 GB heap size provided by Java under Windows.
4.3.1 Complete index creation Building the complete index from scratch on the file system takes about 28 minutes. We find that the best way to create the complete index for the database is to first create a working copy on the file system and then migrate it to the database using a small utility that we developed to move an index from one storage to the other. This migration takes 3 minutes 19 seconds to complete. Thus, the overhead of this one-time operation is less than 12%.
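The stated overhead figure follows directly from the two reported times; a quick arithmetic check (not part of the original paper):

```python
# Sanity check of the <12% overhead claim from the reported timings.
full_build_s = 28 * 60        # complete index creation on the file system: ~28 minutes
migration_s = 3 * 60 + 19     # file-system-to-database migration: 3 min 19 s

overhead = migration_s / full_build_s
print(f"migration overhead: {overhead:.1%}")  # about 11.8%, i.e. under 12%
```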
4.3.2 Full text search In this set of experiments, we vary the number of search threads from 1 to 25 concurrent worker threads and compare the system throughput, illustrated in Fig. 5, and the query response time, illustrated in Fig. 6, for both index storage techniques. We find that the performance indices are enhanced by a factor > 2. The search throughput jumps from around 1,250,000 searches per hour to almost 3,000,000 searches per hour in our proposed system. The query response time is lowered by 25%, decreasing from 0.8 seconds to 0.6 seconds on average. This is a very important result because it means that we increase the performance while also gaining the robustness and scalability advantages of database management systems in our proposed system.
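The measurements behind Figs. 5 and 6 come down to running a fixed batch of queries across N worker threads and recording wall-clock throughput and per-query latency. A simplified sketch of such a measurement loop, with a stand-in `search_fn` (hypothetical name; this is not the authors' actual toolkit):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_search_benchmark(search_fn, n_threads, n_queries):
    """Run n_queries calls to search_fn across n_threads workers and report
    throughput (searches per hour) and mean response time (seconds)."""
    latencies = []

    def timed_search(query):
        start = time.perf_counter()
        search_fn(query)
        # list.append is safe enough here for a measurement sketch
        latencies.append(time.perf_counter() - start)

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(timed_search, range(n_queries)))
    wall = time.perf_counter() - wall_start

    throughput_per_hour = n_queries / wall * 3600
    mean_response_s = sum(latencies) / len(latencies)
    return throughput_per_hour, mean_response_s

# Usage with a sleep standing in for an index lookup:
tp, rt = run_search_benchmark(lambda q: time.sleep(0.001), n_threads=8, n_queries=200)
```

Varying `n_threads` from 1 to 25, as in the experiments, would then yield one throughput/latency point per workload level.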
4.3.3 Index update In this set of experiments, we enable the incremental indexing option and repeat the above-mentioned experiments of Section 4.3.2 for different settings of think time between successive updates. In order to highlight the effect of incremental indexing, we choose very high index update rates by varying the think time from 20 to 100 milliseconds. For readability purposes, we only plot the results of the experiments having a think time of 40 and 80 milliseconds. In real life, we do not expect this exaggerated index update frequency. Fig. 7 demonstrates that the throughput of the index update thread in our proposed system is slightly better than in the file system based implementation. However, Fig. 8 shows that the response time of the index update operation in our system is worse than in the original one. We attribute this to an inherent problem in Lucene: during index update, the whole index is exclusively locked by the index updater thread. This is too restrictive. In our implementation, we keep this exclusive lock, although the database management system also keeps its own, less restrictive locking at the level of tuples, which would allow for more than one index update thread and certainly more concurrent searches. The extra overhead of holding both locks leads to the increase in the system response time. The good news is that the response time always remains under the absolute level of 25 seconds, which is acceptable for most applications considering the high update rate.
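The contention described above can be modeled with a much-simplified sketch (hypothetical names; Lucene's real write lock and the DBMS's tuple-level locks are more involved than a single mutex):

```python
import threading
import time

# Stands in for Lucene's index-wide exclusive lock during updates.
index_lock = threading.Lock()

def index_updater(apply_update, updates, think_time_s):
    """Apply each update while holding the exclusive index lock, then sleep
    for the configured think time (20-100 ms in the experiments)."""
    for update in updates:
        with index_lock:           # the whole index is locked during the update
            apply_update(update)
        time.sleep(think_time_s)   # think time between successive updates

def search(run_query, query):
    """In this simplified model, a search stalls whenever an update holds the
    lock, which is what drives up response time under high update rates."""
    with index_lock:
        return run_query(query)
```

Shortening the think time means the lock is held more often, so searches queue up behind updates; this is the effect visible when comparing Figs. 5 and 6 with Figs. 9 and 10.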
The search performance of our proposed system remains very comparable to the original file system based implementation in an environment suffering from a high rate of index updates. Fig. 9 shows that the search throughput of the proposed system is slightly better than that of the file system based implementation, whereas Fig. 10 shows that our database index suffers from a slightly higher response time than the original system. Again, the effect of the exclusive lock over the whole index during index update becomes evident when comparing the performance indices of Fig. 5 and Fig. 6 to those of Fig. 9 and Fig. 10, respectively. The search throughput drops from 3,000,000 to around 1,100,000 searches per hour and the response time increases from 0.6 seconds to around 3 seconds.
5 CONCLUSION AND FUTURE WORK In this paper, we attempt to bring information retrieval back to database management systems. We propose using a commercial DBMS as a backend to existing full text search engines. By achieving this, today's search engines directly gain the robustness, scalability, distribution and replication features provided by the DBMS. In our case study, we provide a simple system integration of Lucene and MySQL without loss of generality. We build a performance evaluation toolkit and conduct several experiments on real data of an electronic marketplace. The results show that we reach system throughput and response times for typical full text search engine operations comparable to the current implementation, which stores the index directly in the file system on disk. In several cases, we even reach much better results, which means that we gain the robustness and scalability of the DBMS on top. Yet, this is only the beginning. We plan on mapping the whole internal index structure into a database logical schema instead of just taking the file chunk as the smallest building block. This will solve the restrictive locking problem inherent in Lucene and will definitely boost overall performance. We also plan on extending our performance
evaluation toolkit to work on several sites of a distributed database.
Please help me out. Anyone who is good at English, please lend a hand.