BitFunnel
| BitFunnel | |
|---|---|
| Developer | Microsoft |
| Initial release | 2016 |
| Written in | C++ |
| Platform | Windows, macOS, Ubuntu |
| Type | Search engine indexing algorithm |
| License | MIT License |
| Website | bitfunnel |
| Repository | github |
BitFunnel is the search engine indexing algorithm and set of components used in the Bing search engine,[1] which were made open source in 2016.[2] BitFunnel uses bit-sliced signatures instead of an inverted index in an attempt to reduce operating costs.[3]
History
Progress on the implementation of BitFunnel was made public in early 2016, with the expectation that there would be a usable implementation later that year.[4] In September 2016, the source code was made available via GitHub.[5] A paper discussing the BitFunnel algorithm and implementation was published through the Special Interest Group on Information Retrieval (SIGIR) of the Association for Computing Machinery in 2017 and won the Best Paper Award.[3][6]
Components
BitFunnel consists of three major components:[1]
- BitFunnel – the text search/retrieval system itself
- WorkBench – a tool for preparing text for use in BitFunnel
- NativeJIT – a software component that takes expressions that use C data structures and transforms them into highly optimized assembly code
Algorithm
Initial problem and solution overview
The BitFunnel paper describes the "matching problem", which arises when an algorithm must identify the documents that contain a given set of keywords. Given a corpus to search and a query of keyword terms, the goal is to identify the set of documents that match the query. This problem is commonly solved with inverted indexes, which map each keyword to the searchable items that contain it.[3]
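As a point of comparison, the inverted-index approach can be sketched in a few lines of Python. This is a minimal illustration and not code from BitFunnel or Bing; the function names `build_inverted_index` and `match_all`, the whitespace tokenizer, and the sample corpus are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(corpus):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[term].add(doc_id)
    return index


def match_all(index, query_terms):
    """Return the ids of documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in query_terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical three-document corpus.
corpus = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dog",
}
index = build_inverted_index(corpus)
result = match_all(index, ["quick", "brown"])
```

Here `result` holds the ids of the documents that contain both query terms, obtained by intersecting the per-keyword sets.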
In contrast, BitFunnel represents each searchable item by a signature: a sequence of bits that encodes a Bloom filter of the terms in that item. The Bloom filter is constructed by hashing each term to several bit positions and setting those bits.[3]
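The construction of such a signature might look like the following Python sketch. It is an illustration under stated assumptions rather than BitFunnel's implementation: the 64-bit signature length, the three hash functions per term, the use of seeded SHA-256 as the hash family, and all function names are choices made for the example.

```python
import hashlib

SIGNATURE_BITS = 64  # signature length; an assumed value for illustration
NUM_HASHES = 3       # bit positions set per term; also an assumption


def term_bit_positions(term):
    """Hash a term to NUM_HASHES bit positions in [0, SIGNATURE_BITS)."""
    positions = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % SIGNATURE_BITS)
    return positions


def term_signature(term):
    """Signature of one term: an integer with the term's hashed bits set."""
    signature = 0
    for position in term_bit_positions(term):
        signature |= 1 << position
    return signature


def document_signature(terms):
    """Signature of a document: the bitwise OR of its term signatures."""
    signature = 0
    for term in terms:
        signature |= term_signature(term)
    return signature


doc_sig = document_signature(["quick", "brown", "fox"])
```

Because the document signature is the OR of its term signatures, every bit set for an individual term such as "fox" is also set in `doc_sig`.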
Theoretical implementation of bit-string signatures
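The signature of a document (D) can be described as the logical OR of the signatures of its terms:
$$\overrightarrow{S_D} = \bigcup_{t \in D} \overrightarrow{S_t}$$
Similarly, the signature of a query (Q) can be defined as a union:
$$\overrightarrow{S_Q} = \bigcup_{t \in Q} \overrightarrow{S_t}$$
Additionally, a document D is a member of the set M' when the following condition is satisfied:
$$\overrightarrow{S_Q} \cap \overrightarrow{S_D} = \overrightarrow{S_Q}$$
This knowledge is then combined to produce a formula where M' is identified by the documents in the corpus (C) whose signatures match the query signature:
$$M' = \{\, D \in C \mid \overrightarrow{S_Q} \cap \overrightarrow{S_D} = \overrightarrow{S_Q} \,\}$$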
These steps and their proofs are discussed in the 2017 paper.[3]
Pseudocode for bit-string signatures
This algorithm is described in the 2017 paper.[3]
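In outline, matching with bit-string signatures is a linear scan: a document is kept as a candidate when every bit set in the query signature is also set in the document signature. The Python sketch below illustrates that scan; it is not the pseudocode from the paper. The `signature` and `candidate_matches` names, the 64-bit signature length, the three seeded SHA-256 hashes per term, and the sample corpus are all assumptions made for the example.

```python
import hashlib


def signature(terms, bits=64, hashes=3):
    """Bitwise OR of the hashed bit positions of every term (sizes are assumptions)."""
    sig = 0
    for term in terms:
        for seed in range(hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            sig |= 1 << (int.from_bytes(digest[:8], "big") % bits)
    return sig


def candidate_matches(corpus_signatures, query_signature):
    """Scan every document and keep those whose signature covers the query's."""
    matches = set()  # the candidate set starts empty
    for doc_id, doc_signature in corpus_signatures.items():
        # Keep the document only if every query bit is also set in its signature.
        if (doc_signature & query_signature) == query_signature:
            matches.add(doc_id)
    return matches


# Hypothetical corpus of already-tokenized documents.
corpus = {
    1: ["the", "quick", "brown", "fox"],
    2: ["the", "lazy", "dog"],
    3: ["quick", "brown", "dog"],
}
corpus_signatures = {doc_id: signature(terms) for doc_id, terms in corpus.items()}
result = candidate_matches(corpus_signatures, signature(["quick", "brown"]))
```

Documents 1 and 3 contain both query terms, so they are always in `result`. As with any Bloom filter, a document that lacks a query term can still pass the test if other terms happen to set the same bits, so the result is a set of candidates rather than a guaranteed exact match set.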
References
- [1] Yegulalp, Serdar (2016-09-06). "Microsoft open-sources Bing components for fast code compilation". InfoWorld.
- [2] Verma, Arpit (2016-09-07). "Microsoft Open Sources Major Components Of Bing Search Engine, Here's Why It Matters". Fossbytes. Retrieved 2020-06-12.
- [3] Goodwin, Bob; Hopcroft, Michael; Luu, Dan; Clemmer, Alex; Curmei, Mihaela; Elnikety, Sameh; He, Yuxiong (2017-08-07). "BitFunnel". Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA: ACM. pp. 605–614. doi:10.1145/3077136.3080789. ISBN 978-1-4503-5022-8.
- [4] "When will BitFunnel be usable? · BitFunnel". bitfunnel.org. Retrieved 2020-06-12.
- [5] "BitFunnel/BitFunnel". BitFunnel. 2020-05-12. Retrieved 2020-06-12.
- [6] "SIGIR Best Paper Awards". ACM. Retrieved 2020-07-08.