Improving the full-text search engine

The search engine you have created so far works, but there is still plenty of room for improvement. Here are a few ways to take it further:

• Creating a DOCX and PDF keyword extractor:

The Microsoft Word DOCX and Adobe PDF formats are popular file attachment types, so letting your users search through DOCX and PDF content can provide a lot of business value. You can create new keyword extractors for these file formats by building on the framework you have created.
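As a rough illustration of what a DOCX extractor has to do, the following Python sketch relies on the fact that a DOCX file is a ZIP archive whose main body is stored as XML in word/document.xml. The function names extract_docx_text and extract_docx_keywords are hypothetical and are not part of the framework built earlier; adapt them to whatever extractor interface your framework expects. PDF content is stored in a more involved layout and generally calls for a dedicated parsing library, so it is not shown here.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(data):
    """Return the paragraph text of a DOCX file, given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; each run keeps its text in <w:t>.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def extract_docx_keywords(data):
    """Hypothetical extractor: the distinct lower-cased words in the document."""
    return set(re.findall(r"[a-z0-9]+", extract_docx_text(data).lower()))
```

This sketch only reads the main document body; headers, footers, and footnotes live in other parts of the archive and would need similar handling.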

• Supporting nested Boolean queries:

You can also build support for nested Boolean queries, in which parentheses group sub-expressions that are evaluated first. A sample nested Boolean query might look like this: ((Medical OR Certificate) NOT Clinic) OR Hospital.
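One possible way to evaluate such a query is a small recursive-descent parser, sketched below in Python. It makes several assumptions that the text does not spell out: operators are the upper-case words AND, OR, and NOT; NOT is a binary operator meaning "and not", as in the sample query; operators have equal precedence and are applied left to right; and a document is represented by a set of lower-cased keywords. The names tokenize and matches are illustrative only.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}


def tokenize(query):
    # Parentheses are separate tokens; everything else is a run of word characters.
    return re.findall(r"\(|\)|\w+", query)


def matches(query, keywords):
    """Return True if the set of lower-cased keywords satisfies the query."""
    tokens = tokenize(query)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        if token == "(":
            # A parenthesized group is a complete nested expression.
            value = parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token == ")" or token in OPERATORS:
            raise ValueError("unexpected token: " + token)
        return token.lower() in keywords

    def parse_expression():
        nonlocal pos
        value = parse_term()
        # Apply operators left to right with equal precedence.
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            operator = tokens[pos]
            pos += 1
            right = parse_term()
            if operator == "AND":
                value = value and right
            elif operator == "OR":
                value = value or right
            else:  # NOT is treated as "and not"
                value = value and not right
        return value

    result = parse_expression()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


query = "((Medical OR Certificate) NOT Clinic) OR Hospital"
print(matches(query, {"medical", "clinic"}))   # False
print(matches(query, {"clinic", "hospital"}))  # True
```

Checking one document at a time keeps the sketch short. Against a keyword index, the same parser could return sets of matching document IDs and combine them with set union, intersection, and difference instead of Boolean values.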

• Creating a more comprehensive list of stop words to reduce the number of generated keywords:

A larger stop words list strips away more words that carry little meaning, which can improve the accuracy of your search. Take note, however, that a larger list can also cost your application more processing cycles when filtering the text. You will need to find the right balance between accuracy and performance for your project.
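The filtering step itself can be sketched in a few lines of Python. The stop words shown here are a tiny illustrative sample, and the extract_keywords function is a hypothetical stand-in for the keyword generation step of your own extractor.

```python
import re

# Illustrative sample only; a production list would be considerably longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "for"})


def extract_keywords(text, stop_words=STOP_WORDS):
    """Return the distinct lower-cased words in text that are not stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {word for word in words if word not in stop_words}


print(sorted(extract_keywords("The Certificate of the Medical Clinic")))
# ['certificate', 'clinic', 'medical']
```

Keeping the stop words in a hash-based set means each membership test takes roughly constant time on average, which limits the extra cost of growing the list.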

• Implementing search result ranking:

You can also implement a simple popularity-based ranking system for your search results. Keep a popularity counter for each item and increment it every time a user opens that item from the search results. You can then sort the results by this counter in descending order, so that the most popular result appears at the top.
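A minimal in-memory sketch of this idea in Python follows. The PopularityRanker class and its method names are hypothetical; in a real application the counters would be stored alongside the indexed items instead of in memory.

```python
from collections import Counter


class PopularityRanker:
    """Tracks how often each item is opened and orders results accordingly."""

    def __init__(self):
        self._opens = Counter()

    def record_open(self, item_id):
        # Call this whenever a user opens an item from the search results.
        self._opens[item_id] += 1

    def rank(self, result_ids):
        # sorted() is stable, so items with equal counts keep their original order.
        return sorted(result_ids, key=lambda item_id: self._opens[item_id], reverse=True)


ranker = PopularityRanker()
for opened in ["b", "b", "c"]:
    ranker.record_open(opened)
print(ranker.rank(["a", "b", "c", "d"]))  # ['b', 'c', 'a', 'd']
```

Items that have never been opened count as zero and fall to the bottom in their original order, so a relevance-based ordering produced earlier in the search still acts as the tie-breaker.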

