Searcharoo.net ASP.NET Search Engine

Searcharoo.net: ASP.NET Search with C#

Version 6
Indexes JPG images and GPS location data for mapping results, addresses the "No" Trust problem and fixes a few bugs. NEW! June '08
Version 5
Remove Binary Serialization to solve Medium Trust problem; index OpenXML document formats.
Version 4
Refactored codebase, plus the ability to index and search Microsoft Word, Excel, PowerPoint and Acrobat PDF files. Small improvements such as robots.txt support and excluding regions of HTML were also added.
Version 3
Adds a "save to disk" option for the catalog, and incorporates feature suggestions, bug fixes and code contributed by others since the previous versions.
Version 2
Extends Searcharoo to populate its search catalog by spidering HTML pages: it follows links and imagemaps to process both static and dynamically generated pages. You can also search for multiple words.
Version 1
How to build a simple, extensible search engine using ASP.NET that can crawl files and create a searchable catalog by processing the text from HTML source.

Web search technology is a huge subject, encompassing:

  • networking (spidering the web),
  • string and markup-language manipulation (parsing HTML),
  • proprietary file formats (searching Word, Excel, PDF, etc.),
  • language and text-parsing (finding words & sentences in documents, stemming and other linguistic analysis),
  • algorithms (finding matches, AND/OR queries, combining multiple word results),
  • performance (both increasing spidering speed, and making large catalogs fast to search),
  • user interface (presenting search input options, and results)

Searcharoo.net barely scratches the surface of any of these topics :-) but it does attempt to introduce them with an open-source C# implementation of a search engine that you can download and use on your website.
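
To give a flavour of the "algorithms" topic in the list above (finding matches and AND/OR queries over multiple words), here is a minimal sketch of a word-to-pages catalog. It is an illustration only, not Searcharoo's actual code: the `TinyCatalog` class, its `Add`, `SearchAll` and `SearchAny` methods, and the sample URLs are all hypothetical names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: this is NOT Searcharoo's catalog class.
public class TinyCatalog
{
    // Maps each word (case-insensitive) to the set of page URLs that contain it.
    private readonly Dictionary<string, HashSet<string>> index =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Splits the text into words on whitespace and basic punctuation, then records the URL for each word.
    public void Add(string url, string text)
    {
        var separators = new[] { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };
        foreach (var word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!index.TryGetValue(word, out var urls))
            {
                urls = new HashSet<string>();
                index[word] = urls;
            }
            urls.Add(url);
        }
    }

    // AND query: pages that contain every one of the words.
    public ISet<string> SearchAll(params string[] words)
    {
        HashSet<string> result = null;
        foreach (var word in words)
        {
            var matches = Lookup(word);
            if (result == null) result = new HashSet<string>(matches);
            else result.IntersectWith(matches);
        }
        return result ?? new HashSet<string>();
    }

    // OR query: pages that contain at least one of the words.
    public ISet<string> SearchAny(params string[] words)
    {
        var result = new HashSet<string>();
        foreach (var word in words) result.UnionWith(Lookup(word));
        return result;
    }

    // Returns the URLs recorded for a word, or an empty sequence if the word is unknown.
    private IEnumerable<string> Lookup(string word) =>
        index.TryGetValue(word, out var urls) ? urls : Enumerable.Empty<string>();
}

public static class Program
{
    public static void Main()
    {
        var catalog = new TinyCatalog();
        catalog.Add("/a.html", "ASP.NET search engine");
        catalog.Add("/b.html", "search the web with a spider");

        // AND: only /a.html contains both words.
        Console.WriteLine(string.Join(", ", catalog.SearchAll("search", "engine")));
        // OR: both pages contain at least one of the words.
        Console.WriteLine(string.Join(", ", catalog.SearchAny("engine", "spider").OrderBy(u => u)));
    }
}
```

A real engine would also rank the results, stem words and persist the catalog; the sketch leaves those out to keep the set intersection (AND) and union (OR) visible.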

The default interface should be familiar (and is easily customizable).
Overview of version 6: Image search, Google Earth and Google Maps
The articles describe how the engine itself is built, from a simple file-system crawler to a fully-fledged web-spider. You can comment or ask questions on CodeProject.

In addition to the information on this website, these search-related links might be useful.

Useful links

conceptdevelopment.net

Craig's Blog
Linqaroo - Linq for Searcharoo

On Search, the Series

dotLucene [Open Source]

SiteSearchEngine [VB.net article]

What is Stemming?

Robots.txt