skimpa.com

This announcement concerns a network of 30 websites being built for the following purposes:

- Integrating newly obtained knowledge and techniques into Joomla
- Testing 30 different web directories
- Gathering links to relevant material for a number of websites

Goals:
- Liquidate 15 of the 30 websites in this network by the end of April 2008, after moving their material to the remaining 15. The 15 freed domains will be redirected to the appropriate websites.
- Liquidate 10 more websites by the end of May 2008, leaving 5 websites that hold 2-3 months' worth of material consolidated from the other 25.

Concerning this website:

Topics, goals and notes:

Gather the following information:
- hardware and software requirements
  - to run from a server
  - to run from elsewhere (how?)


Perl search engines, indexers, crawlers and crawler platforms


Labrador
http://ir.dcs.gla.ac.uk/wiki/Labrador


Namazu
http://www.namazu.org/index.html.en
http://en.wikipedia.org/wiki/Namazu_%28search_engine%29


FDSE Perl Spider
http://www.perlspider.com/

Java search engines, indexers, crawlers and crawler platforms


WebSPHINX
http://www.cs.cmu.edu/~rcm/websphinx/


Lucene
http://en.wikipedia.org/wiki/Lucene
http://lucene.apache.org/
To do: look up software that uses Lucene (for example, MediaWiki can use Lucene for full-text search).


Nutch
http://en.wikipedia.org/wiki/Nutch
http://lucene.apache.org/nutch/


WebRACE
http://grid.ucy.ac.cy/WebRace/


YaCy
http://en.wikipedia.org/wiki/YaCy
http://yacy.net

Search engines, indexers, crawlers and crawler platforms written in C


DataparkSearch
http://www.dataparksearch.org/
http://en.wikipedia.org/wiki/DataparkSearch


mnoGoSearch
http://www.mnogosearch.org/


Surfraw
http://alioth.debian.org/projects/surfraw
http://en.wikipedia.org/wiki/Surfraw
Surfraw (Shell Users' Revolutionary Front Against the WWW) is a free, POSIX-compliant command-line shell program (i.e. meant for Linux, FreeBSD, etc.) for interfacing with a number of web-based search engines. It is licensed under the GNU General Public License and written in the C programming language.
It uses what it calls "elvi" (a tribute to Elvis): interface widgets for specific engines and databases such as Google, AltaVista, Wikipedia, Dejanews, Freshmeat, ResearchIndex, Slashdot, arXiv and a number of others. It can be configured to use a text-mode browser (Lynx, w3m) or a graphical browser (Firefox, Mozilla, Konqueror) "for meatminds".
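To make the elvi idea concrete, here is a minimal Python sketch of what a single elvis boils down to: turning command-line search terms into a search engine's results URL and handing it to a browser. It is not Surfraw's code, and Surfraw's own elvi have their own options and conventions; the `ELVI` table, its URL templates and the function names are hypothetical illustrations.

```python
import webbrowser
from urllib.parse import quote_plus

# Hypothetical table of "elvi": each maps an engine name to a URL template.
# The templates are illustrative assumptions, not taken from Surfraw.
ELVI = {
    "google": "https://www.google.com/search?q={query}",
    "wikipedia": "https://en.wikipedia.org/w/index.php?search={query}",
}


def elvis_url(engine, terms):
    """Build the results URL for one engine from command-line style terms."""
    try:
        template = ELVI[engine]
    except KeyError:
        raise ValueError(f"unknown elvis: {engine}") from None
    # Join the terms, then percent-encode them (spaces become '+').
    return template.format(query=quote_plus(" ".join(terms)))


def search(engine, terms, new_window=False):
    """Open the results page in the user's default browser and return its URL."""
    url = elvis_url(engine, terms)
    webbrowser.open(url, new=1 if new_window else 0)
    return url
```

For example, `search("wikipedia", ["Namazu"])` opens the Wikipedia search page for "Namazu". Keeping URL building separate from opening the browser makes the text-mode vs. graphical browser choice a configuration detail rather than part of each elvis.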

Search engines, indexers, crawlers and crawler platforms written in C++


Xapian
http://xapian.org/
http://en.wikipedia.org/wiki/Xapian
Xapian is written in C++, with bindings that allow it to be used from Perl, Python, PHP, Java, Tcl, C# and Ruby. It is highly portable and runs on Linux, Mac OS X, FreeBSD, NetBSD, OpenBSD, Solaris, HP-UX, Tru64, IRIX and Microsoft Windows.
A growing number of organisations and projects are known to be using Xapian including Orange, Gmane, Die Zeit and the Newspaper Licensing Agency.

MS-Windows-based search engines


PubChemSR
http://cheminfo.informatics.indiana.edu/PubChemSR/
http://en.wikipedia.org/wiki/PubChemSR
Platform: .NET
Written in Microsoft Visual Basic .NET



other/misc:

Gonzui
http://gonzui.sourceforge.net/
http://en.wikipedia.org/wiki/Gonzui
Gonzui is source code search engine software.
It was originally developed by Satoru Takabayashi, who also developed the Namazu full-text search engine.
Gonzui is designed specifically for searching source code in various programming languages. As of 2005, it supports C, C++, Java, JavaScript, Ruby, Python, PHP, Perl, Objective Caml, Brainfuck, CSS, Shell and plain text.
It can also use licenses such as the GPL, LGPL, Perl's license and Ruby's license as search keys.


Grub (search engine/crawler platform)
http://en.wikipedia.org/wiki/Grub_%28search_engine%29
http://grub.org/
http://search.wikia.com/
Wikia Search???


OpenFTS
http://www.rhodesmill.org/brandon/projects/pyfts.html
http://openfts.sourceforge.net/
http://en.wikipedia.org/wiki/OpenFTS
OpenFTS is a free software database search engine based on PostgreSQL. It is distributed under the GNU General Public License.
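Full-text database search engines such as OpenFTS rest on an inverted index: a map from each word to the records that contain it. The minimal Python sketch below shows that idea over database rows, using the standard library's in-memory SQLite in place of PostgreSQL. The `docs` table, its columns and the sample rows are hypothetical, and this is not OpenFTS's API; a real engine typically adds stemming, ranking and persistent index storage, all omitted here.

```python
import re
import sqlite3
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


def build_index(conn):
    """Map each term to the set of document ids whose body contains it."""
    index = defaultdict(set)
    for doc_id, body in conn.execute("SELECT id, body FROM docs"):
        for term in tokenize(body):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict.
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


# Hypothetical sample table standing in for real database content.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [
        (1, "OpenFTS is a full-text search engine based on PostgreSQL"),
        (2, "Sphinx indexes MySQL and PostgreSQL content"),
        (3, "Lucene is a Java search library"),
    ],
)
index = build_index(conn)
print(search(index, "PostgreSQL search"))  # [1]
```

Only document 1 contains both "postgresql" and "search", so the AND query returns `[1]`; querying "postgresql" alone would return `[1, 2]`.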


Redland RDF Application Framework
Redland is a set of free software libraries written in C that provide support for the Resource Description Framework (RDF), created by Dave Beckett (a former resident of Redland, Bristol).
Redland Language Bindings: APIs to Redland in C#, Java, Objective-C, Perl, PHP, Python, Ruby and Tcl.
http://en.wikipedia.org/wiki/Redland_RDF_Application_Framework
http://librdf.org/
http://md.devc.at/users/rho/internet/semantic-web/rdf/redland-rdf (TM-hub)
http://kill.devc.at/internet/semantic-web/rdf/redland/tutorial


StrangeSearch
www.strangesearch.net
http://en.wikipedia.org/wiki/StrangeSearch
StrangeSearch is written in C#, C++ and Perl.


Sphinx
http://en.wikipedia.org/wiki/Sphinx_%28search_engine%29
http://sphinxsearch.com/
Sphinx is a free software search engine designed with indexing database content in mind. It currently supports MySQL and PostgreSQL.


Solr
http://en.wikipedia.org/wiki/Solr
http://lucene.apache.org/solr/
Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Apache Tomcat.
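Because Solr exposes its XML/HTTP and JSON APIs over plain HTTP, a query is just a GET request with parameters. The Python sketch below builds such a request, assuming a Solr instance at `http://localhost:8983/solr` whose `/select` handler is reachable (the exact path depends on the Solr version and core setup) and hypothetical fields named `title` and `language`. The parameters `q`, `wt`, `rows`, `hl`, `facet` and `facet.field` are standard Solr request parameters; the rest is illustrative.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen

# Assumption: a local Solr instance; 8983 is the port of Solr's bundled
# example server, and the /select path varies with version and core setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def solr_query_url(q, rows=10, highlight=False, facet_fields=()):
    """Build a Solr select URL that asks for a JSON ("wt=json") response."""
    params = [("q", q), ("wt", "json"), ("rows", str(rows))]
    if highlight:
        params.append(("hl", "true"))  # request hit highlighting
    if facet_fields:
        params.append(("facet", "true"))  # turn on faceted search
        params.extend(("facet.field", field) for field in facet_fields)
    return SOLR_SELECT + "?" + urlencode(params)


def fetch(url, timeout=10):
    """GET the URL and decode the JSON body (needs a running Solr server)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # "title" and "language" are hypothetical field names.
    url = solr_query_url("title:crawler", highlight=True, facet_fields=["language"])
    print(url)
    # Decode the query string again to show the parameters Solr will receive.
    for key, values in parse_qs(urlsplit(url).query).items():
        print(key, values)
```

Against a live server, `fetch(url)` returns a dict; in Solr's standard JSON response the matching documents sit under `response` → `docs`.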



Free search engine software (to do)


SWISH-E
http://en.wikipedia.org/wiki/SWISH-E


Terrier Search Engine
http://en.wikipedia.org/wiki/Terrier_Search_Engine


Zettair
http://en.wikipedia.org/wiki/Zettair



To do (category unknown)


Agent Kernel
http://www.agentkernel.com/content.agent?page_name=Home



Uicrawler Universal information crawler
http://uicrawler.sourceforge.net/



Sherlock Holmes
http://www.ucw.cz/holmes/



ParallelUserAgent-2.57
http://search.cpan.org/~marclang/ParallelUserAgent-2.57/lib/LWP/Parallel/RobotUA.pm


JSpider
http://www.iterating.com/products/JSpider



Spinn3r
http://spinn3r.com/


The Laboratory for Web Algorithmics
http://law.dsi.unimi.it/index.php


--------------------------------------------------------------------------------------------------------------------------

YaCy-based:

Sciencenet
Written in Java
http://en.wikipedia.org/wiki/Sciencenet
http://sciencenet.fzk.de
Sciencenet is an experimental search engine for scientific knowledge, run by the Liebel-Lab at KIT (Karlsruhe Institute of Technology). Its software, YaCy, is based on peer-to-peer (P2P) technology developed by Michael Christen in collaboration with the Liebel-Lab.


Text indexing and retrieval software


GLIMPSE
http://en.wikipedia.org/wiki/GLIMPSE
http://webglimpse.net/

--------------------------------------------------------------------------------------------------------------------------

Free search engine software



DataparkSearch
Gonzui
Grub (search engine)
ht://Dig
Lucene
MnoGoSearch
Namazu (search engine)
Nutch
OpenFTS
PubChemSR
Redland RDF Application Framework
SWISH-E
Sciencenet
Solr
Sphinx (search engine)
StrangeSearch
Surfraw
Terrier Search Engine
Xapian
YaCy
Zettair

Free web crawlers


Axel (computer software)
cURL
DataparkSearch
Grub (search engine)
HTTrack
Heritrix
MnoGoSearch
Nutch
Wget

Desktop search engines


Advanced Searchbar
Ask.com
Beagle (software)
Copernic Desktop Search
Windows Indexing Service
Docco
DtSearch Corp.
Exalead
Filehawk
GNOME Storage
Gaviri PocketSearch
Google Desktop
Likasoft Archivarius 3000
Spotlight (software)
Strigi
Tracker (desktop search software)
Tropes Zoom
Windows Live Toolbar
Windows Search
X1 Professional Client


Further topics, goals and notes:

-----------------------------------------------------------------------------------------------
search engines
directories
warnings concerning link farms
history of search engines, including those in each country (... wp.pl / Onet ...)
-----------------------------------------------------------------------------------------------


National Digital Information Infrastructure and Preservation Program

Library of Congress Digital Library project


Academic databases and search engines



Apache Hadoop
Metasearch engine
Web archiving
Search engine indexing
Spidering Hacks
PageRank
link analysis algorithms
spider trap (or crawler trap)
Spambot
focused crawler or topical crawler
Distributed web crawling
Internet Archive
ht://Dig
Lucene
Xapian
search engines, indexing, crawlers and crawler platforms
search engines and crawler platforms - is there a difference?
parsers
Desktop search
web crawler
Search appliance
NOD32 Signature Search Engine
Search aggregator

List of search engines
List of web directories



We are fully aware that this website doesn't look like much at the moment. We offer numerous SEO, hosting and web development services. If you happen to find this site online and are interested in collaborating with us, or if you need assistance building your own website free of charge, please feel free to contact us.