ejes consulting

Technical Consulting Design and Automation

Archive for the ‘Programming’ Category

pSearch Source!!

I had some trouble getting my source onto WordPress.  I can understand their point: they don’t want to host .zip, .tar or any other archive container formats.

In the interest of brevity, I decided to just use a free file host.  I chose MediaFire; it was the top result on Google when I checked.

http://www.mediafire.com/?u01bf1nwbemata9

That is where you can find the historical archives of my search development.  I did my best to ensure that it could be compiled on Windows (32-bit XP via MinGW) or Linux (Ubuntu 64-bit server), and sometimes OpenBSD. You’ll likely need SQLite (http://www.sqlite.org/) and libcurl (http://curl.haxx.se/), and you’ll probably need the PCRE libraries as well (http://www.pcre.org/).  If you try to compile something that looks like it should work but doesn’t, let me know and I’ll see if there are any libraries that I forgot to mention, or at least I can let you know if it SHOULD compile.

All the above source is released under the original BSD Licence.

Copyright (c) 2010-, Evan Stawnyczy (ejes consulting) ejes@torfree.net
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. All advertising materials mentioning features or use of this software
   must display the following acknowledgement:
   This product includes software developed by ejes consulting.
4. Neither the names ejes consulting, Evan Stawnyczy nor the
   names of its additional contributors may be used to endorse or promote
   products derived from this software without specific prior written
   permission.

THIS SOFTWARE IS PROVIDED BY EVAN STAWNYCZY AND EJES CONSULTING ''AS IS''
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

As of right now, I’m working on a complete rewrite.

I’m hoping this rewrite will serve as good working reference code.  It will be clear, easy to understand and most likely SLOW.  I am now using ODBC instead of limiting users to SQLite, which will hopefully also allow enterprises to adopt it without too much trouble.  I decided to scrap the web-server aspect; hopefully someone will want to bundle search and a web server for home use.  Especially with the wide adoption of IPv6, a single workstation could easily share information with others, all indexed through a private search network.  At the very least someone should write nginx (http://nginx.org/), lighttpd (http://www.lighttpd.net/) and Apache (http://www.apache.org/) modules that index static and cached content and publish the search results.

That leads me to trying to build this as an IP-agnostic application. I want it to run on both IPv4 and IPv6 networks. Of course this has its own challenges as well.  I’m trying to maintain ANSI compliance wherever I possibly can so that it is easily portable, mostly so that it can run on Windows or Unix without too much trouble.
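To give a rough idea of what "IP-agnostic" means in practice, here is a small sketch. It is written in Python purely for brevity (the rewrite itself is not Python), and the function names `family_for` and `candidate_endpoints` are made up for this illustration. The idea is to never hard-code an address family: either derive it from the address literal, or ask the resolver for every family it knows about.

```python
import ipaddress
import socket


def family_for(literal: str) -> int:
    """Return the socket address family for a numeric address literal.

    Hypothetical helper for illustration: raises ValueError if the string
    is not a valid IPv4 or IPv6 literal.
    """
    addr = ipaddress.ip_address(literal)
    return socket.AF_INET6 if addr.version == 6 else socket.AF_INET


def candidate_endpoints(host: str, port: int):
    """Yield (family, sockaddr) for every address the resolver returns.

    AF_UNSPEC asks for IPv4 and IPv6 results alike, so the caller can try
    each endpoint in turn without caring which protocol it is.
    """
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM
    )
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        yield family, sockaddr
```

The C equivalent would lean on `getaddrinfo()` with `AF_UNSPEC` in the same way, which is outside ANSI C proper but available on both Windows and Unix.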

Written by ejes

November 11, 2011 at 2:54 pm

pSearch – a peer to peer, distributed search engine

Foreword

So I haven’t been posting very much for the last while, and this is mainly because I’ve been very busy.

I always have several projects on the go, and I don’t have enough time to devote to all of these things at once, so usually the least interesting project gets placed on the back burner.

That is what happened to this blog.

Now I’ve spent a great deal of time on this project, and have produced some very good design documents as well as a bunch of source code.  So, without further ado:

This is my distributed, peer-to-peer search engine.

Attached to this post you’ll find a couple of architecture documents: a PDF with a visual diagram of how this engine is supposed to work, and another PDF with a long-winded, half-written description of why and how I expect this to run conceptually.

I’m not a writer; I’m mostly a technical person. However, I am actively updating and modifying this project, so expect updates as it goes.

The first document is the “pSearch – Document”.

In this document I attempt to explain the strategy and reasons for this project, and what I hope it will accomplish. This document is incomplete, but I encourage you to read it anyway.

The second document is the “pSearch – Drawing”.

In this document I have detailed the major aspects of the distributed search.  Hopefully it’s easy to follow; I don’t expect this diagram to change very much.

I also have a LOT of source code that I still have to organize. Much of it will be posted here, though some of it is too embarrassing to share.

Summary

So, without digging into my documentation in too much detail (I posted the documents above, feel free to read them), a simplified “how does this work” seems appropriate.

Each peer will accept connections from the internet.  Each search request is forwarded to other peers as defined in its database.

While this happens, the peer also uses a second task to search its own internal database.  On a private home machine the internal crawler has a small collection of sites and keywords based on several configurable data collection points (such as your browser cache, or installed programs), which would automatically include a lot of data that is specific to you. A public internet site would index its own pages (this isn’t mandatory, but preferred).

After that, it’s a simple case of matching the keyword and publishing the results to the connected client.
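The flow above can be sketched in a few lines. This is only an illustration in Python, not the actual pSearch code: peers are stood in for by plain callables, the local database is a dictionary, and the names `search_local` and `handle_query` are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def search_local(index, keyword):
    """Look the keyword up in this peer's own index (a dict of keyword -> URLs)."""
    return list(index.get(keyword.lower(), []))


def handle_query(keyword, local_index, peers):
    """Forward the query to known peers while searching the local index.

    `peers` maps a peer name to a callable taking the keyword and returning
    a list of results. Both are assumptions made for this sketch; a real
    peer would be a network connection.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(peers))) as pool:
        # Hand the request to every peer first...
        futures = {name: pool.submit(fn, keyword) for name, fn in peers.items()}
        # ...then search our own database while they are working.
        results = search_local(local_index, keyword)
        for future in futures.values():
            try:
                results.extend(future.result())
            except Exception:
                continue  # a peer that fails simply contributes nothing

    # Publish each result once, keeping the order in which it arrived.
    seen = set()
    merged = []
    for item in results:
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged
```

A real implementation would also need timeouts and some limit on how far a request is forwarded, neither of which is shown here.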

Peers who respond quickly and with a lot of results are flagged as “experts” for that set of terms.  When you search for a similar set of terms again, the “expert” peers will be consulted first.

This way, common search terms will be answered by peers who have a lot of information on those terms.  For example, a site that indexes movies (like IMDb) would respond with a lot of results for movie titles and information about films, but would probably have very little to offer when a query is specifically about cars.
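As a sketch of how the “expert” bookkeeping might look: the scoring formula below (results per second, keeping the best score seen per term) is purely my assumption for illustration and is not taken from the design documents. The class name `ExpertTable` and its methods are likewise hypothetical, and Python is used only to keep the example short.

```python
from collections import defaultdict


class ExpertTable:
    """Remember which peers answered well for which search terms."""

    def __init__(self):
        # term -> {peer name -> best score seen so far}
        self._scores = defaultdict(dict)

    def record(self, terms, peer, result_count, seconds):
        """Note one response. Assumed score: results per second."""
        if result_count <= 0:
            return
        score = result_count / max(seconds, 0.001)  # avoid dividing by zero
        for term in terms:
            table = self._scores[term]
            table[peer] = max(table.get(peer, 0.0), score)

    def ranked_peers(self, terms, all_peers):
        """Return all_peers with the best-scoring "experts" for these terms first."""
        def total(peer):
            return sum(self._scores.get(t, {}).get(peer, 0.0) for t in terms)

        # sorted() is stable, so peers with no history keep their given order.
        return sorted(all_peers, key=total, reverse=True)
```

With this in place, a movie site that answers “movie” queries quickly and with many results floats to the front for later movie searches, while staying in its default position for a query about cars.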

Expect more as I develop more.  I encourage anyone to read and comment about my designs.

Written by ejes

November 7, 2011 at 11:43 am

More Command Line Magic

Wow, I was looking around for a way to quickly convert my ls -al listing into an octal ‘0774’-style permission display. I found a really neat awk script that does just this here:
http://www.linuxforums.org/forum/newbie/21722-command-shows-me-permissions-file-octal.html#post371256

It’s this:
ls -l | awk '{k=0;for(i=0;i<=8;i++)k+=((substr($1,i+2,1)~/[rwx]/)*2^(8-i));if(k)printf("%0o ",k);print}'

To make it permanent, add this to your .profile:
alias l="ls -la --color | awk '{k=0;for(i=0;i<=8;i++)k+=((substr(\$1,i+2,1)~/[rwx]/)*2^(8-i));if(k)printf(\" %0o \",k);print}'"
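If the awk one-liner is hard to read, here is the same arithmetic spelled out as a small Python sketch. The function name `octal_from_listing` is made up for this example. Each of the nine permission characters after the file-type character is treated as one bit, set when the character is r, w or x, which is what the awk loop does with its powers of two.

```python
def octal_from_listing(perms: str) -> str:
    """Convert an ls-style permission string such as '-rwxrw-r--' to octal.

    Mirrors the awk script: any character other than r, w or x counts as an
    unset bit, so setuid/setgid/sticky markers (s, S, t, T) are not decoded.
    """
    bits = perms[1:10]  # skip the leading file-type character
    if len(bits) != 9:
        raise ValueError("expected a string like '-rwxr-xr-x'")
    value = 0
    for ch in bits:
        # Shift the running value left and add 1 if this permission is set.
        value = (value << 1) | (ch in "rwx")
    return format(value, "o")


print(octal_from_listing("-rwxrwxr--"))  # prints 774
```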

Written by ejes

February 25, 2011 at 11:11 pm

Posted in Hacking, Programming, Scripts
