KrawlSite

Browser

Description:
KrawlSite is a web crawler / spider / offline browser / download manager application. It is a KPart component with its own shell, so it can run independently in that shell or be embedded into KPart-aware applications like Konqueror.
To integrate it with Konqueror, open the file associations page in the configuration dialog, select the text/html MIME type, and in the embedded viewers list choose KrawlSite_Part. Now when you right-click on a web page in Konqueror, you'll see KrawlSite in the "Preview in" menu. Selecting it embeds the component into Konqueror, as in the second screenshot. The first screenshot shows the shell in which the component runs; the third shows the configuration dialog.

If you like it, please rate it as good.

Feel free to send in your bug reports and comments. I'll look into them when I have some spare time.

Also, I am lousy at creating icons, so if someone out there likes this application (a lot), please make an icon for this app. I'll include your name in the credits.

TIP
To use this app to download tutorials, turn offline mode on and start crawling from the start of the tutorial. If the start page of the tutorial is the TOC, set the crawl depth to 1; if the start page has the TOC along with the first chapter, set the crawl depth to 0. If each chapter page only has next & previous links, set the crawl depth to the number of chapters.
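In case the depth numbers are unclear, here is a minimal Python sketch of depth-limited crawling (an illustration only, not KrawlSite's actual code, which is C++; save_page is a hypothetical helper):

    import re
    import urllib.request
    from urllib.parse import urljoin

    def save_page(url, html):
        # hypothetical helper: write the fetched page somewhere on disk
        pass

    def crawl(url, depth, visited=None):
        # Depth 0 fetches only the start page; depth 1 also fetches the
        # pages it links to, and so on, matching the tip above.
        if visited is None:
            visited = set()
        if url in visited:
            return
        visited.add(url)
        html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
        save_page(url, html)
        if depth == 0:
            return
        for link in re.findall(r'href="([^"]+)"', html):
            crawl(urljoin(url, link), depth - 1, visited)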

I'd like to put all this information in the handbook, but due to lack of time I haven't been able to do so. If someone understands the functionality and is willing to write the handbook, please contact me.

If someone builds an RPM for this, please contact me so that I can link to your RPM from this page. Many thanks!
Changelog:

ver 0.7
Finally!
* crash free (AFAIK!), especially since KDE 3.4 came around
* support for HTML frames
* better UI

patch to v 0.6
* fixes a bug that crashed the app
* fixes a bug in multiple job mode

ver 0.6
This one took a long time to come out, but it removes almost all of the bugs that caused the app to crash intermittently, apparently without any reason! There's one KNOWN BUG:
* If icon thumbnail previews are generated in real time as files are created/deleted, the app crashes. This has something to do with the internal implementation of the file browser (a KDE component), so to remove this bug I'll have to write my own component (a lot of work), or I am doing something wrong with it (I will look into it). Thumbnail previews are disabled by default (but can be enabled from the context menu).
changes:
*) almost crash proof (see above)
*) new file browser, much cleaner to use.
*) more work on the leech mode, so it's easier to use as a download manager.
If you use this app with some regularity, I strongly suggest upgrading from 0.5.1, not for any major new features but for a much easier and crash-free experience.
Last of all, thanks for bearing with the crashes. I know it must have been exasperating.

ver 0.5.1
* corrected a bug in leech mode

ver 0.5
Some more features:
* leech mode finally functional. In leech mode, the app simply parses through the HTML file and presents the links and images as checkable items. Select the files to download and save them to disk. Handy when you need to download 20-30 links (files) from a list of 50-100 rather than right-clicking and saving a link 30 times. (A sketch of the idea follows this list.)
* multiple job support with a drop-target window. Click on the drop-target window and drop URLs on it; then you can configure each URL to have different crawl settings, that is, you can crawl the first URL to depth 1 in offline mode, the second URL to depth 2 in simple mode, and so on. By default each URL takes the current main settings.
* notification window. Notifies when all jobs have completed.
* the user can jump to the next link (in case the current link is unresponsive) or to the next dropped URL, and pause and restart crawling.
* UI improvements (hopefully!) :-)
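Roughly, leech mode's parsing step amounts to this Python sketch (the concept only; the app itself is C++/KDE and its parser differs):

    from html.parser import HTMLParser

    class LeechParser(HTMLParser):
        # Collect every link and image on one page so they can be shown
        # as checkable items for selective download.
        def __init__(self):
            super().__init__()
            self.items = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "a" and "href" in attrs:
                self.items.append(attrs["href"])
            elif tag == "img" and "src" in attrs:
                self.items.append(attrs["src"])

    parser = LeechParser()
    parser.feed(open("page.html").read())  # assumes a local copy of the page
    print(parser.items)                    # present these as checkable entries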

ver 0.4.1
* corrected a bug in downloading external links.

ver 0.4
0.4 is a huge jump from 0.3. Almost everything has been spruced up and some new features added, though leech mode is still unimplemented.
changes:
* total rework of offline mode browsing. Links are now correctly cross-linked.
* handles dynamic content correctly.
* tar file support fully functional. It turned out tougher to implement than I thought initially, thanks to the tar protocol. The archive tool in Konqueror is really simplistic and doesn't do the job right. My version does. :-)
* regular expression parsing to correctly parse HTML pages. It can parse through almost 12000 links (in one page) in no time (see the sketch after this list). :-)
* a proper file manager with drag support.
* spruced-up URL list view.
* quick-set options available on the page.
* UI improvements.
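The regular-expression approach looks something like this Python sketch (illustrative only; the real parser is C++ and its pattern may differ):

    import re

    # One compiled pattern applied in a single pass over the page; even a
    # page with ~12000 links is processed almost instantly this way.
    HREF_RE = re.compile(r'<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']',
                         re.IGNORECASE)

    def extract_links(html):
        return HREF_RE.findall(html)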

ver 0.3
* offline browser mode added. Crawl through a site with this setting on, and the app modifies the links in the parsed files to point to local files if they exist on the local disk (see the sketch after this list).
* improved error reporting. Errors encountered are reported in a separate window in real time.
* file types can be excluded (don't download these file types) or exclusive (only download these file types, besides text/html).
* UI improvements in main window & config dialog.
* web archive support - not working completely. More complicated than I thought initially; right now it only creates a compressed tarball.
* leech mode - not implemented as yet.
* more code cleanup.
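Conceptually, the offline-mode link rewriting works like this Python sketch (an illustration; url_to_local is an assumed mapping from crawled URLs to downloaded file paths):

    import os
    import re

    def rewrite_links(html, url_to_local):
        # Point each link at the downloaded copy, but only when that copy
        # actually exists on disk; all other links are left untouched.
        def repl(match):
            local = url_to_local.get(match.group(1))
            if local and os.path.exists(local):
                return 'href="%s"' % local
            return match.group(0)
        return re.sub(r'href="([^"]+)"', repl, html)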

ver 0.2
* major code cleanup.
* ugly Qt event loop hack replaced with an elegant threaded model
* ugly crashes due to the ugly Qt event loop hack removed.
* minor UI improvements


Ratings & Comments

41 Comments

polrus

it's a pity this project is somehow forgotten :/

pupil

v0.7 RPM for SLED 10: http://donnie.110mb.com/downloads.php?cat_id=2 The GPG key is on the front page of my website.

pupil

My webhost domain is temporarily inaccessible because some idiots used it for phishing activity. They provided me with a temporary domain at http://donnie.911mb.com. If you have trouble downloading the RPM, just replace the 110mb.com with 911mb.com.

overkill

Hi all, how can I download e.g. only JPEG images between 100 KB and 500 KB in size? Thanks.

frantek

Hi, at first sight KrawlSite was what I was looking for... but... when I try to copy a site with e.g. picture index pages, where each page contains a link to every other page, KrawlSite does not recognize that this results in a nearly endless loop... Cheers, frantek

wireframe01

You could try leech mode. That would show the links on the page. Then select the picture links and select save. Hope this works for you. I should add a "visited" URL list though. That should speed things up.
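For what it's worth, such a "visited" list amounts to something like this Python sketch (an assumption about how it could work, not the app's code):

    visited = set()

    def should_fetch(url):
        # Light normalization so the same page isn't queued twice, which
        # is what breaks the picture-index loop described above.
        url = url.split("#")[0].rstrip("/")
        if url in visited:
            return False
        visited.add(url)
        return True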

blurymind

made a package for VECTOR LINUX SoHo 5.01

gohanz

A Slackware 10.2 package with a SlackBuild script is ready to download!! http://www.slacky.it/index.php?option=com_remository&Itemid=1&func=fileinfo&filecatid=382&parent=category

linux3114a

krawlsite-0.7-S10K35.i586.rpm at http://home.tiscali.be/raoul.linux/downloadSuSE10.0.htm ENJOY !!!

tux86

...really great!!! Well done.

Kribby

I have always been looking for a website checker for KDE... something that can check copies of a webpage for updates. The only thing I could find was KWebWatch, but that hasn't been updated in ages. Is there a possibility of such a feature being included in KrawlSite... or better yet, a completely new independent program? :)

wireframe01

A content checker could be included. Nice idea.

cado

KrawlSite available in Debian Sid at http://pacotesdeb.codigolivre.org.br Requirement: KDE 3.4

linux3114a

Due to a very unstable version, I removed the SUSE KrawlSite from my server... Sorry, waiting for a new one. http://home.tiscali.be/raoul.linux/download.htm

Flextron

This version crashes every time I download anything. Older versions worked fine, although they crashed from time to time. The output (it crashes at the first Xlib error):

    flex@gardenia:~> krawlsite
    krawlsite: splitter width: 80
    krawlsite: KrawlSitePart...checking to see if there's an active thread
    krawlsite: KrawlSitePart...mode from part:1
    krawlsite: Krawler... m_url:http://www.gulic.org/static/diveintopython-5.4-es/toc/index.html
    krawlsite: Krawler... start krawlinghttp://www.gulic.org/static/diveintopython-5.4-es/toc/index.html
    krawlsite: Krawler... mode: 1
    krawlsite: ERROR: : couldn't create slave : Unable to create io-slave: klauncher said: Error loading 'kmailservice %u'.
    krawlsite:
    krawlsite: ERROR: : couldn't create slave : Unable to create io-slave: klauncher said: Error loading 'kmailservice %u'.
    krawlsite:
    krawlsite: ERROR: : couldn't create slave : Unable to create io-slave: klauncher said: Error loading 'kmailservice %u'.
    krawlsite:
    krawlsite: ERROR: : couldn't create slave : Unable to create io-slave: klauncher said: Error loading 'kmailservice %u'.
    krawlsite:
    Xlib: unexpected async reply (sequence 0x795b9)!
    Xlib: sequence lost (0xc547a > 0xbe03e) in reply type 0xdf!
    Xlib: sequence lost (0xc967f > 0xbe03e) in reply type 0x9f!
    Xlib: sequence lost (0xc547a > 0xbe03e) in reply type 0xdf!
    Xlib: sequence lost (0xc967f > 0xbe03e) in reply type 0x9f!
    KCrash: Application 'krawlsite' crashing...

(Is this useful, or do you need the full debug output like Amarok needs?)

wireframe01

did you apply the patch?

rcappuccio

Hi, whatever URL I put into the URL bar, it starts crawling and immediately stops, saying "Malformed URL". How am I supposed to enter URLs? I tried http://www.unibz.it, www.unibz.it... but nothing. As soon as I get your app working I will use it every day, so thanks in advance! Bye

wireframe01

Hi, http://www.unibz.it/ worked for me; no malformed URL error. Did you try opening it up in a browser or pinging the site? Perhaps you haven't applied the patch? Hope this helps, happy crawling :-)

ronacc

I downloaded the 0.6 RPM; it gives me an undefined link error on startup. I tried compiling from source, and make errors out when it gets to libart_lgpl.la, even when configured with extra libs and dirs, because it looks for it in the wrong lib dir (/usr/lib as opposed to /usr/lib64). My system is SUSE 9.2 x86_64. What do I need to do to get it to compile correctly?

wireframe01

I am not maintaining the RPM. As far as the source is concerned, I have not tried it on an x86_64 arch. Maybe you can Google to see if some other application has had the same error, and the resolution for it..?

ronacc

Thanks, I think it's the x86_64 installation that's the problem. I've run into this on a couple of other things and can usually get around it by compiling with --with-extra-libs=xxx, but only if I need just one extra lib; I can't find how to specify multiple extra libs as are needed here. I just mentioned the RPM to note that I tried that route also.

linux3114a

krawlsite-0.6-s92k33rc.i586.rpm at http://home.tiscali.be/raoul.linux/download.htm

ladora

Why is the link broken on the next site? I can't download.... :-(

linux3114a

Sorry, that's because my private server is only on from 10:00 to 22:00, Belgian hours.

linux3114a

krawlsite-0.5.1-suse92.i586.rpm at http://home.tiscali.be/raoul.linux/download.htm my participation

