WebSeer is a tool that performs detailed feature extraction on web pages to assist in classification and survey tasks.

What it is Not:
It is not a search engine, though it was designed to plug into one. It does not implement classification algorithms itself, but it provides convenient wrappers around the Weka classification API.

What it Is:
It is a collection of analysis packages that give very detailed statistics on many aspects of a web page, including HTML structure, style and script usage, text, and positional and segmentation information.

Library History:
It is primarily being used to conduct research for the PhD of Ryan Levering, which includes web surveys and document classification. However, it has also been used to assist Information Retrieval classes, where it removes much of the overhead of working with web pages.
News:
January 25 2008 - First Release Approaching: The first official release of WebSeer will happen very shortly, in conjunction with the wrapping up of my PhD research. I'm currently finalizing the tools that will make it much easier to work with the libraries.
Publications:
Thanks To:
YourKit is kindly supporting open source projects with its full-featured Java Profiler.
YourKit, LLC is creator of innovative and intelligent tools for profiling Java and .NET applications.
Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler.