Knowledge Combinotronics


By this I mean a stream of different data feeds relevant to the user, each one manipulated by an appropriate algorithm to increase relevance and then offered for selection by the user. That is, the user would select some combination of the presented data, and that selection would then be used in the next phase. There is no real magic going on; it is simply that user selections are amenable to an iterative process of selection and reselection in different or unfolding contexts.
There are many ways a context may be created, depending on the use scenario, but one would be where selections are fed back into the data set for the immediate guidance of other users.
This depends on high-throughput infrastructure, which has typically been expensive.
'Different or unfolding contexts' is the key here.
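A minimal sketch of that selection loop, in Python. Everything in it is hypothetical: items are (id, relevance) pairs already scored by some per-feed algorithm, `chooser` stands in for the user, and a shared count of earlier selections is the feedback that re-ranks items for the next user.

```python
from collections import Counter

def rank(items, selection_counts, boost=0.5):
    """Order (item_id, relevance) pairs by relevance plus a boost per earlier selection."""
    return sorted(items, key=lambda item: item[1] + boost * selection_counts[item[0]], reverse=True)

def select_round(feeds, selection_counts, chooser, top_n=3):
    """One phase: merge the feeds, rank, present the top items, record what was chosen."""
    merged = [item for feed in feeds for item in feed]
    presented = rank(merged, selection_counts)[:top_n]
    chosen = chooser(presented)  # the user's selection (any callable here)
    selection_counts.update(item_id for item_id, _ in chosen)  # fed back for other users
    return chosen

# Usage: two feeds, and a user who always picks the last item shown.
counts = Counter()
feeds = [[("a", 1.0), ("b", 0.2)], [("c", 0.9)]]
for _ in range(2):
    select_round(feeds, counts, lambda shown: shown[-1:])
print(rank([item for feed in feeds for item in feed], counts))
# -> [('b', 0.2), ('a', 1.0), ('c', 0.9)]
```

After two rounds the repeatedly chosen item has climbed to the top, which is the "unfolding context" in its simplest form; a real system would replace the counter with whatever shared store the infrastructure allows.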

posted by Kobrix Software at Kobrix Software, Official Blog - 3 weeks ago
NoSQL has picked up a lot of steam lately. HyperGraphDB being a NoSQL DB *par excellence*, we will be joining the upcoming conference organized by 10gen, the maker of MongoDB: "NoSQL Live from Boston...
These are the comments I have made and the reply from Boris.
I gather them here for any further comment I may want to make.
See after the last comment.


3 comments:

semanticC said...
As it's the 12th, I guess the conference has taken place; those are links I will be following up. I have a few questions.
1. Can you point me towards any comparison of NeoDB and HyperGraphDB? Do they cover the same ground? How do they differ?
2. The relationship between graph databases and:
2.1. OWL. How would OWL be consumed, or would it be?
2.2. More generally RDF, and then XML; after all, there are XML databases that parse in the XML. How do they compare? I'm sure I have missed something, but what?
3. One of the problems I have encountered is keeping various .properties files aligned. One approach is to use something like magic lenses, such as the Augeas implementation. But at the same time, I have wanted to rewrite these properties out of their Ant context into a Maven POM context. A job for HyperGraphDB? Ideas?
4. Moving on, I have noticed the fascinating post about using HyperGraphDB to create a neural net.
4.1. Would you agree that what is happening here is in line with Rickard Öberg's work? See http://www.qi4j.org/ for background and http://www.qi4j.org/qi4j/351.html, where he discusses the relationship between algorithms and OOP. BTW, he also arrives at the need for atoms and mentions the same focus, the business case, that you emphasise in your background paper, Rapid Software Evolution.
4.2. I notice that Neo4J has an example of a spreading activation algorithm (token passing), http://wiki.github.com/tinkerpop/gremlin/pagerank. I expect this means that either db could also be used to implement Random Indexing (sparse matrices) as developed by P. Kanerva and M. Sahlgren. Some of this may be touched on in the Disko project. Again, ideas?
Sorry for such a long comment, but I was not sure how, or whether, to email privately.
Kobrix Software said...
Hi semanticC,
A good place to discuss HyperGraphDB would be the discussion forum: http://groups.google.com/group/hypergraphdb?hl=en
This is a long list of topics raised indeed :) Let me try to cover them one by one, perhaps in separate responses:
1) Such a comparison should ideally be done independently, and I am not aware of any. For starters, HyperGraphDB has a much more general data model than Neo. In fact, the name is maybe a bit misleading from a functionality perspective, because now it's being labeled as "another graph database", which it is, but it is also an OO database, a relational database (albeit non-SQL), etc. In HyperGraphDB, edges point to an arbitrary number of things, including nodes and other edges. Neo is a classical graph of nodes and directed edges between any two nodes. In addition, HGDB has a type system while Neo doesn't. So HGDB has in effect a dynamic schema that you can introspect, reason about and change.
Besides the data models, the storage models are quite different: HyperGraphDB has a general two-layered architecture where a big part of the storage layout can be customized. Neo uses linked lists to store its graph and claims that this makes for faster traversals (probably true) and that this is all you need to do with a graph: you don't need indices, pattern mining etc. (here, I disagree). HGDB relies heavily on indexing for more complicated graph-related queries & algorithms.
In sum, HyperGraphDB has pretty much the most versatile data model I know of, and subsumes Neo and others easily. Whether that sort of generality comes at the expense of performance remains to be seen. As you've probably realized from the neural net post, HGDB gives you more representational choices, so performance has to be measured more globally, at an application level, through a design that makes intelligent use of what HGDB has to offer.
More on the others later... perhaps at the end I'll sum up my responses in a separate blog post.
semanticC said...
Hi Boris,
Thanks so much for your reply. It would be great if the other questions inspire a blog post. If anyone is interested, the NoSQL conference is previewed, and will be written up, here: http://radar.oreilly.com/2010/02/nosql-conference-coming-to-bos.html. It is a good discussion, and Boris contributes too!
There are still many things I cannot get my head around. I can see the 'representational choices': the ability to define functions working directly on the data using the HGDB API. I expect this is a good thing in the way that, for example, annotations are better than XML: everything is in the place where it will be used, which makes it easier to concentrate on the task. But other benefits? Here I cannot see any.
Moving on again, I am reminded of Henry Story's efforts to create a framework, inspired by Active Record, to import RDF. I am very unclear about all of this. Did I read somewhere that the syntax for import statements of RDF namespaces is being standardised? Anyway, the idea would be to make the referenced ontology available in code; presumably it would already be in Sesame as the graph db backend?
All of this seems relevant to HGDB. First, you have mentioned the type system, so how are the types to be modelled? I had thought that OWL was a good way of both modelling and sharing those models. But if so, what of the other aspect of HGDB, its ability to deal with semi-structured data? How do the two fit together? I am thinking of Collective Entity Resolution as perhaps one sort of solution, and of how they might interact simply in code as another area.
Moving up towards the goal of evolutionary software, I have long thought that it must be possible to describe software using OWL. I assumed that reasoning would take the place of a lot of code when there is a well-constructed model. Of course, that brings me back to what role reasoning plays in NoSQL. I know it is built into AllegroGraph.
As I say, many thoughts, but I don't really understand the ramifications of NoSQL at the moment. Perhaps I am missing the point altogether?
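Boris's point about the data model, that an edge may point at any number of things including other edges, can be illustrated with a toy in-memory structure. This is a minimal Python sketch for illustration only; it is not HyperGraphDB's Java API, and the names `Atom`, `HyperGraph`, `add` and `links_to` are hypothetical. The incidence map stands in, very loosely, for the kind of indexing he says HGDB relies on.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Atom:
    handle: int
    value: object
    targets: tuple = ()  # empty for a plain node; any number of handles for a link

@dataclass
class HyperGraph:
    atoms: dict = field(default_factory=dict)
    incidence: dict = field(default_factory=dict)  # handle -> handles of links pointing at it
    next_handle: count = field(default_factory=count)

    def add(self, value, *targets):
        """Store a node (no targets) or a link of any arity; targets may themselves be links."""
        missing = [t for t in targets if t not in self.atoms]
        if missing:
            raise KeyError(f"unknown targets: {missing}")
        handle = next(self.next_handle)
        self.atoms[handle] = Atom(handle, value, tuple(targets))
        self.incidence[handle] = set()
        for t in targets:
            self.incidence[t].add(handle)
        return handle

    def links_to(self, handle):
        """Values of all links that include `handle` among their targets."""
        return {self.atoms[h].value for h in self.incidence[handle]}

g = HyperGraph()
alice, bob, carol = g.add("Alice"), g.add("Bob"), g.add("Carol")
meeting = g.add("meeting", alice, bob, carol)  # one edge, three endpoints
note = g.add("minuted", meeting)               # an edge pointing at an edge
print(g.links_to(alice), g.links_to(meeting))  # {'meeting'} {'minuted'}
```

In a classical node-and-directed-edge graph, the three-way "meeting" would have to be modelled as an extra node with three binary edges, and "minuted" as yet another node; here both are first-class links.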

Unit Testing

29/10/09 17:52



"Article on out of container testing. http://blog.code-adept.com "

from recent.xml after an edit

A Wiki, a Note Pad, a Feed

10/11/09 10:44


I think this is an amazing tool, if only because it encourages writing.
I think I will settle on capitals for the beginning of sentences, and various other norms of writing. This is so easy; do I need more?

But I thought, not unreasonably, that I would like to mirror these posts (is it a blog?) into my Blogger blog, SemanticC, as I don't often post there and this machine is not always available.
I will get onto that, but first, some points:

a. Yes, it is true: this is not a blog, nor a very sophisticated wiki; there are no subscriptions (or not easily), and none of various other bits.

b. As you will see, who needs the clutter?

c. This has only a 1-3 MB memory footprint; compare Firefox at 340 MB, Artifactory at 259, Nepomuk at 200, Tomcat at 170, Konqueror at 25, other Nepomuk services at 14 each, and so on.

Lua is suitable for embedding in phones, as can be seen. I wonder how hard that is.

d. Markdown, the styling syntax, is dead simple and unobtrusive.

e. The whole interface is a joy of simplicity, once mysteries have been solved, such as the alias game, which is too much fun!

As a feed.

There is a feed to these pages and, I believe, the whole output is available too, as feedvalidator.org shows.

But there seems to be no way of echoing a followed blog from the blogs I am reading into the main blog.
I imagine this is because one shouldn't quote without reading first, but still I'm a bit surprised. In my case, anyway, I just want to syndicate my own material. Perhaps I have missed something?
What I haven't missed is that when I copy it into a post, something elegant becomes a mess, to the point of being impossible to read.
Keeping things simple, legible and attractive in Blogger is an effort.
What to do?

This Is How It Works

10/11/09 10:18


Nanoki keeps a reference to the address that it is contacted from, whether internal or external.
An edit from a different internal machine yields a different alias.
An edit from the same machine, whether the address used is conjoint.biz or 192 ..., yields the same alias.
Does Nanoki just resolve it to the same internal address, or is it relying on a cookie sent to the browser?
I'm not too bothered to find out at the moment.
Does my machine resolve all internal requests to the same address, or series of addresses, which Nanoki then receives?

What is important is that it is possible to check that my machine is available on certain ports. But Gibson's ShieldsUP! is not quite enough on its own, as I found I didn't understand what it meant when it reported a port as closed. Now that I do know, I guess it would be enough, but I also used http://www.websitepulse.com/help/tools.php?
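A local complement to those external tools is a check of whether anything is accepting TCP connections on a given port. This is a minimal Python sketch using only the standard library. Run from inside the LAN, it shows what the machine itself accepts, not what the router or firewall lets through from outside, which is why an external service is still needed. The host and port in the usage line are placeholders.

```python
import socket

def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

if __name__ == "__main__":
    print(port_open("localhost", 1080))  # placeholder: Nanoki's example port
```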

Who Am I

10/11/09 10:07



So now I come in from conjoint.biz; I assume I get a different alias.
But how about www.conjoint.biz and my internal address, 192... ?
Anyway ...

So, who am I now, coming in from a different internal address, no cookies?

Internal Alias

10/11/09 10:03



This is me on the internal net, but not localhost, which, by the way, is not available.
That is another story.
Linux:
1. many routing tables (255); cf. route and ip route
2. a host table
3. but why is it that Tomcat can start in two instances, both available as 192...:port, localhost:port and conjoint.biz:port, whereas this is not so of Nanoki?

No wonder I thought it was the firewall!
Not sure at the moment what I need to look for to sort this out. Presumably it is either in the host table (but I don't think any port number is given in that), or it is the way Nanoki binds to an address.

I think I'll move on for now.
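One hedged guess at the Tomcat/Nanoki difference is the bind address. A Tomcat connector with no `address` attribute listens on all of the machine's addresses, whereas Nanoki binds to the single network address given on its command line (the `address` argument in its synopsis). The Python sketch below, standard library only, shows the effect: a socket bound to 127.0.0.1 is reachable only through loopback, while one bound to 0.0.0.0 answers on every local IPv4 address. It illustrates socket binding in general, not Nanoki's code.

```python
import socket

def listen_on(address, port=0):
    """Open a listening TCP socket on one address; '0.0.0.0' means every local IPv4 address.
    port=0 lets the OS pick a free port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((address, port))
    s.listen()
    return s

def reachable(address, port):
    """True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=1.0):
            return True
    except OSError:
        return False

everywhere = listen_on("0.0.0.0")
loopback_only = listen_on("127.0.0.1")
print(everywhere.getsockname(), loopback_only.getsockname())
# The 0.0.0.0 socket accepts connections addressed to 127.0.0.1 as well as to the LAN address;
# the loopback-only socket would not be reachable via the machine's 192.168.x.x address.
print(reachable("127.0.0.1", everywhere.getsockname()[1]))
```

If Nanoki were started with the 192... address, localhost would be refused, which matches what I saw.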

Maven Dependency Resolution



05/11/09 17:42




What have I been doing recently?
Building a variety of projects.
These include some relatively old projects, such as all of the CoffeeShop sample code in JUnit Recipes: Practical Methods for Programmer Testing.
Here are the details:
By: J. B. Rainsberger
Publisher: Manning Publications
Pub. Date: July 15, 2004
Print ISBN-10: 1-932394-23-0
Print ISBN-13: 978-1-932394-23-8
Pages in Print Edition: 752


This is from around 2005.
I have also done the same with zoe, also from around 2005.
Interesting.
CoffeeShop, with multiple projects, needed a lot of work reconciling dependencies.
In come my friends, jarvana and so on.
zoe [1.,2.,3.,4.] is different and, in a way, more complex.
zoe is configured for maven1 with a project.xml file.
It is instructive making the conversion by hand. Certainly the mvn one:convert tool cannot cope.
Some of the guesses that need to be made must be regular mappings. So why not look at the XSD for each version, along with any notes, and build a map from them, say using generateDS?
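As a starting point for such a map, here is a minimal Python sketch that carries `<dependency>` entries from a Maven 1 project.xml into a Maven 2 `<dependencies>` block with the standard library's ElementTree. It assumes the Maven 1 elements groupId, artifactId, version and type, and it treats an older bare `<id>` element as standing for both groupId and artifactId; that reading, and everything it ignores (properties, jar overrides, repositories), should be checked against the Maven 1 schema rather than taken from here.

```python
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"
ET.register_namespace("", POM_NS)  # serialize Maven 2 elements without a prefix

def convert_dependencies(project_xml):
    """Map Maven 1 <dependency> elements onto a Maven 2 <dependencies> element (as a string)."""
    source = ET.fromstring(project_xml)
    target = ET.Element(f"{{{POM_NS}}}dependencies")
    for dep in source.iter("dependency"):
        legacy_id = dep.findtext("id")  # assumed shorthand for groupId == artifactId
        coords = {
            "groupId": dep.findtext("groupId") or legacy_id,
            "artifactId": dep.findtext("artifactId") or legacy_id,
            "version": dep.findtext("version"),
            "type": dep.findtext("type"),
        }
        out = ET.SubElement(target, f"{{{POM_NS}}}dependency")
        for tag, text in coords.items():
            if text and not (tag == "type" and text.strip() == "jar"):  # jar is the default type
                ET.SubElement(out, f"{{{POM_NS}}}{tag}").text = text.strip()
    return ET.tostring(target, encoding="unicode")

maven1 = """<project><dependencies>
  <dependency><groupId>jtidy</groupId><artifactId>jtidy</artifactId><version>4aug2000r7-dev</version></dependency>
  <dependency><id>httpunit</id><version>1.6.2</version></dependency>
</dependencies></project>"""
print(convert_dependencies(maven1))
```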

There are some other things.
The way I see it, we have:


  1. Ant builds: these entail finding canonical versions of referenced artifacts across a system, as far as is
    possible.

    1.1. When artifacts have been resolved, the .properties file needs to be referenced in the created POM.

    1.2. The build.xml file needs to be parsed; from what I can see, just for directory locations.

    1.3. The build file remains intact; other tasks, such as code generation, will be invoked via Ant.

    1.3.1. Some tasks are Ant-like and it is not reasonable to use Maven for them. However, some tasks imply something
    about structure that really needs to be resolved in the POM module dependency hierarchy.
    It is unclear how to make this distinction automatically, but a mechanism to create a module against
    which a specific complex Ant task is run would go some way.

  2. maven1 builds: this would entail a mapping between the two, as mentioned.

  3. Creating canonical versions and canonical version management.

    3.1. Creating canonical versions: there are different issues here. In the end it seems that there is no way of
    deducing an unknown version apart from doing a lookup.

    3.2. Lookups have their own difficulties and interest. Sometimes a project cannot be found:

    3.2.1. the jar is not available on public repositories, e.g. Sun's own jars;

    3.2.2. the particular version is not available.

    3.3. The above seems to be an issue that could be solved with some very sophisticated search; more later.

    3.4. Canonical version management: Augeas with the magic lens seems to be the way to control this, and seems
    preferable to an XML db. Only build.xml would be suited to an XML db anyway.

  4. Issues to explore include semantic annotation, which would seem appropriate to build.xml etc. This may be made
    automatic if it is possible to digest the nature of the Ant tasks.
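For item 1.1, one way to carry an Ant .properties file into the created POM is to copy its entries into the POM's `<properties>` section, so that dependencies can refer to them as `${name}`. A minimal Python sketch, assuming a simple subset of the .properties format: `key=value` or `key: value` lines and `#`/`!` comments only. Line continuations, escapes and whitespace-only separators are not handled, and keys must also be valid XML element names.

```python
import xml.etree.ElementTree as ET

def read_properties(text):
    """Parse a simple subset of Java .properties text into an ordered dict.
    Not handled: line continuations, escapes, whitespace-only separators."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        cuts = [i for i in (line.find("="), line.find(":")) if i != -1]
        if cuts:  # split at the first '=' or ':'
            key, value = line[:min(cuts)].strip(), line[min(cuts) + 1:].strip()
        else:
            key, value = line, ""
        props[key] = value
    return props

def to_pom_properties(props):
    """Render the entries as a Maven <properties> element (keys must be valid XML names)."""
    root = ET.Element("properties")
    for key, value in props.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

build_properties = """# versions used by build.xml
lib.dir=lib
junit.version : 3.8.1
"""
print(to_pom_properties(read_properties(build_properties)))
# A dependency in the POM could then say <version>${junit.version}</version>.
```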






  1. http://alt.textdrive.com/dynam/
  2. http://alt.textdrive.com/nanoki/
  3. http://it1.evectors.it/itSites/zoe/
  4. http://www.zoeprofessional.com/taketour.html

Main


05/11/09 12:02




What an effort to get this up - too many hours. But worth it. Now to enjoy!
Remember, this is a wiki, not a blog, so page postings, I guess. I wonder how they are formatted.
alt.dev looks good and is intuitive.
So this is the main page.

Added the file “2004-03-02-12-54-00.jpg”. "On the quay"

(This doesn't show for some reason, so it is omitted from this entry.)

Edge Cases



31/10/09 18:51

I can inspect a POM and see if each artifact - as they shall be known - corresponds to a rule.
Consider:-
    <dependency>
        <groupId>net.sourceforge.nekohtml</groupId>
        <artifactId>nekohtml</artifactId>
        <version>1.9.12</version>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-full</artifactId>
        <version>1.1-rc1</version>
    </dependency>
Issues:-


  1. Is it possible to deduce the information above from a jar file?


  2. Where it is possible to find the version and the artifactId, working leftwards from the .jar suffix at the right-hand end of the file name, is it possible to find the groupId?


  3. This is a sub-question of 1, and the answer to 1 is: not completely. The first jar will be nekohtml-1.9.12.jar and the second spring-full-1.1-rc1.jar, but in neither case is it possible to know the groupId.



Look at these examples:-

The simple case --
<dependency>
  <groupId>httpunit</groupId>
  <artifactId>httpunit</artifactId>
  <version>1.6.2</version>
</dependency>

and two complex cases -

<dependency>
    <groupId>mockobjects</groupId>
    <artifactId>mockobjects-jdk1.4-j2ee1.3</artifactId>
    <version>0.09</version>
</dependency>


<dependency>
  <groupId>jtidy</groupId>
  <artifactId>jtidy</artifactId>
  <version>4aug2000r7-dev</version>
</dependency>

The simple case is just httpunit-1.6.2.jar, and that is easy.

The first complex case is -
mockobjects-jdk1.4-j2ee1.3-0.09.jar

  --- Notice there are several hyphens in the artifactId; as the next case shows, it is
     not possible to know which one marks the boundary between artifactId and version. ---

The second complex case is -
jtidy-4aug2000r7-dev.jar

  --- Here it is possible to surmise from the meaning that 4aug2000r7-dev belongs to the version, but
     it would be difficult to make a reliable rule for this, as the previous case shows. ---
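A tempting heuristic is to start the version at the first hyphen that is followed by a digit. The minimal Python sketch below uses that rule (my assumption, not a Maven convention). It happens to split every example above correctly, but it still cannot recover the groupId, knows nothing of underscores, and fails for any artifactId containing a hyphen followed by a digit, such as the hypothetical foo-2d-1.0.jar.

```python
import re

# Assumed rule: artifactId is everything before the first "-<digit>", version is the rest.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d.*)\.jar$")

def split_jar_name(filename):
    """Return (artifactId, version), or None when the rule does not apply. No groupId is possible."""
    match = JAR_NAME.match(filename)
    return (match["artifact"], match["version"]) if match else None

for name in ["httpunit-1.6.2.jar", "mockobjects-jdk1.4-j2ee1.3-0.09.jar",
             "jtidy-4aug2000r7-dev.jar", "spring-full-1.1-rc1.jar", "foo-2d-1.0.jar"]:
    print(name, "->", split_jar_name(name))
```

The last line prints ('foo', '2d-1.0'), which is exactly the kind of silent mistake that makes a lookup service necessary.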

I have also seen, in private repositories, an underscore used in place of a hyphen, and artifacts anonymised when placed into the repository from their source (i.e. with this information stripped out of their name), only to be renamed differently where they are consumed.
 --- This may seem particularly strange, but when you consider that the artifact is playing a role in a
     different project with its own release versions and naming, it makes some sort of sense. ---

I don't think it is possible to order artifacts without consulting an online service such as jarvana.
This would have benefits in that other information can be gleaned at the same time, such as dependencies and, of course, the missing groupId.
However, a needed dependency may be incorrectly referred to. How does this happen?

This was needed.

<groupId>org.apache</groupId>
<artifactId>poi-scratchpad</artifactId>
<version>2.5.1-final-20040804</version>

This is what was found.

<groupId>poi</groupId>
<artifactId>poi-scratchpad-2.5.1-final</artifactId>
<version>20040804</version>

I coped with this through renaming. I assume the faulty reference is to be found in some earlier POM downloaded with another artifact? I should look into this, but either way something has to be re-edited.
Actually, I think I made the wrong choice: the thing to do would have been to look into the offending POM dependency and correct it, so that the correct artifact could be downloaded to the correct location.
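Correcting the offending POM, rather than renaming the artifact, could be scripted. A minimal Python sketch with ElementTree, assuming a Maven 2 POM in the standard namespace; the coordinates in the usage example are the two triples quoted above, taking the second (found) form as the correct one. Note that ElementTree drops XML comments when re-serializing by default, so a real edit might prefer a more careful tool.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://maven.apache.org/POM/4.0.0"}
ET.register_namespace("", NS["m"])  # keep the default namespace when writing back

def fix_dependency(pom_xml, wrong, right):
    """Rewrite dependencies whose (groupId, artifactId, version) equals `wrong` to `right`.
    Returns the new POM text and the number of dependencies changed."""
    root = ET.fromstring(pom_xml)
    changed = 0
    for dep in root.iterfind(".//m:dependency", NS):
        fields = [dep.find(f"m:{tag}", NS) for tag in ("groupId", "artifactId", "version")]
        if any(e is None for e in fields):
            continue  # e.g. version inherited from dependencyManagement
        if tuple((e.text or "").strip() for e in fields) == tuple(wrong):
            for element, value in zip(fields, right):
                element.text = value
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed

pom = """<project xmlns="http://maven.apache.org/POM/4.0.0"><dependencies><dependency>
<groupId>org.apache</groupId><artifactId>poi-scratchpad</artifactId><version>2.5.1-final-20040804</version>
</dependency></dependencies></project>"""
fixed, n = fix_dependency(pom,
                          ("org.apache", "poi-scratchpad", "2.5.1-final-20040804"),
                          ("poi", "poi-scratchpad-2.5.1-final", "20040804"))
print(n)
print(fixed)
```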

So, in the end, jarvana to the rescue and my ant/maven1 ---> maven2 tool has a bit of a way to go still.
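As an alternative to guessing from file names, a jar can be identified by its checksum. A minimal Python sketch: it computes a jar's SHA-1 and builds a query for Maven Central's search service. It assumes that service's `solrsearch/select` endpoint, its `1:"<sha1>"` checksum query and the `g`/`a`/`v` fields in `response.docs`; check these against the service's own documentation before relying on them. This is not jarvana's interface, and no checksum search can find jars that were never published.

```python
import hashlib
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://search.maven.org/solrsearch/select"  # assumed endpoint

def sha1_of(path, chunk_size=65536):
    """Hex SHA-1 of a file, read in chunks so large jars need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def checksum_query_url(sha1_hex, rows=20):
    """Search URL asking which artifacts have this checksum (assumed query syntax)."""
    return SEARCH_URL + "?" + urlencode({"q": f'1:"{sha1_hex}"', "rows": rows, "wt": "json"})

def coordinates(search_response):
    """(groupId, artifactId, version) triples from a decoded JSON response."""
    docs = search_response.get("response", {}).get("docs", [])
    return [(d["g"], d["a"], d["v"]) for d in docs]

def lookup(jar_path):
    """Network call: returns a possibly empty list of candidate coordinates."""
    with urlopen(checksum_query_url(sha1_of(jar_path)), timeout=10) as resp:
        return coordinates(json.load(resp))
```

Usage would be something like `lookup("lib/nekohtml.jar")`, with a hypothetical path; a hit gives back exactly the missing groupId.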


Rat Outzipair




29/10/09 18:22

This is the default high level user.
Edited by Lab Zipzipace.
One or other of these fabulous names must be lost if another one of the pair is to be created using some other name.
But how to add other users?
This is, perhaps, the most surprising, but also the most Lua-ish, aspect of Nanoki: users are created according to IP!

How's it done?

I wonder what the name generation algorithm is; maybe it is just a text file. I will check at some point.
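I have not checked how Nanoki actually does it, but the observed behaviour (same address, same alias; new address, new alias) only needs a deterministic function from address to name. A hypothetical Python sketch: hash the address and use the digest bytes to pick from word lists. The word lists are invented for illustration and are not Nanoki's.

```python
import hashlib

# Hypothetical word lists, invented for this sketch.
FIRST_NAMES = ["Lab", "Rat", "Fox", "Owl", "Elk"]
SYLLABLES = ["zip", "out", "ace", "air", "ant", "ory"]

def alias_for(address, syllables=3):
    """Deterministic alias: the same address always yields the same name."""
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    surname = "".join(SYLLABLES[b % len(SYLLABLES)] for b in digest[1:1 + syllables])
    return f"{first} {surname.capitalize()}"

print(alias_for("192.168.1.10"), alias_for("192.168.1.11"))
```

With a scheme like this, nothing needs to be stored and no cookie is involved, which would fit an alias changing when I come in via conjoint.biz instead of the internal address.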

Lab Zipzipace


29/10/09 11:58

Eurydice by Sue Hubbard

I am not afraid as I descend,
step by step, leaving behind
the salt winds blowing up the
corrugated river

The damp city streets
their sodium glare of
rush hour headlights
pitted with pearls of rain
for my eyes still reflect the half remembered moon

Already your face recedes beneath the station clock
a damp smudge among the shadows
mirrored in the train's wet glass.

Will you forget me?
Steel tracks lead you out past cranes and crematoria
boat yards and bike sheds

ruby shards of Roman glass
and wolfbone mummified in mud.
These rows of curtained windows like
eyelids heavy with sleep to the city's green edge.

Now I stop
my ears with wax
hold fast to the memory of the song you once whispered in my ear
its echoes tangle like briars in my thick hair

You turned to look
seconds fly past like birds
my hands grow cold
I am ice and cloud.

This path unravels deep in hidden rooms
filled with dust and sour night breath
the lost city is sleeping

Above the hurt sky is weeping,
soaked nightingales have ceased to sing.
Dusk has come early. I am drowning in blue.

I dream of a green garden
where the sun feathers my face
like your once eager kiss

soon, soon
I will climb from the blackened earth
into the diffident light.

Nanoki

29/10/09 01:19




Nanoki is a simple wiki engine implemented in Lua, allowing you to create, edit, and link web pages easily.




Run Nanoki


Start Nanoki from the command line:

cd Nanoki
lua Nanoki.lua . localhost 1080

The above command will start Nanoki on your local host at port 1080, using the local directory for storage.


Command synopsis:

Nanoki [location] [address] [port] [not|forwarded] [not|secure]

location tells Nanoki where to store its data.

address indicates which network address to bind the Nanoki server to.

port indicates what port number to use.

forwarded indicates whether the X-Forwarded-For header should be trusted.

secure indicates whether https should be used.
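The forwarded option matters for the IP-based aliases: behind a proxy, every request arrives from the proxy's address unless the X-Forwarded-For header is trusted. A hypothetical Python sketch of that choice (not Nanoki's Lua code). It takes the left-most header entry as the original client, which is only safe when a trusted proxy sets the header, since a client can send any value it likes.

```python
def client_address(peer_address, headers, trust_forwarded):
    """Address a server would attribute a request to.

    headers: dict with lower-case header names (an assumption of this sketch).
    If trust_forwarded is set and X-Forwarded-For is present, use its left-most
    entry (the original client as reported through the proxy chain); otherwise
    use the address of the socket peer.
    """
    if trust_forwarded:
        forwarded = headers.get("x-forwarded-for", "")
        first = forwarded.split(",")[0].strip()
        if first:
            return first
    return peer_address

proxied = {"x-forwarded-for": "203.0.113.7, 192.168.1.1"}
print(client_address("192.168.1.1", proxied, trust_forwarded=True))   # 203.0.113.7
print(client_address("192.168.1.1", proxied, trust_forwarded=False))  # 192.168.1.1
```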




Create a page


To create a new page, type its name in your browser address bar.


If the page doesn't exist yet, Nanoki will redirect you to the page editor.




Edit a page


To edit a page, click on its title. This will take you to the page editor.


The editor uses Markdown syntax to describe the page content.

To save your text, press Preview and then Save.



Upload a file


From the editor, you can upload files to Nanoki.


Each page can have its own files. You can refer to those files like so:

![Run][1]
[1]: nanoki/file/run.png

File link synopsis:

[page]/file/[name]

page is the name of the page under which the file is located.

name is the file name.
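Following that synopsis, a reference to a page's file can be generated. A trivial Python sketch; the function name is hypothetical, and it does no escaping, so page or file names containing spaces or brackets are passed through unchanged and may need encoding in a way I have not checked against Nanoki.

```python
def file_reference(label, page, name, ref=1):
    """Markdown image using a reference-style link to [page]/file/[name]."""
    return f"![{label}][{ref}]\n[{ref}]: {page}/file/{name}"

print(file_reference("Run", "nanoki", "run.png"))
```

This reproduces the two-line example given above.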



Control panel

From the editor, you can access the control panel to rename or delete a page.




Revision


From the editor, you can access a page's revision history by clicking on its title.


Clicking a revision number will display the page content as it was then.



Revision differences


From the revision page, you can access the revision differences by clicking on its title.




Related pages


Each page keeps track of which other pages link to it.




Breadcrumb navigation


Each page reflects its location using a breadcrumb trail.




Index navigation


Nanoki provides a table of contents, indexed by page title.




Date navigation


The date navigation indexes pages by their publication date.




Recent changes


The recent changes page lists what has changed in Nanoki recently.




Search


The search allows you to locate pages by their title.




XML feed


Aside from the editor, most pages provide an XML feed.




System page


The system page provides basic information about Nanoki itself.
