Dr. Andrius Kulikauskas
"Pamąstymai apie matricų kombinatorikos taikymą automatų rūšiavimui ir NP ir P automatų klasių tyrimui"


I will share some thoughts on Chomsky's hierarchy of automata. I will propose applying it not only to the automata themselves, but also to their operations. I believe the essence of the hierarchy lies already at the level of the operations. I will present a general automaton construction, based on matrix combinatorics, which generates automata of every kind. To study it, we could make use of my doctoral dissertation, "Symmetric Functions of the Eigenvalues of a Matrix". I imagine we could investigate whether the NP and P classes of automata coincide. (This problem is considered one of the seven mathematics problems of our century, and a million-dollar prize has been set aside for whoever solves it.) So it would be interesting and useful to talk with specialists. And also about "Minčių sodas", which I founded.

I am attaching a letter in English with my thoughts about that general automaton.

It would be a pleasure to talk, or at least to get acquainted!

Andrius

Andrius Kulikauskas
Minčių sodas
http://www.ms.lt
ms@ms.lt
+370 5264 5950
Vilnius

---------------------------------------------------------------

Dr. Andrius Kulikauskas
"Application of matrix combinatorics for sorting of automatons and investigation of classes of NP and P automatons"


It's exciting when any discipline, including computer science, introduces concepts and observations that touch upon our intuitions about human life. In that sense, we are modeling our experience.

In developing computer science, it has been observed that there are, intuitively, several different ways to think about computation, and that they can be grouped into classes of different capability.

A simple computer is one where, given a state A, fixed instructions tell you which states B you can go to. For example, a state may be a valid position on a tic-tac-toe board. You can describe a "finite state automaton" that would tell you, starting from the empty board, how to make valid moves to valid boards, and ultimately reach all the final boards. You can also describe a finite state automaton that plays well for one side, or for the other, or in whatever particular way, so long as it is fixed. If there is at most one choice at each state, the automaton is called deterministic, and if more than one choice is allowed, it is called nondeterministic.
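To make the picture concrete, here is a minimal sketch in Python of a nondeterministic finite state automaton; the states, alphabet, and transition table are invented for illustration, not taken from tic-tac-toe.

```python
# A minimal sketch of a nondeterministic finite state automaton (NFA).
# The states, alphabet, and transitions below are invented for illustration.

transitions = {
    ("start", "a"): {"start", "seen_a"},   # two choices: nondeterminism
    ("seen_a", "b"): {"accept"},
    ("accept", "b"): {"accept"},
}
accepting = {"accept"}

def accepts(word):
    """Track every state the automaton could be in after reading the word."""
    current = {"start"}
    for symbol in word:
        next_states = set()
        for state in current:
            next_states |= transitions.get((state, symbol), set())
        current = next_states
    return bool(current & accepting)

print(accepts("ab"))    # True: start -> seen_a -> accept
print(accepts("abbb"))  # True: loop on 'b' in the accepting state
print(accepts("ba"))    # False: no move on 'b' from the start state
```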

It was noted that we can study the "languages" that consist of all possible paths through the machine; we can talk about the paths "accepted" by the machine. These are called regular languages. The set of all tic-tac-toe games is a regular language, and so, I think, is the set of all chess games. And even the set of all chess games where one or both players play without mistakes. Simply because there are only so many chess positions. You can return to the same chess position, but those returns are simply loops, cycles.

It turns out that we can imagine computers that are smarter than that! For example, consider the set of parentheses expressions, consisting of left and right parentheses that are balanced, such as: (()()) It can be shown that this is not a regular language! Basically, in order to "accept" such a string, you need memory: you need to remember to close each parenthesis that you open. And there is a different kind of computer that can do this; it is more powerful. One formulation of it is the "push-down automaton". I don't understand it too well, but you have a "stack" of memory, like a stack of cafeteria trays, that you build by laying trays down on top of each other, and then they are in "memory" so that you can go back to them as you take them off. So, for example, (()()) might be generated by saying that ( means put down a tray, and ) means pick it back up. So (()()) would be: put down, put down, pick up, put down, pick up, pick up. And in this spirit, each tray can be a different color and come with instructions that tell you what you are allowed to do next, for example, pick up or put down trays if they happen to be a certain color.

The languages are given by the sequences of trays you go through; they are called context-free languages, and Chomsky noted that there is something similar going on in human language, the way we build sentences and have to keep track of what we're up to syntactically. This is also important for building parsers, for parsing commands. But it's also important in everyday life: I can "make a commitment", which opens a left parenthesis that waits for a right parenthesis.
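As a small illustration of the tray picture (ordinary code, not a formal push-down automaton), here is a sketch of a stack-based check for balanced parentheses:

```python
# Sketch: recognizing balanced parentheses with a stack of "trays".
# '(' lays a tray down, ')' picks the most recent one back up.

def balanced(expression):
    stack = []
    for symbol in expression:
        if symbol == "(":
            stack.append("tray")           # lay a tray down
        elif symbol == ")":
            if not stack:                  # nothing to pick up: reject
                return False
            stack.pop()                    # pick the top tray back up
    return not stack                       # accept only if every tray was picked up

print(balanced("(()())"))   # True
print(balanced("(()"))      # False: one tray left on the stack
print(balanced(")("))       # False: tried to pick up before putting down
```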

You have to go down that stack in order. (Just a side note: Jesus of Nazareth had a cryptic saying, "The first will be last, and the last will be first", and that's pretty much what's going on here: the first tray put down will be the last tray taken off, and the last one put down will be the first one taken off, unless you put even more down.)

Well, there is a more powerful computer, a "linear bounded automaton", which we may say lets you move your position on that stack. We may think of this as a magic tray that you can move up and down the stack. What this does is allow you to reference how big your stack is. You can have that magic tray go into a "gliding" state where it effectively brings along another tray of a particular color, perhaps leaving it at the bottom. In this way, it can effectively "count" the number of trays in the stack, and effectively rule out certain numbers of trays by entering an "unhappy" state. This means that you can generate languages such as "a to the 2 to the i": {aa, aaaa, aaaaaaaa, aaaaaaaaaaaaaaaa, ...}, whose strings grow exponentially in length. A normal pushdown automaton couldn't do that, because it is subject to the "pumping lemma".

Note that all of these computers have only finitely many states and instructions, but they typically accept strings of arbitrary length. The pumping lemma observes that when a simple computer, like a finite state automaton or a pushdown automaton, accepts a sufficiently long string, then there must be some pattern in that string that can be repeated, like a do loop, so that all such repetitions are likewise accepted. But a linear bounded automaton can manage to break any linear pattern, any rhythm, by going into its own gliding modes, which let it do its own internal counting. The languages it accepts are called "context sensitive languages".
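Here is a small sketch of that language in ordinary code, just to pin down which strings are meant; it only checks membership and says nothing about how the automaton itself would be built.

```python
# Sketch: the language { a^(2^i) : i >= 1 } = { "aa", "aaaa", "aaaaaaaa", ... }.
# The point is only that the lengths 2, 4, 8, 16, ... grow exponentially,
# a pattern that the pumping lemma forbids for pushdown automata.

def in_language(word):
    if set(word) != {"a"}:                  # only the letter 'a' is allowed
        return False
    n = len(word)
    return n >= 2 and (n & (n - 1)) == 0    # length is a power of two

print([n for n in (2, 3, 4, 8, 16) if in_language("a" * n)])  # [2, 4, 8, 16]
print(in_language("aaa"))                                     # False
```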

Finally, the most powerful computer known, which has all manner of equivalent definitions, is called a "Turing machine". It can model absolutely every computation that we can describe. And there are many computations it models that we can't even describe for practical purposes. It can go on calculating, and we can never be sure whether it will actually halt, which is to say, we may never be sure whether it actually accepts a word or not. The heart of the power of the Turing machine is that it can destroy evidence. Imagine being allowed to delete part of the history of the trays that you've laid down. That pretty much makes it effectively impossible to keep track of all of the ways things may have been developing. It means that a very simple situation may come from a very complicated situation. Whereas with the linear bounded automaton you always know that things are growing in complexity, time flows in one direction, so to speak, and you know when you can rule things out, when they're surely not going to happen, especially for short strings. With Turing machines, it's possible that you may never know for sure.

Mathematicians approach this by talking about a hierarchy of classes of languages. But, as somebody modeling life, I take that sophistication to be a quality which is evident locally, not only globally. To the extent that this is helpful for modeling life, we should be able to say "here the machine is acting with less sophistication, and here with more sophistication", or "this is a less sophisticated instruction, and this is a more sophisticated one". Mathematicians don't do this; they see that as a value judgement. So that is a major goal of mine here. (For example, the "perfect chess player" may be coded as a finite state automaton, but a lot more intelligence may be revealed by a pushdown automaton, a linear bounded automaton, or a Turing machine. And I think that distinction in intelligence can be observed instruction by instruction.)

In order to do that, we need a shared framework for the different kinds of automata. Now, I haven't been exact or classical in defining the different kinds of machines here. But the upshot is that I don't have to be, because for each class there are many ways to define the associated machines, and we can choose the definitions that work best for us.

I look for the "most natural" framework. Metaphysically, in my experience, the most basic and general objects are matrices. A matrix A(ij) defines a relationship from each state i to each state j. This is metaphysically very basic, like our mind moving from one thought to another. We don't even need to define numbers to work with matrices, they can simply be understood to relate states.

Let us define the states with a matrix X(ij), where the states are given by the diagonal entries X(ii), and the off-diagonal entries (where i does not equal j) are zero. Then we can use the following operations to model the different kinds of instructions:

- Matrix multiplication X -> AX gives what a finite automaton does: generating walks along the edges of A.
- Two-sided multiplication X -> AXA gives what a pushdown automaton can do.
- The replacement XA -> AX turns out to leave the terms from A unaffected, and simply "shifts" the cursor recorded by X.
- And then, we can have an operation that lets us remove a cycle of terms from A.
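Here is a minimal numerical sketch of the first operation, with an invented 3-state adjacency matrix; the two-sided, shift, and cycle-removal operations are only hinted at in the comments, since they act on formal terms rather than plain numbers.

```python
# Sketch of the matrix picture: A(i,j) marks an allowed step from state i to state j,
# and X starts as the diagonal matrix of states.  Repeating X -> A @ X accumulates,
# in entry (i, j), the number of walks from j to i of the given length.
# The adjacency matrix below is invented purely for illustration.
import numpy as np

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])          # a small directed graph on 3 states

X = np.eye(3, dtype=int)           # the diagonal "cursor" matrix X(ii)

for step in range(1, 4):
    X = A @ X                      # the "finite automaton" operation X -> AX
    print(f"walks of length {step}:\n{X}")

# The two-sided operation X -> A @ X @ A would extend a walk at both ends at once,
# in the spirit of the pushdown picture; the shift XA -> AX and the cycle-removal
# operation are not shown, since they rearrange formal terms rather than numbers.
```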

My instinct is that these building blocks can be used to define equivalents for all of the machines that we have discussed above.

This cycle removal operation is very interesting, and I suspect it is related to an outstanding math problem, the P versus NP question, which asks whether there are problems that can be solved using a polynomial amount of resources (such as time) by a nondeterministic machine (many calculation branchings allowed), but not by any deterministic one (no branchings allowed in the calculation). Basically, allowing a calculation to branch is like guessing, and is in a sense a multiplication of the calculational power, a sort of exponentiation. But if that branching is only internal, and the external bounds still apply, it's not clear whether it is any real help or not.

A sample problem is to take a graph (perhaps coded by the edges of a matrix) and ask: does it have any cycle of edges that goes through all the nodes? (This is the Hamiltonian cycle problem.) If you allow for guessing, for branching, then you can design an efficient algorithm that quickly branches through all sets of edges, and in each case it does not take many resources to check whether it works. (Maybe n-squared?) Whereas if you don't allow for the branching, then nobody has found an efficient algorithm; it seems to be exponential.
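Here is a rough sketch of that contrast, with an invented 4-node graph: checking a guessed cycle is cheap, while searching without guessing runs through factorially many orderings.

```python
# Sketch of the point about guessing: verifying a proposed cycle through all
# nodes is cheap, while finding one without guessing seems to need brute force.
# The 4-node graph below is invented for illustration.
from itertools import permutations

A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]                       # adjacency matrix of a 4-cycle

def is_hamiltonian_cycle(order):
    """Cheap check (roughly linear in the number of nodes) of a guessed order."""
    n = len(order)
    return (len(set(order)) == n and
            all(A[order[i]][order[(i + 1) % n]] for i in range(n)))

# A nondeterministic machine could "guess" the order and only run the cheap check:
print(is_hamiltonian_cycle([0, 1, 2, 3]))       # True
print(is_hamiltonian_cycle([0, 2, 1, 3]))       # False: 0-2 is not an edge

# A deterministic brute-force search tries orderings one by one, factorially many:
found = any(is_hamiltonian_cycle(list(p)) for p in permutations(range(4)))
print(found)                                    # True, but only after searching
```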

A down-to-earth way to think about this is: if you add another node to the graph, so that the size goes from N to N+1, how does that affect your computation? How does the extra work compare with all that you've already done? If it's of the same order of magnitude, or bigger, then that's exponential growth! That seems to be the case when analyzing those cycles. If it's of a smaller order of magnitude, that is more like polynomial growth. For example, if you're adding the numbers from 1 to N, the total work is roughly N-squared, but the extra work for the (N+1)th number is only about N+1, which is a much smaller order of magnitude than the total, especially when N gets large.
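A tiny numerical sketch of that test, comparing an N-squared count with a 2-to-the-N count:

```python
# Sketch of the growth test: compare the extra work of going from N to N+1
# with the total work already done, for a polynomial and an exponential count.

for N in (10, 20, 30):
    poly_total, poly_extra = N * N, (N + 1) * (N + 1) - N * N
    expo_total, expo_extra = 2 ** N, 2 ** (N + 1) - 2 ** N
    print(f"N={N}: polynomial extra/total = {poly_extra / poly_total:.2f}, "
          f"exponential extra/total = {expo_extra / expo_total:.2f}")

# The polynomial ratio shrinks toward 0 as N grows; the exponential ratio stays at 1,
# so each new node costs as much as everything done before it.
```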

The funky thing about the universal framework that I've set up is that it seems to generate all languages. You can specialize it by setting particular values for A and X. But, in a sense, by the basic operations, it is representing (and tracking!) every single possible machine. Furthermore, with the exception of the "Turing machine operations", it only grows in complexity. Strings are "accepted" simply by being generated. So we have a very clear way of seeing the linear boundedness of the resources. It is simply the number of operations compared with the length of the string.

Now my hunch is that each time we remove a cycle, we are adding to the resources as if by a factor of the length of the cycle. And, in defining the operation, we don't do it for particular cycles; the cycle may be of any length, so we must assume that the cycle is as long as it might be. All of the operations are quite general, which is odd but powerful. In this set-up, we can select the particular machine at the very end, but for now we are dealing with all of them at once.

If we look at the kinds of problems that are considered "tough", they may all be thought of as looking for structures that lack "sub-cycles". They are looking for machines that "cannot be pumped", that have no regular pattern generator. So, for example, if in a graph with N nodes we are looking for a cycle through all the nodes, my point is that such a cycle, by definition, doesn't contain any sub-cycles.

Well, in our set-up, these kinds of problems are very easy to pose. We simply code the matrix A accordingly, putting 0 for nonexistent edges. And we can do this at the end! The question is: can we use our basic operations to generate a language that would answer our question? The answer is yes. We can generate a language where the objects of length N are precisely those, if any, which do not contain any sub-cycles. But the only way we can restrict ourselves in this way is by using our cycle eliminator! And we have to do this N-1 times if we want to eliminate all possible sub-cycles. For this particular algorithm we seem to be using exponential resources, something like N factorial.
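To give a feel for where a factorial count comes from, here is a rough brute-force reading of the idea (ordinary code with an invented 4-node graph, not the matrix operations themselves): generate the closed walks of length N and keep only those that never revisit a node.

```python
# Sketch, not the matrix construction above: one naive way to insist on "no sub-cycles"
# is to enumerate closed walks of length N and keep those visiting each node once.
# The number of candidates grows factorially with N.
from itertools import permutations

A = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]                    # invented complete graph on 4 nodes

def closed_walk_exists(order):
    n = len(order)
    return all(A[order[i]][order[(i + 1) % n]] for i in range(n))

# Closed walks without sub-cycles visit each node exactly once: the permutations.
candidates = [p for p in permutations(range(4)) if p[0] == 0]   # fix a start node
survivors = [p for p in candidates if closed_walk_exists(p)]
print(len(candidates), "candidates checked,", len(survivors), "survive")  # 6 and 6
```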

So this says that we have found one bad algorithm. But does that say anything about the good ones?

It might! That is because our set-up, although powerful enough to capture the different kinds of automata, is extremely restrictive. So we might be able to show that, in this set-up, there really isn't any other way to get the language we want. The only tool that we have is the cycle eliminator.

We may be able to show that, in our framework, for the resources that we have defined, there is basically no simpler approach to the problem. This might involve showing that, any restatement of the problem is actually a transformation of this statement. As a transformation, it may have essentially the same eigenvalues. And having the same eigenvalues, there may be no way of getting it to simplify.

We may also be able to show that there is a transformation from this framework into a more classical one with the traditional definitions for time resources.

It's interesting what this might mean metaphysically, if anything. But there seems to be a connection between the exponential power of our ability to make guesses (allow for several options to travel down) and the Turing machine's exponential power to destroy evidence (forcing us to travel down several options to track what is going on).

Of course, this is all very speculative, but I think it is imaginative in a useful way, at least for me. I invite such speculation on subjects that matter to us.

Peace,
Andrius

Andrius Kulikauskas
Minciu Sodas
http://www.ms.lt
ms@ms.lt
+370 52645950
Vilnius, Lithuania