Frolov A., Húsek D., Muraviev I. P., Polyakov P. Y.
Boolean Factor Analysis by Attractor Neural Network
In: IEEE Transactions on Neural Networks, Volume: 18, No: 3, IEEE, 2007, pp. 698-707.
A common problem encountered in disciplines such as statistics, data analysis, signal processing, textual data representation, and neural network research is finding a suitable representation of the data in a lower-dimensional space. One of the principles used for this purpose is factor analysis. In this paper, we show that Hebbian learning and a Hopfield-like neural network provide a natural procedure for Boolean factor analysis. To ensure efficient Boolean factor analysis, we propose an original modification of both the Hopfield network architecture and its dynamics. We describe the neural network implementation of the Boolean factor analysis method and show the advantages of our Hopfield-like network modification step by step on artificially generated data. Finally, we demonstrate the efficiency of the method on artificial data containing a known list of factors. Our approach has the advantage of being able to analyze very large data sets while preserving the nature of the data.
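As a hedged illustration of the mechanism this abstract builds on, the Python sketch below (with assumed parameter values) shows classical Hebbian storage and attractor recall of sparse binary patterns in a Hopfield-style network; the paper's own architectural and dynamical modifications for Boolean factor analysis are not reproduced.

    import numpy as np

    # Hedged sketch: classical Hebbian storage and attractor recall of sparse
    # binary (0/1) patterns in a Hopfield-style network. The paper's specific
    # architectural and dynamical modifications, and the extraction of Boolean
    # factors, are NOT reproduced here; parameter values are assumptions.

    rng = np.random.default_rng(0)
    n, n_patterns, p = 200, 5, 0.1        # neurons, stored patterns, sparseness

    patterns = (rng.random((n_patterns, n)) < p).astype(float)

    # Hebbian (covariance-rule) weights for sparse patterns, zero self-coupling.
    W = sum(np.outer(x - p, x - p) for x in patterns)
    np.fill_diagonal(W, 0.0)

    def recall(x, threshold, steps=20):
        """Synchronous threshold dynamics; settles into an attractor."""
        for _ in range(steps):
            x_new = (W @ x > threshold).astype(float)
            if np.array_equal(x_new, x):
                break
            x = x_new
        return x

    # Corrupt the first stored pattern and let the network settle; the
    # attractor it reaches should match the stored pattern.
    noisy = patterns[0].copy()
    flip = rng.choice(n, size=5, replace=False)
    noisy[flip] = 1.0 - noisy[flip]
    result = recall(noisy, threshold=0.5 * p * (1 - p) * n)
    print("overlap with stored pattern:", (result @ patterns[0]) / patterns[0].sum())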
Galamboš L., Lánský J., Žemlička M., Chernik K.
Compression of Semistructured Documents
In: International Journal of Information Technology, Volume: 4, No: 1, Elsevier, 2007, pp. 11-17.
EGOTHOR is a search engine that indexes the Web and allows users to search Web documents. Its hit list contains the URL and title of each hit, together with a snippet that briefly shows the match. The snippet can almost always be assembled by an algorithm that has full knowledge of the original document (mostly an HTML page). This implies that the search engine is required to store the full text of the documents as part of the index.
Such a requirement leads us to choose an appropriate compression algorithm to reduce the space demand. One solution could be to use common compression methods, for instance gzip or bzip2, but it might be preferable to develop a new method that takes advantage of the document structure, or rather, the textual character of the documents.
Special text compression algorithms and methods for compressing XML documents already exist. The aim of this paper is to integrate the two approaches to achieve an optimal compression ratio.
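As a hedged aside, the general-purpose baseline mentioned in the abstract is easy to reproduce with Python's standard library; the HTML fragment below is made up, and a structure-aware method of the kind the paper proposes would aim to beat these ratios.

    import bz2
    import gzip

    # Hedged sketch of the general-purpose baseline: compress a small
    # (made-up) HTML fragment with gzip and bzip2 and report the ratios.
    body = b"<p>Some repetitive textual content.</p>" * 200
    html = (b"<html><head><title>Example</title></head><body>"
            + body + b"</body></html>")

    for name, compress in (("gzip", gzip.compress), ("bzip2", bz2.compress)):
        out = compress(html)
        print(f"{name}: {len(html)} -> {len(out)} bytes "
              f"(ratio {len(out) / len(html):.3f})")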
In: SOFSEM 2007, LNCS 4362, Springer, Berlin, 2007.
Presented at: SOFSEM 2007, 20.2.-26.2.2007, Harrachov, Czech Republic.
Complexity of data with respect to a particular class of neural networks is studied. Data complexity is measured by the magnitude of a certain norm of either the regression function induced by a probability measure describing the data or a function interpolating a sample of input/output pairs of training data chosen with respect to this probability. The norm is tailored to the type of computational units in the network class. It is shown that for data for which this norm is small, the infima of error functionals over networks with an increasing number of hidden units converge to the global minima relatively fast. Thus, for such data, networks of reasonable model complexity can achieve good performance during learning. For perceptron networks, the relationship between data complexity, data dimensionality, and smoothness is investigated.
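For context, the convergence statement above is of the same flavor as the classical Maurey-Jones-Barron bound; the following is a hedged illustration under assumed notation, not the paper's exact result. Writing $\|f\|_G$ for the variation norm of $f$ with respect to a dictionary $G$ of computational units and $s_G = \sup_{g \in G} \|g\|$,

\[
\bigl\| f - \operatorname{span}_n G \bigr\|^2 \;\le\; \frac{s_G^2 \, \|f\|_G^2 - \|f\|^2}{n},
\]

so when the norm measuring data complexity is small, networks with $n$ hidden units approach the global minimum of the error functional at rate $O(1/n)$.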
In: SOFSEM 2007, LNCS 4362, Springer, Berlin, 2007.
Presented at: SOFSEM 2007, 20.2.-26.2.2007, Harrachov, Czech Republic.
We design a formal model of an amorphous computer suitable for theoretical investigation of its computational properties. The model consists of a finite set of nodes created by RAMs with restricted memory, which are dispersed uniformly in a given area. Within a limited radius, the nodes can communicate with their neighbors via a single-channel radio. The assumptions on low-level communication abilities are among the weakest possible: the nodes work asynchronously, and there is no broadcasting collision detection mechanism and no network addresses. For the underlying network we design a randomized communication protocol and analyze its efficiency. The subsequent experiments and combinatorial analysis of random networks show that the expectations under which our protocol was designed are met by the vast majority of the instances of our amorphous computer model.
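As a hedged illustration of the communication setting (a sketch of a protocol of this general kind, not the paper's actual one), the Python simulation below assumes nodes share a single channel with no collision detection or addresses: each informed node transmits with a fixed probability per round, and a node receives the message only when exactly one of its neighbors transmits.

    import random

    def broadcast_rounds(neighbors, source, p=0.25, rng=random.Random(1)):
        """Rounds until every node holds the message, under collision-free
        reception: a node hears a message only if exactly one of its
        neighbors transmits in that round (no collision detection)."""
        informed = {source}
        rounds = 0
        while len(informed) < len(neighbors):
            rounds += 1
            transmitting = {u for u in informed if rng.random() < p}
            for v in range(len(neighbors)):
                if len(transmitting & set(neighbors[v])) == 1:
                    informed.add(v)
        return rounds

    # Toy network: 10 nodes on a line, each hearing only its direct neighbors.
    line = [[u for u in (v - 1, v + 1) if 0 <= u < 10] for v in range(10)]
    print("rounds to inform all nodes:", broadcast_rounds(line, source=0))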
In: Transactions on Database Systems (TODS), Volume: 32, No: 4, ACM, 2007, pp. 1-47.
In multimedia systems we usually need to retrieve database (DB) objects based on their similarity to a query object, where the similarity assessment is provided by a measure which defines a (dis)similarity score for every pair of DB objects. In most existing applications, the similarity measure is required to be a metric, where the triangle inequality is utilized to speed up the search for relevant objects by use of metric access methods (MAMs), for example, the M-tree. Recent research has shown, however, that nonmetric measures are more appropriate for similarity modeling due to their robustness and the ease of modeling made-to-measure similarity. Unfortunately, due to the lack of the triangle inequality, nonmetric measures cannot be directly utilized by MAMs. From another point of view, some sophisticated similarity measures may be available only in a black-box nonanalytic form (e.g., as an algorithm or even a hardware device), where no information about their topological properties is provided, so we have to treat them as nonmetric measures as well. From yet another point of view, the concept of similarity measuring is itself inherently imprecise, and we often prefer fast but approximate retrieval over exact but slower retrieval.
To date, these aspects of similarity retrieval have been addressed separately, that is, exact versus approximate search or metric versus nonmetric search. In this article we introduce a similarity retrieval framework which incorporates both aspects into a single unified model. Based on the framework, we show that for any dissimilarity measure (either metric or nonmetric) we are able to change the “amount” of triangle inequality, and so obtain an approximate or full metric which can be used for MAM-based retrieval. Owing to the varying “amount” of triangle inequality, the measure is modified in a way suitable for either exact but slower or approximate but faster retrieval. Additionally, we introduce the TriGen algorithm, aimed at constructing the desired modification of any black-box distance automatically, using just a small fraction of the database.
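As a hedged sketch of the idea behind such modifications (not the TriGen algorithm itself), applying a concave increasing modifier f with f(0) = 0 to a semimetric d yields d_f = f(d), which reduces triangle-inequality violations at the cost of discriminability. The Python snippet below uses an assumed fractional-power modifier and estimates the violation rate on sampled triplets, mimicking how the “amount” of triangle inequality can be tuned from a small sample of the data.

    import random

    def violation_rate(points, dist, w, samples=2000, rng=random.Random(7)):
        """Fraction of sampled triplets violating the triangle inequality
        under the modified distance d_f(x, y) = dist(x, y) ** (1 / (1 + w))."""
        f = lambda x: x ** (1.0 / (1.0 + w))
        bad = 0
        for _ in range(samples):
            a, b, c = rng.sample(points, 3)
            dab, dbc, dac = f(dist(a, b)), f(dist(b, c)), f(dist(a, c))
            if dab + dbc < dac or dab + dac < dbc or dbc + dac < dab:
                bad += 1
        return bad / samples

    # Toy nonmetric measure: squared Euclidean distance in the plane; with
    # w = 1 the modifier recovers Euclidean distance, hence a full metric.
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(100)]
    sqeuclid = lambda p, q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    for w in (0.0, 0.5, 1.0):
        print(f"w={w}: violation rate {violation_rate(pts, sqeuclid, w):.3f}")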