Almarimi Abdelsalam, Pokorný Jaroslav
Schema Management for Data Integration: A Short Survey
In: Acta Polytechnica, Volume: 45, No: 1, Czech Technical University in Prague, Prague, 2005, pp. 24-27.
Schema management is a basic problem in many database application domains, such as data integration systems, where users need to access and manipulate data from several databases. In this context, in order to integrate data from distributed heterogeneous database sources, data integration systems must resolve several issues that arise in managing schemas. In this paper, we present a brief survey of the schema matching problem, which lies at the heart of schema integration. Moreover, we propose a technique for integrating and querying distributed heterogeneous XML schemas.
Almarimi Abdelsalam, Pokorný Jaroslav
A Mediation Layer for Heterogenous XML Schemas
In: International Journal of Web Information Systems, Volume: 1, No: 1, Troubador Publishing LTD, 2005, pp. 25-32.
Presented at: iiWAS2004 Information Integration and Web Based Applications & Services, 27-29.09.2004, Jakarta,
Indonesia.
This paper describes an approach to the mediation of heterogeneous XML schemas, proposed as a tool for an XML data integration system. A global XML schema is specified by the designer to provide a homogeneous view over heterogeneous XML data. An XML mediation layer is introduced to manage (1) establishing appropriate mappings between the global schema and the schemas of the sources, and (2) querying the XML data sources in terms of the global schema. The XML data sources are described in the XML Schema language. The former task is performed through a semi-automatic process that generates local and global paths: a tree structure is constructed for each XML schema and represented in a simple form, which in turn is used for manually assigning indices that match local paths to the corresponding global paths. By gathering all paths with the same indices, the equivalent local and global paths are grouped automatically, and an XML Metadata Document is constructed. For the latter task, an XML Query Translator is described that translates a global user query into local queries by using the mappings defined in the XML Metadata Document.
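To make the mapping mechanism concrete, the following minimal Python sketch groups local and global paths by a shared index and rewrites a global path into its equivalent local paths, in the spirit of the XML Metadata Document described above; all paths, indices, and source names are invented for illustration and do not come from the paper.

from collections import defaultdict

# (index, path) pairs as they might come out of the semi-automatic matching
# step; the indices tie each global path to its equivalent local paths.
global_paths = [(1, "/library/book/title"), (2, "/library/book/author")]
local_paths = [
    ("src1", 1, "/books/item/name"),
    ("src1", 2, "/books/item/writer"),
    ("src2", 1, "/catalog/entry/title"),
]

# Group equivalent local paths by their shared index (the role played by
# the XML Metadata Document).
mapping = defaultdict(list)
for source, idx, path in local_paths:
    mapping[idx].append((source, path))

def translate(global_path):
    """Rewrite a path of a global query into the equivalent local paths."""
    for idx, gpath in global_paths:
        if gpath == global_path:
            return mapping[idx]
    return []

print(translate("/library/book/title"))
# -> [('src1', '/books/item/name'), ('src2', '/catalog/entry/title')]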
Batko Michal, Skopal Tomáš, Lokoč Jakub
New Dynamic Construction Techniques for M-tree
In: Journal of Discrete Algorithms, Elsevier, Amsterdam, The Netherlands, 2008.
(in print)
Since its introduction in 1997, the M-tree has become a respected metric access method (MAM), while remaining, together with its descendants, the only database-friendly MAM, that is, a dynamic structure persistent in a paged index. Although many other MAMs have been developed over the last decade, most of them require either static or expensive indexing. By contrast, dynamic M-tree construction allows us to index very large databases in subquadratic time, while the index can be kept up-to-date (i.e., it supports arbitrary insertions/deletions). In this article we propose two new techniques improving dynamic insertions in the M-tree: forced reinsertion strategies and so-called hybrid-way leaf selection. Both techniques preserve the logarithmic asymptotic complexity of a single insertion, while aiming to produce more compact M-tree hierarchies (which leads to faster query processing). In particular, the former technique reuses the well-known principle of forced reinsertions, where the new insertion algorithm tries to re-insert the content of an M-tree leaf that is about to split in order to avoid that split. The latter technique constitutes an efficiency-scalable selection of a suitable leaf node into which a new object is to be inserted. In the experiments we show that the proposed techniques bring a clear improvement (speeding up both indexing and query processing) and also provide a tuning tool for the indexing vs. querying efficiency trade-off. Moreover, a combination of the new techniques exhibits a synergistic effect, resulting in the best strategy for dynamic M-tree construction proposed so far.
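The control flow of forced reinsertions can be illustrated by the following toy Python sketch, which works on a flat collection of one-dimensional leaves rather than a real hierarchical M-tree; the capacity, the eviction heuristic, and the split rule are all illustrative simplifications, not the paper's algorithm.

LEAF_CAPACITY = 4

def centre(leaf):
    return sum(leaf) / len(leaf)

def insert(leaves, obj, exclude=None, reinserting=False):
    candidates = [l for l in leaves if l is not exclude] or leaves
    leaf = min(candidates, key=lambda l: abs(centre(l) - obj))
    leaf.append(obj)
    if len(leaf) <= LEAF_CAPACITY:
        return
    if not reinserting and len(leaves) > 1:
        # Forced reinsertion: evict the entry farthest from the leaf centre
        # and try to place it in another leaf, hoping to avoid a split.
        leaf.sort(key=lambda o: abs(o - centre(leaf)))
        insert(leaves, leaf.pop(), exclude=leaf, reinserting=True)
    else:
        # Unavoidable split: divide the sorted leaf into two halves.
        leaf.sort()
        leaves.append(leaf[LEAF_CAPACITY // 2:])
        del leaf[LEAF_CAPACITY // 2:]

leaves = [[1.0, 2.0], [10.0, 11.0]]
for x in [1.5, 2.5, 3.0, 9.0, 1.2]:
    insert(leaves, x)
print(leaves)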
Batko Michal, Novák David, Falchi Fabrizio, Zezula Pavel
Scalability Comparison of Peer-to-Peer Similarity Search Structures
In: Future Generation Computer Systems, Volume: 24, No: 8, Elsevier, Amsterdam, The Netherlands, 2008, pp. 834-848.
Daniel Milan
Generalization of the Classic Combination Rules to DSm Hyper-Power Sets
In: Information & Security, Volume: 20, 2006, pp. 50-64.
In this article, the author generalizes Dempster's rule, Yager's rule, and Dubois-Prade's rule for belief function combination in order to make them applicable to hyper-power sets according to the Dezert-Smarandache (DSm) theory. A comparison of these rules with the DSm rule of combination is further presented.
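For readers unfamiliar with belief-function combination, the following Python sketch implements the classic (ungeneralized) Dempster's rule on an ordinary power set; the paper's contribution, the generalization to DSm hyper-power sets, is not reproduced here, and the example masses are invented.

def dempster(m1, m2):
    """Combine two basic belief assignments keyed by frozensets."""
    combined, conflict = {}, 0.0
    for b, w1 in m1.items():
        for c, w2 in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2
    # Normalize by the non-conflicting mass (assumes conflict < 1).
    return {a: w / (1.0 - conflict) for a, w in combined.items()}

A, B = frozenset({"a"}), frozenset({"b"})
theta = A | B                       # two-element frame of discernment
m1 = {A: 0.6, theta: 0.4}
m2 = {B: 0.3, theta: 0.7}
print(dempster(m1, m2))             # masses on A, B, and theta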
Daniel Milan
Comments on Josang's Normal Coarsening and Consensus Operator
In: International Journal of Innovative Computing, Information and Control (IJICIC), Volume: 4, No: 5, 2008, pp. 1079-1088.
Definitions of two different ways of coarsening basic belief assignments to opinions, the simple coarsening and the normal coarsening, are recalled in this contribution. The relation of the results of combining normal opinions using the consensus operator to belief functions on the original n-element frame of discernment is examined, and the questionable meaning of the normal coarsening is discussed.
Falchi Fabrizio, Gennaro Claudio, Zezula Pavel
Nearest neighbor search in metric spaces through Content-Addressable Networks
In: Information Processing and Management, Volume: 44, No: 1, Elsevier, 2008, pp. 411-429.
Frolov A., Húsek Dušan, Muraviev P. Igor, Polyakov P. Y.
Boolean Factor Analysis by Attractor Neural Network
In: IEEE Transactions on Neural Networks, Volume: 18, No: 3, IEEE, 2007, pp. 698-707.
A common problem encountered in disciplines such as statistics, data analysis, signal processing, textual data representation, and neural network research is finding a suitable representation of the data in a lower-dimensional space. One of the principles used for this purpose is factor analysis. In this paper, we show that Hebbian learning and a Hopfield-like neural network can provide a natural procedure for Boolean factor analysis. To ensure efficient Boolean factor analysis, we propose an original modification of both the Hopfield network architecture and its dynamics, and we describe the neural network implementation of the Boolean factor analysis method. We demonstrate the advantages of our Hopfield-like network modification step by step on artificially generated data, and finally we show the efficiency of the method on artificial data containing a known list of factors. Our approach has the advantage of being able to analyze very large data sets while preserving the nature of the data.
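As background for the method, the following numpy sketch shows the unmodified baseline: a Hopfield-style network with Hebbian learning whose attractor dynamics recall a stored pattern from a noisy cue. The paper's actual contribution, the modified architecture and dynamics for Boolean factor analysis, is not reproduced here, and all sizes and noise levels are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n = 64
patterns = rng.choice([-1, 1], size=(3, n))   # three stored bipolar patterns

# Hebbian weight matrix: sum of outer products, zero diagonal.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)

# Start from a noisy version of the first pattern and iterate the
# synchronous retrieval dynamics until a fixed point (an attractor).
state = patterns[0] * rng.choice([1, -1], size=n, p=[0.9, 0.1])
for _ in range(20):
    new = np.where(W @ state >= 0, 1, -1)
    if np.array_equal(new, state):
        break
    state = new

print("recovered first pattern:", np.array_equal(state, patterns[0]))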
Galamboš Leo
Dynamic Inverted Index Maintenance
In: International Journal of Computer Science, Volume: 1, No: 2, 2006, pp. 157-162.
Galamboš Leo, Lánský Jan, Chernik K.
Compression of Semistructured Documents
In: International Enformatika Conference IEC 2006, Enformatika, Transactions on Engineering, Computing and Technology, 2006, pp. 222-227.
Galamboš Leo, Lánský Jan, Žemlička M., Chernik K.
Compression of Semistructured Documents
In: International Journal of Information Technology, Volume: 4, No: 1, Elsevier, 2007, pp. 11-17.
EGOTHOR is a search engine that indexes the Web and allows us to search Web documents. Its hit list contains the URL and title of each hit, together with a snippet that briefly shows the match. The snippet can almost always be assembled by an algorithm that has full knowledge of the original document (mostly an HTML page). This implies that the search engine is required to store the full text of the documents as part of the index. Such a requirement leads us to choose an appropriate compression algorithm that reduces the space demand. One solution could be the use of common compression methods, for instance gzip or bzip2, but it might be preferable to develop a new method that takes advantage of the document structure, or rather, the textual character of the documents. Special text compression algorithms and methods for the compression of XML documents already exist. The aim of this paper is to integrate the two approaches to achieve an optimal compression ratio.
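As a point of reference for the generic-compressor option mentioned above, this Python snippet compares gzip and bzip2 compression ratios on a small synthetic HTML document using only the standard library; it does not reproduce the structure-aware method developed in the paper.

import bz2, gzip

html = (b"<html><head><title>Example</title></head><body>"
        + b"<p>Some repetitive textual content.</p>" * 200
        + b"</body></html>")

for name, packed in [("gzip", gzip.compress(html)),
                     ("bzip2", bz2.compress(html))]:
    print(f"{name}: {len(packed)} bytes, ratio {len(packed) / len(html):.3f}")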
Hájek Petr
Making fuzzy description logic more general
In: Fuzzy Sets and Systems, Volume: 154, 2005, pp. 1-15.
A version of fuzzy description logic based on the basic (continuous t-norm based) fuzzy predicate logic BL is presented. Problems of satisfiability, validity and subsumption of concepts are discussed and reduced to problems of fuzzy propositional logic known to be decidable for any continuous t-norm. For the Łukasiewicz t-norm, some stronger results are obtained.
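For concreteness, here are the three basic continuous t-norms on which logic BL is built, written as Python functions; by the Mostert-Shields theorem, these are in a precise sense the building blocks of all continuous t-norms.

def lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)

def goedel(x, y):       # Goedel (minimum) t-norm
    return min(x, y)

def product(x, y):
    return x * y

for t in (lukasiewicz, goedel, product):
    print(t.__name__, t(0.7, 0.6))   # 0.3, 0.6, 0.42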
Hájek Petr, Mesiar R.
On copulas, quasicopulas and fuzzy logic
In: Soft Computing, Volume: 12, No: 12, Springer, 2008, pp. 1239-1243.
Hliněná Dana, Vojtáš Peter
A Note on an Example of Use of Fuzzy Preference Structures
In: Acta Universitatis Matthiae Belii, Volume: 14, 2008, pp. 29-39.
Kůrková Věra, Sanguineti Marcello
Learning with generalization capability by kernel methods of bounded complexity
In: Journal of Complexity, Volume: 21, Elsevier, 2005, pp. 350-367.
Learning from data with generalization capability is studied in the framework of minimization of regularized empirical error functionals over nested families of hypothesis sets with increasing model complexity. For Tikhonov's regularization with kernel stabilizers, minimization over restricted hypothesis sets containing, for a fixed integer n, only linear combinations of all n-tuples of kernel functions is investigated. Upper bounds are derived on the rate of convergence of suboptimal solutions from such sets to the optimal solution achievable without restrictions on model complexity. The bounds are of the form 1/sqrt(n) multiplied by a term that depends on the size of the sample of empirical data, the vector of output data, the Gram matrix of the kernel with respect to the input data, and the regularization parameter.
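The restricted hypothesis sets can be pictured with a small numpy experiment: regularized least squares over linear combinations of only n Gaussian kernel functions centred at a subset of the data, with n playing the role of model complexity. The kernel choice, the form of the stabilizer, and all parameter values below are illustrative assumptions, not taken from the paper.

import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)

def gauss_kernel(A, B, width=0.3):
    d = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(d * d, axis=2) / (2 * width**2))

gamma = 1e-3                       # regularization parameter
for n in (5, 20, 80):              # model complexity: number of kernel units
    centers = X[:n]
    K = gauss_kernel(X, centers)   # m x n design matrix
    K_nn = gauss_kernel(centers, centers)
    # Minimize ||K c - y||^2 + gamma * c^T K_nn c  (a kernel stabilizer).
    c = np.linalg.solve(K.T @ K + gamma * K_nn, K.T @ y)
    err = np.mean((K @ c - y) ** 2)
    print(f"n={n:3d}: training error {err:.4f}")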
Kůrková Věra
Inverse Problem in Data Analysis
In: Przeglad elektrotechniczny, Volume: 82, No: 4, 2006, pp. 41-47.
It is shown that learning from data, modelled as minimization of error functionals, can be reformulated in terms of inverse problems. This reformulation allows us to characterize optimal input-output functions of networks with kernel units.
Kůrková Věra
Estimates of Data Complexity in Neural-Network Learning
In: SOFSEM 2007, LNCS 4362, Springer, Berlin, 2007.
Presented at: SOFSEM 2007, 20.2.-26.2.2007, Harrachov,
Czech Republic.
Complexity of data with respect to a particular class of neural networks is studied. Data complexity is measured by the magnitude of a certain norm of either the regression function induced by a probability measure describing the data, or a function interpolating a sample of input/output pairs of training data chosen with respect to this probability. The norm is tailored to the type of computational units in the network class. It is shown that for data for which this norm is small, the infima of error functionals over networks with an increasing number of hidden units converge to the global minima relatively fast. Thus, for such data, networks with reasonable model complexity can achieve good performance during learning. For perceptron networks, the relationship between data complexity, data dimensionality and smoothness is investigated.
Nedbal Radim
Relational Databases with Ordered Relations
In: Logic Journal of the IGPL, Volume: 13, 2005, pp. 587-597.
Presented at: ERCIM 2004, 12.-17.07.2004, Vienna,
Austria.
The paper deals with expressing preferences in the framework of the relational data model. Preferences usually take the form of a partial ordering, so the question arises of how to provide the relational data model with such an ordering.
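One standard way of querying a relation equipped with a preference partial order, shown below in Python, is to return the maximal tuples, i.e., those not dominated by any other tuple; this illustrates the general setting only and is not necessarily the construction developed in the paper (the relation and attributes are invented).

cars = [
    {"model": "A", "price": 9000, "mileage": 120000},
    {"model": "B", "price": 9000, "mileage": 60000},
    {"model": "C", "price": 7000, "mileage": 150000},
]

def prefers(t, u):
    # Partial order: t is preferred to u if it is at least as good on both
    # attributes (lower is better) and strictly better on at least one.
    ge = t["price"] <= u["price"] and t["mileage"] <= u["mileage"]
    gt = t["price"] < u["price"] or t["mileage"] < u["mileage"]
    return ge and gt

best = [t for t in cars if not any(prefers(u, t) for u in cars)]
print([t["model"] for t in best])   # ['B', 'C']: incomparable maxima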
Neruda Roman
Implementation of Ontology Mapping for Computational Agents
In: WSEAS Transactions on Computers Research, Volume: 1, 2006, pp. 58-63.
This paper describes the ontological description of computational agents, their properties, and their abilities. The goal of the work is to allow for autonomous behavior and semi-automatic composition of agents within a multi-agent system. The system is intended to create a foundation for the interchangeability of computational components and the emergence of new models. The paper focuses on ways of representing agents and systems in standard formalisms such as description logics, OWL, and Jade.
Neruda Roman, Beuster Gerd
Toward Dynamic Generation of Computational Agents by Means of Logical Descriptions
In: International Transactions on Systems Science and Applications, Volume: 4, No: 1, 2008.
A formalism for the logical description of computational agents and multi-agent systems is given. It is explained how such a formal description can be used to configure and reason about multi-agent systems realizing computational intelligence models. Its usage within a real software system, Bang 3, is demonstrated, and a way to extend the system toward dynamic environments with migrating agents is discussed.
Petrů Lukáš, Wiedermann Jiří
A Model of an Amorphous Computer and its Communication Protocol
In: SOFSEM 2007, LNCS 4362, Springer, Berlin, 2007.
Presented at: SOFSEM 2007, 20.2.-26.2.2007, Harrachov,
Czech Republic.
We design a formal model of an amorphous computer suitable for theoretical investigation of its computational properties. The model consists of a finite set of nodes created by RAMs with restricted memory, which are dispersed uniformly in a given area. Within a limited radius the nodes can communicate with their neighbors via a single-channel radio. The assumptions on low-level communication abilities are among the weakest possible: the nodes work asynchronously, there is no broadcasting collision detection mechanism and no network addresses. For the underlying network we design a randomized communication protocol and analyze its efficiency. The subsequent experiments and combinatorial analysis of random networks show that the expectations under which our protocol was designed are met by the vast majority of the instances of our amorphous computer model.
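The flavor of the model can be conveyed by a toy Python simulation: anonymous nodes scattered uniformly in the unit square hear only neighbors within a fixed radius and flood a message by randomized retransmissions. Collisions and asynchrony are not modelled and all parameters are invented, so this is a cartoon of the setting rather than the paper's protocol.

import random

random.seed(7)
N, RADIUS, P_SEND, ROUNDS = 200, 0.12, 0.3, 60
nodes = [(random.random(), random.random()) for _ in range(N)]

def neighbours(i):
    xi, yi = nodes[i]
    return [j for j, (x, y) in enumerate(nodes)
            if j != i and (x - xi) ** 2 + (y - yi) ** 2 <= RADIUS ** 2]

informed = {0}                            # node 0 initiates the broadcast
for _ in range(ROUNDS):
    for i in list(informed):
        if random.random() < P_SEND:      # randomized retransmission
            informed.update(neighbours(i))

print(f"informed {len(informed)} of {N} nodes")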
Petrů Lukáš, Wiedermann Jiří
On the universal computing power of amorphous computing systems
In: Theory of Computing Systems, Springer, 2008.
(in print)
Amorphous computing differs from the classical ideas about computation in almost every aspect. The architecture of amorphous computers is random, since they consist of a plethora of identical computational units spread randomly over a given area. Within a limited radius the units can communicate wirelessly with their neighbors via a single-channel radio. We consider a model whose assumptions on the underlying computing and communication abilities are among the weakest possible: all computational units are finite-state probabilistic automata working asynchronously, there is no broadcasting collision detection mechanism, and there are no network addresses. We show that under reasonable probabilistic assumptions such amorphous computing systems can possess universal computing power with high probability. The underlying theory makes use of properties of random graphs and of the probabilistic analysis of algorithms. To the best of our knowledge this is the first result showing the universality of such computing systems.
Pokorný Jaroslav
Database architectures: current trends and their relationships to environmental data management
In: Environmental Modelling & Software, Volume: 21, No: 11, Elsevier Science, 2006, pp. 1579-1586.
Skopal Tomáš, Snášel Václav
An Application of LSI and M-tree in Image Retrieval
In: GESTS International Transactions on Computer Science and Engineering, Volume: 34, No: 1, GEST Society, 2006, pp. 212-225.
When dealing with image databases, we often need to solve the problem of how to retrieve a desired set of images effectively and efficiently. Images are commonly represented by high-dimensional vectors of extracted features, since in this way content-based image retrieval is turned into a geometric search problem. In this article we present a study of feature extraction from raw image data by means of the LSI method (based on singular-value decomposition). Simultaneously, we show how such feature extraction can be used for efficient and effective similarity retrieval using the M-tree index. Because of the application to image retrieval, we also show some interesting effects of LSI that are not directly obvious in the area of text retrieval (where LSI originated).
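A minimal numpy sketch of the feature-extraction side discussed above: images as raw vectors, a truncated SVD giving a low-dimensional LSI-like space, and a nearest-neighbour query answered in that space. A linear scan stands in for the M-tree index, and the data and dimensions are synthetic.

import numpy as np

rng = np.random.default_rng(2)
images = rng.random((100, 1024))         # 100 images as raw 1024-dim vectors

# Truncated SVD: keep the k strongest right singular vectors as the basis
# of the reduced feature space.
U, s, Vt = np.linalg.svd(images, full_matrices=False)
k = 16
features = images @ Vt[:k].T             # 100 x 16 feature matrix

# Query with a slightly perturbed copy of image 0, projected the same way.
query = (images[0] + 0.01 * rng.random(1024)) @ Vt[:k].T
nearest = int(np.argmin(np.linalg.norm(features - query, axis=1)))
print("nearest image index:", nearest)   # expected: 0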
Skopal Tomáš
Unified Framework for Exact and Approximate Search in Dissimilarity Spaces
In: Transactions on Database Systems (TODS), Volume: 32, No: 4, ACM, 2007, pp. 1-47.
In multimedia systems we usually need to retrieve database (DB) objects based on their similarity to a query object, while the similarity assessment is provided by a measure which defines a (dis)similarity score for every pair of DB objects. In most existing applications, the similarity measure is required to be a metric, where the triangle inequality is utilized to speed up the search for relevant objects by use of metric access methods (MAMs), for example, the M-tree. Recent research has shown, however, that nonmetric measures are more appropriate for similarity modeling due to their robustness and the ease of modeling a made-to-measure similarity. Unfortunately, due to the lack of triangle inequality, the nonmetric measures cannot be directly utilized by MAMs. From another point of view, some sophisticated similarity measures could be available in a black-box nonanalytic form (e.g., as an algorithm or even a hardware device), where no information about their topological properties is provided, so we have to consider them as nonmetric measures as well. From yet another point of view, the concept of similarity measuring itself is inherently imprecise and we often prefer fast but approximate retrieval over an exact but slower one.

To date, the mentioned aspects of similarity retrieval have been solved separately, that is, exact versus approximate search or metric versus nonmetric search. In this article we introduce a similarity retrieval framework which incorporates both of the aspects into a single unified model. Based on the framework, we show that for any dissimilarity measure (either metric or nonmetric) we are able to change the “amount” of triangle inequality, and so obtain an approximate or full metric which can be used for MAM-based retrieval. Due to the varying “amount” of triangle inequality, the measure is modified in a way suitable for either an exact but slower or an approximate but faster retrieval. Additionally, we introduce the TriGen algorithm, aimed at constructing the desired modification of any black-box distance automatically, using just a small fraction of the database.
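The core idea of tuning the “amount” of triangle inequality can be demonstrated with a toy Python experiment: apply a concave modifier (here a plain fractional power, a deliberately simplified stand-in for the modifier families TriGen works with) to a nonmetric distance and measure, on a sample, how many triangles violate the triangle inequality as the exponent decreases.

import itertools, random

random.seed(3)
points = [random.random() for _ in range(30)]

def dist(x, y):
    return (x - y) ** 2      # squared difference: nonmetric in one dimension

def violation_rate(w):
    """Fraction of sampled triangles violating the triangle inequality
    for the modified distance dist(., .) ** w."""
    bad = total = 0
    for a, b, c in itertools.combinations(points, 3):
        d = sorted((dist(a, b) ** w, dist(b, c) ** w, dist(a, c) ** w))
        total += 1
        bad += d[0] + d[1] < d[2]
    return bad / total

for w in (1.0, 0.75, 0.5):
    print(f"exponent {w}: {violation_rate(w):.3f} of triangles violate")

At w = 0.5 the modified distance coincides with the absolute difference, a true metric, so the violation rate drops to zero; lowering the exponent makes the measure closer to a metric, which is the knob that the TriGen algorithm turns automatically on a database sample.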
Snášel Václav, Moravec Pavel, Pokorný Jaroslav
Using BFA with WordNet Based Model for Web Retrieval
In: Journal of Digital Information Management, Volume: 4, No: 2, 2006, pp. 107-111.