Bednárek David
Optimizing XQuery/XSLT programs using backward analysis
In: Proceedings of ITAT 2007, Information Technologies - Applications and Theory, (Ed. Vojtáš P.), PONT s.r.o., Seňa, 2007, pp. 17-22.
Presented at: Konferencia o informačných (inteligentných) technológiách - aplikácie a teória 2007, 21.-27.9.2007, Polana,
Slovakia.
Dokulil Jiří, Tykal J., Yaghob Jakub, Zavoral Filip
Semantic Web Infrastructure
In: Proc. of the First IEEE International Conference on Semantic Computing, IEEE, 2007, pp. 209-215.
Presented at: ICSC 2007, 17.-19.9.2007, Irvine,
California.
The Semantic Web is not as widespread as its founders expected. This is partially caused by the lack of a standard, working infrastructure for the Semantic Web. We have built a working, portable, stable, high-performance infrastructure for the Semantic Web. This paper focuses on the tasks performed by the infrastructure.
Dokulil Jiří, Tykal J., Yaghob Jakub, Zavoral Filip
Semantic Web Repository and Interfaces
In: Proc. of SEMAPRO (Int. Conf. on Advances in Semantic Processing), IEEE, 2007.
Presented at: SEMAPRO (Int. Conf. on Advances in Semantic Processing), 4.-9.11.2007, Papeete,
French Polynesia (Tahiti).
The Semantic Web is not as widespread as its founders
expected. This is partially caused by the lack of a standard,
working infrastructure for the Semantic
Web. We have built a working, portable, stable,
high-performance infrastructure for the Semantic
Web. This enables various experiments with the Semantic
Web in the real world.
Dokulil Jiří, Katreniaková J.
Visualization of large schemaless RDF data
In: Proc. of SEMAPRO (Int. Conf. on Advances in Semantic Processing), IEEE, 2007, pp. 243-248.
Presented at: SEMAPRO (Int. Conf. on Advances in Semantic Processing), 4.-9.11.2007, Papeete,
French Polynesia (Tahiti).
Since many XML documents do not contain any schema definition, we expect that there will also be RDF documents without an RDF schema or ontology. Such data can only be viewed as a general labeled directed graph, and presenting the data to the user by drawing the graph seems natural. Because the data can be extremely large, it is impossible to display the whole graph at once. Only a suitable start node is displayed, and the rest of the graph can be explored by incremental navigation. To conserve space and to show the user possible directions of further navigation, we have come up with a technique called node merging. By combining suitable graph drawing and navigation techniques, we get a tool that can give the user a good idea of the structure and content of the data.
Dokulil Jiří, Tykal J., Yaghob Jakub, Zavoral Filip
Experimental Platform for the Semantic Web
In: Proceedings of ITAT 2007, Information Technologies - Applications and Theory, (Ed. Vojtáš P.), PONT s.r.o., Seňa, 2007, pp. 67-72.
Presented at: Konferencia o informačných (inteligentných) technológiách - aplikácie a teória 2007, 21.-27.9.2007, Polana,
Slovakia.
Dokulil Jiří, Katreniaková J.
Vizualizácia RDF dát pomocou techniky zlučovania vrcholov [Visualization of RDF Data Using the Node Merging Technique]
In: Proceedings of ITAT 2007, Information Technologies - Applications and Theory, (Ed. Vojtáš P.), PONT s.r.o., Seňa, 2007, pp. 23-28.
Presented at: Konferencia o informačných (inteligentných) technológiách - aplikácie a teória 2007, 21.-27.9.2007, Polana,
Slovakia.
Eckhardt Alan, Horváth T., Maruščák D., Novotný R., Vojtáš Peter
Uncertainty Issues in Automating Process Connecting Web and User
In: Proc. of Uncertainty Reasoning for the Semantic Web Workshop 2007, (Ed. F. Bobillo), CEUR Workshop Proc., 2007, pp. 1-12.
Presented at: Dateso 2008: Annual International Workshop on DAtabases, TExts, Specifications and Objects, 16.4.-18.4.2008, Desná - Černá Říčka,
Czech Republic.
Eckhardt Alan, Horváth T., Vojtáš Peter
Learning different user profile annotated rules for fuzzy preference top-k querying
In: Scalable Uncertainty Management, Springer, LNAI 4772, Berlin, 2007, pp. 116-130.
Presented at: SUM 2007 International Conference, 10.10.-12.10.2007, Washington,
US.
Uncertainty querying of large data can be solved by providing top-k answers according to a user's fuzzy ranking/scoring function. Different users usually have different fuzzy scoring functions, i.e., different user preference models. The main goal of this paper is to assign a preference model to a user automatically. To achieve this, we decompose the user's fuzzy ranking function into an ordering of particular attributes and a combination function. To solve the problem of automatic assignment of a user model, we design two algorithms: one for learning the user's preference on a particular attribute and a second for learning the combination function. The methods were integrated into a Fagin-like top-k querying system with some new heuristics and tested.
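The decomposition described above (one fuzzy function per attribute plus a combination function) can be illustrated with a small sketch. It assumes hypothetical attributes (`price`, `distance`), hypothetical linear fuzzy functions `cheap` and `near`, and a weighted average as the combination function; it does not reproduce the paper's learning algorithms, and it ranks by brute force rather than with Fagin-like pruning.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-attribute fuzzy preference functions: raw value -> degree in [0, 1].
def cheap(price: float, max_price: float = 1000.0) -> float:
    """Lower price is preferred: 1.0 at price 0, 0.0 at max_price or more."""
    return max(0.0, min(1.0, 1.0 - price / max_price))

def near(distance_km: float, max_km: float = 10.0) -> float:
    """Shorter distance is preferred: 1.0 at 0 km, 0.0 at max_km or more."""
    return max(0.0, min(1.0, 1.0 - distance_km / max_km))

def weighted_average(scores: List[float], weights: List[float]) -> float:
    """A monotone combination function: the weighted average of attribute degrees."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def top_k(objects: Dict[str, Dict[str, float]],
          attribute_fns: List[Tuple[str, Callable[[float], float]]],
          weights: List[float],
          k: int) -> List[Tuple[float, str]]:
    """Score every object with the user model and return the k best (brute force)."""
    scored = []
    for name, values in objects.items():
        degrees = [fn(values[attr]) for attr, fn in attribute_fns]
        scored.append((weighted_average(degrees, weights), name))
    scored.sort(reverse=True)
    return scored[:k]

hotels = {
    "A": {"price": 200.0, "distance": 2.0},
    "B": {"price": 800.0, "distance": 1.0},
    "C": {"price": 100.0, "distance": 8.0},
}
# This hypothetical user cares twice as much about price as about distance.
user_model = [("price", cheap), ("distance", near)]
print([name for _, name in top_k(hotels, user_model, [2.0, 1.0], k=2)])  # ['A', 'C']
```

Swapping the weights to `[1.0, 2.0]` models a user who prefers proximity, and the top two become `A` and `B`. The same attribute functions serve both users; only the combination function differs.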
Eckhardt Alan, Vojtáš Peter
Uživatelské preference při hledání ve webovských zdrojích [User Preferences in Searching Web Resources]
In: Znalosti 2007, Fakulta elektrotechniky a informatiky, VŠB - Technická univerzita Ostrava, 2007, pp. 179-190.
Presented at: Znalosti 2007, 21.2.-23.2.2007, Ostrava,
Czech Republic.
Eckhardt Alan
Inductive Models of User Preferences for Semantic Web
In: Proceedings of the Dateso 2007, CEUR Workshop Proc., 2007, pp. 103-114.
Presented at: Dateso 2007 Annual International Workshop on DAtabases, TExts, Specifications and Objects, 18.4.-20.4.2007, Desná - Černá Říčka,
Czech Republic.
User preferences have recently become a hot topic. The massive
use of Internet shops and social webs requires user modelling,
which helps users orient themselves on a page. There are many
different approaches to modelling user preferences. In this paper, we
overview the current state of the art in the acquisition of user
preferences and their induction. The main focus is on models of user
preferences and on the induction of these models, but the process of
extracting preferences from user behaviour is also studied. We
also present our contribution to probabilistic user models.
Eckhardt Alan, Pokorný Jaroslav, Vojtáš Peter
Integrating user and group preferences for top-k search from distributed web resources
In: Proc. of DEXA Workshop Decision Support for Structural Health Monitoring and Flexible Query Processing, (Ed. Tjoa A.M., Wagner R.R.), IEEE, 2007, pp. 317-322.
Presented at: DEXA Workshop, 3.-7.9.2007, Regensburg,
Germany.
We discuss models of user and group preferences in social networks and the Semantic Web. We construct a model for user and group preference querying over RDF data, as well as for ordering answers by aggregating particular attribute rankings. We have implemented our methods and heuristics in the Tokaf middleware framework prototype. We also describe experiments with Tokaf.
Eckhardt Alan, Pokorný Jaroslav, Vojtáš Peter
A system recommending top-k objects for multiple users preference
In: Proc. of FUZZ-IEEE 2007 International Conference on Fuzzy Systems, IEEE, 2007, pp. 1101-1106.
Presented at: FUZZ-IEEE 2007, 23.-26.7.2007, London,
UK.
We discuss models of user preferences in the Web environment. We construct a model for user preference querying over a number of data sources and for ordering answers by a combination of particular attribute rankings. We generalize Fagin's algorithm in two directions: we develop some new heuristics for top-k search in the model without random access, and we propose a method of ordering lists of objects by a user's fuzzy function. To enable different user preferences, our system does not require objects to be sorted; instead, we use a B+-tree on each of the attribute domains. This leads to a more realistic model of Web services. We implement our methods and heuristics for the search of top-k answers in the Tokaf middleware framework prototype. We describe experiments with Tokaf and compare different performance measures with some other methods.
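For background on the Fagin-style top-k search that this work generalizes, here is a minimal sketch of the classical Threshold Algorithm of Fagin, Lotem and Naor. It is not Tokaf's method: it uses random access (a dict per list), which the paper's model explicitly avoids, and it assumes every list contains every object, sorted by score descending, with a monotone aggregation function.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

def threshold_algorithm(lists: Sequence[List[Tuple[str, float]]],
                        agg: Callable[[Sequence[float]], float],
                        k: int) -> List[Tuple[float, str]]:
    """Threshold Algorithm (Fagin, Lotem, Naor) for top-k under a monotone agg.

    Assumes every list contains every object, sorted by score descending.
    Sorted access walks the lists in parallel; random access uses a dict per list.
    """
    random_access = [dict(lst) for lst in lists]
    seen: Dict[str, float] = {}  # object -> full aggregated score
    for depth in range(len(lists[0])):
        frontier = []  # last score read from each list by sorted access
        for lst in lists:
            obj, score = lst[depth]
            frontier.append(score)
            if obj not in seen:
                seen[obj] = agg([ra[obj] for ra in random_access])
        threshold = agg(frontier)  # upper bound for any object not yet seen
        top = heapq.nlargest(k, ((s, o) for o, s in seen.items()))
        if len(top) == k and top[-1][0] >= threshold:
            return top  # no unseen object can beat the current k-th best
    return heapq.nlargest(k, ((s, o) for o, s in seen.items()))

# Hypothetical per-attribute rankings (already mapped to fuzzy degrees).
price = [("C", 0.9), ("A", 0.8), ("B", 0.2)]
distance = [("B", 0.9), ("A", 0.8), ("C", 0.2)]
print(threshold_algorithm([price, distance], sum, k=1))  # [(1.6, 'A')]
```

With `k=1` the search stops after two rounds of sorted access, without reading the last row of either list. The early stop is exactly what a no-random-access variant has to approximate with bounds instead of exact aggregated scores.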
Galamboš Leo
Vyhledávání na Webu [Searching the Web]
In: DATAKON 2007, (Ed. Popelínský L., Výborný O.), Masaryk University, 2007, pp. 17-24.
Presented at: DATAKON 2007, 20.10.-23.10.2007, Brno,
Czech Republic.
Galamboš Leo, Lánský Jan, Žemlička M., Chernik K.
Compression of Semistructured Documents
In: International Journal of Information Technology, Volume: 4, No: 1, Elsevier, 2007, pp. 11-17.
EGOTHOR is a search engine that indexes the Web
and allows us to search Web documents. Its hit list contains the URL
and title of each hit, and also a snippet that briefly
shows the match. The snippet can almost always be assembled by an
algorithm that has full knowledge of the original document (mostly
an HTML page). This implies that the search engine must store
the full text of the documents as part of the index.
Such a requirement leads us to choose an appropriate compression
algorithm that reduces the space demand. One solution
could be to use common compression methods, for instance
gzip or bzip2, but it might be preferable to develop a new method
that takes advantage of the document structure, or rather, the
textual character of the documents.
Special text compression algorithms and methods
for compressing XML documents already exist. The aim of this paper is
to integrate the two approaches to achieve an optimal
compression ratio.
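The general-purpose baseline mentioned above (gzip or bzip2) can be measured with Python's standard `gzip` and `bz2` modules. This is only a sketch of how such a baseline ratio might be computed on a hypothetical, highly repetitive sample page; it is not the structure-aware method the paper proposes.

```python
import bz2
import gzip

def compression_ratios(text: str) -> dict:
    """Compressed size divided by original size for two general-purpose compressors."""
    raw = text.encode("utf-8")
    return {
        "gzip": len(gzip.compress(raw)) / len(raw),
        "bzip2": len(bz2.compress(raw)) / len(raw),
    }

# Hypothetical HTML snippet, repeated to mimic a stored full-text page.
page = ("<html><head><title>Example</title></head>"
        "<body><p>Search engines store full text.</p></body></html>\n") * 50
print(compression_ratios(page))
```

Lower values mean better compression. A structure-aware method would be judged against numbers like these on real crawled pages.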
Gurský Peter, Vojtáš Peter
Multikriteriálne vyhľadávanie najlepších objektov s podporou viacerých užívateľov [Multi-criteria Search for the Best Objects with Support for Multiple Users]
In: Znalosti 2007, Fakulta elektrotechniky a informatiky, VŠB - Technická univerzita Ostrava, 2007, pp. 52-62.
Presented at: Znalosti 2007, 21.2.-23.2.2007, Ostrava,
Czech Republic.
Gurský Peter, Horváth T., Jirásek J., Krajči S., Novotný R., Vaneková Veronika, Vojtáš Peter
Web Search with Variable User Model
In: DATAKON 2007, (Ed. Popelínský L., Výborný O.), Masaryk University, 2007, pp. 111-121.
Presented at: DATAKON 2007, 20.10.-23.10.2007, Brno,
Czech Republic.
Húsek Dušan, Pokorný Jaroslav, Řezanková Hana, Snášel Václav
Data clustering: From documents to the Web
In: Web Data Management Practices: Emerging Techniques and Technologies, (Ed. Vakali A., Pallis G.), Idea Group Inc., 2007, pp. 1-33.
The chapter provides a survey of clustering methods relevant to clustering document collections and, consequently, Web data. We start with classical methods of cluster analysis that seem relevant to clustering Web data. Graph clustering is also described, since its methods contribute significantly to clustering Web data. The use of artificial neural networks for clustering has the same motivation. Based on the previously presented material, the core of the chapter provides an overview of approaches to clustering in the Web environment. In particular, we focus on clustering Web search results, where clustering search engines arrange the search results into groups around a common theme. We conclude with some general considerations concerning the justification of so many clustering algorithms and their application in the Web environment.
Kuthan T., Lánský Jan
Genetic Algorithms in Syllable-Based text Compression
In: Proceedings of the Dateso 2007, CEUR Workshop Proc., 2007, pp. 21-34.
Presented at: Dateso 2007 Annual International Workshop on DAtabases, TExts, Specifications and Objects, 18.4.-20.4.2007, Desná - Černá Říčka,
Czech Republic.
Syllable-based text compression is a new approach to compression
by symbols. In this concept, syllables are used as the compression
symbols instead of the more common characters or words. This new
technique has proven itself especially on short to middle-length
text files. The effectiveness of the compression is greatly affected by the
quality of the dictionaries of syllables characteristic of a given language.
These dictionaries are usually created by a straightforward analysis
of text corpora. In this paper we introduce another way of
obtaining these dictionaries, using a genetic algorithm. We believe that
dictionaries built this way may help us lower the compression ratio. We
measure this effect on a set of Czech and English texts.
Lánský Jan, Chernik K., Vlčková Z.
Syllable-Based Burrows-Wheeler Transform
In: Proceedings of the Dateso 2007, CEUR Workshop Proc., 2007, pp. 1-10.
Presented at: Dateso 2007 Annual International Workshop on DAtabases, TExts, Specifications and Objects, 18.4.-20.4.2007, Desná - Černá Říčka,
Czech Republic.
The Burrows-Wheeler Transform (BWT) is a reversible transform used in compression
that reorders an input string into a form that is easier
to compress further. Usually the Move-To-Front transform followed by
Huffman coding is applied to the permuted string. The original method [3]
from 1994 was designed for an alphabet of characters. In 2001, versions
working with word and n-gram alphabets were presented. The newest
version copes with the syllable alphabet [7]. The goal of this article is to
compare BWT compression working with alphabets of letters, syllables,
words, 3-grams and 5-grams.
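The idea of running the BWT over different alphabets can be sketched by treating the input as a sequence of tokens, so that the same code works for letters, syllables, words or n-grams. This naive version sorts all rotations explicitly (quadratic memory, for illustration only) and uses an empty string as a hypothetical end marker, which assumes all real tokens are non-empty strings. Syllable splitting itself is not shown.

```python
from typing import List, Sequence

SENTINEL = ""  # end marker; sorts before every non-empty token

def bwt(tokens: Sequence[str]) -> List[str]:
    """Naive BWT over any token alphabet (characters, syllables, words, n-grams).

    Sorts all rotations explicitly, so it is O(n^2 log n); fine for illustration only.
    Tokens must be non-empty strings so that SENTINEL is unique and smallest.
    """
    s = list(tokens) + [SENTINEL]
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return [rot[-1] for rot in rotations]

def inverse_bwt(last: Sequence[str]) -> List[str]:
    """Invert the BWT using the stable first-column/last-column correspondence."""
    n = len(last)
    # Stable sort: the j-th row of the first column F is last[order[j]].
    order = sorted(range(n), key=lambda i: last[i])
    first = [last[i] for i in order]
    out = []
    row = 0  # row 0 starts with SENTINEL, i.e. it is the rotation "$ t0 t1 ..."
    for _ in range(n - 1):
        row = order[row]  # move to the rotation starting one token later
        out.append(first[row])
    return out

print("".join(t or "$" for t in bwt("banana")))  # annb$aa
words = "to be or not to be".split()
assert inverse_bwt(bwt(words)) == words
```

Only the tokenizer changes between the compared variants; the permuted token sequence would then be passed to Move-To-Front and an entropy coder.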
Lánský Jan, Žemlička M.
Compression of a Set of Strings
In: Proc. of 2007 Data Compression Conference (DCC 2007), IEEE Computer Society Press, 2007, p. 390.
Presented at: DCC 2007 Data Compression Conference, 27.-29.3.2007, Snowbird, Utah,
USA.
Lánský Jan, Chernik K., Vlčková Z.
Comparison of Text Models for BWT
In: Proc. of 2007 Data Compression Conference (DCC 2007), IEEE Computer Society Press, 2007, p. 389.
Presented at: DCC 2007 Data Compression Conference, 27.-29.3.2007, Snowbird, Utah,
USA.
Matousek T., Zavoral Filip
Extracting Zing Models from C Source Code
In: SOFSEM 2007, LNCS 4362, Springer, Berlin, 2007, pp. 900-910.
Presented at: SOFSEM 2007, 20.2.-26.2.2007, Harrachov,
Czech Republic.
In this paper, we propose an approach to the automatic extraction of verification models from C source code. We primarily focus on the representation of pointers and arrays, which are what make extraction from C specific. We provide an implementation of the model extractor as part of our broader effort to develop a verifier of Windows kernel drivers based on the Zing model checker. To demonstrate the feasibility of our approach, we give examples of the extraction results on a practical synchronization problem.
Mlýnková Irena
UserMap - an Enhancing of User-Driven XML-to-Relational Mapping Strategies
Technical Report: 2007/3, Charles University, Prague, 2007, 38 p.
As XML has undoubtedly become a standard for data representation, it is necessary to propose and implement techniques for
the efficient management of XML data. A natural alternative is to exploit the features and functions of (object-)relational database systems, i.e., to rely
on their long theoretical and practical history. The main concern of such
techniques is the choice of an appropriate XML-to-relational mapping
strategy.
In this paper we focus on enhancing user-driven techniques, which
leave the mapping decisions in the hands of users. We propose an algorithm
that exploits the user-given annotations more deeply by searching for the
user-specified "hints" in the rest of the schema and applies an adaptive
method to the remaining schema fragments. We describe the proposed
algorithm, the similarity measure designed for this purpose, a sample implementation of key features of the proposal called UserMap, and the results
of experimental testing on real XML data.
Mlýnková Irena
XML Data in (Object-)Relational Databases
In: Diploma Thesis, Charles University, Prague, 2007, 142 p.
Mlýnková Irena
An XML-to-Relational User-driven Mapping Strategy Based on Similarity and Adaptivity
In: Proc. of SYRCoDIS '07 4th Spring Young Researchers Colloquium on Databases and Information Systems, Volume: 256, CEUR Workshop Proc., 2007, pp. 9-20.
Presented at: SYRCoDIS '07, 31.5.-1.6.2007, Moscow,
Russia.
As XML has become a standard for data representation,
it is necessary to propose and implement
techniques for the efficient management of XML
data. A natural alternative is to exploit the features
and functions of (object-)relational database
systems, i.e., to rely on their long theoretical
and practical history. The main concern of
such techniques is the choice of an appropriate
XML-to-relational mapping strategy.
In this paper we focus on enhancing user-driven
techniques, which leave the mapping decisions
in the hands of users. We propose an algorithm
that exploits the user-given annotations
more deeply by searching for the user-specified
“hints” in the rest of the schema and applies an
adaptive method to the remaining schema fragments.
We describe the algorithm theoretically,
discussing the key ideas of the approach, the chosen
solutions, their reasons, and their consequences.
Finally, we outline the open issues related to
the implementation of the proposed algorithm and
its experimental testing on real XML data.
Mlýnková Irena, Pokorný Jaroslav
Similarity and XML Technologies
In: Proc. of IADIS International Conference WWW/Internet 2007, (Ed. Isaias P., Nunes M.B., Barroso J.), IADIS, 2007, pp. 277-287.
Presented at: WWW/Internet 2007, 5.-8.10.2007, Vila Real,
Portugal.
As XML technologies have undoubtedly become a standard for data representation, efficient implementations of W3C recommendations are needed. One possible optimization of particular types of techniques is the exploitation of similarity of XML data and/or matching of XML patterns. In this paper we provide an overview and classification of such techniques from various points of view. We briefly describe the best-known representatives of particular ideas and discuss their key advantages and disadvantages. The text should serve as a good starting point for proposing an appropriate similarity-based optimization.
Mlýnková Irena, Pokorný Jaroslav
Similarity of XML Schema Fragments Based on XML Data Statistics
In: Proc. of Innovations '07: Proceedings of the 4th International Conference on Innovations in Information Technology, IEEE Computer Society Press, 2007, pp. 243-247.
Presented at: 4th International Conference on Innovations in Information Technology, 18.-20.11.2007, Dubai,
United Arab Emirates.
As XML has become a standard for data representation, it is used in many information technologies. One possible optimization of XML-based approaches is the exploitation of similarity of XML data. In this paper we propose a technique for evaluating the similarity of XML schema fragments, focusing on two often omitted aspects: the structural level of similarity and the tuning of parameters of the similarity measure. In the former case we exploit the results of a statistical analysis of real-world XML data. In the latter case we show that the tuning problem is a kind of constraint optimization problem and can be solved using corresponding approaches. We have analyzed the (dis)advantages of two of them, genetic algorithms and simulated annealing, and in further experiments we show that appropriate tuning produces a more precise similarity measure.
Mlýnková Irena
UserMap - an Exploitation of User-Specified XML-to-Relational Mapping Requirements and Related Problems
Technical Report: 2007/8, Charles University, Prague, 2007, 26 p.
As XML has become a standard for data representation, it is necessary
to propose and implement techniques for the efficient management of XML
data. A natural alternative is to exploit the features of (object-)relational database systems,
i.e., to rely on their long theoretical and practical history. The main concern
of such techniques is the choice of an appropriate XML-to-relational mapping
strategy.
In this paper we focus on enhancing user-driven techniques, which leave the
mapping decisions in the hands of users who specify their requirements using schema
annotations. We describe our prototype implementation called UserMap, which is
able to exploit the annotations more deeply by searching for the user-specified “hints” in
the rest of the schema and applies an adaptive method to the remaining schema
fragments. Using a sample set of supported fixed mapping methods, we discuss
problems related to query evaluation for storage strategies generated by the system,
in particular the correction of the candidate set of annotations and the related query
translation. Finally, we describe the architecture of the whole system.
Nečaský Martin
Conceptual modeling for XML
In: Diploma Thesis, Charles University, Prague, 2007, 153 p.
Nečaský Martin
XSEM - A Conceptual Model for XML
In: Proceedings of the Fourth Asia-Pacific Conference on Conceptual Modelling (APCCM 2007), (Ed. Roddick J. F., Hinze A.), 2007, pp. 37-48.
Presented at: The Fourth Asia-Pacific Conference on Conceptual Modelling (APCCM 2007), 30.1.-2.2.2007, Ballarat, Victoria,
Australia.
We propose a new conceptual model for XML data
called XSEM as a combination of several approaches
in the area of conceptual modeling for XML.
The model divides the conceptual modeling process of
XML data into two levels. On the first level, a designer
designs an overall non-hierarchical conceptual schema
of a domain. On the second level, he or she derives
different hierarchical representations of parts of the
overall conceptual schema using transformation
operators. These hierarchical representations describe
how the data is organized in an XML form.
Nečaský Martin
Using XSEM for Modeling XML Interfaces of Services in SOA
In: Proceedings of the Dateso 2007, CEUR Workshop Proc., 2007, pp. 35-46.
Presented at: Dateso 2007 Annual International Workshop on DAtabases, TExts, Specifications and Objects, 18.4.-20.4.2007, Desná - Černá Říčka,
Czech Republic.
In this paper we briefly describe a new conceptual model for
XML data called XSEM and how to use it for modeling XML interfaces
of services in a service-oriented architecture (SOA). The model is a
combination of several approaches in the area of conceptual modeling of
XML data. It divides the process of conceptual modeling of XML data into
two levels. The first level consists of designing an overall non-hierarchical
conceptual schema of the domain. The second level consists of deriving
different hierarchical representations of parts of the overall conceptual
schema using transformation operators. Each hierarchical representation
models an XML schema describing the structure of the data exchanged
between a service interface and external services.
Nečaský Martin, Pokorný Jaroslav
Extending E-R for Modelling XML Keys
In: Proc. of IEEE ICDIM 2007: Proc. of The Second International Conference on Digital Information Management, IEEE Computer Society, 2007, pp. 236-241.
Presented at: ICDIM 2007: The Second International Conference on Digital Information Management, 28.-31.10.2007, Lyon,
France.
With the growing popularity of XML there is a need not only to describe the structure of XML data but also its semantics. For the conceptual modelling of XML we can use existing conceptual models. However, special features of XML require extensions of these models. In this paper, we study conceptual modelling of XML keys. We extend the notion of E-R keys to be suitable for modelling the semantics of XML keys and we show how to express them on the XML logical level.
Obdržálek David, Benda J.
GFE - Graphical Finite State Machine Editor for Parallel Execution
In: ICEC 2007, (Ed. Ma L., Nakatsu R., Rauterberg M.), LNCS 4740, Springer, IFIP, 2007, pp. 401-406.
Presented at: ICEC 2007 - International Conference on Entertainment Computing, 20.-23.06.2005, Shanghai,
China.
Skopal Tomáš, Hoksza D.
Improving the Performance of M-tree Family by Nearest-Neighbor Graphs
In: Advances in Databases and Information Systems, LNCS 4690, Springer, Berlin, 2007, pp. 172-188.
Presented at: ADBIS 2007, 29.9.-3.10.2007, Varna,
Bulgaria.
The M-tree and its variants have been proven to provide efficient similarity search in database environments. To further improve their performance, in this paper we propose an extension of the M-tree family that makes use of nearest-neighbor (NN) graphs. Each tree node maintains its own NN-graph, a structure that stores for each node entry a reference (and distance) to its nearest neighbor, considering just the entries of the node. The NN-graph can be used to improve the filtering of non-relevant subtrees when searching (or inserting new data). The filtering is based on using “sacrifices”, i.e., selected entries in the node that serve as pivots for all entries that are their reverse nearest neighbors (RNNs). We propose several heuristics for sacrifice selection, a modified insertion, and range and kNN query algorithms. The experiments have shown that the M-tree (and its variants) enhanced by NN-graphs can perform significantly faster, while keeping the construction cheap.
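The filtering idea can be sketched for a single leaf node with 2-D points. This is a simplified illustration under several assumptions: it ignores routing entries and covering radii, and it uses a hypothetical sacrifice-selection rule (every entry that is someone's nearest neighbor) rather than any of the paper's heuristics. A non-sacrifice entry `e` is discarded when the triangle-inequality lower bound |d(q, s) - d(s, e)| already exceeds the query radius, so d(q, e) is never computed.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def build_nn_graph(entries: List[Point]) -> List[Tuple[int, float]]:
    """For each entry: (index, distance) of its nearest neighbor among the node's entries."""
    graph = []
    for i, e in enumerate(entries):
        d, j = min((math.dist(e, f), k) for k, f in enumerate(entries) if k != i)
        graph.append((j, d))
    return graph

def range_query_node(entries: List[Point], graph: List[Tuple[int, float]],
                     q: Point, radius: float) -> Tuple[List[int], int]:
    """Range query within one leaf node using sacrifices as pivots.

    Returns the matching entry indices and the number of distances d(q, .) computed.
    """
    # Hypothetical sacrifice selection: every entry that is someone's nearest neighbor.
    sacrifices = {nn for nn, _ in graph}
    computed: Dict[int, float] = {s: math.dist(q, entries[s]) for s in sacrifices}
    result = []
    for i, e in enumerate(entries):
        if i not in computed:
            nn, d_nn = graph[i]  # i is a reverse nearest neighbor of sacrifice nn
            # Triangle inequality: d(q, e) >= |d(q, nn) - d(nn, e)|.
            if abs(computed[nn] - d_nn) > radius:
                continue  # filtered without computing d(q, e)
            computed[i] = math.dist(q, e)
        if computed[i] <= radius:
            result.append(i)
    return result, len(computed)

node = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.2, 10.0), (10.5, 10.0)]
graph = build_nn_graph(node)
print(range_query_node(node, graph, (0.0, 0.0), 1.0))  # ([0, 1], 4)
```

Here entry 4 is pruned through its nearest neighbor, entry 3, so only four of the five query distances are evaluated. Fewer, better-chosen sacrifices would save more, which is the role of the paper's selection heuristics.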
Skopal Tomáš
Unified Framework for Exact and Approximate Search in Dissimilarity Spaces
In: Transactions on Database Systems (TODS), Volume: 32, No: 4, ACM, 2007, pp. 1-47.
In multimedia systems we usually need to retrieve database (DB) objects based on their similarity
to a query object, while the similarity assessment is provided by a measure which defines a
(dis)similarity score for every pair of DB objects. In most existing applications, the similarity measure
is required to be a metric, where the triangle inequality is utilized to speed up the search
for relevant objects by use of metric access methods (MAMs), for example, the M-tree. Recent
research has shown, however, that nonmetric measures are more appropriate for similarity modeling
due to their robustness and the ease of modeling a made-to-measure similarity. Unfortunately, due to
the lack of triangle inequality, the nonmetric measures cannot be directly utilized by MAMs. From
another point of view, some sophisticated similarity measures could be available in a black-box
nonanalytic form (e.g., as an algorithm or even a hardware device), where no information about
their topological properties is provided, so we have to consider them as nonmetric measures as well.
From yet another point of view, the concept of similarity measuring itself is inherently imprecise
and we often prefer fast but approximate retrieval over an exact but slower one.
To date, the mentioned aspects of similarity retrieval have been solved separately, that is, exact
versus approximate search or metric versus nonmetric search. In this article we introduce a similarity
retrieval framework which incorporates both of the aspects into a single unified model. Based
on the framework, we show that for any dissimilarity measure (either a metric or nonmetric) we
are able to change the “amount” of triangle inequality, and so obtain an approximate or full metric
which can be used for MAM-based retrieval. Due to the varying “amount” of triangle inequality,
the measure is modified in a way suitable for either an exact but slower or an approximate but
faster retrieval. Additionally, we introduce the TriGen algorithm aimed at constructing the desired
modification of any black-box distance automatically, using just a small fraction of the database.
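The idea of changing the "amount" of triangle inequality can be illustrated with a concave fractional-power modifier x -> x**p applied to a nonmetric distance. The sketch below measures the fraction of sampled triplets that violate the triangle inequality and bisects for the largest exponent that brings this error under a tolerance. It is not the TriGen algorithm from the article; the squared Euclidean distance, the four-point sample, and the helper names `triangle_error` and `find_exponent` are all assumptions made for illustration.

```python
import itertools
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]

def sq_euclid(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance: a semimetric that can violate the triangle inequality."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triangle_error(sample: Sequence[Vector], d: Callable[[Vector, Vector], float],
                   p: float) -> float:
    """Fraction of triplets in the sample violating the triangle inequality under x -> x**p."""
    bad = total = 0
    for a, b, c in itertools.combinations(sample, 3):
        x, y, z = sorted((d(a, b) ** p, d(b, c) ** p, d(a, c) ** p))
        total += 1
        if x + y < z:
            bad += 1
    return bad / total

def find_exponent(sample, d, tolerance=0.0, iters=30) -> float:
    """Bisection for the largest p in (0, 1] whose triangle error is <= tolerance.

    Valid because for a power modifier, a triplet that satisfies the triangle
    inequality at some p also satisfies it at every smaller p.
    """
    if triangle_error(sample, d, 1.0) <= tolerance:
        return 1.0  # already metric on the sample (up to the tolerance)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if triangle_error(sample, d, mid) <= tolerance:
            lo = mid
        else:
            hi = mid
    return lo

sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
print(triangle_error(sample, sq_euclid, 1.0))  # 0.5
print(find_exponent(sample, sq_euclid))        # 0.5, i.e. the square root
```

A positive `tolerance` accepts a larger exponent, which keeps more of the original measure's shape and trades exactness for speed. This mirrors the exact-versus-approximate trade-off the abstract describes.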
Vlčková Z., Galamboš Leo
Dynamizace gridu [Dynamization of the Grid]
In: Proceedings of ITAT 2007, Information Technologies - Applications and Theory, (Ed. Vojtáš P.), PONT s.r.o., Seňa, 2007, pp. 115-121.
Presented at: Konferencia o informačných (inteligentných) technológiách - aplikácie a teória 2007, 21.-27.9.2007, Polana,
Slovakia.
Vojtáš Peter
EL description logic with aggregation of user preference concepts
In: Frontiers in Artificial Intelligence and Applications 154, Information Modelling and Knowledge Bases XVIII, IOS Press, Amsterdam, 2007, pp. 154-165.