Dokulil Jiří, Tykal J., Yaghob Jakub, Zavoral Filip
Semantic Web Infrastructure
In: Proc. of the First IEEE International Conference on Semantic Computing, IEEE, 2007, pp. 209-215.
Presented at: ICSC 2007, 17.-19.9.2007, Irvine,
California.
The Semantic Web is not as widespread as its founders expected. This is partly caused by the lack of a standard, working infrastructure for the Semantic Web. We have built a working, portable, stable, high-performance infrastructure for the Semantic Web. This paper focuses on the tasks performed by the infrastructure.
Dokulil Jiří, Tykal J., Yaghob Jakub, Zavoral Filip
Semantic Web Repository and Interfaces
In: Proc. of SEMAPRO (Int. Conf. on Advances in Semantic Processing), IEEE, 2007.
Presented at: SEMAPRO (Int. Conf. on Advances in Semantic Processing), 4.-9.11.2007, Papeete,
French Polynesia (Tahiti).
The Semantic Web is not as widespread as its founders
expected. This is partly caused by the lack of a
standard, working infrastructure for the Semantic
Web. We have built a working, portable, stable,
high-performance infrastructure for the Semantic
Web. This enables various experiments with the Semantic
Web in the real world.
Dokulil Jiří, Tykal J., Yaghob Jakub, Zavoral Filip
Experimental Platform for the Semantic Web
In: Proceedings of ITAT 2007, Information Technologies - Applications and Theory, (Ed. Vojtáš P.), PONT s.r.o., Seňa, 2007, pp. 67-72.
Presented at: Konferencia o informačných (inteligentných) technológiách - aplikácie a teória 2007, 21.-27.9.2007, Polana,
Slovakia.
Eckhardt Alan, Horváth T., Vojtáš Peter
Learning different user profile annotated rules for fuzzy preference top-k querying
In: Scalable Uncertainty Management, Springer, LNAI 4772, Berlin, 2007, pp. 116-130.
Presented at: SUM 2007 International Conference, 10.10.-12.10.2007, Washington,
US.
Uncertainty querying of large data can be solved by providing top-k answers according to a user's fuzzy ranking/scoring function. Usually, different users have different fuzzy scoring functions, that is, different user preference models. The main goal of this paper is to assign a preference model to a user automatically. To achieve this, we decompose the user's fuzzy ranking function into an ordering of particular attributes and a combination function. To solve the problem of automatic assignment of a user model, we design two algorithms: one for learning the user's preference on a particular attribute and a second for learning the combination function. The methods were integrated into a Fagin-like top-k querying system with some new heuristics and tested.
Eckhardt Alan
Inductive Models of User Preferences for Semantic Web
In: Proceedings of the Dateso 2007, CEUR Workshop Proc., 2007, pp. 103-114.
Presented at: Dateso 2007 Annual International Workshop on DAtabases, TExts, Specifications and Objects, 18.4.-20.4.2007, Desná - Černá Říčka,
Czech Republic.
User preferences have recently become a hot topic. The massive
use of internet shops and social websites requires user modelling,
which helps users orient themselves on a page. There are many
different approaches to modelling user preferences. In this paper, we
give an overview of the current state of the art in the acquisition of user
preferences and their induction. The main focus is on models of user
preferences and on the induction of these models, but the process of
extracting preferences from user behaviour is also studied. We
also present our contribution to probabilistic user models.
Eckhardt Alan, Pokorný Jaroslav, Vojtáš Peter
A system recommending top-k objects for multiple users preference
In: Proc. of FUZZ-IEEE 2007 International Conference on Fuzzy Systems, IEEE, 2007, pp. 1101-1106.
Presented at: FUZZ-IEEE 2007, 23.-26.7.2007, London,
UK.
We discuss models of user preferences in a Web environment. We construct a model for user preference querying over a number of data sources and for ordering answers by a combination of particular attribute rankings. We generalize Fagin's algorithm in two directions: we develop some new heuristics for top-k search in the model without random access, and we propose a method of ordering lists of objects by a user fuzzy function. To enable different user preferences, our system does not require objects to be sorted; instead, we use a B+-tree on each of the attribute domains. This leads to a more realistic model of Web services. We implement our methods and heuristics for top-k search in the Tokaf middleware framework prototype. We describe experiments with Tokaf and compare different performance measures with some other methods.
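For orientation, the baseline that this abstract generalizes can be sketched as follows. This is a minimal sketch of the classic threshold algorithm with both sorted and random access, not the Tokaf heuristics (which avoid random access and use B+-trees). The function name `threshold_top_k`, the dict-based attribute lists, and the default of 0.0 for missing scores are assumptions made for illustration; the combination function must be monotone.

```python
import heapq


def threshold_top_k(lists, k, combine=sum):
    """Classic threshold algorithm (illustrative sketch).

    lists   -- one dict {object: score} per attribute (hypothetical layout)
    k       -- number of answers wanted
    combine -- monotone combination function over per-attribute scores
    """
    # Sorted access: each attribute list ordered from best to worst score.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = {}  # object -> combined score
    max_depth = max((len(sl) for sl in sorted_lists), default=0)

    for depth in range(max_depth):
        last_scores = []
        for sl in sorted_lists:
            if depth < len(sl):
                obj, score = sl[depth]
                last_scores.append(score)
                if obj not in seen:
                    # Random access: fetch the object's score in every list.
                    seen[obj] = combine([l.get(obj, 0.0) for l in lists])
            else:
                last_scores.append(0.0)  # exhausted list (assumed minimum)
        # No unseen object can score higher than this threshold.
        threshold = combine(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    price_rank = {"a": 0.9, "b": 0.8, "c": 0.1}
    quality_rank = {"a": 0.2, "b": 0.7, "c": 0.95}
    print(threshold_top_k([price_rank, quality_rank], k=1))
```

The early stop is the point of the technique: the scan ends as soon as the k-th best combined score reaches the threshold built from the scores last seen under sorted access.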
Eckhardt Alan, Horváth T., Vojtáš Peter
PHASES: A User Profile Learning Approach for Web Search
In: Web Intelligence, IEEE Computer Society, Los Alamitos, 2007, pp. 780-783.
Presented at: WI 2007. IEEE/WIC/ACM International Conference on Web Intelligence, 2.11.-5.11.2007, Silicon Valley,
US.
Web search heuristics based on Fagin's threshold
algorithm assume that the user profile is available in the form
of a particular attribute ordering and a fuzzy
aggregation function representing the user's combining
function. Given these, adequate algorithms exist
for searching top-k answers. Finding the particular
attribute ordering and the aggregation for a user still
remains a problem. In this short paper, our main
contribution is a proof of concept of a new iterative
process for acquiring user preferences and attribute
ordering.
Galamboš Leo, Lánský Jan, Žemlička M., Chernik K.
Compression of Semistructured Documents
In: International Journal of Information Technology, Volume: 4, No: 1, Elsevier, 2007, pp. 11-17.
EGOTHOR is a search engine that indexes the Web
and allows us to search Web documents. Its hit list contains the URL
and title of each hit, and also a snippet which tries to briefly
show the match. The snippet can almost always be assembled by an
algorithm that has full knowledge of the original document (mostly
an HTML page). This implies that the search engine is required to store
the full text of the documents as a part of the index.
Such a requirement leads us to choose an appropriate compression
algorithm that would reduce the space demand. One solution
could be to use common compression methods, for instance
gzip or bzip2, but it might be preferable to develop a new method
that takes advantage of the document structure, or rather, the
textual character of the documents.
Special text compression algorithms and methods
for the compression of XML documents already exist. The aim of this paper is
to integrate the two approaches to achieve an optimal
compression ratio.
Kuthan T., Lánský Jan
Genetic Algorithms in Syllable-Based text Compression
In: Proceedings of the Dateso 2007, CEUR Workshop Proc., 2007, pp. 21-34.
Presented at: Dateso 2007 Annual International Workshop on DAtabases, TExts, Specifications and Objects, 18.4.-20.4.2007, Desná - Černá Říčka,
Czech Republic.
Syllable-based text compression is a new approach to compression
by symbols. In this concept, syllables are used as the compression
symbols instead of the more common characters or words. This new
technique has proven itself especially on short to medium-length
text files. The effectiveness of the compression is greatly affected by the
quality of the dictionaries of syllables characteristic of a given language.
These dictionaries are usually created by a straightforward analysis
of text corpora. In this paper we introduce another way of
obtaining these dictionaries, using a genetic algorithm. We believe that
dictionaries built this way may help us lower the compression ratio. We
measure this effect on a set of Czech and English texts.
Lánský Jan, Chernik K., Vlčková Z.
Syllable-Based Burrows-Wheeler Transform
In: Proceedings of the Dateso 2007, CEUR Workshop Proc., 2007, pp. 1-10.
Presented at: Dateso 2007 Annual International Workshop on DAtabases, TExts, Specifications and Objects, 18.4.-20.4.2007, Desná - Černá Říčka,
Czech Republic.
The Burrows-Wheeler Transform (BWT) is a compression
method which reorders an input string into a form that is better suited
to subsequent compression. Usually the Move-To-Front transform and then
Huffman coding are applied to the permuted string. The original method [3]
from 1994 was designed for an alphabet of letters. In 2001, versions
working with word and n-gram alphabets were presented. The newest
version copes with the syllable alphabet [7]. The goal of this article is to
compare BWT compression working with alphabets of letters, syllables,
words, 3-grams and 5-grams.
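The pipeline named in this abstract can be illustrated with a small sketch that works over any sequence of symbols, so the same code runs on letters or on words. It is a naive illustration under stated assumptions, not the authors' implementation: rotations are sorted directly (quadratic memory), the function names are hypothetical, and syllable splitting and Huffman coding are omitted.

```python
def bwt(symbols):
    """Naive Burrows-Wheeler transform over a sequence of symbols.

    Returns the last column of the sorted rotation matrix and the row
    index of the original sequence (needed for inversion).
    """
    s = list(symbols)
    n = len(s)
    if n == 0:
        return [], 0
    # Sort rotation start positions by the rotation they denote.
    rotations = sorted(range(n), key=lambda i: s[i:] + s[:i])
    last = [s[(i - 1) % n] for i in rotations]
    return last, rotations.index(0)


def inverse_bwt(last, index):
    """Invert bwt() using a stable sort of the last column."""
    n = len(last)
    order = sorted(range(n), key=lambda i: last[i])  # stable in Python
    out, row = [], index
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return out


def move_to_front(seq):
    """Move-To-Front: recently seen symbols get small indices."""
    alphabet = sorted(set(seq))
    out = []
    for sym in seq:
        idx = alphabet.index(sym)
        out.append(idx)
        alphabet.insert(0, alphabet.pop(idx))
    return out


if __name__ == "__main__":
    # Letter alphabet.
    last, idx = bwt("banana")
    print("".join(last), idx, move_to_front(last))
    # Word alphabet: the same code, different symbols.
    words = "to be or not to be".split()
    last_w, idx_w = bwt(words)
    print(last_w, idx_w, inverse_bwt(last_w, idx_w) == words)
```

Switching the alphabet only changes how the input is tokenized; the transform itself is unchanged, which is what makes a comparison across letters, syllables, words and n-grams possible.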
Lánský Jan, Žemlička M.
Compression of a Set of Strings
In: Proc. of 2007 Data Compression Conference (DCC 2007), IEEE Computer Society Press, 2007, pp. 390-390.
Presented at: DCC 2007 Data Compression Conference, 27.-29.3.2007, Snowbird, Utah,
USA.
Lánský Jan, Chernik K., Vlčková Z.
Comparison of Text Models for BWT
In: Proc. of 2007 Data Compression Conference (DCC 2007), IEEE Computer Society Press, 2007, pp. 389-389.
Presented at: DCC 2007 Data Compression Conference, 27.-29.3.2007, Snowbird, Utah,
USA.
Linková Zdeňka, Nedbal Radim
Ontology approach to integration of geographical data
In: WETDAP 2007, Proceedings of the 1st Workshop Evolutionary Techniques in Data-processing, In Conjunction with Znalosti (Knowledge) 2007, Faculty of Electrical Engineering and Computer Science, VŠB - Technical University of Ostrava, Ostrava, 2007, pp. 35-41.
Presented at: Workshop Evolutionary Techniques in Data-processing, Associated with ZNALOSTI 2007 conference, 21.-23.2.2007, Ostrava,
Czech Republic.
A key point in modern automated data processing is the representation of metadata semantics. Employing existing Semantic Web features, namely ontologies, is a promising option. Ontologies open a novel approach to knowledge representation.
The paper presents a GIS (Geographic Information System) domain application illustrating an ontological approach to data integration and data
processing automation in a specific system. This system, VirGIS, is an integration system that works with spatio-temporal data. We start our
study by developing a data representation based on common Semantic Web techniques and build a VirGIS ontology.
Linková Zdeňka
Ontology-Based Schema Integration
In: Proceedings of SOFSEM 2007, ICS AS CR, Prague, 2007, pp. 71-80.
Presented at: SOFSEM 2007, 20.2.-26.2.2007, Harrachov,
Czech Republic.
Data integration usually provides a unified global view over
several data sources. A crucial part of the task is establishing the
connection between the global view and the local sources. For this purpose, two basic mapping approaches have been proposed: GAV (Global As View)
and LAV (Local As View). On the Semantic Web, an ontological approach
can also be considered.
In this paper, data integration is solved using ontologies of the sources. To
express relationships between the global view and local source schemas,
an ontology for the integration system is built. Thus, a schema integration task is transformed into an ontology merging task.
Linková Zdeňka
Schema Matching in the Semantic Web Environment
In: Doktorandský den 07, (Ed. F. Hakl), MATFYZPRESS, 2007, pp. 36-42.
Presented at: Doktorandské dny 2007, 17.-19.9.2007, Malá Úpa,
Czech Republic.
The paper deals with one step of non-materialized data integration: the schema matching task. It works with data
sources on the Semantic Web; the crucial assumption for the considered task is the availability of ontologies describing the data
to be integrated. Source ontologies are used to find correspondences between elements of the source schemas. For this purpose,
techniques known from the fields of ontology alignment and ontology merging are also used.
Linková Zdeňka
Mapování schémat v prostředí Sémantického webu
In: Doktorandské dny na KM FJFI 07, 2007, pp. 117-126.
ISBN: 978-80-01-03913-7
The article deals with the tasks that have to be solved in non-materialized
data integration. It focuses on finding correspondences between schemas and on
schema mapping. The proposed approach to solving these tasks on the Semantic
Web benefits from available ontologies describing the integrated sources.
Ontologies are used both for finding the mappings and for describing
them.
Mlýnková Irena
UserMap - an Enhancing of User-Driven XML-to-Relational Mapping Strategies
Technical Report: 2007/3, Charles University, Prague, 2007, 38 p.
As XML has undoubtedly become a standard for data representation, it is necessary to propose and implement techniques for
the efficient management of XML data. A natural alternative is to exploit the features and functions of (object-)relational database systems, i.e. to rely
on their long theoretical and practical history. The main concern of such
techniques is the choice of an appropriate XML-to-relational mapping
strategy.
In this paper we focus on enhancing user-driven techniques, which
leave the mapping decisions in the hands of users. We propose an algorithm
which exploits the user-given annotations more deeply by searching for the
user-specified "hints" in the rest of the schema, and which applies an adaptive
method to the remaining schema fragments. We describe the proposed
algorithm, the similarity measure designed for this purpose, a sample implementation of the key features of the proposal, called UserMap, and the results
of experimental testing on real XML data.
Mlýnková Irena
XML Data in (Object-)Relational Databases
In: Diploma Thesis, Charles University, Prague, 2007, pp. 142.
Mlýnková Irena
UserMap - an Exploitation of User-Specified XML-to-Relational Mapping Requirements and Related Problems
Technical Report: 2007/8, Charles University, Prague, 2007, 26 p.
As XML has become a standard for data representation, it is necessary
to propose and implement techniques for the efficient management of XML
data. A natural alternative is to exploit features of (object-)relational database systems,
i.e. to rely on their long theoretical and practical history. The main concern
of such techniques is the choice of an appropriate XML-to-relational mapping
strategy.
In this paper we focus on enhancing user-driven techniques, which leave the
mapping decisions in the hands of users, who specify their requirements using schema
annotations. We describe our prototype implementation, called UserMap, which is
able to exploit the annotations more deeply by searching for the user-specified "hints" in
the rest of the schema and which applies an adaptive method to the remaining schema
fragments. Using a sample set of supported fixed mapping methods, we discuss
problems related to query evaluation for storage strategies generated by the system,
in particular correction of the candidate set of annotations and the related query
translation. Finally, we describe the architecture of the whole system.
Nečaský Martin
Conceptual modeling for XML
In: Diploma Thesis, Charles University, Prague, 2007, 153 p.
Nečaský Martin
XSEM - A Conceptual Model for XML
In: Proceedings of the Fourth Asia-Pacific Conference on Conceptual Modelling (APCCM 2007), (Ed. Roddick J. F., Annika H.), 2007, pp. 37-48.
Presented at: The Fourth Asia-Pacific Conference on Conceptual Modelling (APCCM 2007), 30.1.-2.2.2007, Ballarat, Victoria,
Australia.
We propose a new conceptual model for XML data
called XSEM as a combination of several approaches
in the area of conceptual modeling for XML.
The model divides the conceptual modeling process of
XML data into two levels. On the first level, a designer
designs an overall non-hierarchical conceptual schema
of a domain. On the second level, he or she derives
different hierarchical representations of parts of the
overall conceptual schema using transformation operators.
These hierarchical representations describe
how the data is organized in an XML form.
Nečaský Martin
Using XSEM for Modeling XML Interfaces of Services in SOA
In: Proceedings of the Dateso 2007, CEUR Workshop Proc., 2007, pp. 35-46.
Presented at: Dateso 2007 Annual International Workshop on DAtabases, TExts, Specifications and Objects, 18.4.-20.4.2007, Desná - Černá Říčka,
Czech Republic.
In this paper we briefly describe a new conceptual model for
XML data called XSEM and how to use it for modeling XML interfaces
of services in service oriented architecture (SOA). The model is a
combination of several approaches in the area of conceptual modeling of
XML data. It divides the process of conceptual modeling of XML data into
two levels. The first level consists of designing an overall non-hierarchical
conceptual schema of the domain. The second level consists of deriving
different hierarchical representations of parts of the overall conceptual
schema using transformation operators. Each hierarchical representation
models an XML schema describing the structure of the data exchanged
between a service interface and external services.
Nedbal Radim
Various Kinds of Preferences in Database Queries
In: Doktorandský den 07, (Ed. F. Hakl), MATFYZPRESS, 2007, pp. 49-59.
Presented at: Doktorandské dny 2007, 17.-19.9.2007, Malá Úpa,
Czech Republic.
The paper summarizes recent advances in the
field of logic of preference and presents their
application in the field of database queries.
Namely, non-monotonic reasoning mechanisms
including various kinds of preferences are reviewed,
and a way of adapting them to practical
database applications is shown: reasoning including
sixteen strict and non-strict kinds of preferences,
including ceteris paribus preferences,
is feasible. However, to make the mechanisms
useful for practical applications, the assumption
of preference specification consistency
has to be relinquished. This is achieved in two
steps: firstly, all the kinds of preferences are defined
so that some uncertainty is inherent, and
secondly, a partial pre-order rather than a total
pre-order is used in the semantics, which
makes it possible to indicate some kind of conflict among
preferences. Most importantly, the semantics of
a set of preferences is related to that of a disjunctive
logic program.
Nedbal Radim
Algebraic Optimization of Database Queries with Preferences
In: Doktorandské dny na KM FJFI 07, 2007, pp. 157-167.
ISBN: 978-80-01-03913-7
The paper summarizes a logical framework for formulating preferences and proposes
their embedding into relational algebra through a single preference operator parameterized by
a set of user preferences of sixteen various kinds, including ceteris paribus preferences, and
returning only the most preferred subsets of its argument relation. Most importantly, a conflicting
set of preferences is permitted and preferences between sets of elements can be expressed.
A formal foundation for algebraic optimization, applying heuristics such as push preference, is also
provided: abstract properties of the preference operator and a variety of algebraic laws
describing its interaction with other relational algebra operators are presented.
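The general idea of a preference operator over a relation can be illustrated with a deliberately simplified sketch. It is not the operator defined in the paper, which handles sixteen kinds of preferences, conflicts, and preferences between sets. Instead it shows the simpler, well-known "winnow" style of selection, where a tuple is returned if no other tuple is strictly preferred to it. The names `winnow` and `prefers`, and the sample data, are hypothetical.

```python
def winnow(relation, prefers):
    """Return the tuples of `relation` that no other tuple dominates.

    relation -- list of tuples (here: dicts), the argument relation
    prefers  -- prefers(u, t) is True when u is strictly preferred to t
    """
    return [t for t in relation
            if not any(prefers(u, t) for u in relation)]


if __name__ == "__main__":
    cars = [
        {"model": "A", "price": 10000, "year": 2005},
        {"model": "B", "price": 12000, "year": 2007},
        {"model": "C", "price": 13000, "year": 2006},
    ]

    # Hypothetical preference: cheaper and not older, or newer and not
    # more expensive (strict Pareto dominance on price and year).
    def prefers(u, t):
        no_worse = u["price"] <= t["price"] and u["year"] >= t["year"]
        better = u["price"] < t["price"] or u["year"] > t["year"]
        return no_worse and better

    print([c["model"] for c in winnow(cars, prefers)])
```

A "push preference" style of optimization would correspond to applying such an operator before other relational operators where the algebraic laws allow it, so that later operators work on fewer tuples.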
Řimnáč Martin
Advanced Features of Attribute Annotated Data Sets
In: WETDAP 2007, Proceedings of the 1st Workshop Evolutionary Techniques in Data-processing, In Conjunction with Znalosti (Knowledge) 2007, Faculty of Electrical Engineering and Computer Science, VŠB - Technical University of Ostrava, Ostrava, 2007, pp. 54-59.
Presented at: Workshop Evolutionary Techniques in Data-processing, Associated with ZNALOSTI 2007 conference, 21.-23.2.2007, Ostrava,
Czech Republic.
The paper compares features of the learning and querying process
when values in the input data set are annotated by
attributes and when this information is not available. The attribute annotation
makes it possible to consider global relationships, which are useful for expressing the
data semantics in an explicit way. It is shown that data can be accessed
with no semantic interpretation and that, after the evaluation process,
the result can then be interpreted.
Řimnáč Martin
Minimalising Binary Predicate Knowledge Base using Transitivity Rule in Incremental Algorithm
Presented as an invited talk: 22nd European Conference on Operational Research EURO 2007, 8.-11.7.2007, Prague,
Czech Republic.
Machine learning methods can be seen as an optimization task reducing the differences
between an expected and a returned result on a given data set. A corresponding
knowledge base can be expressed in many ways, for example, by a binary predicate
formalism.
The talk deals with minimizing the number of predicates in such a repository,
which is made possible by transitivity. The transitive reduction algorithm is
given in detail for incremental (attribute annotated data driven) building of a
knowledge base; a base model with higher expressiveness is preferred.
Finally, the effect of the selected model on the estimated explicit semantic definitions
of symbols (internal base interpretation) is mentioned as well.
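The core idea, dropping every binary fact that transitivity can re-derive from the others, can be sketched as below. This is a batch sketch that assumes the relation is acyclic, in which case the transitive reduction is unique. It is not the incremental algorithm presented in the talk, and the function name and the pair-based fact representation are assumptions made for illustration.

```python
def transitive_reduction(facts):
    """Remove facts (a, b) that follow from the remaining ones by transitivity.

    facts -- set of (a, b) pairs of an acyclic binary relation (assumption)
    """
    successors = {}
    for a, b in facts:
        successors.setdefault(a, set()).add(b)

    def reachable_without(start, goal, skipped_edge):
        """Depth-first search for a path start -> goal avoiding one edge."""
        stack, visited = [start], set()
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if (node, nxt) == skipped_edge or nxt in visited:
                    continue
                if nxt == goal:
                    return True
                visited.add(nxt)
                stack.append(nxt)
        return False

    # A fact is redundant when another path already connects its endpoints.
    return {(a, b) for (a, b) in facts
            if not reachable_without(a, b, (a, b))}


if __name__ == "__main__":
    kb = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("a", "d")}
    print(sorted(transitive_reduction(kb)))
```

The reduced set has the same transitive closure as the input, so queries answered through transitivity return the same results from a smaller repository.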
Řimnáč Martin
Redukce datových modelů
In: Doktorandský den 07, (Ed. F. Hakl), MATFYZPRESS, 2007, pp. 80-86.
Presented at: Doktorandské dny 2007, 17.-19.9.2007, Malá Úpa,
Czech Republic.
The contribution deals with aspects of optimizing the memory requirements of a binary repository of attribute-annotated data
on the basis of the transitive reduction of a generalized system of functional dependencies. This system can either be given in advance
by a model, in which case it turns out that the optimization can be applied once; or the model
is estimated incrementally, in which case it proves suitable to only adjust the already optimized repository,
again in an incremental way. In the last section, the contribution analyses the ambiguity
of the result, including a detailed analysis of the properties of the basic configurations of model parts that cause this ambiguity.
Last but not least, the complexity of the individual repository operations is analysed.
Řimnáč Martin, Linková Zdeňka
Automatizovaný návrh pravidel pro integraci dat
Řimnáč Martin, Špánek Roman, Linková Zdeňka
Sémantický web: vize globálního úložiště dat?
In: DATAKON 2007, (Ed. Popelínský L., Výborný O.), Masaryk university, 2007, pp. 176-186.
Presented at: DATAKON 2007, 20.10.-23.10.2007, Brno,
Czech Republic.
The aim of the contribution is to present a vision of new approaches to sharing and searching data on the Internet. It builds on proven technologies working over textual web documents and connects them with the Semantic Web, a modern means of data exchange, and with current trends in the development of the Internet as a whole.
Řimnáč Martin, Špánek Roman, Linková Zdeňka
Semantic Web: Vision of Distributed and Trusted Data Environment?
In: WWM 2007, 2007, pp. 627-634.
Presented at: WWM 2007, 1st International Web X.0 and Web Mining Workshop, held in collocation with ICDIM 2007, 28.10.-31.10.2007, Lyon,
France.
The vision of the semantic web as a distributed and
trusted environment for data sharing is presented, together with related
issues. The paper introduces a basic binary
matrix formalism for the internal representation of sources
and shows classical issues such as data inconsistency and
data integration. Aspects of these issues lead to the binary
formalism being generalised into one based on the <0,1> interval, to
enable the consideration of uncertainty at various levels.
Finally, the need for a definition of source trust is presented
and discussed with respect to the semantic web.
Špánek Roman
Maintaining Trust in Large Scale Environments
In: Doktorandský den 07, (Ed. F. Hakl), MATFYZPRESS, 2007, pp. 94-102.
Presented at: Doktorandské dny 2007, 17.-19.9.2007, Malá Úpa,
Czech Republic.
Špánek Roman
Supporting Secure Communication in Distributed Environments
Špánek Roman
Reputation System for Large Scale Environments
In: WWM 2007, 2007, pp. 621-626.
Presented at: WWM 2007, 1st International Web X.0 and Web Mining Workshop, held in collocation with ICDIM 2007, 28.10.-31.10.2007, Lyon,
France.
The paper describes a new approach to treating trust in
reconfigurable groups of users, with special emphasis on trust
in the next generations of the Internet. The proposed model
uses properties of weighted hypergraphs. Model flexibility
enables description of relations between nodes such that
these relations are preserved under frequent changes. The
ideas can be straightforwardly generalized to other concepts
describable by weighted hypergraphs. The consistency
of the proposal was verified in a couple of experiments
with our pilot implementation SecGRID.
Špánek Roman, Pirkl Pavel, Kovář P.
The Blue Game Project: Ad-hoc Multiplayer Mobile Game with Social Dimension
In: CoNEXT 2007, New York, 2007.
Presented at: 3rd Annual CoNEXT Conference, 10.-13.12.2007, New York,
USA.
The paper presents the BlueGame project, an ad-hoc multiplayer
mobile game based on the Dungeons&Dragons board
game. The main idea lies in the adoption of the Bluetooth Piconet
configuration and direct face-to-face contact of players
in real environments.
Tyl Pavel
Problematika integrace ontologií
In: Doktorandský den 07, (Ed. F. Hakl), MATFYZPRESS, 2007, pp. 110-115.
Presented at: Doktorandské dny 2007, 17.-19.9.2007, Malá Úpa,
Czech Republic.
The Internet is a huge source of interlinked but mostly unorganized data. The Semantic Web, as an extension
of the current web, tries to address this lack of organization, not only directly for the human user but especially
with regard to the machine processing of information. The aim is to supplement data with metadata that should be understandable
to both humans and computers. These metadata are most often expressed by means of ontologies, which are one
of the basic building blocks of the Semantic Web. In this contribution I try to outline some of the possibilities of integrating
(merging) ontologies for the purpose of information sharing.
Wiedermann Jiří, Petrů Lukáš
On the Universal Computing Power of Amorphous Computing Systems
Technical Report: V-1009, ICS AS CR, Prague, 2007, 11 p.
Amorphous computing differs from classical ideas about computation in almost every aspect. The
architecture of amorphous computers is random, since they consist of a plethora of identical computational
units spread randomly over a given area. Within a limited radius the units can communicate wirelessly
with their neighbors via a single-channel radio. We consider a model whose assumptions on the underlying
computing and communication abilities are among the weakest possible: all computational units are finite
state probabilistic automata working asynchronously, there is no broadcasting collision detection mechanism
and no network addresses. We show that under reasonable probabilistic assumptions such amorphous
computing systems can possess universal computing power with a high probability. The underlying theory
makes use of properties of random graphs and of the probabilistic analysis of algorithms. To the best of
our knowledge this is the first result showing the universality of such computing systems.