Húsek Dušan, Pokorný Jaroslav, Řezanková Hana, Snášel Václav
Data clustering: From documents to the Web
Chapter 1 in the book: Web Data Management Practices: Emerging Techniques and Technologies, (Eds. Vakali A., Pallis G.), Idea Group Inc., 2007, pp. 1-33.
The chapter surveys clustering methods relevant to clustering document collections and, consequently, Web data. We start with classical methods of cluster analysis that appear relevant to clustering Web data. Graph clustering is also described, since its methods contribute significantly to clustering Web data. The use of artificial neural networks for clustering has the same motivation. Building on this material, the core of the chapter provides an overview of approaches to clustering in the Web environment. In particular, we focus on clustering Web search results, in which clustering search engines arrange the search results into groups around a common theme. We conclude with some general considerations concerning the justification of so many clustering algorithms and their application in the Web environment.
Pokorný Jaroslav, Reschke J.
Exporting relational data into a native XML store
Chapter in the book: Advances in Information Systems Development - Bridging the Gap between Academia and Industry. Vol. 2, Edited by A.G. Nilsson et al., Springer Verlag, 2006, pp. 807-818 (ISBN 0-387-30834-2)
Vojtáš Peter
Fuzzy Logic Aggregation for Semantic Web Search for the Best Answer
In the book: Fuzzy Logic and the Semantic Web. (Ed. Sanchez E.), Capturing Intelligence Series, 1, Elsevier, 2006, pp. 341-359 ISBN: 0-444-51948-3
Pokorný Jaroslav
Database Architectures: Current Trends and Their Relationships to Requirements of Practice
In: Proceedings of ISD'06 Conference, 31.8.-2.9.2006, Budapest, Hungary; to appear in: Information Systems Development Series, Springer Verlag, 2006.
Galamboš Leo
Dynamic Inverted Index Maintenance
In: International Journal of Computer Science, Volume 1, Number 2, 2006, pp. 157-162.
Pokorný Jaroslav
Database architectures: current trends and their relationships to environmental data management
In: Environmental Modelling & Software, Vol. 21, Issue 11, pp. 1579-1586, Elsevier Science, 2006. (ISSN: 1364-8152)
Snášel Václav, Moravec P., Pokorný Jaroslav
Using BFA with WordNet Based Model for Web Retrieval
In: Journal of Digital Information Management, Vol. 4, No. 2, June 2006, pp. 107-111.
Bustos B., Skopal Tomáš
Dynamic Similarity Search in Multi-Metric Spaces
Accepted for: ACM MIR 2006 (a workshop at ACM Multimedia 2006), Santa Barbara, CA, USA
Bednárek David
Turingovské vzory v XSLT programech [Turing Patterns in XSLT Programs]
Accepted for: ITAT, 26.9.–1.10.2006, Chata Kosodrevina, Bystrá dolina, Nízke Tatry, Slovakia, 2006.
Lánský J., Galamboš Leo, Chernik K.
Komprese webového uložiště [Web Repository Compression]
Accepted for: ITAT, 26.9.–1.10.2006, Chata Kosodrevina, Bystrá dolina, Nízke Tatry, Slovakia, 2006.
Yaghob Jakub, Zavoral Filip
Budování infrastruktury sémantického webu [Building the Semantic Web Infrastructure]
Accepted for: ITAT, 26.9.–1.10.2006, Chata Kosodrevina, Bystrá dolina, Nízke Tatry, Slovakia, 2006.
Nečaský Martin
XSEM – A Conceptual Model for XML Data
In: Proceedings of Communications and Doctoral Consortium, 7th International Baltic Conference on Databases and Information Systems, Vilnius, 2006, pp. 328-331.
XML is now the standard format for exchanging data between information systems and is also frequently applied as a logical database model. If we use XML as a logical database model, we need a conceptual model to describe its semantics. In this paper, we describe our work on a new conceptual model for XML called XSEM, created as a combination of several approaches applied in the area of conceptual modeling for XML.
Toman Kamil, Mlýnková Irena
XML Data - The Current State of Affairs
In: Proceedings of XML Prague 2006 conference, 17.6.-18.6.2006, Prague, Czech Republic, ITI Series, MFF UK, June 2006, pp. 87-102.
At present the eXtensible Markup Language (XML) is used in almost all spheres of human activity. Its popularity results especially from the fact that it is a self-descriptive metaformat whose structure can be defined using other powerful tools such as DTD or XML Schema. Consequently, we can witness a massive boom of techniques for managing, querying, updating, exchanging, or compressing XML data.
On the other hand, for the majority of XML processing techniques we can find various weak spots that worsen their time or space efficiency. Probably the main reason is that most of them consider XML data too globally, involving all their possible features, though the real data are often much simpler. If they do restrict the input data, the restrictions are often unnatural.
In this contribution we discuss the level of complexity of real XML collections and their schemes, which turns out to be surprisingly low. We include and compare results and findings of existing papers on similar topics as well as our own analysis, and we try to find the reasons for these tendencies and their consequences.
Pokorný Jaroslav
Databázové architektury: současné trendy a jejich vztah k novým požadavkům praxe [Database Architectures: Current Trends and Their Relationship to New Requirements of Practice]
In: Proceedings of the 20th annual conference Moderní databáze, 30.5.-31.5.2006, Hotel Legner, Zvánovice, KOMIX, pp. 5-14 (ISBN 80-239-7109-3)
Nečaský Martin
Conceptual Modeling for XML: A Survey
In: Snášel, V., Richta, K., and Pokorný, J.: Proceedings of the Dateso 2006 Annual International Workshop on DAtabases, TExts, Specifications and Objects, 26.4.-28.4.2006, Desná - Černá Říčka, Czech Republic, CEUR-WS, Vol. 176, pp. 40-53.
XML is now the standard format for exchanging data between information systems and is also frequently applied as a logical database model. If we use XML as a logical database model, we need a conceptual model to describe its semantics. However, XML as a logical database model has some special characteristics which make existing conceptual models such as E-R or UML unsuitable. In this paper, the current approaches to the conceptual modeling of XML data are described in a uniform style. A list of requirements for XML conceptual models is presented, and the described approaches are compared on the basis of these requirements.
Skopal Tomáš
On Fast Non-Metric Similarity Search by Metric Access Methods
In: Proceedings of 10th International Conference on Extending Database Technology (EDBT 2006), 26.3.-31.3.2006, Munich, Germany, Eds. Y. Ioannidis et al., 2006, pp. 718-736 (ISBN: 3-540-32960-9)
The retrieval of objects from a multimedia database employs a measure which defines a similarity score for every pair of objects. The measure should effectively follow the nature of similarity; hence, it should not be limited by the triangular inequality, regarded as a restriction in similarity modeling. On the other hand, the retrieval should be as efficient (or fast) as possible. The measure is thus often restricted to a metric, because then the search can be handled by metric access methods (MAMs). In this paper we propose a general method of non-metric search by MAMs. We show that the triangular inequality can be enforced for any semimetric (reflexive, non-negative and symmetric measure), resulting in a metric that preserves the original similarity orderings (retrieval effectiveness). We propose the TriGen algorithm for turning any black-box semimetric into an (approximated) metric, using only the distance distribution in a fraction of the database. The algorithm finds a metric for which the retrieval efficiency is maximized, considering any MAM.
Vojtáš Peter
Model Theoretic and Fixpoint Semantics for Preference Queries over Imperfect Data
In: Proceedings of Inconsistency and Incompleteness in Databases, International Workshop Collocated with the 10th International Conference on Extending Database Technology, 26.3.2006, Munich, Germany, Jan Chomicki, Jef Wijsen (Eds.), 2006, pp. 87-91.
We present an overview of our results on model theoretic and fixpoint semantics for a relational algebra using a model of many-valued Datalog with similarity. Using our previous results on the equivalence of our model and a certain variant of generalized annotated programs, we base our querying on fuzzy aggregation operators (also called annotation terms, combining functions, utility functions). The use of fuzzy aggregation operators (distinct from database aggregations) enables us to reduce the tuning of various linguistic variables. In practice we can learn fuzzy aggregation operators by an ILP procedure for every user profile. Our approach also enables integration of data from different sources via aggregation and similarity. When extending domains, we discuss the difference between fuzzy elements and fuzzy subsets. We also discuss an alternative in which all extensional data are stored crisp and fuzziness lies in the rules interpreting data, in the context, and in the user query.
Obdržálek David, Kulhánek Jiří
Generating and handling of differential data in DataPile-oriented systems
In: Proceedings of the IASTED International Conference on Databases and Applications (DBA 2006) as part of the 24th IASTED International Multi-Conference on Applied Informatics, 13.2.-15.2.2006, Innsbruck, Austria, Ed. M. H. Hamza, 2006. (ISBN: 0-88986-560-4)
The basics of the DataPile structure for data handling systems have been theoretically designed and published. During implementation of such a system, numerous problems arose which were not addressed during the theoretical design phase. In a real production environment, the applications connected to the DataPile core need special treatment and place important requirements on the data synchronization process. This article is concerned with generating the differential data distributed from the central DataPile storage to individual applications. It is shown that the synchronization part of a DataPile-structured system can be implemented and run efficiently despite the restrictions or limitations these individual applications impose.
Wiedermann Jiří, Tel G., Pokorný Jaroslav, Bieliková M., Štuller Július
Editors: SOFSEM 2006: Theory and Practice of Computer Science
Eds: Proceedings of SOFSEM 2006: Theory and Practice of Computer Science, 21.1.-27.1.2006, Měřín, Czech Republic, LNCS 3831, Springer, Berlin, 2006. (ISBN: 3-540-31198-X)
Wiedermann Jiří, Tel G., Pokorný Jaroslav, Bieliková M., Štuller Július
Editors: SOFSEM 2006: Theory and Practice of Computer Science
Eds: Proceedings of SOFSEM 2006: Theory and Practice of Computer Science, 21.1.-27.1.2006, Měřín, Czech Republic, Volume II, ICS AS CR, Prague, 2006. (ISBN 80-903298-4-5)
Mlýnková Irena, Toman Kamil, Pokorný Jaroslav
Statistical Analysis of Real XML Data Collections
Technical Report 2006/5, MFF UK, June 2006, 39 p.
XML has recently achieved the leading role among languages for data representation, and thus we can witness a massive boom of corresponding techniques for managing XML data. Most of the processing techniques, however, suffer from various bottlenecks that worsen their time and/or space efficiency. We assume that the main reason is that they consider XML collections too globally, involving all their possible features, although real data are often much simpler. Even though some techniques do restrict the input data, the restrictions are often unnatural. In this paper we analyze existing XML data, their structure and real complexity in particular. We have gathered more than 20GB of real XML collections and implemented a robust automatic analyzer. The analysis considers existing papers on similar topics, trying to confirm or refute their observations as well as to bring new findings. It focuses on frequent but often ignored XML items (such as mixed content or recursion) and on the relationship between schemes and their instances.
Nečaský Martin
Conceptual Modeling for XML: A Survey
Technical Report No. 2006-3, Dep. of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague, 2006, 54 p.
XML is now the standard format for exchanging data between information systems and is also frequently applied as a logical database model. If we use XML as a logical database model, we need a conceptual model to describe its semantics. However, XML as a logical database model has some special characteristics which make existing conceptual models such as E-R or UML unsuitable. In this paper, the current approaches to the conceptual modeling of XML data are described in a uniform style. A list of requirements for XML conceptual models is presented, and the described approaches are compared on the basis of these requirements.
Ali K., Pokorný Jaroslav
XML-based Temporal Models
Research Report DC-2006-02, Dep. of Comp. Sc. and Engineering, FEE TU Prague, May 2006, 39 p.
Much research work has recently focused on the problem of representing historical information in XML. This report describes a number of temporal XML data models and compares them according to the following properties: time dimension (valid time, transaction time), support of temporal elements and attributes, querying possibilities, association to XML Schema/DTD, and influence on XML syntax.