Azarová Irina, Sinopalníková Anna
Using Corpus Statistics for Wordnet Structuring
In: Proceedings of the Second International Conference on Corpus Linguistics (Corpora-2004), Saint-Petersburg State University Press, Saint-Petersburg, Russia, 2004, pp. 3-11.
Presented at: Second International Conference on Corpus Linguistics (Corpora-2004), Saint-Petersburg, Russia.
Azarová Irina, Sinopalníková Anna, Yavorskaya Maria
Guidelines for RussNet structuring
In: Proceedings of the Dialogue 2004 International Conference on the Computational Linguistics and Intellectual Technologies, Moscow: Nauka, Moscow, 2004, pp. 232-241.
ISBN: 5-02-002826-6
Presented at: Dialogue 2004 International Conference on the Computational Linguistics and Intellectual Technologies, 2004, Moscow, Russia.
Bartoň Stanislav
Indexing Structure for Discovering Relationships in RDF Graph Recursively Applying Tree Transformation
In: Proceedings of Semantic Web Workshop at 27th Annual International ACM SIGIR Conference, 2004, pp. 58-68.
Presented at: Semantic Web Workshop at 27th Annual International ACM SIGIR Conference, 25.7.-29.7. 2004, University of Sheffield, Sheffield,
Great Britain.
Discovering the complex relationships between entities is one way of benefiting from the Semantic Web. This paper discusses new approaches to implementing rho-operators in RDF querying engines, which will make discovering such relationships viable. The cornerstone of such an implementation is an index that describes the original RDF graph. The index is created by recursively applying a transformation of the graph into a forest of trees, and then an extended signature is created for each tree. The signatures are accompanied by additional information about the transformed problematic nodes that break the tree structure. The components described by the signatures are treated as single nodes in the following step. The transitions between the signatures represent edges.
Bartoň Stanislav, Zezula Pavel
RhoIndex - An Index for Graph Structured Data
Presented at: 8th International DELOS Workshop on Future Digital Library Management Systems, 29.3.-1.4.2005, Schloss Dagstuhl,
Germany.
The effort described in this paper introduces an indexing structure for path search in graph structured data called rho-index. It is based on a graph segmentation S(G) that is meant to represent the indexed graph G in a simpler manner while retaining properties similar to those of G. This is achieved using graph transformations and a special type of matrix used to represent the transformed graph.
Bartoň Stanislav, Zezula Pavel
Designing and Evaluating an Index for Graph Structured Data
In: Proceedings of ICDM MCD 2006, IEEE Press, Hong Kong, China, 2006, pp. 1-5.
Presented at: The Second International Workshop on Mining Complex Data - MCD'06 - In Conjunction with IEEE ICDM'06, 18.12.-22.12.2006, Hong Kong,
China.
Bartoň Stanislav, Zezula Pavel
Mining Citation Graphs Employing an Index for Graph Structured Data
In: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu, Ústav informatiky AV ČR, Prague, 2006, pp. 1-9.
ISBN: 80-903298-7-X
Presented at: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu - Seminář projektu programu Informační společnost, 5.10.-7.10.2006, Zadov,
Czech Republic.
In this paper a correlation between rho-operators and indirect
relations in citation analysis is presented. The rho-operators were defined
to explore complex relationships in graph structured data. Various direct
and indirect relations identified in citation analysis are used to study the
semantics within the citation network. The rho-index was used to implement the rho-path search in the citation network, and the results obtained are
evaluated and discussed.
Bartoň Stanislav, Zezula Pavel
rho-Index - Designing and Evaluating an Indexing Structure for Graph Structured Data
Technical Report: FIMU-RS-2006-07, FI MU, Brno, 2006, 24 p.
Bartoň Stanislav
Searching Indirect Relationships in Citation Analysis Using an Index for Graph Structured Data
In: 2nd Doctoral Workshop on Mathematical and Engineering Methods in Computer Science MEMICS 2006, 2006, pp. 9-16.
Presented at: 2nd Doctoral Workshop on Mathematical and Engineering Methods in Computer Science MEMICS 2006, 27.10.-29.10.2006, Mikulov,
Czech Republic.
Bartoň Stanislav, Dohnal Vlastislav, Sedmidubský Jan, Zezula Pavel
Gauging the Evolution of Metric Social Network
In: 5th International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P 2007) held at 33rd International Conference on Very Large Data Bases (VLDB 2007), 2007, pp. 12.
Presented at: Fifth International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P 2007), 24.9.2007, Vienna,
Austria.
In this paper, we tackle the issues of analyzing the structural
evolution of the metric social network. The metric social network
operates in a P2P environment where peers maintain their own data
and the relationships among them are formed on the basis of the processed
similarity queries. The evolution is analyzed by traditional social
networking tools: the characteristic path length and the clustering
coefficient. Nonetheless, due to the special structure of the metric social
network, purpose-designed gauges, the average overlap and the robustness of
description coefficients, are presented to analyze the structure of emerging
communities encompassing similar data.
Bartoň Stanislav, Zezula Pavel
Indexing Structure for Graph-Structured Data
In: Studies in Computational Intelligence, Volume: 165, Springer Berlin/Heidelberg, Berlin, 2008, pp. 167-188.
Bartoň Stanislav, Dohnal Vlastislav, Sedmidubský Jan, Zezula Pavel
Building Self-Organized Image Retrieval Network
In: Proceedings of 6th Workshop on Large-Scale Distributed Systems for Information Retrieval (LSDS-IR '08), ACM, USA, 2008.
(in print)
Batko Michal, Dohnal Vlastislav, Zezula Pavel
M-Grid: Similarity Searching in Grids
In: Proceedings of International Workshop on Information Retrieval in Peer-to-Peer Networks, ACM Press, Arlington, 2006, pp. 1-8.
Batko Michal, Novák David, Falchi Fabrizio, Zezula Pavel
On Scalability of the Similarity Search in the World of Peers
In: InfoScale '06: Proceedings of the 1st international conference on Scalable information systems, ACM Press, New York, NY, USA, 2006, pp. 1-12.
ISBN: 1-59593-428-6
Due to the increasing complexity of current digital data,
similarity search has become a fundamental computational
task in many applications. Unfortunately, its costs are still
high and the linear scalability of single server implementations
prevents efficient searching in large data volumes.
In this paper, we briefly describe four recent scalable
distributed similarity search techniques and study their performance
in executing queries on three different datasets.
Though all the methods employ parallelism to speed up
query execution, different advantages for different objectives
have been identified by experiments. The reported results
can be exploited for choosing the best implementations
for specific applications. They can also be used for designing
new and better indexing structures in the future.
Batko Michal, Novák David, Zezula Pavel
MESSIF: Metric Similarity Search Implementation Framework
In: Digital Libraries: Research and Development, Springer-Verlag, LNCS 4877, Berlin, Heidelberg, 2007, pp. 1-10.
ISBN: 978-3-540-77087-9
Batko Michal, Novák David, Zezula Pavel
MESSIF: Metric Similarity Search Implementation Framework
In: DELOS Conference 2007 - Working Notes, Information Society Technologies, Pisa, Italy, 2007, pp. 11-23.
Presented at: DELOS Conference 2007, 13-14.2.2007, Pisa,
Italy.
Similarity search has become a fundamental computational task in many applications. One
of the mathematical models of similarity, the metric space, has drawn the attention of many
researchers, resulting in several sophisticated metric-indexing techniques. An important part of
research in this area is typically a prototype implementation and subsequent experimental evaluation
of the proposed data structure. This paper describes an implementation framework called MESSIF
that eases the task of building such prototypes. It provides a number of modules, from basic storage
management to automatic collection of performance statistics. Due to its open and modular design, it
is also easy to implement additional modules if necessary. MESSIF also offers several ready-to-use
generic clients that allow users to control and test index structures and measure their performance.
Batko Michal, Skopal Tomáš, Lokoč Jakub
New Dynamic Construction Techniques for M-tree
In: Journal of Discrete Algorithms, Elsevier, Amsterdam, The Netherlands, 2008.
(in print)
Since its introduction in 1997, the M-tree became a respected metric access method (MAM), while remaining, together with its descendants, still the only database-friendly MAM, that is, a dynamic structure persistent in paged index. Although there have been many other MAMs developed over the last decade, most of them require either static or expensive indexing. By contrast, the dynamic M-tree construction allows us to index very large databases in subquadratic time, and simultaneously the index can be maintained up-to-date (i.e., supports arbitrary insertions/deletions). In this article we propose two new techniques improving dynamic insertions in M-tree—the forced reinsertion strategies and so-called hybrid-way leaf selection. Both of the techniques preserve logarithmic asymptotic complexity of a single insertion, while they aim to produce more compact M-tree hierarchies (which leads to faster query processing). In particular, the former technique reuses the well-known principle of forced reinsertions, where the new insertion algorithm tries to re-insert the content of an M-tree leaf that is about to split in order to avoid that split. The latter technique constitutes an efficiency-scalable selection of suitable leaf node wherein a new object has to be inserted. In the experiments we show that the proposed techniques bring a clear improvement (speeding up both indexing and query processing) and also provide a tuning tool for indexing vs. querying efficiency trade-off. Moreover, a combination of the new techniques exhibits a synergic effect resulting in the best strategy for dynamic M-tree construction proposed so far.
Batko Michal, Novák David, Falchi Fabrizio, Zezula Pavel
Scalability Comparison of Peer-to-Peer Similarity Search Structures
In: Future Generation Computer Systems, Volume: 24, No: 8, Elsevier, Amsterdam, The Netherlands, 2008, pp. 834-848.
Batko Michal, Kohoutková Petra, Zezula Pavel
Combining Metric Features in Large Collections
In: 1st International Workshop on Similarity Search and Applications (SISAP 2008), IEEE Computer Society, Los Alamitos CA, Washington, Tokyo, 2008, pp. 79-86.
Batko Michal, Falchi Fabrizio, Lucchese Claudio, Novák David, Perego Raffaele, Rabitti Fausto, Sedmidubský Jan, Zezula Pavel
Crawling, Indexing, and Similarity Searching Images on the Web
In: Proceedings of the Sixteenth Italian Symposium on Advanced Database, 2008, pp. 382-389.
Bosch Sonja, Fellbaum Christiane, Pala Karel
Derivational Relations in English, Czech and Bantu Wordnet
In: Proc. of Fourth Global WordNet Conference, University of Szeged, Department of Informatics, 2008, pp. 74-90.
Presented at: GWC 2008, 22.-25.1.2008, Szeged,
Hungary.
Dohnal Vlastislav, Gennaro Claudio, Zezula Pavel
Efficiency and Scalability Issues in Metric Access Methods
In: Computational Intelligence in Medical Informatics, Springer Verlag, Berlin, Germany, 2008.
ISBN: 978-3-540-75766-5
The metric space paradigm has recently received attention as an
important model of similarity in the area of Bioinformatics. Numerous techniques have been proposed to solve similarity (range or
nearest-neighbor) queries on collections of data from metric domains. Though important representatives are outlined, this chapter is not trying to
substitute existing comprehensive surveys. The main objective is to explain and prove by experiments that similarity searching is typically an expensive
process which does not easily scale to very large volumes of data, thus distributed architectures able to exploit parallelism must be employed.
After a review of applications using the metric space approach in the field of Bioinformatics, the chapter provides an overview of methods used for
creating index structures able to speed up retrieval. In the metric space approach, only pair-wise distances between objects are quantified, so they
represent the level of dissimilarity. The key idea of index structures is to partition the data into subsets so that queries are evaluated without
examining entire collections -- minimizing both the number of distance computations and the number of I/O accesses. These objectives are achieved
by exploiting the property of metric spaces called the triangle inequality, which states that if two objects are near a third object, they cannot be too
distant from one another. Unfortunately, computational costs are still high and the linear scalability of single-computer implementations prevents
efficient searching in large and ever-growing data files. For these reasons, we describe very recent parallel and distributed similarity search
techniques and study the performance of their implementations. Specifically, Section 12.1 presents the metric space approach and its applications in the
field of Bioinformatics. Section 12.2 describes some of the most popular centralized disk-based metric indexes. Subsequently, Section
12.3 concentrates on parallel and distributed access methods which can deal with data collections that for practical purposes can be arbitrarily large, which
is typical for Bioinformatics workloads. An experimental evaluation of the presented distributed approaches on real-life data sets is presented in Section 12.4.
The chapter concludes in Section 12.5.
Dohnal Vlastislav, Sedmidubský Jan, Zezula Pavel, Novák David
Similarity Searching: Towards Bulk-loading Peer-to-Peer Networks
In: 1st International Workshop on Similarity Search and Applications (SISAP 2008), IEEE, 2008, pp. 87-94.
Presented at: SISAP 2008 - Workshop at ICDE 2008, 11.-12.04.2008, Cancun,
Mexico.
Due to the exponential growth of digital data and its complexity,
we need a technique which allows us to search such collections efficiently.
A suitable solution is based on the peer-to-peer (P2P) network paradigm and
the metric-space model of similarity. When a large volume of data is being
inserted, the P2P network must expand to new peers in order to maintain its
efficiency. Thus, many peers must be split. During a peer split, the data is
halved and one half is migrated to a new peer. In this paper, we study the
problem of peer splits and propose a specialized algorithm for speeding it
up. In particular, we use the structured P2P network called the M-Chord.
Search performance within a single peer is enhanced by the M-tree. In
the experimental evaluation, we compare the proposed algorithm with several
straightforward solutions on a real network organizing 10 million images.
Our algorithm provides a significant performance boost.
Falchi Fabrizio, Gennaro Claudio, Rabitti Fausto, Zezula Pavel
A distributed incremental nearest neighbor algorithm
In: International Conference on Scalable Information Systems, Volume: 304, ACM Press, New York, 2007, pp. 1-10.
Presented at: INFOSCALE 2007, 6.-8.6.2007, Suzhou,
China.
Falchi Fabrizio, Gennaro Claudio, Zezula Pavel
Nearest neighbor search in metric spaces through Content-Addressable Networks
In: Information Processing and Management, Volume: 44, No: 1, Elsevier, 2008, pp. 411-429.
Hanks Patrick
The Organization of the Lexicon: Semantic Types and Lexical Sets
In: Proceedings of Euralex Conference, University of Torino, 2006.
Presented at: The 12th EURALEX International Congress, 6.9.-9.9.2006, Torino,
Italy.
This paper reports a new kind of lexicon currently being developed as a resource for
natural language processing, language teaching, and other applications. This is a "Pattern
Dictionary of English", based on detailed and extensive corpus analysis of each sense of each verb in the language. A pattern consists of a verb with its valencies, plus semantic values for each valency and other relevant clues, and is associated with an implicature that associates the meaning with the context rather than with the word in isolation. For each verb, all normal patterns are recorded. The semantic types in each argument slot are linked to actual words via a large ontology.
The dictionary is aimed primarily at the NLP community, but it also has relevance for language teaching. For NLP purposes, matching actual uses of verbs in previously unseen text to patterns in the pattern dictionary offers some hope of solving the “Word Sense Disambiguation (WSD) problem”.
The paper discusses the relationship between A) words as they are actually used and B) semantic types and functions in a theoretical lexicon. An attempt will be made in the full paper to relate empirically observable, corpus-based facts about ordinary word use to the theoretical abstractions of Generative Lexicon Theory of James Pustejovsky. Lexicography and linguistic theory are often uneasy bedfellows, but I shall suggest that in at least these two cases, there is a possibility of a harmonious and productive relationship.
Hanks Patrick
Why Bother with Corpus Evidence
In: Proceedings of the Second International Conference of the German Cognitive Linguistics Association, 2007.
(in print)
Presented at: Second International Conference of the German Cognitive Linguistics Association, 5.10.-7.10.2006, Munich,
Germany.
Hanks Patrick, Pala Karel, Rychlý Pavel
Using Corpus Analysis to Mapping Lexical Sets onto Semantic Types through Corpus Analysis
In: Proceedings of the Fourth International Workshop on Generative Approaches to the Lexicon, 2007.
(in print)
Presented at: Fourth International Workshop on Generative Approaches to the Lexicon, 10-11.5.2007, Paris,
France.
Hanks Patrick, Pala Karel
Towards an empirically well-founded semantic ontology for NLP
In: Proceedings of the Fourth International Workshop on Generative Approaches to the Lexicon, 2007.
Presented at: Fourth International Workshop on Generative Approaches to the Lexicon, 10-11.5.2007, Paris,
France.
This paper examines some issues involved in
building a corpus-based ontology for use in
determining the meaning of words in text, in the
context of creating a “pattern dictionary”. How do
words cluster in paradigmatic lexical sets in actual
usage (as reflected in a large corpus), and can these
clusters be mapped onto a semantically structured
ontology? What semantic notions need to be
distinguished for this purpose, and what are the
appropriate theoretical foundations? What other
elements are needed for the application of
determining meaning in text?
Hanks Patrick
Editorial: Cognition and the Lexicon
In: Lexicology, (Ed. Hanks P.), Volume: 5, Routledge, Taylor and Francis Group, 2007.
ISBN: 978-0-415-70098-6
Hanks Patrick
Editorial: Formal Approaches to the Lexicon
In: Lexicology, (Ed. Hanks P.), Volume: 6, Routledge, Taylor and Francis Group, 2007.
ISBN: 978-0-415-70098-6
Hlaváčková D., Horák Aleš, Kadlec V.
Exploitation of the VerbaLex Verb Valency Lexicon in the Syntactic Analysis of Czech
In: Text, Speech and Dialogue - Proceedings of the 9th International Conference, TSD 2006, LNCS 4188, Springer-Verlag, Berlin / Heidelberg, 2006, pp. 79-86.
Presented at: Ninth International Conference on TEXT, SPEECH and DIALOGUE TSD 2006, 11.9.-15.9.2006, Brno,
Czech Republic.
Hlaváčková D., Pala Karel
Surface and Deep Valency Frames in Czech
In: Proceedings of the 25th International Conference on Lexis and Grammar, 2007.
(in print)
Presented at: The 25th International Conference on Lexis and Grammar, 6.9.-10.9.2006, Palermo,
Italy.
Hlaváčková D., Pala Karel
Computer Processing Derivational Relations in Czech
In: Computer Treatment of Slavic and East European Languages, L. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, 2007, pp. 198-208.
Presented at: Slovko 2007, 25.-27.10.2007, Bratislava,
Slovakia.
In the paper we deal with the derivational relations in Czech that form typical derivational nests (or subnets). Derivational relations are mostly of a semantic nature, and their regularity in Czech allows us to describe them in a way suitable for computer processing and then add them to electronic databases such as WordNet almost automatically. For this purpose we have used the derivational version of the morphological analyzer Ajka, which is able to handle the basic and most productive derivational relations in Czech. A special derivational interface has been developed in our NLP Lab at FI MU, by means of which we have explored the semantic nature of the selected noun derivational suffixes (22) as well as verb prefixes and established a set of semantically labeled derivational relations, presently 14. With regard to verbs, we have paid attention to selected verb semantic classes in connection with the derivational relations between selected prefixes (4) and the corresponding Czech verbs. As an application we have added the selected derivational relations to the Czech WordNet and in this way enriched it with approx. 30 000 new Czech synsets.
Horák Aleš, Pala Karel, Rambousek Adam, Povolný Martin
DEBVisDic – First Version of New Client-Server Wordnet Browsing and Editing Tool
In: Proceedings of the Third International Wordnet Conference, 2006.
Presented at: The 3rd International WordNet Conference (GWC-06), 22.1.-26.1.2006, Jeju Island,
Korea.
In this paper, we present the new wordnet development
tool called DEBVisDic. It is built
on the recently developed platform for client-server
XML databases, called DEBii. This
platform is able to cover many possible applications,
from which we concentrate on the new,
complete reimplementation of one of the most widespread
wordnet editors and browsers, VisDic.
We argue for the benefits the new DEBii platform
brings to wordnet editing and to XML
databases in general. In the paper, we describe
the state of the implementation, the internals and
interfaces of the DEBVisDic tool. We also discuss
its functionality and some distinctions in
comparison with other dictionary writing systems.
Horák Aleš, Rambousek Adam
Administration Framework for the DEB Dictionary Server
In: Computer Treatment of Slavic and East European Languages, L. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, 2007, pp. 70-79.
Presented at: Slovko 2007, 25.-27.10.2007, Bratislava,
Slovakia.
This paper presents a new implementation of the administration framework for the DEBII dictionary writing system. We present the details and examples of the user management part as well as graphical scenarios for dictionary service setup, adaptation and automatic generation of a user application based on the dictionary XML schema.
Horák Aleš, Rambousek Adam
DEB Platform Deployment - Current Applications
In: RASLAN 2007: Recent Advances in Slavonic Natural Language Processing, Masaryk University, Brno, 2007, pp. 3-11.
In this paper, we summarize the latest developments regarding the client dictionary writing applications based on the DEB development platform. The DEB framework is nowadays used in several full-grown projects for the preparation of high quality lexicographic data created within (possibly distant) teams of researchers. We briefly present the current list of DEB applications with the relevant projects and their phases. For each of the applications, we show a view of the interface with an overview of the most important features.
Horák Aleš, Rambousek Adam
Dictionary Management System for DEB Development Platform
In: NLPCS 2007: Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science, INSTICC PRESS, Funchal, Portugal, 2007, pp. 129-138.
Presented at: NLPCS 2007, 12.-16.6.2007, Funchal,
Madeira - Portugal.
In the paper, we introduce a new dictionary management interface for the design, preparation and presentation of generic electronic XML dictionaries using the DEB (Dictionary Editing and Browsing) development platform. The DEB platform provides a strict client-server environment for general dictionary writing systems. So far several successful NLP tools have been implemented on this platform, one of the best known being the DEBVisDic tool for wordnet semantic network editing and visualization. This paper describes a new part of the DEB platform -- the Administration interface that is shared by all DEB applications running on one server machine.
Horák Aleš, Pala Karel, Rambousek Adam
The Global WordNet Grid Software Design
In: Proc. of Fourth Global WordNet Conference, University of Szeged, Department of Informatics, 2008, pp. 194-199.
Presented at: GWC 2008, 22.-25.1.2008, Szeged,
Hungary.
Horák Aleš, Holan Tomáš, Kadlec V., Kovář Vojtěch
Dependency and Phrasal Parsers of the Czech Language: A Comparison
In: Proceedings of Text, Speech and Dialogue 2007, Springer, LNAI 4629, Berlin, Heidelberg, 2007, pp. 76-84.
Presented at: TSD 2007, 3.-7.9.2007, Plzeň,
Czech Republic.
In the paper, we present the results of an experiment comparing the effectiveness of real text parsers of the Czech language based on completely different approaches: stochastic parsers that provide dependency trees as their outputs and a meta-grammar parser that generates a resulting chart structure representing a packed forest of phrasal derivation trees.
We describe and formulate the main questions and problems accompanying such an experiment, try to offer answers to these questions and finally present the actual results of the tests, measured on 10 thousand Czech sentences.
Horák Aleš
Computer Processing of Czech Syntax and Semantics
In:
Horák Aleš, Pala Karel, Rambousek Adam
Tools for Managing Multilingual Lexical Resources
In: Proc. of International Conference Intelligent Information Systems, Polish Academy of Sciences, 2008, pp. 451-460.
Presented at: International Conference Intelligent Information Systems, Zakopane, Poland.
Ivanova K., Heid U., Schulte im Walde S., Kilgarriff A., Pomikálek Jan
Evaluating a German Sketch Grammar: A Case Study on Noun Phrase Case
In: Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), European Language Resources Association (ELRA), 2008.
Presented at: International Conference on Language Resources and Evaluation, Marrakech, Morocco.
Kovář Vojtěch, Horák Aleš
Reducing the Number of Resulting Parsing Trees for the Czech Language Using the Beautified Chart Method
In: Proceedings of 3rd Language and Technology Conference, Wydawnictwo Poznańskie, Poznań, 2007, pp. 433-437.
Presented at: LTC`07, 5.-7.2007, Poznań,
Poland.
In the paper, we present the beautified chart method used for reducing the number of output derivation trees for the Czech syntax parser synt. We show the evaluation results of the method, describe the appropriate algorithms and the parser internal data structures as well as problems with their implementation.
Nováček Vít, Smrž Pavel
Ontology Acquisition for Automatic Building of Scientific Portals
In: Proceedings of SOFSEM 2006: Theory and Practice of Computer Science, LNCS 3831, Springer-Verlag, Berlin, 2006, pp. 493-500.
ISBN: 3-540-31198-X
Presented at: SOFSEM 2006: Theory and Practice of Computer Science, 21.1.-27.1.2006, Měřín,
Czech Republic.
Ontologies are commonly considered as one of the essential parts of the Semantic Web vision, providing a theoretical basis and implementation framework for conceptual integration and information sharing among various domains. In this paper, we present the main principles of a new ontology acquisition framework applied for semi-automatic generation of scientific portals. Extracted ontological relations play a crucial role in the structuring of the information at the portal pages, automatic classification of the presented documents as well as for personalisation at the presentation level.
Nováček Vít
Motivations of Extensive Incorporation of Uncertainty in OLE Ontologies
In: Proceedings of SOFSEM 2006, Volume: II, ICS AS CR, Prague, 2006, pp. 145-154.
ISBN: 80-903298-4-5
Presented at: SOFSEM 2006: Theory and Practice of Computer Science, 21.1.-27.1.2006, Měřín,
Czech Republic.
Recently, the significance of uncertain information representation has become obvious in the Semantic Web community. This paper presents ongoing research on uncertainty handling in automatically created ontologies. A proposal of a specific framework is provided. The research is related to OLE (Ontology LEarning), a project aimed at bottom-up generation and merging of domain specific ontologies. Formal systems that underlie the uncertainty representation are briefly introduced. We then discuss a universal internal format of uncertain conceptual structures in OLE. The proposed format serves as a basis for inference tasks performed on an ontology. These topics are outlined as motivations of our future work.
Nováček Vít, Smrž Pavel
BOLE - A New Bio-Ontology Learning Platform
In: Proceedings of ECCB`05 Workshop, Workshop on Biomedical Ontologies and Text Processing, 2005.
Presented at: ECCB`05 Workshop, Workshop on Biomedical Ontologies and Text Processing, 28.9.2005, Madrid,
Spain.
This paper presents BOLE, a new platform for bottom-up generation and merging of bio-ontologies. In contrast to other ontology-learning systems that are currently available, BOLE can be characterized by the modular architecture enabling integrating and comparing various methods of the automatic acquisition of semantic relations. We introduce the architecture of the tool and discuss the methodology of the employed synthetic bottom-up approach. OLITE, the central component responsible for the automatic acquisition of semantic relations from texts, is described in detail. The presented preliminary results prove the efficiency of the implemented framework. We also provide a brief comparative overview of other relevant approaches and outline the future work on representation of uncertain knowledge for bio-ontology merging.
Nováček Vít, Smrž Pavel
OLE - A New Ontology Learning Platform
In: Proceedings of International Workshop on Text Mining Research, Practice and Opportunities, Incoma Ltd., 2005, pp. 12-16.
ISBN: 954-91743-1-X
Presented at: International Workshop on Text Mining Research, Practice and Opportunities, 24.9.2005, Borovets,
Bulgaria.
This paper presents OLE — a new platform for bottom-up generation and merging of ontologies. In contrast to other ontology-learning systems that are currently available, OLE can be characterized by the modular architecture enabling integrating and comparing various methods of the automatic acquisition of semantic relations. We introduce the architecture of the tool and discuss the methodology of the employed synthetic bottom-up approach. OLITE — the central component responsible for the automatic acquisition of semantic relations from texts is described in detail. The presented preliminary results prove the efficiency of the implemented framework. We also provide a brief comparative overview of other relevant approaches and outline the future work on representation of uncertain knowledge for ontology merging.
Nováček Vít, Smrž Pavel
Empirical Merging of Ontologies - A Proposal of Universal Uncertainty Representation Framework
In: The Semantic Web: Research and Applications - Proceedings of ESWC`06 - 3rd European Semantic Web Conference, LNCS 4011, Springer-Verlag, Berlin, 2006, pp. 65-79.
ISBN: 3-540-34544-2
Presented at: ESWC`06 - 3rd European Semantic Web Conference, 11.6.-14.6.2006, Budva,
Montenegro.
The significance of uncertainty representation has become obvious in the Semantic Web community recently. This paper presents our research on uncertainty handling in automatically created ontologies. A new framework for uncertain information processing is proposed. The research is related to OLE (Ontology LEarning) - a project aimed at bottom-up generation and merging of domain-specific ontologies. Formal systems that underlie the uncertainty representation are briefly introduced. We then discuss the universal internal format of uncertain conceptual structures in OLE and offer a utilisation example. The proposed format serves as a basis for empirical improvement of initial knowledge acquisition methods as well as for general explicit inference tasks.
Nováček Vít, Smrž Pavel, Pomikálek Jan
Text Mining for Semantic Relations as a Support Base of a Scientific Portal Generator
In: Proceedings of LREC 2006 - 5th International Conference on Language Resources and Evaluation, ELRA, Paris, 2006, pp. 1338-1343.
ISBN: 2-9517408-2-4
Presented at: LREC 2006 - 5th International Conference on Language Resources and Evaluation, 24.5.-26.5.2006, Genoa,
Italy.
Current Semantic Web implementation efforts pose a number of challenges. One of the biggest is the development and evolution of specific resources, the ontologies, as a base for representing the meaning of the web. This paper deals with the automatic acquisition of semantic relations from the text of scientific publications (journal articles, conference papers, project descriptions, etc.). We also describe the process of building the corresponding ontological resources and their application to semi-automatic generation of scientific portals. Extracted relations and ontologies are crucial for structuring the information on the portal pages, for automatic classification of the presented documents and for personalisation at the presentation level. Besides a general description of the portal generating system, we also give a detailed overview of the extraction of semantic relations in the form of a domain-specific ontology. The overview consists of a presentation of the architecture of the ontology extraction system, a description of the methods used for mining semantic relations and an analysis of selected results and examples.
Nováček Vít
Ontology Learning
In: Diploma Thesis, Faculty of Informatics, Masaryk University, Brno, 2006, pp. 1-65.
Ontology learning is one of the essential topics in an important area of current computer science and artificial intelligence - the upcoming Semantic Web. As the Semantic Web idea comprises a semantically annotated descendant of the current World Wide Web and related tools and resources, the need for vast and reliable knowledge repositories is obvious. Ontologies present a well defined, straightforward and standardised form of these repositories. There are many possible utilisations of ontologies - from automatic annotation of web resources to domain representation and reasoning tasks. However, the ontology creation process is very expensive, time-consuming and subjective when performed manually. A framework for automatic acquisition of ontologies would therefore be very advantageous. In this work we present such a framework called OLE (an acronym for Ontology LEarning) and current results of its application. The main relevant topics, state of the art methods and techniques related to ontology acquisition are discussed as a part of the theoretical background for the presentation of the OLE framework and respective results. Moreover, we also describe preliminary results of ongoing research in the area of uncertain fuzzy ontology representation that will provide us with natural and reasonable instruments for dealing with inconsistencies in empirical data as well as for reasoning. Main future milestones of the ongoing research are discussed as well.
Nováček Vít
Ontology Acquisition Supported by Imprecise Conceptual Refinement - New Results and Reasoning Perspectives
In: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu, Ústav informatiky AV ČR, Prague, 2006, pp. 91-101.
ISBN: 80-903298-7-X
Presented at: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu - Seminář projektu programu Informační společnost, 5.10.-7.10.2006, Zadov,
Czech Republic.
The significance of uncertainty representation has become
obvious in the Semantic Web community recently. This paper presents
new results of our research on uncertainty handling in ontologies created
automatically by means of Human Language Technologies. The research
is related to OLE (Ontology LEarning), a project aimed at bottom-up generation and merging of domain-specific ontologies. It utilises a
proposal of expressive fuzzy knowledge representation framework called
ANUIC. We discuss current achievements in taxonomy acquisition and
outline some interesting applications of the framework regarding non-traditional reasoning perspectives.
Nováček Vít
Imprecise Empirical Ontology Refinement: Application to Taxonomy Acquisition
In: Proceedings of ICEIS 2007, Kluwer Academic Publishing, Artificial Intelligence and Decision Support Systems, London, 2007, pp. 8.
(in_print)
Enterprise Information Systems (ICEIS 2007, revised selected papers), Springer, 2007, pp. 8.
(in_print)
Presented at: ICEIS 2007, 12.-16.6.2007, Funchal,
Madeira - Portugal.
Nováček Vít, Laera Loredana, Handschuh Siegfried
Dynamic Integration of Medical Ontologies in Large Scale
In: Proceedings of WWW2007/HCLSDI, ACM Press, New York, 2007, pp. 10.
(in_print)
Nováček Vít, Laera Loredana, Handschuh Siegfried
Aiding the Data Integration in Medicinal Settings by Means of Semantic Technologies
In: Making Semantics Work for Business, Semantic Technology Institutes International Workshop at European Semantic Technology Conference, Vienna, Austria, 2007.
(in_print)
Nováček Vít
A Non-traditional Inference Paradigm for Learned Ontologies
In: Proceedings of ESWC 2007 PhD Symposium, CEUR Workshop proceedings Workshop at ESWC 2007, Innsbruck, 2007.
Nováček Vít, Laera Loredana, Handschuh Siegfried
Semi-automatic Integration of Learned Ontologies into a Collaborative Framework
In: Proceedings of IWOD/ESWC 2007, Springer Verlag, Innsbruck, 2007, pp. 14.
(in_print)
Nováček Vít, Dabrowski Maciej, Kruk Sebastian R.
Extending Community Ontology Using Automatically Generated Suggestions
In: Proceedings of FLAIRS 2007, AAAI Press, Menlo Park, CA, 2007, pp. 6.
(in_print)
Nováček Vít, Handschuh Siegfried, Laera Loredana, Maynard Diana, Voelkel Max
Dynamic Ontology Lifecycle Scenario in Translational Medicine
In: Proceedings of the 5th European Conference of Computational Biology (ECCB 2006) - Book of Abstracts, Oxford University Press, Oxford, 2007, pp. 5.
(in print)
Nováček Vít
Automatic Knowledge Acquisition and Integration Technique: Application to Large Scale Taxonomy Extraction and Document Annotation
In: Proceedings of ICEIS 2007, Kluwer Academic Publishing, Artificial Intelligence and Decision Support Systems, London, 2008, pp. 160-172.
Enterprise Information Systems (ICEIS 2007, revised selected papers), Springer, 2008, pp. 160-172.
Presented at: ICEIS 2007, 12.-16.6.2007, Funchal,
Madeira - Portugal.
Novák David, Zezula Pavel
LOBS: Load Balancing for Similarity Peer-to-Peer Structures
Technical Report: FIMU-RS-2007-04, Faculty of Informatics, Masaryk University, Brno, 2007, 22 p.
Novák David, Zezula Pavel
Indexing the Distance Using Chord: A Distributed Similarity Search Structure
Presented at: 8th International DELOS Workshop on Future Digital Library Management Systems, 29.3.-1.4.2005, Schloss Dagstuhl,
Germany.
The need for search mechanisms based on data content rather than attribute values has recently led to the formation of metric-based similarity retrieval. The computational complexity of such retrieval and the large volume of processed data call for distributed processing. In this paper, we propose chiDistance, a distributed data structure for similarity search in metric spaces. The structure is based on the idea of the vector-based index method iDistance, which makes it possible to transform the issue of similarity search into a one-dimensional range search problem. A Peer-to-Peer system based on the Chord protocol is created to distribute the storage space and to parallelize the execution of similarity queries. In the experiments conducted on our prototype implementation we study the system performance, concentrating on several aspects of parallelism of the range search algorithm.
Novák David, Zezula Pavel
M-Chord: A Scalable Distributed Similarity Search Structure
In: InfoScale '06: Proceedings of the 1st international conference on Scalable information systems, ACM Press, New York, NY, USA, 2006, pp. 1-10.
ISBN: 1-59593-428-6
The need for a retrieval based not on the attribute values but on the very data content has recently led to the rise of
the metric-based similarity search. The computational complexity
of such a retrieval and large volumes of processed
data call for distributed processing which allows to achieve
scalability. In this paper, we propose M-Chord, a distributed
data structure for metric-based similarity search.
The structure takes advantage of the idea of a vector index
method iDistance in order to transform the issue of similarity
searching into the problem of interval search in one
dimension. The proposed peer-to-peer organization, based
on the Chord protocol, distributes the storage space and
parallelizes the execution of similarity queries. Promising
features of the structure are validated by experiments on the
prototype implementation and two real-life datasets.
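The iDistance idea mentioned in the abstract above, mapping each object to a one-dimensional key so that a similarity range query becomes a set of interval searches, can be illustrated with a small single-machine sketch. This is not the M-Chord implementation: the class name IDistanceIndex, the cluster separation constant c, and the choice of pivots are assumptions made for illustration, and the Chord-based distribution of the key space is omitted entirely. The sketch assumes that all distances are smaller than c.

```python
import bisect
import math


class IDistanceIndex:
    """Illustrative iDistance-style index (hypothetical names, not M-Chord)."""

    def __init__(self, pivots, objects, c=1000.0, dist=math.dist):
        # c separates the key intervals of clusters; assumed > any distance.
        self.pivots, self.c, self.dist = pivots, c, dist
        entries = []
        for o in objects:
            # Assign the object to its nearest pivot (cluster i).
            i, d = min(((i, dist(p, o)) for i, p in enumerate(pivots)),
                       key=lambda t: t[1])
            # One-dimensional key: cluster offset plus distance to the pivot.
            entries.append((i * c + d, o))
        entries.sort(key=lambda e: e[0])
        self.keys = [k for k, _ in entries]
        self.objs = [o for _, o in entries]

    def range_search(self, q, r):
        """Return all indexed objects o with dist(q, o) <= r."""
        result = []
        for i, p in enumerate(self.pivots):
            dq = self.dist(p, q)
            base = i * self.c
            # By the triangle inequality, any match in cluster i has a
            # pivot distance within [dq - r, dq + r].
            a = bisect.bisect_left(self.keys, max(base, base + dq - r))
            b = bisect.bisect_right(self.keys, base + dq + r)
            # Never read past the first key of the next cluster.
            b = min(b, bisect.bisect_left(self.keys, base + self.c))
            # Candidates are verified with the real distance.
            result.extend(o for o in self.objs[a:b] if self.dist(q, o) <= r)
        return result


index = IDistanceIndex(
    pivots=[(0, 0), (10, 10)],
    objects=[(1, 1), (2, 2), (5, 5), (9, 9), (10, 10)],
)
print(index.range_search((1, 2), 1.5))
```

The interval scan only produces candidates; the final distance check is what guarantees a correct answer. In a distributed setting such as the one the abstract describes, each key interval would be handled by the peer responsible for that part of the key space.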
Novák David
Image Similarity Search: Theory and Practice
In: Third Doctoral Workshop on Mathematical and Engineering Methods in Computer Science MEMICS 2007, Masaryk University and Technical University of Brno, Brno, 2007, pp. 154-160.
Presented at: MEMICS 2007, 26.10.-28.10.2007, Znojmo,
Czech Republic.
Novák David, Zezula Pavel
LOBS: Load Balancing for Similarity Peer-to-Peer Structures
In: Databases Information Systems and Peer-to-Peer Computing 2007, Springer Verlag, Berlin Heidelberg New York, 2007, pp. 1-8.
Presented at: DBISP2P 2007, 24.9.2007, Vienna,
Austria.
Novák David, Batko Michal, Dohnal Vlastislav, Zezula Pavel
Scaling up the Image Content-based Retrieval
In: Second DELOS Conference 2007 - Working Notes, DELOS Network of Excellence, Pisa, Italy, 2007, pp. 1-10.
Presented at: DELOS Conference 2007, 13-14.2.2007, Pisa,
Italy.
Novák David, Batko Michal, Zezula Pavel
Content-based Image Retrieval on the Web
In: Proceedings of the Poster and Demonstration Paper Track of the 1st Future Internet Symposium (FIS 2008), CEUR Workshop Proceedings, Vienna, 2008, pp. 1-3.
Novák David, Batko Michal, Zezula Pavel
Web-scale System for Image Similarity Search: When the Dreams Are Coming True
In: Proceedings of the Sixth International Workshop on Content-Based Multimedia Indexing (CBMI 2008), IEEE, London, 2008, pp. 446-453.
Pala Karel
The Balkanet Experience
In: Proceedings of the GLDV (German Linguistische Daten Vorarbeitung) Conference, Bonn, 2005.
Presented at: GLDV (German Linguistische Daten Vorarbeitung) Conference, 30.3.-1.4.2005, Bonn,
Germany.
This paper describes the exhaustive results obtained within the IST 290388 Project Balkanet, which ran from 2001 to 2004. Attention is paid to the restructuring and final shaping of the individual Balkan WordNets. In comparison with the EuroWordNet Project some new results have been obtained:
1. The sets of Base Concepts have been extended and a set of the Balkanet Common Synsets has been introduced (8,000 synsets). These were relinked to Princeton WordNet 2.0 (PWN) and converted to XML standard format,
2. The language specific synsets that do not have translation equivalents in PWN 2.0 have been established for Balkanet languages,
3. Valency frames have been developed for Czech, Bulgarian and Romanian,
4. Domains have been added to Balkanet WordNets and implemented in the VisDic browser,
5. Integrating derivational relations into Czech WordNet and adding semantic relations into Turkish WordNet exploiting Turkish derivational morphology,
6. Links to the SUMO/MILO Ontology were added to and implemented in VisDic.
Pala Karel
Word Sketches and Semantic Roles
In: Proceedings of Corpus Linguistic Conference 2000, Saint-Petersburg State University, 2006.
ISBN: 5-288-04181-4
Pala Karel, Horák Aleš, Rambousek Adam, Vetulani Zygmunt, Konieczka Paweł, Marciniak Jacek, Obrębski Tomasz, Rzepecki Przemysław, Walkowska Justyna
DEB Platform tools for effective development of WordNets in application to PolNet
In: Proceedings of 3rd Language & Technology Conference, Fundacja Uniwersytetu im. A. Mickiewicza, Poznań, 2007, pp. 514-518.
Presented at: LTC`07, 5.-7.2007, Poznań,
Poland.
Pomikálek Jan, Řehůřek R.
The Influence of Preprocessing Parameters on Text Categorization
In: International Conference on Computer, Information and Systems Science and Engineering, Springer, 2007.
(in_print)
Presented at: XIX International Conference on Computer, Information and Systems Science and Engineering, 29.1.-31.1.2007, Bangkok,
Thailand.
Pomikálek Jan, Rychlý Pavel
Detecting Co-Derivative Documents in Large Text Collections
In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), European Language Resources Association (ELRA), 2008, pp. 132-135.
Presented at: International Conference on Language Resources and Evaluation, , Marrakech, Morocco.
Rychlý Pavel, Smrž Pavel
Manatee, Bonito and Word Sketches for Czech
In: Proceedings of the Second International Conference on Corpus Linguistics (Corpora-2004), Saint-Petersburg State University Press, Saint-Petersburg, Russia, 2004, pp. 124-132.
ISBN: 5-288-03531-8
Presented at: Second International Conference on Corpus Linguistics (Corpora-2004), , Saint-Petersburg,
Russia.
This paper deals with a newly designed and developed system Manatee that can be employed to manage corpora, especially extremely large ones with billions of words, and enables the efficient evaluation of complex queries and the computation of advanced statistics. The main functions of the tool are presented here, together with the introduction of its web-based graphical user interface, Bonito. The sophisticated statistical processing is demonstrated in an example of computing of Word Sketches. Special attention is paid to the definition of the word sketches for Czech and problems connected to its free word order.
Rychlý Pavel, Kovář Vojtěch
Displaying Bidirectional Text Concordances in KWIC format
In: Proceedings of 5th Biennial Conference of the Asian Association for Lexicography, University of Madras, Chennai, India, 2007, pp. 96-100.
Presented at: Asialex 2007, 6.-8.12.2007, Chennai,
India.
Rychlý Pavel
Manatee/Bonito - A Modular Corpus Manager
In: RASLAN 2007: Recent Advances in Slavonic Natural Language Processing, Masaryk University, Brno, 2007, pp. 97-102.
Rychlý Pavel, Kilgarriff A.
An Efficient Algorithm for Building a Distributional Thesaurus (and other Sketch Engine Developments)
In: Association for Computational Linguistics, Proceedings of the ACL 2007 Demo and Poster Sessions, Prague, 2007, pp. 41-44.
Presented at: ACL 2007, 23.-30.6.2007, Prague,
Czech Republic.
Sedmidubský Jan, Bartoň Stanislav, Dohnal Vlastislav, Zezula Pavel
Adaptive Approximate Similarity Searching through Metric Social Networks
Technical Report: FIMU-RS-2007-06, Faculty of Informatics, Masaryk University, Brno, 2007, 22 p.
Exploiting the concepts of social networking represents a novel approach to the approximate
similarity query processing. We present an unstructured and dynamic P2P environment in
which a metric social network is built. Social communities of peers giving similar results
to specific queries are established and such ties are exploited for answering future queries.
Based on the universal law of generalization, a new query forwarding algorithm is introduced
and evaluated. The same principle is used to manage query histories of individual peers with
the possibility to tune the tradeoff between the extent of the history and the level of the
query-answer approximation. All proposed algorithms are tested on real data and medium-sized
P2P networks consisting of tens of computers.
Sedmidubský Jan, Bartoň Stanislav, Dohnal Vlastislav, Zezula Pavel
Querying Similarity in Metric Social Networks
In: Network-Based Information Systems, First International Conference, NBiS 2007, Springer, Berlin, 2007, pp. 278-287.
Presented at: NBiS 2007, 3.-7.9.2007, Regensburg,
Germany.
In this paper we tackle the issues of exploiting the concepts of social networking in processing similarity queries in the environment of a P2P network. The processed similarity queries lay the basis on which the relationships among peers are created. Consequently, communities encompassing similar data emerge in the network. The architecture of the presented metric social network is formally defined using the acquaintance and friendship relations. Two versions of the navigation algorithm are presented and thoroughly experimentally evaluated. Finally, the learning ability of the metric social network is presented and discussed.
Sedmidubský Jan, Bartoň Stanislav, Dohnal Vlastislav, Zezula Pavel
Adaptive Approximate Similarity Searching through Metric Social Networks
In: 24th International Conference on Data Engineering (ICDE 2008), 2008, pp. 3.
Presented at: 24th International Conference on Data Engineering, 7.-12.4.2008, Cancún,
Mexico.
Exploiting the concepts of social networking represents a novel
approach to the approximate similarity query processing. We present a metric
social network where relations between peers, giving similar results, are
established on a per-query basis. Based on the universal law of
generalization, a new query forwarding algorithm is proposed. The same
principle is used to manage query histories of individual peers with the
possibility to tune the tradeoff between the extent of the history and the
level of the query-answer approximation. All algorithms are tested on real
data and a real network of computers.
Sedmidubský Jan, Bartoň Stanislav, Dohnal Vlastislav
mSN: Metric Social Network for Similarity Searching
SW prototype
This prototype implements the idea of social networking in metric
spaces. The metric social network is a peer-to-peer network in which users
can share their data without the need to send them to a centralized node.
Searching in this system is based on the notion of similarity which is
modelled using metric spaces. The architecture of the metric social network
is formally defined by using acquaintance and friendship relations. The
implementation builds on top of the MESSIF framework library.
Sedmidubský Jan, Dohnal Vlastislav, Bartoň Stanislav, Zezula Pavel
A Self-organized System for Content-based Search in Multimedia.
In: IEEE International Symposium on Multimedia (ISM 2008), Patrick Kellenberger, Los Alamitos, California, 2008.
(in_print)
Smrž Pavel, Povolný Martin, Sinopalníková Anna
OASIS - A New Tool for the Transformation of XML Knowledge Resources into OWL
Presented at: ISWC 2004, 7.11.-11.11. 2004, Hiroshima,
Japan.
This paper presents OASIS – a new tool that enables (semi)automatic conversion of existing knowledge bases, semantic networks, terminological databases and various other resources to complex ontologies in OWL. The tool is implemented as a client of DEB (Dictionary Editor and Browser), which is able to store, index and efficiently retrieve lexical data. The architecture is based on XML and related W3C standards (XSLT, XML Schema, XPath, DOM). The main feature that makes the transformation efficient is the extension of a standard XSLT processor with the ability to obtain additional data from the server through the mechanism of nested queries. This technique allows formulation of complex constraints needed in the conversion to OWL.
Sojka Petr
Towards Digital Mathematical Library
In: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu, Ústav informatiky AV ČR, Prague, 2006, pp. 110-113.
ISBN: 80-903298-7-X
Presented at: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu - Seminář projektu programu Informační společnost, 5.10.-7.10.2006, Zadov,
Czech Republic.
This paper describes a prototype of the OCR math engine
built in the DML-CZ project. The solution relies on a combination of the FineReader and InftyReader programs. The achieved error rate (counting
not only character errors, but also errors in the recognition of structure
of mathematics notation) decreased from an initial 12% to under 1%.
Sojka Petr, Choi Key-Sun, Fellbaum Christiane, Vossen Piek
Proceedings of the Third International WordNet Conference, GWC 2006
In: Proceedings of the Third International Wordnet Conference, 2006.
Sojka Petr, Kopeček Ivan, Pala Karel
Text, Speech and Dialogue - Proceedings of the 9th International Conference, TSD 2006
In: Text, Speech and Dialogue - Proceedings of the 9th International Conference, TSD 2006, LNCS 4188, Springer-Verlag, Berlin / Heidelberg, 2006.
ISBN: 978-3-540-39090-9
Zezula Pavel, Giuseppe Amato, Dohnal Vlastislav, Batko Michal
Similarity Search - The Metric Space Approach.
In: Advances in Database Systems, Volume: 32, Springer, 2006, pp. 220.
ISBN: 0-387-29146-6
Zezula Pavel
P2P Similarity Search Structures
In: Proceedings of the 14th Italian Symposium on Advanced Database Systems, peQuod, Ancona, Italy, 2006, pp. 1-12.
Presented as an invited talk: SEBD 2006 Fourteenth Italian Symposium on Advanced Database Systems, 18.6.-21.6.2006, Ancona,
Italy.
Zezula Pavel
Scalable Similarity Search in Computer Networks
In: Advances in Databases and Information Systems, LNCS 4152, Springer-Verlag, Berlin, 2006, pp. 3-3.
ISBN: 3-540-37899-5
Presented as an invited talk: Tenth East-European Conference on Advances in Databases and Information Systems, 3.9.-7.9.2006, Thessaloniki, Hellas,
Greece.
Zezula Pavel, Dohnal Vlastislav, Novák David
Towards Scalability of Similarity Searching
In: Global Data Management, (Ed. Baldoni R., Cortese G., Davide F., Melpignano A.), Volume: 8 of Emerging Communication: Studies in New Technologies and Practices in Communication, IOS Press, Amsterdam, The Netherlands, 2006.
ISBN: 1-58603-629-7
With the increasing number of applications that base searching on similarity rather than on exact matching, novel index structures are needed to speedup execution of similarity queries. An important stream of research in this direction uses the metric space as a model of similarity. We explain the principles and survey the most important representatives of index structures. We put most emphasis on distributed similarity search architectures which try to solve the difficult problem of scalability of similarity searching. The actual achievements are demonstrated by practical experiments. Future research directions are outlined in the conclusions.
Zezula Pavel, Giuseppe Amato, Dohnal Vlastislav
Similarity Search: The Metric Space Approach
In: ACM SAC 2007 Conference. ACM SAC 2007 Conference Tutorial, ACM, Seoul, Korea, 2007.
Presented at: ACM SAC 2007, , Seoul,
Korea.
Similarity searching has become a fundamental computational task in a variety of application areas, including multimedia information retrieval, data mining, pattern recognition, machine learning, computer vision, biomedical databases, data compression and statistical data analysis. In such environments, an exact match has little meaning, and proximity/distance (similarity/dissimilarity) concepts are typically much more fruitful for searching. In this tutorial, we review the state of the art in developing similarity search mechanisms that accept the metric space paradigm. We explain the high extensibility of the metric space approach and demonstrate its capability with examples of distance functions. The efforts to further speed up retrieval are demonstrated by a class of approximated techniques and the very recent proposals of scalable and distributed structures based on the P2P communication paradigm.
Zezula Pavel, Dohnal Vlastislav, Batko Michal
File Organizations
In: Wiley Encyclopedia of Computer Science and Engineering, Wiley-Interscience, San Francisco, CA, USA, 2008, pp. 1-11.
Zezula Pavel, Batko Michal, Dohnal Vlastislav
Indexing Metric Spaces
In: Database Management and Information Retrieval, Springer-Verlag, New York, 2008, pp. 1-4.