Books in the series Synthesis Lectures on Data Management

  • by Pingcheng Ruan
    53,00 €

    This book takes readers through the sensational history of blockchains and their potential to revolutionize database systems of the future. In order to demystify blockchains, the book capitalizes on decades of research and field testing of existing database and distributed systems and applies these familiar concepts to the novel blockchain system. It then utilizes this framework to explore the essential block platform underpinning blockchains, which is often misunderstood as a specific attribute of cryptocurrencies rather than the core of the decentralized system independent of application. The book explores the nature of these decentralized systems, which have no single owner and build robustness through a multitude of stakeholder contributions. In this way, blockchains can build trust into existing systems and thus present attractive solutions for various domains across both academia and industry. Despite this, high-impact and real-world applications of blockchain have yet to be realized outside of cryptocurrencies like Bitcoin. The book establishes how this new data system, if properly applied, can disrupt the sector in much the same way databases did so many years ago. The book explores the fundamental technical limitations that may be preventing blockchain from realizing this potential and how to overcome or mitigate them. Readers who are completely new to blockchains will find this book to be a comprehensive survey of the state of the art in blockchain technology. Readers with some experience of blockchains, for example through developing cryptocurrencies, will likely find the book's database perspective enlightening. Finally, researchers already working with blockchain will learn to identify existing gaps in the design space and explore potential solutions for creating the next generation of blockchain systems.

  • by Yunyao Li
    37,00 €

    This book presents a comprehensive overview of Natural Language Interfaces to Databases (NLIDBs), an indispensable tool in the ever-expanding realm of data-driven exploration and decision making. After first demonstrating the importance of the field using an interactive ChatGPT session, the book explores the remarkable progress and general challenges faced with real-world deployment of NLIDBs. It goes on to provide readers with a holistic understanding of the intricate anatomy, essential components, and mechanisms underlying NLIDBs and how to build them. Key concepts in representing, querying, and processing structured data as well as approaches for optimizing user queries are established for the reader before their application in NLIDBs is explored. The book discusses text to data through early relevant work on semantic parsing and meaning representation before turning to cutting-edge advancements in how NLIDBs are empowered to comprehend and interpret human languages. Various evaluation methodologies, metrics, datasets and benchmarks that play a pivotal role in assessing the effectiveness of mapping natural language queries to formal queries in a database and the overall performance of a system are explored. The book then covers data to text, where formal representations of structured data are transformed into coherent and contextually relevant human-readable narratives. It closes with an exploration of the challenges and opportunities related to interactivity and its corresponding techniques for each dimension, such as instances of conversational NLIDBs and multi-modal NLIDBs where user input is beyond natural language. This book provides a balanced mixture of theoretical insights, practical knowledge, and real-world applications that will be an invaluable resource for researchers, practitioners, and students eager to explore the fundamental concepts of NLIDBs.

  • by Lei Chen
    40,00 €

    This book examines the recent trend of extending data dependencies to adapt to rich data types in order to address variety and veracity issues in big data. Readers will be guided through the full range of rich data types where data dependencies have been successfully applied, including categorical data with equality relationships, heterogeneous data with similarity relationships, numerical data with order relationships, sequential data with timestamps, and graph data with complicated structures. The text will also discuss interesting constraints on ordering or similarity relationships contained in novel classes of data dependencies in addition to those in equality relationships, e.g., considered in functional dependencies (FDs). In addition to exploring the concepts of these data dependency notations, the book investigates the extension relationships between data dependencies, such as conditional functional dependencies (CFDs) that extend conventional functional dependencies (FDs). This forms in the book a family tree of extensions, mostly rooted in FDs, that help illuminate the expressive power of various data dependencies. Moreover, the book points to work on the discovery of dependencies from data, since data dependencies are often unlikely to be manually specified in a traditional way, given the huge volume and high variety in big data. It further outlines the applications of the extended data dependencies, in particular in data quality practice. Altogether, this book provides a comprehensive guide for readers to select proper data dependencies for their applications that have sufficient expressive power and reasonable discovery cost. Finally, the book concludes with several directions of future studies on emerging data.
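
As a rough illustration of the simplest dependency class mentioned above, the following minimal sketch checks a functional dependency X → Y over rows held as dictionaries. It is not taken from the book; the function name `fd_violations` and the sample `zip`/`city` data are hypothetical and chosen only for this example.

```python
def fd_violations(rows, lhs, rhs):
    """Return pairs of rows that agree on the lhs attributes but differ on rhs.

    An FD lhs -> rhs holds on the rows exactly when this list is empty.
    Each violating row is paired with the first row seen for its lhs value.
    """
    first_seen = {}   # lhs value -> (rhs value, first row carrying it)
    violations = []
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in first_seen:
            if first_seen[x][0] != y:
                violations.append((first_seen[x][1], row))
        else:
            first_seen[x] = (y, row)
    return violations


# Hypothetical sample data: the second row contradicts the first on zip -> city.
rows = [
    {"zip": "10115", "city": "Berlin"},
    {"zip": "10115", "city": "Bonn"},
    {"zip": "80331", "city": "Munich"},
]
zip_city_violations = fd_violations(rows, ["zip"], ["city"])
city_zip_violations = fd_violations(rows, ["city"], ["zip"])
```

Here `zip_city_violations` holds the one conflicting pair, while `city_zip_violations` is empty because no city appears with two different zip codes. The extended dependencies the blurb describes would replace the equality tests with similarity or order comparisons.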

  • by Jeffrey Xu Yu
    29,00 €

    It has become highly desirable to provide users with flexible ways to query/search information over databases as simple as keyword search like Google search. This book surveys the recent developments on keyword search over databases, and focuses on finding structural information among objects in a database using a set of keywords. Such structural information to be returned can be either trees or subgraphs representing how the objects, that contain the required keywords, are interconnected in a relational database or in an XML database. The structural keyword search is completely different from finding documents that contain all the user-given keywords. The former focuses on the interconnected object structures, whereas the latter focuses on the object content. The book is organized as follows. In Chapter 1, we highlight the main research issues on the structural keyword search in different contexts. In Chapter 2, we focus on supporting structural keyword search in a relational database management system using the SQL query language. We concentrate on how to generate a set of SQL queries that can find all the structural information among records in a relational database completely, and how to evaluate the generated set of SQL queries efficiently. In Chapter 3, we discuss graph algorithms for structural keyword search by treating an entire relational database as a large data graph. In Chapter 4, we discuss structural keyword search in a large tree-structured XML database. In Chapter 5, we highlight several interesting research issues regarding keyword search on databases. The book can be used as either an extended survey for people who are interested in the structural keyword search or a reference book for a postgraduate course on the related topics. Table of Contents: Introduction / Schema-Based Keyword Search on Relational Databases / Graph-Based Keyword Search / Keyword Search in XML Databases / Other Topics for Keyword Search on Databases

  • by Venkatesh Ganti
    26,00 €

    Data warehouses consolidate various activities of a business and often form the backbone for generating reports that support important business decisions. Errors in data tend to creep in for a variety of reasons. Some of these reasons include errors during input data collection and errors while merging data collected independently across different databases. These errors in data warehouses often result in erroneous upstream reports, and could impact business decisions negatively. Therefore, one of the critical challenges while maintaining large data warehouses is that of ensuring the quality of data in the data warehouse remains high. The process of maintaining high data quality is commonly referred to as data cleaning. In this book, we first discuss the goals of data cleaning. Often, the goals of data cleaning are not well defined and could mean different solutions in different scenarios. Toward clarifying these goals, we abstract out a common set of data cleaning tasks that often need to be addressed. This abstraction allows us to develop solutions for these common data cleaning tasks. We then discuss a few popular approaches for developing such solutions. In particular, we focus on an operator-centric approach for developing a data cleaning platform. The operator-centric approach involves the development of customizable operators that could be used as building blocks for developing common solutions. This is similar to the approach of relational algebra for query processing. The basic set of operators can be put together to build complex queries. Finally, we discuss the development of custom scripts which leverage the basic data cleaning operators along with relational operators to implement effective solutions for data cleaning tasks.

  • by Lei Chen
    35,00 €

    Due to measurement errors, transmission loss, or injected noise for privacy protection, uncertainty exists in the data of many real applications. However, query processing techniques for deterministic data cannot be directly applied to uncertain data because they do not have mechanisms to handle data uncertainty. Therefore, efficient and effective manipulation of uncertain data is a practical yet challenging research topic. In this book, we start from the data models for imprecise and uncertain data, move on to defining different semantics for queries on uncertain data, and finally discuss the advanced query processing techniques for various probabilistic queries in uncertain databases. The book serves as a comprehensive guideline for query processing over uncertain databases. Table of Contents: Introduction / Uncertain Data Models / Spatial Query Semantics over Uncertain Data Models / Spatial Query Processing over Uncertain Databases / Conclusion

  • by Divyakant Agrawal
    35,00 €

    Cloud computing has emerged as a successful paradigm of service-oriented computing and has revolutionized the way computing infrastructure is used. This success has seen a proliferation in the number of applications that are being deployed in various cloud platforms. There has also been an increase in the scale of the data generated as well as consumed by such applications. Scalable database management systems form a critical part of the cloud infrastructure. The attempt to address the challenges posed by the management of big data has led to a plethora of systems. This book aims to clarify some of the important concepts in the design space of scalable data management in cloud computing infrastructures. Some of the questions that this book aims to answer are: which systems are appropriate for a specific set of application requirements, what the research challenges in data management for the cloud are, and what is novel in the cloud for database researchers. We also aim to address one basic question: whether cloud computing poses new challenges in scalable data management or is just a reincarnation of old problems. We provide a comprehensive background study of state-of-the-art systems for scalable data management and analysis. We also identify important aspects in the design of different systems and the applicability and scope of these systems. A thorough understanding of current solutions and a precise characterization of the design space are essential for clearing the "cloudy skies of data management" and ensuring the success of DBMSs in the cloud, thus emulating the success enjoyed by relational databases in traditional enterprise settings. Table of Contents: Introduction / Distributed Data Management / Cloud Data Management: Early Trends / Transactions on Co-located Data / Transactions on Distributed Data / Multi-tenant Database Systems / Concluding Remarks

  • by Amit Sheth
    35,00 €

    After the traditional document-centric Web 1.0 and user-generated content focused Web 2.0, Web 3.0 has become a repository of an ever growing variety of Web resources that include data and services associated with enterprises, social networks, sensors, cloud, as well as mobile and other devices that constitute the Internet of Things. These pose unprecedented challenges in terms of heterogeneity (variety), scale (volume), and continuous changes (velocity), as well as present corresponding opportunities if they can be exploited. Just as semantics has played a critical role in dealing with data heterogeneity in the past to provide interoperability and integration, it is playing an even more critical role in dealing with the challenges and helping users and applications exploit all forms of Web 3.0 data. This book presents a unified approach to harness and exploit all forms of contemporary Web resources using the core principles of ability to associate meaning with data through conceptual or domain models and semantic descriptions including annotations, and through advanced semantic techniques for search, integration, and analysis. It discusses the use of Semantic Web standards and techniques when appropriate, but also advocates the use of lighter weight, easier to use, and more scalable options when they are more suitable. The authors' extensive experience spanning research and prototypes to development of operational applications and commercial technologies and products guides the treatment of the material. Table of Contents: Role of Semantics and Metadata / Types and Models of Semantics / Annotation -- Adding Semantics to Data / Semantics for Enterprise Data / Semantics for Services / Semantics for Sensor Data / Semantics for Social Data / Semantics for Cloud Computing / Semantics for Advanced Applications

  • by Sergio Greco
    42,00 €

    The chase has long been used as a central tool to analyze dependencies and their effect on queries. It has been applied to different relevant problems in database theory such as query optimization, query containment and equivalence, dependency implication, and database schema design. Recent years have seen a renewed interest in the chase as an important tool in several database applications, such as data exchange and integration, query answering in incomplete data, and many others. It is well known that the chase algorithm might be non-terminating and thus, in order for it to find practical applicability, it is crucial to identify cases where its termination is guaranteed. Another important aspect to consider when dealing with the chase is that it can introduce null values into the database, thereby leading to incomplete data. Thus, in several scenarios where the chase is used the problem of dealing with data dependencies and incomplete data arises. This book discusses fundamental issues concerning data dependencies and incomplete data with a particular focus on the chase and its applications in different database areas. We report recent results about the crucial issue of identifying conditions that guarantee the chase termination. Different database applications where the chase is a central tool are discussed with particular attention devoted to query answering in the presence of data dependencies and database schema design. Table of Contents: Introduction / Relational Databases / Incomplete Databases / The Chase Algorithm / Chase Termination / Data Dependencies and Normal Forms / Universal Repairs / Chase and Database Applications

  • by Wenfei Fan
    35,00 €

    Data quality is one of the most important problems in data management. A database system typically aims to support the creation, maintenance, and use of large amounts of data, focusing on the quantity of data. However, real-life data are often dirty: inconsistent, duplicated, inaccurate, incomplete, or stale. Dirty data in a database routinely generate misleading or biased analytical results and decisions, and lead to loss of revenues, credibility and customers. With this comes the need for data quality management. In contrast to traditional data management tasks, data quality management enables the detection and correction of errors in the data, syntactic or semantic, in order to improve the quality of the data and hence, add value to business processes. While data quality has been a longstanding problem for decades, the prevalent use of the Web has increased the risks, on an unprecedented scale, of creating and propagating dirty data. This monograph gives an overview of fundamental issues underlying central aspects of data quality, namely, data consistency, data deduplication, data accuracy, data currency, and information completeness. We promote a uniform logical framework for dealing with these issues, based on data quality rules. The text is organized into seven chapters, focusing on relational data. Chapter One introduces data quality issues. A conditional dependency theory is developed in Chapter Two, for capturing data inconsistencies. It is followed by practical techniques in Chapter Three for discovering conditional dependencies, and for detecting inconsistencies and repairing data based on conditional dependencies. Matching dependencies are introduced in Chapter Four, as matching rules for data deduplication. A theory of relative information completeness is studied in Chapter Five, revising the classical Closed World Assumption and the Open World Assumption, to characterize incomplete information in the real world. A data currency model is presented in Chapter Six, to identify the current values of entities in a database and to answer queries with the current values, in the absence of reliable timestamps. Finally, interactions between these data quality issues are explored in Chapter Seven. Important theoretical results and practical algorithms are covered, but formal proofs are omitted. The bibliographical notes contain pointers to papers in which the results were presented and proven, as well as references to materials for further reading. This text is intended for a seminar course at the graduate level. It is also intended to serve as a useful resource for researchers and practitioners who are interested in the study of data quality. The fundamental research on data quality draws on several areas, including mathematical logic, computational complexity and database theory. It has raised as many questions as it has answered, and is a rich source of questions and vitality. Table of Contents: Data Quality: An Overview / Conditional Dependencies / Cleaning Data with Conditional Dependencies / Data Deduplication / Information Completeness / Data Currency / Interactions between Data Quality Issues

  • by Tova Milo
    31,00 €

    While classic data management focuses on the data itself, research on Business Processes also considers the context in which this data is generated and manipulated, namely the processes, users, and goals that this data serves. This provides the analysts a better perspective of the organizational needs centered around the data. As such, this research is of fundamental importance. Much of the success of database systems in the last decade is due to the beauty and elegance of the relational model and its declarative query languages, combined with a rich spectrum of underlying evaluation and optimization techniques, and efficient implementations. Much like the case for traditional database research, elegant modeling and rich underlying technology are likely to be highly beneficial for the Business Process owners and their users; both can benefit from easy formulation and analysis of the processes. While there have been many important advances in this research in recent years, there is still much to be desired: specifically, there have been many works that focus on the process behavior (flow), and many that focus on its data, but only very few works have dealt with both. This book discusses the state of the art in a database approach to Business Process modeling and analysis and the progress towards a holistic flow-and-data framework for these tasks, and highlights the current gaps and research directions. Table of Contents: Introduction / Modeling / Querying Business Processes / Other Issues / Conclusion

  • by Elisa Bertino
    35,00 €

    As data represent a key asset for today's organizations, the problem of how to protect this data from theft and misuse is at the forefront of these organizations' minds. Even though today several data security techniques are available to protect data and computing infrastructures, many such techniques -- such as firewalls and network security tools -- are unable to protect data from attacks posed by those working on an organization's "inside." These "insiders" usually have authorized access to relevant information systems, making it extremely challenging to block the misuse of information while still allowing them to do their jobs. This book discusses several techniques that can provide effective protection against attacks posed by people working on the inside of an organization. Chapter One introduces the notion of insider threat and reports some data about data breaches due to insider threats. Chapter Two covers authentication and access control techniques, and Chapter Three shows how these general security techniques can be extended and used in the context of protection from insider threats. Chapter Four addresses anomaly detection techniques that are used to determine anomalies in data accesses by insiders. These anomalies are often indicative of potential insider data attacks and therefore play an important role in protection from these attacks. Security information and event management (SIEM) tools and fine-grained auditing are discussed in Chapter Five. These tools aim at collecting, analyzing, and correlating -- in real-time -- any information and event that may be relevant for the security of an organization. As such, they can be a key element in finding a solution to such undesirable insider threats. Chapter Six goes on to provide a survey of techniques for separation-of-duty (SoD). SoD is an important principle that, when implemented in systems and tools, can strengthen data protection from malicious insiders. 
However, to date, very few approaches have been proposed for implementing SoD in systems. In Chapter Seven, a short survey of a commercial product is presented, which provides different techniques for protection from malicious users with system privileges -- such as a DBA in database management systems. Finally, in Chapter Eight, the book concludes with a few remarks and additional research directions. Table of Contents: Introduction / Authentication / Access Control / Anomaly Detection / Security Information and Event Management and Auditing / Separation of Duty / Case Study: Oracle Database Vault / Conclusion

  • by Eduard C. Dragut
    37,00 €

    There are millions of searchable data sources on the Web and to a large extent their contents can only be reached through their own query interfaces. There is an enormous interest in making the data in these sources easily accessible. There are primarily two general approaches to achieve this objective. The first is to surface the contents of these sources from the deep Web and add the contents to the index of regular search engines. The second is to integrate the searching capabilities of these sources and support integrated access to them. In this book, we introduce the state-of-the-art techniques for extracting, understanding, and integrating the query interfaces of deep Web data sources. These techniques are critical for producing an integrated query interface for each domain. The interface serves as the mediator for searching all data sources in the concerned domain. While query interface integration is only relevant for the deep Web integration approach, the extraction and understanding of query interfaces are critical for both deep Web exploration approaches. This book aims to provide in-depth and comprehensive coverage of the key technologies needed to create high quality integrated query interfaces automatically. The following technical issues are discussed in detail in this book: query interface modeling, query interface extraction, query interface clustering, query interface matching, query interface attribute integration, and query interface integration. Table of Contents: Introduction / Query Interface Representation and Extraction / Query Interface Clustering and Categorization / Query Interface Matching / Query Interface Attribute Integration / Query Interface Integration / Summary and Future Research

  • by Esther Pacitti
    35,00 €

    As an alternative to traditional client-server systems, Peer-to-Peer (P2P) systems provide major advantages in terms of scalability, autonomy and dynamic behavior of peers, and decentralization of control. Thus, they are well suited for large-scale data sharing in distributed environments. Most of the existing P2P approaches for data sharing rely on either structured networks (e.g., DHTs) for efficient indexing, or unstructured networks for ease of deployment, or some combination. However, these approaches have some limitations, such as lack of freedom for data placement in DHTs, and high latency and high network traffic in unstructured networks. To address these limitations, gossip protocols, which are easy to deploy and scale well, can be exploited. In this book, we will give an overview of these different P2P techniques and architectures, discuss their trade-offs, and illustrate their use for decentralizing several large-scale data sharing applications. Table of Contents: P2P Overlays, Query Routing, and Gossiping / Content Distribution in P2P Systems / Recommendation Systems / Top-k Query Processing in P2P Systems

  • by Hweehwa Pang
    35,00 €

    In data publishing, the owner delegates the role of satisfying user queries to a third-party publisher. As the servers of the publisher may be untrusted or susceptible to attacks, we cannot assume that they would always process queries correctly, hence there is a need for users to authenticate their query answers. This book introduces various notions that the research community has studied for defining the correctness of a query answer. In particular, it is important to guarantee the completeness, authenticity and minimality of the answer, as well as its freshness. We present authentication mechanisms for a wide variety of queries in the context of relational and spatial databases, text retrieval, and data streams. We also explain the cryptographic protocols from which the authentication mechanisms derive their security properties. Table of Contents: Introduction / Cryptography Foundation / Relational Queries / Spatial Queries / Text Search Queries / Data Streams / Conclusion

  • by Boon Thau Loo
    35,00 €

    Declarative Networking is a programming methodology that enables developers to concisely specify network protocols and services, which are directly compiled to a dataflow framework that executes the specifications. Declarative networking proposes the use of a declarative query language for specifying and implementing network protocols, and employs a dataflow framework at runtime for communication and maintenance of network state. The primary goal of declarative networking is to greatly simplify the process of specifying, implementing, deploying and evolving a network design. In addition, declarative networking serves as an important step towards an extensible, evolvable network architecture that can support flexible, secure and efficient deployment of new network protocols. This book provides an introduction to basic issues in declarative networking, including language design, optimization and dataflow execution. The methodology behind declarative programming of networks is presented, including roots in Datalog, extensions for networked environments, and the semantics of long-running queries over network state. The book focuses on a representative declarative networking language called Network Datalog (NDlog), which is based on extensions to the Datalog recursive query language. An overview of declarative network protocols written in NDlog is provided, and its usage is illustrated using examples from routing protocols and overlay networks. This book also describes the implementation of a declarative networking engine and NDlog execution strategies that provide eventual consistency semantics with significant flexibility in execution. Two representative declarative networking systems (P2 and its successor RapidNet) are presented. Finally, the book highlights recent advances in declarative networking, and new declarative approaches to related problems. 
Table of Contents: Introduction / Declarative Networking Language / Declarative Networking Overview / Distributed Recursive Query Processing / Declarative Routing / Declarative Overlays / Optimization of NDlog / Recent Advances in Declarative Networking / Conclusion

  • by Marina Barsky
    29,00 €

    Nowadays, textual databases are among the most rapidly growing collections of data. Some of these collections contain a new type of data that differs from classical numerical or textual data. These are long sequences of symbols, not divided into well-separated small tokens (words). The most prominent among such collections are databases of biological sequences, which are today experiencing an unprecedented growth rate. Launched in 2008, the "1000 Genomes Project" has the ultimate goal of collecting sequences of an additional 1,500 human genomes, 500 each of European, African, and East Asian origin. This will produce an extensive catalog of human genetic variations. The size of just the raw sequences in this catalog would be about 5 terabytes. Querying strings without well-separated tokens poses a different set of challenges, typically addressed by building full-text indexes, which provide effective structures to index all the substrings of the given strings. Since full-text indexes occupy more space than the raw data, it is often necessary to use disk space for their construction. However, until recently, the construction of full-text indexes in secondary storage was considered impractical due to excessive I/O costs. Despite this, algorithms developed in the last decade demonstrated that efficient external construction of full-text indexes is indeed possible. This book is about large-scale construction and usage of full-text indexes. We focus mainly on suffix trees, and show efficient algorithms that can convert suffix trees to other kinds of full-text indexes and vice versa. There are four parts in this book. They are a mix of string searching theory with the reality of external memory constraints. The first part introduces general concepts of full-text indexes and shows the relationships between them. The second part presents the first series of external-memory construction algorithms that can handle the construction of full-text indexes for moderately large strings on the order of a few gigabytes. The third part presents algorithms that scale for very large strings. The final part examines queries that can be facilitated by disk-resident full-text indexes. Table of Contents: Structures for Indexing Substrings / External Construction of Suffix Trees / Scaling Up: When the Input Exceeds the Main Memory / Queries for Disk-based Indexes / Conclusions and Open Problems
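
To make the idea of an index over all substrings concrete, here is a minimal in-memory sketch. It uses a suffix array, a simpler relative of the suffix trees the book focuses on, built naively by sorting suffixes. It is not one of the book's external-memory algorithms, and the function names and the sample string are assumptions made for this example.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic suffix order.

    Naive construction: fine for short strings, far too slow and memory-hungry
    for genome-scale input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return all start positions of pattern in text, in increasing order.

    Every occurrence of pattern is a prefix of some suffix, and those suffixes
    form one contiguous block of the suffix array, located by two binary searches.
    """
    m = len(pattern)

    def prefix(k):
        start = suffix_array[k]
        return text[start:start + m]

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


sequence = "GATTACATTAG"  # hypothetical sample sequence
index = build_suffix_array(sequence)
tta_positions = find_occurrences(sequence, index, "TTA")
```

With this sample, `tta_positions` is `[2, 7]`. The index stores one integer per character, which illustrates why full-text indexes tend to outgrow the raw data they describe.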

  • by Nikos Mamoulis
    37,00 €

    Spatial database management deals with the storage, indexing, and querying of data with spatial features, such as location and geometric extent. Many applications require the efficient management of spatial data, including Geographic Information Systems, Computer Aided Design, and Location Based Services. The goal of this book is to provide the reader with an overview of spatial data management technology, with an emphasis on indexing and search techniques. It first introduces spatial data models and queries and discusses the main issues of extending a database system to support spatial data. It presents indexing approaches for spatial data, with a focus on the R-tree. Query evaluation and optimization techniques for the most popular spatial query types (selections, nearest neighbor search, and spatial joins) are presented for data in Euclidean spaces and spatial networks. The book concludes by demonstrating the broad applicability of spatial data management technology to a wide range of related application domains: management of spatio-temporal data and high-dimensional feature vectors, multi-criteria ranking, data mining and OLAP, privacy-preserving data publishing, and spatial keyword search. Table of Contents: Introduction / Spatial Data / Indexing / Spatial Query Evaluation / Spatial Networks / Applications of Spatial Data Management Technology

  • by Leopoldo Bertossi
    35,00 €

    Integrity constraints are semantic conditions that a database should satisfy in order to be an appropriate model of external reality. In practice, and for many reasons, a database may not satisfy those integrity constraints, and for that reason it is said to be inconsistent. However, and most likely, a large portion of the database is still semantically correct, in a sense that has to be made precise. After having provided a formal characterization of consistent data in an inconsistent database, the natural problem emerges of extracting that semantically correct data, as query answers. The consistent data in an inconsistent database is usually characterized as the data that persists across all the database instances that are consistent and minimally differ from the inconsistent instance. Those are the so-called repairs of the database. In particular, the consistent answers to a query posed to the inconsistent database are those answers that can be simultaneously obtained from all the database repairs. As expected, the notion of repair requires an adequate notion of distance that allows for the comparison of databases with respect to how much they differ from the inconsistent instance. On this basis, the minimality condition on repairs can be properly formulated. In this monograph we present and discuss these fundamental concepts, different repair semantics, algorithms for computing consistent answers to queries, and also complexity-theoretic results related to the computation of repairs and doing consistent query answering. Table of Contents: Introduction / The Notions of Repair and Consistent Answer / Tractable CQA and Query Rewriting / Logically Specifying Repairs / Decision Problems in CQA: Complexity and Algorithms / Repairs and Data Cleaning

  • by Amarnath Gupta
    35,00 €

    With the proliferation of citizen reporting, smart mobile devices, and social media, an increasing number of people are beginning to generate information about events they observe and participate in. A significant fraction of this information contains multimedia data to share the experience with their audience. A systematic information modeling and management framework is necessary to capture this widely heterogeneous, schemaless, potentially humongous information produced by many different people. This book is an attempt to examine the modeling, storage, querying, and applications of such an event management system in a holistic manner. It uses a semantic-web style graph-based view of events, and shows how this event model, together with its query facility, can be used toward emerging applications like semi-automated storytelling. Table of Contents: Introduction / Event Data Models / Implementing an Event Data Model / Querying Events / Storytelling with Events / An Emerging Application / Conclusion

  • by David Toman
    35,00 €

    Query compilation is the problem of translating user requests formulated over purely conceptual and domain specific ways of understanding data, commonly called logical designs, to efficient executable programs called query plans. Such plans access various concrete data sources through their low-level often iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and how such capabilities relate to logical design is commonly called a physical design. This book is an introduction to the fundamental methods underlying database technology that solves the problem of query compilation. The methods are presented in terms of first-order logic which serves as the vehicle for specifying physical design, expressing user requests and query plans, and understanding how query plans implement user requests. Table of Contents: Introduction / Logical Design and User Queries / Basic Physical Design and Query Plans / On Practical Physical Design / Query Compilation and Plan Synthesis / Updating Data

  • by Dan Suciu
    37,00 €

    Probabilistic databases are databases where the value of some attributes or the presence of some records are uncertain and known only with some probability. Applications in many areas such as information extraction, RFID and scientific data management, data cleaning, data integration, and financial risk assessment produce large volumes of uncertain data, which are best modeled and processed by a probabilistic database. This book presents the state of the art in representation formalisms and query processing techniques for probabilistic data. It starts by discussing the basic principles for representing large probabilistic databases, by decomposing them into tuple-independent tables, block-independent-disjoint tables, or U-databases. Then it discusses two classes of techniques for query evaluation on probabilistic databases. In extensional query evaluation, the entire probabilistic inference can be pushed into the database engine and, therefore, processed as effectively as the evaluation of standard SQL queries. The relational queries that can be evaluated this way are called safe queries. In intensional query evaluation, the probabilistic inference is performed over a propositional formula called lineage expression: every relational query can be evaluated this way, but the data complexity dramatically depends on the query being evaluated, and can be #P-hard. The book also discusses some advanced topics in probabilistic data management such as top-k query processing, sequential probabilistic databases, indexing and materialized views, and Monte Carlo databases. Table of Contents: Overview / Data and Query Model / The Query Evaluation Problem / Extensional Query Evaluation / Intensional Query Evaluation / Advanced Techniques

  • by Suyash Gupta
    64,00 €

    Since the introduction of Bitcoin, the first widespread application driven by blockchain, the interest of the public and private sectors in blockchain has skyrocketed. In recent years, blockchain-based fabrics have been used to address challenges in diverse fields such as trade, food production, property rights, identity management, aid delivery, health care, and fraud prevention. This widespread interest follows from the fundamental concepts on which blockchains are built, which together embed the notion of trust. 1. Blockchains provide data transparency. Data in a blockchain is stored in the form of a ledger, which contains an ordered history of all the transactions. This facilitates oversight and auditing. 2. Blockchains ensure data integrity by using strong cryptographic primitives. This guarantees that transactions accepted by the blockchain are authenticated by their issuer, are immutable, and cannot be repudiated by the issuer. This ensures accountability. 3. Blockchains are decentralized, democratic, and resilient. They use consensus-based replication to decentralize the ledger among many independent participants. Thus, a blockchain can operate in a completely decentralized manner and does not require trust in a single authority. Additions to the chain are performed by consensus, in which all participants have a democratic voice in maintaining the integrity of the blockchain. Due to the usage of replication and consensus, blockchains are also highly resilient to malicious attacks, even when a significant portion of the participants are malicious. This further increases the opportunity for fairness and equity through democratization. These fundamental concepts and the technologies behind them (a generic ledger-based data model, cryptographically ensured data integrity, and consensus-based replication) prove to be a powerful and inspiring combination, a catalyst to promote computational trust. 
In this book, we present an in-depth study of blockchain, unraveling its revolutionary promise to instill computational trust in society, all carefully tailored to a broad audience including students, researchers, and practitioners. We offer a comprehensive overview of theoretical limitations and practical usability of consensus protocols while examining the diverse landscape of how blockchains are manifested in their permissioned and permissionless forms.

  • by Apostolos N. Papadopoulos
    53,00 €

    This book is a gentle introduction to dominance-based query processing techniques and their applications. The book aims to present fundamental as well as some advanced issues in the area in a precise, but easy-to-follow, manner. Dominance is an intuitive concept that can be used in many different ways in diverse application domains. The concept of dominance is based on the values of the attributes of each object. An object p dominates another object q if p is better than q. This goodness criterion may differ from one user to another. However, all decisions boil down to the minimization or maximization of attribute values. In this book, we will explore algorithms and applications related to dominance-based query processing. The concept of dominance has a long history in finance and multi-criteria optimization. However, the introduction of the concept to the database community in 2001 inspired many researchers to contribute to the area. Therefore, many algorithmic techniques have been proposed for the efficient processing of dominance-based queries, such as skyline queries, k-dominant queries, and top-k dominating queries, just to name a few.

  • by Zoi Kaoudi
    35,00 €

    Resource Description Framework (or RDF, in short) is set to deliver many of the original semi-structured data promises: flexible structure, optional schema, and rich, flexible Uniform Resource Identifiers as a basis for information sharing. Moreover, RDF is uniquely positioned to benefit from the efforts of scientific communities studying databases, knowledge representation, and Web technologies. As a consequence, the RDF data model is used in a variety of applications today for integrating knowledge and information: in open Web or government data via the Linked Open Data initiative, in scientific domains such as bioinformatics, and more recently in search engines and personal assistants of enterprises in the form of knowledge graphs. Managing such large volumes of RDF data is challenging due to the sheer size, heterogeneity, and complexity brought by RDF reasoning. To tackle the size challenge, distributed architectures are required. Cloud computing is an emerging paradigm massively adopted in many applications requiring distributed architectures for the scalability, fault tolerance, and elasticity features it provides. At the same time, interest in massively parallel processing has been renewed by the MapReduce model and many follow-up works, which aim at simplifying the deployment of massively parallel data management tasks in a cloud environment. In this book, we study the state-of-the-art RDF data management in cloud environments and parallel/distributed architectures that were not necessarily intended for the cloud, but can easily be deployed therein. After providing a comprehensive background on RDF and cloud technologies, we explore four aspects that are vital in an RDF data management system: data storage, query processing, query optimization, and reasoning. We conclude the book with a discussion on open problems and future directions.

  • by Xin Huang
    53,00 €

    Communities serve as basic structural building blocks for understanding the organization of many real-world networks, including social, biological, collaboration, and communication networks. Recently, community search over graphs has attracted significantly increasing attention, from small, simple, and static graphs to big, evolving, attributed, and location-based graphs. In this book, we first review the basic concepts of networks, communities, and various kinds of dense subgraph models. We then survey the state of the art in community search techniques on various kinds of networks across different application areas. Specifically, we discuss cohesive community search, attributed community search, social circle discovery, and geo-social group search. We highlight the challenges posed by different community search problems. We present their motivations, principles, methodologies, algorithms, and applications, and provide a comprehensive comparison of the existing techniques. This book finally concludes by listing publicly available real-world datasets and useful tools for facilitating further research, and by offering further readings and future directions of research in this important and growing area.

  • by Goetz Graefe
    69,00 €

    This book contains a number of chapters on transactional database concurrency control. A two-sentence summary of the volume's entire sequence of chapters is this: traditional locking techniques can be improved in multiple dimensions, notably in lock scopes (sizes), lock modes (increment, decrement, and more), lock durations (late acquisition, early release), and lock acquisition sequence (to avoid deadlocks). Even if some of these improvements can be transferred to optimistic concurrency control, notably a fine granularity of concurrency control with serializable transaction isolation including phantom protection, pessimistic concurrency control is categorically superior to optimistic concurrency control, i.e., independent of application, workload, deployment, hardware, and software implementation.

  • by Daniel Oliveira
    64,00 €

    Workflows may be defined as abstractions used to model the coherent flow of activities in the context of an in silico scientific experiment. They are employed in many domains of science such as bioinformatics, astronomy, and engineering. Such workflows usually present a considerable number of activities and activations (i.e., tasks associated with activities) and may need a long time for execution. Due to the continuous need to store and process data efficiently (making them data-intensive workflows), high-performance computing environments allied to parallelization techniques are used to run these workflows. At the beginning of the 2010s, cloud technologies emerged as a promising environment to run scientific workflows. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. More recently, Data-Intensive Scalable Computing (DISC) frameworks (e.g., Apache Spark and Hadoop) and environments emerged and are being used to execute data-intensive workflows. DISC environments are composed of processors and disks in large commodity computing clusters connected using high-speed communications switches and networks. The main advantage of DISC frameworks is that they support and grant efficient in-memory data management for large-scale applications, such as data-intensive workflows. However, the execution of workflows in cloud and DISC environments raises many challenges such as scheduling workflow activities and activations, managing produced data, collecting provenance data, etc. Several existing approaches deal with the challenges mentioned earlier. Thus, there is a real need for understanding how to manage these workflows and the various big data platforms that have been developed and introduced. 
As such, this book can help researchers understand how linking workflow management with Data-Intensive Scalable Computing can help in understanding and analyzing scientific big data. In this book, we aim to identify and distill the body of work on workflow management in clouds and DISC environments. We start by discussing the basic principles of data-intensive scientific workflows. Next, we present two workflows that are executed in a single site and multi-site clouds taking advantage of provenance. Afterward, we go towards workflow management in DISC environments, and we present, in detail, solutions that enable the optimized execution of the workflow using frameworks such as Apache Spark and its extensions.

  • by Foto Afrati
    85,00 €

    The topic of using views to answer queries has been popular for a few decades now, as it cuts across domains such as query optimization, information integration, data warehousing, website design and, recently, database-as-a-service and data placement in cloud systems. This book assembles foundational work on answering queries using views in a self-contained manner, with an effort to choose material that constitutes the backbone of the research. It presents efficient algorithms and covers the following problems: query containment; rewriting queries using views in various logical languages; equivalent rewritings and maximally contained rewritings; and computing certain answers in the data-integration and data-exchange settings. Query languages that are considered are fragments of SQL, in particular select-project-join queries, also called conjunctive queries (with or without arithmetic comparisons or negation), and aggregate SQL queries. This second edition includes two new chapters that refer to tree-like data and respective query languages. Chapter 8 presents the data model for XML documents and the XPath query language, and Chapter 9 provides a theoretical presentation of tree-like data model and query language where the tuples of a relation share a tree-structured schema for that relation and the query language is a dialect of SQL with evaluation techniques appropriately modified to fit the richer schema.

  • by Mohammad Sadoghi
    58,00 €

    The last decade has brought groundbreaking developments in transaction processing. This resurgence of an otherwise mature research area has been spurred by the diminishing cost per GB of DRAM, which allows many transaction processing workloads to be entirely memory-resident. This shift demanded a pause to fundamentally rethink the architecture of database systems. The data storage lexicon has now expanded beyond spinning disks and RAID levels to include the cache hierarchy, memory consistency models, cache coherence and write invalidation costs, NUMA regions, and coherence domains. New memory technologies promise fast non-volatile storage and expose uncharted trade-offs for transactional durability, such as exploiting byte-addressable hot and cold storage through persistent programming that promotes simpler recovery protocols. In the meantime, the plateauing single-threaded processor performance has brought massive concurrency within a single node, first in the form of multi-core, and now with many-core and heterogeneous processors. The exciting possibility to reshape the storage, transaction, logging, and recovery layers of next-generation systems on emerging hardware has prompted the database research community to vigorously debate the trade-offs between specialized kernels that narrowly focus on transaction processing performance vs. designs that permit transactionally consistent data accesses from decision support and analytical workloads. In this book, we aim to classify and distill the new body of work on transaction processing that has surfaced in the last decade to navigate researchers and practitioners through this intricate research subject.
