Sunday, July 4, 2010

Best Face Cleanser Salicylic Acid

Crystal Clear Alternatives

A new generation of low-cost, high-performance database software is emerging quickly to challenge the dominance of SQL in distributed processing and in applications with large volumes of data. Some companies have already replaced the rich functionality of SQL with these new options, which let them create, work with and manage huge data sets.

The case for NoSQL is that different implementations of web, business and cloud computing applications have different requirements for their databases. Not every application requires strict data consistency, for example.
In addition, when an application uses data distributed across hundreds or even thousands of servers, the numbers (the money question) point to server software with no license fee rather than to per-processor licensing. With licensing costs out of the way, you can scale horizontally on commodity hardware, or opt for cloud computing services and avoid a large upfront outlay. Earlier tools do not always make this easy. The challenges to SQL's hegemony come from specialized products built from scratch for large-scale analytics and document storage, and for building systems that favor high availability over consistency when data is partitioned.
Applications such as online transaction processing, business intelligence, customer relationship management, document processing and social networking do not have the same needs for data, queries or index types, nor do they have equivalent requirements for consistency, scalability and security.

For example, BI (business intelligence) applications run queries for analysis and decision making that can take advantage of bitmap indexes for operations on databases that are gigabytes or terabytes in size. Web analytics, drug discovery, financial modeling and similar applications turn to distributed systems for efficient processing of data sets on the gigabyte or terabyte scale.
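To make the bitmap index idea concrete, here is a minimal sketch in Python. The table, the column names and the helper functions are all invented for illustration, and a Python integer stands in for the compressed bitmaps a real BI engine would use.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bitmap.

    Bit i is set when rows[i][column] equals that value.
    """
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return dict(index)


def matching_rows(bitmap):
    """Return the row numbers whose bits are set in `bitmap`."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical fact table, invented for this example.
sales = [
    {"region": "west", "product": "laptop"},
    {"region": "east", "product": "phone"},
    {"region": "west", "product": "phone"},
    {"region": "west", "product": "laptop"},
]

by_region = build_bitmap_index(sales, "region")
by_product = build_bitmap_index(sales, "product")

# "region = west AND product = laptop" becomes a single bitwise AND.
hits = by_region["west"] & by_product["laptop"]
print(matching_rows(hits))  # [0, 3]
```

The appeal for analytic queries is that a filter over several low-cardinality columns becomes a few bitwise operations instead of a scan over every row.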

OLTP demands reliability. Social networking applications such as Facebook, and sites such as Amazon.com, have adopted BASE properties (basically available, soft state, eventually consistent) in preference to the familiar ACID properties (atomicity, consistency, isolation, durability) to support their massive communities of millions of web users. These differences are one reason why non-relational NoSQL databases, column stores and document-oriented databases have gained momentum. They are more like specialized tools than the Swiss Army knife of functionality that SQL platforms offer. System architects should consider the specialized features and functions an application needs when choosing a database. NoSQL databases can be built specifically for functions such as BI, OLTP, CRM, social networking and data warehousing, and include features such as scalability, partitioning, security and versatility.

Scalability and High Availability

For cloud computing and for web sites with high volumes of data, such as eBay, Amazon, Twitter and Facebook, scalability and high availability are essential. In fact, they are the reason why distributed databases have relaxed their consistency requirements.

Systems operating in high-availability environments must survive software, hardware and network failures, and be ready to scale despite unpredictable demand for computing resources. One approach to building such systems is a distributed database with a shared-nothing architecture and horizontal partitioning. Elasticity and sharding (partitioning), both characteristic of NoSQL, are horizontal scaling solutions that provide availability and the capacity to process large volumes of data.

A variety of data stores are gaining popularity for building scalable, resilient web applications in environments such as public or private clouds. Distributed key-value stores are a great fit when you do not need SQL rules, strict consistency, complex queries, integrated queuing or the ability to run database operations that exceed the available RAM. The new data stores offer low-latency scalability for applications that do not require elaborate queries and analytic capabilities. Amazon developed SimpleDB and Google developed Bigtable. Other low-latency open source options include Cassandra, Hypertable, MongoDB, Project Voldemort, Redis and Tokyo Tyrant. Dynamo, the data store used by Amazon S3, contained 102 billion objects as of March.
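The horizontal partitioning described above can be sketched in a few lines of Python. This is a toy, not how any of the products named here work: the class name is invented, plain dictionaries stand in for shared-nothing servers, and keys are placed with simple modulo hashing.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys over independent shards."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate shared-nothing server.
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, key):
        # Use a stable hash so a key maps to the same shard on every run.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self.shard_for(key)].get(key, default)


store = ShardedKeyValueStore(shard_count=4)
for user_id in range(1000):
    store.put(f"user:{user_id}", {"id": user_id})

print(store.get("user:42"))
print([len(shard) for shard in store.shards])  # keys per shard
```

One design caveat: with modulo placement, changing the number of shards moves most keys to a different shard. Systems in the Dynamo family use consistent hashing to limit that movement when nodes join or leave.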


Options

Google developed Bigtable to distribute data across thousands of servers and scale to data sets on the order of petabytes. Applications such as web indexing, Google Earth, Google Maps, Blogger, YouTube and Gmail use it. The collection of 100 million videos on YouTube requires 600 TB of space. Bigtable is proprietary, but its data model exists in the open source implementations Hypertable, Cassandra and HBase. Bigtable can be used as input or output for MapReduce, which enables distributed processing of files or databases using map and reduce functions.

Dynamo was created to provide a highly available key-value data store that accepts changes without losing data when servers fail or the network has problems. Amazon later built SimpleDB as a key-value store available to customers of Amazon Web Services. SimpleDB is limited to 256 attribute name-value pairs, domains no larger than 10 GB and databases no larger than 1 TB. Amazon says that copies of the data are updated within a second to maintain consistency. SimpleDB uses a SQL-like query language.

Project Voldemort, an open source clone of Amazon's Dynamo, is a key-value data store that supports versioning, eventual consistency (where a read may occasionally return stale data as the price of scaling), and automatic partitioning and replication. Keys and values can be complex objects such as maps or lists. Project Voldemort supports building distributed data stores offline. LinkedIn developers created it, and sites such as Lookery use it.

Cassandra combines the Bigtable data model with the distributed design of Dynamo. It provides eventual consistency, not the strict consistency that e-commerce and stock trading transactions require. Instead of storing data in row-major or column-major sequence, Cassandra uses the ColumnFamily ordering inspired by Bigtable.

Cassandra can be distributed geographically across multiple data centers, such as the availability zones of Amazon EC2. Bulk loading can be done with Hadoop.
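The eventual consistency mentioned above is easier to see in a small simulation. The sketch below is a deliberate simplification: the `Replica` class and `sync` function are invented for this example, and a single integer version stands in for the vector clocks that Dynamo-style stores such as Project Voldemort use.

```python
class Replica:
    """One copy of the data; each key keeps the newest (version, value) seen."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        current = self.data.get(key)
        # Accept the write only if it is newer than what this replica holds.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so each replica keeps the newest."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


east, west = Replica(), Replica()
east.write("cart", ["book"], version=1)
sync(east, west)

east.write("cart", ["book", "pen"], version=2)
print(west.read("cart"))  # stale until the next sync: ['book']

sync(east, west)
print(west.read("cart"))  # converged: ['book', 'pen']
```

Between the second write and the second sync, a client reading from `west` gets the old value. That window is the trade the article describes: the replicas stay available and converge later instead of coordinating on every write.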


The Cost of Scaling

SimpleGeo, a provider of geographic data, uses Apache Cassandra, an open source NoSQL database, to avoid the licensing costs of commercial database managers as part of its effort to scale to a multi-database architecture.
"We run a 50-node cluster spanning three data centers on Amazon's EC2 service for about $10,000 a month," says Joe Stump, the company's chief technology officer, who previously used Cassandra at Digg. "In contrast, premium MySQL support would cost about $5,000 per node per year, or $250,000 a year, more than double the cost of the Cassandra deployment," Stump added, "and Microsoft SQL Server can cost up to $55,000 per processor per year."

"The $10,000 is an operating expense as opposed to a capital expense, and that is 'a nice little tax,'" he says.

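The comparison in Stump's figures can be checked with a few lines of arithmetic. The numbers come from the quote; the variable names are just labels chosen for this sketch, and the comparison sets a hosting bill against a support fee exactly as the quote does.

```python
NODES = 50
EC2_CLUSTER_PER_MONTH = 10_000        # USD for the whole cluster
MYSQL_SUPPORT_PER_NODE_YEAR = 5_000   # USD per node

cassandra_per_year = EC2_CLUSTER_PER_MONTH * 12
mysql_support_per_year = MYSQL_SUPPORT_PER_NODE_YEAR * NODES
ratio = mysql_support_per_year / cassandra_per_year

print(cassandra_per_year)      # 120000
print(mysql_support_per_year)  # 250000
print(round(ratio, 2))         # 2.08
```

So the $250,000 support figure is a little over twice the $120,000 a year spent running the cluster, which matches the "more than double" in the quote.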
Cassandra provides availability and scalability for a number of well-known sites, including the large communities of Twitter and Facebook. When the number of Twitter users took off, it migrated from MySQL to a combination of MySQL/memcached and Cassandra running on 45 nodes. This mixed environment now handles 50 million tweets per day. Facebook adds about 60 million photos a week using Cassandra. At Digg, Cassandra manages about 3 TB of data.

Digg announced its move from MySQL to Cassandra with great fanfare. The main reason for changing platforms was "the increasing difficulty of building a high-performance, write-intensive application on a data set that is growing quickly, with no end in sight," said John Quinn, Digg's vice president of engineering. "This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead," says Quinn.

"Our system is growing quickly and needs performance and redundancy across multiple data centers, and the ability to add capacity or replace failed nodes immediately," said Quinn. As for data consistency, according to Quinn, Digg's engineers can implement application-level controls more efficiently with Cassandra than with MySQL.

Tokyo Tyrant is an open source database server, accompanied by a text search engine, that has a following in the NoSQL community. It is a key-value database with hash and B-tree index structures, able to insert 1 million records in 0.4 seconds and run 58,000 record queries per second. It supports asynchronous replication, transaction processing with ACID properties and write-ahead logging. It can be used from various programming languages, including Perl, Java, Ruby and PHP. Production deployments include Scribd and Mixi, the Japanese equivalent of Facebook. LightCloud turns Tokyo Tyrant into a distributed database by adding a horizontally scalable universal hashing layer. The social journal Plurk uses this LightCloud option with Tokyo Tyrant.

Document Stores

MongoDB and CouchDB are examples of JSON-type document databases, while a large number of products store documents encoded in XML. MongoDB is a popular product based on a client-server database architecture, with B-tree indexes and communication over TCP/IP.

MongoDB manages collections of JSON objects and provides scalability through sharding and replication. Queries are JSON objects, and it also provides 2-D geospatial search. There are drivers and APIs for various languages, including JavaScript, Java, Perl, PHP, Python, Ruby and C++. Production deployments of MongoDB include Justin.tv, The New York Times, Disqus, Electronic Arts and Business Insider.
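To give a feel for queries expressed as JSON objects, here is a small sketch that runs entirely in memory. It is not MongoDB's driver or query engine: `matches` and `find` are invented helpers that handle only exact field equality, and the documents are made up for the example.

```python
def matches(document, query):
    """True when every field in `query` equals the same field in `document`."""
    return all(document.get(field) == wanted for field, wanted in query.items())


def find(collection, query):
    """Return the documents in `collection` that satisfy `query`."""
    return [doc for doc in collection if matches(doc, query)]


# Hypothetical collection of JSON-style documents.
articles = [
    {"title": "Sharding basics", "section": "databases", "year": 2010},
    {"title": "Intro to views", "section": "databases", "year": 2009},
    {"title": "Map tiles", "section": "geo", "year": 2010},
]

# The query is itself a JSON-style object: field names mapped to wanted values.
query = {"section": "databases", "year": 2010}
print([doc["title"] for doc in find(articles, query)])  # ['Sharding basics']
```

Because the query has the same shape as the documents, an application can build, store and pass queries around with the same tools it uses for data.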

CouchDB is a schema-free data store that provides a REST-style API for CRUD operations (create, retrieve, update, delete) on documents. CouchDB can perform lookups by key and can work with MapReduce for non-trivial queries. Views can also be generated using JavaScript. Building a view can take time, but subsequent queries that use it are very fast. CouchDB supports multimaster replication and distribution of data across multiple instances. It manages documents in JSON format, uses the SpiderMonkey JavaScript engine, and is well suited to web applications built with Erlang, HTTP, JavaScript, PHP, Python and Ruby.
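As an illustration of REST-style CRUD on documents, the sketch below builds the HTTP requests without sending them. The server address, the `articles` database name and the helper function names are assumptions made for this example; the requests follow CouchDB's pattern of addressing a document as `/database/document-id`.

```python
import json
from urllib.request import Request

# Assumed local server and a hypothetical database named "articles".
BASE_URL = "http://localhost:5984/articles"


def create_or_update(doc_id, document):
    """PUT stores a document; an update must carry the current "_rev" field."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        f"{BASE_URL}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def retrieve(doc_id):
    """GET fetches the document as JSON."""
    return Request(f"{BASE_URL}/{doc_id}", method="GET")


def delete(doc_id, revision):
    """DELETE removes the document revision named in the query string."""
    return Request(f"{BASE_URL}/{doc_id}?rev={revision}", method="DELETE")


request = create_or_update("nosql-intro", {"title": "NoSQL options"})
print(request.get_method(), request.full_url)
```

To actually run a request against a server, pass it to `urllib.request.urlopen`. The example stops short of that so it works without a database running.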

For applications where XML documents and XQuery are preferred over JSON documents and JavaScript queries, there are a number of open source and commercial products. In addition to the XML document repositories, there are dozens of XQuery processors. The list includes Apache Xindice, Berkeley DB, eXist-db, IBM DB2, MonetDB, MarkLogic, Sedna, TigerLogic XDMS and webMethods Tamino.

Distributed Processing



When it comes to distributed processing of massive data sets, Hadoop MapReduce has become the technology of choice. Researchers at Yahoo, for example, used it on 3,800 nodes to sort a petabyte of data in 16.25 hours.

Google developed MapReduce and recently patented it. The map function produces a list of key-value pairs, which the reduce function turns into a list of values.
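The map and reduce steps can be sketched on a single machine with the classic word count. This is not Hadoop's or Google's API: the `map_reduce` driver and the two functions passed to it are invented for this illustration, and a real framework would run the map and reduce calls in parallel across many nodes.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    """Collapse the list of 1s for a word into a total."""
    return sum(counts)


lines = ["to be or not to be", "to scale or not to scale"]
counts = map_reduce(lines, word_mapper, count_reducer)
print(counts["to"], counts["scale"])  # 4 2
```

The grouping step in the middle is what the framework contributes. The programmer supplies only the two small functions, which is why the model distributes so well.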

The Apache Hadoop project includes the Hadoop Distributed File System (HDFS), MapReduce, the HBase database, the Pig analysis language and the Hive query and analysis tool, among others. HBase is a distributed column store, modeled after Google Bigtable, that can serve as input or output for MapReduce.

HBase is one of many column stores competing in the analytics and business intelligence market. Storing tables in column-major order provides substantial performance improvements over tables stored in row-major order. Benefits such as improved locality and cache performance improve read performance, but write performance suffers. Other column stores include Sybase IQ, Vertica and C-Store, an open source collaboration among several universities.
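The difference between row-major and column-major storage can be shown with ordinary Python lists. The table and helper names are invented for this sketch, and in-memory lists only approximate what a column store does on disk.

```python
# Row-major: each record is stored together.
rows = [
    {"id": 1, "region": "west", "amount": 120},
    {"id": 2, "region": "east", "amount": 80},
    {"id": 3, "region": "west", "amount": 200},
]


def to_columns(rows):
    """Column-major: all the values of one column are stored together."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def append_row(columns, row):
    """A single insert has to touch every column's storage."""
    for name, value in row.items():
        columns[name].append(value)


columns = to_columns(rows)

# An aggregate reads one contiguous list and never looks at "id" or "region".
print(sum(columns["amount"]))  # 400

append_row(columns, {"id": 4, "region": "east", "amount": 50})
print(sum(columns["amount"]))  # 450
```

This is the read and write trade-off in miniature: the aggregate scans only the column it needs, while adding one record means a write to each column.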

The growing interest in semantic search and linked data has put the spotlight on RDF (Resource Description Framework) triple stores. The options include AllegroGraph, Bigdata, Garlik, Jena, Ontotext BigOWLIM, OpenLink Virtuoso, Oracle 11g and Sesame. Several of these have been deployed on Amazon EC2 to exploit the distributed processing power of the cloud. Raytheon BBN researchers also used Hadoop MapReduce to create a distributed RDF store that supports SPARQL query processing.

Restrictions and Best Practices

To ensure the durability and integrity of data, SQL databases provide transaction logging and replication. NoSQL options need something similar. Cassandra, for example, supports both. Tokyo Cabinet and HBase support write-ahead logging. Tokyo Cabinet and CouchDB support master-master replication, while MongoDB supports master-slave replication and replica pairs.

Architects using document-oriented databases must decide how to store each type of document and whether or not to have a separate database for each type. Instead of separate databases, they can include an attribute that specifies the type, or use libraries.

The new generation of data stores is intended to serve the needs of availability and scalability, although certain restrictions apply in order to achieve greater efficiency. With Amazon SimpleDB, for example, a query can run for at most 5 seconds. If the query takes longer, SimpleDB returns a partial result and the application must issue further queries to complete it. SimpleDB restricts the result of a query to a maximum of 250 items, while Google recently increased the maximum result of an App Engine datastore query to 1,000 items.
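Handling partial results usually comes down to a loop that keeps asking for the next page. The sketch below simulates this with an in-memory list: `fake_select`, `select_all`, the query text and the 600 fake items are all invented for illustration, and only the 250-item page size is taken from the limit mentioned above. It is not the SimpleDB client API.

```python
# Hypothetical data set standing in for a remote domain.
FAKE_ITEMS = [f"item-{i}" for i in range(600)]


def fake_select(expression, next_token=None, page_size=250):
    """Stand-in for a store that returns at most `page_size` items per call.

    Returns (items, token); token is None when there is nothing left to fetch.
    """
    start = next_token or 0
    end = min(start + page_size, len(FAKE_ITEMS))
    token = end if end < len(FAKE_ITEMS) else None
    return FAKE_ITEMS[start:end], token


def select_all(expression, select=fake_select):
    """Keep issuing follow-up queries until the store reports no more pages."""
    items, token = select(expression)
    while token is not None:
        page, token = select(expression, next_token=token)
        items.extend(page)
    return items


results = select_all("select * from mydomain")
print(len(results))  # 600
```

The application, not the data store, is responsible for stitching the pages together, so code that assumes a single call returns everything will silently work with incomplete data.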

In horizontally partitioned systems, queries that need joins across shards are expensive, so designing partitioning algorithms requires skill and knowledge of data usage patterns. When you need complex queries with aggregation, NoSQL databases are not a good option, but they can be a data source for separate solutions that handle the analysis. Organizations that use key-value data stores sometimes need indexing and SQL-style query capabilities. They can use other software that supports indexing and querying, such as Apache Lucene. Whether your organization uses SQL or NoSQL databases, it is a good idea to use version control and separate databases for testing and production.

For all the areas that NoSQL options address, we are still left with the question of which database software to choose. The answer depends on the basics: how much data will be stored, and of what type? Will it be used for complex queries? How many concurrent users will there be? Will the database scale as the number of users and the volume of data increase? SQL or NoSQL, settle these questions first.

Via Dr. Dobbs

