Skills Matter is hosting the first annual NoSQL eXchange this November. Jim Webber has kindly agreed to be the Program Lead for this conference, and together we are working on a great day of learning and sharing of skills on NoSQL technologies.
Acunu has just released a public beta of the Acunu Storage Platform. It includes the Acunu Storage Core; a modified version of Apache Cassandra whose back end has been changed to use the Acunu Storage Core; and an enterprise-level management stack for deploying and managing the entire system.
Berlin Buzzwords 2011 is just around the corner, running June 6th through the 10th. The conference, which bills itself as the "Conference of High Scalability", has a large and impressive speaker list. In addition to the talks, several hackathons will run throughout the conference. One that caught my attention is the Cassandra hackathon, which takes place June 8th/9th. Space is limited, so sign up now if you want a spot.
For more information visit: Cassandra Hackathon
DataStax’ Brisk is Open-Source and Available Now for Download
BURLINGAME, Calif. – May 9, 2011 – Today, DataStax, the commercial leader in Apache Cassandra™, released DataStax’ Brisk – a second-generation open-source Hadoop distribution that eliminates the key operational complexities with deploying and running Hadoop and Hive in production. Brisk is powered by Cassandra and offers a single platform containing a low-latency database for extremely high-volume web and real-time applications, while providing tightly coupled Hadoop and Hive analytics.
GeoIQ announced today at the Data 2.0 conference the availability of GeoIQ Connect.
GeoIQ Connect is the first and only geospatial product that connects to both traditional relational databases like Oracle and MySQL, and multiple NoSQL object stores like HBase and MongoDB for geospatial visualization and analysis.
Here is the list of fully supported databases:
- Oracle (Spatial & Non-spatial)
Jake Luciani, an engineer at DataStax, has a short but interesting presentation that compares a PostgreSQL deployment to a Cassandra deployment. One interesting tidbit from the presentation is that version 0.8 of Cassandra will offer CQL (Cassandra Query Language), a SQL-like query language.
I always enjoy these types of posts. Dave Beckett of Digg has written a post describing the technologies used to build Digg. Digg has not had an easy go of things since v4 was released; you'll recall that Kevin Rose threw Cassandra under the bus during that whole fiasco.
Despite that, Cassandra is listed as one of the datastore technologies used by the site, and Redis makes an appearance as well. Here is what they use each data store for:
A 119-page March 2011 report from the President's Council of Advisors on Science and Technology (PCAST) titled "Designing a Digital Future" notes that the Zettabyte Age is upon us because of a "proliferation of sensors and new data sources" that are causing exponential growth in data volumes, and concludes that "every Federal agency needs to have a 'Big Data' strategy."
Yesterday DataStax announced a new product, Brisk, which puts a target directly on the back of Hadoop, or more precisely on HDFS. The Register has an interesting article discussing the marriage between these two unlikely bedfellows.
A couple of interesting points and quotes from the article:
- The idea is to offer a single platform that provides both a low-latency database for "realtime" web-scale applications and the sort of heavy data analysis you get with Hadoop.
- Brisk includes both Hadoop MapReduce and Hive, letting you run epic-number-crunching jobs across commodity-hardware clusters
- DataStax is promising to open source the platform under an Apache licence within 45 days.
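To make the MapReduce half of that pitch a little more concrete, here is a minimal, single-process Python sketch of the map, shuffle, and reduce phases, counting page views over rows shaped loosely like a Cassandra column family (row key mapped to columns). The row layout, function names, and sample data are all hypothetical; this is not Brisk's API, and a real Hadoop job would run these phases distributed across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical rows, loosely shaped like a Cassandra column family:
# row key (a user) -> {column name (a timestamp): value (a page)}.
# This is an illustration only, not Brisk's storage format.
Row = Tuple[str, Dict[str, str]]


def map_phase(rows: Iterable[Row]) -> List[Tuple[str, int]]:
    """Emit (page, 1) for every page-view column in every row."""
    pairs = []
    for _user, columns in rows:
        for _timestamp, page in columns.items():
            pairs.append((page, 1))
    return pairs


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts emitted for each key."""
    return {key: sum(values) for key, values in groups.items()}


rows = [
    ("alice", {"2011-05-09T10:00": "/home", "2011-05-09T10:01": "/pricing"}),
    ("bob", {"2011-05-09T10:02": "/home"}),
]
print(reduce_phase(shuffle(map_phase(rows))))  # {'/home': 2, '/pricing': 1}
```

The point of keeping the phases as separate functions is that each one maps onto a distinct stage a Hadoop cluster would parallelize; the in-memory version only shows the data flow.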
DataStax' Brisk Provides Unified Big-Data Platform for Low-Latency Applications and Hadoop/Hive Analytics
BURLINGAME, Calif., March 23, 2011 /PRNewswire/ -- GIGAOM STRUCTURE CONF. -- Today, industry leaders in the big data community converged on New York City to discuss the best technologies for managing and harnessing ever-increasing volumes of data. At the Structure Big Data conference today, DataStax, the commercial leader in Apache Cassandra™, unveiled Brisk, a new distribution that enhances the Hadoop and Hive platform with scalable low-latency data capabilities. This results in a single platform that can act as the low-latency database for extremely high-volume web and real-time applications while providing tightly coupled Hadoop and Hive analytics.
MySQL is currently hosting a poll in which they have asked respondents to answer the following question:
Which open source NoSQL databases are you using?
There aren't many responses yet (112 at the time of writing), but the results are still interesting. Note that respondents could select multiple databases. Here are the current results, as of 3/22 at 10:45pm, for some of the NoSQL databases:
Constant Contact, the email marketing program, has integrated social media into its capabilities using Cassandra and Puppet, allowing Constant Contact users to easily add Facebook and Twitter links to their email marketing campaigns.
The main challenge they faced was that social media integration brought 10x-100x more data than email alone, which led them to adopt NoSQL, along with Puppet and DevOps practices, to save time and money.
Last week we had a post discussing the release of Gemini Mobile Technologies' Flume-Cassandra Log Processor. Gemini Mobile has now released a slide deck describing the log processor in much greater detail.
Some of the key benefits that Gemini Mobile is touting about the log processor include:
SriSatish Ambati has an interesting presentation that discusses the JVM scaling issues that come up when working with big data.
FOSTER CITY, Calif., March 3, 2011 -- Gemini Mobile Technologies ("Gemini") released a Real-Time Log Processing System based on Flume and Cassandra ("Flume-Cassandra Log Processor") as open source today. The Flume-Cassandra Log Processor enables massive volumes of production system logs to be collected and processed into graphical reports, in real-time. In addition, logs from multiple data centers can be simultaneously aggregated and analyzed in a single database. With its ability for real-time analysis at unprecedented volumes, Gemini's Flume-Cassandra Log Processor enables businesses to vastly improve both the quality and timeliness of business intelligence gained from their online operations. This dramatic scalability at low cost and small footprint is enabled by NOSQL (Not Only SQL) technology, which originates from Cloud Storage technologies at Google, Facebook, and Amazon.
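The core idea, rolling logs from several data centers into time-bucketed counters that a report can read directly, can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not Gemini's code: the event fields, the one-row-per-minute layout, the data center names, and the `datacenter:status` counter names are all hypothetical stand-ins for what a Cassandra column family might hold.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class LogEvent(NamedTuple):
    """A hypothetical parsed log line; the fields are illustrative only."""
    datacenter: str
    epoch_seconds: int
    status: int


def minute_bucket(epoch_seconds: int) -> int:
    """Truncate a Unix timestamp to the start of its minute."""
    return epoch_seconds - epoch_seconds % 60


def aggregate(events: Iterable[LogEvent]) -> Dict[int, Dict[str, int]]:
    """Roll events from every data center into per-minute counters.

    The outer key plays the role of a row key (one row per minute) and the
    inner keys act like counter columns, e.g. "dc-east:200" or "total:500",
    so a report can read one row per minute without rescanning raw logs.
    """
    rows: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for event in events:
        row = rows[minute_bucket(event.epoch_seconds)]
        row[f"{event.datacenter}:{event.status}"] += 1  # per-data-center count
        row[f"total:{event.status}"] += 1               # cross-data-center count
    return {minute: dict(cols) for minute, cols in rows.items()}


events = [
    LogEvent("dc-east", 1299110400, 200),
    LogEvent("dc-east", 1299110430, 500),
    LogEvent("dc-west", 1299110410, 200),
    LogEvent("dc-west", 1299110470, 200),
]
for minute, counters in sorted(aggregate(events).items()):
    print(minute, counters)
```

Keying rows by time bucket is what makes the "single database" view of multiple data centers cheap here: every data center's events for a given minute land in the same row, so cross-site totals are maintained as the logs arrive rather than computed later.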