We have a client with a multi-data center installation of Cassandra. While I was in another of our data centers working on the client's cluster, I attempted to run a rebuild and got a stream error.
The Cassandra servers are running the following package:
cassandra30-3.0.9-1.noarch
I ran the following command:
nodetool -u <username> -pw <password> rebuild
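As an aside, nodetool rebuild also accepts an optional source datacenter name, which restricts streaming to nodes in that DC. In this cluster, rebuilding a Las Vegas node explicitly from the Los Angeles datacenter would look something like the sketch below (verify the DC name with nodetool status first):

```shell
# Rebuild this node, streaming data only from nodes in the LAX03 datacenter
nodetool -u <username> -pw <password> rebuild -- LAX03
```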
After a little while I got the following error and set out to determine what had changed. These machines had been working just fine, and nothing configuration-wise should have changed. The usual line that everyone says, right?
java.lang.RuntimeException: Error while rebuilding node: Stream failed
    at org.apache.cassandra.service.StorageService.rebuild(StorageService.java:1107)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
    at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at com.sun.jmx.remote.security.MBeanServerAccessController.invoke(MBeanServerAccessController.java:468)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1408)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346)
    at sun.rmi.transport.Transport$1.run(Transport.java:200)
    at sun.rmi.transport.Transport$1.run(Transport.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Running nodetool status showed:
Datacenter: LAS01
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns  Host ID                               Rack
?N  xxx.xxx.76.19   24.93 GB  256     ?     826b90b3-7d30-40c9-b92a-5b26372e7698  R1
DN  xxx.xxx.76.18   22.11 GB  256     ?     85849d96-c3f9-496f-a31b-96633655fc94  R1
UN  xxx.xxx.76.20   16.96 GB  256     ?     c3cbe83d-5d8b-4e20-9e21-5c76f0723aa2  R1

Datacenter: LAX03
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns  Host ID                               Rack
UN  xxx.xxx.107.22  16.97 GB  256     ?     9c09ac3b-1540-475a-8e41-a159019b4d6a  G4
UN  xxx.xxx.107.23  14.16 GB  256     ?     3a4b1145-6baa-49b0-9675-2e262d3deda0  G4
UN  xxx.xxx.107.21  17.58 GB  256     ?     80a87349-ab97-4513-9985-20ebb7b96cee  G4
The Cassandra cluster spans the Los Angeles and Las Vegas data centers. You'll notice that the first node in the Las Vegas data center is showing "?N" and another is showing "DN", which is a problem. After some investigation, it turned out the servers had been rebooted during a power upgrade in the cabinet.
The issue was that iptables and the APF firewall had been reactivated on reboot and were blocking traffic. Remote connections were fine, but on the local network the firewalls were blocking messages between the Cassandra servers. After disabling the firewalls, the issue cleared up.
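Rather than disabling the firewalls outright, another option is to allow Cassandra's inter-node traffic through iptables. A sketch, assuming Cassandra's default ports (7000 for inter-node storage/gossip, 7001 for TLS inter-node, 9042 for CQL clients, 7199 for JMX) and a placeholder subnet you would replace with your cluster's networks:

```shell
# Inspect the active iptables rules to see what is being dropped
iptables -L -n -v

# Allow Cassandra traffic from the other nodes in the cluster
# (192.0.2.0/24 is a placeholder; substitute each DC's actual subnet)
iptables -A INPUT -p tcp -s 192.0.2.0/24 \
    -m multiport --dports 7000,7001,9042,7199 -j ACCEPT
```

With rules like these in place, the firewall can stay enabled without cutting off gossip and streaming between nodes.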
Running nodetool status again after turning off the firewalls, everything looks normal:
Datacenter: LAS01
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns  Host ID                               Rack
UN  xxx.xxx.76.19   24.93 GB  256     ?     826b90b3-7d30-40c9-b92a-5b26372e7698  R1
UN  xxx.xxx.76.18   22.11 GB  256     ?     85849d96-c3f9-496f-a31b-96633655fc94  R1
UN  xxx.xxx.76.20   16.96 GB  256     ?     c3cbe83d-5d8b-4e20-9e21-5c76f0723aa2  R1

Datacenter: LAX03
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns  Host ID                               Rack
UN  xxx.xxx.107.22  16.97 GB  256     ?     9c09ac3b-1540-475a-8e41-a159019b4d6a  G4
UN  xxx.xxx.107.23  14.16 GB  256     ?     3a4b1145-6baa-49b0-9675-2e262d3deda0  G4
UN  xxx.xxx.107.21  17.58 GB  256     ?     80a87349-ab97-4513-9985-20ebb7b96cee  G4
If you have more questions about Cassandra hosting, feel free to reach out.