Java Lock Performance Analysis | 褚哥说

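Locks are an essential concept in multithreaded Java programming, and the synchronized keyword is, at its core, an implementation of a mutual-exclusion lock.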

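This article compares the performance of locks with that of the types in the atomic package in a multithreaded setting, and then discusses the general steps the JVM goes through when acquiring a lock.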

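Suppose an integer must be incremented with ++ 1,000,000 times. With a lock (this article uses the synchronized keyword, but other Java locks such as ReentrantLock or ReadWriteLock would also work), it can be implemented as follows. First, create a MyInt class representing the object under data contention in the concurrent setting; this object needs locking: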

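package problem1;
public class MyInt {
    private int intValue;
    public MyInt(int _intValue) {
        intValue = _intValue;
    }
    // Even a single ++ must be guarded: it is a read-modify-write sequence
    public void inc() {
        synchronized (this) {
            intValue++;
        }
    }
    // Increments only while below limit; the check and the increment run under one lock
    public boolean incWithLimit(int limit) {
        synchronized (this) {
            if (intValue < limit) {
                intValue++;
                return true;
            }
            return false;
        }
    }
    // Synchronized so that readers see the latest value written under the lock
    public synchronized int getIntValue() {
        return intValue;
    }
    public synchronized void setIntValue(int intValue) {
        this.intValue = intValue;
    }
}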
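The opening paragraph mentions that ReentrantLock could be used instead of synchronized. As a sketch (the class name LockedInt is hypothetical, not from the original), the same capped counter guarded by java.util.concurrent.locks.ReentrantLock could look like this. The unlock call must sit in a finally block, because unlike synchronized, an explicit lock is not released automatically when the body throws.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical counterpart of MyInt that uses an explicit ReentrantLock
// instead of the synchronized keyword.
public class LockedInt {
    private final ReentrantLock lock = new ReentrantLock();
    private int intValue;

    public LockedInt(int initialValue) {
        intValue = initialValue;
    }

    // Increments only while below the limit; returns false once the limit is reached.
    public boolean incWithLimit(int limit) {
        lock.lock();
        try {
            if (intValue < limit) {
                intValue++;
                return true;
            }
            return false;
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    public int getIntValue() {
        lock.lock();
        try {
            return intValue;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedInt counter = new LockedInt(0);
        Runnable task = () -> { while (counter.incWithLimit(1_000_000)) { } };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(counter.getIntValue()); // 1000000
    }
}
```

Functionally this matches the synchronized version; ReentrantLock mainly adds options synchronized lacks, such as tryLock() and a fairness setting.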

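Note that the inc method needs a lock even though it consists of a single ++ operation, because the JVM does not execute it atomically: it reads the value, adds 1, and writes the result back, exactly as for x = x + 1, so two threads can read the same old value and one of the updates is lost. Since the total number of increments must be capped at 1,000,000, the code actually uses the incWithLimit method, which performs the check and the increment under the same lock. A second class wraps MyInt and sets the cap: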

package problem1;
public class IncWithLock implements Runnable {
    private MyInt intValue;
    private final int maxM;
    private int count;   // number of increments performed by this task
    public IncWithLock(MyInt _intValue) {
        this(_intValue, 1000000);
    }
    public IncWithLock(MyInt _intValue, int _maxM) {
        intValue = _intValue;
        maxM     = _maxM;
        count    = 0;
    }
    @Override
    public void run() {
        // Alternative using inc(), also correct: the check and the increment
        // both run while holding intValue's monitor
        // while (true) {
        //     synchronized (intValue) {
        //         if (intValue.getIntValue() < maxM) {
        //             intValue.inc();
        //         } else {
        //             break;
        //         }
        //     }
        // }
        while (intValue.incWithLimit(maxM)) { count++; }
        // String threadName = Thread.currentThread().getName();
        // System.out.println("Thread [" + threadName + "]'s value: "
        //         + intValue.getIntValue() + "|" + count);
    }
    public int getIntValue() {
        return intValue.getIntValue();
    }
    public void setIntValue(MyInt intValue) {
        this.intValue = intValue;
    }
    public int getMaxM() {
        return maxM;
    }
}

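In the code above, the commented-out while(true) block implements the same loop with the inc method, and it also works in testing. Last is the main method, which is simple given the preparation above: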

package problem1;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class Problem1 {
    public static void main(String[] args) throws InterruptedException {
        int numN = 3;    // could be 3, 30, 300, 1000
        MyInt intVal = new MyInt(0);
        ExecutorService exec = Executors.newFixedThreadPool(numN);
        final long start = System.currentTimeMillis();
        for (int i = 0; i < numN; i++) {
            exec.execute(new IncWithLock(intVal));
        }
        // Stop accepting new tasks, then block (instead of busy-waiting)
        // until every submitted task has finished
        exec.shutdown();
        exec.awaitTermination(1, TimeUnit.HOURS);
        long t = System.currentTimeMillis() - start;
        System.out.println("Use [" + t + "] ms to get " + intVal.getIntValue());
    }
}

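The code above uses an ExecutorService as a thread pool, which makes the number of threads easy to control. intVal is the contended data, shared by N threads (N = 3, 30, 300, 1000). Once all worker threads have finished, the elapsed time is printed.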
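The note on inc says that even a single ++ needs a lock. The sketch below (class and method names are hypothetical) runs the same increment with and without synchronized. Because count++ is a read-modify-write sequence, unsynchronized threads can read the same old value and overwrite each other's result, so the unsynchronized total typically ends up below the expected value; the exact shortfall varies from run to run and is not guaranteed on every run or JVM, which is why no expected output is shown for it.

```java
// Hypothetical demo: the same ++ with and without synchronization.
public class LostUpdateDemo {
    private int unsafeCount = 0;
    private int safeCount = 0;

    void unsafeInc() { unsafeCount++; }            // read, add, write: not atomic
    synchronized void safeInc() { safeCount++; }   // guarded by this object's monitor

    synchronized int getSafeCount() { return safeCount; }

    public static void main(String[] args) throws InterruptedException {
        final int threads = 4;
        final int perThread = 1_000_000;
        LostUpdateDemo demo = new LostUpdateDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    demo.unsafeInc();
                    demo.safeInc();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join(); // join() also makes the workers' writes visible here
        }
        System.out.println("expected: " + threads * perThread);
        System.out.println("unsafe:   " + demo.unsafeCount);
        System.out.println("safe:     " + demo.getSafeCount());
    }
}
```

The synchronized counter always reaches threads * perThread; the unsynchronized one has no such guarantee.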

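That is the locked version. The AtomicInteger class from the atomic package avoids locks entirely: its compareAndSet method compares the current value with an expected value and, if they match, sets the new value, all as a single atomic hardware instruction (for example CMPXCHG on x86), so no lock is needed. The lock-free MyInt is: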

package problem2;
import java.util.concurrent.atomic.AtomicInteger;
public class MyInt {
    private AtomicInteger intValue;
    public MyInt(AtomicInteger _intValue) {
        intValue = _intValue;
    }
    // Atomic increment; incrementAndGet() retries its CAS internally until it succeeds
    public void inc() {
        intValue.incrementAndGet();
    }
    // CAS retry loop: if another thread changed the value first, re-read and try again
    public boolean incWithLimit(int limit) {
        int curr = intValue.get();
        while (curr < limit) {
            if (intValue.compareAndSet(curr, curr + 1)) {
                return true;
            }
            curr = intValue.get();
        }
        return false;
    }
    public AtomicInteger getIntValue() {
        return intValue;
    }
    public void setIntValue(AtomicInteger intValue) {
        this.intValue = intValue;
    }
}
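The retry loop in incWithLimit is the standard CAS pattern. Since Java 8, AtomicInteger also offers getAndUpdate, which runs such a loop internally. A sketch of an equivalent method, assuming Java 8 or later (the class name LimitedCounter is hypothetical):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical alternative to the lock-free MyInt, using getAndUpdate (Java 8+).
public class LimitedCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Atomically increments if below the limit. getAndUpdate retries the
    // update function with CAS until it succeeds and returns the old value.
    public boolean incWithLimit(int limit) {
        int previous = value.getAndUpdate(v -> v < limit ? v + 1 : v);
        return previous < limit; // true only if this call performed the increment
    }

    public int get() {
        return value.get();
    }

    public static void main(String[] args) {
        LimitedCounter c = new LimitedCounter();
        int successes = 0;
        while (c.incWithLimit(3)) {
            successes++;
        }
        System.out.println(successes + " " + c.get()); // prints "3 3"
    }
}
```

The update function may be applied more than once under contention, so it must be free of side effects, as the lambda here is.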

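The lock-free MyInt differs from the locked one in a few ways. First, the field is no longer a plain int but an AtomicInteger; second, no lock is used, and the synchronized keyword does not appear. Because most of the logic is encapsulated in MyInt, its wrapper class IncWithoutLock differs very little from the locked IncWithLock: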

package problem2;
public class IncWithoutLock implements Runnable {
    private MyInt intValue;
    private final int maxM;
    private int count;   // number of increments performed by this task
    public IncWithoutLock(MyInt _intValue) {
        this(_intValue, 1000000);
    }
    public IncWithoutLock(MyInt _intValue, int _maxM) {
        intValue = _intValue;
        maxM     = _maxM;
        count    = 0;
    }
    @Override
    public void run() {
        // Should not use the following code:
        // get() and inc() are separate calls, so another thread can increment
        // in between (a check-then-act race) and the total can exceed maxM
        // while (true) {
        //     if (intValue.getIntValue().get() < maxM) {
        //         intValue.inc();
        //     } else {
        //         break;
        //     }
        // }
        while (intValue.incWithLimit(maxM)) { count++; }
        // String threadName = Thread.currentThread().getName();
        // System.out.println("Thread [" + threadName + "]'s value: "
        //         + intValue.getIntValue() + "|" + count);
    }
    public int getIntValue() {
        return intValue.getIntValue().get();
    }
    public void setIntValue(MyInt intValue) {
        this.intValue = intValue;
    }
    public int getMaxM() {
        return maxM;
    }
}

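Note that the commented-out while(true) block can produce wrong results in the lock-free version. Even if get() and inc() are each thread-safe on their own, they are called separately, so other threads can increment the value between the check and the increment; several threads can then pass the check at maxM - 1 and all increment, pushing the total past maxM. For the lock-free version, therefore, incWithLimit, which combines the check and the increment in one CAS loop, is the correct choice. The main method is almost identical: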

package problem2;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
public class Problem2 {
    public static void main(String[] args) throws InterruptedException {
        int numN = 3;      // could be 3, 30, 300, 1000
        MyInt intVal = new MyInt(new AtomicInteger(0));
        ExecutorService exec = Executors.newFixedThreadPool(numN);
        final long start = System.currentTimeMillis();
        for (int i = 0; i < numN; i++) {
            exec.execute(new IncWithoutLock(intVal));
        }
        // Stop accepting new tasks, then block (instead of busy-waiting)
        // until every submitted task has finished
        exec.shutdown();
        exec.awaitTermination(1, TimeUnit.HOURS);
        long t = System.currentTimeMillis() - start;
        System.out.println("Use [" + t + "] ms to get " + intVal.getIntValue());
    }
}

Running both versions with 3, 30, 300, and 1000 threads gave the following times on my machine (each the average of 5 runs):

Threads	With lock (ms)	Lock-free (ms)
3	78.8	59.8
30	77.6	62.4
300	184.2	179.4
1000	2186.0	2077.6

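As the table shows, the lock-free version is consistently somewhat faster than the locked one, most noticeably with small thread counts: about 24% faster at 3 threads and about 20% at 30, versus under 5% at 300 and 1000. The results also show that more threads are not always better: past a certain point, thread creation and context switching add their own overhead, and lock contention becomes increasingly severe.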

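That concludes the performance comparison between locked and lock-free code. Next, the general steps the JVM goes through to acquire a lock: biased lock -> lightweight lock -> spin lock -> heavyweight lock. At run time there are also optimizations such as lock coarsening and lock elision.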

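Start with the heavyweight lock. It resembles locking in the operating system, but is implemented by the JVM's object monitor (Monitor). When a thread needs a contended resource, it first enters a contention queue. If it fails to get the lock, it waits in that queue for the next chance; if it succeeds, it becomes the resource's Owner and excludes every other thread that wants the resource. If the owner calls wait(), it releases the monitor and blocks until another thread calls notify() or notifyAll(), after which it re-enters the competition for the lock. These queues are lock-free queues built with CAS. Because several queues must be maintained, and blocked threads must be suspended and woken up through the operating system, the heavyweight lock is fairly expensive, but it is also the most robust.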
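The Owner / wait / notify cycle described above is what Java code sees through synchronized, wait() and notifyAll(). A minimal sketch, using a hypothetical one-slot mailbox that is not from the original: a thread that calls wait() releases the monitor, and after notifyAll() it must compete for the monitor again before it continues, which is why each condition is rechecked in a loop.

```java
// Hypothetical one-slot mailbox built on an object's monitor.
public class Mailbox {
    private String message; // null means the slot is empty

    public synchronized void put(String msg) throws InterruptedException {
        while (message != null) {
            wait();            // releases the monitor until notified
        }
        message = msg;
        notifyAll();           // wake threads waiting in take()
    }

    public synchronized String take() throws InterruptedException {
        while (message == null) {
            wait();            // re-acquires the monitor before returning
        }
        String msg = message;
        message = null;
        notifyAll();           // wake threads waiting in put()
        return msg;
    }

    public static void main(String[] args) throws InterruptedException {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    box.put("msg-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        for (int i = 0; i < 3; i++) {
            System.out.println(box.take()); // msg-0, msg-1, msg-2 in order
        }
        producer.join();
    }
}
```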

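Since the heavyweight lock is costly, the somewhat lighter spin lock avoids maintaining multiple queues. When a thread fails to get the lock, it spins for a few iterations (busy-waits) and then tries again. If the other thread holds the lock only briefly, the retry is likely to succeed. No complex data structures are involved, so when the retry succeeds, a spin lock outperforms a heavyweight lock. If the lock still cannot be acquired after spinning, the work is handed over to the heavyweight lock, which then costs a few wasted spins more than using the heavyweight lock directly. To mitigate this, JDK 1.6 introduced adaptive spinning, which adjusts how long to spin before retrying. For example, spinning defaults to 10 iterations, but if an object tends to be held longer, the JVM gradually lengthens the spin, say to 100 iterations; if that raises the success rate, it keeps spinning 100 iterations for that object. If acquisition rarely succeeds no matter how the spin count is adjusted, spinning is skipped altogether.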
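The busy-wait idea can be sketched as a user-level spin lock built on a CAS loop. This is not how HotSpot implements its internal spinning, which happens inside the JVM; it only illustrates retrying instead of blocking. The class name is hypothetical, and Thread.onSpinWait assumes Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical user-level spin lock: busy-waits instead of blocking.
public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Keep retrying the CAS until this thread flips false -> true.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the CPU that we are spinning (Java 9+)
        }
    }

    public void unlock() {
        locked.set(false); // volatile write publishes the critical section's writes
    }

    private static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        SpinLock spin = new SpinLock();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                spin.lock();
                try {
                    counter++;
                } finally {
                    spin.unlock();
                }
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(counter); // 200000
    }
}
```

Unlike the JVM's adaptive spinning, this lock never gives up and parks the thread, so it only pays off when the lock is held very briefly; it is also not reentrant.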

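The spin lock still has to acquire the lock before reading or writing the object; the lightweight lock further optimizes lock acquisition with CAS. CAS (compare-and-swap, exposed in Java as compareAndSet) reads and writes a memory location in a single atomic instruction. When a thread needs the object, the lightweight lock uses CAS to update the lock bits in the object header. If the CAS succeeds, the object was not held by another thread, so this thread now owns it: the object header is set to point to a lock record in this thread's stack frame, and the lock record refers back to the object, so the thread stack and the object point to each other; then the synchronized block runs. If the CAS fails, the JVM checks whether the object header already points into this thread's stack. If so, this thread already owns the object, the lock is reentrant, and the synchronized block runs. Otherwise another thread holds the object, lightweight locking fails, and the lock is escalated to the next level, the spin lock.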
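The CAS-then-check-reentry logic above can be modeled in plain Java. This is a toy model under stated assumptions: an AtomicReference<Thread> stands in for the object header, and returning false stands in for escalating the lock. HotSpot actually stores a pointer to a lock record in the owner's stack frame, not a Thread reference. The class name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of lightweight locking; 'owner' plays the role of the object header.
public class ThinLockModel {
    private final AtomicReference<Thread> owner = new AtomicReference<>(null);
    private int recursions = 0; // only touched by the owning thread

    // Returns true if the lock was acquired, false if it would need to escalate.
    public boolean tryLock() {
        Thread me = Thread.currentThread();
        if (owner.compareAndSet(null, me)) {
            return true;               // header was unlocked: CAS succeeded
        }
        if (owner.get() == me) {
            recursions++;              // CAS failed, but we already own it: reentry
            return true;
        }
        return false;                  // owned by another thread: escalate in the real JVM
    }

    public void unlock() {
        if (owner.get() != Thread.currentThread()) {
            throw new IllegalMonitorStateException();
        }
        if (recursions > 0) {
            recursions--;
        } else {
            owner.set(null);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThinLockModel lock = new ThinLockModel();
        System.out.println(lock.tryLock()); // true  (first acquire)
        System.out.println(lock.tryLock()); // true  (reentrant)
        Thread other = new Thread(() -> System.out.println(lock.tryLock())); // false
        other.start();
        other.join();
        lock.unlock();
        lock.unlock();
    }
}
```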

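A lightweight lock needs a CAS whether or not there is contention, and after a failed CAS it must also check for reentry; biased locking removes both costs. A biased lock is biased toward the first thread that acquires it. When that thread first acquires it, the object header is switched to biased mode and the thread's ID is written into it. On later acquisitions, if the request comes from the same thread, that thread gets the object directly, with no synchronization and no CAS. If the request comes from a different thread, the thread ID does not match, biased locking fails, and the lock is escalated to lightweight mode. Biased locking is adaptive too: if the JVM sees that most biased-lock requests for a class end in escalation, it disables biased locking for that class. Note that newer JDKs have moved away from this feature: JEP 374 deprecated biased locking and disabled it by default in JDK 15.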
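Biased locking's fast path can be modeled the same way. In this toy sketch (hypothetical class), a one-time CAS installs the bias and an AtomicReference<Thread> stands in for the thread ID in the header; the owner's later acquisitions are a plain comparison with no CAS, and any other thread gets false, standing in for revocation and fallback to a lightweight lock. Real revocation requires the JVM to pause the biased thread at a safepoint, and the model ignores unlocking, so it illustrates only the acquisition decision.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of biased locking: only the acquisition decision is modeled.
public class BiasModel {
    private final AtomicReference<Thread> biasedTo = new AtomicReference<>(null);

    // true  = acquired on the biased fast path (or by installing the bias)
    // false = biased toward another thread; the real JVM would revoke the
    //         bias and fall back to lightweight locking
    public boolean acquire() {
        Thread me = Thread.currentThread();
        Thread current = biasedTo.get();
        if (current == me) {
            return true;                       // already biased to us: no CAS needed
        }
        if (current == null && biasedTo.compareAndSet(null, me)) {
            return true;                       // first acquirer: one CAS installs the bias
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        BiasModel model = new BiasModel();
        System.out.println(model.acquire()); // true: bias installed
        System.out.println(model.acquire()); // true: fast path
        Thread other = new Thread(() -> System.out.println(model.acquire())); // false
        other.start();
        other.join();
    }
}
```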

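From this analysis, the lighter the lock, the more easily the conditions it relies on are violated. The heavyweight lock cannot be violated; the spin lock is violated when the retry after spinning fails; the lightweight lock is violated as soon as the first attempt to acquire the object fails; the biased lock is violated as soon as a second thread requests the resource. Each violation escalates the lock to the next heavier level, a process called lock inflation.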

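Besides these locks, the JVM also applies lock elision and lock coarsening, both performed by the just-in-time (JIT) compiler. Lock elision: when the JIT determines at run time that the object being locked can never be accessed by another thread, it removes the lock operations altogether, since they cannot protect against anything. This judgment relies mainly on escape analysis, which checks whether heap data can escape and be accessed by other threads; the technique is still evolving. Lock coarsening: when a sequence of consecutive operations repeatedly locks the same object, the JIT locks it just once and places the intervening code inside a single synchronized region, because every lock acquisition has its own cost and fewer acquisitions improve performance. Both optimizations cut the number of lock operations: elision removes them entirely, while coarsening merges many into one larger region.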
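Both optimizations are applied by the JIT without any change to the source, so the code below only shows shapes where they may apply; whether HotSpot actually elides or coarsens a given lock depends on the JVM version, flags, and inlining decisions. In localConcat, the StringBuffer never escapes the method, so escape analysis can let the JIT drop the locks taken by its synchronized append methods. In repeatedAppend, the back-to-back synchronized calls on the same shared StringBuffer are candidates for being merged into one locked region. The class and method names are hypothetical.

```java
// Code shapes where the JIT *may* apply lock elision or lock coarsening.
public class LockOptimizationShapes {
    // Lock elision candidate: 'sb' is local and never escapes, so no other
    // thread can ever lock it; StringBuffer.append is synchronized.
    static String localConcat(String a, String b, String c) {
        StringBuffer sb = new StringBuffer();
        sb.append(a);
        sb.append(b);
        sb.append(c);
        return sb.toString();
    }

    // Lock coarsening candidate: consecutive synchronized calls on the same
    // shared object can be merged into a single lock/unlock pair.
    static void repeatedAppend(StringBuffer shared) {
        shared.append("x");
        shared.append("y");
        shared.append("z");
    }

    public static void main(String[] args) {
        StringBuffer shared = new StringBuffer();
        repeatedAppend(shared);
        System.out.println(localConcat("a", "b", "c") + " " + shared); // abc xyz
    }
}
```

The program's output is the same whether or not the optimizations fire; they only change how many monitor operations actually execute.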

This article is mainly based on the following articles¹,²,³,⁴.