[TensorFlow & Python] I'd like to break a Python statement apart to inspect it.
Posted by: HDNua / Date: Sat, 2016/12/17 - 1:37 PM
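I'm a computer science undergraduate who is comfortable with programming. I'm learning Python for the first time and picking up TensorFlow alongside it. I'm currently reading Chapter 3, "Clustering," of "텐서플로 첫걸음" (First Contact with TensorFlow) by Jordi Torres.
I've only been learning Python for a few days, so I'm having some trouble rewriting a statement in a different form, and I'm posting here for help.
This is the statement I'm trying to break apart.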
# find mean value.
## means = tf.concat(0, [tf.reduce_mean(tf.gather(vectors, tf.reshape(tf.where(tf.equal(assignments, c)), [1, -1])), reduction_indices=[1]) for c in xrange(k)])
The statement is long and complicated, so I'd like to break it apart as follows.
print "Finding mean value..."

# find mean value.
## means = tf.concat(0, [tf.reduce_mean(tf.gather(vectors, tf.reshape(tf.where(tf.equal(assignments, c)), [1, -1])), reduction_indices=[1]) for c in xrange(k)])
for c in xrange(k):
    tf_equal = tf.equal(assignments, c) #
    tf_where = tf.where(tf_equal) #
    tf_reshape = tf.reshape(tf_where, [1, -1]) #
    tf_gather = tf.gather(vectors, tf_reshape) #
    tf_reduce_mean = tf.reduce_mean(tf_gather, reduction_indices=[1]) #
    tf_concat_range = tf_reduce_mean #
means = tf.concat(0, tf_concat_range) #
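The part around tf_concat_range is obviously decomposed incorrectly, but I'm not yet familiar enough with Python to see how to fix it so that it is equivalent to the original statement.
(You don't need to explain what the code means.)
The full source is below.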
https://github.com/…/ma…/chapter2-6/chapter3_kmeans.py.ipynb
# import numerical python library.
import numpy as np

# preparations.
num_points = 8 # 2000
vectors_set = []
for i in xrange(num_points): # vectors_set appends 2000 2-dimensional vectors.
    if np.random.random() > 0.5:
        vectors_set.append([np.random.normal(0.0, 0.9), np.random.normal(0.0, 0.9)])
    else:
        vectors_set.append([np.random.normal(3.0, 0.5), np.random.normal(1.0, 0.5)])

# use seaborn visualizer package and pandas data controller package.
import matplotlib.pyplot as plot
import pandas as pd
import seaborn as sns
# df = pd.DataFrame({"x": [v[0] for v in vectors_set], "y": [v[1] for v in vectors_set]})
# sns.lmplot("x", "y", data=df, fit_reg=False, size=6)
# plot.show()

# use TensorFlow.
import tensorflow as tf
vectors = tf.constant(vectors_set) # create constant tensor with randomly created data.
k = 4 # define K as 4.
centroides = tf.Variable(tf.slice(tf.random_shuffle(vectors), [0, 0], [k, -1])) # TensorFlow selects K centroides.

vectors_set = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [1.1, 2.2], [3.3, 4.4], [5.5, 6.6], [7.7, 8.8]]
vectors = tf.constant(vectors_set)
centroides = tf.Variable([[3.0, 4.0], [5.0, 6.0], [1.0, 2.0], [5.5, 6.6]])

## !!!!! NOTE !!!!!
## vectors is from tf.constant(vectors_set) which makes constant tensor (it means that it does not change input vector's dimension)
## tf.random_shuffle does shuffle input vectors based on first dimension.
## tf.slice removes tensor's designated parts. It makes centroides as [k, 2] dimensional matrix.

# expand tensors' dimensions to make tensors 3-dimensional data, which makes it possible to operate subtraction between 2 tensors.
expanded_vectors = tf.expand_dims(vectors, 0) # change 2-dimension matrix vectors(2000x2) to 3-dimension matrix(D0x2000x2).
expanded_centroides = tf.expand_dims(centroides, 1) # change 2-dimension matrix centroides(4x2) to 3-dimension matrix(4xD1x2).

## !!!!! NOTE !!!!!
## tf.sub function can know how to subtract both tensors elements itself using its broadcasting feature.
## 1-dimension tensor is operated iteratively based on other tensors correspond to matching dimension.
## In this case, expanded_vectors has 3-dimension matrix which size is (D0x2000x2) and expanded_centroides's one is (4xD1x2).
## tf.sub calculates with these tensors supposing that D0 is 4 and D1 is 2000. Third dimension's size is both equal and can be calculated as-is.

# find Euclidean distances.
## assignments = tf.argmin(tf.reduce_sum(tf.square(tf.sub(expanded_vectors, expanded_centroides)), 2), 0)
diff = tf.sub(expanded_vectors, expanded_centroides) # subtract 2 tensors element-wise.
square_diff = tf.square(diff) # get squared difference value. It is also element-wise.
distances = tf.reduce_sum(square_diff, 2) # reduce dimension 2(D2); ((xc-xi)^2, (yc-yi)^2) -> ((xc-xi)^2+(yc-yi)^2) for all (4x2000) elements.
assignments = tf.argmin(distances, 0) # tf.argmin returns the index with the smallest value across dimensions of a tensor.

print "Finding mean value..."

# find mean value.
## means = tf.concat(0, [tf.reduce_mean(tf.gather(vectors, tf.reshape(tf.where(tf.equal(assignments, c)), [1, -1])), reduction_indices=[1]) for c in xrange(k)])
for c in xrange(k):
    tf_equal = tf.equal(assignments, c) #
    tf_where = tf.where(tf_equal) #
    tf_reshape = tf.reshape(tf_where, [1, -1]) #
    tf_gather = tf.gather(vectors, tf_reshape) #
    tf_reduce_mean = tf.reduce_mean(tf_gather, reduction_indices=[1]) #
    tf_concat_range = tf_reduce_mean #
means = tf.concat(0, tf_concat_range) #
# update_centroides = tf.assign(centroides, means)

# initializations.
init_op = tf.initialize_all_variables()
session = tf.Session()
session.run(init_op)

for step in xrange(3):
    ## use these lines to debug each tensor variables' outputs.
    ## print step, "th loop time; "
    ## re-write this line into one line.
    ## _, centroid_values, assignment_values,
    ## vs, cs, evs, ecs, out_diff, out_sdiff, out_dist =
    ## session.run([update_centroides, centroides, assignments,
    ## vectors, centroides, expanded_vectors, expanded_centroides, diff, square_diff, distances])
    ## =================================================
    '''
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "vectors: "
    print vs
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "centors: "
    print cs
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "expanded_vectors: "
    print evs
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "expanded_centors: "
    print ecs
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "diff: "
    print out_diff
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "square_diff: "
    print out_sdiff
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "distances: "
    print out_dist
    '''
    ## =================================================
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print centroid_values
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print assignment_values
    print ""

# executions.
data = {"x": [], "y": [], "cluster": []}
for i in xrange(len(assignment_values)):
    data["x"].append(vectors_set[i][0])
    data["y"].append(vectors_set[i][1])
    data["cluster"].append(assignment_values[i])

# end of program.
df = pd.DataFrame(data)
sns.lmplot("x", "y", data=df, fit_reg=False, size=6, hue="cluster", legend=False)
plot.show()
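For anyone who wants to check the distance and assignment steps by hand, here is a minimal pure-Python sketch of what the NOTE comments in the listing describe. It is not TensorFlow code: plain lists stand in for tensors, it is written for Python 3 rather than the Python 2 used in the listing, and the variable name centroids is my own spelling. The data are the eight fixed vectors and four fixed centroids from the listing.

```python
# Python 3 sketch; plain lists stand in for tensors (an assumption,
# not the TensorFlow API used in the listing).
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0],
           [1.1, 2.2], [3.3, 4.4], [5.5, 6.6], [7.7, 8.8]]
centroids = [[3.0, 4.0], [5.0, 6.0], [1.0, 2.0], [5.5, 6.6]]

# distances[c][i] is the squared Euclidean distance between centroid c
# and vector i. Pairing every centroid with every vector is what the
# broadcast subtraction in the listing does in one step.
distances = [
    [sum((v_j - c_j) ** 2 for v_j, c_j in zip(v, c)) for v in vectors]
    for c in centroids
]

# For each vector, pick the index of the closest centroid. This plays
# the role of taking the argmin along the centroid axis.
assignments = [
    min(range(len(centroids)), key=lambda c: distances[c][i])
    for i in range(len(vectors))
]

print(assignments)
```

The distances table has one row per centroid and one column per vector, which matches the (4 x 8) shape the listing would produce for this data.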
Thank you for reading.
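I gave it a rough try.
I don't know TensorFlow well, so I have no idea what the semantics are; I just picked arbitrary names for the temporary variables.
Rename them to match their meaning before you use them.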
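To make the general pattern concrete, here is a minimal sketch of turning a list comprehension into an explicit loop. It is not TensorFlow code and not necessarily what was posted in this thread: it is Python 3, plain lists stand in for tensors, the helper functions are hypothetical stand-ins for the tf calls named in their docstrings, and the data values are made up for illustration.

```python
# Python 3 sketch. The helpers are hypothetical stand-ins, not TensorFlow APIs.
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
assignments = [0, 1, 0, 1]  # made-up cluster index for each vector
k = 2


def indices_where_equal(values, target):
    """Stand-in for tf.where(tf.equal(values, target))."""
    return [i for i, v in enumerate(values) if v == target]


def gather(rows, indices):
    """Stand-in for tf.gather: select rows by index."""
    return [rows[i] for i in indices]


def column_mean(rows):
    """Stand-in for tf.reduce_mean over the gathered rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


# One-line form: the comprehension builds a list with one entry per cluster.
means_one_line = [
    column_mean(gather(vectors, indices_where_equal(assignments, c)))
    for c in range(k)
]

# Decomposed form: create the list before the loop and append to it
# on every iteration, instead of overwriting a single variable.
per_cluster_means = []
for c in range(k):
    member_indices = indices_where_equal(assignments, c)
    members = gather(vectors, member_indices)
    cluster_mean = column_mean(members)
    per_cluster_means.append(cluster_mean)

means_decomposed = per_cluster_means
print(means_one_line == means_decomposed)
```

The key point is that a comprehension yields a whole list, so the loop version needs a list that collects every iteration's result. In the statement quoted in the question, that collected list corresponds to the second argument of tf.concat(0, ...), which would be called once, after the loop has finished.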
Thanks for the reply.
Thank you for taking the time to answer. I'll give it a try.
Here is how I thought about it.