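[TensorFlow & Python] I'd like to break a Python statement into separate steps.
Posted by: HDNua / Posted: Sat, 2016/12/17 - 1:37 PM
I'm a computer science undergraduate who is comfortable with programming. I'm learning Python and TensorFlow at the same time, and I'm currently reading Chapter 3, "Clustering", of "텐서플로 첫걸음" (First Contact with TensorFlow) by Jordi Torres.
I've only been learning Python for a few days, so I'm having some trouble rewriting a statement in a different form, and I'm posting here to ask for help.
This is the statement I want to break down: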
# find mean value.
## means = tf.concat(0, [tf.reduce_mean(tf.gather(vectors, tf.reshape(tf.where(tf.equal(assignments, c)), [1, -1])), reduction_indices=[1]) for c in xrange(k)])
The statement is long and complicated, so I'd like to break it down as follows:
print "Finding mean value..."
# find mean value.
## means = tf.concat(0, [tf.reduce_mean(tf.gather(vectors, tf.reshape(tf.where(tf.equal(assignments, c)), [1, -1])), reduction_indices=[1]) for c in xrange(k)])
for c in xrange(k):
    tf_equal = tf.equal(assignments, c) #
    tf_where = tf.where(tf_equal) #
    tf_reshape = tf.reshape(tf_where, [1, -1]) #
    tf_gather = tf.gather(vectors, tf_reshape) #
    tf_reduce_mean = tf.reduce_mean(tf_gather, reduction_indices=[1]) #
    tf_concat_range = tf_reduce_mean #
    means = tf.concat(0, tf_concat_range) #
#
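Even at a glance, the part around tf_concat_range is clearly decomposed incorrectly, but I'm not used to Python yet, so I'm unsure how to fix it so that it behaves the same as the original statement.
(You don't need to explain what the code means.)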
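One way to look at the question: a list comprehension passed to a function generally unrolls into "create an empty list, append inside the loop, call the function once after the loop". The sketch below shows that pattern in plain Python. The helper functions (`indices_where_equal`, `gather`, `mean_rows`, `concat`) are hypothetical stand-ins that only roughly mimic the TensorFlow calls, and the sample data is made up for illustration.

```python
# Plain-Python stand-ins (hypothetical helpers, not TensorFlow calls).
def indices_where_equal(assignments, c):
    """Rough analog of tf.where(tf.equal(assignments, c))."""
    return [i for i, a in enumerate(assignments) if a == c]

def gather(vectors, indices):
    """Rough analog of tf.gather: pick the rows at the given indices."""
    return [vectors[i] for i in indices]

def mean_rows(rows):
    """Rough analog of tf.reduce_mean over the gathered rows (column-wise mean)."""
    n = float(len(rows))
    return [sum(col) / n for col in zip(*rows)]

def concat(parts):
    """Rough analog of tf.concat(0, parts): stack the per-cluster results."""
    return list(parts)

# Made-up sample data: four points, two clusters.
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
assignments = [0, 1, 0, 1]
k = 2

# One-statement form, shaped like the original: a comprehension inside concat.
means_oneliner = concat(
    [mean_rows(gather(vectors, indices_where_equal(assignments, c)))
     for c in range(k)])

# Unrolled form: the comprehension becomes "empty list + append in a loop",
# and concat is called once, after the loop has finished.
per_cluster = []                      # plays the role of tf_concat_range
for c in range(k):
    idx = indices_where_equal(assignments, c)
    rows = gather(vectors, idx)
    mean = mean_rows(rows)
    per_cluster.append(mean)          # collect each result instead of overwriting
means_unrolled = concat(per_cluster)  # outside the loop

assert means_oneliner == means_unrolled
print(means_unrolled)  # [[3.0, 4.0], [5.0, 6.0]]
```

Mapped back onto the names in the question, this would suggest putting `tf_concat_range = []` before the `for`, replacing the assignment with `tf_concat_range.append(tf_reduce_mean)` inside it, and moving `means = tf.concat(0, tf_concat_range)` to after the loop.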
The full source is below.
https://github.com/…/ma…/chapter2-6/chapter3_kmeans.py.ipynb
# import numerical python library.
import numpy as np
# preparations.
num_points = 8 # 2000
vectors_set = []
for i in xrange(num_points): # append num_points 2-dimensional vectors to vectors_set.
    if np.random.random() > 0.5:
        vectors_set.append([np.random.normal(0.0, 0.9), np.random.normal(0.0, 0.9)])
    else:
        vectors_set.append([np.random.normal(3.0, 0.5), np.random.normal(1.0, 0.5)])
# use seaborn visualizer package and pandas data controller package.
import matplotlib.pyplot as plot
import pandas as pd
import seaborn as sns
# df = pd.DataFrame({"x": [v[0] for v in vectors_set], "y": [v[1] for v in vectors_set]})
# sns.lmplot("x", "y", data=df, fit_reg=False, size=6)
# plot.show()
# use TensorFlow.
import tensorflow as tf
vectors = tf.constant(vectors_set) # create constant tensor with randomly created data.
k = 4 # define K as 4.
centroides = tf.Variable(tf.slice(tf.random_shuffle(vectors), [0, 0], [k, -1])) # TensorFlow selects K centroides.
vectors_set = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [1.1, 2.2], [3.3, 4.4], [5.5, 6.6], [7.7, 8.8]]
vectors = tf.constant(vectors_set)
centroides = tf.Variable([[3.0, 4.0], [5.0, 6.0], [1.0, 2.0], [5.5, 6.6]])
## !!!!! NOTE !!!!!
## vectors comes from tf.constant(vectors_set), which creates a constant tensor with the same shape as the input.
## tf.random_shuffle shuffles the input tensor along its first dimension.
## tf.slice extracts the designated part of a tensor; here it takes the first k rows, so centroides is a [k, 2] matrix.
# expand both tensors to 3 dimensions so that one can be subtracted from the other.
expanded_vectors = tf.expand_dims(vectors, 0) # 2-dimensional vectors (2000x2) -> 3-dimensional tensor (D0x2000x2).
expanded_centroides = tf.expand_dims(centroides, 1) # 2-dimensional centroides (4x2) -> 3-dimensional tensor (4xD1x2).
## !!!!! NOTE !!!!!
## tf.sub works out how to subtract the two tensors element-wise through broadcasting.
## A dimension of size 1 is repeated to match the corresponding dimension of the other tensor.
## Here expanded_vectors has shape (D0x2000x2) and expanded_centroides has shape (4xD1x2), where D0 and D1 are both 1.
## tf.sub computes as if D0 were 4 and D1 were 2000. The third dimension has the same size in both tensors, so it is used as-is.
# find Euclidean distances.
## assignments = tf.argmin(tf.reduce_sum(tf.square(tf.sub(expanded_vectors, expanded_centroides)), 2), 0)
diff = tf.sub(expanded_vectors, expanded_centroides) # subtract 2 tensors element-wise.
square_diff = tf.square(diff) # get squared difference value. It is also element-wise.
distances = tf.reduce_sum(square_diff, 2) # reduce dimension 2(D2); ((xc-xi)^2, (yc-yi)^2) -> ((xc-xi)^2+(yc-yi)^2) for all (4x2000) elements.
assignments = tf.argmin(distances, 0) # tf.argmin returns the index with the smallest value across dimensions of a tensor.
print "Finding mean value..."
# find mean value.
## means = tf.concat(0, [tf.reduce_mean(tf.gather(vectors, tf.reshape(tf.where(tf.equal(assignments, c)), [1, -1])), reduction_indices=[1]) for c in xrange(k)])
for c in xrange(k):
    tf_equal = tf.equal(assignments, c) #
    tf_where = tf.where(tf_equal) #
    tf_reshape = tf.reshape(tf_where, [1, -1]) #
    tf_gather = tf.gather(vectors, tf_reshape) #
    tf_reduce_mean = tf.reduce_mean(tf_gather, reduction_indices=[1]) #
    tf_concat_range = tf_reduce_mean #
    means = tf.concat(0, tf_concat_range) #
#
update_centroides = tf.assign(centroides, means)
# initializations.
init_op = tf.initialize_all_variables()
session = tf.Session()
session.run(init_op)
for step in xrange(3):
    ## use these lines to debug each tensor variable's output.
    ## print step, "th loop time; "
    (_, centroid_values, assignment_values,
     vs, cs, evs, ecs, out_diff, out_sdiff, out_dist) = session.run(
        [update_centroides, centroides, assignments,
         vectors, centroides, expanded_vectors, expanded_centroides,
         diff, square_diff, distances])
    ## =================================================
    '''
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "vectors: "
    print vs
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "centors: "
    print cs
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "expanded_vectors: "
    print evs
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "expanded_centors: "
    print ecs
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "diff: "
    print out_diff
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "square_diff: "
    print out_sdiff
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print "distances: "
    print out_dist
    '''
    ## =================================================
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print centroid_values
    print ">>>>>>>>>>>>>>>>>>>>>>>>"
    print assignment_values
    print ""
# executions.
data = {"x": [], "y": [], "cluster": []}
for i in xrange(len(assignment_values)):
    data["x"].append(vectors_set[i][0])
    data["y"].append(vectors_set[i][1])
    data["cluster"].append(assignment_values[i])
# end of program.
df = pd.DataFrame(data)
sns.lmplot("x", "y", data=df, fit_reg=False, size=6, hue="cluster", legend=False)
plot.show()
Thank you for reading.
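For readers following the distance step in the listing above (`tf.expand_dims`, `tf.sub`, `tf.square`, `tf.reduce_sum`, `tf.argmin`), the broadcasting can be spelled out with explicit loops. The following is only a plain-Python sketch of the same arithmetic, not TensorFlow code; it reuses the hard-coded sample points and centroids from the listing, and the variable names (including the spelling `centroids`) are chosen here for illustration.

```python
# Sample data copied from the hard-coded vectors_set and centroides in the listing.
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0],
           [1.1, 2.2], [3.3, 4.4], [5.5, 6.6], [7.7, 8.8]]
centroids = [[3.0, 4.0], [5.0, 6.0], [1.0, 2.0], [5.5, 6.6]]

# Broadcasting a (1 x N x 2) tensor against a (k x 1 x 2) tensor yields (k x N x 2):
# diff[c][i][d] = vectors[i][d] - centroids[c][d]
diff = [[[v[d] - cen[d] for d in range(2)] for v in vectors]
        for cen in centroids]

# Square element-wise, then sum over the last axis: (k x N) squared distances.
distances = [[sum(x * x for x in pair) for pair in row] for row in diff]

# argmin over axis 0: for each point i, the index c of the nearest centroid.
assignments = [min(range(len(centroids)), key=lambda c: distances[c][i])
               for i in range(len(vectors))]

print(assignments)  # [2, 0, 1, 3, 2, 0, 3, 3]
```

The nested comprehension makes the (k x N x 2) shape explicit: the outer level runs over centroids, the middle level over points, and the inner level over the two coordinates.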
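I gave it a rough try.
I don't know TensorFlow well and can't tell what the code means, so I picked the temporary variable names arbitrarily.
Rename them to match their meaning before using them.
Thanks for the answer.
Thank you for replying. I'll give it a try.
Here is how I approached it.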